Sequential annotation and chunking of Chinese discourse structure

Published in SIGHAN Workshop on Chinese Language Processing @ ACL, 2015

Frances Yung, Kevin Duh, Yuji Matsumoto

We propose a linguistically driven approach to representing discourse relations in Chinese text as sequences. We observe that certain surface characteristics of Chinese texts, such as the order of clauses, are overt markers of discourse structure, yet existing annotation proposals adapted from formalisms constructed for English do not fully incorporate these characteristics. We present an annotated resource consisting of 325 articles from the Chinese Treebank. In addition, using this annotation, we introduce a discourse chunker based on a cascade of classifiers and report 70% accuracy on top-level discourse senses.