Event Date: Tuesday, 19 July, 2016
Location: Via Santa Maria, 36, Pisa, PI, Italia [2nd floor seminar room]
Speaker: Chu-Ren Huang (Hong Kong Polytechnic University)
Title: What You Need to Know about Chinese for Chinese Language Processing
Abstract: In this condensed version of the ACL 2015 tutorial introducing essential knowledge of Chinese linguistics for Chinese language processing, I will focus on the Chinese writing system and the lemmatization dilemma. An ontology-driven writing system that has remained stable allows sub-lexical semantic processing but also has interesting consequences for the morpho-lexical properties of the language. The existence of multiple structure-dependent devices for instantiation of compounds renders comprehensive lemmatization impossible without parsing, hence the dilemma. To conclude, I will briefly introduce our recent work on Chinese synaesthesia to demonstrate how a comprehensive account of a single phenomenon depends on synergizing knowledge of different linguistic modules.
Selected References
Huang, Chu-Ren, Barbara Meisterernst, and Zhuo Jing-Schmidt. (Eds.). In Preparation (2017). Routledge Handbook on Chinese Applied Linguistics. London: Routledge.
Huang, Chu-Ren, Shu-Kai Hsieh, and Keh-Jiann Chen. In Preparation (2016). Mandarin Chinese Words and Parts of Speech: A corpus-based study. London: Routledge.
Huang, Chu-Ren, and Dingxu Shi. 2016. A Reference Grammar of Chinese. Cambridge: Cambridge University Press.
Huang, Chu-Ren, and Sophia Y. M. Lee. 2013. Eds. Special Issues on Ontology and Chinese Language Processing. Contemporary Linguistics《当代语言学》2013.2-3.
Huang, Chu-Ren and Nianwen Xue. 2012. Words without Boundaries: Computational Approaches to Chinese Word Segmentation. Language and Linguistics Compass. Vol. 6 Issue 8. Pp. 494-505.
Huang, Chu-Ren, Keh-jiann Chen and Benjamin K. T’sou. 1996. Readings in Chinese Natural Language Processing. Journal of Chinese Linguistics Monograph Series No. 9. Berkeley: Journal of Chinese Linguistics.
Lu, Qin, Nianwen Xue, and Chu-Ren Huang. In Preparation. Chinese Language Processing. SNLP book series. Cambridge: Cambridge University Press.
Song, Zuoyan and Chu-Ren Huang. In Press (2016). Generative Lexicon Studies in Chinese. Beijing: Commercial Press. 宋作艳,黄居仁。编著。待刊。生成词汇理论与汉语研究。北京:商务印书馆。
Resources
Huang, Chu-Ren. 2009. Tagged Chinese Gigaword Version 2.0. Philadelphia: Linguistic Data Consortium, University of Pennsylvania. ISBN 1-58563-516-2
Sinica Corpus: Academia Sinica Balanced Corpus for Mandarin Chinese. http://www.sinica.edu.tw/SinicaCorpus
Sinica BOW: Academia Sinica Bilingual Ontological Wordnet http://BOW.sinica.edu.tw
Sinica TreeBank http://TreeBank.sinica.edu.tw/
Chinese Wordnet 2005. http://cwn.ling.sinica.edu.tw
Hantology 漢字知識本體. 2006 hantology.ling.sinica.edu.tw