Accurate Word Segmentation using Transliteration and Language Model Projection
Masato Hagiwara and Satoshi Sekine
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Transliterated compound nouns not separated by whitespaces pose difficulty on word segmentation (WS). Offline approaches have been proposed to split them using word statistics, but they rely on static lexicon, limiting their use. We propose an online approach, integrating source LM, and/or, back-transliteration and English LM. The experiments on Japanese and Chinese WS have shown that the proposed models achieve significant improvement over state-of-the-art, reducing 16% errors in Japanese.
Conference Manager (V2.61.0 - Rev. 2792M)