
Accurate Word Segmentation using Transliteration and Language Model Projection

Masato Hagiwara and Satoshi Sekine

The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013


Transliterated compound nouns that are not separated by whitespace pose difficulties for word segmentation (WS). Offline approaches have been proposed to split them using word statistics, but they rely on a static lexicon, which limits their use. We propose an online approach that integrates a source language model (LM) and/or back-transliteration with an English LM. Experiments on Japanese and Chinese WS show that the proposed models achieve significant improvements over the state of the art, reducing errors by 16% in Japanese.
