START Conference Manager    

Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study

Wenbin Jiang, Meng Sun, Yajuan Lv, Yating Yang and Qun Liu

The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013


Structural information in web text provides natural annotations for NLP problems such as word segmentation and parsing. In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. It utilizes the Internet as an external corpus with massive (although slight and sparse) natural annotations, and enables a classifier to evolve on the large-scaled and real-time updated web text. With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves significant improvement on a series of testing sets from different domains, even with a single classifier and local features.

START Conference Manager (V2.61.0 - Rev. 2792M)