START Conference Manager    

Ensemble Reranking with Linguistic and Semantic Features for Arabic Character Recognition

Nadi Tomeh, Nizar Habash, Ryan Roth, Noura Farra, Pradeep Dasigi and Mona Diab

The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013


Existing optical character recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters, and on language models to favor fluent output. In this paper we incorporate linguistically and semantically motivated features into an existing OCR system. To do so, we follow an n-best list reranking approach that exploits recent advances in learning-to-rank techniques.
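The n-best reranking idea can be illustrated with a minimal sketch. It assumes a simple linear scorer over per-hypothesis features; the feature names (ocr_score, lm_score, morph_ok), the weights, and the hypothesis strings are hypothetical placeholders, and the learning-to-rank method used in the paper is not reproduced here.

```python
from typing import Dict, List, Tuple

# A hypothesis is (text, feature dictionary). All feature names are hypothetical.
Hypothesis = Tuple[str, Dict[str, float]]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Sort an n-best list by a weighted sum of features, best first."""

    def score(hyp: Hypothesis) -> float:
        _, feats = hyp
        # Features without a weight contribute nothing to the score.
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())

    return sorted(nbest, key=score, reverse=True)


# Toy n-best list: the OCR system slightly prefers hyp_a, but the
# (assumed) language-model and morphology features favor hyp_b.
nbest = [
    ("hyp_a", {"ocr_score": -1.0, "lm_score": -4.0, "morph_ok": 0.0}),
    ("hyp_b", {"ocr_score": -1.2, "lm_score": -3.0, "morph_ok": 1.0}),
]
weights = {"ocr_score": 1.0, "lm_score": 0.5, "morph_ok": 0.8}

best = rerank(nbest, weights)[0][0]
```

In a learning-to-rank setting the weights would be learned from training data in which each hypothesis is labeled with its error against the reference, rather than set by hand as in this toy example.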

We achieve reductions of 10.1% and 11.4% in recognition word error rate (WER) relative to the baseline system on typewritten and handwritten Arabic, respectively.
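For reference, WER and a relative reduction can be computed as in the following sketch. It uses the standard word-level edit distance divided by the reference length, and assumes that a reduction relative to the baseline means (baseline - system) / baseline; the function names and the example strings are illustrative and are not taken from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, system: float) -> float:
    """Fraction by which `system` lowers the error rate of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - system) / baseline
```

For example, wer("a b c d", "a x c") counts one substitution and one deletion against four reference words.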
