Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison

Kyumars Sheykh Esmaili and Shahin Salavati

The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013


Resource scarcity and diversity (in both dialect and script) are the two primary challenges in Kurdish language processing. In this paper we aim to address these two problems by (i) building a text corpus for Sorani and Kurmanji, the two main dialects of Kurdish, and (ii) highlighting a range of statistical (as well as rule-based) differences between these two dialects and their writing systems.

