START Conference Manager    

Automatically Predicting Sentence Translation Difficulty

Abhijit Mishra and Pushpak Bhattacharyya

The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013


In this paper we introduce the Translation Difficulty Index (TDI), a measure of difficulty in text translation. We first define and quantify translation difficulty in terms of TDI. Any measure of TDI based on direct input from translators is prone to subjectivity and ad hoc judgments, so we rely instead on cognitive evidence from eye tracking: TDI is measured as the sum of the fixation (eye gaze) and saccade times of the eye. We then establish that TDI is correlated with three properties of the input sentence, viz. length (L), degree of polysemy (DP) and structural complexity (SC). We train a Support Vector Regression (SVR) system to predict TDIs for new sentences using these features as input. The predictions of our framework correlate well with the empirical gold standard data, a repository of <L, DP, SC> and TDI pairs for a set of sentences. The primary use of our work is to "bin" sentences to be translated into "easy", "medium" and "hard" categories according to their predicted TDI. This binning can inform the pricing of translation tasks for resource generation for Machine Translation systems through translation crowdsourcing/outsourcing. It can also provide a way of monitoring the progress of second language learners.
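
As an illustration only, the measurement and binning steps described in the abstract might be sketched as follows. The abstract states that TDI is the sum of fixation and saccade times and that sentences are binned by predicted TDI; everything else here is an assumption made for the example, including the `GazeRecord` type, the millisecond unit, and the two threshold parameters. The SVR prediction step is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GazeRecord:
    """Hypothetical container for one sentence's eye-tracking durations (ms)."""
    fixation_ms: List[float]
    saccade_ms: List[float]


def tdi(record: GazeRecord) -> float:
    """Sum of fixation and saccade times, as the abstract describes TDI."""
    return sum(record.fixation_ms) + sum(record.saccade_ms)


def bin_by_tdi(scores: Sequence[float], easy_max: float, medium_max: float) -> List[str]:
    """Label each score "easy", "medium" or "hard".

    The thresholds are assumed inputs; the abstract does not say how
    the category boundaries are chosen.
    """
    labels = []
    for score in scores:
        if score <= easy_max:
            labels.append("easy")
        elif score <= medium_max:
            labels.append("medium")
        else:
            labels.append("hard")
    return labels
```

In practice the scores passed to `bin_by_tdi` would be the TDIs predicted from the <L, DP, SC> features rather than measured ones, since eye-tracking data is not available for new sentences.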
