SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat, Hal Daume III, Katie Henry, Ann Irvine, Jagadeesh Jagarlamudi and Rachel Rudinger
The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013
Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications in machine translation and other tasks sensitive to lexical semantics. We define a task, SenseSpotting, in which we build systems to spot tokens that have new senses in new-domain text. Instead of relying on difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system achieves F-measures of up to 80% when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.