START Conference Manager    

Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

Weiwei Guo, Hao Li, Heng Ji and Mona Diab

The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013


Many current Natural Language Processing (NLP) techniques work well when given a large context of text as input data. However, they become ineffective when applied to short texts such as Twitter feeds. Adapting these NLP tools to short texts has become a pressing need due to the pervasive presence of social media data such as Twitter messages. To overcome this issue, we want to find a newswire document related to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts such as tweets.

The contribution of the paper is two-fold: (1) we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; (2) in contrast to previous research, which focuses on lexical features within the short texts (text-to-word information), we propose a graph-based latent variable model that models the correlations between short texts (text-to-text information). This is motivated by the observation that a tweet usually covers only one aspect of an event. We show that by using tweet-specific features (hashtags) and news-specific features (named entities), as well as temporal constraints, we are able to extract text-to-text correlations and thus complete the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines on three evaluation metrics in the new task.
