START Conference Manager    

Utterance-Level Multimodal Sentiment Analysis

Veronica Perez-Rosas, Rada Mihalcea and Louis-Philippe Morency

The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013


During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.

START Conference Manager (V2.61.0 - Rev. 2792M)