Sentiment Analysis in Twitter with Lightweight Discourse Analysis

We propose a lightweight method for using discourse relations in polarity detection of tweets. The method targets web-based applications that deal with noisy, unstructured text such as tweets and cannot afford heavy linguistic resources like parsing, because parsers frequently fail on noisy data. Most work on micro-blogs like Twitter uses a bag-of-words model that ignores discourse particles such as but, since, although, etc. In this work, we show how discourse relations such as connectives and conditionals can be used to incorporate discourse information into any bag-of-words model to improve sentiment classification accuracy. We also probe the influence of semantic operators, such as modals and negations, on the discourse relations that affect the sentiment of a sentence. Discourse relations and the corresponding rules are identified with minimal processing: just a list lookup. We first give a linguistic description of the various discourse relations, which leads to conditions in rules and features in an SVM. We show that our discourse-based bag-of-words model performs well in a noisy medium (Twitter), where it outperforms an existing Twitter-based application. Furthermore, we show that our approach also benefits structured reviews, where we achieve better accuracy than a state-of-the-art system in the travel review domain. Our system compares favorably with state-of-the-art systems and has the added appeal of being less resource intensive.
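The idea of folding discourse information into a bag-of-words model through a list lookup can be illustrated with a minimal sketch. It is not the paper's implementation: the word lists, the `boost` weight, the negation handling, and the function name `weighted_polarity` are all hypothetical choices made for illustration. The sketch assumes that words following a contrastive connective such as "but" should count more than the words before it, and that a negation flips the next polar word.

```python
import re

# Hypothetical toy lexicons; the paper's actual lists and weights are not given here.
POSITIVE = {"good", "great", "love", "nice"}
NEGATIVE = {"bad", "boring", "hate", "poor"}
# Assumed connectives after which the following clause dominates the sentiment.
FOLLOWING_DOMINATES = {"but", "however", "nevertheless"}
NEGATIONS = {"not", "never", "no"}


def weighted_polarity(text, boost=2):
    """Sum word polarities, up-weighting words that follow a contrastive connective."""
    tokens = re.findall(r"[a-z']+", text.lower())
    weight = 1       # weight applied to the current polar word
    negate = False   # True while a negation is waiting for a polar word
    score = 0
    for tok in tokens:
        if tok in FOLLOWING_DOMINATES:
            weight = boost   # list lookup only, no parsing
            negate = False
            continue
        if tok in NEGATIONS:
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            score += weight * (-polarity if negate else polarity)
            negate = False
    return score


if __name__ == "__main__":
    print(weighted_polarity("The plot was good but the acting was boring"))
```

An unweighted bag-of-words count would score the example sentence as neutral (one positive word, one negative word), whereas the connective-aware version lets the clause after "but" decide the result.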


  • Subhabrata Mukherjee and Pushpak Bhattacharyya.
    Proc. of the 24th International Conference on Computational Linguistics (COLING). 2012.


Dataset used in the COLING 2012 paper: