A Curated Corpus for Sentiment-Topic Analysis

Emmanuel Ebuka Ibeke, Chenghua Lin, Christopher David Coe, Adam Zachary Wyner, Dong Liu, Mohamad Hardyman Bin Barawi, Noor Fazilla Abd Yusof

Research output: Chapter in Book/Report/Conference proceeding; Published conference contribution



There has been rapid growth in natural language processing research that seeks to better understand the sentiment or opinion
expressed in text. However, most research focuses on developing new models for opinion mining, with little effort devoted to
the development of curated datasets for training and evaluating these models. This work provides a manually annotated corpus of
customer reviews with two distinctive characteristics. First, the corpus captures sentiment and topic information at both the review and
sentence levels. Second, it is time-variant, preserving information about how sentiment and topics in the reviews change over time. The
annotation was carried out in two stages by three independent annotators, achieving a substantial level of inter-annotator agreement.
In a further set of experiments, we performed supervised sentiment classification using our manual annotations as the gold standard.
Experimental results show that both a Naive Bayes model and a Support Vector Machine (SVM) achieved more than 92% accuracy on the
polarity classification task. We hypothesise that this corpus could serve as a benchmark to facilitate training and experimentation in a broad
range of opinion mining tasks.
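
The abstract reports "substantial" agreement among three annotators but does not name the coefficient used. As an illustration only, the sketch below assumes Fleiss' kappa, a common choice when every item is labelled by the same fixed number of raters, and maps the score onto the Landis and Koch bands, in which "substantial" corresponds to 0.61 to 0.80. The function names `fleiss_kappa` and `landis_koch` and the example labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds every annotator's label for item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement P_i: share of ordered rater pairs that agree on item i.
    category_totals: Counter = Counter()
    p_items = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        p_items.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the overall proportion p_j of each label.
    total = n_items * n_raters
    p_e = sum((n / total) ** 2 for n in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal band for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


# Hypothetical sentence-level polarity labels from three annotators.
labels = [
    ("pos", "pos", "pos"),
    ("neg", "neg", "neg"),
    ("pos", "pos", "neg"),
    ("neg", "neg", "neg"),
    ("pos", "pos", "pos"),
    ("neu", "neu", "pos"),
]
kappa = fleiss_kappa(labels)
print(f"{kappa:.3f}", landis_koch(kappa))  # prints: 0.621 substantial
```

Kappa is used here rather than raw percentage agreement because it discounts the agreement that annotators would reach by chance given how often each label occurs.
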
Original language: English
Title of host publication: Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis”
Editors: J. Fernando Sánchez-Rada, Björn Schuller
Number of pages: 8
Publication status: Published - 23 May 2016


  • Opinion mining
  • Sentiment and Topic analysis
  • Annotation guidelines

