SLIDE 1
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1367–1375, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics
A Unified Graph Model for Sentence-based Opinion Retrieval
Binyang Li, Lanjun Zhou, Shi Feng, Kam-Fai Wong
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
{byli, ljzhou, sfeng, kfwong}@se.cuhk.edu.hk
Abstract
There is growing research interest in opinion retrieval as online users’ opinions become increasingly prevalent in business, social networks, etc. Practically speaking, the goal of opinion retrieval is to retrieve documents that contain opinions or comments relevant to a target subject specified by the user’s query. A fundamental challenge in opinion retrieval is information representation. Existing research focuses on document-based approaches in which documents are represented as bag-of-words. However, because contextual information is lost, this representation fails to capture the associative information between an opinion and its corresponding target. It cannot distinguish different degrees of a sentiment word when associated with different targets. This in turn seriously affects opinion retrieval performance. In this paper, we propose a sentence-based approach based on a new information representation, namely the topic-sentiment word pair, to capture intra-sentence contextual information between an opinion and its target. Additionally, we consider inter-sentence information to capture the relationships among the opinions on the same topic. Finally, the two types of information are combined in a unified graph-based model, which can effectively rank the documents. Experimental results on the COAE08 dataset show that our graph-based model achieves significant improvement over existing approaches.
1 Introduction
In recent years, there has been growing interest in sharing personal opinions on the Web, such as product reviews, economic analysis, political polls, etc. These opinions can not only help independent users make decisions, but also yield valuable feedback (Pang et al., 2008). Opinion-oriented research, including sentiment classification, opinion extraction, opinion question answering, and opinion summarization, is receiving growing attention (Wilson et al., 2005; Liu et al., 2005; Oard et al., 2006). However, most existing work concentrates on analyzing
opinions expressed in the documents, and none on how to represent the information needs required to retrieve opinionated documents. In this paper, we focus on opinion retrieval, whose goal is to find a set of documents containing not only the query keyword(s) but also the relevant opinions. This requirement raises the challenge of how to represent information needs for effective opinion retrieval.
In order to solve the above problem, previous work adopts a 2-stage approach. In the first stage, relevant documents are determined and ranked by a score, i.e. the tf-idf value. In the second stage, an opinion score is generated for each relevant document (Macdonald and Ounis, 2007; Oard et al., 2006). The opinion score can be acquired either by machine learning-based sentiment classifiers, such as SVM (Zhang and Yu, 2007), or by a sentiment lexicon with weighted scores from training documents (Amati et al., 2007; Hannah et al., 2007; Na et al., 2009). Finally, an overall score combining the two is computed by using a score function, e.g. linear combination, to re-rank the retrieved documents.
Retrieval in the 2-stage approach is based on documents, and each document is represented as bag-of-words. This representation, however, can
only ensure that there is at least one opinion in each relevant document, but it cannot determine the relevance pairing of an individual opinion to its target. In general, by simply representing a document as bag-of-words, contextual information, i.e. the corresponding target of an opinion, is neglected. This may result in possible mismatch