
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1367–1375, Uppsala, Sweden, 11-16 July 2010. c 2010 Association for Computational Linguistics

A Unified Graph Model for Sentence-based Opinion Retrieval

Binyang Li, Lanjun Zhou, Shi Feng, Kam-Fai Wong
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
{byli, ljzhou, sfeng, kfwong}@se.cuhk.edu.hk

Abstract

There is growing research interest in opinion retrieval as on-line users’ opinions are becoming increasingly popular in business, social networks, etc. Practically speaking, the goal of opinion retrieval is to retrieve documents that contain opinions or comments relevant to a target subject specified by the user’s query. A fundamental challenge in opinion retrieval is information representation. Existing research focuses on document-based approaches, in which documents are represented by bag-of-word. However, because contextual information is lost, this representation fails to capture the associative information between an opinion and its corresponding target, and it cannot distinguish the different degrees of a sentiment word when it is associated with different targets. This in turn seriously affects opinion retrieval performance. In this paper, we propose a sentence-based approach built on a new information representation, namely the topic-sentiment word pair, to capture intra-sentence contextual information between an opinion and its target. Additionally, we consider inter-sentence information to capture the relationships among the opinions on the same topic. Finally, the two types of information are combined in a unified graph-based model, which can effectively rank the documents. Experimental results on the COAE08 dataset show that our graph-based model achieves significant improvement over existing approaches.

1 Introduction

In recent years, there has been growing interest in sharing personal opinions on the Web, such as product reviews, economic analysis, political polls, etc. These opinions can not only help individual users make decisions, but also provide valuable feedback (Pang et al., 2008). Opinion-oriented research, including sentiment classification, opinion extraction, opinion question answering, and opinion summarization, is receiving growing attention (Wilson et al., 2005; Liu et al., 2005; Oard et al., 2006). However, most existing work concentrates on analyzing opinions expressed in documents, and none addresses how to represent the information needs required to retrieve opinionated documents. In this paper, we focus on opinion retrieval, whose goal is to find a set of documents containing not only the query keyword(s) but also the relevant opinions. This requirement raises the challenge of how to represent information needs for effective opinion retrieval.

To solve this problem, previous work adopts a 2-stage approach. In the first stage, relevant documents are determined and ranked by a score, i.e. the tf-idf value. In the second stage, an opinion score is generated for each relevant document (Macdonald and Ounis, 2007; Oard et al., 2006). The opinion score can be acquired either by machine learning-based sentiment classifiers, such as SVM (Zhang and Yu, 2007), or by a sentiment lexicon with weighted scores from training documents (Amati et al., 2007; Hannah et al., 2007; Na et al., 2009). Finally, an overall score combining the two is computed by a score function, e.g. linear combination, to re-rank the retrieved documents.

Retrieval in the 2-stage approach is based on documents, and each document is represented by bag-of-word. This representation, however, can only ensure that there is at least one opinion in each relevant document; it cannot determine the relevance pairing of an individual opinion to its target. In general, by simply representing a document as a bag-of-word, contextual information, i.e. the corresponding target of an opinion, is neglected. This may result in a mismatch between an opinion and a target, which in turn affects opinion retrieval performance. By the same token, the effect on documents consisting of multiple topics, which are common in blogs and on-line reviews, is also significant. In this setting, even if a document is regarded as opinionated, there is no guarantee that all opinions in the document are indeed relevant to the target concerned. Therefore, we argue that the existing information representation, i.e. bag-of-word, cannot satisfy the information needs of opinion retrieval.

In this paper, we propose to handle opinion retrieval at the granularity of the sentence. We observe that a complete opinion is always expressed in one sentence, and that the relevant target of the opinion is mostly the one found in it. Therefore, it is crucial to maintain the associative information between an opinion and its target within a sentence. We define the notion of a topic-sentiment word pair, which is composed of a topic term (i.e. the target) and a sentiment word (i.e. the opinion) of a sentence. Word pairs maintain intra-sentence contextual information to express the potentially relevant opinions. In addition, inter-sentence contextual information is also captured by word pairs to represent the relationship among opinions on the same topic. In practice, the inter-sentence information reflects the degree of a word pair. Finally, we combine both intra-sentence and inter-sentence contextual information to construct a unified undirected graph to achieve effective opinion retrieval.

The rest of the paper is organized as follows. In Section 2, we describe the motivation of our approach. Section 3 presents a novel unified graph-based model for opinion retrieval. We evaluate our model and present the results in Section 4. We review related work on opinion retrieval in Section 5. Finally, Section 6 concludes the paper and suggests future work.

2 Motivation

In this section, we start by briefly describing the objective of opinion retrieval. We then illustrate the limitations of current opinion retrieval approaches and analyze the motivation of our method.

2.1 Formal Description of Problem

Opinion retrieval was first presented in the TREC 2006 Blog track, and its objective is to retrieve documents that express an opinion about a given target. The opinion target can be a “traditional” named entity (e.g. the name of a person, location, or organization), a concept (e.g. a type of technology), or an event (e.g. a presidential election). The topic of the document is not required to be the same as the target, but an opinion about the target has to be presented in the document or in one of the comments to the document (Macdonald and Ounis, 2006). Therefore, in this paper we regard the information need of opinion retrieval as the relevant opinion.

2.2 Motivation of Our Approach

In traditional information retrieval (IR), the bag-of-word representation is the most common way to express information needs. However, in opinion retrieval the information need targets relevant opinions, and this renders the bag-of-word representation ineffective.

Consider the example in Figure 1. There are three sentences A, B, and C in a document $d_i$, and we are given an opinion-oriented query Q related to ‘Avatar’. According to the conventional 2-stage opinion retrieval approach, $d_i$ is represented by a bag-of-word. Among the words, there is a topic term Avatar ($t_1$) occurring twice, i.e. Avatar in A and Avatar in C, and two sentiment words comfortable ($o_1$) and favorite ($o_2$) (refer to Figure 2 (a)). In order to rank this document, an overall score of the document $d_i$ is computed by a simple combination of the relevance score ($Score_{rel}$) and the opinion score ($Score_{op}$), e.g. an equally weighted linear combination. For simplicity, we let $Score_{op}$ be computed by a lexicon-based method from the weights of the two sentiment words, comfortable and favorite.

A. 阿凡达明日将在中国上映。
   Tomorrow, Avatar will be shown in China.
B. 我预订到了IMAX 影院中最舒服的位子。
   I’ve reserved a comfortable seat in IMAX.
C. 阿凡达是我最喜欢的一部 3D 电影。
   Avatar is my favorite 3D movie.

Figure 1: A retrieved document $d_i$ on the target ‘Avatar’.

Although the bag-of-word representation achieves good performance in retrieving relevant documents, our study shows that it cannot satisfy the information needs for retrieval of relevant opinions. It suffers from the following limitations:

(1) It cannot maintain contextual information; thus, the fact that an opinion may not be related to the target of the retrieved document is neglected. In this example, only the opinion favorite ($o_2$) on Avatar in C is a relevant opinion. But due to the loss of contextual information between the opinion and its corresponding target, Avatar in A and comfortable ($o_1$) are also mistakenly regarded as a relevant opinion, creating a false positive. In reality, comfortable ($o_1$) describes “the seats in IMAX”, which is an irrelevant opinion, and sentence A is a factual statement rather than an opinion statement.

Figure 2: Two kinds of information representation of opinion retrieval, (a) and (b). ($t_1$ = ‘Avatar’, $o_1$ = ‘comfortable’, $o_2$ = ‘favorite’)

(2) Current approaches cannot capture the relationship among opinions about the same topic. Suppose there is another document including sentence C, which expresses the same opinion on Avatar. The existing information representation simply does not cater for the two identical opinions from different documents. In addition, if many documents contain opinions on Avatar, the relationship among them is not clearly represented by existing approaches.

In this paper, we process opinion retrieval at the granularity of the sentence, as we observe that a complete opinion always exists within a sentence (refer to Figure 2 (b)). To represent a relevant opinion, we define the notion of a topic-sentiment word pair, which consists of a topic term and a sentiment word. A word pair maintains the associative information between the two words, and enables systems to draw up the relationship among all the sentences with the same opinion on an identical target. This relationship information can identify all documents with sentences that include the sentiment words, and can determine the contributions of such words to the target (topic term). Furthermore, based on word pairs, we designed a unified graph-based method for opinion retrieval (see Section 3).
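The false positive discussed for Figure 1 can be illustrated with a small sketch. It is only a toy contrast between a document-level and a sentence-level view, not the scoring used in the paper; the English tokenization, the two lexicons, and the function names are all assumptions made for illustration.

```python
# Toy English rendering of the three sentences in Figure 1 (assumed tokenization).
SENTENCES = {
    "A": ["tomorrow", "avatar", "will", "be", "shown", "in", "china"],
    "B": ["i", "reserved", "a", "comfortable", "seat", "in", "imax"],
    "C": ["avatar", "is", "my", "favorite", "3d", "movie"],
}
TOPIC_TERMS = {"avatar"}                       # hypothetical topic lexicon
SENTIMENT_WORDS = {"comfortable", "favorite"}  # hypothetical sentiment lexicon


def bag_of_word_opinions(sentences):
    """Document-level view: once a topic term occurs anywhere in the
    document, every sentiment word in the document is counted."""
    words = [w for tokens in sentences.values() for w in tokens]
    if not any(w in TOPIC_TERMS for w in words):
        return set()
    return {w for w in words if w in SENTIMENT_WORDS}


def sentence_level_opinions(sentences):
    """Sentence-level view: a sentiment word is counted only when a
    topic term occurs in the same sentence."""
    found = set()
    for tokens in sentences.values():
        if any(w in TOPIC_TERMS for w in tokens):
            found.update(w for w in tokens if w in SENTIMENT_WORDS)
    return found
```

Under these assumptions the document-level view keeps both comfortable and favorite, while the sentence-level view keeps only favorite, the opinion that actually targets Avatar.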

3 Graph-based model

3.1 Basic Idea

Different from existing approaches, which simply use document relevance to reflect the relevance of the opinions embedded in them, our approach is more concerned with identifying the relevance of individual opinions. Intuitively, we believe that the more relevant opinions appear in a document, the more relevant that document is for subsequent opinion analysis operations. Further, since the lexical scope of an opinion does not usually go beyond a sentence, we propose to handle opinion retrieval at the granularity of the sentence.

Without loss of generality, we assume that there is a document set $D = \{d_1, d_2, d_3, \ldots, d_n\}$ and a specific query $Q = \{q_1, q_2, q_3, \ldots, q_z\}$, where $q_1, q_2, q_3, \ldots, q_z$ are query keywords. Opinion retrieval aims at retrieving documents from $D$ with relevant opinions about the query $Q$. In addition, we construct a sentiment word lexicon $V_o$ and a topic term lexicon $V_t$ (see Section 4). To maintain the associative information between the target and the opinion, we consider the document set as a bag of sentences, and define a sentence set as $S = \{s_1, s_2, s_3, \ldots, s_N\}$. For each sentence, we capture the intra-sentence information through the topic-sentiment word pair.

Definition 1. A topic-sentiment word pair $p_{ij}$ consists of two elements, one from $V_t$ and the other from $V_o$:

$p_{ij} = \{\langle t_i, o_j \rangle \mid t_i \in V_t, o_j \in V_o\}$

The topic term from $V_t$ determines relevance by query term matching, and the sentiment word from $V_o$ is used to express an opinion. We use the word pair to maintain the associative information between the topic term and the opinion word (also referred to as the sentiment word).

The word pair is used to identify a relevant opinion in a sentence. In Figure 2 (b), $t_1$, i.e. Avatar in C, is a topic term relevant to the query, and $o_2$ (‘favorite’) is supposed to be an opinion; the word pair $\langle t_1, o_2 \rangle$ indicates that sentence C contains a relevant opinion. Similarly, we map each sentence into word pairs by the following rule, and express the intra-sentence information using word pairs. For each sentiment word of a sentence, we choose the topic term with minimum distance as the other element of the word pair:

$s_l \rightarrow \{\langle t_i, o_j \rangle \mid t_i = \arg\min_{t} Dist(t, o_j) \text{ for each } o_j\}$

According to the mapping rule, although a sentence may give rise to a number of word pairs, only the pair with the minimum word distance is selected for each sentiment word. We do not take the other words in a sentence into consideration, as relevant opinions are generally formed in close proximity. A sentence is regarded as non-opinionated unless it contains at least one word pair.

In practice, not all word pairs carry equal weight in expressing a relevant opinion, as the contribution of an opinion word differs across target topics, and vice versa. For example, the word pair $\langle t_1, o_2 \rangle$ should be more probable as a relevant opinion than $\langle t_1, o_1 \rangle$. To account for that, inter-sentence contextual information is explored. This is achieved by assigning a weight to each word pair to measure its associative degree with respect to different queries. We believe that the more often a word pair appears, the higher the weight between the opinion and the target in the context should be.

In the following section, we describe how to use intra-sentence contextual information to express relevant opinions, and inter-sentence information to measure the degree of each word pair, through a graph-based model.

3.2 HITS Model

We propose an opinion retrieval model based on HITS, a popular graph ranking algorithm (Kleinberg, 1999). By considering both intra-sentence information and inter-sentence information, we can determine the weight of a word pair and rank the documents.

The HITS algorithm distinguishes hubs and authorities among objects. A hub object has links to many authorities. An authority object, which has high-quality content, has many hubs linking to it. The hub scores and authority scores are computed in an iterative way. Our proposed opinion retrieval model contains two layers. The upper level contains all the topic-sentiment word pairs $p_{ij} = \{\langle t_i, o_j \rangle \mid t_i \in V_t, o_j \in V_o\}$. The lower level contains all the documents to be retrieved. Figure 3 gives the bipartite graph representation of the HITS model.
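The mapping rule of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions: the sentence is already tokenized, distance is measured in token positions, ties go to the leftmost topic term, and the function name `extract_word_pairs` is hypothetical.

```python
def extract_word_pairs(tokens, topic_terms, sentiment_words):
    """Map one tokenized sentence to topic-sentiment word pairs.

    For each sentiment word, the closest topic term in the same sentence
    (by token distance, an assumption of this sketch) is chosen as the
    other element of the pair.
    """
    topic_positions = [i for i, w in enumerate(tokens) if w in topic_terms]
    pairs = []
    if not topic_positions:
        # No topic term: the sentence yields no word pair and is
        # treated as non-opinionated.
        return pairs
    for j, word in enumerate(tokens):
        if word in sentiment_words:
            nearest = min(topic_positions, key=lambda i: abs(i - j))
            pairs.append((tokens[nearest], word))
    return pairs
```

For example, with the hypothetical topic terms `{"avatar", "movie"}` and sentiment word `favorite`, the sentence "avatar is my favorite 3d movie" yields the single pair `("movie", "favorite")`, because `movie` is two tokens away while `avatar` is three.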

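Figure 3: Bipartite link graph.

For our purpose, the word pair layer is considered as hubs and the document layer as authorities. If a word pair occurs in one sentence of a document, there is an edge between them. In Figure 3, a word pair that has links to many documents can be assigned a high weight to denote a strong associative degree between the topic term and the sentiment word, and it likely expresses a relevant opinion. On the other hand, if a document has links to many word pairs, the document contains many relevant opinions, and it will be ranked high.

Formally, the bipartite graph is denoted as $G = \langle H_p, A_d, E_{dp} \rangle$, where $H_p = \{p_{ij}\}$ is the set of all pairs of topic words and sentiment words that appear in one sentence, $A_d = \{d_k\}$ is the set of documents, and $E_{dp} = \{e_{ij}^k \mid p_{ij} \in H_p, d_k \in A_d\}$ corresponds to the connections between documents and topic-sentiment word pairs. Each edge $e_{ij}^k$ is associated with a weight $w_{ij}^k \in [0, 1]$ denoting the contribution of $p_{ij}$ to the document $d_k$. The weight $w_{ij}^k$ is computed from the contribution of word pair $p_{ij}$ in all sentences of $d_k$ as follows:

$w_{ij}^k = \frac{1}{|d_k|} \sum_{p_{ij} \in s_l \in d_k} \left[ \lambda \cdot rel(t_i, s_l) + (1 - \lambda) \cdot opn(o_j, s_l) \right]$ (1)

where $|d_k|$ is the number of sentences in $d_k$, and $\lambda$ is introduced as the trade-off parameter to balance $rel(t_i, s_l)$ and $opn(o_j, s_l)$. $rel(t_i, s_l)$ is computed to judge the relevance of $t_i$ in $s_l$, which belongs to $d_k$:

$rel(t_i, s_l) = tf_{t_i, s_l} \times isf_{t_i}$ (2)

where $tf_{t_i, s_l}$ is the number of times $t_i$ appears in $s_l$, and $isf_{t_i}$ is the inverse sentence frequency of $t_i$ (Equation 3), computed from $sf_{t_i}$, the number of sentences in which the word $t_i$ appears.

$opn(o_j, s_l)$ is the contribution of $o_j$ in $s_l$, which belongs to $d_k$:

$opn(o_j, s_l) = \frac{tf_{o_j, s_l}}{tf_{o_j, s_l} + 0.5 + 1.5 \times \frac{len(s_l)}{asl}}$ (4)

where $len(s_l)$ is the length of $s_l$, $asl$ is the average sentence length in $d_k$, and $tf_{o_j, s_l}$ is the number of times $o_j$ appears in $s_l$ (Allan et al., 2003; Otterbacher et al., 2005). It is found that the contribution of a sentiment word $o_j$ will not decrease even if it appears in all the sentences. Therefore, in Equation 4 we use only the length of a sentence, instead of $isf$, to normalize long sentences, which would likely contain more sentiment words.

The authority score $Auth^{(T+1)}(d_k)$ of document $d_k$ and the hub score $Hub^{(T+1)}(p_{ij})$ of $p_{ij}$ at the $(T+1)$-th iteration are computed from the hub scores and authority scores at the $T$-th iteration as follows:

$Auth^{(T+1)}(d_k) = \sum_{p_{ij} \in H_p} w_{ij}^k \cdot Hub^{(T)}(p_{ij})$ (5)

$Hub^{(T+1)}(p_{ij}) = \sum_{d_k \in A_d} w_{ij}^k \cdot Auth^{(T)}(d_k)$ (6)

We let $L$ denote the $|A_d| \times |H_p|$ adjacency matrix whose entries are the edge weights $w_{ij}^k$. Then

$\vec{a}^{(T+1)} = L \, \vec{h}^{(T)}$

$\vec{h}^{(T+1)} = L^{\top} \vec{a}^{(T)}$

where $\vec{a}^{(T)} = [Auth^{(T)}(d_k)]_{|A_d| \times 1}$ is the vector of authority scores for documents at the $T$-th iteration and $\vec{h}^{(T)} = [Hub^{(T)}(p_{ij})]_{|H_p| \times 1}$ is the vector of hub scores for the word pairs at the $T$-th iteration. In order to ensure convergence of the iterative form, $\vec{a}$ and $\vec{h}$ are normalized in each iteration cycle.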
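The edge weight of Equation 1 can be sketched in code as below. This is a hedged illustration, not the authors' implementation. Several points are assumptions: the inverse sentence frequency uses the form log((N + 1) / (0.5 + sf)), a common choice in sentence retrieval that the text here does not spell out; sentence length is counted in tokens; the opinion term follows the length-normalized form of Equation 4; and no clipping to [0, 1] is applied. All function and parameter names are hypothetical.

```python
import math


def rel(topic, tokens, all_sentences):
    """tf * isf relevance of a topic term in one sentence (Equation 2).
    The isf formula is an assumption of this sketch."""
    tf = tokens.count(topic)
    sf = sum(1 for s in all_sentences if topic in s)
    isf = math.log((len(all_sentences) + 1) / (0.5 + sf))
    return tf * isf


def opn(sentiment, tokens, avg_len):
    """Length-normalized contribution of a sentiment word (Equation 4)."""
    tf = tokens.count(sentiment)
    return tf / (tf + 0.5 + 1.5 * len(tokens) / avg_len)


def edge_weight(pair, doc, all_sentences, lam=0.4):
    """Weight between a word pair and a document (Equation 1).

    `doc` is a list of (tokens, pairs) tuples, one per sentence, where
    `pairs` is the set of word pairs extracted from that sentence.
    """
    topic, sentiment = pair
    avg_len = sum(len(tokens) for tokens, _ in doc) / len(doc)
    total = 0.0
    for tokens, pairs in doc:
        if pair in pairs:
            total += lam * rel(topic, tokens, all_sentences)
            total += (1 - lam) * opn(sentiment, tokens, avg_len)
    return total / len(doc)
```

A word pair that never occurs in a document gets weight 0, so the corresponding edge is simply absent from the bipartite graph.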

For computation of the final scores, the initial scores of all documents are set to $1/\sqrt{n}$ and those of all topic-sentiment word pairs to $1/\sqrt{m}$, where $n$ is the number of documents and $m$ is the number of word pairs. The above iterative steps are then used to compute the new scores until convergence. Usually, the iteration is considered to have converged when the difference between the scores computed at two successive iterations falls below a given threshold for every node (Wan et al., 2008; Li et al., 2009; Erkan and Radev, 2004). In our model, we use the hub scores to denote the associative degree of each word pair and the authority scores as the total scores. The documents are then ranked based on the total scores.
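The iteration described above can be sketched as a plain power iteration over the weighted bipartite graph. This is a minimal sketch under assumptions: the weight matrix is given as a dense list of rows (documents by word pairs), scores are L2-normalized after every step, and the threshold `tol` and the cap `max_iter` are illustrative values, not settings reported in the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def hits_rank(weights, tol=1e-8, max_iter=1000):
    """Run weighted HITS on a bipartite graph.

    weights[k][p] is the edge weight between document k (authority)
    and word pair p (hub). Returns (authority_scores, hub_scores).
    """
    n_docs, n_pairs = len(weights), len(weights[0])
    auth = [1 / math.sqrt(n_docs)] * n_docs
    hub = [1 / math.sqrt(n_pairs)] * n_pairs
    for _ in range(max_iter):
        # Both updates use the scores of the previous iteration.
        new_auth = normalize(
            [sum(weights[k][p] * hub[p] for p in range(n_pairs))
             for k in range(n_docs)])
        new_hub = normalize(
            [sum(weights[k][p] * auth[k] for k in range(n_docs))
             for p in range(n_pairs)])
        delta = max(
            max(abs(a - b) for a, b in zip(new_auth, auth)),
            max(abs(a - b) for a, b in zip(new_hub, hub)))
        auth, hub = new_auth, new_hub
        if delta < tol:  # every node changed by less than the threshold
            break
    return auth, hub
```

Documents would then be sorted by their authority scores, and the hub scores can be read as the associative degree of each word pair.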

4 Experiment

We performed experiments on a Chinese benchmark dataset to verify our proposed approach to opinion retrieval. We first tested the effect of the parameter $\lambda$ of our model. To demonstrate the effectiveness of our opinion retrieval model, we compared its performance with that of other approaches. In addition, we studied each individual query to investigate the influence of the query on our model. Furthermore, we showed the top-5 highest-weight word pairs of 5 queries to further demonstrate the effect of word pairs.

4.1 Experiment Setup

4.1.1 Benchmark Datasets

Our experiments are based on the Chinese benchmark dataset COAE08 (Zhao et al., 2008). COAE is the benchmark dataset for the opinion retrieval track of the Chinese Opinion Analysis Evaluation (COAE) workshop, consisting of blogs and reviews. 20 queries are provided in COAE08. In our experiment, we created relevance judgments through the pooling method, where documents are ranked at different levels: irrelevant, relevant but without opinion, and relevant with opinion. Since polarity is not considered, all relevant documents with opinion are classified into the same level.

4.1.2 Sentiment Lexicon

In our experiment, the sentiment lexicon is composed of the following resources (Xu et al., 2007):

(1) The Lexicon of Chinese Positive Words, which consists of 5,054 positive words, and the Lexicon of Chinese Negative Words, which consists of 3,493 negative words;
(2) The opinion word lexicon provided by National Taiwan University, which consists of 2,812 positive words and 8,276 negative words;
(3) The sentiment word lexicon and comment word lexicon from Hownet, which contain 1,836 positive sentiment words, 3,730 positive comment words, 1,254 negative sentiment words and 3,116 negative comment words.

The different graphemes corresponding to Traditional Chinese and Simplified Chinese are both considered, so that the sentiment lexicons from different sources are applicable to processing Simplified Chinese text. The lexicon was manually verified.

4.1.3 Topic Term Collection

In order to acquire the collection of topic terms, we adopt two expansion methods: a dictionary-based method and a pseudo relevance feedback method.

The dictionary-based method utilizes Wikipedia (Popescu and Etzioni, 2005) to find an entry page for a phrase or a single term in a query. If such an entry exists, all titles of the entry page are extracted as synonyms of the query concept. For example, if we search “绿坝” (Green Dam, a firewall) in Wikipedia, it is re-directed to an entry page titled “花季护航” (Youth Escort). This term is then added as a synonym of “绿坝” (Green Dam) in the query. Synonyms are treated the same as the original query terms in the retrieval process.
The content words in the entry page are ranked by their frequencies in the page. The top-k terms are returned as potential expanded topic terms.

The second query expansion method is a web-based method. It is similar to pseudo relevance feedback expansion, but uses web documents as the document collection. The query is submitted to a web search engine, such as Google, which returns a ranked list of documents. From the top-n documents, the top-m topic terms that are highly correlated with the query terms are returned.

4.2 Performance Evaluation

4.2.1 Parameter Tuning

We first studied how the parameter $\lambda$ (see Equation 1) influenced the mean average precision (MAP) of our model. The result is given in Figure 4.
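The frequency-based selection used by the dictionary-based expansion in Section 4.1.3 could be sketched as follows. The sketch assumes that the entry page is already tokenized and that "content words" can be approximated by removing a stopword list; both the function name `top_k_terms` and the stopword filtering are assumptions for illustration.

```python
from collections import Counter


def top_k_terms(page_tokens, stopwords, k):
    """Rank the content words of an entry page by frequency and
    return the k most frequent ones as candidate topic terms."""
    counts = Counter(w for w in page_tokens if w not in stopwords)
    return [term for term, _ in counts.most_common(k)]
```

For instance, on the hypothetical token list `["filter", "software", "the", "filter", "web", "the", "filter", "software"]` with stopword `the` and k = 2, the candidates would be `filter` and `software`.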

Figure 4: Performance of MAP with varying $\lambda$.

The best MAP performance in the COAE08 evaluation was achieved when $\lambda$ was set between 0.4 and 0.6. Therefore, in the following experiments, we set $\lambda = 0.4$.

4.2.2 Opinion Retrieval Model Comparison

To demonstrate the effectiveness of our proposed model, we compared it with the following models using different evaluation metrics:

(1) IR: We adopted a classical information retrieval model, and further assumed that all retrieved documents contained relevant opinions.
(2) Doc: The 2-stage document-based opinion retrieval model was adopted. The model used a sentiment lexicon-based method for opinion identification and a conventional information retrieval method for relevance detection.
(3) ROSC: This was the model which achieved the best run in TREC Blog 07. It employed a machine learning method to identify opinions for each sentence, and determined the target topic by a NEAR operator.
(4) ROCC: This model was similar to ROSC, but it considered the factor of the sentence and regarded the count of relevant opinionated sentences as the opinion score (Zhang and Yu, 2007). In our experiment, we treated this model as the evaluation baseline.
(5) GORM: our proposed graph-based opinion retrieval model.

Run id           MAP     R-prec  bPref   P@10
IR               0.2797  0.3545  0.2474  0.4868
Doc              0.3316  0.3690  0.3030  0.6696
ROSC             0.3762  0.4321  0.4162  0.7089
Baseline (ROCC)  0.3774  0.4411  0.4198  0.6931
GORM             0.3978  0.4835  0.4265  0.7309

Table 1: Comparison of different approaches on the COAE08 dataset using the COAE08 evaluation metrics.

Most of the above models were originally designed for opinion retrieval in English, and we re-designed them to handle Chinese opinionated documents. We incorporated our own Chinese sentiment lexicon for this purpose. In our experiments, in addition to MAP, other metrics such as R-precision (R-prec), binary Preference (bPref) and Precision at 10 documents (P@10) were also used. The evaluation results based on these metrics are shown in Table 1.

Table 1 summarizes the results obtained. We found that GORM achieved the best performance on all the evaluation metrics. Our baseline, ROSC and GORM, which were sentence-based approaches, outperformed the document-based approaches by 20% on average. Moreover, our GORM approach did not use machine learning techniques, but it could still achieve outstanding performance.

To study how GORM was influenced by different queries, the difference between its average precision and the median average precision on each individual topic is shown in Figure 5.

Figure 5: Difference of MAP from Median on COAE08 dataset. (MAP of Median is 0.3724)

As shown in Figure 5, the MAP performance was very low on topic 8 and topic 11. Topic 8, i.e. ‘成龙’ (Jackie Chan), was influenced by topic 7, i.e. ‘李连杰’ (Jet Li), as there were a number of similar relevant targets for the two topics, and therefore many word pairs ended up the same. As a result, documents belonging to topic 7 and topic 8 could not be differentiated, and both topics performed badly. In order to solve this problem, we extracted the topic term with the highest relevance weight in the sentence to form word pairs, so as to reduce the impact of the topic terms in common. Improvements of 24% and 30% were achieved, respectively.

As to topic 11, i.e. ‘指环王’ (The Lord of the Rings), there were only 8 relevant documents without any opinion and 14 documents with relevant opinions. As a result, the graph constructed from insufficient documents worked ineffectively.

Except for the above queries, GORM performed well on most of the others. To further investigate the effect of word pairs, we summarized the top-5 word pairs with the highest weight for 5 queries in Table 2.
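For reference, three of the metrics reported in Table 1 can be computed as in the following sketch, using their standard definitions. It assumes binary relevance judgments given as a set of relevant document ids per query; bPref is omitted, and the function names are illustrative rather than taken from any evaluation toolkit.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

With the ranking `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, the average precision is (1/1 + 2/3) / 2, and both P@2 and R-precision are 0.5.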

[Figure 4 plot: MAP (vertical axis) against λ (horizontal axis, 0 to 1) on COAE08.]

[Figure 5 plot: Difference from Median Average Precision per Topic; difference (vertical axis) against topic (horizontal axis, topics 1 to 20).]

Query: 陈凯歌 (Chen Kaige)
  <陈凯歌, 支持> (Chen Kaige, Support)
  <陈凯歌, 最佳> (Chen Kaige, Best)
  <《无极》, 骂> (Limitless, Revile)
  <影片, 优秀> (Movie, Excellent)
  <阵容, 强大的> (Cast, Strong)
Query: 国六条 (Six States)
  <房价, 上涨> (Room rate, Rise)
  <调控, 加强> (Regulate, Strengthen)
  <中央, 加强> (CCP, Strengthen)
  <房价, 平稳> (Room rate, Steady)
  <住房, 保障> (Housing, Security)
Query: 宏观调控 (Macro-regulation)
  <经济, 平稳> (Economics, Steady)
  <价格, 上涨> (Price, Rise)
  <发展, 平稳> (Development, Steady)
  <消费, 上涨> (Consume, Rise)
  <社会, 保障> (Social, Security)
Query: 周星驰 (Stephen Chow)
  <电影, 喜欢> (Movie, Like)
  <周星驰, 喜欢> (Stephen Chow, Like)
  <主角, 最佳> (Protagonist, Best)
  <喜剧, 好> (Comedy, Good)
  <作品, 精彩> (Works, Splendid)
Query: Vista (Vista)
  <价格, 贵> (Price, Expensive)
  <微软, 喜欢> (Microsoft, Like)
  <Vista, 推荐> (Vista, Recommend)
  <问题, 重要> (Problem, Vital)
  <性能, 不> (Performance, No)

Table 2: Top-5 highest weight word pairs for 5 queries in COAE08 dataset.

Table 2 shows that most word pairs could represent the relevant opinions about the corresponding queries. This indicates that inter-sentence information is very helpful for identifying the associative degree of a word pair. Furthermore, since word pairs can indicate relevant opinions effectively, it is worth studying further how they could be applied to other opinion-oriented applications, e.g. opinion summarization, opinion prediction, etc.

5 Related Work

Our research focuses on relevant opinions rather than on relevant document retrieval. We therefore review related work in opinion identification research. Furthermore, since we do not follow the conventional 2-stage opinion retrieval approach, we also review unified opinion retrieval models in this section.

5.1 Lexicon-based Opinion Identification

Different from traditional IR, opinion retrieval focuses on the opinion nature of documents. During the last three years, NTCIR and TREC evaluations have shown that sentiment lexicon-based methods lead to good performance in opinion identification.

A lightweight lexicon-based statistical approach was proposed by Hannah et al. (2007). In this method, the distribution of terms in relevant opinionated documents was compared to their distribution in relevant fact-based documents to calculate an opinion weight. These weights were used to compute opinion scores for each retrieved document. A weighted dictionary was generated from previous TREC relevance data (Amati et al., 2007). This dictionary was submitted as a query to a search engine to get an initial query-independent opinion score for all retrieved documents. Similarly, a pseudo opinionated word composed of all opinion words was first created, and then used to estimate the opinion score of a document (Na et al., 2009). This method was shown to be very effective in TREC evaluations (Lee et al., 2008). More recently, Huang and Croft (2009) proposed an effective relevance model, which integrated both query-independent and query-dependent sentiment words into a mixture model.

In our approach, we also adopt a sentiment lexicon-based method for opinion identification. Unlike the above methods, we generate a weight for a sentiment word with respect to each target (associated topic term), rather than assigning a unified weight or an equal weight to the sentiment word across all topics. Besides, our model requires no training data. We just utilize the structure of our graph to generate a weight that reflects the associative degree between the two elements of a word pair in different contexts.

5.2 Unified Opinion Retrieval Model

In addition to the conventional 2-stage approach, there has been some research on unified opinion retrieval models.

Eguchi and Lavrenko proposed an opinion retrieval model in the framework of generative language modeling (Eguchi and Lavrenko, 2006). They modeled a collection of natural language documents or statements, each of which consisted of some topic-bearing and some sentiment-bearing words. The sentiment was either represented by a group of predefined seed words, or extracted from a training sentiment corpus. This model was shown to be effective on the MPQA corpus.

Mei et al. tried to build a fine-grained opinion retrieval system for consumer products (Mei et al., 2007). The opinion score for a product was a mixture of several facets. Due to the difficulty in associating sentiment with products and facets, the system was only tested using small-scale text collections.

Zhang and Ye proposed a generative model to unify topic relevance and opinion generation (Zhang and Ye, 2008). This model led to satisfactory performance, but an intensive computation load was inevitable during retrieval, since for each possible candidate document, an opinion score was summed up from the generative probability of thousands of sentiment words.

Huang and Croft proposed a unified opinion retrieval model according to the Kullback-Leibler divergence between the two probability distributions of the opinion relevance model and the document model (Huang and Croft, 2009). They divided the sentiment words into query-dependent and query-independent ones by utilizing several sentiment expansion techniques, and integrated them into a mixed model. However, in this model, the contribution of a sentiment word was its corresponding incremental mean average precision value. This method required a large amount of training data and manual labeling.

Different from the above opinion retrieval approaches, our proposed graph-based model processes opinion retrieval at the granularity of the sentence. Instead of bag-of-word, the sentence is split into word pairs, which can maintain the contextual information. On the one hand, a word pair can identify the relevant opinion according to intra-sentence contextual information. On the other hand, it can measure the degree of a relevant opinion by considering the inter-sentence contextual information.

6 Conclusion and Future Work

In this work we focus on the problem of opinion retrieval. Different from existing approaches, which regard document relevance as the key indicator of opinion relevance, we propose to explore the relevance of individual opinions. To do that, opinion retrieval is performed at the granularity of the sentence. We define the notion of a word pair, which can not only maintain the association between the opinion and the corresponding target in the sentence, but also build up the relationship among sentences through the same word pair. Furthermore, we convert the relationships between word pairs and sentences into a unified graph, and use the HITS algorithm to achieve document ranking for opinion retrieval. Finally, we compare our approach with existing methods. Experimental results show that our proposed model performs well on the COAE08 dataset.

The novelty of our work lies in using word pairs to represent the information needs for opinion retrieval. On the one hand, word pairs can identify the relevant opinion according to intra-sentence contextual information. On the other hand, word pairs can measure the degree of a relevant opinion by taking inter-sentence contextual information into consideration. With the help of word pairs, the information needs for opinion retrieval can be represented appropriately.

In the future, more research is required in the following directions:

(1) Since word pairs can indicate relevant opinions effectively, it is worth studying further how they could be applied to other opinion-oriented applications, e.g. opinion summarization, opinion prediction, etc.
(2) The characteristics of blogs, e.g. the post time, will be taken into consideration, which could be helpful for creating a more time-sensitive graph to filter out fake opinions.
(3) The opinion holder is another important element of an opinion, and the identification of the opinion holder is a main task in NTCIR. It would be interesting to study opinion holders, e.g. their seniority, for opinion retrieval.

Acknowledgements: This work is partially supported by the Innovation and Technology Fund of Hong Kong SAR (No. ITS/182/08) and the National 863 program (No. 2009AA01Z150). Special thanks to Xu Hongbo for providing the Chinese sentiment resources. We also thank Bo Chen, Wei Gao, Xu Han and the anonymous reviewers for their helpful comments.

References

James Allan, Courtney Wade, and Alvaro Bolivar. 2003. Retrieval and novelty detection at the sentence level. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 314-321. ACM.

Giambattista Amati, Edgardo Ambrosi, Marco Bianchi, Carlo Gaibisso, and Giorgio Gambosi. 2007. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track. In Proceedings of the 15th Text Retrieval Conference.

Koji Eguchi and Victor Lavrenko. 2006. Sentiment retrieval using generative models. In EMNLP ’06, Proceedings of 2006 Conference on Empirical Methods in Natural Language Processing, pages 345-354.


Gunes Erkan and Dragomir R. Radev. 2004. LexPageRank: Prestige in multi-document text summarization. In EMNLP ’04, Proceedings of 2004 Conference on Empirical Methods in Natural Language Processing.

David Hannah, Craig Macdonald, Jie Peng, Ben He, and Iadh Ounis. 2007. University of Glasgow at TREC 2007: Experiments in Blog and Enterprise Tracks with Terrier. In Proceedings of the 15th Text Retrieval Conference.

Xuanjing Huang and William Bruce Croft. 2009. A Unified Relevance Model for Opinion Retrieval. In Proceedings of CIKM.

Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM, 46(5): 604-632.

Yeha Lee, Seung-Hoon Na, Jungi Kim, Sang-Hyob Nam, Hun-young Jung, and Jong-Hyeok Lee. 2008. KLE at TREC 2008 Blog Track: Blog Post and Feed Retrieval. In Proceedings of the 15th Text Retrieval Conference.

Fangtao Li, Yang Tang, Minlie Huang, and Xiaoyan Zhu. 2009. Answering Opinion Questions with Random Walks on Graphs. In ACL ’09, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.

Bing Liu, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: Analyzing and comparing opinions on the web. In WWW ’05: Proceedings of the 14th International Conference on World Wide Web.

Craig Macdonald and Iadh Ounis. 2007. Overview of the TREC-2007 Blog Track. In Proceedings of the 15th Text Retrieval Conference.

Craig Macdonald and Iadh Ounis. 2006. Overview of the TREC-2006 Blog Track. In Proceedings of the 14th Text Retrieval Conference.

Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, and Chengxiang Zhai. 2007. Topic sentiment mixture: Modeling facets and opinions in weblogs. In WWW ’07: Proceedings of the 16th International Conference on World Wide Web.

Seung-Hoon Na, Yeha Lee, Sang-Hyob Nam, and Jong-Hyeok Lee. 2009. Improving opinion retrieval based on query-specific sentiment lexicon. In ECIR ’09: Proceedings of the 31st Annual European Conference on Information Retrieval, pages 734-738.

Douglas Oard, Tamer Elsayed, Jianqiang Wang, Yejun Wu, Pengyi Zhang, Eileen Abels, Jimmy Lin, and Dagbert Soergel. 2006. TREC-2006 at Maryland: Blog, Enterprise, Legal and QA Tracks. In Proceedings of the 15th Text Retrieval Conference.

Jahna Otterbacher, Gunes Erkan, and Dragomir R. Radev. 2005. Using random walks for question-focused sentence retrieval. In EMNLP ’05, Proceedings of 2005 Conference on Empirical Methods in Natural Language Processing.

Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1998. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2): 1-135.

Ana-Maria Popescu and Oren Etzioni. 2005. Extracting product features and opinions from reviews. In EMNLP ’05, Proceedings of 2005 Conference on Empirical Methods in Natural Language Processing.

Xiaojun Wan and Jianwu Yang. 2008. Multi-document summarization using cluster-based link analysis. In SIGIR ’08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 299-306. ACM.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In EMNLP ’05, Proceedings of 2005 Conference on Empirical Methods in Natural Language Processing.

Ruifeng Xu, Kam-Fai Wong, and Yunqing Xia. 2007. Opinmine - Opinion Analysis System by CUHK for NTCIR-6 Pilot Task. In Proceedings of NTCIR-6.

Min Zhang and Xingyao Ye. 2008. A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval. In SIGIR ’08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 411-418. ACM.

Wei Zhang and Clement Yu. 2007. UIC at TREC 2007 Blog Track. In Proceedings of the 15th Text Retrieval Conference.

Jun Zhao, Hongbo Xu, Xuanjing Huang, Songbo Tan, Kang Liu, and Qi Zhang. 2008. Overview of Chinese Opinion Analysis Evaluation 2008. In Proceedings of the First Chinese Opinion Analysis Evaluation.