SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES
SEMI-SUPERVISED STANCE DETECTION IN TWEETS BASED ON SENTIMENT RULES
Marcelo Dias and Karin Becker
Instituto de Informática – UFRGS – Porto Alegre – Brazil
marcelo.dias@inf.ufrgs.br and karin.becker@inf.ufrgs.br
Introduction

Opinion Analysis
- Detects sentiment polarity (negative or positive)
- The target is often mentioned in the text

Stance Detection
- Detects stance (against or in favor)
- Towards a given target (main target vs. indirect targets)
- A favor stance can be expressed through positive or negative sentiments (and vice-versa)
Related Work

- Structured text or discussion threads (congress votes, on-line debates, ...): a wider textual context is available to interpret content [Thomas et al. 2006] [Anand et al. 2011] [Somasundaran and Wiebe 2009]
- Tweets: short texts with poorly written content; works rely more on inferences from static/dynamic properties of the platform [Rajadesingan and Liu 2014]
- Less focus on properties extracted from textual content only
- Most works adopt supervised methods
- Often address a binary problem (Favor/Against)
Goal

- Stance detection based only on the textual content of tweets
- Rule-based, semi-supervised method
- 3-class problem (Favor, Against and None)
- Improvements on our early work: third place in SemEval 2016 Task 6-B (unsupervised, Trump target)
- Evaluate generality using several distinct domains: SemEval 2016 Task 6-A targets (supervised)
Process Overview [diagram slide]
Process Overview: Automatic Labeling [diagram slide]
Key and Target N-grams

- Key n-grams: terms/phrases that by themselves denote a stance
- Target n-grams: identify a target directly or indirectly related to the main target; combined with polarity, they denote a stance
- Both may be Favor or Against

Example (main target: Hillary Clinton):

N-GRAMS   FAVOR                          AGAINST
KEY       ReadyForHillary, Hillary2016   StopHillary, MakeAmericaGreatAgain
TARGET    Hillary, Democrats             Trump, Republicans
Key and Target N-grams Identification

- Input: domain corpus
- Current selection: n-gram frequency ranking, followed by manual selection of the top frequent n-grams
- Output: selected key and target n-grams
- Automatic n-gram selection methods are currently under evaluation
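The frequency-ranking step above can be sketched as follows. This is a minimal illustration, not the authors' exact tooling: tokenization is plain whitespace splitting, and the tiny `corpus` is made up for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield n-grams as space-joined strings."""
    return (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rank_ngrams(corpus, max_n=2, top_k=10):
    """Rank the most frequent 1..max_n-grams in a tweet corpus.

    Returns (ngram, count) pairs; in the method, a human annotator
    then picks key and target n-grams from this ranking.
    """
    counts = Counter()
    for tweet in corpus:
        tokens = tweet.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts.most_common(top_k)

# Toy domain corpus (hypothetical tweets for the Hillary Clinton target).
corpus = [
    "ReadyForHillary Hillary is ready",
    "StopHillary vote Trump",
    "Hillary 2016 ReadyForHillary",
]
print(rank_ngrams(corpus, top_k=3))
```

The annotator would scan the top of this ranking and tag entries such as "readyforhillary" as key n-grams and "hillary" as a target n-gram.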
Rules x Stance

FEATURES
- Presence of at least one Favor/Against key n-gram
- Presence of at least one Favor/Against target n-gram
- Presence of at least one hashtag
- Tweet polarity
Automatic Labeling

- Input: selected n-grams and a dataset
- Tweet pre-processing: feature extraction and tweet polarity detection (a combination of off-the-shelf APIs)
- Rules application
- Output: filtered labeled tweets and discarded tweets
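The labeling step can be sketched with two illustrative rules in the spirit of the feature list above. The seven concrete rules are not spelled out on these slides, so the rules, n-gram sets and the `polarity` argument below are hypothetical stand-ins, not the paper's exact rule set.

```python
# Hypothetical n-gram lists for the Hillary Clinton target.
FAVOR_KEYS = {"readyforhillary", "hillary2016"}
AGAINST_KEYS = {"stophillary", "makeamericagreatagain"}
FAVOR_TARGETS = {"hillary", "democrats"}
AGAINST_TARGETS = {"trump", "republicans"}

def label(tweet, polarity):
    """Return 'FAVOR', 'AGAINST', or None (tweet discarded).

    `polarity` ('positive'/'negative'/'neutral') stands in for the
    combination of off-the-shelf sentiment APIs used by the method.
    """
    tokens = {t.lstrip("#").lower() for t in tweet.split()}
    # Rule sketch: a key n-gram denotes a stance by itself.
    if tokens & FAVOR_KEYS:
        return "FAVOR"
    if tokens & AGAINST_KEYS:
        return "AGAINST"
    # Rule sketch: a target n-gram combined with polarity denotes a stance.
    # Note the inversion for indirect targets: negative sentiment towards
    # an Against target expresses a Favor stance towards the main target.
    if tokens & FAVOR_TARGETS and polarity != "neutral":
        return "FAVOR" if polarity == "positive" else "AGAINST"
    if tokens & AGAINST_TARGETS and polarity != "neutral":
        return "AGAINST" if polarity == "positive" else "FAVOR"
    return None  # no rule fired: tweet is discarded

print(label("#ReadyForHillary all the way", "positive"))  # FAVOR
print(label("I can't stand Trump", "negative"))           # FAVOR
```

Tweets for which no rule fires are discarded rather than labeled None, which is how the filtering of the output set arises.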
Predictive Model Generation [diagram slide]
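The predictive-model step trains a supervised classifier on the automatically labeled tweets. The slides do not name the learning algorithm, so the bag-of-words Naive Bayes below is only a hypothetical minimal sketch of the idea, with a made-up auto-labeled training set.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal bag-of-words Naive Bayes with add-one smoothing."""

    def fit(self, tweets, labels):
        self.classes = set(labels)
        self.priors = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tweet, y in zip(tweets, labels):
            self.word_counts[y].update(tweet.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tweet):
        def log_score(y):
            total = sum(self.word_counts[y].values())
            s = math.log(self.priors[y])
            for w in tweet.lower().split():
                s += math.log((self.word_counts[y][w] + 1)
                              / (total + len(self.vocab)))
            return s
        return max(self.classes, key=log_score)

# Train on the (hypothetical) output of the automatic-labeling step.
auto_labeled = [
    ("ReadyForHillary all the way", "FAVOR"),
    ("great rally for Hillary", "FAVOR"),
    ("StopHillary now", "AGAINST"),
    ("Hillary must be stopped", "AGAINST"),
]
model = NaiveBayes().fit(*zip(*auto_labeled))
print(model.predict("Hillary all the way"))  # FAVOR
```

The point of this stage is generalization: the model can label tweets that contain none of the selected n-grams and would otherwise have been discarded by the rules.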
Method Overview: Stance Detection [diagram slide]
Experiments

- Goal: assess the generality of the method for stance detection
- 6 datasets on various domains
- Evaluated: rules coverage, rules precision and stance prediction
Datasets: SemEval 2016 – Task 6

- Stance: Against, Favor or None
- Subtask A – Supervised: 5 targets with 2 datasets each (training and test): Atheism, Climate Change is a Real Concern, Feminism, Hillary Clinton and Legalization of Abortion
- Subtask B – Semi-supervised/Unsupervised: 1 target (Donald Trump) with 2 datasets (domain and test)
- Source: http://www.saifmohammad.com/WebPages/StanceDataset.htm
Rules Coverage

- Average corpus coverage: 75%
- In general, Rules 2, 3, 4 and 7 were the most representative (13% to 17% each)
- Rules 5 and 6 are representative only for Atheism
- Rule 1 is representative only for Feminism
Rules Precision

[bar chart: precision of each of Rules 1–7]
Automatic Labeling x Predictive Model

[bar chart: precision weighted average of Automatic Labeling vs. Predictive Model for the Abortion, Atheism, Climate, Feminism, Hillary and Trump targets]
Results x Baseline

[bar chart: our result vs. the SemEval winner per target]

- Except for Trump, all the baselines were developed using a supervised method
Strengths and Weaknesses

Strengths
- Simplicity of the method
- May be applied to different domains/targets
- Simplifies the manual corpus annotation effort, which is restricted to n-grams

Weaknesses
- Dependent on the appropriate selection of n-grams, which requires domain knowledge
- Some rules do not perform well
- Performance depends on the prevalence of the class
Future Work

- Automatic identification of key and target n-grams
- Revised set of rules
- Improved identification of the neutral stance
- Improvement of the supervised-learning predictive models: predictive model features, automatic extraction of training instances from authority Twitter profiles, classification algorithms or committees