
Structuring temporal sparse data with application to opinion mining

Julien Velcin
University of Lyon – ERIC Lab

Joint work with Y.M. Kim, A. Hasnat, S. Bonnevay, J. Jacques and more…

1st Lyon-Columbia Research Workshop, ISFA, June 27, 2016

Outline of the talk

  • Part I: Representation(s) and Categorization(s)
  • Part II: Evolutionary Clustering for Sparse Data
  • Part III: Application to the ImagiWeb Project
  • Part IV: Conclusion and Future Work

Part I: Representation(s) and Categorization(s)

Studying representations


Nowadays, with the Internet

Opportunity and curse of big data:

  • Volume
  • Variety (sources and data)
  • Velocity

etc.

For textual data:

  • semantic gap
  • language is a living thing
  • curse of dimensionality

Representing ≈ categorizing

Philosophy, logic

  • Necessary and sufficient conditions [Aristotle]
  • Family resemblance [Wittgenstein,1958]

Psychology, linguistics

  • Cognitive representations and prototypes [Rosch,1973]
  • Linguistic categories [Lakoff,1987]

Sociology

  • Social representations [Lippmann,1922] [Moscovici,1961]

→ Data Science

When? Who? How? What?

Key idea to take home

  • topic learning
  • opinion mining
  • social network (SN) analysis, role detection
  • time-aware models

Machine learning (weakly supervised clustering) can help in studying representations.

Representation and sparseness

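[Table: one row per movie, described by the features Title, Type, Plot, Actors, Rhythm, Originality, etc. Example row: Tomorrowland | Sci-fi | (…) | G. Clooney, H. Laurie… | the remaining cells hold polarity marks (+ or -).]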

Image of a movie

“New Disney rather disappointing. But I like so much sci-fi movies I couldn’t miss it.”

“Ambitious and visually stunning, this movie…”

“The film stars George Clooney, Hugh Laurie, Britt Robertson, and Raffey Cassidy”

“Like the whole plot but obviously too long for kids”

“‘Tomorrowland’ forgettable look into future”

“How do you spell boring? T-O-M-O-R-R-O-W-L-A-N-D.”


Part II: Evolutionary Clustering for Sparse Data

Temporal evolution of entities

[Figure: an entity followed along the time axis t]

Examples:

  • movie
  • politician
  • company
  • brand

etc.

Category of similar representations = cluster of similar objects

Sparse matrix as input

Author   Time  Non-zero counts on the description features f1 … fn (column positions not recoverable)
pseudo1  t1    1, 2, 1
pseudo1  t2    1, 1
pseudo1  t3    2, 2
pseudo2  t1    3, 1, 1
pseudo3  t1    3
pseudo3  t2    2
pseudo3  t3    2
pseudo4  t3    3, 1
pseudo5  t3    3, 2
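A minimal sketch of how such an input could be stored, assuming one row per (author, time period) pair and keeping only non-zero counts. The feature names attached to each count are hypothetical, since the slide does not preserve the column positions, and `build_sparse_rows` and `rows_at` are illustrative helpers, not part of the presented work.

```python
from collections import defaultdict

# Hypothetical observations: (author, time period, feature, count).
# The feature names are made up for illustration only.
observations = [
    ("pseudo1", "t1", "f1", 1),
    ("pseudo1", "t1", "f3", 2),
    ("pseudo1", "t1", "f6", 1),
    ("pseudo2", "t1", "f2", 3),
    ("pseudo3", "t2", "f4", 2),
]


def build_sparse_rows(obs):
    """Group counts by (author, time), storing only non-zero features."""
    rows = defaultdict(dict)
    for author, time, feature, count in obs:
        if count:  # zero counts are simply not stored
            row = rows[(author, time)]
            row[feature] = row.get(feature, 0) + count
    return dict(rows)


def rows_at(rows, time):
    """Return the rows observed during one time period."""
    return {key: feats for key, feats in rows.items() if key[1] == time}


rows = build_sparse_rows(observations)
print(rows[("pseudo1", "t1")])
print(sorted(rows_at(rows, "t1")))
```

Storing a dictionary per row keeps memory proportional to the number of non-zero cells, which is the point of a sparse input.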

Some state of the art

Taking time into account

  • incremental clustering [Aggarwal,2003] [Labroche,2014]
  • evolutionary clustering [Chakrabarti,2006] [Chi,2007]
  • monitoring cluster evolution [Spiliopoulou,2006,2013]

Dealing with sparse data

  • mixture models [Dempster,1977]
  • topic models [Hofmann,1999] [Blei,2003]
  • default clustering [Velcin,2005]


Our objective

Analyze temporal sparse data using clustering

  • identify groups of users who use similar descriptions
  • track an entity’s image over time
  • detect and interpret temporal changes

Test on real data within the ImagiWeb project

  • case study 1: image of French politicians given by Twitter users
  • case study 2: image of a big national company about nuclear energy given by bloggers

Model 1: Temporal Mixture Model

TMM = probabilistic generative model [Kim,2015]

What’s new?

  • retrospective approach: the recent past matters
  • no Dirichlet prior, unlike most topic models

Parameters to estimate: [formula not recovered]

Optimization by Expectation-Maximization (EM)
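To make the EM step concrete, here is a sketch of EM for a plain multinomial mixture over sparse count rows. It is the simple mixture model (MM) used later as a baseline, not TMM itself: the temporal part of TMM is not specified in this transcript. The function name, its arguments and the toy data are assumptions made for illustration; `eps` is only a numerical floor that avoids log(0), not a prior.

```python
import math
import random


def em_multinomial_mixture(rows, n_features, k, n_iter=50, seed=0, eps=1e-9):
    """Fit a k-component multinomial mixture to sparse count rows with EM.

    rows: list of dicts {feature_index: count}.
    Returns (weights, profiles, responsibilities, log_likelihood_history).
    """
    rng = random.Random(seed)
    weights = [1.0 / k] * k
    # Random positive profiles, normalised to sum to 1 over the features.
    profiles = []
    for _ in range(k):
        raw = [rng.random() + 0.1 for _ in range(n_features)]
        total = sum(raw)
        profiles.append([v / total for v in raw])

    history = []
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each row
        # (the multinomial coefficient is constant and left out).
        resp = []
        log_lik = 0.0
        for row in rows:
            logs = [
                math.log(weights[j])
                + sum(c * math.log(profiles[j][f]) for f, c in row.items())
                for j in range(k)
            ]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            resp.append([math.exp(v - norm) for v in logs])
            log_lik += norm
        history.append(log_lik)

        # M-step: re-estimate mixing weights and feature profiles.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = (mass + eps) / (len(rows) + k * eps)
            counts = [eps] * n_features
            for r, row in zip(resp, rows):
                for f, c in row.items():
                    counts[f] += r[j] * c
            total = sum(counts)
            profiles[j] = [v / total for v in counts]
    return weights, profiles, resp, history


# Toy data: three rows use features 0-1, three rows use features 2-3.
rows = [{0: 3, 1: 1}, {0: 2}, {0: 4, 1: 1}, {2: 3, 3: 2}, {3: 4}, {2: 1, 3: 2}]
weights, profiles, resp, history = em_multinomial_mixture(rows, n_features=4, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)
```

On this toy input the two groups of rows should end up in different clusters; which cluster gets which index depends on the random initialisation.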

Model 2: Parametric link approach

MM-Plink = MM + linear link between (t-1) and (t) + model selection using BIC

Relation between the parameters μt-1 and μt: linear, with a multiplicative link parameter δ and an additive link parameter γ (formula labels on the slide: cluster at time t, cluster at time t-1, link parameter (mult.), link parameter (add.))

Clustering estimated with classic EM

Different combinations tested for (δ,γ):

  • (1,0) = no change
  • (0,γj,k) = totally new clusters
  • (δ,γ) = same global change
  • etc.
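The labels on this slide (a multiplicative and an additive link parameter, with (1,0) meaning no change and (0,γj,k) meaning totally new clusters) are consistent with a linear relation of the following form. This is a reconstruction, not a formula preserved in the transcript, and indexing by cluster j and feature k is an assumption. The second line is the standard definition of BIC, where \hat{L} is the maximised likelihood, p the number of free parameters and n the number of observations.

```latex
\mu^{(t)}_{j,k} = \delta_{j,k}\,\mu^{(t-1)}_{j,k} + \gamma_{j,k}
\qquad
\mathrm{BIC} = -2\ln\hat{L} + p\,\ln n
```

Under this reading, (δ,γ) = (1,0) gives μ(t) = μ(t-1), and δ = 0 makes μ(t) depend only on γj,k, which matches the two special cases listed on the slide.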

Differences between the two models

[Figure: model 1 (TMM) and model 2 (MM-Plink), each drawn along a time axis t, with the note "+ interpretation".]


Part III: Application to the ImagiWeb Project

ImagiWeb project

  • Studying the image (representation) of entities conveyed by social media and its evolution over time [Velcin,2014]
  • Funded by the ANR for 3 years (2012-2015)
  • Needs complementary skills: NLP, machine learning, software engineering, analysis of public opinion, semiology…
  • 6 partners: ERIC (management), CEPEL, LIA, AMI Software, EDF R&D, Xerox Research Centre Europe (XRCE)

Aspects:

  • Political line
  • Future project
  • Balance sheet
  • Ethics
  • Injunction
  • Communication

etc.

[Figure: the entity as seen from blogs and Twitter]

Design of a full annotation scheme

  • (Sarkozy, -)
  • (Sarkozy, balance sheet, --)
  • (Sarkozy, communication, +)
  • (Sarkozy, skills, +)


Automatic annotation

France is an indivisible, democratic, secular and social republic: that is my commitment. #FH2012

Copé calls on Hollande to "take back control" of his "incompetent" government http://t.co/lPanwi5r via @LePoint

François Hollande: lying is now. So that is what a president looks like. Isn't there a slight bug here?

#Delanoë "what strikes me in #Hollande's campaign is his intellectual honesty, whereas #Sarkozy says anything and everything"

I knew Hollande was a spineless socialist. But if this is not disavowal or surrender, what is? #Libertédeconscience

@****** Hollande has no charisma at all! He is a disgrace to France and to the French!

(Skills, -)

@aut-1154 Neuilly-sur-Seine: 61,100 inhabitants. France: 65,000,000. Vote Hollande.

(Injunction, +)

Strong gesture from President #Hollande, who will take part this Thursday in the day of remembrance of the slave trade, slavery and their abolition.

(Ethics, ++) (Positioning, +)

Sympatisch, this Hollande. And cultured too. We talked sausages all evening.

(Person:Charisma, -)

Why I like Mélenchon and will vote Hollande http://t.co/TVM8RwoH via @***************

(Injunction, +)

(Ethics:Honesty, +)

(Person:Charisma, -) (Ethics:Honesty, -) (Ethics:Honesty, --)

Extracting and monitoring images

Author   Time  Non-zero counts on the columns (a1,++) (a1,+) (a1,o) (a1,-) (a1,--) (a2,++) … (ap,-) (ap,--) (column positions not recoverable)
pseudo1  t1    1, 2, 1
pseudo1  t2    1, 1
pseudo1  t3    2, 2
pseudo2  t1    3, 1, 1
pseudo3  t1    3
pseudo3  t2    2
pseudo3  t3    2
pseudo4  t3    1
pseudo5  t3    3, 2

E.g. with politicians, the columns are: (Entity, ++) (Entity, +) (Entity, o) (Entity, -) (Entity, --) (Attribute, ++) … (Project, -) (Project, --)
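As an illustration of this step, the sketch below counts (aspect, polarity) annotations per author and time period, then expands one row over all aspect x polarity columns. The annotated opinions, the helper names and the choice of three aspects are hypothetical; the polarity "- -" is written "--" here.

```python
from collections import Counter, defaultdict

POLARITIES = ["++", "+", "o", "-", "--"]

# Hypothetical annotated opinions: (author, time period, aspect, polarity).
annotations = [
    ("pseudo1", "t1", "Entity", "+"),
    ("pseudo1", "t1", "Entity", "+"),
    ("pseudo1", "t1", "Project", "-"),
    ("pseudo2", "t1", "Ethics", "--"),
    ("pseudo1", "t2", "Entity", "o"),
]


def count_images(annots):
    """Count (aspect, polarity) pairs for each (author, time) pair."""
    table = defaultdict(Counter)
    for author, time, aspect, polarity in annots:
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity!r}")
        table[(author, time)][(aspect, polarity)] += 1
    return table


def to_dense_row(counter, aspects):
    """Expand one sparse row over all aspect x polarity columns."""
    return [counter[(a, p)] for a in aspects for p in POLARITIES]


table = count_images(annotations)
aspects = ["Entity", "Project", "Ethics"]
row = to_dense_row(table[("pseudo1", "t1")], aspects)
print(row)  # [0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Each author contributes one row per time period, so the same author can be followed from t1 to t3.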

Testing model-based evolutionary clustering

[Figure: 1 cluster of 254 users (before election) for François Hollande. It shows the polarity distribution (++, +, -) over the aspects Attribute, Injunction, Entity, Political line, Communication, Person, Skills, Balance sheet, Project and Ethics, with scale marks 20, 50 and 90.]

Quantitative results of TMM

Tested on a subset of tweets (entity FH, before and after election, k=9)

Comparison with:

  • Dynamic Topic Model (DTM) [Blei,2006]
  • Simple Mixture Model (MM)
  • Probabilistic Latent Semantic Analysis (pLSA) [Hofmann,1999]

Internal criteria:

  • Co-occurrence level (COL)
  • Average Unsmoothness (AUS)
  • Average Homogeneity (AHM)
  • Author Consistency Sum (ACS)


Quantitative results of TMM

[Figure: four bar charts comparing DTM, MM, pLSA and TMM on the criteria AHM, COL, AUS and ACS; the plotted values are not recoverable.]

Quantitative results of MM-Plink

Tested on a subset of tweets (entity FH, 3 time periods, k=3, total of ~3000 observations)

Comparison between:

  • Simple Mixture Model (MM)
  • Temporal Mixture Model (TMM) [Kim,2015]
  • Parametric-link MM (MM-Plink) = our new proposal

Additional criterion: Average Perplexity (APL)
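The slide does not define APL. The sketch below uses the standard definition of perplexity (the exponential of the negative mean log-probability per observed count) and assumes that "average" means the mean over time periods. Both function names and the toy data are illustrative.

```python
import math


def perplexity(rows, predict):
    """Perplexity of count rows under a predictive distribution.

    rows: list of dicts {feature: count}.
    predict: function mapping a row index to a dict {feature: probability}.
    """
    log_prob = 0.0
    n_tokens = 0
    for i, row in enumerate(rows):
        probs = predict(i)
        for feature, count in row.items():
            log_prob += count * math.log(probs[feature])
            n_tokens += count
    return math.exp(-log_prob / n_tokens)


def average_perplexity(periods, predict_for):
    """Mean perplexity over time periods (one assumed reading of APL)."""
    values = [perplexity(rows, predict_for(t)) for t, rows in periods.items()]
    return sum(values) / len(values)


# Toy check: a uniform model over 4 features has perplexity close to 4.
uniform = {"f1": 0.25, "f2": 0.25, "f3": 0.25, "f4": 0.25}
periods = {"t1": [{"f1": 2, "f2": 1}], "t2": [{"f3": 4}]}
apl = average_perplexity(periods, lambda t: (lambda i: uniform))
print(apl)
```

Lower values mean the model predicts the observed counts better.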

Towards an understanding of evolution

Link parameter δj,k, shown in three ranges:

  • δj,k < 0.9
  • 0.9 <= δj,k < 1.1
  • 1.1 <= δj,k

Model (δj,k,0) selected
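A small sketch of how the three ranges could be turned into a readable summary. The labels "decrease", "stable" and "increase" are an interpretation, assuming that under the selected model (δj,k,0) a parameter at time t is δj,k times its value at t-1. The δ values below are made up.

```python
def classify_link(delta, low=0.9, high=1.1):
    """Map a multiplicative link parameter to one of the three slide ranges."""
    if delta < low:
        return "decrease"
    if delta < high:
        return "stable"
    return "increase"


# Hypothetical delta values for one cluster, keyed by aspect name.
deltas = {"Ethics": 0.72, "Project": 1.0, "Communication": 1.35}
summary = {aspect: classify_link(d) for aspect, d in deltas.items()}
print(summary)
# {'Ethics': 'decrease', 'Project': 'stable', 'Communication': 'increase'}
```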

Integrated into the final prototype

With TMM (MM-Plink coming soon…)


Part IV: Conclusion and Future Work

Conclusion

New models for evolutionary clustering

  • dedicated to sparse data
  • taking temporal transition into account
  • trying to add more interpretation of the evolution process

Applied to social media analysis

  • extraction and monitoring of opinionated images

In close collaboration with social sciences

  • joint work with specialists in political studies and semiologists (all along the process)

Future work

From the methodology point of view:

  • going further into the interpretation process
  • more comparisons needed (see MONIC [Spiliopoulou,2006] for instance)
  • testing non-parametric approaches
  • looking for change points in the timeline

For the ImagiWeb project:

  • testing TMM and MM-Plink on the rest of the available data and on the 2nd case study
  • more (qualitative) evaluation needed
  • qualifying users’ groups using additional variables

[ THANK YOU ]
