SLIDE 1

SIGIR ’10 Siddharth Gopal & Yiming Yang

SLIDE 2

 Introduction
 Motivation
 Proposed approach
  • Ranking
  • Thresholding
 Experiments

2 7/20/2010

SLIDE 3

 Binary classification (e.g.)
  • Ad vs Not-an-Ad
  • Spam vs Genuine
 Multiclass classification (e.g.)
  • Which country is it about? Switzerland, France, Italy, United States, ..
 Multilabel classification
  • What topics is it related to? Politics, Terrorism, Health, Sports, ..

[Illustration: the input instance is a webpage, image, or news article]

SLIDE 4

 Our goal
 Given:
  • A set of training examples
  • For each training instance, the set of relevant categories

F = { (x, y) : x ∈ R^d, y ⊆ {1, 2, …, m} }

where x = {x_i | x_i ∈ R^d} is an instance (a webpage, image, etc.) and y = {y_i | y_i ∈ {1, 2, 3, …, m}} is the subset of relevant categories.
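The setup can be sketched with a minimal data structure (the instances and label sets below are hypothetical toy values, not data from the paper):

```python
# Multilabel training set F = {(x, y)}: x is a feature vector in R^d and
# y is the subset of relevant categories drawn from {1, ..., m}.
m = 4  # number of categories (toy value)

F = [
    ([0.2, 0.9, 0.1], {1, 3}),    # e.g. an article about two topics
    ([0.8, 0.1, 0.4], {2}),       # e.g. an article about a single topic
    ([0.5, 0.5, 0.5], {1, 2, 4}),
]

# Each y_i must be a valid subset of the m categories.
for x_i, y_i in F:
    assert y_i <= set(range(1, m + 1))
```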

SLIDE 5

 Binary relevance learning
  • Split the problem into several independent binary classification problems: One vs Rest, Pairwise.
 Instance-based multilabel classifiers
  • Standard ML-kNN (Yang, SIGIR 1994)
  • Bayesian-style ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
  • Logistic-regression style (IBLR-ML) using kNN features (Cheng and Hüllermeier, Machine Learning 2009)
 Model-based method
  • Rank-SVM for MLC, a maximum-margin method enforcing partial-order constraints (Elisseeff and Weston, NIPS 2002)
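The binary-relevance decomposition in the first bullet can be sketched as follows (a sketch with hypothetical toy data; a real system would then train one binary classifier per category):

```python
def one_vs_rest_datasets(F, m):
    """Binary relevance (One vs Rest): split the multilabel training set F
    into m independent binary problems, one per category. The label for
    category c is True iff c is among the instance's relevant categories."""
    return {c: [(x, c in y) for x, y in F] for c in range(1, m + 1)}

# Hypothetical toy data: two instances, two categories.
F = [([1.0], {1}), ([2.0], {1, 2})]
datasets = one_vs_rest_datasets(F, m=2)
# datasets[2] == [([1.0], False), ([2.0], True)]
```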

SLIDE 6

 Rank-SVM
  • Has a global optimization criterion: it does not break down into multiple independent binary problems
  • A large number of parameters (mD)
  • Different from Rank-SVM for IR [and other learning-to-rank IR methods]
 Follows a two-step procedure:
  (a) Rank the categories for a given instance
  (b) Select an instance-specific threshold
 Our approach: leverage recent learning-to-rank methods from IR to solve (a).
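The two-step procedure can be sketched as below; score_fn and threshold_fn are hypothetical stand-ins for the learned ranking model and thresholding model:

```python
def rank_categories(x, score_fn, m):
    """Step (a): score every category for instance x, best first."""
    scores = [(score_fn(x, c), c) for c in range(1, m + 1)]
    scores.sort(reverse=True)
    return scores  # list of (score, category) pairs

def predict_labels(x, score_fn, threshold_fn, m):
    """Step (b): keep the categories whose score exceeds an
    instance-specific threshold computed from the ranklist."""
    ranked = rank_categories(x, score_fn, m)
    t = threshold_fn([s for s, _ in ranked])
    return {c for s, c in ranked if s > t}

# Toy stand-ins: score = x * c, threshold = midpoint of the score range.
labels = predict_labels(
    2.0,
    score_fn=lambda x, c: x * c,
    threshold_fn=lambda scores: (scores[0] + scores[-1]) / 2,
    m=4,
)
# labels == {3, 4}: only categories scoring above the midpoint survive
```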

SLIDE 7

The typical learning-to-rank framework

Documents are represented using a combined feature representation between the query and the document (TF, cosine similarity, BM25, Okapi, etc.)

[Diagram: a query q and corpus documents d1, d2, d3, … are combined into feature vectors φ(q, d1), φ(q, d2), φ(q, d3), …; the model outputs a ranked document list, e.g. d10, d3, d1, …]

SLIDE 8

 Given a new instance, rank the categories
 How do we define a combined feature representation?

[Diagram: a document d is paired with each of the m categories to form vec(d, 1), vec(d, 2), vec(d, 3), …, vec(d, m); the model outputs a ranked category list, e.g. 5, 1, 2, …]

SLIDE 9

 Define the feature representation of the pair (instance, category) as follows:

vec(x_i, c) = [ Dist(x_i, D_c^1NN), Dist(x_i, D_c^2NN), …, Dist(x_i, D_c^kNN) ]

where D_c is the set of training instances that belong to category c, and Dist(x_i, D_c^jNN) is the distance from x_i to its j-th nearest neighbor in D_c.

 The distance to the category centroid is also appended
 L1, L2, and cosine-similarity distances are concatenated
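The meta-level features for one (instance, category) pair can be sketched as follows. This sketch uses the L2 distance only; per the slide, the full representation concatenates L1, L2, and cosine-similarity distances. The toy category members are hypothetical:

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_meta_features(x, category_members, k):
    """Meta-level features for the pair (x, category): the distances from x
    to its k nearest training instances of the category, plus the distance
    to the category centroid (appended last)."""
    dists = sorted(l2(x, d) for d in category_members)[:k]
    centroid = [sum(col) / len(category_members)
                for col in zip(*category_members)]
    return dists + [l2(x, centroid)]

# Toy category D_c with three member instances (hypothetical data).
D_c = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
vec = knn_meta_features([0.0, 0.0], D_c, k=2)
# vec has k + 1 = 3 entries: two kNN distances and one centroid distance
```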

SLIDE 10

 Pictorially (using only L2 links)
 Thicker lines denote links to the centroid
 Thinner lines denote links to the category neighborhood

SLIDE 11

 In short,
  • Represent the relation between each instance and category using vec(x_i, c)
  • Substantially reduces the model parameters compared to Rank-SVM for MLC
  • Allows any learning-to-rank algorithm from IR to be used to rank the categories
  • In our experiments, we used SVM-MAP as the learning-to-rank method

SLIDE 12

 Introduction
 Motivation
 Proposed approach
  • Ranking
  • Thresholding
 Experiments

SLIDE 13

Supervised learning of an instance-specific threshold (Elisseeff and Weston, NIPS 2002):

1) For each training instance x_i (i = 1…n), the learning-to-rank model produces a ranklist of category scores:
   x_i → [s_i^1, s_i^2, …, s_i^m]
2) The target threshold t_i for a ranklist is the one that minimizes the sum of false positives and false negatives, giving training pairs:
   ([s_1^1, …, s_1^m], t_1), ([s_2^1, …, s_2^m], t_2), …, ([s_n^1, …, s_n^m], t_n)
3) Learn w such that w^T [s^1, s^2, …, s^m] ≈ t
4) Predict: threshold t_test = w^T [s_test^1, s_test^2, …, s_test^m]
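Step 2, picking the target threshold that minimizes FP + FN for one ranklist, can be sketched as below (hypothetical scores; step 3 would then fit the linear model w to these targets, e.g. by least squares):

```python
def best_threshold(scores, relevant):
    """Target threshold for one ranklist: the cut that minimizes
    false positives + false negatives. `scores[c]` is the score of
    category c; `relevant` is the set of truly relevant categories."""
    # Candidate cuts: each observed score, plus one above the maximum
    # (which predicts the empty set).
    candidates = sorted(scores.values()) + [max(scores.values()) + 1.0]
    best_t, best_err = None, None
    for t in candidates:
        predicted = {c for c, s in scores.items() if s >= t}
        fp = len(predicted - relevant)   # predicted but not relevant
        fn = len(relevant - predicted)   # relevant but not predicted
        if best_err is None or fp + fn < best_err:
            best_t, best_err = t, fp + fn
    return best_t

# Toy ranklist (hypothetical scores): categories 1 and 2 are relevant.
t = best_threshold({1: 0.9, 2: 0.7, 3: 0.2}, relevant={1, 2})
# t == 0.7: cutting at 0.7 predicts exactly {1, 2}, so FP + FN == 0
```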

SLIDE 14

 Introduction
 Motivation
 Proposed approach
  • Ranking
  • Thresholding
 Experiments

SLIDE 15

Dataset        #Training  #Testing  #Categories  #Avg-labels per instance  #Features
Emotions             391       202            6                      1.87         72
Scene               1211      1196            6                      1.07        294
Yeast               1500       917           14                      4.24        103
Citeseer            5303      1326           17                      1.26      14601
Reuters-21578       7770      3019           90                      1.23      18637

SLIDE 16

 SVM-MAP-MLC
  • Our proposed approach
 ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
 IBLR-ML (Cheng and Hüllermeier, Machine Learning 2009)
 Rank-SVM (Elisseeff and Weston, NIPS 2002)
 Standard One vs Rest SVM

SLIDE 17

 Average Precision
  • Standard metric in IR
  • For a ranklist, measures the precision at each relevant category and averages them
 Ranking Loss
  • Measures the average number of inversions between relevant and irrelevant categories in the ranklist
 Micro-F1 & Macro-F1
  • F1 is the harmonic mean of precision and recall
  • Micro-averaging gives equal importance to each document
  • Macro-averaging gives equal importance to each category
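The two ranking metrics can be sketched as follows (toy ranklist with hypothetical categories; `relevant` is the set of truly relevant categories, and lower ranking loss is better):

```python
def average_precision(ranked, relevant):
    """Precision computed at each relevant position in the ranklist,
    averaged over the relevant categories."""
    hits, total = 0, 0.0
    for i, c in enumerate(ranked, start=1):
        if c in relevant:
            hits += 1
            total += hits / i  # precision at this relevant position
    return total / len(relevant)

def ranking_loss(ranked, relevant):
    """Fraction of (relevant, irrelevant) pairs that are inverted,
    i.e. where the irrelevant category is ranked above the relevant one."""
    inversions, pairs = 0, 0
    for i, ci in enumerate(ranked):
        for cj in ranked[i + 1:]:
            if (ci in relevant) != (cj in relevant):
                pairs += 1
                if ci not in relevant:  # irrelevant ranked above relevant
                    inversions += 1
    return inversions / pairs

# Toy ranklist: category 3 (irrelevant) is ranked above category 1 (relevant).
ap = average_precision([2, 3, 1], relevant={1, 2})  # (1/1 + 2/3) / 2 == 5/6
rl = ranking_loss([2, 3, 1], relevant={1, 2})       # 1 of 2 mixed pairs inverted
```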

SLIDE 18

[Bar charts: 1-Rankloss performance (y-axis 0.80–1.00) and MAP performance (y-axis 0.70–1.00), comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR]

SLIDE 19

[Bar charts: Micro-F1 performance (y-axis 0.40–0.90) and Macro-F1 performance (y-axis 0.20–0.80), comparing SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR]

SLIDE 20

 Meta-level features to represent the relationship between instances and categories
 Merging learning to rank and multilabel classification using the meta-level features
 Improves the state of the art for multilabel classification

SLIDE 21

 Different kinds of meta-level features
 Different learning-to-rank methods
 Optimize metrics other than MAP

SLIDE 22

THANKS !

SLIDE 23

 A typical scenario in text categorization
 Support vector machines, logistic regression, or boosting learn m weight vectors, each of length |vocabulary|: a total of m·|vocabulary| parameters. Is this good or bad?

[Diagram: a bag-of-words document (Wall Street, Market, Crime, …) fed into a classifier]

SLIDE 24

 Words are fairly discriminative
 Current methods build a predictor based on weighting different words
 Disadvantages
  • Too many words
  • Does not allow firm control over how each instance is related to a particular category

SLIDE 25

 Effect of different feature sets

[Bar chart: performance (y-axis 0.1–1.0) on Emotions, Yeast, Scene, Citeseer, and Reuters-21578 using the ALL, L2, L1, and Cos feature sets]

SLIDE 26


[Comparison: Rank-SVM for IR vs. Rank-SVM for MLC]

SLIDE 27


[Bar chart (y-axis 0.1–1.0) comparing SVM-MAP, MLKNN, RANKSVM-MLC, SVM, and IBLR-ML]

SLIDE 28

[Bar chart (y-axis 0.1–1.0) comparing SVM-MAP, MLKNN, RANKSVM-MLC, SVM, and IBLR-ML]
