SIGIR 2010
Siddharth Gopal & Yiming Yang
Introduction / Motivation
Proposed approach
- Ranking
- Thresholding
Experiments

7/20/2010
Webpage / Image / News Article:
Binary classification (e.g.)
- Ad vs. Not-an-Ad
- Spam vs. Genuine
Multiclass classification (e.g.)
- Which country is it about? Switzerland, France, Italy, United States, ..
Multilabel classification
- What topics is it related to? Politics, Terrorism, Health, Sports, ..
Our goal
Given:
- A set of training examples (webpages, images, etc.)
- For each training instance, the set of relevant categories (a subset of the m categories)

F = { (x_i, y_i) : x_i ∈ R^d, y_i ⊆ {1, 2, ..., m} }
Binary relevance learning
- Split the problem into several independent binary classification problems: One-vs-Rest, Pairwise.
Instance-based multilabel classifiers
- Standard kNN (Yang, SIGIR 1994)
- Bayesian-style ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
- Logistic-regression style (IBLR-ML) using kNN features (Cheng and Hüllermeier, Machine Learning 2009)
Model-based method
- Rank-SVM for MLC, a maximum-margin method enforcing partial-order constraints (Elisseeff and Weston, NIPS 2002)
Rank-SVM
- Has a global optimization criterion: does not break down into multiple independent binary problems
- A large number of parameters (m * D)
- Different from Rank-SVM for IR (and other learning-to-rank IR methods)

Follows a two-step procedure:
(a) Rank the categories for a given instance
(b) Select an instance-specific threshold
Our approach: leverage recent learning-to-rank methods from IR to solve (a).
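The two-step procedure above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation; `score_fn` and `threshold_fn` are hypothetical stand-ins for the learned ranking model and the learned threshold predictor.

```python
def predict_labels(score_fn, threshold_fn, x, m):
    """Two-step multilabel prediction for instance x over m categories."""
    # Step (a): score every category for x (higher score = more relevant).
    scores = [score_fn(x, c) for c in range(m)]
    # Step (b): choose an instance-specific cutoff over the score ranklist.
    t = threshold_fn(scores)
    # Predicted label set: all categories scoring above the threshold.
    return [c for c, s in enumerate(scores) if s > t]
```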
The typical learning-to-rank framework
Documents are represented using a combined feature representation between query and document (TF, cosine similarity, BM25, Okapi, etc.)

[Figure: a query q and corpus documents d1, d2, d3, ... are mapped to combined features phi(q, d1), phi(q, d2), phi(q, d3), ...; the model outputs a ranked list, e.g. d10, d3, d1, ...]
Given a new instance, rank the categories. How do we define a combined feature representation?

[Figure: a document d and categories 1..m are mapped to vec(d, 1), vec(d, 2), ..., vec(d, m); the model outputs a category ranking, e.g. 5, 1, 2, ...]
Define the feature representation of the pair (instance, category) as follows:

vec(x_i, c) = [ Dist(x_i, D_c^1NN), Dist(x_i, D_c^2NN), ..., Dist(x_i, D_c^kNN) ]

where D_c is the set of instances that belong to category c, and D_c^jNN denotes the j-th nearest neighbor of x_i within D_c. The distance to the category centroid is also appended, and L1, L2, and cosine-similarity distances are concatenated.
Pictorially (using only L2 links): thicker lines denote links to the centroid; thinner lines denote links to the category neighborhood.
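The meta-level feature construction can be sketched as follows. This is a minimal sketch under stated assumptions (dense NumPy vectors, L2 distance used to pick the k nearest in-category neighbors); `meta_features` is a hypothetical name, not from the paper's code.

```python
import numpy as np

def meta_features(x, D_c, k=3):
    """Meta-level representation vec(x, c) of an (instance, category) pair.
    D_c: array whose rows are the training instances belonging to category c.
    Concatenates L1, L2 and cosine distances to the k nearest in-category
    neighbors, then appends the same three distances to the category centroid."""
    def dists(Y):
        l1 = np.abs(Y - x).sum(axis=1)
        l2 = np.sqrt(((Y - x) ** 2).sum(axis=1))
        cos = 1.0 - (Y @ x) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(x) + 1e-12)
        return l1, l2, cos

    l1, l2, cos = dists(D_c)
    nn = np.argsort(l2)[:k]  # indices of the k nearest neighbors (by L2)
    c_l1, c_l2, c_cos = dists(D_c.mean(axis=0, keepdims=True))  # centroid distances
    return np.concatenate([l1[nn], l2[nn], cos[nn], c_l1, c_l2, c_cos])
```

The resulting vector has length 3k + 3 regardless of the vocabulary size, which is what makes the representation so much smaller than per-word weight vectors.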
In short,
- Represent the relation between each instance and category using vec(x_i, c)
- Substantially fewer model parameters compared to Rank-SVM for MLC
- Allows any learning-to-rank algorithm for IR to be used to rank the categories
- In our experiments, we used SVM-MAP as the learning-to-rank method
Introduction / Motivation
Proposed approach
- Ranking
- Thresholding
Experiments
Supervised learning of an instance-specific threshold (Elisseeff and Weston, NIPS 2002)

1) Represent each training instance by its ranklist of category scores: x_i^LETOR = [s_i^1, s_i^2, ..., s_i^m], i = 1...n.
2) The target threshold t_i for a ranklist is the one that minimizes the sum of false positives and false negatives.
3) Learn w from the training pairs ([s_1^1, ..., s_1^m], t_1), ([s_2^1, ..., s_2^m], t_2), ..., ([s_n^1, ..., s_n^m], t_n) such that t_i ≈ w^T [s_i^1, s_i^2, ..., s_i^m].
4) Predict the threshold for a test instance: t_test = w^T [s_test^1, s_test^2, ..., s_test^m].
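The threshold-learning step can be sketched as below. This is a sketch under assumptions: plain unregularized least squares stands in for whichever linear fit the authors used, and both function names are hypothetical.

```python
import numpy as np

def optimal_threshold(scores, relevant):
    """Target threshold for one ranklist: the cutoff minimizing FP + FN
    when categories scoring above it are predicted relevant."""
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    # Candidate cutoffs: just below each score, plus one above the maximum.
    candidates = np.concatenate([np.sort(scores) - 1e-9, [scores.max() + 1.0]])
    errors = [np.sum((scores > t) & ~relevant) + np.sum((scores <= t) & relevant)
              for t in candidates]
    return candidates[int(np.argmin(errors))]

def fit_threshold_model(S, relevance):
    """Fit w so that w^T s_i approximates the per-instance optimal threshold;
    a test threshold is then predicted as w @ s_test."""
    t = np.array([optimal_threshold(s, r) for s, r in zip(S, relevance)])
    w, *_ = np.linalg.lstsq(np.asarray(S, dtype=float), t, rcond=None)
    return w
```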
Introduction / Motivation
Proposed approach
- Ranking
- Thresholding
Experiments
Dataset        #Training  #Testing  #Categories  Avg labels/instance  #Features
Emotions             391       202            6                 1.87         72
Scene               1211      1196            6                 1.07        294
Yeast               1500       917           14                 4.24        103
Citeseer            5303      1326           17                 1.26      14601
Reuters-21578       7770      3019           90                 1.23      18637
SVM-MAP-MLC
- Our proposed approach
ML-kNN (Zhang and Zhou, Pattern Recognition 2007)
IBLR-ML (Cheng and Hüllermeier, Machine Learning 2009)
Rank-SVM (Elisseeff and Weston, NIPS 2002)
Standard One-vs-Rest SVM
Average Precision
- Standard metric in IR
- For a ranklist, measures the precision at each relevant category and averages them
Ranking Loss
- Measures the average number of inversions between relevant and irrelevant categories in the ranklist
Micro-F1 & Macro-F1
- F1 is the harmonic mean of precision and recall
- Micro-averaging gives equal importance to each document
- Macro-averaging gives equal importance to each category
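The two ranking metrics above can be computed as follows; this is a minimal sketch for a single ranklist (the reported numbers average these over all test instances).

```python
def average_precision(ranked_relevance):
    """AP of one ranklist: precision measured at each relevant position,
    averaged over the relevant positions. Input: booleans in rank order."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) category pairs that are inverted,
    i.e. where the irrelevant category scores at least as high."""
    rel = [s for s, r in zip(scores, relevant) if r]
    irr = [s for s, r in zip(scores, relevant) if not r]
    if not rel or not irr:
        return 0.0
    inversions = sum(1 for sr in rel for si in irr if sr <= si)
    return inversions / (len(rel) * len(irr))
```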
[Figure: 1-Rankloss and MAP performance across datasets for SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR]
[Figure: Micro-F1 and Macro-F1 performance across datasets for SVM-MAP-MLC, ML-kNN, Rank-SVM, Binary-SVM, and IBLR]
- Meta-level features to represent the relationship between instances and categories
- Merging learning to rank and multilabel classification using the meta-level features
- Improves the state of the art for multilabel classification
- Different kinds of meta-level features
- Different learning-to-rank methods
- Optimizing metrics other than MAP
THANKS!
A typical scenario in text categorization: support vector machines, logistic regression, or boosting learn m weight vectors, each of length |vocabulary|, for a total of m * |vocabulary| parameters. Is this good or bad?

[Figure: a bag-of-words representation (Wall Street, Market, Crime, ...) fed to a classifier]
Words are fairly discriminative; current methods build a predictor by weighting different words.
Disadvantages
- Too many words
- Does not allow firm control over how each instance is related to a particular category
Effect of different feature sets

[Figure: performance of feature sets ALL, L2, L1, and Cos on Emotions, Yeast, Scene, Citeseer, and Reuters-21578]
Rank-SVM for IR vs. Rank-SVM for MLC
[Figure: comparison of the two formulations]

[Figure: performance comparison of SVM-MAP, ML-kNN, RANKSVM-MLC, SVM, and IBLR-ML]