Learning To Rank Academic Experts Catarina Moreira Outline - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Learning To Rank Academic Experts

Catarina Moreira Instituto Superior Técnico Universidade Técnica de Lisboa

slide-2
SLIDE 2

Outline

✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
  ✓ Rank Aggregation Framework
  ✓ Learning to Rank Framework
✓ Conclusions and Future Work

2/34

slide-3
SLIDE 3

Expert Finding

Information Retrieval

Gerard Salton Ricardo Baeza-Yates Bruce Croft

3/34

slide-4
SLIDE 4

State of the Art Problems

Usage of Generative Probabilistic Models
Heuristics are too simple and do not reflect expertise
Heuristics are based only on the documents’ textual contents

4/34

slide-5
SLIDE 5

Contributions

  • 1. Different Sets of Features to Estimate Expertise
  • 2. Rank Aggregation Framework for Expert Finding
  • 3. Learning to Rank (L2R) Framework for Expert Finding

5/34

slide-6
SLIDE 6

Outline

✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
  ✓ Rank Aggregation Framework
  ✓ Learning to Rank Framework
✓ Conclusions and Future Work

6/34

slide-7
SLIDE 7

Features: Hypothesis

Multiple estimators of expertise, based on different sources of evidence, will enable the construction of more accurate and reliable ranking models!

7/34

slide-8
SLIDE 8

Textual Similarity

Term Frequency
Inverse Document Frequency
BM25
TF.IDF
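
As a rough illustration of the textual-similarity features, the sketch below scores a query against a tiny corpus with TF.IDF and BM25. The corpus, the whitespace tokenization, the parameter values k1 = 1.2 and b = 0.75, and the smoothed IDF variant are assumptions made for this example; the slides do not say which variants were used.

```python
import math
from collections import Counter

def idf(term, docs):
    """Smoothed inverse document frequency (one common BM25 variant)."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)

def tf_idf(query, doc, docs):
    """Sum of raw term frequency times IDF over the query terms."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)

def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    tf = Counter(doc)
    avg_len = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        # Length normalization: longer-than-average documents are penalized.
        norm = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf(t, docs) * tf[t] * (k1 + 1) / norm
    return score

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    "learning to rank for information retrieval".split(),
    "expert finding in academic digital libraries".split(),
    "rank aggregation for expert finding".split(),
]
query = "expert finding".split()
scores = [bm25(query, d, docs) for d in docs]
```

The first document contains neither query term and scores zero; the third outranks the second only because it is shorter, which is the length normalization at work.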

8/34

slide-9
SLIDE 9

Profile Information

✓ Number of Publications with(out) query topics
✓ Number of Journals with(out) query topics
✓ Years Between Publications with(out) query topics
✓ Average Number of Publications per year
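
A minimal sketch of how profile features like these could be derived from an author's publication list. The record fields (title, year, venue, is_journal), the title-matching rule for "query topics", and the reading of "years between publications" as the gap between first and last publication are all assumptions for illustration, not the definitions used in the slides.

```python
def profile_features(pubs, query_terms):
    """Profile features for one author; pubs must be a non-empty list of records."""
    def on_topic(p):
        # Assumed rule: a publication is on topic if its title has a query term.
        words = p["title"].lower().split()
        return any(t in words for t in query_terms)

    topical = [p for p in pubs if on_topic(p)]
    years = [p["year"] for p in pubs]
    span = max(years) - min(years) + 1  # active years, counted inclusively
    return {
        "num_pubs": len(pubs),
        "num_pubs_on_topic": len(topical),
        "num_journals": len({p["venue"] for p in pubs if p["is_journal"]}),
        "num_journals_on_topic": len({p["venue"] for p in topical if p["is_journal"]}),
        "years_between_pubs": max(years) - min(years),
        "avg_pubs_per_year": len(pubs) / span,
    }

# Hypothetical records for a single author.
pubs = [
    {"title": "Expert finding with language models", "year": 2006, "venue": "SIGIR", "is_journal": False},
    {"title": "Ranking experts in academic networks", "year": 2008, "venue": "IPM", "is_journal": True},
    {"title": "A survey of query expansion", "year": 2009, "venue": "CSUR", "is_journal": True},
]
features = profile_features(pubs, ["expert", "experts"])
```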

9/34

slide-10
SLIDE 10

Graphs

✓ Total/Max/Avg citations of the authors’ papers
✓ Total Number of Unique Collaborators
✓ Publications’ PageRank
✓ Academic Indexes
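
To make the PageRank feature concrete, here is a small power-iteration sketch over a citation graph. The graph, the damping factor 0.85, and the fixed iteration count are assumptions for this example; the slides do not specify how PageRank was configured.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each paper to the papers it cites."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank

# Hypothetical citation graph: p1 and p2 cite p3, which cites p4.
citations = {"p1": ["p3"], "p2": ["p3"], "p3": ["p4"], "p4": []}
ranks = pagerank(citations)
```

An author-level feature could then aggregate the scores of the author's papers, for example by summing them.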

10/34

slide-11
SLIDE 11

Hirsch Index

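
For reference, an author has Hirsch index h when h of their papers have at least h citations each. A minimal sketch with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author's five papers.
example = h_index([10, 8, 5, 4, 3])
```

Here four papers have at least four citations, but there are not five papers with at least five, so the index is 4.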

11/34

slide-12
SLIDE 12

Other Indexes

a-Index
Contemporary h-Index (extension of h Index)
Trend h-Index (extension of h Index)
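
A sketch of these three indexes as they are commonly defined in the bibliometrics literature: the a-index averages citations over the h most cited papers, the contemporary h-index discounts citations of older papers, and the trend h-index discounts older citations. The weights gamma = 4 and delta = 1 and the sample data are assumptions for illustration; the slides do not state which parameters were used.

```python
def h_from_scores(scores):
    """Largest h such that h items have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    return sum(1 for rank, s in enumerate(ranked, start=1) if s >= rank)

def a_index(citations):
    """Average citation count over the h most cited papers (the h-core)."""
    h = h_from_scores(citations)
    core = sorted(citations, reverse=True)[:h]
    return sum(core) / h if h else 0.0

def contemporary_h_index(papers, now, gamma=4.0, delta=1.0):
    """papers: (publication_year, citation_count) pairs; older papers weigh less."""
    return h_from_scores(gamma * (now - y + 1) ** -delta * c for y, c in papers)

def trend_h_index(citing_years_per_paper, now, gamma=4.0, delta=1.0):
    """One list of citing-paper years per paper; older citations weigh less."""
    return h_from_scores(
        gamma * sum((now - cy + 1) ** -delta for cy in citing_years)
        for citing_years in citing_years_per_paper
    )

# Hypothetical author: two old, well-cited papers and one recent paper.
papers = [(1990, 9), (1991, 8), (2010, 3)]
plain_h = h_from_scores(c for _, c in papers)
contemporary_h = contemporary_h_index(papers, now=2011)
trend_h = trend_h_index([[2011, 2011, 2010], [1995, 1996]], now=2011)
```

With these numbers the plain h-index is 3, while the contemporary variant drops to 1 because the two older papers are heavily discounted.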

12/34

slide-13
SLIDE 13

Datasets

DBLP - Computer Science Dataset

  • Covers journal and conference publications
  • Contains abstracts and citation links
  • All this information was processed and stored in a database

13/34

slide-14
SLIDE 14

Datasets

Arnetminer - Validation

  • Contains experts for 13 query topics
  • Experts collected from important Program Committees related to the query topics

14/34

slide-15
SLIDE 15

Outline

✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
  ✓ Rank Aggregation Framework
  ✓ Learning to Rank Framework
✓ Conclusions and Future Work

15/34

slide-16
SLIDE 16

Question

How can we combine these features?

16/34

slide-17
SLIDE 17

Answer

Traditional IR techniques use frameworks inspired by traditional search engines to combine different sources of evidence!

17/34

slide-18
SLIDE 18

Rank Aggregation Framework for Expert Finding

18/34

slide-19
SLIDE 19

Data Fusion Algorithms

✓ Positional
  ✓ Based on the position that a candidate occupies in a ranked list
  ✓ Algorithms: Borda Fuse and Reciprocal Rank Fuse
✓ Score Aggregation
  ✓ Based on the score that a candidate achieved in a ranked list
  ✓ Algorithms: CombSUM, CombMNZ and CombANZ
✓ Majoritarian
  ✓ Based on pairwise comparisons between candidates
  ✓ Algorithms: Condorcet Fusion
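
The three families can be sketched in a few lines each. This is an illustrative sketch, not the implementation behind the slides: the candidate names and scores are made up, scores are assumed to be already normalized to a comparable range, k = 60 for reciprocal rank fusion is an assumed constant, and the majoritarian function is a simplified count of pairwise majority wins standing in for full Condorcet Fusion.

```python
from collections import defaultdict
from itertools import combinations

def comb_fusion(score_lists):
    """CombSUM, CombMNZ and CombANZ from a list of {candidate: score} dicts."""
    total, hits = defaultdict(float), defaultdict(int)
    for scores in score_lists:
        for cand, s in scores.items():
            total[cand] += s
            hits[cand] += 1
    comb_sum = dict(total)
    comb_mnz = {c: total[c] * hits[c] for c in total}  # reward frequent candidates
    comb_anz = {c: total[c] / hits[c] for c in total}  # average over appearances
    return comb_sum, comb_mnz, comb_anz

def borda_fuse(rankings):
    """Each list awards n - position points (n = number of distinct candidates)."""
    n = len({c for r in rankings for c in r})
    points = defaultdict(float)
    for r in rankings:
        for pos, cand in enumerate(r):
            points[cand] += n - pos
    return dict(points)

def reciprocal_rank_fuse(rankings, k=60):
    """Sum of 1 / (k + rank) over the lists in which a candidate appears."""
    fused = defaultdict(float)
    for r in rankings:
        for rank, cand in enumerate(r, start=1):
            fused[cand] += 1.0 / (k + rank)
    return dict(fused)

def pairwise_wins(rankings):
    """Simplified majoritarian fusion: pairwise majority contests won."""
    def pos(r, c):
        return r.index(c) if c in r else len(r)  # absent counts as ranked last
    cands = sorted({c for r in rankings for c in r})
    wins = {c: 0 for c in cands}
    for a, b in combinations(cands, 2):
        a_first = sum(1 for r in rankings if pos(r, a) < pos(r, b))
        b_first = sum(1 for r in rankings if pos(r, b) < pos(r, a))
        if a_first > b_first:
            wins[a] += 1
        elif b_first > a_first:
            wins[b] += 1
    return wins

# Hypothetical ranked lists and scores produced by three expertise estimators.
rankings = [["ana", "bob", "eve"], ["bob", "ana", "eve"], ["bob", "eve", "ana"]]
score_lists = [{"ana": 0.9, "bob": 0.8}, {"bob": 0.7, "eve": 0.4}, {"bob": 0.5, "ana": 0.3}]
comb_sum, comb_mnz, comb_anz = comb_fusion(score_lists)
borda = borda_fuse(rankings)
rrf = reciprocal_rank_fuse(rankings)
wins = pairwise_wins(rankings)
```

In every case the final ranking is obtained by sorting candidates by the fused value in descending order.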

19/34

slide-20
SLIDE 20

Results Rank Aggregation (MAP)*

Cond. Fusion      43,82%
CombMNZ           48,43% [-10,25%]
CombSUM           41,34% [+6,00%]
Rec. Rank Fuse    39,99% [+9,58%]
Borda Fuse        39,99% [+9,58%]
CombANZ*          35,61% [+23,06%]

20/34

*Mean Average Precision
*Sig. Tests of 0.95 conf.
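
Mean Average Precision can be computed as follows: for each query, average the precision at every rank where a relevant expert appears, then take the mean over queries. The candidate names and relevance judgments below are made up for illustration.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits, precision_sum = 0, 0.0
    for k, cand in enumerate(ranked, start=1):
        if cand in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked candidates, set of relevant experts), one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical query topics.
runs = [
    (["ana", "bob", "eve", "dan"], {"ana", "eve"}),  # AP = (1/1 + 2/3) / 2
    (["bob", "ana", "dan", "eve"], {"ana"}),         # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```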

slide-21
SLIDE 21

Impact of the Features with Condorcet Fusion (MAP)*

[Bar chart. Feature sets: Text + Profile + Graph; Text*; Profile*; Graph; Text + Graph*; Text + Profile*; Profile + Graph. Values as extracted: 43,82%; 32,67% [+25,45%]; 39,08% [+10,82%]; 41,65% [+4,95%]; 29,75% [+32,11%]; 36,87% [+15,86%]; 43,86% [-0,09%]]

21/34

*Mean Average Precision
*Sig. Tests of 0.95 conf.

slide-22
SLIDE 22

Outline

✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
  ✓ Rank Aggregation Framework
  ✓ Learning to Rank Framework
✓ Conclusions and Future Work

22/34

slide-23
SLIDE 23

Question

How can we combine these features in an optimal way?

23/34

slide-24
SLIDE 24

Answer

The IR literature focuses on machine learning techniques. They enable the combination of multiple estimators in an optimal way!

24/34

slide-25
SLIDE 25

The L2R Framework For Expert Finding


25/34

slide-26
SLIDE 26

L2R Algorithms

✓ Pointwise
  ✓ Input: single candidate
  ✓ Goal: use scoring functions to predict relevance
  ✓ Algorithms: Additive Groves
✓ Pairwise
  ✓ Input: pair of candidates
  ✓ Goal: loss function to minimize number of misclassified candidate pairs
  ✓ Algorithms: RankBoost, SVMrank and RankNet
✓ Listwise
  ✓ Input: list of candidates
  ✓ Goal: loss function which directly optimizes an IR metric
  ✓ Algorithms: SVMmap, Coordinate Ascent and AdaRank
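
To illustrate the pairwise idea, the sketch below fits a linear scoring function with a logistic pairwise loss, so that the more expert candidate in each pair receives the higher score. It is written in the spirit of pairwise methods but is not an implementation of RankBoost, SVMrank or RankNet; the two-dimensional feature vectors, learning rate and epoch count are assumptions for this example.

```python
import math

def train_pairwise(pairs, n_features, lr=0.1, epochs=200):
    """Fit a linear scorer so the first item of each pair outscores the second."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of the pairwise loss log(1 + exp(-margin)) w.r.t. margin.
            g = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * g * di for wi, di in zip(w, diff)]
    return w

def misordered(pairs, w):
    """Number of pairs that the linear scorer ranks in the wrong order."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return sum(1 for better, worse in pairs if score(better) <= score(worse))

# Hypothetical (more expert, less expert) pairs; features: [text score, graph score].
pairs = [
    ([0.9, 0.8], [0.2, 0.3]),
    ([0.7, 0.9], [0.6, 0.1]),
    ([0.4, 0.6], [0.5, 0.2]),
]
weights = train_pairwise(pairs, n_features=2)
errors = misordered(pairs, weights)
```

A pointwise method would instead fit a score for each candidate independently, and a listwise method would optimize a metric such as MAP over the whole ranked list.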

26/34

slide-27
SLIDE 27

Results Learning to Rank (MAP)*

Additive Groves   89,40%
SVMmap            87,02% [+2,66%]
SVMrank           83,11% [+7,04%]
RankBoost*        78,40% [+12,30%]
Coord. Ascent*    75,77% [+15,25%]
RankNet*          65,30% [+26,96%]
AdaRank*          64,78% [+27,54%]

27/34

*Mean Average Precision
*Sig. Tests of 0.95 conf.

slide-28
SLIDE 28

Impact of the Features with Additive Groves (MAP)*

[Bar chart. Feature sets: Text + Profile + Graph; Profile + Graph*; Text + Graph*; Profile; Text + Profile*; Text; Graph. Values as extracted: 89,40%; 87,14% [+2,53%]; 88,25% [+1,29%]; 82,37% [+7,86%]; 86,60% [+3,13%]; 87,28% [+2,37%]; 85,26% [+4,63%]]

28/34

*Mean Average Precision
*Sig. Tests of 0.95 conf.

slide-29
SLIDE 29

Comparison with State of the Art (MAP)*

[Bar chart. Approaches: Balog’s Model 2; Yang’s SVMrank; Moreira’s Add. Groves; Deng’s AuthorRank. Values as extracted: 89,40% (Moreira’s Add. Groves); 39,15% [+56,21%]; 63,56% [+28,90%]; 49,06% [+45,12%]]

29/34

*Mean Average Precision

slide-30
SLIDE 30

Prototype

slide-31
SLIDE 31

Outline

✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
  ✓ Rank Aggregation Framework
  ✓ Learning to Rank Framework
✓ Conclusions and Future Work

31/34

slide-32
SLIDE 32

Conclusions

✓ Effectiveness of the Learning to Rank Framework
  ✓ Best algorithms: Additive Groves, SVMmap and SVMrank
✓ Effectiveness of the Rank Aggregation Approach
  ✓ Best algorithms: CombMNZ and Condorcet Fusion
✓ Effectiveness of the Proposed Features
  ✓ The full set of features is the best

32/34

slide-33
SLIDE 33

Future Work

✓ Feature Selection Techniques (e.g., PCA)
✓ Expert Finding in an organizational environment (TREC dataset)
✓ Tasks beyond expert finding
  ✓ Natural Language Processing
  ✓ Geographic Information Retrieval

33/34

slide-34
SLIDE 34

Publications

✓ C. Moreira, P. Calado and B. Martins, Learning to Rank for Expert Search in Digital Libraries of Academic Publications, In Proceedings of the 15th Portuguese Conference on Artificial Intelligence, 2011

✓ C. Moreira, B. Martins and P. Calado, Using Rank Aggregation for Expert Search in Academic Digital Libraries, In Simpósio de Informática, INFORUM, 2011

✓ C. Moreira, A. Mendes, L. Coheur and B. Martins, Towards the Rapid Development of a Natural Language Understanding Module, In Proceedings of the 11th Conference on Intelligent Virtual Agents, 2011

34/34