Portfolio Theory of Information Retrieval
Jun Wang and Jianhan Zhu
jun.wang@cs.ucl.ac.uk
Department of Computer Science University College London, UK
Portfolio Theory of Information Retrieval – p. 1/22
[Figure: Document A and Document B]
Uncertainty of the relevance scores
Correlations of relevance scores
User class        d1: Apple_Comp.   d2: Apple_Comp.   d3: Apple_Fruit
Apple Computers   1                 1
Apple Fruit                                           1
p(r)              2/3               2/3               1/3
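The Apple example above can be checked with a few lines of Python. This is only a sketch: the class proportions (2/3 of users want Apple Computers, 1/3 want Apple Fruit) are an assumption inferred from the p(r) row, and the names `class_share`, `prob_relevant`, `share_satisfied` and `diverse_ranking` are hypothetical. It compares ranking purely by p(r) with a diversified alternative, measured by how many users find at least one relevant document in the top 2.

```python
from fractions import Fraction

# Assumed user-class proportions (inferred from the p(r) row, not stated in the table).
class_share = {"Apple Computers": Fraction(2, 3), "Apple Fruit": Fraction(1, 3)}

# Relevance judgments from the table: the documents each user class finds relevant.
relevant = {
    "Apple Computers": {"d1", "d2"},
    "Apple Fruit": {"d3"},
}

def prob_relevant(doc):
    """Marginal probability of relevance p(r) of a document over all user classes."""
    return sum(share for cls, share in class_share.items() if doc in relevant[cls])

def share_satisfied(ranking, k):
    """Fraction of users who find at least one relevant document in the top k."""
    top = set(ranking[:k])
    return sum(share for cls, share in class_share.items() if relevant[cls] & top)

docs = ["d1", "d2", "d3"]
p_r = {d: prob_relevant(d) for d in docs}

# Ranking by p(r) alone (sorted is stable, so ties keep their input order).
prp_ranking = sorted(docs, key=lambda d: p_r[d], reverse=True)
# Hypothetical diversified alternative that promotes the fruit document.
diverse_ranking = ["d1", "d3", "d2"]

print(p_r)
print(share_satisfied(prp_ranking, 2), share_satisfied(diverse_ranking, 2))
```

Under these assumptions, ranking by p(r) alone places both computer documents first, so the fruit users find nothing relevant in the top 2, whereas the diversified list serves both classes.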
[Figure, left panel: Efficient Frontier (Return vs. Standard Deviation) for Google and Coca-Cola]
[Figure, right panel: Percentage in the Portfolio (Portfolio Percentage of Google and Coca-Cola vs. α, Risk Preference)]
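A two-asset efficient frontier of this kind can be sketched as follows. Every number here is hypothetical (the figure's underlying returns, standard deviations and correlation are not recoverable from the slide), and the objective "expected return minus α times variance" is an assumed form of the risk-preference trade-off rather than one confirmed by the slides.

```python
import math

# Hypothetical inputs: expected returns, standard deviations, and correlation
# of two assets. These values are illustrative, not read from the figure.
mu = (3.0, 1.0)
sigma = (14.0, 4.0)
rho = 0.2

def portfolio_stats(w):
    """Expected return and standard deviation with weight w on asset 0 and 1 - w on asset 1."""
    w0, w1 = w, 1.0 - w
    mean = w0 * mu[0] + w1 * mu[1]
    var = ((w0 * sigma[0]) ** 2 + (w1 * sigma[1]) ** 2
           + 2 * w0 * w1 * rho * sigma[0] * sigma[1])
    return mean, math.sqrt(var)

def best_weight(alpha, steps=1000):
    """Grid search for the weight maximising mean - alpha * variance (assumed objective)."""
    def objective(w):
        mean, std = portfolio_stats(w)
        return mean - alpha * std ** 2
    return max((i / steps for i in range(steps + 1)), key=objective)

# Points on the risk/return curve as the weight on asset 0 goes from 0 to 1.
frontier = [portfolio_stats(i / 10) for i in range(11)]

# A larger alpha (more risk aversion) shifts weight toward the low-variance asset.
for alpha in (0.0, 0.01, 0.1):
    print(alpha, best_weight(alpha))
```

With α = 0 the search puts everything on the higher-return asset; as α grows, the chosen weight moves toward the lower-variance asset, which is the behaviour the right-hand panel describes.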
$\sum_{i=1}^{n} w_i = 1$, differentiates the
[Figure: Expected Relevance and Variance]
[Figure: Relevance and Standard Deviation for correlations ρ = −1, ρ = −0.5, ρ = 0 and ρ = 1]
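The role of the correlation ρ can be illustrated with the standard variance formula for a weighted sum of two random variables, $w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2$. The sketch below applies it to a two-document list. The weights and standard deviations are hypothetical values chosen for illustration, and `list_std` is a hypothetical helper name.

```python
import math

def list_std(w, sigma, rho):
    """Standard deviation of the weighted relevance of a two-document list."""
    var = ((w[0] * sigma[0]) ** 2 + (w[1] * sigma[1]) ** 2
           + 2 * w[0] * w[1] * rho * sigma[0] * sigma[1])
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding error

weights = (0.5, 0.5)  # hypothetical rank weights summing to 1
sigmas = (0.6, 0.6)   # hypothetical standard deviations of the relevance scores

# The same correlation values as in the figure.
for rho in (-1.0, -0.5, 0.0, 1.0):
    print(f"rho={rho:+.1f}  std={list_std(weights, sigmas, rho):.3f}")
```

For fixed weights the standard deviation grows with ρ: negatively correlated documents offset each other's uncertainty, while perfectly correlated documents offer no reduction in risk.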
[Figure: (a) Mean Reciprocal Rank (MRR); (b) Mean Average Precision (MAP)]
Measures    CSIRO       WT10g       Robust      Robust hard   TREC8
MRR         0.869       0.558       0.592       0.393         0.589
            0.843       0.492       0.549       0.352         0.472
            +3.08%      +13.41%*    +7.83%*     +11.65%*      +24.79%*
MAP         0.41        0.182       0.204       0.084         0.212
            0.347       0.157       0.185       0.078         0.198
            +18.16%*    +15.92%*    +10.27%*    +7.69%*       +7.07%*
NDCG        0.633       0.433       0.421       0.271         0.452
            0.587       0.398       0.396       0.252         0.422
            +7.88%*     +8.82%*     +6.25%*     +7.55%*       +7.05%*
NDCG@10     0.185       0.157       0.175       0.081         0.149
            0.170       0.141       0.169       0.078         0.140
            +8.96%*     +11.23%*    +3.80%      +3.90%        +6.36%*
NDCG@100    0.377       0.286       0.314       0.169         0.305
            0.355       0.262       0.292       0.159         0.287
            +6.25%*     +9.27%*     +7.55%*     +6.58%*       +6.34%*
In each cell, the first line shows the performance of our approach, and the second line shows the performance of the MMR method and the gain of our method over the MMR method.

Models           Dirichlet          Jelinek-Mercer     BM25
sub-MRR          0.014              0.011              0.009
                 0.012 (+16.67%*)   0.009 (+22.22%*)   0.007 (+28.57%*)
sub-Recall@5     0.324              0.255              0.275
                 0.304 (+6.58%*)    0.234 (+8.97%*)    0.27 (+1.85%)
sub-Recall@10    0.381              0.366              0.352
                 0.362 (+5.25%)     0.351 (+4.27%)     0.344 (+2.33%)
sub-Recall@20    0.472              0.458              0.464
                 0.455 (+3.74%)     0.41 (+11.71%*)    0.446 (+4.04%)
sub-Recall@100   0.563              0.582              0.577
                 0.558 (+0.90%)     0.55 (+5.82%*)     0.558 (+3.41%)
[Cooper(1971)] William S. Cooper. The inadequacy of probability of usefulness as a ranking criterion for retrieval system output. University of California, Berkeley, 1971.
[Robertson(1977)] S. E. Robertson. The probability ranking principle in IR. Journal of Documentation, pages 294–304, 1977.
[Gordon and Lenk(1991)] Michael D. Gordon and Peter Lenk. A utility theoretic examination
[Maron and Kuhns(1960)] M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. J. ACM, 7(3), 1960.
[Stirling(1977)] Keith H. Stirling. The Effect of Document Ranking on Retrieval System Performance: A Search for an Optimal Ranking Rule. PhD thesis, UC, Berkeley, 1977.
[Carbonell and Goldstein(1998)] Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998.
[Chen and Karger(2006)] Harr Chen and David R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, 2006.
[Markowitz(1952)] H. Markowitz. Portfolio selection. Journal of Finance, 1952.
[Järvelin and Kekäläinen(2002)] Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 2002.