Selective Sampling for Information Extraction with a Committee of Classifiers
Evaluating Machine Learning for Information Extraction, Track 2
Ben Hachey, Markus Becker, Claire Grover & Ewan Klein University of Edinburgh
Overview
13/04/2005 Selective Sampling for IE with a Committee of Classifiers 2
Usefulness ≈ uncertainty of a single learner
– Confidence: label examples for which the classifier is least confident
– Entropy: label examples for which the classifier's output distribution has the highest entropy
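As a minimal sketch of the two single-learner criteria, the Python below scores each unlabelled example from one probability distribution over labels and selects the k highest-scoring examples. Treating an example as a single distribution is a simplifying assumption (for a sentence, per-token scores would first have to be aggregated), and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty as 1 minus the probability of the classifier's top label."""
    return 1.0 - max(probs)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the output distribution; p = 0 terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(
    pool: Sequence[Sequence[float]],
    score: Callable[[Sequence[float]], float],
    k: int,
) -> List[int]:
    """Indices of the k pool examples with the highest uncertainty score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        [0.98, 0.01, 0.01],  # confident prediction
        [0.40, 0.35, 0.25],  # nearly flat: most uncertain
        [0.70, 0.20, 0.10],
    ]
    print(select_most_uncertain(pool, least_confidence, 1))  # [1]
    print(select_most_uncertain(pool, entropy, 2))           # [1, 2]
```

Both criteria agree on the flattest distribution here; they can diverge when probability mass is spread unevenly over many labels, since least confidence only looks at the top label.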
Usefulness ≈ disagreement of a committee of learners
– Vote entropy: disagreement between the members' winning labels
– KL-divergence: distance between the members' class output distributions
– F-score: distance between the members' tag structures
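Vote entropy needs only each committee member's winning label. A minimal Python sketch, assuming one vote per member per token and averaging over the tokens of a sentence; the averaging step and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Entropy of the committee's votes for one decision.

    `votes` holds each member's winning (argmax) label. The result is 0.0
    when all members agree and log(len(votes)) when every member differs.
    """
    k = len(votes)
    return sum(-(c / k) * math.log(c / k) for c in Counter(votes).values())


def sentence_vote_entropy(token_votes: Sequence[Sequence[Hashable]]) -> float:
    """Average per-token vote entropy over a sentence (hypothetical aggregation)."""
    return sum(vote_entropy(v) for v in token_votes) / len(token_votes)


if __name__ == "__main__":
    # Two members tag a three-token sentence; they disagree only on the last token.
    votes = [["B-protein", "B-protein"], ["O", "O"], ["B-DNA", "B-RNA"]]
    print(sentence_vote_entropy(votes))  # log(2) / 3, about 0.231
```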
– Bagging or randomly perturbing event counts, random feature subspaces (Abe and Mamitsuka, 1998; Argamon-Engelson and Dagan, 1999; Chawla, 2005)
– Hand-crafted feature split (Osborne & Baldridge, 2004)
(Klein et al., 2003; Finkel et al., 2005)
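Two of these committee-creation strategies can be illustrated with a small Python sketch: bagging (each member trains on a bootstrap resample of the labelled sentences) and random feature subspaces (each member sees a random subset of feature templates). Training itself is left out, and `committee_views`, the `fraction` parameter and the seeding scheme are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Bagging: draw len(data) items uniformly at random, with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def random_feature_subspace(
    templates: Sequence[str], fraction: float, rng: random.Random
) -> List[str]:
    """Random subspace: keep a random subset of feature templates, without replacement."""
    n = max(1, round(fraction * len(templates)))
    return rng.sample(list(templates), n)


def committee_views(
    data: Sequence[T],
    templates: Sequence[str],
    k: int,
    fraction: float = 0.5,
    seed: int = 0,
) -> List[Tuple[List[T], List[str]]]:
    """One (training sample, feature subset) pair per committee member."""
    rng = random.Random(seed)
    return [
        (bootstrap_sample(data, rng), random_feature_subspace(templates, fraction, rng))
        for _ in range(k)
    ]


if __name__ == "__main__":
    sentences = [f"sent{i}" for i in range(6)]
    templates = ["word", "shape", "pos", "prev_ne"]
    for sample, feats in committee_views(sentences, templates, k=2):
        print(sorted(feats), sample)
```

Drawing everything from one seeded `random.Random` instance keeps a committee reproducible across runs, which matters when comparing selection metrics.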
Feature Set 1: words, word shapes, document position
– Word features: wi, wi-1, wi+1; disjunction of 5 prev words; disjunction of 5 next words
– Word shape: shapei, shapei-1, shapei+1; shapei + shapei+1; shapei + shapei-1 + shapei+1
– Prev NE: NEi-1, NEi-2 + NEi-1; NEi-3 + NEi-2 + NEi-1
– Prev NE + word: NEi-1 + wi
– Prev NE + shape: NEi-1 + shapei; NEi-1 + shapei+1; NEi-1 + shapei-1 + shapei; NEi-2 + NEi-1 + shapei-2 + shapei-1 + shapei
– Position: document position
Feature Set 2: parts of speech, occurrence patterns of proper nouns
– TnT POS tags: POSi, POSi-1, POSi+1
– Prev NE: NEi-1, NEi-2 + NEi-1
– Prev NE + POS: NEi-1 + POSi-1 + POSi; NEi-2 + NEi-1 + POSi-2 + POSi-1 + POSi
– Occurrence patterns: capture multiple references to NEs
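To make the feature split concrete, the following Python sketch builds two feature views for token i: a subset of Feature Set 1 (words, shapes, previous NE tags) and a subset of Feature Set 2 (POS tags, previous NE tags); as in the feature sets, only the NE-history features are shared. The shape function (uppercase to X, lowercase to x, digit to d, repeated shape characters collapsed) is one common variant and an assumption, as are the feature-string format and the `<PAD>` boundary symbol. Document position, word disjunctions and occurrence patterns are omitted.

```python
from typing import List, Sequence

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def at(seq: Sequence[str], j: int) -> str:
    """seq[j], or PAD when j falls outside the sequence."""
    return seq[j] if 0 <= j < len(seq) else PAD


def word_shape(word: str) -> str:
    """Uppercase -> X, lowercase -> x, digit -> d, others kept; collapse repeats."""
    out: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def view1_features(words: Sequence[str], prev_tags: Sequence[str], i: int) -> List[str]:
    """Subset of Feature Set 1: words, word shapes and previously assigned NE tags."""
    shapes = [word_shape(w) for w in words]
    ne1, ne2 = at(prev_tags, i - 1), at(prev_tags, i - 2)
    return [
        f"w0={at(words, i)}", f"w-1={at(words, i - 1)}", f"w+1={at(words, i + 1)}",
        f"s0={at(shapes, i)}", f"s-1={at(shapes, i - 1)}", f"s+1={at(shapes, i + 1)}",
        f"s0+s+1={at(shapes, i)}|{at(shapes, i + 1)}",
        f"ne-1={ne1}", f"ne-2+ne-1={ne2}|{ne1}",
        f"ne-1+w0={ne1}|{at(words, i)}", f"ne-1+s0={ne1}|{at(shapes, i)}",
    ]


def view2_features(pos: Sequence[str], prev_tags: Sequence[str], i: int) -> List[str]:
    """Subset of Feature Set 2: POS tags and previously assigned NE tags."""
    ne1, ne2 = at(prev_tags, i - 1), at(prev_tags, i - 2)
    return [
        f"p0={at(pos, i)}", f"p-1={at(pos, i - 1)}", f"p+1={at(pos, i + 1)}",
        f"ne-1={ne1}", f"ne-2+ne-1={ne2}|{ne1}",
        f"ne-1+p-1+p0={ne1}|{at(pos, i - 1)}|{at(pos, i)}",
    ]


if __name__ == "__main__":
    words = ["IL-2", "gene", "expression"]
    pos = ["NN", "NN", "NN"]
    prev_tags = ["B-DNA", "I-DNA"]  # tags already assigned left of token 2
    print(view1_features(words, prev_tags, 2))
    print(view2_features(pos, prev_tags, 2))
```

`prev_tags` holds the tags a left-to-right tagger has already assigned, which is why NE-history features can only look backwards.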
$D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}$
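The KL-divergence D(p || q) can be made concrete in a short Python sketch. It assumes each committee member yields a probability distribution over the same ordered label set for every token, and measures per-token disagreement as the mean KL-divergence of each member from the committee's average distribution; this is one common committee formulation, assumed here rather than taken from the slides. Sentence scores then take either the average or the maximum over tokens, one plausible reading of the AveKL and MaxKL metrics. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Dist = Sequence[float]  # probabilities over one fixed, ordered label set


def kl_divergence(p: Dist, q: Dist) -> float:
    """D(p || q) = sum_x p(x) log(p(x) / q(x)); requires q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)


def kl_to_mean(member_dists: Sequence[Dist]) -> float:
    """Per-token disagreement: mean KL of each member from the committee average."""
    k = len(member_dists)
    mean = [sum(col) / k for col in zip(*member_dists)]
    # mean[x] >= p(x) / k for every member p, so q(x) > 0 wherever p(x) > 0.
    return sum(kl_divergence(p, mean) for p in member_dists) / k


def sentence_kl_scores(token_dists: Sequence[Sequence[Dist]]) -> Tuple[float, float]:
    """(average, maximum) per-token disagreement over a sentence.

    token_dists[t] lists every member's distribution for token t.
    """
    per_token = [kl_to_mean(members) for members in token_dists]
    return sum(per_token) / len(per_token), max(per_token)


if __name__ == "__main__":
    sentence = [
        [[0.9, 0.1], [0.9, 0.1]],  # members agree
        [[1.0, 0.0], [0.0, 1.0]],  # members disagree completely
    ]
    ave, mx = sentence_kl_scores(sentence)
    print(round(ave, 4), round(mx, 4))  # 0.3466 0.6931
```

Comparing each member against the mean keeps the score finite even when members assign zero probability to different labels, which a direct KL between two members would not.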
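– Maximum per-token KL-divergence
F-complement (Ngai & Yarowsky, 2000)
– Structural comparison between analyses
– Pairwise f-score between phrase assignments: $f_{\text{comp}} = 1 - F_1(A_1, A_2)$, where $A_1$ and $A_2$ are the phrase assignments of the two committee members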
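The F-complement can be sketched in Python for a two-member committee, assuming each member's output is a BIO tag sequence that is first converted into typed phrase spans. Exact span-and-type matching, the BIO encoding and all function names are assumptions made for this illustration.

```python
from typing import Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed phrase spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label = ""
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a final span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # the open span continues
        if start is not None:
            spans.add((start, i, label))
            start, label = None, ""
        if tag != "O":  # B-X, or an I-X that does not continue the open span
            start, label = i, tag[2:]
    return spans


def pairwise_f1(a: Set[Span], b: Set[Span]) -> float:
    """Balanced F-score between two phrase assignments (symmetric in a and b)."""
    if not a and not b:
        return 1.0  # both members found nothing: full agreement
    return 2.0 * len(a & b) / (len(a) + len(b))


def f_complement(a: Set[Span], b: Set[Span]) -> float:
    """Disagreement of two committee members: 1 - F(a, b)."""
    return 1.0 - pairwise_f1(a, b)


if __name__ == "__main__":
    member1 = ["B-protein", "I-protein", "O", "B-DNA"]
    member2 = ["B-protein", "O", "O", "B-DNA"]
    print(f_complement(bio_to_spans(member1), bio_to_spans(member2)))  # 0.5
```

Because F-score with precision o/|b| and recall o/|a| reduces to 2o / (|a| + |b|), the measure is symmetric, so it does not matter which member is treated as the reference.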
(Kim et al., 2003)
Entity types: DNA, RNA, cell line, cell type, protein
Cost: 10.0/19.3/26.7 Error: 1.6/4.9/4.9
Cost: 14.5/23.5/16.8 Error: 1.8/4.9/2.6
Cost: 28.7/12.1/11.4 Error: 5.3/2.4/1.9
Sentence length
Expect shorter per-sentence annotation times.
Expect more positive examples for learning.
Metric   Tokens       Entities    Ent/Tok
Random   26.7 (0.8)   2.8 (0.1)   10.5 %
F-comp   25.8 (2.4)   2.2 (0.7)    8.5 %
MaxKL    30.9 (1.5)   3.3 (0.2)   10.7 %
AveKL    27.1 (1.8)   3.3 (0.2)   12.2 %
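The Tokens, Entities and Ent/Tok figures can be computed for a batch of selected sentences roughly as sketched below, assuming BIO-tagged sentences in which each B- tag starts one entity. The function names are hypothetical, and the parenthesised figures are not modelled.

```python
from typing import Sequence, Tuple


def count_entities(tags: Sequence[str]) -> int:
    """Number of entity mentions in a BIO-tagged sentence (one per B- tag)."""
    return sum(1 for t in tags if t.startswith("B-"))


def batch_profile(batch: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Mean tokens per sentence, mean entities per sentence, entities per token in %."""
    n_tokens = sum(len(s) for s in batch)
    n_entities = sum(count_entities(s) for s in batch)
    return n_tokens / len(batch), n_entities / len(batch), 100.0 * n_entities / n_tokens


if __name__ == "__main__":
    batch = [
        ["B-protein", "I-protein", "O", "O"],
        ["O", "B-DNA", "O", "O", "O", "O"],
    ]
    print(batch_profile(batch))  # (5.0, 1.0, 20.0)
```

Ent/Tok is then Entities divided by Tokens, which the reported numbers bear out: for AveKL, 3.3 / 27.1 ≈ 12.2 %.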
[Confusion matrices for selective and random sampling over the tags cfhm, wscdt, wsndt, wssdt, wsdt, cfac, wslo, wsac, cfnm, wsnm, wshm and O]
AL for IE with a Committee of Classifiers:
disagreement amongst MEMM classifiers
– Classification framework: simplification of IE task
Suggestions:
estimates
– Comparison of methods will benefit from real cost information…
Edinburgh: Bea Alex, Markus Becker, Shipra Dingare, Rachel Dowsett, Claire Grover, Ben Hachey, Olivia Johnson, Ewan Klein, Yuval Krymolowski, Jochen Leidner, Bob Mann, Malvina Nissim, Bonnie Webber
Stanford: Chris Cox, Jenny Finkel, Chris Manning, Huy Nguyen, Jamie Nicolson
Round   Selective      Random
1       0.975/0.970    0.974/0.970
4       0.977/0.972    0.977/0.971
8       0.979/0.975    0.978/0.973
8 4 1 Round 0.663/0.684/0.690 0.619/0.643/0.643 0.568/0.594/0.593 Selective 0.004/0.001/0.018 0.564/0.593/0.588
0.623/0.648/0.647 0.015/0.015/0.013 0.648/0.669/0.676 ∆ Random