On the benefits of output sparsity for multi-label classification
Evgenii Chzhen (http://echzhen.com)
Université Paris-Est, Télécom ParisTech
Joint work with: Christoph Denis, Mohamed Hebiri, Joseph Salmon
1 / 13
2 / 13
3 / 13
- Observations: X_i ∈ ℝ^D,
- Label vectors are binary: Y_i = (Y_i^1, …, Y_i^L)^⊤ ∈ {0, 1}^L,
- N, L, D are huge, and probably N ≪ L,
- Y_i contains at most K ones (active labels), with K ≪ L.
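To make the setting concrete, here is a toy sketch (the sizes and the random generator are illustrative, not from the talk) of a dataset whose label vectors have at most K active labels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the talk): at most K active labels, K << L.
N, D, L, K = 1000, 50, 500, 3

X = rng.standard_normal((N, D))       # observations X_i in R^D
Y = np.zeros((N, L), dtype=np.int8)   # binary label vectors in {0, 1}^L
for i in range(N):
    k_i = rng.integers(1, K + 1)      # between 1 and K active labels
    Y[i, rng.choice(L, size=k_i, replace=False)] = 1

# Output sparsity: every row has at most K ones.
assert int(Y.sum(axis=1).max()) <= K
```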
4 / 13
5 / 13
[Figure: two classifiers' error counts with the same total number of mistakes]
- Same number of mistakes, but of a different type,
- Which one is better for a user?
For the Hamming loss of a prediction Ŷ:

  ∑_{l=1}^L 1{Ŷ^l ≠ Y^l} = ∑_{l=1}^L ( 1{Ŷ^l = 0, Y^l = 1} + 1{Ŷ^l = 1, Y^l = 0} )
- The Hamming loss does not know anything about the sparsity K,
- But the Hamming loss is separable, hence easy to optimize.
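The label-wise decomposition of the Hamming loss can be checked numerically; a minimal sketch, with an illustrative y and ŷ of my own choosing:

```python
import numpy as np

def hamming_loss(y_hat: np.ndarray, y: np.ndarray) -> int:
    """Number of label-wise disagreements between prediction and truth."""
    return int(np.sum(y_hat != y))

y     = np.array([1, 1, 0, 0, 0, 0])  # K = 2 active labels out of L = 6
y_hat = np.array([0, 1, 1, 0, 0, 0])  # a prediction with one error of each type

false_neg = int(np.sum((y_hat == 0) & (y == 1)))  # missed active labels
false_pos = int(np.sum((y_hat == 1) & (y == 0)))  # spurious active labels

# The label-by-label decomposition of the Hamming loss:
assert hamming_loss(y_hat, y) == false_neg + false_pos
```

Note that the all-zeros prediction incurs a loss of only K, which is negligible when K ≪ L: the Hamming loss by itself never sees the sparsity level.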
6 / 13
7 / 13
A weighted variant of the Hamming loss:

  ∑_{l=1}^L ( p1 · 1{Ŷ^l = 0, Y^l = 1} + (1 − p1) · 1{Ŷ^l = 1, Y^l = 0} ),   with p0 = 1 − p1.
- Hamming loss: p0 = p1 = 0.5,
- [Jain et al., 2016]: p0 = 0 and p1 = 1,
- Our choice: p0 = 2K/L and p1 = 1 − p0.
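For a loss of the form p1 · 1{Ŷ^l = 0, Y^l = 1} + p0 · 1{Ŷ^l = 1, Y^l = 0} with p0 + p1 = 1, the cost-sensitive Bayes rule predicts label l as active whenever η_l(x) = P(Y^l = 1 | x) ≥ p0, so the three weight choices amount to thresholding at 0.5, 0, and 2K/L respectively. A hypothetical sketch (the function names and the toy η are mine, not from the talk):

```python
import numpy as np

def weighted_hamming(y_hat, y, p0):
    """(1 - p0) on missed active labels, p0 on spurious ones, summed over labels."""
    fn = int(np.sum((y_hat == 0) & (y == 1)))
    fp = int(np.sum((y_hat == 1) & (y == 0)))
    return (1.0 - p0) * fn + p0 * fp

def plugin_predict(eta, p0):
    """Declare label l active iff its estimated probability eta_l reaches p0."""
    return (eta >= p0).astype(int)

L, K = 20, 2
eta = np.linspace(0.9, 0.0, L)  # toy estimates of P(Y^l = 1 | x), sorted

n_hamming = int(plugin_predict(eta, 0.5).sum())     # Hamming threshold 0.5
n_ours = int(plugin_predict(eta, 2 * K / L).sum())  # threshold 2K/L = 0.2
assert n_ours > n_hamming  # the smaller threshold activates more labels
```

With these toy probabilities, the Hamming threshold keeps 9 active labels while the 2K/L threshold keeps 15.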
8 / 13
Example:
- True labels: Y = (1, …, 1, 0, …, 0)^⊤, with K ones followed by L − K zeros,
- [The slide compares several candidate predictions Ŷ; one splits its coordinates as 2K versus L − 2K],
- Do not forget that K ≪ L.
9 / 13
10 / 13
- When K ≪ L, we output MORE active labels,
- Hence, better Recall and worse Precision,
- When K > 10, our setting is violated.
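The Recall/Precision trade-off can be checked on a toy example (the vectors below are illustrative, not from the experiments): a lower threshold predicts a superset of labels, which here raises Recall and lowers Precision.

```python
import numpy as np

def precision_recall(y_hat, y):
    """Multi-label precision and recall for a single example."""
    tp = int(np.sum((y_hat == 1) & (y == 1)))
    return tp / max(int(y_hat.sum()), 1), tp / max(int(y.sum()), 1)

y         = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # K = 3 active labels
y_strict  = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])  # high threshold: 2 labels
y_liberal = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # low threshold: 5 labels

p_s, r_s = precision_recall(y_strict, y)   # precision 1.0, recall 2/3
p_l, r_l = precision_recall(y_liberal, y)  # precision 0.6, recall 1.0
assert r_l > r_s and p_l < p_s             # better Recall, worse Precision
```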
11 / 13
- For sparse datasets, errors of 0/1-type are not the same for a user;
- Use our framework if you agree with the previous idea;
- We do not introduce a new algorithm per se, but rather construct a general framework;
- We provide a theoretical justification for our framework.
12 / 13
13 / 13