What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
Facebook AI Research Université Le Mans (LIUM) ACL 2018
*Courtesy: Thomas Wolf blogpost, Hugging Face
Professor Raymond J. Mooney
Shi et al. (EMNLP 2016) – Does string-based neural MT learn source syntax?
Adi et al. (ICLR 2017) – Fine-grained analysis of sentence embeddings using auxiliary prediction tasks
Probing task Sentence Encoder
MLP classifier input
Surface information
Adi et al. (ICLR 2017) – Fine-grained analysis of sentence embeddings using auxiliary prediction tasks
MLP classifier input
Syntactic information
Shi et al. (EMNLP 2016) – Does string-based neural MT learn source syntax?
MLP classifier input
Semantic information
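The probing setup on these slides can be sketched roughly as follows: a sentence encoder is kept frozen, and only a small MLP classifier is trained on its output vector to predict one linguistic property. This is a minimal sketch under stated assumptions, not the authors' implementation. `toy_encoder` is a hypothetical stand-in for a pre-trained encoder, `MLPProbe` and `probe_accuracy` are invented names, and the word-presence task and its sentences are made up for illustration.

```python
import math
import random

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def toy_encoder(sentence, dim=DIM):
    """Hypothetical stand-in for a frozen, pre-trained sentence encoder.

    Averages fixed pseudo-random word vectors; the probe never changes it.
    """
    words = sentence.split()
    total = [0.0] * dim
    for word in words:
        rng = random.Random(word)  # same word always maps to the same vector
        for i in range(dim):
            total[i] += rng.uniform(-1.0, 1.0)
    return [x / max(len(words), 1) for x in total]


class MLPProbe:
    """One-hidden-layer MLP for a binary probing task, trained with SGD."""

    def __init__(self, dim, hidden=6, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr=0.1):
        h, p = self.forward(x)
        dz = p - y  # gradient of the cross-entropy loss w.r.t. the logit
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz

    def predict(self, x):
        return int(self.forward(x)[1] >= 0.5)


def probe_accuracy(train, test, epochs=20):
    """Train the probe on frozen embeddings; return held-out accuracy."""
    probe = MLPProbe(DIM)
    train_vecs = [(toy_encoder(s), y) for s, y in train]  # encode once, never fine-tune
    for _ in range(epochs):
        for x, y in train_vecs:
            probe.train_step(x, y)
    hits = sum(probe.predict(toy_encoder(s)) == y for s, y in test)
    return hits / len(test)


# Made-up data for a word-presence task: label 1 if the sentence contains "cat".
TRAIN = [("the cat sleeps", 1), ("a dog barks", 0), ("my cat eats fish", 1),
         ("birds fly south", 0), ("that cat is loud", 1), ("the dog sleeps", 0)]
TEST = [("a cat barks", 1), ("the birds eat fish", 0)]
```

Calling `probe_accuracy(TRAIN, TEST)` returns the held-out accuracy as a fraction between 0 and 1. The design point is that gradients only ever reach the probe's weights, so the accuracy reflects what the fixed vector already encodes.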
Source and target examples for seq2seq training tasks
Sutskever et al. (NIPS 2014) – Sequence to sequence learning with neural networks
Kiros et al. (NIPS 2015) – SkipThought vectors
Vinyals et al. (NIPS 2015) – Grammar as a Foreign Language
[Bar chart: accuracy (%) on SentLen, WC, TopConst, BShift, ObjNum. Series: NB-uni-tfidf, NB-bi-tfidf, CBOW, Majority vote.]
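For reference, the majority-vote baseline named in the chart legend can be sketched as below: it always predicts the most frequent training label, so its accuracy is the floor that any informative embedding should beat. The function name and the labels are made up for illustration and are not taken from the talk.

```python
from collections import Counter


def majority_vote_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(label == majority_label for label in test_labels)
    return hits / len(test_labels)


# Made-up labels for a roughly balanced binary task
train_labels = ["shifted", "original", "shifted", "original", "shifted"]
test_labels = ["shifted", "original", "original", "shifted"]
print(majority_vote_accuracy(train_labels, test_labels))  # 0.5
```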
[Bar chart: accuracy (%) on SentLen, WC, TopConst, BShift, ObjNum. Series: CBOW, AutoEncoder, NMT En-Fr, NMT En-Fi, Seq2Tree, SkipThought, NLI.]
[Bar chart: accuracy (%) on SentLen, WC, TopConst, BShift, ObjNum, CoordInv. Series: BiLSTM-max, BiLSTM-last, GatedConvNet.]
Correlation between probing and downstream tasks
Blue = higher, Red = lower, Grey = not significant
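One way to fill a single cell of such a correlation matrix is sketched below: take one probing accuracy and one downstream score per encoder, then correlate the two lists. The slide does not say which coefficient is plotted, so the use of Spearman rank correlation here is an assumption, and the significance test behind the grey cells is not covered. All function names are illustrative.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

With one entry per encoder, `spearman(probing_scores, downstream_scores)` gives a value in [-1, 1]; the two list names are placeholders for whatever scores are being compared.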
MLP classifier