Pattern-based Solutions to Limitations of Leading Word Embeddings
Roy Schwartz
University of Washington NLP Seminar, February 8th, 2016 Joint work with Roi Reichart and Ari Rappoport
(Schwartz, Reichart & Rappoport, in review)
(Schwartz, Reichart & Rappoport, CoNLL 2015)
(Rubinstein, Levi, Schwartz & Rappoport, ACL 2015)
Acquired Symmetric Patterns (Schwartz, Reichart & Rappoport, COLING 2014)
EMNLP 2013)
COLING 2012)
Parsing Evaluation (Schwartz, Abend, Reichart & Rappoport, ACL 2011)
3 Pattern-based Solutions to Limitations of Leading Word Embeddings @ Roy Schwartz
– Words that occur in similar contexts are likely to have similar meanings
– Without taking into account order or directionality
– Example: “John is a good friend of Mary” becomes the unordered bag {friend, is, good, a, Mary, John}
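As a rough illustration of what ignoring order means, the sketch below counts co-occurrences over a whole sentence and shows that a shuffled sentence yields exactly the same counts. The function name `bow_cooccurrences` and the sentence-wide window are assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter

def bow_cooccurrences(tokens, window):
    """Count (word, context) pairs within a symmetric window, ignoring order."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

original = "John is a good friend of Mary".split()
shuffled = "friend is good a Mary of John".split()

# With a window covering the whole sentence, word order has no effect.
counts = bow_cooccurrences(original, window=len(original))
counts_shuffled = bow_cooccurrences(shuffled, window=len(shuffled))
print(counts == counts_shuffled)
```

With narrower windows the counts do depend on position, but still not on which side of the target word the context appears.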
– Sentiment Analysis (Maas et al., ACL 2011; Socher et al., EMNLP 2013)
– Parsing (Socher et al., EMNLP 2012; Lazaridou et al., EMNLP 2013)
– cup/coffee vs. cup/glass
– dog/leash vs. dog/cat
– car/wheel vs. car/train
(Schwartz et al., CoNLL 2015)
– good/great vs. good/bad
– big/large vs. big/small
– dog/animal, flu/fever
– cat, dog and elephant belong to the same class (animals)
– bananas, the sun and school buses share the same color (yellow)
[Figure: classification F1-score by word embedding model (word2vec, GloVe, DM, dep. w2v)]
– Significantly less than nouns
– Very few verb datasets
similarity, as compared to noun similarity (Schwartz et al., CoNLL 2015; Schwartz et al., in review)
Model                                      Verbs   Nouns
GloVe (Pennington et al., 2014)            0.163   0.377
word2vec skip-gram (Mikolov et al., 2013)  0.307   0.501
Similarity, dissimilarity, hyponymy, entailment …
Bananas and school buses are yellow, elephants and mountains are large
– “X and Y”
– “X is a Y”
– “wow, what a great X!”
– Used “X such as Y” to detect hyponyms (“animals such as dogs”)
– This method is still considered one of the most efficient ways of extracting hyponyms
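A minimal sketch of this kind of pattern matching, assuming a plain regular expression over raw text. The regex and the helper `hyponym_candidates` are illustrative only and are not the original extraction procedure, which works on noun phrases rather than single tokens.

```python
import re

# "X such as Y": X is the candidate hypernym, Y the candidate hyponym.
SUCH_AS = re.compile(r"\b(\w+) such as (\w+)\b")

def hyponym_candidates(text):
    """Return (hyponym, hypernym) candidates found by the 'X such as Y' pattern."""
    return [(y, x) for x, y in SUCH_AS.findall(text)]

pairs = hyponym_candidates("She likes animals such as dogs, and fruits such as apples.")
print(pairs)
```

Single-token matching keeps the sketch short; it will miss multi-word terms and does no plural normalization.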
grained relations:
– Antonyms (opposite meaning, Lin et al., 2003)
– General verb relations (happens-before, stronger-than, Chklovski and Pantel, 2004)
from different domains
– Entertainment: stars-in-film (Etzioni et al., Artificial Intelligence 2005)
– Geography: capital-of, river-in (Davidov, Rappoport & Koppel, ACL 2007)
– Technology: accessory-of (Davidov & Rappoport, ACL 2008)
semantic role
– John and Mary went to school
– Is it better to walk or run?
– Jane is smart as well as funny
– Lexical acquisition (Widdows & Dorow, COLING 2002)
– Semantic clustering (Davidov & Rappoport, ACL 2006)
– Construction of connotative lexicon (Feng et al., ACL 2013)
– Minimally supervised word classification (Schwartz et al., COLING 2014)
– Related words are not necessarily similar (cow/milk)
– Word embeddings (based on bag-of-words context) fail to make this distinction
Type     Example        #instances (Bag-of-words)  #instances (Symmetric Patterns)
similar  (car,train)    2418                       145
similar  (coffee,tea)   6324                       1857
similar  (dog,cat)      3645                       2090
related  (car,wheel)    333                        3
related  (coffee,cup)   7247                       6
related  (dog,walking)  2837                       4
Schwartz, Reichart and Rappoport, CoNLL 2015
Two count vectors per word:
– $\vec{dog} = (\ldots, \text{count}(dog, w_i), \ldots)$ (bag-of-words)
– $\vec{dog}^{SP} = (\ldots, \text{symmetric-pattern\_count}(dog, w_i), \ldots)$ (symmetric patterns)
– dog/cat (similar): count(dog,cat) positive; symmetric-pattern_count(dog,cat) positive
– dog/leash (related): count(dog,leash) positive; symmetric-pattern_count(dog,leash) small/zero
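The contrast between bag-of-words counts and symmetric-pattern counts can be sketched on a toy corpus. Everything here is assumed for illustration: the three sentences, the two hand-written patterns ("X and Y", "X or Y"), and the function names. The talk's models use patterns acquired automatically from a large corpus.

```python
import re
from collections import Counter

# Toy corpus and pattern list: assumptions for this example only.
CORPUS = [
    "the dog and cat slept",
    "a cat or dog is a good pet",
    "the dog pulled on the leash",
]
SYMMETRIC_PATTERNS = [r"\b(\w+) and (\w+)\b", r"\b(\w+) or (\w+)\b"]

def bow_counts(sentences):
    """count(w, c): c occurs anywhere in the same sentence as w."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, w in enumerate(tokens):
            for j, c in enumerate(tokens):
                if i != j:
                    counts[(w, c)] += 1
    return counts

def sp_counts(sentences):
    """symmetric-pattern_count(w, c): w and c fill the two slots of a pattern."""
    counts = Counter()
    for sentence in sentences:
        for pattern in SYMMETRIC_PATTERNS:
            for x, y in re.findall(pattern, sentence):
                counts[(x, y)] += 1
                counts[(y, x)] += 1  # symmetric: both directions count
    return counts

bow, sp = bow_counts(CORPUS), sp_counts(CORPUS)
for other in ("cat", "leash"):
    print(other, "bow:", bow[("dog", other)], "sp:", sp[("dog", other)])
```

In this toy setting, dog/leash co-occur in a sentence but never inside a symmetric pattern, so only the bag-of-words count is nonzero.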
– Word embeddings fail to distinguish between similar and opposite pairs of words (good/great vs. good/bad)
– Antonym patterns = { “either X or Y”, “from X to Y” }
– either big or small, from poverty to richness
#instances; "-" marks a cell that is empty on the slide
Example      Type     Bag-of-words  Symmetric Patterns  Antonym Patterns
(bad,dream)  related  1208          -                   -
(bad,evil)   similar  561           114                 -
(bad,good)   -        23532         806                 80
antonym pairs
β is tuned using a development set
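$\vec{w}^{SP+} = \vec{w}^{SP} - \beta \cdot \vec{w}^{AP}$
– $\vec{w}^{AP}$: antonym-pattern vector of w
– $\vec{w}^{SP}$: symmetric-pattern vector of w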
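A small sketch of how subtracting a weighted antonym-pattern vector could lower the score of an antonym relative to a similar word. The raw counts for "bad" are the ones reported in the #instances table; the value beta = 10.0 and the use of raw counts (rather than normalized weights) are assumptions for illustration, since β is tuned on a development set.

```python
def sp_plus(sp_vec, ap_vec, beta):
    """Return sp_vec - beta * ap_vec over the union of context words."""
    keys = set(sp_vec) | set(ap_vec)
    return {k: sp_vec.get(k, 0.0) - beta * ap_vec.get(k, 0.0) for k in keys}

# Counts for the word "bad"; contexts missing from a dict count as zero.
sp_bad = {"evil": 114.0, "good": 806.0}
ap_bad = {"good": 80.0}

beta = 10.0  # hypothetical value, chosen only for this example
sp_plus_bad = sp_plus(sp_bad, ap_bad, beta)
print(sp_plus_bad)
```

Before the subtraction "good" has a much larger entry than "evil"; afterwards the order is reversed, which is the intended effect.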
Entries for the word bad:
– bad/dream (related): symmetric-pattern_count(bad,dream) small/zero; antonym-pattern_count(bad,dream) small/zero
– bad/evil (similar): symmetric-pattern_count(bad,evil) positive; antonym-pattern_count(bad,evil) small/zero
– bad/good (antonyms): symmetric-pattern_count(bad,good) positive; antonym-pattern_count(bad,good) positive
– Set of symmetric pattern types is extracted from plain text using the (Davidov & Rappoport, 2006) algorithm
– Positive Point-wise Mutual Information (PPMI) normalization
– Personalized PageRank-like smoothing
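For the PPMI step, a minimal sketch using the standard definition PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))), with probabilities estimated from pair counts. The function name and the toy counts are assumptions; the pattern extraction and smoothing steps are not shown.

```python
import math
from collections import Counter

def ppmi(pair_counts):
    """Map each (word, context) pair to max(0, PMI) estimated from raw counts."""
    total = sum(pair_counts.values())
    word_totals, context_totals = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        word_totals[w] += n
        context_totals[c] += n
    scores = {}
    for (w, c), n in pair_counts.items():
        # P(w,c) / (P(w) * P(c)) simplifies to n * total / (count(w) * count(c)).
        pmi = math.log(n * total / (word_totals[w] * context_totals[c]))
        scores[(w, c)] = max(0.0, pmi)
    return scores

# Toy counts, invented for this example.
toy_counts = {("dog", "cat"): 8, ("dog", "the"): 2, ("car", "the"): 8, ("car", "cat"): 2}
scores = ppmi(toy_counts)
print(scores)
```

Pairs that co-occur less often than chance would predict get a score of zero rather than a negative value.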
– SimLex999 dataset (Hill et al., 2014)
– Compute a ranking based on the SP+ model’s prediction of the degree
– Compare this ranking to the one generated by human judgments
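The ranking comparison is reported as Spearman's ρ. Below is a standard-library sketch that computes it as the Pearson correlation of average ranks. The four score pairs are made up for the example and are not SimLex999 data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical human judgments and model similarities for four word pairs.
human = [9.0, 7.5, 2.0, 1.0]
model = [0.8, 0.9, 0.3, 0.1]
rho = spearman(human, model)
print(round(rho, 3))
```

Only the relative order of the scores matters, so the model's similarity values do not need to be on the same scale as the human judgments.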
Model                                      Spearman’s ρ
GloVe (Pennington et al., 2014)            0.35
PPMI-Bag-of-words                          0.423
word2vec CBOW (Mikolov et al., 2013)       0.43
word2vec Dep (Levy and Goldberg, 2014)     0.436
NNSE (Murphy et al., 2012)                 0.455
word2vec skip-gram (Mikolov et al., 2013)  0.462
SP+ (Schwartz et al., 2015)                0.517
Joint                                      0.563
[Equation: the Joint score of a word pair (w_i, w_j) combines its SP score and its skip-gram score]
Model                                      Verbs   Nouns   Adjectives
GloVe (Pennington et al., 2014)            0.163   0.377   0.571
PPMI-Bag-of-words                          0.276   0.451   0.548
word2vec CBOW (Mikolov et al., 2013)       0.252   0.48    0.579
word2vec Dep (Levy and Goldberg, 2014)     0.376   0.449   0.54
NNSE (Murphy et al., 2012)                 0.318   0.487   0.594
word2vec skip-gram (Mikolov et al., 2013)  0.307   0.501   0.604
SP+ (Schwartz et al., 2015)                0.578   0.497   0.663
general word embeddings
– They capture similarity rather than relatedness
– They distinguish between similar and opposite pairs
– They capture verb similarity
– 5.5% improvement over six leading models
– 10% improvement with a joint model
– 20% improvement on verbs
Constraints (Liu et al.)
(Pham et al.)
Lexemes (Rothe and Schutze)
ACL 2015 Papers
similarity scores are particularly low
performance of this model
Model                                      Verbs   Nouns
word2vec skip-gram (Mikolov et al., 2013)  0.307   0.501
SP+ (Schwartz et al., 2015)                0.578   0.497
time with a different type of context
– Bag-of-words contexts (Mikolov et al., 2013)
– Dependency contexts (Levy & Goldberg, 2014)
– Symmetric pattern contexts (Schwartz et al., 2015)
Model      Context Type        Spearman’s ρ
skip-gram  Bag-of-Words        0.307
skip-gram  Dependency Links    0.386
skip-gram  Symmetric Patterns  0.459
Model      Context Type        Verbs   #Contexts  Train Time (Mins)
skip-gram  Bag-of-Words        0.307   13000M     320
skip-gram  Dependency Links    0.386   14500M     551
skip-gram  Symmetric Patterns  0.459   270M       11
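As a quick check of the relative cost, the ratios between the symmetric-pattern row and the other two rows can be computed directly. The dictionary layout and variable names are assumptions for this sketch; the numbers are those given in the table.

```python
# (train time in minutes, number of contexts in millions), as listed in the table.
rows = {
    "Bag-of-Words": (320, 13000),
    "Dependency Links": (551, 14500),
    "Symmetric Patterns": (11, 270),
}

sp_minutes, sp_contexts = rows["Symmetric Patterns"]
time_ratios = {}
context_ratios = {}
for name, (minutes, contexts) in rows.items():
    if name == "Symmetric Patterns":
        continue
    time_ratios[name] = sp_minutes / minutes
    context_ratios[name] = sp_contexts / contexts
    print(f"{name}: time {time_ratios[name]:.1%}, contexts {context_ratios[name]:.1%}")
```

Both the training time and the number of contexts for symmetric patterns come out at a few percent of the corresponding values for the other two context types.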
Model                          Context Type        Verbs
skip-gram                      Bag-of-Words        0.307
skip-gram                      Dependency Links    0.386
skip-gram                      Symmetric Patterns  0.459
SP+ (Schwartz et al., 2015)    Symmetric Patterns  0.578
SP-NW (Schwartz et al., 2015)  Symmetric Patterns  0.441
[Slide annotations on the table: +~15%, +~15%, +~27%]
embeddings
– 15-27% improvement on a verb similarity task
– Training with pattern contexts takes ~2-3% of the training time with
– now or never > *never or now
– Text summarization
– Text search
– News feed
– Dialogue systems
– Essay scoring
– Detection of sarcasm/humor
– …