
SLIDE 1

Mimicking Word Embeddings using Subword RNNs

Yuval Pinter, Robert Guthrie, Jacob Eisenstein

@yuvalpi

Presented at EMNLP September 2017, Copenhagen

SLIDE 2

The Word Embedding Pipeline

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Wikipedia GigaWord Reddit ... 2

SLIDE 3

Embedding model (vectors) W2V GloVe Polyglot FastText ...

The Word Embedding Pipeline

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Wikipedia GigaWord Reddit ... 2

SLIDE 4

Supervised task corpus Supervised task corpus

Penn TreeBank SemEval OntoNotes

Univ. Dependencies

... Embedding model (vectors) W2V GloVe Polyglot FastText ...

The Word Embedding Pipeline

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Wikipedia GigaWord Reddit ... 2

SLIDE 5

Supervised task corpus Supervised task corpus

Penn TreeBank SemEval OntoNotes

Univ. Dependencies

... Embedding model (vectors) W2V GloVe Polyglot FastText ...

The Word Embedding Pipeline

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Wikipedia GigaWord Reddit ... Tagging Parsing Sentiment NER ... Task model 2

SLIDE 6

All possible text Unlabeled text

Assumed Pattern

Supervised task text 3

SLIDE 7

All possible text Unlabeled text

Actual Pattern

Supervised task text 4

SLIDE 9

All possible text Unlabeled text

Actual Pattern

Supervised task text

No pre-trained vectors

5

SLIDE 10

All possible text Unlabeled text

Actual Pattern

Supervised task text

No pre-trained vectors
Affects supervised tasks

5

SLIDE 11

All possible text Unlabeled text

Actual Pattern

Supervised task text

No pre-trained vectors
Affects supervised tasks
Multiple treatments suggested

5

SLIDE 12

All possible text Unlabeled text

Actual Pattern

Supervised task text

No pre-trained vectors
Affects supervised tasks
Multiple treatments suggested
Our method - compositional subword OOV model

5

SLIDE 13

Sources of OOVs

6

SLIDE 14

Sources of OOVs

Names

Chalabi has increasingly marginalized within Iraq, ...

6

SLIDE 15

Sources of OOVs

Names
Domain-specific jargon

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.

6

SLIDE 16

Sources of OOVs

Names
Domain-specific jargon
Foreign words

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.
This term was first used in German (Hochrenaissance), …

6

SLIDE 17

Sources of OOVs

Names
Domain-specific jargon
Foreign words
Rare morphological derivations

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.
This term was first used in German (Hochrenaissance), …
Without George Martin the Beatles would have been just another untalented band as Oasis.

6

SLIDE 18

Sources of OOVs

Names
Domain-specific jargon
Foreign words
Rare morphological derivations
Nonce words

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.
This term was first used in German (Hochrenaissance), …
Without George Martin the Beatles would have been just another untalented band as Oasis.
What if Google morphed into GoogleOS?

6

SLIDE 19

Sources of OOVs

Names
Domain-specific jargon
Foreign words
Rare morphological derivations
Nonce words
Nonstandard orthography

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.
This term was first used in German (Hochrenaissance), …
Without George Martin the Beatles would have been just another untalented band as Oasis.
What if Google morphed into GoogleOS?
We’ll have four bands, and Big D is cookin’. lots of fun and great prizes.

6

SLIDE 20

Sources of OOVs

Names
Domain-specific jargon
Foreign words
Rare morphological derivations
Nonce words
Nonstandard orthography
Typos and other errors

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.
This term was first used in German (Hochrenaissance), …
Without George Martin the Beatles would have been just another untalented band as Oasis.
What if Google morphed into GoogleOS?
We’ll have four bands, and Big D is cookin’. lots of fun and great prizes.
I dislike this urban society and I want to leave this whole enviroment.

6

SLIDE 21

Sources of OOVs

Names
Domain-specific jargon
Foreign words
Rare morphological derivations
Nonce words
Nonstandard orthography
Typos and other errors
…

Chalabi has increasingly marginalized within Iraq, ...
Important species (...) include shrimp, (...) and some varieties of flatfish.
This term was first used in German (Hochrenaissance), …
Without George Martin the Beatles would have been just another untalented band as Oasis.
What if Google morphed into GoogleOS?
We’ll have four bands, and Big D is cookin’. lots of fun and great prizes.
I dislike this urban society and I want to leave this whole enviroment.
???

6

SLIDE 22

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Common OOV handling techniques

None (random init)

OOV

7

SLIDE 23

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Common OOV handling techniques

None (random init)

OOV

??? 7

SLIDE 24

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Common OOV handling techniques

None (random init)
One UNK to rule them all

○ Average existing embeddings
○ Trained with embeddings (stochastic unking)

OOV UNK

8
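The two single-UNK strategies named on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the function names, the `<UNK>` symbol, and the replacement probability `alpha / (alpha + count)` are assumptions (the latter is one commonly used stochastic-unking scheme; the slide does not specify one).

```python
import random

def average_unk(table):
    """Build an UNK vector as the element-wise mean of all known vectors."""
    vectors = list(table.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def stochastic_unk(tokens, counts, rng, alpha=0.25, unk="<UNK>"):
    """Replace each training token by `unk` with probability alpha / (alpha + count).

    Rare words are replaced often, so the UNK embedding gets trained;
    frequent words are almost never replaced. (Assumed scheme.)
    """
    out = []
    for tok in tokens:
        p_unk = alpha / (alpha + counts.get(tok, 0))
        out.append(unk if rng.random() < p_unk else tok)
    return out

# Toy data, purely illustrative.
table = {"the": [1.0, 3.0], "is": [3.0, 5.0]}
unk_vector = average_unk(table)
noisy = stochastic_unk(["the", "flatfish"], {"the": 1000, "flatfish": 1},
                       random.Random(0))
```

Either way, every OOV word ends up sharing one vector, which is the limitation the following slides address.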

SLIDE 27

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

Common OOV handling techniques

None (random init)
One UNK to rule them all

○ Average existing embeddings
○ Trained with embeddings (stochastic unking)

Add subword model during WE training

○ Bhatia et al. (2016), Wieting et al. (2016)

OOV

9

SLIDE 28

Supervised task corpus

Common OOV handling techniques

None (random init)
One UNK to rule them all

○ Average existing embeddings
○ Trained with embeddings (stochastic unking)

Add subword model during WE training

○ Bhatia et al. (2016), Wieting et al. (2016)
○ What if we don’t have access to the original corpus? (e.g. FastText)

OOV

9

SLIDE 29

Char2Tag

OOV

10

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

SLIDE 30

Char2Tag

Add subword layer to supervised task

○ Ling et al. (2015), Plank et al. (2016)

OOV

10

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

SLIDE 31

Char2Tag

Add subword layer to supervised task

○ Ling et al. (2015), Plank et al. (2016)

OOVs benefit from co-trained character model

OOV

10

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

SLIDE 32

Char2Tag

Add subword layer to supervised task

○ Ling et al. (2015), Plank et al. (2016)

OOVs benefit from co-trained character model
Requires large supervised training set for efficient transfer to test set OOVs

OOV

10

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

SLIDE 33

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

OOV

Enter MIMICK

OOV

11

SLIDE 34

What data do we have, post-unlabeled corpus?

○ Vector dictionary
○ Orthography (the way words are spelled)

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

OOV

Enter MIMICK

OOV

11

SLIDE 36

What data do we have, post-unlabeled corpus?

○ Vector dictionary
○ Orthography (the way words are spelled)

Use the former as training objective, latter as input

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

OOV

Enter MIMICK

OOV

11

SLIDE 37

What data do we have, post-unlabeled corpus?

○ Vector dictionary
○ Orthography (the way words are spelled)

Use the former as training objective, latter as input
Pre-trained vectors as target

○ No need to access original unlabeled corpus
○ Many training examples
○ (No context)

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

OOV

Enter MIMICK

OOV

11

SLIDE 38

What data do we have, post-unlabeled corpus?

○ Vector dictionary
○ Orthography (the way words are spelled)

Use the former as training objective, latter as input
Pre-trained vectors as target

○ No need to access original unlabeled corpus
○ Many training examples
○ (No context)

Subword units as inputs

○ Very extensible
○ (Character inventory changes?)

Supervised task corpus

Unlabeled corpus Unlabeled corpus Unlabeled corpus Unlabeled corpus

OOV

Enter MIMICK

OOV

11

SLIDE 39

MIMICK Training

m a k e

Pre-trained Embedding (Polyglot/FastText/etc.)

make

All possible text Unlabeled text

make

12

SLIDE 40

Character embeddings

MIMICK Training

m a k e

Pre-trained Embedding (Polyglot/FastText/etc.)

make

All possible text Unlabeled text

make

12

SLIDE 41

Character embeddings

MIMICK Training

m a k e

Pre-trained Embedding (Polyglot/FastText/etc.)

make

All possible text Unlabeled text

make

Forward LSTM 12

SLIDE 42

Character embeddings

MIMICK Training

m a k e

Pre-trained Embedding (Polyglot/FastText/etc.)

make

All possible text Unlabeled text

make

Backward LSTM Forward LSTM 12

SLIDE 43

Character embeddings

MIMICK Training

m a k e

Pre-trained Embedding (Polyglot/FastText/etc.)

make

All possible text Unlabeled text

make

Backward LSTM Forward LSTM Multilayered Perceptron 12

SLIDE 44

Character embeddings

MIMICK Training

m a k e

Pre-trained Embedding (Polyglot/FastText/etc.)

make Loss (L2) Mimicked Embedding

All possible text Unlabeled text

make

Backward LSTM Forward LSTM Multilayered Perceptron 12
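The training diagram (character embeddings, forward and backward LSTM, multilayer perceptron, L2 loss against the pre-trained vector) can be sketched as a forward pass in plain Python. This is an illustrative sketch only: the class and function names, the tiny dimensions, the bias-free layers, the random initialization, and the reading of "Loss (L2)" as squared Euclidean distance are all assumptions, and the gradient updates that actual training needs are omitted.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LSTM:
    """Single-layer LSTM without bias terms (kept minimal on purpose)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.W = {gate: rand_matrix(hid_dim, in_dim + hid_dim, rng)
                  for gate in ("input", "forget", "output", "cell")}

    def final_state(self, inputs):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in inputs:
            xh = x + h  # list concatenation
            i = [sigmoid(z) for z in matvec(self.W["input"], xh)]
            f = [sigmoid(z) for z in matvec(self.W["forget"], xh)]
            o = [sigmoid(z) for z in matvec(self.W["output"], xh)]
            g = [math.tanh(z) for z in matvec(self.W["cell"], xh)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

class Mimick:
    """Characters -> BiLSTM final states -> MLP -> vector in embedding space."""
    def __init__(self, alphabet, char_dim, hid_dim, emb_dim, seed=0):
        rng = random.Random(seed)
        self.char_emb = {ch: [rng.uniform(-0.1, 0.1) for _ in range(char_dim)]
                         for ch in alphabet}
        self.unk_char = [0.0] * char_dim  # fallback for unseen characters
        self.fwd = LSTM(char_dim, hid_dim, rng)
        self.bwd = LSTM(char_dim, hid_dim, rng)
        self.W_hidden = rand_matrix(hid_dim, 2 * hid_dim, rng)
        self.W_out = rand_matrix(emb_dim, hid_dim, rng)

    def embed(self, word):
        chars = [self.char_emb.get(ch, self.unk_char) for ch in word]
        state = self.fwd.final_state(chars) + self.bwd.final_state(chars[::-1])
        hidden = [math.tanh(z) for z in matvec(self.W_hidden, state)]
        return matvec(self.W_out, hidden)

def squared_l2(predicted, target):
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# Toy stand-in for a pre-trained dictionary (Polyglot/FastText/etc.).
pretrained = {"make": [0.2, -0.1, 0.4]}
model = Mimick("abcdefghijklmnopqrstuvwxyz", char_dim=4, hid_dim=5, emb_dim=3)
loss = squared_l2(model.embed("make"), pretrained["make"])  # training signal
oov_vector = model.embed("blah")  # inference on an unseen word
```

Training would minimize `loss` over every word in the vector dictionary; at inference time the same `embed` call is applied to words that have no pre-trained vector.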

SLIDE 45

Character embeddings

MIMICK Inference

b l a h

Mimicked Embedding

All possible text Unlabeled text

blah

Backward LSTM Forward LSTM Multilayered Perceptron 13

SLIDE 46

Observation – Nearest Neighbors

14

SLIDE 47

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

14

SLIDE 48

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM

14

SLIDE 49

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly

14

SLIDE 50

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly
○ lawnmower → tradesman, bookmaker, postman, hairdresser

14

SLIDE 51

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly
○ lawnmower → tradesman, bookmaker, postman, hairdresser

Hebrew

14

SLIDE 52

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly
○ lawnmower → tradesman, bookmaker, postman, hairdresser

Hebrew

○רותפת→ גתת(she/you-3p.sg.) will come true (she/you-3p.sg.) will solve 14

SLIDE 53

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly
○ lawnmower → tradesman, bookmaker, postman, hairdresser

Hebrew

○רותפת→ גתת(she/you-3p.sg.) will come true (she/you-3p.sg.) will solve ○םיירט→ םיירטמואיג geometric (m.pl., nontrad. spelling) geometric (m.pl.) 14

SLIDE 54

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly
○ lawnmower → tradesman, bookmaker, postman, hairdresser

Hebrew

○רותפת→ גתת(she/you-3p.sg.) will come true (she/you-3p.sg.) will solve ○םיירט→ םיירטמואיג geometric (m.pl., nontrad. spelling) geometric (m.pl.) ○ךרא→ ציר’ןוסדר Richardson Eustrach 14

SLIDE 55

Observation – Nearest Neighbors

English (OOV → Nearest in-vocab words)

○ MCT → AWS, OTA, APT, PDM
○ pesky → euphoric, disagreeable, horrid, ghastly
○ lawnmower → tradesman, bookmaker, postman, hairdresser

Hebrew

○רותפת→ גתת(she/you-3p.sg.) will come true (she/you-3p.sg.) will solve ○םיירט→ םיירטמואיג geometric (m.pl., nontrad. spelling) geometric (m.pl.) ○ךרא→ ציר’ןוסדר Richardson Eustrach

✔ Surface form
✔ Syntactic properties
✘ Semantics

14
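A nearest-neighbor check like the one on this slide can be sketched as below: rank in-vocabulary words by similarity to the mimicked vector of an OOV word. Cosine similarity, the function names, and the toy two-dimensional vectors are assumptions for illustration; the slide does not state the similarity measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, table, k=4):
    """Return the k in-vocabulary words whose vectors are closest to `query`."""
    ranked = sorted(table, key=lambda word: cosine(query, table[word]),
                    reverse=True)
    return ranked[:k]

# Toy in-vocabulary table and a made-up mimicked vector for "pesky".
table = {"horrid": [0.9, 0.1], "ghastly": [0.8, 0.3], "postman": [-0.2, 1.0]}
mimicked_pesky = [1.0, 0.2]
neighbors = nearest_neighbors(mimicked_pesky, table, k=2)
```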

SLIDE 56

Intrinsic Evaluation – RareWords

15

SLIDE 57

Intrinsic Evaluation – RareWords

RareWords similarity task: morphologically-complex, mostly unseen words

15

SLIDE 59

Intrinsic Evaluation – RareWords

RareWords similarity task: morphologically-complex, mostly unseen words
Names
Domain-specific jargon
Foreign words
Rare(-ish) morphological derivations
Nonce words
Nonstandard orthography
Typos and other errors
...

15

SLIDE 60

Intrinsic Evaluation – RareWords

RareWords similarity task: morphologically-complex, mostly unseen words
Names
Domain-specific jargon
Foreign words
Rare(-ish) morphological derivations
Nonce words
Nonstandard orthography
Typos and other errors
...

NN FUN!!!

Nearest: programmatic, transformational, mechanistic, transactional, contextual

15

SLIDE 61

Extrinsic Evaluation – POS + Attribute Tagging

Names
Domain-specific jargon
Foreign words
Rare(-ish) morphological derivations
Nonce words
Nonstandard orthography
Typos and other errors
...

16

UD is annotated for POS and morphosyntactic attributes

○ Eng: his stated goals

Tense=Past|VerbForm=Part

○ Cze: osoby v pokročilém věku

SLIDE 62

Extrinsic Evaluation – POS + Attribute Tagging

Names
Domain-specific jargon
Foreign words
Rare(-ish) morphological derivations
Nonce words
Nonstandard orthography
Typos and other errors
...

16

UD is annotated for POS and morphosyntactic attributes

○ Eng: his stated goals

Tense=Past|VerbForm=Part

○ Cze: osoby v pokročilém věku

POS model from Ling et al. (2015)

[Diagram: forward and backward LSTMs over word embeddings for "the cat is sitting", predicting the tags DT NN VBZ VBG]

SLIDE 63

UD is annotated for POS and morphosyntactic attributes

○ Eng: his stated goals

Tense=Past|VerbForm=Part

○ Cze: osoby v pokročilém věku

POS model from Ling et al. (2015)
Attributes - same as POS layer

[Diagram: forward and backward LSTMs over word embeddings for "the cat is sitting", with parallel output layers for POS (DT NN VBZ VBG), Number (e.g. sing), and Tense (e.g. pres)]

17

Extrinsic Evaluation – POS + Attribute Tagging

SLIDE 64

UD is annotated for POS and morphosyntactic attributes

○ Eng: his stated goals

Tense=Past|VerbForm=Part

○ Cze: osoby v pokročilém věku

POS model from Ling et al. (2015)
Attributes - same as POS layer

[Diagram: forward and backward LSTMs over word embeddings for "the cat is sitting", with parallel output layers for POS (DT NN VBZ VBG), Number (e.g. sing), and Tense (e.g. pres)]

17

Negative effect on POS

Extrinsic Evaluation – POS + Attribute Tagging

SLIDE 65

UD is annotated for POS and morphosyntactic attributes

○ Eng: his stated goals

Tense=Past|VerbForm=Part

○ Cze: osoby v pokročilém věku

POS model from Ling et al. (2015)
Attributes - same as POS layer

[Diagram: forward and backward LSTMs over word embeddings for "the cat is sitting", with parallel output layers for POS (DT NN VBZ VBG), Number (e.g. sing), and Tense (e.g. pres)]

17

Negative effect on POS
Attribute evaluation metric

○ Micro F1

Extrinsic Evaluation – POS + Attribute Tagging
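Micro F1 over morphosyntactic attributes can be sketched as follows, treating each attribute=value pair on each token as one prediction and pooling counts over the whole corpus. The function name and the dict-per-token representation are assumptions; the exact counting convention used in the evaluation may differ.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over attribute=value pairs, pooled across tokens.

    `gold` and `predicted` are parallel lists with one dict per token,
    e.g. {"Tense": "Past", "VerbForm": "Part"}.
    """
    tp = fp = fn = 0
    for gold_attrs, pred_attrs in zip(gold, predicted):
        gold_pairs = set(gold_attrs.items())
        pred_pairs = set(pred_attrs.items())
        tp += len(gold_pairs & pred_pairs)   # correct attribute values
        fp += len(pred_pairs - gold_pairs)   # predicted but wrong or spurious
        fn += len(gold_pairs - pred_pairs)   # gold values that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# "stated" from the slide's English example, plus one made-up noun token.
gold = [{"Tense": "Past", "VerbForm": "Part"}, {"Number": "Plur"}]
pred = [{"Tense": "Past", "VerbForm": "Fin"}, {"Number": "Plur"}]
score = micro_f1(gold, pred)
```

Here two of three predicted pairs are correct and two of three gold pairs are recovered, so precision, recall, and F1 all come out to 2/3.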

SLIDE 66

Language Selection

Minna Sundberg

18

SLIDE 67

Language Selection

|UD ∩ Polyglot| = 44, we took 23

Minna Sundberg

18

SLIDE 68

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

Minna Sundberg

18

SLIDE 69

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional

Minna Sundberg

18

SLIDE 70

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic

Minna Sundberg

18

SLIDE 71

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating

Minna Sundberg

18

SLIDE 72

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Minna Sundberg

18

SLIDE 73

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

Minna Sundberg

18

SLIDE 74

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

○ 13 Indo-European (7 different branches)

Minna Sundberg

18

SLIDE 75

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

○ 13 Indo-European (7 different branches)
○ 10 from 8 non-IE branches

Minna Sundberg

18

SLIDE 76

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

○ 13 Indo-European (7 different branches)
○ 10 from 8 non-IE branches

MRLs (e.g. Slavic languages)

Minna Sundberg

18

SLIDE 77

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

○ 13 Indo-European (7 different branches)
○ 10 from 8 non-IE branches

MRLs (e.g. Slavic languages)

○ Much word-level data

Minna Sundberg

18

SLIDE 78

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

○ 13 Indo-European (7 different branches)
○ 10 from 8 non-IE branches

MRLs (e.g. Slavic languages)

○ Much word-level data
○ Relatively free word order

Minna Sundberg

18

SLIDE 79

Language Selection

|UD ∩ Polyglot| = 44, we took 23
Morphological structure

○ 12 fusional
○ 3 analytic
○ 1 isolating
○ 7 agglutinative

Genealogical diversity

○ 13 Indo-European (7 different branches)
○ 10 from 8 non-IE branches

MRLs (e.g. Slavic languages)

○ Much word-level data
○ Relatively free word order

Institutional Entrepreneurial Linguistic Anatomical Ideological

Minna Sundberg

18

SLIDE 80

Language Selection (contd.)

19

SLIDE 81

Language Selection (contd.)

Script type

19

SLIDE 82

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts

19

SLIDE 83

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters

19

SLIDE 84

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion

19

SLIDE 85

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion
○ Vietnamese - tokens are non-compositional syllables

19

SLIDE 86

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion
○ Vietnamese - tokens are non-compositional syllables

Attribute-carrying tokens

19

SLIDE 87

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion
○ Vietnamese - tokens are non-compositional syllables

Attribute-carrying tokens

○ Range from 0% (Vietnamese) to 92.4% (Hindi)

19

SLIDE 88

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion
○ Vietnamese - tokens are non-compositional syllables

Attribute-carrying tokens

○ Range from 0% (Vietnamese) to 92.4% (Hindi)

OOV rate (UD against Polyglot vocabulary)

19

SLIDE 89

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion
○ Vietnamese - tokens are non-compositional syllables

Attribute-carrying tokens

○ Range from 0% (Vietnamese) to 92.4% (Hindi)

OOV rate (UD against Polyglot vocabulary)

○ 16.9%-70.8% type-level (median 29.1%)

19

SLIDE 90

Language Selection (contd.)

Script type

○ 7 in non-alphabetic scripts
○ Ideographic (Chinese) - ~12K characters
○ Hebrew, Arabic - no casing, no vowels, syntactic fusion
○ Vietnamese - tokens are non-compositional syllables

Attribute-carrying tokens

○ Range from 0% (Vietnamese) to 92.4% (Hindi)

OOV rate (UD against Polyglot vocabulary)

○ 16.9%-70.8% type-level (median 29.1%)
○ 2.2%-33.1% token-level (median 9.2%)

19
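The difference between type-level and token-level OOV rates can be made concrete with a short sketch: the type-level rate counts each distinct word once, while the token-level rate counts every occurrence. The function name and the toy corpus are illustrative assumptions.

```python
def oov_rates(tokens, vocabulary):
    """Return (type-level, token-level) OOV rates of a corpus against a vocabulary."""
    types = set(tokens)
    oov_types = sum(1 for word in types if word not in vocabulary)
    oov_tokens = sum(1 for word in tokens if word not in vocabulary)
    return oov_types / len(types), oov_tokens / len(tokens)

# Toy task corpus and embedding vocabulary.
corpus = ["the", "flatfish", "is", "the", "the", "GoogleOS"]
embedding_vocab = {"the", "is"}
type_rate, token_rate = oov_rates(corpus, embedding_vocab)
```

In this toy case half of the types but only a third of the tokens are OOV, which mirrors why the type-level figures on the slide are so much higher than the token-level ones: frequent words tend to be in vocabulary.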

SLIDE 91

Evaluated Systems

NONE: Polyglot’s default UNK embedding

[Diagram: tagger input "the flatfish is sitting"]

20

SLIDE 92

Evaluated Systems

NONE: Polyglot’s default UNK embedding
MIMICK

[Diagram: tagger input "the flatfish is sitting"]

20

SLIDE 93

Evaluated Systems

NONE: Polyglot’s default UNK embedding
MIMICK
CHAR2TAG - additional RNN layer

○ 3x Training time

[Diagram: a Char-LSTM over each token of the tagger input "the flatfish is sitting"]

20

SLIDE 94

Evaluated Systems

NONE: Polyglot’s default UNK embedding
MIMICK
CHAR2TAG - additional RNN layer

○ 3x Training time

BOTH: MIMICK + CHAR2TAG

[Diagram: a Char-LSTM over each token of the tagger input "the flatfish is sitting"]

20

SLIDE 95

Evaluated Systems

NONE: Polyglot’s default UNK embedding
MIMICK
CHAR2TAG - additional RNN layer

○ 3x Training time

BOTH: MIMICK + CHAR2TAG

POINT UNION ROAD LIGHT LONG

[Diagram: a Char-LSTM over each token of the tagger input "the flatfish is sitting"]

20

SLIDE 96

Results - Full Data

[Charts: POS tags (accuracy) and Morpho. Attributes (micro F1), each comparing NONE, MIMICK, CHAR2TAG, and BOTH]

21

SLIDE 97

Results - 5,000 training tokens

[Charts: POS tags (accuracy) and Morpho. Attributes (micro F1), each comparing NONE, MIMICK, CHAR2TAG, and BOTH]

22

SLIDE 98

Results - Language Types (5,000 tokens)

[Chart: Slavic languages POS, comparing NONE, MIMICK, CHAR2TAG, and BOTH]

23

SLIDE 99

Results - Language Types (5,000 tokens)

[Charts: Slavic languages POS and Agglutinative languages morpho. attribute F1, each comparing NONE, MIMICK, CHAR2TAG, and BOTH]

23

SLIDE 100

Results - Chinese

[Charts: POS tags (accuracy) and Morpho. Attributes (micro F1), each comparing NONE, MIMICK, CHAR2TAG, and BOTH]

24

SLIDE 101

A Word (Model) from our Sponsor

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 102

A Word (Model) from our Sponsor

Our extrinsic results are on tagging

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 103

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 104

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 105

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 106

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 107

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 108

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!
○ ...

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 109

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!
○ ...

Code compatible with w2v, Polyglot, FastText

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 110

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!
○ ...

Code compatible with w2v, Polyglot, FastText
Models for Polyglot also on GitHub

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 111

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!
○ ...

Code compatible with w2v, Polyglot, FastText
Models for Polyglot also on GitHub

○ <1MB each, DyNet format

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 112

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!
○ ...

Code compatible with w2v, Polyglot, FastText
Models for Polyglot also on GitHub

○ <1MB each, DyNet format
○ Learn all OOVs in advance and add to param table, or

Code & models: https://github.com/yuvalpinter/Mimick

25

SLIDE 113

A Word (Model) from our Sponsor

Our extrinsic results are on tagging
Please consider us for all your WE use cases!

○ Sentiment!
○ Parsing!
○ IE!
○ QA!
○ ...

Code compatible with w2v, Polyglot, FastText
Models for Polyglot also on GitHub

○ <1MB each, DyNet format
○ Learn all OOVs in advance and add to param table, or
○ Load into memory and infer on-line

Code & models: https://github.com/yuvalpinter/Mimick

25
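The two usage modes in the last bullets (precompute all OOV vectors and add them to the parameter table, or keep the model in memory and infer on-line) might look roughly like this. It is a sketch under assumptions: `extend_table`, `OnlineEmbedder`, and the `toy_mimick` stand-in are hypothetical names and do not reflect the API of the released code.

```python
from typing import Callable, Dict, Iterable, List

Vector = List[float]

def extend_table(table: Dict[str, Vector], words: Iterable[str],
                 mimick: Callable[[str], Vector]) -> Dict[str, Vector]:
    """Mode 1: infer vectors for all known OOVs up front and add them to the table."""
    extended = dict(table)
    for word in words:
        if word not in extended:
            extended[word] = mimick(word)
    return extended

class OnlineEmbedder:
    """Mode 2: keep the model in memory and infer OOV vectors on demand."""
    def __init__(self, table: Dict[str, Vector], mimick: Callable[[str], Vector]):
        self.table = table
        self.mimick = mimick
        self.cache: Dict[str, Vector] = {}

    def __call__(self, word: str) -> Vector:
        if word in self.table:
            return self.table[word]
        if word not in self.cache:  # infer each OOV word only once
            self.cache[word] = self.mimick(word)
        return self.cache[word]

def toy_mimick(word: str) -> Vector:
    """Hypothetical stand-in for a trained model, used only to run the example."""
    return [float(len(word)), 0.0]

pretrained = {"the": [0.1, 0.2], "is": [0.3, 0.4]}
extended = extend_table(pretrained, ["the", "flatfish"], toy_mimick)
embed = OnlineEmbedder(pretrained, toy_mimick)
vector = embed("GoogleOS")
```

The first mode suits a fixed test set whose vocabulary is known ahead of time; the second handles words that only show up at run time.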

SLIDE 114

Conclusions

26

SLIDE 115

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks

26

SLIDE 116

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact

26

SLIDE 117

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios

26

SLIDE 118

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

26

SLIDE 119

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages

26

SLIDE 120

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages
○ Large character vocabulary

26

SLIDE 121

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages
○ Large character vocabulary

Sore spots and Future Work

26

SLIDE 122

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages
○ Large character vocabulary

Sore spots and Future Work

○ Vietnamese - syllabic vocabulary

26

SLIDE 123

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages
○ Large character vocabulary

Sore spots and Future Work

○ Vietnamese - syllabic vocabulary
○ Hebrew and Arabic - nontrivial tokenization, no case

26

SLIDE 124

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages
○ Large character vocabulary

Sore spots and Future Work

○ Vietnamese - syllabic vocabulary
○ Hebrew and Arabic - nontrivial tokenization, no case
○ Try other subword levels (morphemes, phonemes, bytes)

26

SLIDE 125

Conclusions

MIMICK: an OOV-extension embedding processing step for downstream tasks
Compositional model complementing distributional artifact
Powerful technique for low-resource scenarios
Especially good for:

○ Morphologically-rich languages
○ Large character vocabulary

Sore spots and Future Work

○ Vietnamese - syllabic vocabulary
○ Hebrew and Arabic - nontrivial tokenization, no case
○ Try other subword levels (morphemes, phonemes, bytes)
○ Improve morphosyntactic attribute tagging scheme

26

SLIDE 126

Questions?

Code & models: https://github.com/yuvalpinter/Mimick

Neglect Satisfaction Illness Espionage Bullying

27