Don't Get Fooled by Word Embeddings: Better Watch Their Neighborhood


SLIDE 1

DH 2017 August 11, 2017, Montreal, Canada Johannes Hellrich & Udo Hahn Don’t Get Fooled by Word Embeddings 1

Don’t Get Fooled by Word Embeddings: Better Watch Their Neighborhood

Johannes Hellrich (1, 2) & Udo Hahn (2)

1: Graduate School 'The Romantic Model', Friedrich Schiller University Jena, Jena, Germany
http://www.modellromantik.uni-jena.de

2: Jena University Language & Information Engineering (JULIE) Lab, Friedrich Schiller University Jena, Jena, Germany
http://www.julielab.de

SLIDE 2

You shall know a word by the company it keeps!

Firth, 1957

He reads a poem. She reads a novel. The novel has 312 pages. The poem fits on two pages. She listens to an opera. He listens to jazz.

SLIDE 4

Counting Cooccurrences

         read  pages  hate  enjoy  listen  …
novel      98     60     3     56       2
poem       67     10     1     47       8
opera    4, 8, 42, 38 (one of the five counts is missing in the source)
jazz        2      1     2     61      47
…
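
A minimal sketch of how such counts could be collected, using the six example sentences from the Firth slide. The tokenization, the decision to count every pair of different words within one sentence, and the function name are assumptions made for illustration; the numbers in the table above come from the authors' example, not from this code.

```python
from collections import Counter
from itertools import permutations

SENTENCES = [
    "He reads a poem.",
    "She reads a novel.",
    "The novel has 312 pages.",
    "The poem fits on two pages.",
    "She listens to an opera.",
    "He listens to jazz.",
]

def count_cooccurrences(sentences):
    """Count how often two different words appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Naive tokenization (assumption): lowercase and drop the final period.
        tokens = sentence.lower().rstrip(".").split()
        # Each ordered pair of distinct words in the sentence counts once.
        for word, context in permutations(set(tokens), 2):
            counts[(word, context)] += 1
    return counts

counts = count_cooccurrences(SENTENCES)
for word in ("novel", "poem", "opera", "jazz"):
    print(word, counts[(word, "reads")], counts[(word, "listens")])
```

Each row of the resulting table is the context profile of one word; a real corpus would typically use a sliding window instead of whole sentences.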

SLIDE 5

Vector Representation

[Figure: novel, poem, opera and jazz drawn as vectors along the axes read and listen]

SLIDE 6

Distance and Similarity

[Figure: novel, poem, opera and jazz drawn as vectors along the axes read and listen]
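
The slide does not name a specific measure; cosine similarity is one common choice and is assumed here. The sketch below uses the read and listen counts for novel, poem and jazz from the cooccurrence table, so the two-dimensional vectors match the axes of the figure.

```python
import math

# (read, listen) counts for three words, taken from the cooccurrence table.
VECTORS = {
    "novel": (98, 2),
    "poem": (67, 8),
    "jazz": (2, 47),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

novel_poem = cosine_similarity(VECTORS["novel"], VECTORS["poem"])
novel_jazz = cosine_similarity(VECTORS["novel"], VECTORS["jazz"])
print(f"novel/poem: {novel_poem:.3f}, novel/jazz: {novel_jazz:.3f}")
```

Words that point in almost the same direction (novel and poem) score close to 1, while novel and jazz score close to 0.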

SLIDE 7

Dimensionality Problem

  • One dimension per word
  • 50k to 100k dimensions

→ Large files and slow operations

  • What about synonyms? It shouldn't matter if I buy or purchase a novel

SLIDE 8

Word Embeddings

  • Represent words as dense vectors with 200–500 instead of 50k–100k dimensions
  • Very popular in computational linguistics and digital humanities
  • Better at judging word similarity

SLIDE 9

Application in DH: Semantic Development of Herz ‘heart’

[Figure: similarity (y-axis, 0.55 to 0.8) by year (x-axis, 1799 to 2009) for Gemüth 'mind', Gehirn 'brain', erschrecke 'frighten' and Lunge 'lung']

  • Hellrich & Hahn, DH 2016
  • First applied by Kim et al., ACL 2014 Workshop on Language Technologies and Computational Social Science

SLIDE 10

Types of Word Embeddings

[Figure: two routes from lots of text to vectors for novel, poem and opera. Panel "Singular Value Decomposition": a cooccurrence matrix with the columns read, pages and musician (surviving counts: poem 475, 156; novel 823, 492, 3; opera 51, 19, 993). Panel "Neural Word Embeddings".]

SLIDE 11

Neural Word Embeddings

[Figure: SGNS architecture with INPUT w(t), PROJECTION, and OUTPUT w(t-2), w(t-1), w(t+1), w(t+2)]

  • Extremely popular skip-gram negative sampling algorithm SGNS/word2vec (Mikolov et al., NIPS 2013)
  • Alternative neural embeddings using an explicit cooccurrence matrix: GloVe (Pennington et al., EMNLP 2014)
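
To make the figure concrete, here is a small sketch of how skip-gram training examples could be generated: each word w(t) is paired with the words up to two positions before and after it. The function name and the example sentence are illustrative assumptions, and the sketch covers only the pair generation, not negative sampling or the network itself.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every word and its neighbors."""
    for t, center in enumerate(tokens):
        start = max(0, t - window)
        stop = min(len(tokens), t + window + 1)
        for j in range(start, stop):
            if j != t:  # a word is not its own context
                yield center, tokens[j]

tokens = "she reads a novel".split()
pairs = list(skipgram_pairs(tokens, window=2))
for center, context in pairs:
    print(center, "->", context)
```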

SLIDE 12

Training Neural Word Embeddings

  • Word embeddings are updated while the model looks at the text
  • Training tries to minimize false predictions (cost function)
  • This leads to a local minimum, yet rarely to the global one
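
The effect of ending up in a local minimum can be illustrated with a toy example. The one-dimensional cost function below is hypothetical and has nothing to do with the actual SGNS or GloVe objectives; it only shows that plain gradient descent settles in different minima depending on where it starts.

```python
def cost(x):
    # Hypothetical cost function with two minima of different depth.
    return x**4 - 3 * x**2 + x

def gradient(x):
    # Derivative of the cost function above.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, steps=2000):
    """Follow the negative gradient from a given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

left = gradient_descent(-2.0)
right = gradient_descent(2.0)
print(f"start -2.0: x = {left:.2f}, cost = {cost(left):.2f}")
print(f"start  2.0: x = {right:.2f}, cost = {cost(right):.2f}")
```

Both runs converge, but only the run starting on the left reaches the deeper minimum; the other stops in a shallower one.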

SLIDE 14

Singular Value Decomposition

  • Express cooccurrences as UΣV^T
  • U represents words, V^T context words
  • Σ measures the importance of dimensions
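
As a sketch of the decomposition, the snippet below applies NumPy's `np.linalg.svd` to a tiny matrix built from the novel, poem and jazz rows of the counting example. The matrix is far smaller than any real cooccurrence matrix and serves only as an illustration.

```python
import numpy as np

# Toy cooccurrence counts (rows: novel, poem, jazz; columns: read, pages,
# hate, enjoy, listen), taken from the counting example earlier in the talk.
M = np.array([
    [98, 60, 3, 56, 2],
    [67, 10, 1, 47, 8],
    [2, 1, 2, 61, 47],
], dtype=float)

# U has one row per word, Vt one column per context word.
U, singular_values, Vt = np.linalg.svd(M, full_matrices=False)

# Multiplying the three factors back together recovers the original matrix.
reconstructed = U @ np.diag(singular_values) @ Vt
print(np.allclose(reconstructed, M))
print(singular_values)
```

The singular values come back sorted from largest to smallest, which is what allows the less important dimensions to be dropped.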

SLIDE 15

Singular Value Decomposition

  • Classical SVD embeddings: V_d, selecting only d dimensions from V based on Σ
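
A hedged sketch of the truncation step: keep only the d dimensions with the largest singular values. The helper name `truncate_svd` and the toy matrix are assumptions for illustration. The sketch returns the truncated versions of both factor matrices, since which of them holds the word vectors depends on how the cooccurrence matrix is oriented.

```python
import numpy as np

def truncate_svd(matrix, d):
    """Keep only the d dimensions with the largest singular values."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # np.linalg.svd returns singular values in descending order,
    # so the first d entries are the most important ones.
    return U[:, :d], s[:d], Vt[:d, :]

# Toy counts (rows: novel, poem, jazz; columns: read, pages, hate, enjoy, listen).
M = np.array([
    [98, 60, 3, 56, 2],
    [67, 10, 1, 47, 8],
    [2, 1, 2, 61, 47],
], dtype=float)

errors = []
for d in (1, 2, 3):
    U_d, s_d, Vt_d = truncate_svd(M, d)
    # Distance between the original matrix and its rank-d approximation.
    errors.append(float(np.linalg.norm(M - U_d @ np.diag(s_d) @ Vt_d)))
    print(d, U_d.shape, Vt_d.shape, round(errors[-1], 3))
```

The approximation error shrinks as d grows and vanishes once all dimensions are kept.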

SLIDE 16

SVDPPMI

  • Levy et al., TACL 2015
  • Positive pointwise mutual information instead of frequency
  • Post-/preprocessing inspired by SGNS and GloVe
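
Positive pointwise mutual information compares how often a word and a context occur together with how often they would be expected to co-occur by chance, and clips negative values to zero. The sketch below is a plain textbook version applied to toy counts; it leaves out the additional pre- and postprocessing steps mentioned above, and the function name is an assumption.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information for a word-by-context count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()           # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)  # word probabilities
    p_c = p_wc.sum(axis=0, keepdims=True)  # context probabilities
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0           # zero counts carry no information
    return np.maximum(pmi, 0.0)            # keep only positive associations

# Toy counts (rows: novel, poem, jazz; columns: read, pages, hate, enjoy, listen).
COUNTS = [
    [98, 60, 3, 56, 2],
    [67, 10, 1, 47, 8],
    [2, 1, 2, 61, 47],
]

weights = ppmi(COUNTS)
print(np.round(weights, 2))
```

In this toy matrix, novel with listen receives a weight of zero because the pair is rarer than chance, while jazz with listen receives a positive weight.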

SLIDE 17

Measuring Reliability

  • Train multiple models with identical parameters on one corpus
  • Measure the percentage of identical neighborhoods for each word between models
  • Hellrich & Hahn, COLING 2016
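
A minimal sketch of such a measure, under the assumption that two neighborhoods count as identical when they contain the same words regardless of order. The function names are hypothetical. The neighbor lists are the ones reported for Herz 'heart' later in this talk; with a single word the percentage can only be 0 or 100.

```python
def top_n_agree(neighbor_lists, n):
    """True if the n nearest neighbors form the same set in every model."""
    first = set(neighbor_lists[0][:n])
    return all(set(neighbors[:n]) == first for neighbors in neighbor_lists[1:])

def reliability(models, n):
    """Percentage of words whose top-n neighborhoods agree across all models."""
    words = list(models[0])
    agreeing = sum(top_n_agree([m[w] for m in models], n) for w in words)
    return 100.0 * agreeing / len(words)

# Nearest neighbors of Herz in three SGNS and three GloVe models.
sgns = [
    {"herz": ["schmerzen", "beklommen", "busen", "bluten", "herzen"]},
    {"herz": ["bluten", "klopfend", "busen", "beklommen", "herzen"]},
    {"herz": ["herzen", "busen", "klopfend", "beklommen", "bluten"]},
]
glove = [
    {"herz": ["gemüt", "mein", "seele", "liebe", "brust"]},
    {"herz": ["gemüt", "mein", "seele", "brust", "liebe"]},
    {"herz": ["gemüt", "mein", "seele", "brust", "liebe"]},
]

for n in (1, 3, 5):
    print(n, reliability(sgns, n), reliability(glove, n))
```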

SLIDE 18

Measuring Reliability

  • Example: no agreement at neighborhood size 1 for poem

[Figure: positions of novel, poem and opera in three models]

SLIDE 19

Measuring Reliability

  • Example: agreement at neighborhood size 2 for poem

[Figure: positions of novel, poem and opera in three models]

SLIDE 20

Experiment

  • 3 models each for SGNS, GloVe and SVDPPMI
  • Trained on a corpus of 645 German texts from the 19th century, a subset of the Deutsches Textarchiv ‘German Text Archive’
  • Technical details:
    • Window size 5
    • 300 dimensions
    • hyperwords toolkit

SLIDE 21

Reliability for Herz ‘heart’

Embedding Model  First Neighbor      Second Neighbor      Third Neighbor      Fourth Neighbor      Fifth Neighbor
SGNS 1           schmerzen ‘pain’    beklommen ‘anxious’  busen ‘bosom’       bluten ‘to bleed’    herzen ‘to caress’
SGNS 2           bluten ‘to bleed’   klopfend ‘beating’   busen ‘bosom’       beklommen ‘anxious’  herzen ‘to caress’
SGNS 3           herzen ‘to caress’  busen ‘bosom’        klopfend ‘beating’  beklommen ‘anxious’  bluten ‘to bleed’
GloVe 1          gemüt ‘mind’        mein ‘my’            seele ‘soul’        liebe ‘love’         brust ‘chest’
GloVe 2          gemüt ‘mind’        mein ‘my’            seele ‘soul’        brust ‘chest’        liebe ‘love’
GloVe 3          gemüt ‘mind’        mein ‘my’            seele ‘soul’        brust ‘chest’        liebe ‘love’
SVDPPMI, all     busen ‘bosom’       fühlen ‘to feel’     liebe ‘love’        schmerzen ‘pain’     menschenherz ‘human heart’

SLIDE 30

Reliability for 1000 most frequent nouns depending on neighborhood size

[Figure: percentage of identical neighbors (y-axis, 60 to 100) by neighborhood size (x-axis, 1 to 9) for SVDPPMI, GloVe and SGNS]

SLIDE 31

Reliability for 100–1000 most frequent nouns depending on word frequency

[Figure: percentage of identical neighbors (y-axis, 60 to 100) by number of most frequent nouns (x-axis, 100 to 1000) for SVDPPMI, GloVe and SGNS]

SLIDE 32

Conclusion

  • Neural word embeddings are unreliable
  • SVDPPMI is reliable and performs very similarly on evaluation tasks
  • Also think about: preprocessing often includes random sampling

SLIDE 33

http://jeseme.org

Hellrich & Hahn, ACL 2017

Accessible SVDPPMI embeddings for diachronic linguistics

SLIDE 36

Word Embedding Performance

Method  WordSim     WordSim      Bruni et al.  Radinsky et al.  Luong et al.  Hill et al.  Google       MSR
        Similarity  Relatedness  MEN           M. Turk          Rare Words    SimLex       Add / Mul    Add / Mul
PPMI    .755        .697         .745          .686             .462          .393         .553 / .679  .306 / .535
SVD     .793        .691         .778          .666             .514          .432         .554 / .591  .408 / .468
SGNS    .793        .685         .774          .693             .470          .438         .676 / .688  .618 / .645
GloVe   .725        .604         .729          .632             .403          .398         .569 / .596  .533 / .580

Table 4: Performance of each method across different tasks using the best configuration for that method and task combination, assuming win = 2.

From Levy et al. (2015)

SLIDE 37

Reliability of word2vec at different frequencies

  • Hellrich & Hahn, COLING 2016
  • word2vec models trained on Google Books corpora

[Figure 4: reliability (y-axis, 0.2 to 0.8) by frequency percentile (x-axis, 20 to 100) for SGHS 1900–1904, SGNS 1900–1904, SGHS 2005–2009 and SGNS 2005–2009; panels: English and German]

SLIDE 38

Warning: Automatic word change research is focused on high frequency words

[Figure: reliability (y-axis, 0.2 to 0.8) by frequency percentile (x-axis, 20 to 100)]