Is the Future Almost Here? Large-Scale Completely Automated Vowel Extraction of Free Speech

SLIDE 1

Large-Scale Completely Automated Vowel Extraction of Free Speech

Sravana Reddy and James N. Stanford

Dartmouth College

Is the Future Almost Here?

SLIDE 2

Motivation

Transcription: can this be completely automated?

SLIDE 3

Current Level of Automation

Penn Aligner (Yuan & Liberman 2008)
– Evanini (2009)
– Evanini, Isard & Liberman (2009)
ProsodyLab (McGill) Aligner (Gorman et al. 2011)
WebMAUS (Kisler et al. 2012)
FAVE: Forced Alignment Vowel Extraction (Rosenfelder, Fruehwald, Evanini & Yuan 2011)
– Used for Philadelphia data analysis in Labov, Rosenfelder & Fruehwald (2013)
– Fruehwald & Kendall at this conference

SLIDE 4

FAVE: (1) Word-Level Transcription

SLIDE 5

FAVE: (2) Forced Alignment

SLIDE 6

FAVE: (3) Vowel Extraction

vowel  stress  word  F1     F2      F3      B1     B2     B3     t       beg    end     dur
OW     1       NO    611.9  1644.7  2058.7  65.5   99.5   815.6  10.317  10.28  10.55   0.27
AA     1       NOT   732.2  1493.6  2861.9  232.1  82.6   289.4  10.9    10.8   11.101  0.301
AE     1       HAVE  592.4  1810.1  2135.6  49.8   125.7  699.1  11.467  11.43  11.54   0.11

vowel  word  cd  fm  fp  fv  ps  fs  style  glide  nFormants
OW     NO    63  0   0   0   4   0                 5
AA     NOT   5   1   4   1   4   0                 4
AE     HAVE  3   3   2   2   0   0                 5

vowel  word  F1@20%  F2@20%  F1@35%  F2@35%  F1@50%  F2@50%  F1@65%  F2@65%  F1@80%  F2@80%
OW     NO    657.7   1599.0  610.0   1455.2  580.4   1160.2  546.1   1059.3  507.3   1037.8
AA     NOT   698.8   1484.7  739.8   1496.1  790.9   1503.2  796.4   1568.6  788.2   1646.4
AE     HAVE  610.0   1852.0  589.3   1800.2  577.6   1733.9  552.3   1656.6  479.2   1567.0
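To make the column layout concrete, the sketch below reads a few of these columns for the three sample tokens and computes the change in F2 between the 20% and 80% measurement points, a simple summary of each vowel's trajectory. It assumes a tab-delimited export using the header names shown above; only a subset of columns is copied into the sample string, and the helper name `f2_change` is hypothetical, not part of FAVE.

```python
import csv
import io

# A subset of the FAVE columns above, re-encoded as tab-separated text.
# Assumption: the real export is tab-delimited with these header names.
SAMPLE = "\n".join([
    "vowel\tstress\tword\tF1\tF2\tF2@20%\tF2@80%",
    "OW\t1\tNO\t611.9\t1644.7\t1599.0\t1037.8",
    "AA\t1\tNOT\t732.2\t1493.6\t1484.7\t1646.4",
    "AE\t1\tHAVE\t592.4\t1810.1\t1852.0\t1567.0",
])


def f2_change(text):
    """Return (vowel, word, F2@80% - F2@20%) for each primary-stressed token.

    Hypothetical helper for illustration; not part of FAVE.
    """
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    result = []
    for row in rows:
        if row["stress"] != "1":  # keep primary-stressed vowels only
            continue
        change = float(row["F2@80%"]) - float(row["F2@20%"])
        result.append((row["vowel"], row["word"], round(change, 1)))
    return result


for vowel, word, change in f2_change(SAMPLE):
    print(vowel, word, change)
```

For these three tokens it reports an F2 change of -561.2 Hz for OW in NO, 161.7 Hz for AA in NOT, and -285.0 Hz for AE in HAVE.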

SLIDE 7

This Work

Word-Level Transcription by ASR (Automatic Speech Recognition) → FAVE → Vowel Formants

SLIDE 8

This Work

CAVE: Completely Automated Vowel Extraction

A future full of possibilities! Analyze hours of speech from the radio and TV, terabytes of data from YouTube, live interviews, dialects of any language…

SLIDE 9

The Southern Shift

(Labov 1996)

SLIDE 10

Examples of ASR Errors

REF: give me your first impressions

HYP: give me yours first impression 

REF: it’s one of those

HYP: it’s close 

REF: no it’s it’s wood turning

HYP: no it it would turn it 

REF: and we really don’t spend on anything much

HYP: and we don’t depend on anything much 

REF: a real dog and cat and all the other animals

HYP: a real docking tap and on the other animals

Poor understanding of meaning and syntax… but the (stressed) vowels are OK!

SLIDE 11

ASR Word and Phoneme Errors

[Chart: Northern speakers in our study. Error rate (0 to 100) by speaker ID (1 to 10) for word errors, phoneme errors, and stressed vowel errors.]
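Error rates like these are edit-distance measures. As a minimal sketch (the slides do not name the scoring tool used), the functions below count the substitutions, deletions, and insertions needed to turn a reference token sequence into the hypothesis and divide by the reference length. Applied to the first REF/HYP pair on Slide 10, this gives a word error rate of 0.4; the same functions yield a phoneme or stressed-vowel error rate when the inputs are phone sequences.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] is the distance between the ref tokens seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = cur
    return prev[-1]


def error_rate(ref, hyp):
    """Edit errors divided by the number of reference tokens."""
    return edit_distance(ref, hyp) / len(ref)


ref = "give me your first impressions".split()
hyp = "give me yours first impression".split()
print(error_rate(ref, hyp))  # 0.4
```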

SLIDE 12

Our Idea

ASR vowel error rates are low. With large amounts of data, we can get hundreds of tokens per vowel. Therefore, ASR transcriptions should be nearly as good as human transcriptions for analyzing vowels in sociolinguistics.
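The intuition can be made explicit with a simple contamination model (an illustrative assumption, not an analysis from the slides). Suppose a fraction ε of a vowel's tokens are misrecognized, with formant mean μ_err, while correctly recognized tokens have mean μ. The expected observed mean and its bias are then:

```latex
\bar{F}_{\text{obs}} = (1-\varepsilon)\,\mu + \varepsilon\,\mu_{\text{err}}
\qquad\Longrightarrow\qquad
\bar{F}_{\text{obs}} - \mu = \varepsilon\,(\mu_{\text{err}} - \mu)
```

In this model the bias scales with the vowel error rate rather than the word error rate. With hypothetical values ε = 0.05 and μ_err - μ = 100 Hz, the mean shifts by 5 Hz. Separately, the sampling error of a mean over n tokens with standard deviation σ is σ/√n, which shrinks as n grows into the hundreds of tokens per vowel.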

SLIDE 13

Technology behind FAVE

The same models used in automatic speech recognition
– Forced alignment using MFCC features, acoustic models, dynamic programming…

Natural question: can we take it further?
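The dynamic-programming step of forced alignment can be illustrated with a toy aligner. Given per-frame log-likelihoods for each phone of a known transcript (hypothetical hand-made numbers standing in for an acoustic model scoring MFCC frames), it finds the best left-to-right segmentation in which each phone covers at least one frame. This is a simplified Viterbi alignment for illustration, not the HMM-based implementation inside FAVE or the Penn Aligner.

```python
NEG_INF = float("-inf")


def force_align(scores):
    """Toy Viterbi forced alignment over a known phone sequence.

    scores[t][n]: log-likelihood of frame t under the n-th phone of the
    transcript (hypothetical numbers standing in for an acoustic model).
    Phones occur in order and each covers at least one frame.
    Returns (phone_index, start_frame, end_frame_exclusive) triples.
    """
    T, N = len(scores), len(scores[0])
    best = [[NEG_INF] * N for _ in range(T)]
    came_from_prev = [[False] * N for _ in range(T)]
    best[0][0] = scores[0][0]  # the path must start in the first phone
    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1][n]
            advance = best[t - 1][n - 1] if n > 0 else NEG_INF
            came_from_prev[t][n] = advance > stay
            best[t][n] = max(stay, advance) + scores[t][n]
    if best[T - 1][N - 1] == NEG_INF:
        raise ValueError("fewer frames than phones: no valid alignment")
    # Trace back from the last frame, which must belong to the last phone.
    segments, n, end = [], N - 1, T
    for t in range(T - 1, 0, -1):
        if came_from_prev[t][n]:  # frame t is the first frame of phone n
            segments.append((n, t, end))
            n, end = n - 1, t
    segments.append((n, 0, end))
    return segments[::-1]


# Frames 0-1 look like phone 0, frames 2-4 look like phone 1.
scores = [[0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
print(force_align(scores))  # [(0, 0, 2), (1, 2, 5)]
```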

SLIDE 14

This Work

Compare:

CAVE: ASR word transcriptions + vowel extraction with FAVE
FAVE: Human word transcriptions + vowel extraction with FAVE

Feasibility Test: Do the vowel spaces show a distinction between Northern and Southern dialect features?

SLIDE 15

Data

Switchboard-1 Corpus (1997), available from the LDC
https://catalog.ldc.upenn.edu/LDC97S62
Two-sided telephone conversations between US speakers
Includes human word-level transcriptions
Randomly selected 20 speakers (15 hours of speech, 143,266 stressed vowel tokens, approx. 300 tokens per vowel per speaker)

          Northern  Southern
Male      5         5
Female    5         5

SLIDE 16

Automatic Speech Recognition

Acoustic Model: Probabilistic Mapping from Phones to Acoustics
Language Model: Probability Distribution over Word Sequences
Pronunciation Model: Dictionary of Canonical (SAE) Pronunciations

Utterance → Phoneme Sequence → Speech; ASR recovers the utterance from the speech.
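These three models combine in the standard noisy-channel formulation of speech recognition. In our own notation (O = acoustic observations, W = word sequence, Q = phone sequence), the decoder searches for:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} P(O \mid W)\,P(W)
        \approx \arg\max_{W} \Big[\max_{Q} P(O \mid Q)\,P(Q \mid W)\Big]\,P(W)
```

Here P(W) is the language model, P(Q | W) the pronunciation model (with one canonical dictionary pronunciation per word, Q is determined by W), and P(O | Q) the acoustic model. P(O) is dropped because it does not depend on W, and the maximum over Q is the usual Viterbi approximation to summing over pronunciations.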

SLIDE 17

ASR System

We trained an acoustic model on US English speech (mostly newswire, some telephone) and a trigram language model on assorted US English corpora
CMU pronouncing dictionary
Decoding with CMU Sphinx
http://cmusphinx.sourceforge.net
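To illustrate what a trigram language model estimates, here is a minimal unsmoothed maximum-likelihood version trained on a hypothetical three-sentence corpus. The model described above was trained on much larger corpora, and practical trigram models add smoothing and backoff for unseen histories, which this sketch omits.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories, with sentence padding."""
    tri, hist = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            tri[(w1, w2, w3)] += 1
            hist[(w1, w2)] += 1
    return tri, hist


def prob(tri, hist, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    if hist[(w1, w2)] == 0:
        return 0.0
    return tri[(w1, w2, w3)] / hist[(w1, w2)]


# Hypothetical toy corpus echoing one of the REF lines on Slide 10.
corpus = ["we really don't spend", "we really don't know", "we don't spend"]
tri, hist = train_trigram(corpus)
print(prob(tri, hist, "really", "don't", "spend"))  # 0.5
```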

SLIDE 18

Stressed Vowel Extraction

Forced Alignment and Vowel Extraction:
Speech + ASR transcriptions → CAVE formants
Speech + human transcriptions → FAVE formants

SLIDE 19

Results

[Figure: mean vowel spaces of Northern vs. Southern speakers, one panel for CAVE and one for FAVE; axes F2 and F1; vowels AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW]

Normalized with Lobanov (Kendall & Thomas 2010)
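Lobanov normalization converts each formant to a z-score within each speaker, reducing between-speaker differences such as those due to vocal tract size. A minimal sketch, assuming tokens are dicts with hypothetical keys `speaker`, `F1`, and `F2` (in Hz) and at least two tokens per speaker; it uses the sample standard deviation, and any rescaling of z-scores back to a Hz-like range for plotting is not shown.

```python
from statistics import mean, stdev


def lobanov(tokens, formants=("F1", "F2")):
    """Return copies of tokens with each formant z-scored within its speaker."""
    # Collect each speaker's values for each formant.
    values = {}
    for tok in tokens:
        for f in formants:
            values.setdefault((tok["speaker"], f), []).append(tok[f])
    # Mean and sample standard deviation per (speaker, formant).
    stats = {key: (mean(v), stdev(v)) for key, v in values.items()}
    normalized = []
    for tok in tokens:
        new = dict(tok)
        for f in formants:
            m, s = stats[(tok["speaker"], f)]
            new[f] = (tok[f] - m) / s
        normalized.append(new)
    return normalized


tokens = [
    {"speaker": "A", "F1": 400, "F2": 1000},
    {"speaker": "A", "F1": 500, "F2": 1500},
    {"speaker": "A", "F1": 600, "F2": 2000},
]
print([t["F1"] for t in lobanov(tokens)])  # [-1.0, 0.0, 1.0]
```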

SLIDE 20

Evidence of the Southern Vowel Shift

Both CAVE and FAVE:
✔ Clear north-south contrasts in EY/EH and IY/IH in the expected directions
✔ EY (bait) and IY (beet): lowered/backed for southerners
✔ EH (bet) and IH (bit): raised/fronted for southerners

SLIDE 21

Tense/lax shifts

[Figure: Northern vs. Southern vowel spaces, one panel for CAVE and one for FAVE; axes F2 and F1; vowels AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW]

SLIDE 22

Evidence of the Southern Vowel Shift

Both CAVE and FAVE also show Southern fronting of AW, UW, and OW

SLIDE 23

Fronting: UW, AW, OW

[Figure: Northern vs. Southern vowel spaces, one panel for CAVE and one for FAVE; axes F2 and F1; vowels AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW]

SLIDE 24

FAVE vs. CAVE comparisons

Kendall & Fridland (2012:296) use the Euclidean distance between EH and EY as a measure of the tense/lax shift

Repeated Measures ANOVA results:

                 FAVE                           CAVE
EH-EY distance   North mean = 79 Hz             North mean = 83 Hz
                 South mean = 31 Hz             South mean = 39 Hz
                 Sig. different (p = 0.001)     Sig. different (p < 0.0001)
IH-IY distance   North mean = 150 Hz            North mean = 145 Hz
                 South mean = 117 Hz            South mean = 134 Hz
                 Sig. different (p = 0.011)     n.s. (p = 0.284)

Kendall & Fridland also find the EH-EY shift more advanced than IH-IY
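The Euclidean distance measure is the straight-line distance between two vowel means in the F1-F2 plane. A minimal sketch for one speaker's tokens, with hypothetical key names and input values; how such distances enter the repeated-measures ANOVA is not shown on the slide and is not modeled here.

```python
from math import hypot
from statistics import mean


def vowel_mean(tokens, vowel):
    """Mean (F1, F2) in Hz over the tokens labelled with `vowel`."""
    f1 = [t["F1"] for t in tokens if t["vowel"] == vowel]
    f2 = [t["F2"] for t in tokens if t["vowel"] == vowel]
    return mean(f1), mean(f2)


def vowel_distance(tokens, v1, v2):
    """Euclidean distance between the F1/F2 means of two vowels."""
    a_f1, a_f2 = vowel_mean(tokens, v1)
    b_f1, b_f2 = vowel_mean(tokens, v2)
    return hypot(a_f1 - b_f1, a_f2 - b_f2)


tokens = [
    {"vowel": "EH", "F1": 550, "F2": 1800},
    {"vowel": "EH", "F1": 560, "F2": 1820},
    {"vowel": "EY", "F1": 520, "F2": 1850},
    {"vowel": "EY", "F1": 530, "F2": 1850},
]
print(vowel_distance(tokens, "EH", "EY"))  # 50.0
```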

SLIDE 25

Formant Mean Differences between FAVE and CAVE

[Chart: absolute mean difference (Hz) between FAVE and CAVE for IH and IY, Southern and Northern speakers; F1 axis 0 to 4 Hz, F2 axis 0 to 10 Hz]

✔ = no significant difference; 7 of the 8 comparisons are marked ✔ and 1 is marked ✗
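A minimal sketch of the quantity plotted on these charts, assuming it is the absolute difference between the FAVE mean and the CAVE mean of one formant for one vowel and one speaker group. The slides do not specify the exact computation or the significance test behind the ✔/✗ marks, so only the difference itself is computed; the token key names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_differences(tokens, formants=("F1", "F2")):
    """|mean(FAVE) - mean(CAVE)| per (vowel, region, formant), in Hz.

    tokens: dicts with keys "method" ("FAVE" or "CAVE"), "vowel",
    "region", and one value per formant. Key names are hypothetical.
    """
    groups = defaultdict(list)
    for tok in tokens:
        for f in formants:
            groups[(tok["vowel"], tok["region"], f, tok["method"])].append(tok[f])
    diffs = {}
    for (vowel, region, f, method), fave_vals in groups.items():
        if method != "FAVE":
            continue
        cave_vals = groups.get((vowel, region, f, "CAVE"))
        if cave_vals:  # compare only cells present in both pipelines
            diffs[(vowel, region, f)] = abs(mean(fave_vals) - mean(cave_vals))
    return diffs


tokens = [
    {"method": "FAVE", "vowel": "IY", "region": "South", "F1": 300, "F2": 2100},
    {"method": "FAVE", "vowel": "IY", "region": "South", "F1": 310, "F2": 2120},
    {"method": "CAVE", "vowel": "IY", "region": "South", "F1": 302, "F2": 2104},
    {"method": "CAVE", "vowel": "IY", "region": "South", "F1": 312, "F2": 2110},
]
print(mean_differences(tokens))
```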

SLIDE 26

Formant Mean Differences between FAVE and CAVE

[Chart: absolute mean difference (Hz) between FAVE and CAVE for EH and EY, Southern and Northern speakers; F1 axis 0 to 12 Hz, F2 axis 0 to 20 Hz]

✔ = no significant difference; 5 of the 8 comparisons are marked ✔ and 3 are marked ✗

SLIDE 27

Formant Mean Differences between FAVE and CAVE

[Chart: absolute mean difference (Hz) between FAVE and CAVE for UW, OW, and AW, Southern and Northern speakers; F2 axis 0 to 60 Hz, F1 axis 0 to 4 Hz]

✔ = no significant difference; 8 of the 12 comparisons are marked ✔ and 4 are marked ✗

SLIDE 28

Formant Mean Differences

Differences comparable to findings on inter-analyst differences (Evanini 2009: 92-94):

Study                            F1                F2
Labov et al. (1972:32)           31.5 to 40.5 Hz   38 to 84 Hz
Deng et al. (2006)               55 Hz             69 Hz
Hillenbrand et al. (1995:3101)   9.2 Hz            17.6 Hz

SLIDE 29

Future Work

Test on other data and dialects
Tailoring ASR for sociolinguistic applications
– Get multiple candidate transcriptions and take a weighted average: more resistance to errors
– Build unified ASR decoding and vowel extraction that directly optimizes for good formant outputs rather than for transcription accuracy
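One way the weighted-average idea might look, purely as an illustration: for a vowel token measured under several candidate transcriptions, weight each candidate's formant estimate by a normalized score derived from the recognizer's log scores. The softmax weighting, the input format, and the name `weighted_formant` are assumptions, not a method from the slides.

```python
from math import exp


def weighted_formant(candidates):
    """Score-weighted average of a formant over N-best candidates.

    candidates: list of (log_score, formant_hz) pairs, one per candidate
    transcription (hypothetical format). Scores become weights via softmax.
    """
    top = max(score for score, _ in candidates)  # subtract max for stability
    weights = [exp(score - top) for score, _ in candidates]
    total = sum(weights)
    return sum(w * f for w, (_, f) in zip(weights, candidates)) / total


# Two equally likely candidates: the estimate falls halfway between them.
print(weighted_formant([(0.0, 500.0), (0.0, 600.0)]))  # 550.0
```

A candidate with a much lower log score contributes almost nothing, so a single implausible transcription cannot pull the estimate far.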

SLIDE 30

Conclusions

Feasibility Test: Southern Vowel Shift evident with automated transcription and analysis
Suggests that meaningful sociophonetic results can be drawn from a completely automated method
As ASR improves, automated methods will become more reliable for fast analyses of vast amounts of speech

SLIDE 31

Obama’s Vowel Space from CAVE

2014 State of the Union speech (65 min)

[Figure: CAVE vowel space; axes F2 and F1; vowels EH AH AY EY IH IY UH AE AA AO UW OW ER OY AW]

Acknowledgments: This project was supported by the Neukom Institute and the Karen Wetterhahn Award at Dartmouth
