Why Neural Translations are the Right Length. Xing Shi, Kevin Knight and Deniz Yuret; EMNLP 2016. PowerPoint PPT Presentation.



SLIDE 1

Why Neural Translations are the Right Length

Xing Shi, Kevin Knight and Deniz Yuret;

EMNLP 2016

SLIDE 2

What is the fundamental question for a PhD student?

SLIDE 3

How to publish a lot of high-quality papers?

SLIDE 4

How to graduate in 5 years?

SLIDE 5

How to publish a lot of high-quality papers? How to graduate in 5 years?

PhD Life || MT

SLIDE 6

How to publish a lot of high-quality papers? How to graduate in 5 years?

PhD Life || MT
H-index || BLEU
5 years || right length

SLIDE 7

2-layer, 1000-hidden-unit, non-attentional LSTM seq2seq:

Language Pair        BLEU   Length Ratio (MT output / reference)
English => Spanish   31.0   0.97
English => French    29.8   0.96

SLIDE 8

English: does he know about phone hacking ?
French reference: a-t-il connaissance du piratage téléphonique ?
French translation: <UNK> <UNK> <UNK> <UNK> ?

SLIDE 9

When to stop?

PBMT: [- - - -] → [- x - -] → [x x x x]
Neural MT: Word → Word → <EOS>

SLIDE 10

When to stop? How to generate the right length?

PBMT: [- - - -] → [- x - -] → [x x x x]
  • word-penalty feature

Neural MT: Word → Word → <EOS>
  • no explicit penalty
SLIDE 11

When to stop? How to generate the right length?

Statistical MT: [- - - -] → [- x - -] → [x x x x]
  • word-penalty feature
  • MERT

Neural MT: Word → Word → <EOS>
  • no explicit penalty
  • MLE
SLIDE 12

When to stop? How to generate the right length?

Statistical MT: [- - - -] → [- x - -] → [x x x x]
  • word-penalty feature
  • MERT
  • heavy beam search

Neural MT: Word → Word → <EOS>
  • no explicit penalty
  • MLE
  • light beam search (beam = 10)
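The NMT side of this contrast can be sketched in code. The decoder below is a minimal, hypothetical beam search with no explicit word penalty: a hypothesis finishes only when the model itself scores <EOS> highly. The `toy_scores` "model" is invented for illustration (it prefers <EOS> after three words); it stands in for the real LSTM's softmax.

```python
import math

EOS = "<EOS>"

def toy_scores(prefix):
    # Hypothetical next-token log-probabilities standing in for an NMT
    # softmax: prefer another word early, prefer <EOS> after 3 words.
    if len(prefix) >= 3:
        return {EOS: math.log(0.9), "word": math.log(0.1)}
    return {EOS: math.log(0.1), "word": math.log(0.9)}

def beam_search(beam_size=10, max_len=20):
    # Each hypothesis: (log-probability, tokens so far, finished flag).
    beams = [(0.0, [], False)]
    for _ in range(max_len):
        candidates = []
        for logp, toks, done in beams:
            if done:
                candidates.append((logp, toks, True))
                continue
            for tok, s in toy_scores(toks).items():
                if tok == EOS:
                    candidates.append((logp + s, toks, True))
                else:
                    candidates.append((logp + s, toks + [tok], False))
        # Light beam: keep only the top-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: -c[0])[:beam_size]
        if all(done for _, _, done in beams):
            break
    return beams[0][1]

print(beam_search())
```

No length feature is tuned anywhere; the output length is whatever maximizes the model's own probability, which is exactly why the slides ask where that length knowledge lives.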
SLIDE 13

Toy Example: String Copy

a a a b b <EOS> → a a a b b <EOS>
b b a <EOS> → b b a <EOS>

Train: 2,500 random strings
Single-layer LSTM with 4 hidden units
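The training data for this toy task is easy to sketch. The alphabet comes from the slide; the maximum string length and the random seed below are assumptions, since the slide only states "2,500 random strings":

```python
import random

def make_copy_dataset(n=2500, alphabet=("a", "b"), max_len=6, seed=0):
    # Each example maps a random string (terminated by <EOS>) to itself:
    # the seq2seq model must learn both to copy and to stop at the
    # right length.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        length = rng.randint(1, max_len)
        s = [rng.choice(alphabet) for _ in range(length)] + ["<EOS>"]
        data.append((s, s))
    return data

pairs = make_copy_dataset()
print(len(pairs), pairs[0])
```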

SLIDE 14

b a <s> b a b a <EOS>

C_t = [-2.1, 2.0, 0.5, 0.6]

Toy Example: String Copy

SLIDE 15

The update of C_t involves only elementwise + and ×.

Toy Example: String Copy
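That elementwise update can be written out directly. The sketch below uses the standard LSTM cell-state rule, C_t = f ⊙ C_{t-1} + i ⊙ c̃_t; the gate values are made up for illustration, chosen so that unit 1 drops by exactly 1.0, the counting behavior the next slides describe:

```python
def lstm_cell_update(c_prev, f, i, c_tilde):
    # C_t = f * C_prev + i * c_tilde, all operations elementwise:
    # no matrix multiplication touches the cell state directly.
    return [fg * c + ig * ct for c, fg, ig, ct in zip(c_prev, f, i, c_tilde)]

c_prev  = [-2.1, 2.0, 0.5, 0.6]   # the 4 hidden units from the slide
f       = [1.0, 1.0, 1.0, 1.0]    # forget gate fully open: keep old state
i       = [1.0, 0.0, 0.0, 0.0]    # input gate open only for unit 1
c_tilde = [-1.0, 0.0, 0.0, 0.0]   # candidate pushes unit 1 down by 1.0

print(lstm_cell_update(c_prev, f, i, c_tilde))  # unit 1: -2.1 -> -3.1
```

Because only + and × are involved, a unit can accumulate a running count across time steps without interference from the other units.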

SLIDE 16-20

[Plots: trajectories of the cell state while copying; x-axis: unit_1, y-axis: unit_2]

At the end of encoding, unit_1 = -len(input_string)

SLIDE 21

Encoding: cell state unit_1 decreases by 1.0 per input token.

<s> b b b a b a → <s> b b b a b a <EOS>

Toy Example: String Copy

SLIDE 22

Encoding: cell state unit_1 decreases by 1.0 per input token.

Toy Example: String Copy

<s> b b b a b a → <s> b b b a b a <EOS>

Decoding: cell state unit_1 increases by 1.0 per output token.
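This count-down/count-up behavior amounts to a very simple program. The sketch below is not the LSTM itself, just an illustration of the algorithm the learned unit implements: decrement per source token, increment per target token, emit <EOS> when the counter returns to zero.

```python
EOS = "<EOS>"

def translate_length(source_tokens):
    # A single "counting" unit, as in the toy LSTM: down by 1.0 per
    # source token read, up by 1.0 per target token produced.
    counter = 0.0
    for _ in source_tokens:        # encoding
        counter -= 1.0
    output = []
    while counter < 0.0:           # decoding
        output.append("tok")       # placeholder target word
        counter += 1.0
    output.append(EOS)             # counter back at zero: stop
    return output

print(translate_length(["b", "b", "a"]))  # 3 words + <EOS>
```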

SLIDE 23

English => French
2-layer LSTM, 1000 hidden units, non-attentional
BLEU = 29.8

Full Scale NMT

SLIDE 24

Sentence_i: "It is raining right now"
Y (time step): 1  2  3  4  5
X: 1000 cell states at each time step

Y = w1 * X1 + w2 * X2 + … + w1000 * X1000 + b

In total: 143,379 (Y, X) pairs

Full Scale NMT

SLIDE 25

Y = w1 * X1 + w2 * X2 + … + w1000 * X1000 + b

R²:
  1000 units in lower layer: 0.990
  1000 units in upper layer: 0.981
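The regression above predicts the time step Y from 1000 cell-state features. As a minimal sketch, the code below fits ordinary least squares with a single hypothetical feature, a unit whose value roughly tracks -t, and computes R² the same way; the data points are invented for illustration.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one feature: y = w * x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b

def r_squared(xs, ys, w, b):
    # R² = 1 - residual sum of squares / total sum of squares.
    my = sum(ys) / len(ys)
    ss_res = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: a cell-state unit that counts down almost linearly.
xs = [-1.0, -2.1, -2.9, -4.2, -5.0]   # unit value at each step
ys = [1, 2, 3, 4, 5]                  # time step Y
w, b = fit_line(xs, ys)
print(round(r_squared(xs, ys, w, b), 3))  # close to 1.0
```

A high R², as in the table above, means the time step is almost a linear function of the cell states, i.e. the network really does keep an (implicit) length counter.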

Full Scale NMT

SLIDE 26

Full Scale NMT

SLIDE 27

Encoding: units 109 and 334 decrease from above zero.
Decoding: they increase again; once they are back above zero, the model is ready to generate <EOS>.

SLIDE 28

Conclusion

         Toy Example                   Full-Scale NMT
Who?     unit_1 controls the length    units 109 and 334 contribute to the length
How?     counts down during encoding,  decrease during encoding, increase during
         counts up during decoding     decoding; <EOS> once back above zero

SLIDE 29

Thanks and Q&A