SLIDE 1

New Territory of Machine Translation

Kyunghyun Cho

Courant Institute of Mathematical Sciences & Center for Data Science, New York University

SLIDE 2

SLIDE 3

I really enjoyed this film. However, that is on the basis that Science Fiction is one of my favourite genres: I can see some audiences finding the philosophical plotting too slow and wordy to hold their interest. But if you like your films deep and thought-provoking, as well as deliciously tense in places, then this might be for you.

http://www.imdb.com/title/tt0470752/reviews?ref_=tt_urv

Word segmentation, tokenization, …
I really enjoyed this film. → (I, really, enjoyed, this, film, .)

Machine Translation
(I, really, enjoyed, this, film, .) → (J’, ai, vraiment, aimé, ce, film, .)

Detokenization, …
(J’, ai, vraiment, aimé, ce, film, .) → J'ai vraiment aimé ce film.

Google Translate output:
Je vraiment aimé ce film. Cependant, ce qui est sur la base que la science-fiction est un de mes genres préférés: je peux voir certains publics trouver le tracé philosophique trop lent et verbeux pour maintenir leur intérêt. Mais si vous aimez vos films et profonde réflexion, ainsi que délicieusement tendue dans les lieux, alors ce pourrait être pour vous.

SLIDE 4

However, that is on the basis that Science Fiction is one of my favourite genres:

Word segmentation, tokenization, …
(However, ,, that, is, on, the, basis, that, Science, Fiction, is, one, of, my, favourite, genres, :)

Machine Translation
(Cependant, ,, ce, qui, est, sur, la, base, que, la, science-fiction, est, un, de, mes, genres, préférés, :)

Detokenization, …
Cependant, ce qui est sur la base que la science-fiction est un de mes genres préférés:

(Same source review and Google Translate output as on Slide 3.)

SLIDE 5

Do you see three issues here?

SLIDE 6

(Same translation-pipeline slide as Slide 3.)

Sentence-wise Translation
SLIDE 7

(Same translation-pipeline slide as Slide 3.)

Word-level Translation

SLIDE 8

(Same translation-pipeline slide as Slide 3.)

Bilingual Translation

SLIDE 9

Word-level, Sentence-wise, Bilingual Translation

SLIDE 10

(Same translation-pipeline slide as Slide 3.)

SLIDE 11

Neural Machine Translation

  • Input: a source sentence X = (x_1, x_2, …, x_{T_x})

  • Output: a target sentence Y = (y_1, y_2, …, y_{T_y})

  • Data: a parallel corpus of sentence pairs {(X^1, Y^1), (X^2, Y^2), …, (X^N, Y^N)}

  • Goal: maximize the log-likelihood

      (1/N) Σ_{n=1}^{N} Σ_{t=1}^{T_{y,n}} log p(y^n_t | y^n_{<t}, X^n)

[Figure: an RNN encoder-decoder. Encoder states h_1 … h_4 read the source words x_1 … x_4 and are summarized into c; decoder states z_0 … z_4, started from ⟨s⟩, emit the target words y_1 … y_4.]
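To make the objective concrete, here is a minimal, self-contained numpy sketch that scores a toy parallel corpus under a vanilla-RNN encoder-decoder. Everything here (sizes, the tanh parameterizations of φ_enc, φ_dec, f_init, and ⟨s⟩ = id 0) is an illustrative assumption, not the talk's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 8                                   # toy vocabulary and state sizes
BOS = 0                                        # assumed id of the start symbol <s>
E = rng.normal(size=(V, d))                    # word embeddings
W_enc = rng.normal(size=(d, d)) * 0.1
W_dec = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(V, d)) * 0.1

def log_likelihood(x_ids, y_ids):
    h = np.zeros(d)
    for x in x_ids:                            # h_t = phi_enc(h_{t-1}, x_t), h_0 = 0
        h = np.tanh(W_enc @ h + E[x])
    c = h                                      # c = h_{T_x}
    z, prev, total = np.tanh(c), BOS, 0.0      # z_0 = f_init(c)
    for y in y_ids:
        z = np.tanh(W_dec @ z + E[prev] + c)   # z_t = phi_dec(z_{t-1}, c, y_{t-1})
        logits = W_out @ z                     # phi_out(z_t)
        total += logits[y] - np.log(np.exp(logits).sum())  # log p(y_t | y_<t, X)
        prev = y
    return total

# (1/N) sum_n sum_t log p(y^n_t | y^n_{<t}, X^n) over a toy "parallel corpus"
pairs = [([1, 2, 3], [4, 5]), ([2, 2], [6, 7, 8])]
print(np.mean([log_likelihood(x, y) for x, y in pairs]))
```

Training would ascend the gradient of this quantity; the sketch only evaluates it.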

SLIDE 12

Neural Machine Translation

  • Input and Output: sequences of one-hot vectors
  • One-hot vectors for words
  • Example: “cat”
  • No prior (similarity between words) encoded
  • Permutation invariant: reordering the vocabulary IDs changes nothing

ID      Word
1       the
2       a
…       …
2093    cat

e_cat = (0, …, 0, 1, 0, …, 0)^⊤, with the single 1 in the 2093rd element

[Figure: the same encoder-decoder diagram as on Slide 11.]
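A tiny illustration of these bullets, with a hypothetical vocabulary size and the IDs from the slide's table:

```python
import numpy as np

vocab = {"the": 1, "a": 2, "cat": 2093}        # word -> ID, as in the table above
V = 20000                                      # assumed vocabulary size

def one_hot(word):
    e = np.zeros(V)
    e[vocab[word] - 1] = 1.0                   # IDs are 1-based on the slide
    return e

print(one_hot("cat").argmax() + 1)             # -> 2093
# No prior encoded / permutation invariant: all distinct one-hot vectors are
# orthogonal and equidistant, so no similarity structure exists until learned.
print(one_hot("cat") @ one_hot("the"))         # -> 0.0 for any distinct pair
```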

SLIDE 13

Neural Machine Translation

  • 1. Encode the source sentence into a vector c:
       h_t = φ_enc(h_{t−1}, x_t) if t > 0, and h_0 = 0;   c = h_{T_x}
  • 2. Initialize the decoder based on c:   z_0 = f_init(c)
  • 3. Update the decoder conditioned on c:   z_t = φ_dec(z_{t−1}, c, y_{t−1})
  • 4. Compute the target word distribution:   p(y_t | y_{<t}, X) ∝ exp(φ_out(z_t))
  • 5. Sample y_t and go back to 3 unless y_t = ⟨eos⟩

[Figure: the same encoder-decoder diagram as on Slide 11.]
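A self-contained sketch of steps 1-5 with a sampling decoder; the toy parameters and the conventions ⟨s⟩ = 0 and ⟨eos⟩ = V-1 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 10, 8
E = rng.normal(size=(V, d))
W_enc = rng.normal(size=(d, d)) * 0.1
W_dec = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(V, d)) * 0.1
BOS, EOS = 0, V - 1                            # assumed special symbols

def translate(x_ids, max_len=20):
    h = np.zeros(d)
    for x in x_ids:                            # 1. h_t = phi_enc(h_{t-1}, x_t)
        h = np.tanh(W_enc @ h + E[x])
    c = h                                      #    c = h_{T_x}
    z, prev, out = np.tanh(c), BOS, []         # 2. z_0 = f_init(c)
    for _ in range(max_len):
        z = np.tanh(W_dec @ z + E[prev] + c)   # 3. z_t = phi_dec(z_{t-1}, c, y_{t-1})
        p = np.exp(W_out @ z)                  # 4. p(y_t | y_<t, X) ∝ exp(phi_out(z_t))
        p /= p.sum()
        prev = rng.choice(V, p=p)              # 5. sample y_t ...
        if prev == EOS:                        #    ... and stop at <eos>
            break
        out.append(int(prev))
    return out

print(translate([1, 2, 3]))
```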

SLIDE 14

Neural Machine Translation

This is not too great a model, because

[Figure: the same encoder-decoder diagram; everything about the source must squeeze through the single fixed-size vector c.]

“You can’t cram the meaning of a whole %&!$# sentence into a single $&!#* vector!”

Ray Mooney

SLIDE 15

Attention-based Neural MT

  • Encoder: Bidirectional RNN
  • Context-Dependent Word Vectors
  • Disambiguation of words’ meaning
  • I have a car vs. I have been a researcher
  • Distinguishing multiple occurrences of a single word
  • A black cat is chasing a white cat together with a brown cat.

[Figure: a bidirectional RNN encoder. Forward states →h_1 … →h_4 and backward states ←h_1 … ←h_4 over x_1 … x_4 are concatenated into context-dependent word vectors h_1 … h_4, whose set forms c.]

SLIDE 16

Attention-based Neural MT

  • Attention Mechanism
  • How relevant is h_j given z_{t−1}?   e_{j,t} = f_score(h_j, z_{t−1})
  • z_{t−1}: what has been translated so far
  • Normalized to sum to 1:   α_{j,t} = exp(e_{j,t}) / Σ_{j′} exp(e_{j′,t})

[Figure: the bidirectional encoder over x_1 … x_4, with f_score scoring, e.g., e_{2,3} between the annotation vector h_2 and the decoder state z_2.]
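A sketch of these two formulas, assuming a dot-product f_score for concreteness (in attention-based NMT, f_score is a small learned network):

```python
import numpy as np

rng = np.random.default_rng(2)
Tx, d = 4, 8
H = rng.normal(size=(Tx, d))                   # annotation vectors h_1 .. h_Tx
z_prev = rng.normal(size=d)                    # decoder state z_{t-1}

e = H @ z_prev                                 # e_{j,t} = f_score(h_j, z_{t-1})
alpha = np.exp(e - e.max())                    # softmax (shifted for stability)
alpha /= alpha.sum()                           # alpha_{j,t} sums to 1 over j
print(alpha, alpha.sum())
```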

SLIDE 17

Attention-based Neural MT

  • Decoder
  • Dynamic context:   c_t = Σ_j α_{j,t} h_j
  • As usual with the simple decoder:
       z_t = φ_dec(z_{t−1}, c_t, y_{t−1}),   p(y_t | y_{<t}, X) ∝ exp(φ_out(z_t))

[Figure: the decoder state z_3 computed from the attention-weighted sum (weights α_{1,3} … α_{4,3}) of h_1 … h_4.]
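Continuing the previous sketch, one full decoder step with the dynamic context; the parameterizations of φ_dec and φ_out are again toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
Tx, d, V = 4, 8, 10
H = rng.normal(size=(Tx, d))                   # h_1 .. h_Tx
z_prev = rng.normal(size=d)                    # z_{t-1}
emb_prev = rng.normal(size=d)                  # embedding of y_{t-1}
W_dec = rng.normal(size=(d, d)) * 0.1
W_ctx = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(V, d)) * 0.1

alpha = np.exp(H @ z_prev)                     # attention weights (dot-product f_score)
alpha /= alpha.sum()
c_t = alpha @ H                                # c_t = sum_j alpha_{j,t} h_j
z_t = np.tanh(W_dec @ z_prev + W_ctx @ c_t + emb_prev)  # phi_dec(z_{t-1}, c_t, y_{t-1})
p = np.exp(W_out @ z_t)                        # p(y_t | y_<t, X) ∝ exp(phi_out(z_t))
p /= p.sum()
print(p.argmax())
```

Unlike the fixed c of the simple model, c_t is recomputed at every step t, so the decoder can look at a different part of the source for each target word.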

SLIDE 18

Word-level → Subword-level, Sentence-wise, Bilingual Translation

SLIDE 19

(Same translation-pipeline slide as Slide 3.)

Word-level Translation

SLIDE 20

Why do we use tokens (words)?

  • Input and Output: sequences of one-hot vectors
  • Issues with word-level modelling (a toy demonstration follows below):
  • “산업통산자원부” => ?   vs.   “산업” “통산” “자원” “부” => Ministry of Trade, Industry and Energy
  • “run”, “runner”, “ran”, “running”: all independent from each other?
  • 我高能里: what if there are no blank spaces?
  • Typos?

[The same ID/one-hot table as on Slide 12.]
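A toy demonstration (with hypothetical IDs) of two of the issues above: morphologically related forms get unrelated rows, and whitespace tokenization fails when there are no blank spaces:

```python
vocab = {"run": 17, "runner": 4821, "ran": 903, "running": 12777}  # assumed IDs

# To a one-hot model these four forms are as unrelated as four random words:
print(sorted(vocab.values()))       # nothing ties 17, 903, 4821, 12777 together

text = "산업통산자원부"               # a compound written without spaces
print(text.split())                 # -> ['산업통산자원부']: whitespace tokenization
                                    #    returns one giant "word", very likely
                                    #    out-of-vocabulary
```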

SLIDE 21

[Figure: a 2-D visualization of learned word vectors, in which related words cluster together: Canada / Canadian / Ontario, Europe / European / Union / EU, Africa / African, Germany / Kingdom / British, North, Assembly, Iraq, Japan.]

Can we use something smaller?

  • Input and Output: sequences of one-hot vectors
  • What does a neural net do with one-hot vectors?
  • It learns, for each index, a vector that preserves the similarities that matter for the training objective.
  • It does not care whether each index corresponds to a word.

[The same ID/one-hot table as on Slide 12.]

SLIDE 22

Can we use something smaller?

  • Input and Output: sequences of one-hot vectors
  • Can we just use characters directly? — Maybe. Maybe not.
  • Orthography => meaning is highly nonlinear. Example: quite, quiet, quit
  • Computational complexity increases greatly:
    an English sentence is ~23 words but ~115 letters on average

SLIDE 23

Encoding a character sequence

  • Yoon Kim et al. (AAAI 2015)
  • 1. Segment the sentence into a sequence of words:   X = (x_1, x_2, …, x_{T_x})
  • 2. Each word is a sequence of characters:   x_t = (c^t_1, c^t_2, …, c^t_{T_t})
  • 3. n-gram convolutions + max-pooling
  • 4. Highway network
  • Ling et al. (EMNLP 2015, arXiv 2015), Ballesteros et al. (EMNLP 2015): with recurrent nets
  • Zhang et al. (NIPS 2015): no word segmentation
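A minimal numpy sketch of step 3's character n-gram convolution + max-pooling word encoder, in the spirit of this family of models; all sizes and parameters are toy assumptions and the highway network is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)
n_chars, d_char, n_filters, n = 128, 15, 25, 3   # ASCII charset, char-embedding dim,
C = rng.normal(size=(n_chars, d_char))           # number of filters, n-gram width
F = rng.normal(size=(n_filters, n, d_char)) * 0.1

def encode_word(word):
    chars = C[[ord(ch) for ch in word]]          # (T_t, d_char) character matrix
    windows = np.stack([chars[i:i + n]           # all character n-grams
                        for i in range(len(word) - n + 1)])
    feats = np.tanh(np.einsum("wnd,fnd->wf", windows, F))  # convolve each window
    return feats.max(axis=0)                     # max-pool over positions

# Related forms now share character n-grams ("run"), so their vectors are
# related, unlike independent one-hot rows:
for w in ["run", "runner", "running"]:
    print(w, np.round(encode_word(w)[:3], 3))
```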

SLIDE 24

Decoding a character sequence

  • Wang Ling et al. (arXiv 2015)
  • Word-level recurrent net:   h_t = φ_word(h_{t−1}, f(x_t))
  • Character-level recurrent net:
  • 1. Given the word vector h_t
  • 2. Update the state:   z_t = φ_char(z_{t−1}, c_{t−1})
  • 3. Decode the next character:   p(c_t | …) ∝ exp(g_{c_t}(z_t))
  • 4. Repeat 2-3 while the decoded character c̃_t is not ⟨eow⟩
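A toy sketch of the two-level idea: a word-level state h_t seeds a character-level RNN that spells the word out. The parameterization and the ⟨eow⟩ convention (id 0) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n_chars, d = 128, 8
EOW = 0                                        # assumed <eow> character id
Ec = rng.normal(size=(n_chars, d))             # character embeddings
W_char = rng.normal(size=(d, d)) * 0.1
W_init = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(n_chars, d)) * 0.1

def spell(h_t, max_len=12):
    """Spell one word, character by character, from the word-level state h_t."""
    z, prev, chars = np.tanh(W_init @ h_t), EOW, []
    for _ in range(max_len):
        z = np.tanh(W_char @ z + Ec[prev])     # z_t = phi_char(z_{t-1}, c_{t-1})
        p = np.exp(W_out @ z)                  # p(c_t | ...) ∝ exp(g_{c_t}(z_t))
        p /= p.sum()
        prev = int(p.argmax())                 # greedy choice of the next character
        if prev == EOW:                        # stop at <eow>
            break
        chars.append(chr(prev))
    return "".join(chars)

print(repr(spell(rng.normal(size=d))))         # gibberish, untrained, but runs
```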

SLIDE 25

Has it been solved? — No

  • Two research directions:
  • 1. Can we segment and encode*/decode simultaneously, or avoid segmentation altogether?
  • 2. Is the character the ultimate level?
  • Some say “bytes” (Gillick et al., arXiv 2015)
  • But not for all languages. Korean: consonant + vowel (+ consonant) => syllable,
    and Unicode encodes each and every syllable:

    경 = ㄱ + ㅕ + ㅇ     역 = ㅇ + ㅕ + ㄱ

* (Zhang et al., NIPS 2015)
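The syllable structure is fully mechanical: precomposed Hangul syllables can be decomposed into their jamo with the standard Unicode arithmetic (block base U+AC00, 21 medial vowels, 28 optional finals), as this small sketch shows:

```python
# Standard jamo inventories for decomposition (19 initials, 21 vowels, 28 finals).
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(syllable):
    offset = ord(syllable) - 0xAC00            # precomposed block starts at U+AC00
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead] + VOWELS[vowel] + TAILS[tail]

print(decompose("경"))   # -> ㄱㅕㅇ
print(decompose("역"))   # -> ㅇㅕㄱ  (the same jamo in different positions)
```

So a character-level model over Korean text sees whole syllables, not the sub-syllabic units that carry the regularities.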

SLIDE 26

Subword-level, Sentence-wise → Larger-Context, Bilingual Translation

SLIDE 27

(Same translation-pipeline slide as Slide 3.)

Sentence-wise Translation
SLIDE 28

Let’s talk about Q&A

  • Input and Output: Question and Answer
  • Question: Where did James go after he went to the grocery store?
  • Answer Candidates
  • 1. his deck
  • 2. his freezer
  • 3. a fast food restaurant
  • 4. his room

Impossible to answer!

SLIDE 29

Context matters…

  • Question: Where did James go after he went to the grocery store?
  • Answer Candidates: his deck, his freezer, a fast food restaurant, his room

James the Turtle was always getting in trouble. Sometimes he’d reach into the freezer and empty out all the food. Other times he’d sled on the deck and get a splinter. His aunt Jane tried as hard as she could to keep him out of trouble, but he was sneaky and got into lots of trouble behind her back. One day, James thought he would go into town and see what kind of trouble he could get into. He went to the grocery store and pulled all the pudding off the shelves and ate two jars. Then he walked to the fast food restaurant and ordered 15 bags of fries. He didn’t pay, and instead headed home.

SLIDE 30

Context matters

  • What does context tell us?
  • Theme of a document
  • What does context tell us in practice?
  • What are the words that are more likely to appear in this document?

While it's not flawless, some motivations and scenarios remain somewhat underdeveloped or questionable; Ex_Machina is a stunning Sci-Fi vision that is also a fully formed thinking man's thriller. With a jaw droopingly good turn from the soon to be megastar Vikander, ____?____ is another excellent example of what makes the ____?____

(Context → following sentence: the blanks are to be predicted from the preceding document context.)

SLIDE 31

Larger-Context Language Modelling

  • Language modelling as “document modelling” instead of “sentence modelling”
    (Wang & Cho, arXiv 2015; Ji et al., arXiv 2015):

      P(D) ≈ P(S_1) P(S_2) ⋯ P(S_N)                                  (sentence modelling)
      P(D) ≈ P(S_1) P(S_2 | S_1) ⋯ P(S_N | S_{N−n}, …, S_{N−1})      (document modelling)

  • Simplest approach (Wang & Cho, arXiv 2015), sketched below:
  • Bag of all the words from the n previous sentences (S_{l−n}, S_{l−n+1}, …, S_{l−1})
  • RNN language model conditioned on this bag-of-words

[Figure: an RNN language model over words w_1 … w_4, started from ⟨s⟩, with the bag-of-words context as extra input.]
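A minimal sketch of that simplest approach: every step of a toy RNN language model also receives a bag-of-words vector of the previous sentences. All parameters and the ⟨s⟩ convention are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
V, d = 10, 8
E = rng.normal(size=(V, d))
W = rng.normal(size=(d, d)) * 0.1
W_bow = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(V, d)) * 0.1

def sentence_log_prob(sentence, prev_sentences):
    bow = np.zeros(d)
    for s in prev_sentences:                   # bag of all the words from the
        bow += E[s].sum(axis=0)                # previous sentences
    z, prev, total = np.zeros(d), 0, 0.0       # word id 0 as <s>, an assumption
    for w in sentence:
        z = np.tanh(W @ z + E[prev] + W_bow @ bow)   # context fused into each step
        logits = W_out @ z
        total += logits[w] - np.log(np.exp(logits).sum())
        prev = w
    return total

doc = [[1, 2, 3], [4, 5], [6, 7, 8]]           # a toy three-sentence "document"
print(sum(sentence_log_prob(s, doc[:i]) for i, s in enumerate(doc)))  # log P(D)
```

Here the context enters every update ("early fusion"); the next slide's late fusion instead keeps it out of the LSTM memory cell and merges it in afterwards.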

SLIDE 32

Larger-Context Language Modelling

  • Late Fusion of LSTM (Wang & Cho, arXiv 2015)
  • Let the memory cell model intra-sentence dependencies
  • Let the inter-sentence dependencies be fused in later

[Figure: early fusion vs. late fusion of the context vector c into the LSTM.]

SLIDE 33

Larger-Context Language Modelling

  • It clearly helps (Wang & Cho, arXiv 2015)
  • Especially with the late fusion of context
SLIDE 34

Larger-Context Language Modelling

  • “What are the words that are more likely to appear in this document?”
  • Open-class words: nouns, adjectives, verbs and adverbs

[Figure: per-word-class results on IMDB and PTB.]

SLIDE 35

Larger-Context Machine Translation

  • Toward Larger-Context Machine Translation (Jean & Cho, Work in Progress*)
  • How do we represent the source and target contexts
    (X^{l−n}, X^{l−n+1}, …, X^{l−1}) and (Y^{l−n}, Y^{l−n+1}, …, Y^{l−1})?
    For instance, with a summary function f_summary.
  • Which is conditioned on the context: the encoder, the decoder, or both?

[Figure: the attention-based encoder-decoder, with the previous source and target sentences summarized by f_summary as additional conditioning.]

* Unless Chris Dyer uploads it on arXiv tomorrow

SLIDE 36

Larger-Context Language Processing

  • Beyond Machine Translation toward Larger-Context Language Processing
  • How do we efficiently learn and represent larger context?
    (1) Hierarchical model?
    (2) Something other than BPTT?

[Figure: a document hierarchy (document → chapters → sections → paragraphs), with a “Summarize!” step at each level.]

SLIDE 37

Larger-Context Language Processing

  • Hierarchical Model for Dialogue Modelling (Serban et al., 2015; Sordoni et al., 2015):
    utterance-level RNN + dialogue-level RNN

SLIDE 38

Larger-Context Language Processing

  • Beyond Machine Translation toward Larger-Context Language Processing
  • How do we blend intra-document context and world knowledge D_1, D_2, …, D_M?

      P(S_l | S_1, S_2, …, S_{l−1}, D_1, D_2, …, D_M)

[Figure: a document’s chapters alongside an external world-knowledge store.]

SLIDE 39

Subword-level, Larger-Context, Bilingual → Multilingual Translation

SLIDE 40

(Same translation-pipeline slide as Slide 3.)

Bilingual Translation

SLIDE 41

Language Transfer?

  • Does knowing one language F very well help learn a new language E?
  • i.e., is there a positive language transfer from F to E?
  • Unsolved question for both humans and machines…
SLIDE 42

Multilingual neural machine translation

  • N source languages and M target languages
  • A single model for N × M language pairs
  • Bilingual parallel corpora available: no N-way parallel corpus assumed
    (as few as max {N, M} bilingual corpora can cover every language)
  • Goals, Expectations:
  • 1. Avoid the quadratic explosion of the number of parameters (see the count below)
  • 2. Quality as good as single-pair models
  • 3. Better quality for low-resource language pairs: positive language transfer
  • 4. Faster convergence (rather, a hope!)
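A back-of-the-envelope check of goal 1, with hypothetical N and M: sharing one encoder per source language and one decoder per target language replaces N × M pair-specific models with N + M components:

```python
N, M = 10, 10                  # hypothetical numbers of source and target languages
pair_specific = N * M          # one full encoder-decoder per language pair
shared = N + M                 # one encoder per source + one decoder per target
print(pair_specific, shared)   # 100 vs. 20: quadratic vs. linear growth
```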

SLIDE 43
Multilingual sequence-to-sequence model

  • Straightforward and beautiful! (Luong et al., arXiv 2015)

[Figure: two encoder-decoder models whose encoders map different source languages into a shared, language-agnostic vector space that feeds the decoders.]

SLIDE 44
Multilingual sequence-to-sequence model

  • Straightforward and beautiful! (Luong et al., arXiv 2015)
  • Properties:
  • # of parameters: O(N + M) instead of O(N × M)
  • Language-agnostic sentence vectors: interlingua? Semantics w/o syntax?
  • But, what about the attention mechanism?

[Figure: the same shared-vector-space diagram as on Slide 43.]

SLIDE 45
Multilingual sequence-to-sequence model

  • But, what about the attention mechanism?
  • Big question: is the attention mechanism universal or language-pair-specific?
  • Or, we give up and go for 1-to-N or N-to-1 translation (Dong et al., ACL 2015)

[Figure: the same pair of encoder-decoder models.]

SLIDE 46
Is the attention mechanism universal?

  • Empirical verification of the universal attention mechanism (Firat & Cho, Work-in-Progress)
  • {En, De, Fi} => {En, De}: trained with (en,de) and (en,fi) parallel corpora

[Figure: English, German and Finnish bidirectional encoders and English and German decoders, all routed through a single shared attention mechanism (f_score).]

SLIDE 47
Is the attention mechanism universal?

  • Empirical verification of the universal attention mechanism (Firat & Cho, Work-in-Progress)
  • {En, De, Fi} => {En, De}: trained with (en,de) and (en,fi) parallel corpora
  • It seems like it works (!?) after 8 iterations on the model architecture

Single-pair models:
             BLEU
  En->De     21.03
  De->En     25.77
  Fi->En     13.32

One multilingual model:
             BLEU      Diff
  En->De     20.45     −0.58
  De->En     24.53     −1.24
  Fi->En     14.33     +1.09

SLIDE 48
  • Empirical verification of the universal attention mechanism (Firat & Cho, Work-in-Progress)
  • Controlled corpus sizes: 40%, 20%, 10% and 5% of (en,fi) corpus
  • Curves with circles (single-pair model) vs. without any marker (multilingual model)
  • Better generalization, though much remains to be analyzed in the future

Is there positive language transfer from high- to low-resource pairs?

SLIDE 49
Multilingual to Multimodal and Multitask Learning

  • How far can we push this attention-based model?

[Figure: English and German bidirectional encoders, a convolutional image encoder and a speech encoder all produce annotation vectors h_j in a modality-agnostic space; a single shared attention mechanism (f_score, with weights summing to 1 over j) feeds English, Finnish, speech and image decoders.]
SLIDE 50

Subword-level, Larger-Context, Multilingual Translation: the New Territory of Machine Translation

SLIDE 51

Why is this a talk at a NIPS workshop and not at ACL?

SLIDE 52

Remember this?

[Figure: the classical pipeline: single frame → feature extraction → recognition engine → {Man, Woman, Motorcycle, Bus}.]

SLIDE 53

Deep learning brought us to…

[Figure: single frame → recognition engine → {Man, Woman, Motorcycle, Bus}; the hand-crafted feature-extraction stage is gone.]

SLIDE 54

[Figure: the same single-frame recognition pipeline.]

I see four issues here as well

  • 1. Feature-level processing
  • 2. Single modality
  • 3. Static scene: what happened before?
  • 4. Recognition engine tried too much: where’s my library/Google?
SLIDE 55

This is not enough for AI

  • 1. Multimodal direct perception
  • 2. Long-term environment understanding
  • 3. Knowledge accumulation and search
  • 4. Active discovery/learning*
  • 5. Planning*
  • 6. so on…

* I think DeepMind and Facebook are both doing these…

SLIDE 56

MT/NLP as a test bed* for AI

(Artificial) Intelligence:

  • 1. Multimodal direct perception
  • 2. Long-term environment understanding
  • 3. Knowledge accumulation and search
  • 4. Active discovery/learning
  • 5. Planning

Natural Language Understanding:

  • 1. Multilingual Machine Translation
  • 2. Character-Level Language Processing
  • 3. Longer-Context Language Processing
  • 4. Research
  • 5. Creative Writing

* Oh, so many people might hate me, but probably not at NIPS

SLIDE 57