SLIDE 1

Rapid Adaptation of Machine Translation to New Languages

Graham Neubig, Junjie Hu @ EMNLP 11/2/2018

SLIDE 2

Inspiration: Rapid Disaster Response

Disaster in Sri Lanka

Photo Credit: Wikimedia Commons

[Sinhala social-media post, partly garbled in extraction; roughly: relief supplied by a volunteer group to people affected by floods and landslides. #HiruNews #StandBy]

SLIDE 3

How can we effectively and rapidly adapt MT to new languages?

SLIDE 4

Some Crazy Ideas

  • Cross-lingual transfer: can we create a machine translation system by transferring across language boundaries? [Zoph+16]
  • Zero-shot transfer: can we do it with no data in the low-resource language?

SLIDE 5

Multi-lingual Training

[Firat+16, Johnson+17, Ha+17]

  • Train a large multi-lingual MT system and apply it to a low-resource language (data-side sketch below)

[Diagram: many source languages (fra, por, rus, tur, bel, aze, ...) all translating into eng]
SLIDE 6

Two Multilingual Training Paradigms

  • Warm-start training (indicated w/ "+"):
    • We already have some data in the test language
    • Train a model starting with that data
  • Cold-start training (indicated w/ "-"):
    • We initially have no data in the test language
    • Opens possibilities for completely unsupervised transfer
    • Suitable for rapid adaptation to new languages (data-selection sketch below)

[Diagrams: warm-start trains {fra, por, rus, tur, bel, aze, ...} → eng with the test language (aze) included; cold-start trains the same system with aze data excluded (marked ×)]
SLIDE 7

Experiments: Training Setting

  • TED multi-lingual corpus (Qi et al. 2018): https://github.com/neulab/word-embeddings-for-nmt (loading sketch below)

  • 57 source languages, plus English
  • Testbed languages: Azerbaijani (aze), Belarusian (bel), Galician (glg), Slovak (slk)
  • Related languages: Turkish (tur), Russian (rus), Portuguese (por), Czech (ces)
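
For reproducing the setup, a loading sketch along these lines may help. It assumes the multi-way TSV layout of the corpus release in the repo above: a header row of language codes, one column per language, and (by assumption) a "__NULL__" marker for untranslated rows. The file and column names are guesses to check against the actual release.

```python
# Hedged sketch: pull one language pair out of the multi-way TED TSV.
# Layout assumptions: header row of language codes; "__NULL__" marks
# missing translations. Verify against the release in the repo above.
def load_pair(tsv_path: str, src_lang: str, tgt_lang: str = "en"):
    with open(tsv_path, encoding="utf-8") as f:
        header = f.readline().rstrip("\n").split("\t")
        si, ti = header.index(src_lang), header.index(tgt_lang)
        for line in f:
            cols = line.rstrip("\n").split("\t")
            if max(si, ti) >= len(cols):
                continue  # skip malformed rows
            src, tgt = cols[si].strip(), cols[ti].strip()
            if src and tgt and "__NULL__" not in (src, tgt):
                yield src, tgt

# e.g. Azerbaijani-English pairs (file and column names are assumptions)
aze_eng = list(load_pair("all_talks_train.tsv", "az"))
```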
SLIDE 8

Systems

  • Test Systems (data-selection sketch below):
    • Single-source Neural MT (Sing.): test source language only
    • Bi-source Neural MT (Bi.): test source language and a related source language
    • All-source Neural MT (All): all source languages
  • Other Baselines:
    • Phrase-based MT (PBMT): shown to be strong in low-resource settings
    • Unsupervised MT [Artetxe+17]: learns a system using only monolingual data in the source/target languages (cited as effective in low-resource settings)
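
Read together with the warm/cold-start selection sketched earlier, the three test systems differ only in which source corpora they train on. A hypothetical sketch, with the composition inferred from this slide:

```python
# Hedged sketch: the three test systems as data selections.
RELATED = {"aze": "tur", "bel": "rus", "glg": "por", "slk": "ces"}  # slide 7

def system_corpora(corpora, test_lang, system):
    if system == "Sing.":  # single-source: test language only
        return {test_lang: corpora[test_lang]}
    if system == "Bi.":    # bi-source: test language + its related language
        return {lang: corpora[lang] for lang in (test_lang, RELATED[test_lang])}
    if system == "All":    # all-source: every source language
        return dict(corpora)
    raise ValueError(f"unknown system: {system}")
```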

SLIDE 9

How does Cross-lingual Transfer Help?

  • Unsupervised translation not competitive
  • Without transfer, NMT worse than PBMT
  • With transfer, NMT is significantly better (transfer barely helped PBMT)

[Bar chart: BLEU (7.5-30) on aze/tur, bel/rus, glg/por, slk/ces for PBMT, Unsupervised, NMT Sing., NMT Bi+, NMT All+]

SLIDE 10

How Does Cold-start Compare?

  • Large drop, but still much better than nothing
  • Up to 15 BLEU with no training data in the test language

[Bar chart: BLEU (7.5-30) on aze/tur, bel/rus, glg/por, slk/ces for NMT Bi+, NMT All+, NMT Bi-, NMT All-]

SLIDE 11

Adaptation to New Languages

  • Training on all languages can be less effective, especially in the cold-start case
  • Can we further adapt to new languages? (see the sketch after the diagram below)
  • Problem: overfitting to the small test-language corpus

[Diagram: Pre-training on all source languages (fra, por, rus, tur, bel, aze, ... → eng); then Adaptation (All→Sing.): fine-tune on aze → eng only; or Adaptation w/ Similar Language Regularization (All→Bi.): fine-tune on tur + aze → eng]
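
A minimal sketch of the adaptation step, assuming a pre-trained multilingual model that exposes a generic train_step(batch) method (a hypothetical interface, not the paper's actual code): fine-tuning on the test language alone (All→Sing.) risks overfitting its small corpus, so similar-language regularization (All→Bi.) mixes the related language's data into the fine-tuning set.

```python
import random

RELATED = {"aze": "tur", "bel": "rus", "glg": "por", "slk": "ces"}  # slide 7

def adapt(model, corpora, test_lang, similar_lang_reg=True,
          steps=1000, batch_size=32):
    # All -> Sing.: fine-tune on the test language only (overfitting risk)
    data = list(corpora[test_lang])
    if similar_lang_reg:
        # All -> Bi.: also sample from the related language as a regularizer
        data += list(corpora[RELATED[test_lang]])
    for _ in range(steps):
        batch = random.sample(data, min(batch_size, len(data)))
        model.train_step(batch)  # hypothetical training interface
    return model
```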

SLIDE 12

Warm-start + Adaptation

  • Adaptation helps!
  • Helps more w/ similar language regularization

[Bar chart: BLEU (7.5-30) on aze/tur, bel/rus, glg/por, slk/ces for NMT Sing., NMT Bi+, NMT All+, All+→Sing., All+→Bi]

SLIDE 13

Cold-start + Adaptation

  • Adaptation w/ similar-language regularization gains more
  • Approaches the quality of warm-start; doesn't need test-language data a priori

[Bar chart: BLEU (7.5-30) on aze/tur, bel/rus, glg/por, slk/ces for NMT Sing., NMT Bi-, NMT All-, All-→Sing., All-→Bi, All+→Bi]

SLIDE 14

How Fast can we Adapt?

[Line chart: BLEU (0.03-0.21) vs. hours of training (1-10) for Sing., Bi, All-→Sing., All-→Bi, All-→Bi 1-1]

Cold-start adaptation reaches a good point faster than training from scratch

SLIDE 15

Take-aways

  • NMT with massively multi-lingual cross-lingual transfer: a stable recipe for low-resource translation
  • Better results than phrase-based and unsupervised MT on real low-resource languages
  • Adaptation w/ similar-language regularization: simple and effective, even in cold-start scenarios

https://github.com/neubig/rapid-adaptation

Questions?