
SLIDE 1

11-731 Machine Translation

Speech 2 Speech Translation

SLIDE 2

Speech Translation

Three part systems

ASR -> Translation -> TTS

System configurations

One way: phrasal

One way: broadcast/lecture

1.5 way: phrasal with limited answers

Two way: full two way

SLIDE 3

Machine Translation Technologies

Phrasal

Phrase to phrase look up

Template:

Template fillers, fixed translation

Interlingua

Translation into meaning representation

Statistical Machine Translation

From large collection of parallel text

Classification-based translation

Identify classes and deal directly with them

SLIDE 4

Simple Translation

Phrase to Phrase

Greetings

Do you need medical attention?

Relatively easy to build, but limited use

Template translations

The next train leaves at TIME from gate GATE for PLACE

Limited but still useful
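
The two simple approaches on this slide can be sketched in a few lines. This is only an illustration: the phrase table, the Spanish target strings, and the function name `translate` are assumptions made for the example, and the last template slot is read as "for PLACE". Slot values are copied through untranslated.

```python
import re

# Hypothetical phrase table: fixed source phrases mapped to fixed translations.
PHRASES = {
    "hello": "hola",
    "do you need medical attention?": "¿necesita atención médica?",
}

# Hypothetical template: a source pattern with named slots and a target
# string with the same slots. Slot values are copied through unchanged.
TEMPLATE_SRC = re.compile(
    r"the next train leaves at (?P<TIME>\S+) from gate (?P<GATE>\S+) for (?P<PLACE>.+)",
    re.IGNORECASE,
)
TEMPLATE_TGT = "El próximo tren sale a las {TIME} de la puerta {GATE} hacia {PLACE}"


def translate(utterance):
    """Return a translation, or None if the utterance is out of coverage."""
    text = utterance.strip()
    # 1. Phrase-to-phrase lookup (case-insensitive exact match).
    if text.lower() in PHRASES:
        return PHRASES[text.lower()]
    # 2. Template translation: match the pattern, then fill the target slots.
    match = TEMPLATE_SRC.fullmatch(text)
    if match:
        return TEMPLATE_TGT.format(**match.groupdict())
    # 3. Anything else is outside what this kind of system can handle.
    return None
```

Returning `None` for everything outside the table is what makes such systems easy to build but limited in use.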

SLIDE 5

SPEECH Translation

Speech isn’t text

Different style, hard to find lots of examples

Speech isn’t fluent

False starts, hesitations, ungrammatical

ASR never makes errors ☺

SLIDE 6

One Way: Broadcast

One speaker

Lecturer: can modify language model

Multiple speakers

May be repeat speakers (News Anchor)

May have other noises: music etc. (TV programs)

Doesn’t need to be real time (maybe)

SLIDE 7

One Way: “Dialogue”

Voxtec’s Phraselator

One way communication

Recognized “fixed” phrases

Lookup for translations

Very fast deployment for new languages

SLIDE 8

Two Way: Dialog

Users can detect and correct their own errors

Needs to be real time

One user may be much more familiar with the system

How do you teach the other user?

Typically domain directed

SLIDE 9

Two way: Dialog

CMU System:

Janus PDA version

CMU SMT

Cepstral Synthesis

Mobile Tech models

Platform:

COTS PDA (Ipaq)

VoxTec P2

Language:

Iraqi/English, Thai/English, Chinese, Japanese etc.

SLIDE 10

Speech Technology Issues

ASR:

Disfluencies, dialects, speaking style

Unfamiliarity with system

TTS:

MT output isn’t always fluent

TTS says it anyway

Can be hard to understand

SLIDE 11

Speech Technology Issues

Spoken not Written Languages

Arabic vs Arabic Dialects

Mixture of languages

Politeness levels

Gender in speech

SLIDE 12

Transtac: Two-Way S2S System

DARPA developed for

Check points, medical and civil defense

Requirements

Two way

Eyes-free (no screen)

Portable

Usable by real users

SLIDE 13

Transtac System

Laptop secured in Backpack

Optional speech control

Push-to-Talk Buttons

Close-talking Microphone

Small powerful Speakers

SLIDE 14

Transtac System Details

Two way system

2 ASR systems: English and Iraqi

2 way statistical translation

2 synthesizers

Push-to-talk system

(Users don’t like “translate everything mode”)

Echo back ASR result

And then translation
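
The order of events in one push-to-talk turn can be sketched as below. This is a minimal illustration, not the Transtac implementation: `Direction`, `push_to_talk_turn`, and `demo` are hypothetical names, and the recognizer, translator, and synthesizers are stand-in stubs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Direction:
    """One translation direction with its own recognizer, translator, and voices."""
    recognize: Callable[[bytes], str]    # ASR for the speaker's language
    translate: Callable[[str], str]      # MT into the listener's language
    speak_source: Callable[[str], None]  # TTS in the speaker's language (echo)
    speak_target: Callable[[str], None]  # TTS in the listener's language


def push_to_talk_turn(direction: Direction, audio: bytes) -> str:
    """Handle one button press: recognize, echo back, then translate and speak."""
    hypothesis = direction.recognize(audio)
    direction.speak_source(hypothesis)       # echo back the ASR result first
    translation = direction.translate(hypothesis)
    direction.speak_target(translation)      # and then the translation
    return translation


def demo() -> List[str]:
    """Run one turn with stand-in components and return what was 'spoken'."""
    spoken: List[str] = []
    english_to_iraqi = Direction(
        recognize=lambda audio: audio.decode("utf-8"),  # stub: audio is already text
        translate=lambda text: f"<iraqi:{text}>",       # stub: tags instead of translating
        speak_source=lambda text: spoken.append(f"en:{text}"),
        speak_target=lambda text: spoken.append(f"iq:{text}"),
    )
    push_to_talk_turn(english_to_iraqi, b"where is the checkpoint")
    return spoken
```

A full two-way system would hold a second `Direction` for Iraqi to English and pick one according to which button was pressed.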

SLIDE 15

Iraqi Language

Iraqi Arabic is a dialect

Most Iraqis write Modern Standard Arabic

Most Iraqis do not write their own dialect

No standardized spelling

Transtac project invented one

But Iraqis may not be used to it

Arabic (MSA and dialects)

Do not write short vowels in words

SLIDE 16

Data for Training

Collected human mediated dialogs

Human acts as a machine

Passed a microphone back and forth

Try to get people not to talk at same time

Large number of collections (over 4 years)

650 thousand sentence pairs

Many different speakers

Hand transcribed by experts (in Iraqi spelling)

Hand translated (Source sentences and Interpreter’s)

SLIDE 17

Iraqi ASR

Acoustic model from Iraqi data

Based on MSA phoneset

Needs to be small fast models

Discriminative Training

Speaker specific adaptation

Lexicon

Based on LDC provided lexicon

Multiple pronunciations/typos still a problem

Statistically trained LTS rules

Language Model

Trained on Iraqi input (and translated output)

SLIDE 18

English ASR

Acoustic model

Originally using other models

Then trained from collected data

(Mostly military personnel)

Lexicon

Existing lexicon but needed to add Military speak: MRAP, IED

Language model

Trained from data provided

Trained from “similar” data found on the web

Trained from hand created “typical” examples

SLIDE 19

TTS

Standard English TTS

Appropriate “command” voice

Unit selection

Added lots of military vocabulary

Iraqi TTS

Recorded from Iraqi radio announcer

Based on example sentences in the domain

LDC lexicon and LTS rules (same as ASR)

Hand tuned

SLIDE 20

S2S Interface Issues

How do you teach people to use the system?

“Transtac say instructions”

Not really sufficient

How can you tell it translated correctly?

Give (speech) feedback.

Backtranslation

ASR echo back
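
Backtranslation feedback can be sketched as a round trip through both translation directions. The slide only says the result is given back to the user as speech. The word-overlap score below is an added assumption, included to show one crude way a system could flag a suspicious round trip. The function names are hypothetical, and `forward` and `backward` stand for whatever MT components are available.

```python
def token_overlap(reference: str, candidate: str) -> float:
    """Fraction of reference word types that also appear in the candidate."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 1.0


def back_translation_feedback(text, forward, backward):
    """Translate, translate back, and report how much of the input survived.

    forward:  callable, source text -> target text
    backward: callable, target text -> source text
    Returns (translation, round_trip, overlap); round_trip is what would be
    played back to the speaker in their own language.
    """
    translation = forward(text)
    round_trip = backward(translation)
    return translation, round_trip, token_overlap(text, round_trip)
```

A poor round trip does not prove the forward translation was wrong, since the error may come from the backward pass, so the score is at most a hint.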

SLIDE 21

S2S Interface Issues

How do you translate names

A correct translation/transliteration is hard to understand

Mark names in translations

“My name is … Abdullah”

“He lives on … al-Aqar … street”

SLIDE 22

S2S Evaluation (Transtac)

Offline tests

ASR -> Text and Text -> Text

Compare to translation references

WER and “BLEU” score

Online tests

Concept transfer (through defined scenarios)

Speed (number of concepts per minute)

(English speech masking)

Utility tests

Does it really work
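
Two of the measures above are simple enough to write out. The sketch below computes word error rate as word-level edit distance divided by reference length, which is the standard definition, and concepts per minute as a plain rate. The function names are chosen for this example, and how a "concept" is counted is left to the evaluation scenario.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference, over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def concepts_per_minute(concepts_transferred: int, elapsed_seconds: float) -> float:
    """Speed measure for online tests: concepts successfully transferred per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return concepts_transferred * 60.0 / elapsed_seconds
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.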

SLIDE 23

Transtac Participants

Developer groups

IBM

SRI

BBN

CMU

USC

Evaluations

Twice a year in Iraqi (somewhere in DC)

One surprise language (Farsi, Bahasa Malay)

Other evaluations with military groups

SLIDE 24

Does it work??

Yes, mostly

27 concepts out of 30-ish turns

Systems are mostly similar

But some better than others

Other techniques

Belt/holster based PC with handheld speaker

Small PC in pouch

Chest mounted array microphone

SLIDE 25

S2S ASR Advanced issues

Tight coupling

ASR should output N-best

Translate all (lattice)

Choose best translation

(MT as a LM for ASR)

Remove disfluencies/hesitations

Add more relevant data

Automatically convert past tense/third person data to present tense/first+second person …
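
A rough sketch of the tight-coupling idea: translate every ASR hypothesis and keep the pair with the best combined score, so that the translation model helps choose among recognition results. The log-linear weighting, the function names, and the filler list are assumptions for illustration. The second function shows a very simple form of disfluency removal (dropping fillers and immediate word repetitions), which real systems handle with far more care.

```python
def pick_translation(nbest, translate, asr_weight=1.0, mt_weight=1.0):
    """Choose the hypothesis/translation pair with the best combined score.

    nbest:     list of (hypothesis, asr_log_prob) from the recognizer
    translate: callable, hypothesis -> (translation, mt_log_prob)
    """
    best = None
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        total = asr_weight * asr_score + mt_weight * mt_score
        if best is None or total > best[0]:
            best = (total, hypothesis, translation)
    if best is None:
        raise ValueError("nbest list is empty")
    return best[1], best[2]


def strip_disfluencies(text, fillers=("uh", "um", "er")):
    """Drop filler words and immediate word repetitions before translation."""
    kept = []
    for word in text.split():
        if word.lower() in fillers:
            continue  # hesitation sound
        if kept and kept[-1].lower() == word.lower():
            continue  # repeated word, as in a false start
        kept.append(word)
    return " ".join(kept)
```

With equal weights, a hypothesis that the recognizer ranks slightly lower can still win if it translates much more confidently.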

SLIDE 26

S2S TTS Advanced Issues

MT output isn’t grammatical

TTS doesn’t care and just says it

TTS should try to say MT output with more breaks.

TTS (unit selection)

As a LM on MT output

Choose the best translation on what is said best
