11-731 Machine Translation: Speech 2 Speech Translation
SLIDE 1
SLIDE 2
Speech Translation
- Three part systems
  - ASR -> Translation -> TTS
- System configurations
  - One way: phrasal
  - One way: broadcast/lecture
  - 1.5 way: phrasal with limited answers
  - Two way: full two way
SLIDE 3
Machine Translation Technologies
- Phrasal
  - Phrase to phrase lookup
- Template
  - Template fillers, fixed translation
- Interlingua
  - Translation into a meaning representation
- Statistical Machine Translation
  - From large collections of parallel text
- Classification-based translation
  - Identify classes and deal directly with them
SLIDE 4
Simple Translation
- Phrase to phrase
  - Greetings
  - Do you need medical attention?
  - Relatively easy to build, but limited use
- Template translations
  - The next train leaves at TIME from gate GATE for PLACE
  - Limited but still useful
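A minimal sketch of the two approaches on this slide: a fixed phrase table, plus one template whose slots are filled from the input. The table entries, the Spanish target strings, the slot pattern, and the function name `translate` are all illustrative assumptions, not part of any real system. Slot fillers are copied through untranslated, which is itself a simplification.

```python
import re

# Hypothetical phrase table: fixed source phrase -> fixed translation.
PHRASES = {
    "hello": "hola",
    "do you need medical attention?": "¿necesita atención médica?",
}

# One hypothetical template: a source pattern with named slots
# and a fixed target frame that reuses the slot values.
TEMPLATE_SRC = re.compile(
    r"the next train leaves at (?P<TIME>\S+) from gate (?P<GATE>\S+) for (?P<PLACE>.+)",
    re.IGNORECASE,
)
TEMPLATE_TGT = "el próximo tren sale a las {TIME} de la puerta {GATE} hacia {PLACE}"


def translate(utterance):
    """Return a translation, or None when the utterance is out of coverage."""
    text = utterance.strip()
    # 1. Phrase to phrase: exact lookup after case normalization.
    if text.lower() in PHRASES:
        return PHRASES[text.lower()]
    # 2. Template: match the frame, copy the slot fillers into the target frame.
    match = TEMPLATE_SRC.fullmatch(text)
    if match:
        return TEMPLATE_TGT.format(**match.groupdict())
    return None
```

Anything outside the table and the template returns `None`, which is the "limited use" trade-off in concrete form.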
SLIDE 5
SPEECH Translation
- Speech isn’t text
  - Different style, hard to find lots of examples
- Speech isn’t fluent
  - False starts, hesitations, ungrammatical
- ASR never makes errors ☺
SLIDE 6
One Way: Broadcast
- One speaker
  - Lecturer: can modify language model
- Multiple speakers
  - May be repeat speakers (News Anchor)
  - May have other noises: music etc. (TV programs)
- Doesn’t need to be real time (maybe)
SLIDE 7
One Way: “Dialogue”
- Voxtec’s Phraselator
  - One way communication
  - Recognized “fixed” phrases
  - Lookup for translations
  - *Very* fast deployment for new languages
SLIDE 8
Two Way: Dialog
- Users can detect their own errors and correct them
- Needs to be real time
- One user may be much more familiar with the system
  - How do you teach the other user?
- Typically domain directed
SLIDE 9
Two way: Dialog
- CMU System: Janus PDA version, CMU SMT, Cepstral Synthesis, Mobile Tech models
- Platform: COTS PDA (iPAQ), VoxTec P2
- Language: Iraqi/English, Thai/English, Chinese, Japanese etc.
SLIDE 10
Speech Technology Issues
- ASR:
  - Disfluencies, dialects, speaking style
  - Unfamiliarity with system
- TTS:
  - MT output isn’t always fluent
  - TTS says it anyway
  - Can be hard to understand
SLIDE 11
Speech Technology Issues
- Spoken not Written Languages
  - Arabic vs Arabic Dialects
- Mixture of languages
- Politeness levels
- Gender in speech
SLIDE 12
Transtac: Two Way S2S System
- DARPA developed for
  - Check points, medical and civil defense
- Requirements
  - Two way
  - Eyes-free (no screen)
  - Portable
  - Usable by real users
SLIDE 13
Transtac System
- Laptop secured in backpack
- Optional speech control
- Push-to-talk buttons
- Close-talking microphone
- Small powerful speakers
SLIDE 14
Transtac System Details
- Two way system
  - 2 ASR systems: English and Iraqi
  - 2 way statistical translation
  - 2 synthesizers
- Push-to-talk system
  - (Users don’t like “translate everything mode”)
- Echo back ASR result
  - And then translation
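The turn structure described above (push-to-talk, echo the ASR result, then play the translation) can be sketched as follows. This is only an outline under stated assumptions: `Direction`, `push_to_talk_turn`, and the four callables are hypothetical names, and real ASR, MT, and TTS components would replace the stand-in functions passed in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Direction:
    """One direction of a two way system (for example English -> Iraqi).

    All four components are hypothetical stand-ins supplied by the caller.
    """
    asr: Callable[[bytes], str]         # audio -> recognized source text
    mt: Callable[[str], str]            # source text -> target text
    tts_source: Callable[[str], str]    # speaks in the speaker's language
    tts_target: Callable[[str], str]    # speaks in the listener's language


def push_to_talk_turn(audio: bytes, direction: Direction) -> List[str]:
    """Handle one button press and return what is played, in order."""
    recognized = direction.asr(audio)
    # Echo back the ASR result so the speaker can catch recognition errors.
    played = [direction.tts_source(recognized)]
    # Then translate and speak the result to the other user.
    translated = direction.mt(recognized)
    played.append(direction.tts_target(translated))
    return played
```

A two way system would hold two `Direction` objects, one per language pair direction, and pick one according to which push-to-talk button was pressed.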
SLIDE 15
Iraqi Language
- Iraqi Arabic is a dialect
  - Most Iraqis write Modern Standard Arabic
  - Most Iraqis do not write their own dialect
- No standardized spelling
  - Transtac project invented one
  - But Iraqis may not be used to it
- Arabic (MSA and dialects)
  - Do not write short vowels in words
SLIDE 16
Data for Training
- Collected human mediated dialogs
  - Human acts as a machine
  - Passed a microphone back and forth
  - Try to get people not to talk at the same time
- Large number of collections (over 4 years)
  - 650 thousand sentence pairs
  - Many different speakers
  - Hand transcribed by experts (in Iraqi spelling)
  - Hand translated (source sentences and interpreter’s)
SLIDE 17
Iraqi ASR
- Acoustic model from Iraqi data
  - Based on MSA phoneset
  - Needs to be small, fast models
  - Discriminative training
  - Speaker specific adaptation
- Lexicon
  - Based on LDC provided lexicon
  - Multiple pronunciations/typos still a problem
  - Statistically trained LTS (letter-to-sound) rules
- Language Model
  - Trained on Iraqi input (and translated output)
SLIDE 18
English ASR
- Acoustic model
  - Originally using other models
  - Then trained from collected data
  - (Mostly military personnel)
- Lexicon
  - Existing lexicon but needed to add military speak: MRAP, IED
- Language model
  - Trained from data provided
  - Trained from “similar” data found on the web
  - Trained from hand created “typical” examples
SLIDE 19
TTS
- Standard English TTS
  - Appropriate “command” voice
  - Unit selection
  - Added lots of military vocabulary
- Iraqi TTS
  - Recorded from Iraqi radio announcer
  - Based on example sentences in the domain
  - LDC lexicon and LTS rules (same as ASR)
  - Hand tuned
SLIDE 20
S2S Interface Issues
- How do you teach people to use the system?
  - “Transtac say instructions”
  - Not really sufficient
- How can you tell it translated correctly?
  - Give (speech) feedback
    - Backtranslation
    - ASR echo back
SLIDE 21
S2S Interface Issues
- How do you translate names?
  - A correct translation/transliteration is hard to understand
- Mark names in translations
  - “My name is … Abdullah”
  - “He lives on … al-Aqar … street”
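One way to read "mark names in translations" is to insert a pause at each boundary between ordinary words and a name, so the listener hears the name set apart. The sketch below assumes the name flags are already available (for instance from a name tagger or from the MT alignment); the function name `mark_names` and the use of "…" as the pause marker are illustrative choices.

```python
def mark_names(tokens, is_name, pause="…"):
    """Insert a pause marker wherever the text switches into or out of a name.

    tokens:  list of words in the translated sentence
    is_name: list of booleans, True where the token is part of a name
    """
    out = []
    for i, token in enumerate(tokens):
        previous_is_name = i > 0 and is_name[i - 1]
        # A boundary is any change between name and non-name tokens;
        # no pause is added before the very first token.
        if i > 0 and is_name[i] != previous_is_name:
            out.append(pause)
        out.append(token)
    return " ".join(out)
```

In a real system the pause marker would be mapped to whatever break control the synthesizer accepts.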
SLIDE 22
S2S Evaluation (Transtac)
- Offline tests
  - ASR -> Text and Text -> Text
  - Compare to translation references
  - WER and “BLEU” score
- Online tests
  - Concept transfer (through defined scenarios)
  - Speed (number of concepts per minute)
  - (English speech masking)
- Utility tests
  - Does it really work?
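For the offline ASR test, word error rate (WER) is the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below is a plain dynamic-programming version; it splits on whitespace and does no text normalization, which a real evaluation would need to define. The function name `word_error_rate` is an illustrative choice.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions.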
SLIDE 23
Transtac Participants
- Developer groups
  - IBM
  - SRI
  - BBN
  - CMU
  - USC
- Evaluations
  - Twice a year in Iraqi (somewhere in DC)
  - One surprise language (Farsi, Bahasa Malay)
  - Other evaluations with military groups
SLIDE 24
Does it work??
- Yes, mostly
  - 27 concepts out of 30-ish turns
- Systems are mostly similar
  - But some better than others
- Other techniques
  - Belt/holster based PC with handheld speaker
  - Small PC in pouch
  - Chest mounted array microphone
SLIDE 25
S2S ASR Advanced issues
- Tight coupling
  - ASR should output N-best
  - Translate all (lattice)
  - Choose best translation
  - (MT as a LM for ASR)
- Remove disfluencies/hesitations
- Add more relevant data
  - Automatically convert past tense/third person data to present tense/first+second person …
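The tight coupling idea can be sketched over an N-best list (a lattice would need a more involved search). Each ASR hypothesis is translated, the ASR and MT log scores are combined, and the highest-scoring pair wins, so the MT model acts as an extra language model over ASR output. The function name, the weights, and the shape of the `translate` callable are assumptions made for illustration.

```python
def choose_best_translation(nbest, translate, asr_weight=1.0, mt_weight=1.0):
    """Pick the (hypothesis, translation) pair with the best combined score.

    nbest:     list of (hypothesis_text, asr_log_score) pairs from the recognizer
    translate: hypothetical callable, hypothesis_text -> (translation, mt_log_score)
    """
    best = None
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        total = asr_weight * asr_score + mt_weight * mt_score
        # Keep the first-best candidate on ties.
        if best is None or total > best[0]:
            best = (total, hypothesis, translation)
    if best is None:
        raise ValueError("nbest must not be empty")
    return best[1], best[2]
```

With this scheme a hypothesis the recognizer ranked second can still win if the MT system finds it far easier to translate. The weights would normally be tuned on held-out data.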
SLIDE 26
S2S TTS Advanced Issues
- MT output isn’t grammatical
  - TTS doesn’t care and just says it
  - TTS should try to say MT output with more breaks
- TTS (unit selection)
  - As a LM on MT output
  - Choose the best translation based on what is said best
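A rough illustration of using the synthesizer as a language model on MT output: among several candidate translations, prefer the one the voice can say best. A real unit selection synthesizer would supply its own target and join costs. The sketch below substitutes a crude proxy, the fraction of adjacent word pairs that also occur in the recorded prompts, and both function names are hypothetical.

```python
def coverage_score(sentence, recorded_bigrams):
    """Fraction of adjacent word pairs in the sentence found in the recordings."""
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(pair in recorded_bigrams for pair in pairs) / len(pairs)


def pick_most_speakable(candidates, recorded_sentences):
    """Return the candidate translation best covered by the recorded prompts.

    This word-pair proxy stands in for a real synthesis cost; it is an
    assumption of this sketch, not how any particular synthesizer scores text.
    """
    recorded_bigrams = set()
    for sentence in recorded_sentences:
        words = sentence.lower().split()
        recorded_bigrams.update(zip(words, words[1:]))
    # max() keeps the earliest candidate when scores tie.
    return max(candidates, key=lambda c: coverage_score(c, recorded_bigrams))
```

In practice this score would be combined with the MT score rather than used alone, since the most speakable candidate is not always the most accurate one.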
SLIDE 27