Integration of human and machine translation, Marco Turchi



Integration of human and machine translation

Marco Turchi
Fondazione Bruno Kessler
Trento, January 2014

Slides by: Marcello Federico and Matteo Negri

Motivation

• Human translation (HT)
  – worldwide demand for translation services has accelerated, due to globalization and growth of the Information Society
• Gap between MT and HT
  – MT has improved significantly but independently from HT
  – MT research has not directly addressed how to improve HT
  – Today professional translators barely use MT
• The unavoidable adoption of MT
  – Post-editing experiments have shown great promise
  – Integration of HT and MT is still an open problem!

Questions

• How do human translators work?
• What tools do they use?
• How is productivity measured?
• How can MT help human translators?
• What are important problems to solve?
• Why should MT researchers care?
• Why should translators care?

Outline

• Typical translation-industry workflow
• Computer assisted translation tools
• Simple MT-CAT integration
• The MateCat project
  – research challenges
  – new MT features
  – Matecat tool
  – case studies
• Matecat activities!
• Conclusions

Scenario

Translation Project
Language Service Provider
All our translators got a CAT tool!
I’m the project manager


Computer Assisted Translation (CAT) is the dominant technology in the translation industry

CAT tools: special text editors supporting many document formats and integrating information from different sources.

CAT Tools

• Source/target text is split into segments
• Translation progresses segment by segment
• Provides help from different sources:
  – spell checkers
  – dictionaries
  – terminology managers
  – concordancers
  – translation memory (TM)
  – and recently machine translation (MT)

CAT Tool

Vanilla CAT Tool

Terminology

• Terms: words and compound words that in specific contexts have specific meanings
  – e.g. “mouse” in Agriculture vs Information Technology (IT)
• Termbase: database consisting of terms and related information, usually in multilingual format.
  – e.g.

Term    Domain        It         Es           Fr
mouse   agriculture   topo       ratón        …
mouse   IT            mouse      ratón        …
file    Legal         archivio   archivador   …
file    IT            file       archivio     …
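A termbase lookup of this kind can be sketched in a few lines of Python. This is only an illustration: `TERMBASE` and `lookup_term` are hypothetical names, the entries are copied from the table above, and the Fr column is left out because it is elided in the slides.

```python
from typing import Dict, Optional, Tuple

# Toy termbase keyed by (term, domain). Values map a language column to the
# translation, copied from the table in the slide. Names are illustrative only.
TERMBASE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("mouse", "agriculture"): {"It": "topo", "Es": "ratón"},
    ("mouse", "IT"): {"It": "mouse", "Es": "ratón"},
    ("file", "Legal"): {"It": "archivio", "Es": "archivador"},
    ("file", "IT"): {"It": "file", "Es": "archivio"},
}


def lookup_term(term: str, domain: str, language: str) -> Optional[str]:
    """Return the translation of `term` in `domain`, or None if it is unknown."""
    entry = TERMBASE.get((term, domain))
    if entry is None:
        return None
    return entry.get(language)
```

The same source term resolves differently per domain: `lookup_term("mouse", "agriculture", "It")` and `lookup_term("mouse", "IT", "It")` return different Italian terms, which is the point of storing the domain with each entry.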

Terminology

Terminology database
Source: IT-Italian
Target: EN-English
Term: concorrenza sleale
Domain: LAW

Italiano (Date: 29/09/2009)
  Term: concorrenza sleale
  Domain: LAW
  Term reference: Enc Giuridica, Treccani, Roma, vol. VII, 1988, s.v. concorrenza II; Codice Civile art. 2598
  Reliability: 3 (reliable)

English (Date: 29/09/2009)
  Term: unfair competition
  Definition: an attempt to do better than another company by using techniques which are not fair, such as importing foreign goods at very low prices or by wrongly criticising a competitor's products
  Definition reference: 3 (Dict of Accounting, Collin-Joliffe, 1992)

Concordance

• Concordance: occurrence of a word in a text together with its context.
• Bilingual concordancers show the use of words in parallel texts.
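A monolingual concordance search can be sketched as follows. This is a minimal illustration and not how any particular CAT tool implements it: the function name `concordance` and the fixed character window `width` are assumptions.

```python
import re
from typing import List, Tuple


def concordance(text: str, word: str, width: int = 30) -> List[Tuple[str, str, str]]:
    """Return (left context, match, right context) for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits
```

A bilingual concordancer would run the same search on the source side of a sentence-aligned corpus and display the aligned target sentence next to each hit.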

Concordancer

Bilingual concordance
Search string: rabbit (EQUAL TO)
Select corpus: Alice in Wonderland
Source: EN-English
Target: ZH-Chinese

She felt very sleepy, when suddenly a White rabbit with pink eyes ran close by her. nor did Alice think it so unusual to hear the rabbit say to itself "Oh dear! Oh dear! I shall be too late!" But when the rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for she remembered that she had never before seen a rabbit with either a waistcoat-pocket or a watch to take out of it, and she ran across the field after it, and was just in time to see it pop down a large rabbit-hole under the hedge. The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had no time to think about stopping herself before she found herself falling down what seemed to be a very deep well.

她感到昏昏欲睡,就在此
A“哎呀!哎呀!我要”她也不
然而当兔子居然从背心口袋中掏出一只表,瞧了瞧,然后又匆匆赶路
兔子洞像隧道一

word alignment information ??

Translation Memory

• Incrementally stores translated segments. Given a new source segment it looks for perfect or fuzzy matches
• Matches are ranked (100%-matches on top) and presented to the user as translation candidates for post-editing
• A TM can be shared among and simultaneously updated by several translators working on the same project
• TMs model the style and terminology of the customers
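The store, match and rank cycle described above can be sketched like this. The class `TranslationMemory` and its methods are hypothetical, and the similarity score is Python's `difflib.SequenceMatcher` ratio over lowercased tokens, chosen only for illustration; commercial TM servers use their own fuzzy-match formulas.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


class TranslationMemory:
    """Toy TM: stores (source, target) segments and returns ranked fuzzy matches."""

    def __init__(self) -> None:
        self._segments: List[Tuple[str, str]] = []

    def add(self, source: str, target: str) -> None:
        # Incrementally store a translated segment.
        self._segments.append((source, target))

    def lookup(self, source: str, threshold: float = 0.7) -> List[Tuple[float, str, str]]:
        # Score every stored source segment against the query, token by token.
        query = source.lower().split()
        matches = []
        for src, tgt in self._segments:
            score = SequenceMatcher(None, query, src.lower().split()).ratio()
            if score >= threshold:
                matches.append((score, src, tgt))
        # Best matches first, so a 100% match (score 1.0) ends up on top.
        matches.sort(key=lambda m: m[0], reverse=True)
        return matches
```

Every post-edited segment would be passed back to `add`, which is how a shared TM grows while several translators work on the same project.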


Translation Memory

When does it help?
• on highly repetitive texts, such as technical manuals
• on new versions of previously translated manuals
• when several translators are working on the same project

How does it help?
• speeds up translation process
• ensures consistency across different translators

Limitations
• number of useful matches found is generally small (5-10%)

Machine Translation

Machine translation decomposes the translation process into a sequence of rule applications. In statistical MT:
• word alignment models and translation rules are automatically learned from a large parallel corpus
• much less human effort is needed
• requires huge amounts of data, the more, the better!
• the translation process is a search problem that computes an optimal sequence of translation rules to apply
• according to the strategy used to apply the rules, the translation process may generate linear or hierarchical structures.
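To make the "translation as search" idea concrete, here is a heavily simplified sketch: a monotone phrase-based decoder that picks the most probable sequence of phrase-table rules by dynamic programming. Everything in it is an assumption made for illustration: the toy `PHRASE_TABLE`, its probabilities, and the `decode` function. It has no language model and no reordering, so it is far from what a real decoder such as Moses does.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
PHRASE_TABLE: Dict[Tuple[str, ...], List[Tuple[str, float]]] = {
    ("machine",): [("macchina", 0.6)],
    ("translation",): [("traduzione", 0.9)],
    ("machine", "translation"): [("traduzione automatica", 0.8)],
    ("helps",): [("aiuta", 0.7)],
}


def decode(source: List[str], max_phrase_len: int = 3) -> Tuple[str, float]:
    """Return the best monotone translation and its log-probability."""
    n = len(source)
    # best[i] holds (log-prob, target phrases) of the best derivation of source[:i].
    best: List[Tuple[float, List[str]]] = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            prev_score, prev_out = best[start]
            if prev_score == -math.inf:
                continue  # no derivation covers source[:start]
            for target, prob in PHRASE_TABLE.get(tuple(source[start:end]), []):
                score = prev_score + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, prev_out + [target])
    score, output = best[n]
    if score == -math.inf:
        raise ValueError("no sequence of rules covers the whole source")
    return " ".join(output), score
```

For the input `["machine", "translation", "helps"]` the two-word rule wins over the word-by-word derivation because 0.8 × 0.7 is larger than 0.6 × 0.9 × 0.7.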

Machine Translation

When does it help?
• language pairs supported by large parallel data
• translation directions between close languages
• training data represent the task data well

How does it help?
• provides a good draft translation to start with
• avoids translating easy/repetitive fragments

Limitations
• translations may lack global coherence
• bad translations cause waste of time, loss of trust

TM versus MT

Capabilities (TM / MT)
• Can it start from scratch?
• Does it improve during usage?
• Can it instantly learn a new translation?
• Does it consider context of the segment?
• Can it retrieve 100% matches?
• Can it create new 100% matches?

✔ ✔ ✔ ✔ ✔

TM and MT are rather complementary!

Simple MT Integration

TM backed up by MT

How to evaluate the impact of MT?

Human productivity

Daily productivity of translators is highly variable … and translations also vary significantly among translators.

To evaluate the impact of MT technology we have to consider both subjective and objective criteria:
• variations in productivity
• effort: e.g. human TER
• speed: e.g. words/hour, sec/word (post-editing time)

Human-Targeted TER (HTER)

• References as human post-editions
  – Perform human post-editing to transform the hypothesis into the closest acceptable translation
  – Criterion: the fewer the edits, the better the hypothesis (same as TER)
• HTER
  – intuitive measure of MT quality
  – highest correlation with human judgments
  – semantic equivalence is considered
  – possible substitute for human evaluations because less subjective
  – expensive: 3 to 7 minutes per sentence for a human to annotate
  – not suitable for use in the development cycle of an MT system
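The edit-counting criterion can be illustrated with a short Python sketch. It is a simplification and not the official TER tool: it counts word-level insertions, deletions and substitutions only, leaves out the block shifts that TER also allows, and normalizes by the length of the post-edited sentence. The function names are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def hter(mt_output: str, post_edited: str) -> float:
    """Approximate HTER: edits needed to reach the post-edit, per post-edit word."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        raise ValueError("post-edited sentence must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)
```

With the example sentences used later in the deck, turning "MT sta per traduzione automatica." into "MT e' l'acronimo di traduzione automatica." takes three word edits against six post-edited words, so this approximation gives 0.5.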

Post-editing time

• Seconds needed to post-edit a sentence
  – normalized version in seconds per word
  – little time = good translation
  – large time = bad translation
• Usually includes:
  – reading time
  – searching for information on external resources
  – typing time
  – extra time for secondary activity (e.g. correction)
• High variability across sentences and translators
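The two speed measures are simple conversions of the same measurement. The sketch below assumes that the time is logged per segment in seconds and that the word count refers to the segment being post-edited; both function names are hypothetical.

```python
def seconds_per_word(edit_seconds: float, n_words: int) -> float:
    """Normalized post-editing time for one segment."""
    if n_words <= 0:
        raise ValueError("segment must contain at least one word")
    return edit_seconds / n_words


def words_per_hour(edit_seconds: float, n_words: int) -> float:
    """The same measurement expressed as translation speed."""
    if edit_seconds <= 0:
        raise ValueError("edit time must be positive")
    return n_words * 3600 / edit_seconds
```

A 10-word segment post-edited in 30 seconds corresponds to 3 seconds per word, or 1200 words per hour. Because the logged time also includes reading, searching and secondary activity, single-segment values are noisy and are usually averaged over many segments.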

Simple MT Integration

Baseline system
  • Commercial CAT tool: SDL Trados Studio
  • Commercial MT engine: Google Translate
  • Commercial TM server: MyMemory

Preliminary Experiments: 2 documents x 2 directions x 4 translations = 16 translators

Simple MT Integration

So, MT helps! What next?

Fact Sheet

Project Acronym   MateCat
Project Title     Machine Translation Enhanced Computer Assisted Translation
Funding scheme    STREP FP7-ICT-2011-7
Grant #           287688
Duration          36 months, 1 Nov 2011 - 30 Oct 2014
Consortium        Fondazione Bruno Kessler - Italy
                  Universite Le Mans - France
                  The University of Edinburgh - United Kingdom
                  Translated srl - Italy
Effort            349 person-month (= 9.7 full-time-equivalent/year)
Budget            3,368K €
Funding           2,650K €

Goals of MateCat

Strategic
• Optimal and seamless integration of MT into the CAT workflow
• Enhancement of productivity and user experience of translators

Research
• New operating conditions for MT that match with the CAT application
• New MT features: self-tuning, user-adaptive and informative
• Orient scientific community to get critical mass (with Casmacat)

Technology
• Enterprise level CAT tool integrating new MT features
• Field testing to measure progress on HT productivity
• High impact through full open source solution (Moses, IRSTLM, …)

New MT Features

• MT has improved significantly but independently from HT
• Repetitive errors annoy human translators
• Improve MT quality learning from:
  – previously translated texts (self-tuning)
  – user feedback (user-adaptation)
• Help translators with meta-information:
  – Quality Estimation scores (Informative MT)
  – Terminology help (Informative MT)

Self-Tuning MT

• Domain adaptation
  – Easy-to-use tuning of MT from huge amounts of generic data and smaller amounts of project specific data.
• Project adaptation
  – MT adaptation as soon as documents are translated
• Document analysis
  – Build document-adapted models to improve translation coherence

Self-tuning MT

[Figure labels: MateCat Tool; MT Server (Self-tuning MT); Translated documents; Models; Project Adaptation]

Source document:
  1. MT stands for machine translation.
  2. Our research aims to make it more useful to translators.
  3. Our MT technology will be seamlessly integrated into a Web-based CAT tool.
  4. All software will be

Translated document:
  1. MT e' l'acronimo di traduzione automatica.
  2. La nostra ricerca mira a renderla piu' utile per i traduttori.
  3. La nostra tecnologia di traduzione automatica sara' perfettamente integrata in un CAT tool basato su Web.
  4. Tutto il software verra'

Your document has been saved in file MaeCat.xlif.


User-Adaptive MT

• On-line adaptation
  – Dynamically adapt MT from corrections and other feedback by the user.
• Context-aware translation
  – Augment MT with lexical/syntactic constraints, to consistently translate recurrent expressions or to disambiguate anaphoric expressions
• Real-time processing
  – MT adaptation must be real-time and transparent to the user

User-adaptive MT

[Figure labels: MateCat Tool; MT Server (User-adaptive MT); User Feedback; Models; On-Line Learning]

SRC: MT stands for machine translation.
MT:  MT sta per traduzione automatica.
USR: MT e' l'acronimo di traduzione automatica.


Informative MT

• Terminology help
  – Methods able to learn terminology use and provide suggestions on demand
• Confidence measures
  – Estimate post-editing effort to correct sentences
  – Pinpoint words of the MT output more likely to be wrong
• Enriched MT output
  – Display alternative MT outputs, TM matches, or terminology hits
  – Help user to determine parts of the MT output to be edited

Informative MT

[Figure labels: MateCat Tool; MT Server (Source, MT decoder, QE engine); TM Server; Translation matches; MT and TM suggestions; Filtering and ranking]

SRC: Our research aims to make it more useful to translators.
MT 90%: La nostra ricerca mira a renderla piu' utile per i traduttori
TM 75%: Our goal is to make it more useful to interpreters. / Il nostro obiettivo e' di renderlo piu' utile per gli interpreti.
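The "filtering and ranking" step in the diagram can be pictured with a small sketch. It assumes, purely for illustration, that the QE score of an MT suggestion and the fuzzy-match score of a TM suggestion are both expressed on the same 0 to 1 scale and can be compared directly; the slides do not say how MateCat combines them. `Suggestion`, `rank_suggestions` and `min_score` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    origin: str   # "MT" or "TM"
    text: str     # proposed target segment
    score: float  # QE score for MT, fuzzy-match score for TM (assumed comparable)


def rank_suggestions(candidates: List[Suggestion], min_score: float = 0.5) -> List[Suggestion]:
    """Drop low-scoring suggestions and present the rest best-first."""
    kept = [c for c in candidates if c.score >= min_score]
    return sorted(kept, key=lambda c: c.score, reverse=True)
```

With the figure's example, an MT suggestion scored 0.90 would be listed above a TM match scored 0.75, and anything below the threshold would not be shown to the translator at all.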


MateCat Tool

Simple. Web based.

[Screenshot labels: Suggestion from TM; Suggestion from SMT; Post-editing]

MateCat Tool

Data collection and logging for in-depth analysis

FIRST FIELD TEST (Self-tuning MT)

Test Protocol

Day 1: translation of 50% of doc with MT1 (domain adapted)
Day 2: translation of rest of doc with MT2 (project adapted)

Key Performance Indicators

• Time to edit (words/hour): the average translation speed
• Post-editing effort (human TER): the average percentage of word changes

Self-tuning MT

Average gain: 22.25%        Average gain: 10.71%

This protocol introduces secondary effects: learning curve of users with system and document
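The slides do not state how the average gains were computed. One plausible reading, given here only as an assumption, is a relative change with respect to the first day's system, with the sign chosen so that a positive value always means an improvement. The function `relative_gain` is hypothetical and the numbers in the usage note are made up.

```python
def relative_gain(baseline: float, adapted: float, higher_is_better: bool) -> float:
    """Relative improvement in percent of `adapted` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # Speed (words/hour) improves when it goes up; effort (HTER) when it goes down.
    change = adapted - baseline if higher_is_better else baseline - adapted
    return 100.0 * change / baseline
```

For instance, going from 400 to 500 words/hour is a 25% gain in speed, and going from an HTER of 40 to 30 is a 25% gain in effort. The per-translator gains would then be averaged to obtain one figure per indicator.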

SECOND FIELD TEST

• Professional translators
• Language pair: EN>IT, EN>FR
• Contrastive tests:
  – domain adapted vs. project/user adaptive MT
• Legal document: 15,000 words

Experiment Design

Warm-up session (WU)
• First 20% of doc
• Post-edit domain adapted MT
• Test: Informative MT

Field-test session (FT)
• Remaining 80% of doc
• Test: domain-adapted MT vs. project-adapted dynamic MT

FT session: effort

[Bar chart: HTER % (axis 0 to 50) per translator (EN-FR 8045, EN-FR 8047, EN-FR 8048, EN-IT 8021, EN-IT 8022, EN-IT 8024), series DA.STA and PA.DYN]

Consistent improvements by 5 out of 6 users

FT session: productivity

[Bar chart: words/hour (axis 0 to 250) per translator (EN-FR 8045, EN-FR 8047, EN-FR 8048, EN-IT 8021, EN-IT 8022, EN-IT 8024), series DA.STA and PA.DYN]

Translation speed-ups by 4 translators

Informative MT

• We ran Informative MT on the WU day only (to reduce risks)
• QE estimation could not impact directly on productivity
• We evaluated QE with a subjective questionnaire (8 users)

Matecat Activities!

Spreading the word

• 28 presentations
• 18 venues
• 14 scientific publications
• 3 tutorials

Summer Stage 2013

  • Who: 10 translation students from ISIT, Trento
  • When: Summer 2013
  • How long: 20h (one week)
  • What: EN-IT legal texts

RHoK 2013

MateCat @ RHoK Translatathon, 1-2 June Trento (Italy)

• six non-professional translator volunteers
• two sw localization projects:
  – ProjectLibre: 3Kw, EN into FR, PT-BR, ZH, TR
  – Open Hospital: 5Kw, EN into AR
• conversion scripts between .po and .xliff formats
• private TM and Google Translate API
• fourfold productivity increase wrt 2012!

Using Matecat at Translated

• Over 3 million words translated since February 2013
• Constant feedback from project managers and translators
• Quick and effective implementation of new features based on users’ needs

User Group

Applied Language Solutions, EC Directorate-General for Translation, ISIT University Institute for Language Mediator, Mercury Translations, TAUS, eBay, Google, Ooyala, Eloquence, SOSLANGUAGE, AlfaBeta, WTC WORLD TRANSLATION CENTRE, ITL GROUP, Nexo Corporation, Trans-Edit Group, TEDx Trento, …

Conclusions

Integration of human and machine translation:
• very challenging research problems
• researchers & translators finally collaborate

MateCat project:
• open-source experimental infrastructure
• to evaluate MT utility for translators

Interested in trying out the MateCat Tool?
• http://matecatpro.translated.net/

Thank you!