Focusing on Tighter Integration of CAT Tools and Corpora




SLIDE 1

Focusing on Tighter Integration of CAT Tools and Corpora

Miloš Jakubíček

milos.jakubicek@sketchengine.co.uk

Translating and the Computer 38 London, November 17, 2016

Miloš Jakubíček (Lexical Computing) CAT Tools and Corpora Nov 17, 2016 1 / 12

SLIDE 2

Background

Sketch Engine

  • online service, since 2003
  • large collections of texts (text corpora)
  • built-in tools for creating user corpora
  • efficient search & advanced analysis
  • as of 2016:
      • corpora for 85 languages
      • annotation tools (PoS tagging, lemmatization) for ca. 30 of them
      • 10,000s of users:
          • lexicographers
          • linguists
          • teachers and students
          • copywriters, terminologists and translators

SLIDE 3

CAT tools landscape as of 2016

  • many advanced tools on the market
  • main focus on:
      • wide support of document formats, import/export options
      • project management and accounting
      • translation memory management
      • text formatting
  • …not so much on the language itself
  • in fact, CAT tools are decade(s) behind developments in natural language processing

SLIDE 4

CAT tools and NLP

Why are NLP tools not used more in CAT tools?

  • because they are not accurate enough?
  • because they are not available?
  • because the community is not aware of them?
  • because they are hard to use?

SLIDE 5

CAT tools and NLP

  • accuracy: not perfect, but good enough to actually help
  • availability: big issue (including IP issues)
  • awareness: I don’t know!
  • user-friendliness: big issue

SLIDE 6

NLP and translation

NLP is doing much more for machine translation than for human translation!

SLIDE 7

NLP and translation

  • most NLP tools are more useful in a semi-automatic setting than in a fully automated one
  • the community focuses on MT + post-editing rather than on exploiting the NLP resources and tools that MT is built on to help translators directly

SLIDE 8

NLP offers

  • data
      • huge amounts, both monolingual and bilingual
      • used to some extent
  • tools
      • tokenizers/segmenters
      • part-of-speech taggers, morphological analyzers
      • parsers (of any kind)
      • language modelling and prediction tools

SLIDE 9

Challenges

  • technical interoperation: solvable
  • legal issues: hopefully solvable too
  • workflow adaptation and integration: a real challenge

SLIDE 10

Workflow adaptation and integration

  • urgent need for seamless integration of state-of-the-art NLP tools that:
      • are transparent to the user
      • fit the workflow
      • bring measurable savings

SLIDE 11

Sketch Engine integration with CAT tools

  • Sketch Engine & CAT tools: a typical example of all the aforementioned issues
  • just released: Sketch Engine plugin for SDL Trados Studio
  • Sketch Engine workshop: Friday, 2pm–5pm, Education Room
  • free 3-month trial for all participants

SLIDE 12

Conclusions

  • NLP tools and resources can do A LOT for human translators
      • if integrated well into the translator environment

⇒ this is what needs to be worked on
