Darmstadt Knowledge Processing Repository Based on UIMA (PowerPoint presentation)

SLIDE 1

Darmstadt Knowledge Processing Repository Based on UIMA

Iryna Gurevych, Max Mühlhäuser, Christof Müller, Jürgen Steimle, Markus Weimer, Torsten Zesch

Ubiquitous Knowledge Processing Group; Telecooperation
Computer Science Department, Darmstadt University of Technology

SLIDE 2

Telecooperation

SLIDE 3

Telecooperation

SLIDE 4

Darmstadt Knowledge Processing Software Repository, used by SIR, AQUA, and THESEUS

Ubiquitous Knowledge Processing

SLIDE 5

AQUA

Automatic Quality Assessment and Feedback in eLearning 2.0 (AQUA)

SLIDE 6

User Generated Discourse in Web 2.0

SLIDE 7

AQUA – Anoto pen

SLIDE 8

AQUA – System Architecture

Components: Natural Language Processing, Machine Learning

SLIDE 9

AQUA – System Architecture

SLIDE 10

SIR (in cooperation with Prof. Hinrichs)

  • Semantic Information Retrieval

The gap: humans express an information need in natural language, while the computer offers only a low-level communication interface

Semantic search (SIR), based on semantic relatedness, bridges this human–computer gap

SLIDE 11

Information Retrieval (IR)

[Figure: document collection (Document 1, Document 2, Document 3, ...)]

  • Keywords

Boolean, Vector Space, ...
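
The two keyword-based models named on this slide can be sketched as follows. This is a minimal Python illustration over a hypothetical three-document toy collection, not code from the repository: the Boolean model returns documents that contain every query term, while the vector space model ranks documents by tf-idf weighted cosine similarity. Both match only literal keywords, which is the limitation SIR addresses.

```python
import math
from collections import Counter

# Hypothetical toy collection used only for illustration.
DOCS = {
    "doc1": "the baker bakes bread and cake",
    "doc2": "the programmer writes computer programs",
    "doc3": "quality assurance tests computer programs",
}


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Boolean model: ids of documents that contain ALL query terms."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))


def tf_idf(docs):
    """Return per-document tf-idf vectors (as dicts) and the idf table."""
    tokenized = {d: tokenize(t) for d, t in docs.items()}
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {term: math.log(len(docs) / n) for term, n in df.items()}
    vectors = {
        d: {t: c * idf[t] for t, c in Counter(toks).items()}
        for d, toks in tokenized.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vector_space(query, docs):
    """Vector space model: document ids ranked by cosine similarity to the query."""
    vectors, idf = tf_idf(docs)
    # Query terms that never occur in the collection get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scores = {d: cosine(q, vec) for d, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `boolean_and("computer programs", DOCS)` returns `["doc2", "doc3"]`, but `boolean_and("bake", DOCS)` returns an empty list: the first document mentions "bakes" and "baker", never the literal keyword "bake".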

SLIDE 12

SIR-Project

[Figure: an essay is matched against profession descriptions (Profession 1, Profession 2, Profession 3, ...) by semantic relatedness]

Semantic search (SIR), based on semantic relatedness, lets the information need be expressed in natural language instead of through a low-level communication interface

Example terms: cake, computer, to read, ... and baker, to program, quality assurance
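
A minimal sketch of ranking professions by semantic relatedness to an essay. It assumes a hypothetical hand-written relatedness table (a real system would derive such scores from lexical-semantic resources) and hypothetical profession names and term lists. Each essay term contributes its best relatedness to any term of a profession, and these contributions are averaged.

```python
# Hypothetical relatedness scores in [0, 1]. A real system would compute them
# from lexical-semantic resources instead of a hand-written table.
RELATEDNESS = {
    frozenset({"cake", "baker"}): 0.9,
    frozenset({"computer", "to program"}): 0.8,
    frozenset({"computer", "quality assurance"}): 0.4,
    frozenset({"to read", "quality assurance"}): 0.3,
}


def relatedness(a, b):
    """Symmetric lookup; identical terms are maximally related."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def score(essay_terms, profession_terms):
    """Average over essay terms of the best relatedness to any profession term."""
    if not essay_terms:
        return 0.0
    best = [
        max((relatedness(e, p) for p in profession_terms), default=0.0)
        for e in essay_terms
    ]
    return sum(best) / len(best)


def rank(essay_terms, professions):
    """Profession names sorted by descending relatedness score."""
    return sorted(
        professions,
        key=lambda name: score(essay_terms, professions[name]),
        reverse=True,
    )


# Hypothetical profession descriptions, reduced to index terms.
PROFESSIONS = {
    "Baker": ["baker", "bread"],
    "Software developer": ["to program", "software"],
    "Quality inspector": ["quality assurance", "testing"],
}
ESSAY = ["cake", "computer", "to read"]
```

With this toy data no essay term literally occurs in any profession description, so keyword matching finds nothing, yet `rank(ESSAY, PROFESSIONS)` orders the professions Baker, Software developer, Quality inspector.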

SLIDE 13

SIR Example

  • Find good index terms: Compound Splitting, Negation Detection, WSD
  • Compute semantic relatedness
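
Compound splitting, one of the index-term steps in this example, matters for German text because a compound such as "Qualitätssicherung" (quality assurance) would otherwise not match its parts. The following is a minimal dictionary-based sketch, assuming a hypothetical toy lexicon and a small set of German linking elements; it prefers the split with the fewest parts and is not the splitter used in the repository.

```python
from functools import lru_cache

# Hypothetical toy lexicon. A realistic splitter would use a large German
# word list and corpus frequencies to choose among alternative splits.
LEXICON = {"qualität", "sicherung", "software", "entwicklung", "bäcker", "meister"}
# Common German linking elements (Fugenelemente) that may join two parts.
LINKING = ("", "s", "es")
MIN_PART = 3  # ignore very short fragments


def split_compound(word):
    """Split `word` into lexicon parts (fewest parts wins); return [word] if impossible."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Best split of word[start:] as a tuple of parts, or None if no split exists.
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + MIN_PART, len(word) + 1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            for link in LINKING:
                # Skip an optional linking element between this part and the next.
                if word.startswith(link, end):
                    rest = best(end + len(link))
                    if rest is not None:
                        candidates.append((part,) + rest)
        return min(candidates, key=len) if candidates else None

    parts = best(0)
    return list(parts) if parts else [word]
```

For example, `split_compound("Qualitätssicherung")` yields `["qualität", "sicherung"]` (the linking "s" is dropped), while an unknown word such as "Haus" is returned unsplit as `["haus"]`.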

SLIDE 14

THESEUS - TEXO

  • Large-scale BMBF project with industry partners (SAP, Siemens, etc.)
  • Service marketplaces in Web 2.0

Goal: services must be findable by both users and machines

  • Problem:

Only keyword-based search; lack of ontologies for semantic search

  • Solution:

Use natural language descriptions of web services; apply Semantic Information Retrieval; use community mining for optimized service selection

Darmstadt Knowledge Processing Repository

SLIDE 15

UIMA components

Layers from data import to data export, used by the projects SIR, AQUA, and THESEUS:

  • Data import: Wikipedia reader, Forum reader, Plain text reader
  • Linguistic preprocessing: Tokenizer, Sentence splitter, Stopword tagger
  • Morphological analysis: Stemmer, Lemmatizer, Compound Splitter
  • Syntactic analysis: PoS-Tagger, Parser
  • Semantic analysis: NE tagger, Sentiment detector, WSD component
  • Project specific analysis: Swear word tagger (AQUA), Negation detection (SIR)
  • Data export: Indexer (Lucene, Terrier), ARFF export
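
The ARFF export at the end of the pipeline produces input for machine learning toolkits such as Weka. A minimal sketch of such a serializer, assuming hypothetical feature names (token count, swear word count, quality label) of the kind an AQUA-style classifier might use; it is not the repository's consumer, and it performs no quoting.

```python
def to_arff(relation, attributes, rows):
    """Serialize feature rows in ARFF, the text format read by Weka.

    attributes: list of (name, type) pairs, where type is "NUMERIC" or a
    list of nominal values. rows: value tuples in attribute order.
    No quoting is done, so names and values must not contain spaces or commas.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(kind) + "}"  # nominal attribute
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(row)}")
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical features for two forum posts.
ARFF = to_arff(
    "forum_posts",
    [("token_count", "NUMERIC"), ("swear_words", "NUMERIC"), ("quality", ["good", "bad"])],
    [(120, 0, "good"), (15, 3, "bad")],
)
```

The header declares each attribute once, and every `@DATA` line is one instance, so `ARFF` ends with the lines `120,0,good` and `15,3,bad`.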

SLIDE 16

Advantages of UIMA

  • Components can be shared between projects
  • Shared model of thinking

“Reader + Annotators + Consumer”; configuration of components

  • Descriptive component orchestration
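
The “Reader + Annotators + Consumer” model with descriptive orchestration can be illustrated with a small Python sketch. Everything here is hypothetical and much simpler than UIMA: a `Document` stands in for the CAS, components are plain functions looked up by name in a registry, and the pipeline is declared as data (similar in spirit to a descriptor) rather than wired together in code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a UIMA CAS: the text plus (type, begin, end) annotations."""
    text: str
    annotations: list = field(default_factory=list)


def plain_text_reader(texts, params):
    """Reader: turns raw strings into documents, one at a time."""
    for text in texts:
        yield Document(text)


def tokenizer(doc, params):
    """Annotator: adds a Token annotation for each whitespace-separated word."""
    pos = 0
    for tok in doc.text.split():
        begin = doc.text.index(tok, pos)
        pos = begin + len(tok)
        doc.annotations.append(("Token", begin, pos))


def stopword_tagger(doc, params):
    """Annotator: marks tokens whose lowercased text is a configured stopword."""
    stopwords = set(params.get("stopwords", []))
    for kind, begin, end in list(doc.annotations):
        if kind == "Token" and doc.text[begin:end].lower() in stopwords:
            doc.annotations.append(("Stopword", begin, end))


def count_consumer(docs, params):
    """Consumer: counts annotations per type over all documents."""
    counts = {}
    for doc in docs:
        for kind, _, _ in doc.annotations:
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Component names used in the configuration, mapped to implementations.
REGISTRY = {
    "PlainTextReader": plain_text_reader,
    "Tokenizer": tokenizer,
    "StopwordTagger": stopword_tagger,
    "CountConsumer": count_consumer,
}

# Declarative pipeline: which components run, in which order, with which parameters.
PIPELINE = {
    "reader": {"name": "PlainTextReader"},
    "annotators": [
        {"name": "Tokenizer"},
        {"name": "StopwordTagger", "params": {"stopwords": ["the", "a", "of"]}},
    ],
    "consumer": {"name": "CountConsumer"},
}


def run(config, inputs):
    """Execute a pipeline description: reader -> annotators -> consumer."""
    reader = config["reader"]
    processed = []
    for doc in REGISTRY[reader["name"]](inputs, reader.get("params", {})):
        # Each document passes through every annotator in the configured order.
        for step in config["annotators"]:
            REGISTRY[step["name"]](doc, step.get("params", {}))
        processed.append(doc)
    consumer = config["consumer"]
    return REGISTRY[consumer["name"]](processed, consumer.get("params", {}))
```

Running `run(PIPELINE, ["The baker bakes a cake", "Quality of software"])` counts 8 tokens and 3 stopwords. Adding or swapping an annotator only changes the `PIPELINE` data, not `run`, which is the point of descriptive orchestration.
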
SLIDE 17

Challenges

  • Agree on a type system

No automatic type mapping

  • Some rough edges in UIMA

No real plug’n’work with PEAR packages; using constraints to align annotations seems to be slow
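
Aligning annotations, for example finding the tokens inside a sentence, reduces to comparing offsets. A minimal sketch of an index-based alternative to testing a constraint against every annotation, assuming annotations are plain (begin, end) offset pairs rather than UIMA feature structures: sorting by begin offset lets a binary search narrow the candidates before their end offsets are checked.

```python
from bisect import bisect_left

TEXT = "Cake is good. Computers too."
TOKENS = [(0, 4), (5, 7), (8, 12), (12, 13), (14, 23), (24, 27), (27, 28)]
SENTENCES = [(0, 13), (14, 28)]


class OffsetIndex:
    """Annotations as (begin, end) pairs, sorted by begin offset for fast lookup."""

    def __init__(self, annotations):
        self.anns = sorted(annotations)
        self.begins = [begin for begin, _ in self.anns]

    def covered_by(self, begin, end):
        """Annotations lying completely inside the span [begin, end)."""
        lo = bisect_left(self.begins, begin)  # first candidate starting at or after begin
        hi = bisect_left(self.begins, end)    # candidates must start before end
        return [(b, e) for b, e in self.anns[lo:hi] if e <= end]


index = OffsetIndex(TOKENS)
```

Here `index.covered_by(0, 13)` returns the offsets of "Cake", "is", "good", and "." for the first sentence, without examining the tokens of the second one.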

SLIDE 18

Wish list

  • Automatic type matching
  • Better tool support

Improving Eclipse plug-ins (robustness, features); refactoring of UIMA components; CPE runner ++ (automatic logging, performance monitor, etc.)

  • Plug’n’work approach
  • “Import by name” in CPEs

Or make ${CPM_HOME}/path also work for readers/consumers

  • Construct XML descriptors from Java annotations
  • More intuitive API
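
The wish to construct XML descriptors from annotations in the code can be sketched in Python, with a decorator standing in for a Java annotation. The decorator name `component`, the `_descriptor` attribute, the parameter values, and the XML element names are all hypothetical; the output is a simplified illustration and does not follow the UIMA descriptor schema.

```python
import xml.etree.ElementTree as ET


def component(name, **params):
    """Hypothetical decorator: attach descriptor metadata (name, parameters) to a class."""
    def wrap(cls):
        cls._descriptor = {"name": name, "implementation": cls.__name__, "params": params}
        return cls
    return wrap


@component("Stopword Tagger", language="de", stopwordFile="stopwords.txt")
class StopwordTagger:
    """Placeholder component; only its metadata matters here."""


def to_descriptor_xml(cls):
    """Render the attached metadata as a simplified XML descriptor (invented element names)."""
    meta = cls._descriptor
    root = ET.Element("componentDescription")
    ET.SubElement(root, "name").text = meta["name"]
    ET.SubElement(root, "implementation").text = meta["implementation"]
    params = ET.SubElement(root, "parameters")
    for key, value in sorted(meta["params"].items()):
        ET.SubElement(params, "parameter", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Because the metadata lives next to the implementation, renaming the class or changing a parameter automatically changes the generated descriptor, so the two cannot drift apart.
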
SLIDE 19

Thank you very much!

  • Acknowledgements:

DFG for funding “Semantic Information Retrieval” and “Automatic Quality Assessment and Feedback in eLearning 2.0”

http://www.ukp.tu-darmstadt.de/