

SLIDE 1

Hybrid NLP

SLIDE 2

LTII – SS 2008

OUTLINE

  • Problems of Deep and Shallow Processing
  • Hybrid Architectures
  • An Advanced Platform for Hybrid NLP: Deep Thought
  • Applications for Hybrid Processing
  • Conclusion and Outlook
SLIDE 3

DEEP & SHALLOW PROCESSING

  • deep methods for morphological, syntactic, and semantic processing exploit our knowledge about the structure of human language
  • as opposed to shallow methods such as pattern-matching grammars and n-gram language models
  • deep methods are needed for getting at the meaning of language input
  • shallow methods perform a partial or heavily underspecified analysis sufficient for certain applications
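
As a rough illustration of the shallow end of this spectrum, the following sketch trains a bigram language model with add-one smoothing. The toy corpus, the function names `train_bigram` and `bigram_prob`, and the choice of smoothing are assumptions made for this example and do not come from the lecture.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus (hypothetical); a real model needs far more data.
corpus = ["sue gave paul an old penny", "paul gave sue an old book"]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_prob("an", "old", unigrams, bigrams)      # pair seen twice
p_unseen = bigram_prob("an", "penny", unigrams, bigrams)  # pair never seen
```

The model only records which word pairs occurred. That makes it cheap and robust, but it carries no information about phrase structure or meaning.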

SLIDE 4

[Figure: parse trees (node labels S, S/NP, VP, NP, Det, A, N, V) for the German sentence "Sue gab Paul einen alten Pfennig." and its English counterpart "Sue gave Paul an old penny."]

∃x[(old'(penny'))(x) ∧ Past(give'(sue', paul', x))]

SLIDE 5

APPLICATIONS

  • Machine Translation, e.g. Systran, Logos, METAL-Comprendium, IBM PT
  • Access to Databases, e.g. Core Language Engine

SLIDE 6

ONCE UPON A TIME

  • Broad industrial research in deep parsing
      • Xerox - LFG
      • Siemens - LFG
      • IBM Germany - HPSG
      • Hewlett Packard - GPSG and HPSG
      • IBM USA - PLNLP and Slot Grammar
  • Very large projects
      • EUROTRA
      • LILOG
      • LS-GRAM
SLIDE 7

GRAMMAR FRAMEWORKS

  • Head-Driven Phrase Structure Grammar (HPSG)
  • Lexical Functional Grammar (LFG)
  • Tree-Adjoining Grammar (TAG)
  • Categorial Grammar (CG)
  • Dependency Grammar (DG)
  • GB-Minimalist Program
SLIDE 8

HPSG

  • Head-Driven Phrase Structure Grammar by Pollard and Sag
  • Uniform formalism: typed feature structures
  • High degree of lexicalization: very few PS-rules, rich lexicon structure
  • Ontological structure: multiple inheritance type hierarchy
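
To make typed feature structures and a multiple inheritance type hierarchy more concrete, here is a minimal sketch of unification over nested dictionaries. The toy hierarchy `PARENTS`, the feature names, and all function names are hypothetical; reentrancy (structure sharing), which real HPSG systems need, is deliberately left out.

```python
# Hypothetical toy hierarchy: each type maps to its direct supertypes.
PARENTS = {
    "agr": [],
    "3rd": ["agr"],
    "sg": ["agr"],
    "pl": ["agr"],
    "3sg": ["3rd", "sg"],  # multiple inheritance
    "3pl": ["3rd", "pl"],
}

def ancestors(t):
    """Return t together with all of its (transitive) supertypes."""
    seen = {t}
    for p in PARENTS[t]:
        seen |= ancestors(p)
    return seen

def unify_types(a, b):
    """Greatest lower bound of two types, or None if they are incompatible."""
    common = [t for t in PARENTS if {a, b} <= ancestors(t)]
    # The GLB is the common subtype that is a supertype of every other one.
    for t in common:
        if all(t in ancestors(u) for u in common):
            return t
    return None

def unify(fs1, fs2):
    """Unify two feature structures (nested dicts with type names as leaves).

    Returns a new structure, or None if unification fails.
    """
    if isinstance(fs1, str) and isinstance(fs2, str):
        return unify_types(fs1, fs2)
    if isinstance(fs1, str) or isinstance(fs2, str):
        return None  # atomic type versus complex structure
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat in result:
            sub = unify(result[feat], val)
            if sub is None:
                return None
            result[feat] = sub
        else:
            result[feat] = val
    return result

noun = {"HEAD": {"AGR": "3rd"}}
verb_requirement = {"HEAD": {"AGR": "sg"}}
agreed = unify(noun, verb_requirement)  # AGR becomes "3sg"
clash = unify({"HEAD": {"AGR": "3sg"}}, {"HEAD": {"AGR": "pl"}})  # fails
```

Unifying `3rd` with `sg` yields their common subtype `3sg`, while `3sg` and `pl` share no subtype, so unification fails. In this simplified picture, that failure is how a grammar would reject an agreement violation.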

SLIDE 9

Problems with Deep Analysis

  • Coverage (Development Time)
  • Robustness (Coping with Out-of-Grammar Input)
  • Efficiency (Runtime and Space Efficiency)
  • Specificity (Selection among Readings)
SLIDE 10

Problems with Shallow Analysis

  • Accuracy
  • Problems with embeddings, grammatical control, anaphora, and modal as well as negative contexts. Examples:
      • According to SVP Raul Lopez, Slator expected him to be appointed CEO of Crawford Inc. at the upcoming shareholders' meeting.
      • After the retirement of Peter Smith, Mary Hopp was introduced by VP Brown as the new director of the marketing division.
      • After every former US-based vice president except Lisa Ronell served as Chairman of the Board, the shareholders for the first time appointed a non-US Chairperson.
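
The following sketch shows how a shallow pattern can go wrong on the second example sentence. The rule itself (take the capitalized name immediately before "as the new <role>") is a hypothetical extraction pattern invented for this illustration.

```python
import re

# Hypothetical shallow rule: the capitalized name directly before
# "as the new <role>" is taken to be the person appointed.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) as the new (?P<role>[a-z]+)"
)

sentence = ("After the retirement of Peter Smith, Mary Hopp was introduced "
            "by VP Brown as the new director of the marketing division.")

match = PATTERN.search(sentence)
extracted = (match.group("person"), match.group("role"))
```

The pattern picks out Brown, the person doing the introducing, rather than Mary Hopp, the new director. Getting this right requires recognizing the passive construction, which is the kind of grammatical information a deep analysis provides.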

SLIDE 11

REAL GRAMMARS

  • LinGO - English Resource Grammar
      • 8,000 types
      • 100,000 lines of code
      • average feature structure > 300 nodes
  • German Grammar of equal size
  • Japanese and Norwegian grammars are getting close

SLIDE 12

International Collaboration

  • Tokyo: Tsujii Lab at the University of Tokyo (Tsujii, Torisawa, Ninomiya, Taura, Yoshida, Mitsuishi, ...)
  • Stanford: HPSG Group at CSLI (Sag, Flickinger, Copestake, Malouf, Carroll (Brighton), ...)
  • Saarbrücken: LT Lab at DFKI and Dept. of CL (Oepen, Callmeier, Krieger, Kiefer, Ciortuz, Müller, ...)

SLIDE 13

THE EVALUATION SETUP

[Figure: evaluation setup; label: tsdb]

SLIDE 14

RESULTS

  • All participating systems have benefitted from the systematic comparative evaluation
  • Currently the fastest system is the runtime parser PET by Ulrich Callmeier (Saarbrücken)
  • But the other parsers also improved drastically, e.g.:
      • LKB (Stanford, Cambridge)
      • LILFES (Tokyo)
      • PAGE (Saarbrücken)
SLIDE 15

RESULTS

  • HPSG Parsing is now 2000 times faster than before
  • Normal-length sentences parse in 0.1 - 1.0 seconds
  • Steady increase in hardware efficiency will also help
SLIDE 16

REFERENCES

  • D. Flickinger, S. Oepen, H. Uszkoreit, and J. Tsujii (eds.). 2000. Special Issue on Efficient Processing with HPSG: Methods, Systems, Evaluation. Journal of Natural Language Engineering 6(1). Cambridge University Press, Cambridge.
  • A. Copestake. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford.
  • S. Oepen, D. Flickinger, J. Tsujii, and H. Uszkoreit. 2002. Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing. CSLI Publications, Stanford.

SLIDE 17

THE CORE MACHINERY

[Figure: tsdb; LKB Development Platform; PET Runtime Parser; Application; Japanese Grammar, German Grammar, English Grammar]

Open Source Public Domain

SLIDE 18

HOWEVER

  • Back to the problems of
      • robustness
      • coverage
      • specificity
SLIDE 19

ASSUMPTIONS

  • Information extraction is not an alternative to deep processing but a continuum between classification and "full" semantic analysis
  • Information Extraction via Text Enrichment
  • We can detect topics, names, binary relations, complex relations, answers, etc.
  • Question: At what point is deep processing needed?
SLIDE 20

APPROACH

  • Lack of robustness and coverage remains a serious problem for deep processing.
  • So we need to find applications where deep processing can improve detection without spoiling the performance.
  • Example: Relation extraction.
  • Let deep processing assist shallow methods.
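
One way to read "let deep processing assist shallow methods" is a fallback scheme: the shallow component always produces a result, and a deep analysis replaces it only when the deep parser succeeds. The sketch below assumes exactly that scheme. `hybrid_extract`, `toy_shallow`, and `toy_deep` are hypothetical stand-ins, not components of any system named in these slides.

```python
from typing import Callable, Optional, Tuple

Relation = Tuple[str, str, str]  # (subject, predicate, object)

def hybrid_extract(
    sentence: str,
    shallow: Callable[[str], Optional[Relation]],
    deep: Callable[[str], Optional[Relation]],
) -> Tuple[Optional[Relation], str]:
    """Prefer the deep analysis when it exists, else keep the shallow result."""
    shallow_result = shallow(sentence)  # robust, always attempted
    try:
        deep_result = deep(sentence)    # may fail on out-of-grammar input
    except Exception:
        deep_result = None
    if deep_result is not None:
        return deep_result, "deep"
    return shallow_result, "shallow"

# Hypothetical stand-ins for real components.
def toy_shallow(sentence: str) -> Optional[Relation]:
    # Naive guess: first word, second word, last word.
    words = sentence.rstrip(".").split()
    return (words[0], words[1], words[-1]) if len(words) >= 3 else None

def toy_deep(sentence: str) -> Optional[Relation]:
    # Pretend the grammar covers only one passive construction.
    words = sentence.rstrip(".").split()
    if len(words) == 5 and words[1] == "was" and words[3] == "by":
        return (words[4], words[2], words[0])
    return None

passive = hybrid_extract("Hopp was introduced by Brown.", toy_shallow, toy_deep)
other = hybrid_extract("Sue gave Paul an old penny.", toy_shallow, toy_deep)
```

In this toy setup the passive sentence is handled by the deep component, which recovers Brown as the agent. The second sentence lies outside the toy grammar, so the shallow result is kept and coverage is not lost.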
SLIDE 21

LT METHODS

[Figure: grid of methods with the labels discrete, non-discrete, hybrid and shallow, deep; highlighted: HMM-based POS Tagger]

SLIDE 22

LT METHODS

[Figure: grid of methods with the labels discrete, non-discrete, hybrid and shallow, deep; highlighted: HPSG-Parser with MRS]

SLIDE 23

LT METHODS

[Figure: grid of methods with the labels discrete, non-discrete, hybrid and shallow, deep; highlighted: PCF Parser]

SLIDE 24

LT METHODS

[Figure: grid of methods with the labels discrete, non-discrete, hybrid and shallow, deep; highlighted: syntactic LFG parser with ME selection]

SLIDE 25

COMBINATION OF METHODS

[Figure: grid of methods with the labels discrete, non-discrete, hybrid and shallow, deep]