Hybrid NLP
SLIDE 1
SLIDE 2
LTII – SS 2008
OUTLINE
- Problems of Deep and Shallow Processing
- Hybrid Architectures
- An Advanced Platform for Hybrid NLP: Deep Thought
- Applications for Hybrid Processing
- Conclusion and Outlook
SLIDE 3
DEEP & SHALLOW PROCESSING
- deep methods for morphological, syntactic, and semantic processing exploit our knowledge about the structure of human language
- as opposed to shallow methods such as pattern-matching grammars and n-gram language models
- deep methods are needed for getting at the meaning of language input
- shallow methods perform a partial or heavily underspecified analysis that is sufficient for certain applications
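As a rough illustration of the shallow end of this spectrum, an n-gram language model can be sketched in a few lines. The toy corpus, the function names, and the add-one smoothing are assumptions made for this example; none of them come from the slides.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Sentence boundary markers let the model score first and last words.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) with add-one (Laplace) smoothing over the seen vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical toy corpus, for illustration only.
corpus = ["Sue gave Paul an old penny", "Paul gave Sue an old penny"]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_prob(unigrams, bigrams, "old", "penny")
p_unseen = bigram_prob(unigrams, bigrams, "old", "paul")
```

Such a model only records which words tend to follow which; it has no notion of who gave what to whom, which is the kind of information the deep methods aim at.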
SLIDE 4
[Figure: parse trees for the German sentence "Sue gab Paul einen alten Pfennig." and the English sentence "Sue gave Paul an old penny.", with node labels S, S/NP, VP, NP, N, A, Det, V]
∃x[(old'(penny'))(x) ∧ (Past(give'(sue', paul', x)))]
SLIDE 5
APPLICATIONS
- Machine Translation
e.g. Systran, Logos, METAL-Comprendium, IBM PT
- Access to Databases
e.g. Core Language Engine
SLIDE 6
ONCE UPON A TIME
- Broad industrial research in deep parsing
  - Xerox: LFG
  - Siemens: LFG
  - IBM Germany: HPSG
  - Hewlett Packard: GPSG and HPSG
  - IBM USA: PLNLP and Slot Grammar
- Very large projects
  - EUROTRA
  - LILOG
  - LS-GRAM
SLIDE 7
GRAMMAR FRAMEWORKS
- Head-Driven Phrase Structure Grammar (HPSG)
- Lexical Functional Grammar (LFG)
- Tree-Adjoining Grammar (TAG)
- Categorial Grammar (CG)
- Dependency Grammar (DG)
- GB-Minimalist Program
SLIDE 8
HPSG
- Head-Driven Phrase Structure Grammar by
Pollard and Sag
- Uniform formalism: typed feature structures
- High degree of lexicalization: very few PS-rules,
rich lexicon structure
- Ontological structure: Multiple inheritance type
hierarchy
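The central operation on feature structures is unification. The sketch below is a deliberately simplified illustration: it treats feature structures as nested Python dicts with atomic string values and leaves out types, the type hierarchy, and structure sharing (reentrancy), all of which a real HPSG system needs. The feature names (HEAD, POS, AGR, NUM, PER) are illustrative assumptions.

```python
def unify(fs1, fs2):
    """Unify two untyped feature structures (nested dicts, atomic string values).

    Returns the merged structure, or None if the two structures clash.
    Simplification: no types and no reentrancy.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # shallow copy so the inputs stay unchanged
        for feature, value in fs2.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # a clash anywhere below fails the whole unification
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a dict) unify only if they are equal.
    return fs1 if fs1 == fs2 else None

# Hypothetical lexical entry and constraint, for illustration only.
verb = {"HEAD": {"POS": "verb", "AGR": {"NUM": "sg"}}}
constraint = {"HEAD": {"AGR": {"NUM": "sg", "PER": "3"}}}

combined = unify(verb, constraint)
clash = unify(verb, {"HEAD": {"AGR": {"NUM": "pl"}}})
```

Here `combined` carries the information from both inputs, while `clash` is `None` because singular and plural number cannot be reconciled. In a typed formalism, the equality test on atoms would be replaced by a lookup of the greatest common subtype in the multiple-inheritance type hierarchy.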
SLIDE 9
Problems with Deep Analysis
- Coverage (Development Time)
- Robustness (Coping with Out-of-Grammar Input)
- Efficiency (Runtime and Space Efficiency)
- Specificity (Selection among Readings)
SLIDE 10
Problems with Shallow Analysis
- Accuracy
- Problems with embeddings, grammatical control, anaphora, and modal as well as negative contexts. Examples:
  - According to SVP Raul Lopez, Slator expected him to be appointed CEO of Crawford Inc. at the upcoming shareholders' meeting.
  - After the retirement of Peter Smith, Mary Hopp was introduced by VP Brown as the new director of the marketing division.
  - After every former US-based vice president except Lisa Ronell served as Chairman of the Board, the shareholders for the first time appointed a non-US Chairperson.
SLIDE 11
REAL GRAMMARS
- LinGO English Resource Grammar
  - 8,000 types
  - 100,000 lines of code
  - average feature structure > 300 nodes
- German grammar of equal size
- Japanese and Norwegian grammars are getting close
SLIDE 12
International Collaboration
Tokyo
Tsujii Lab at the University of Tokyo: Tsujii, Torisawa, Ninomiya, Taura, Yoshida, Mitsoishi, ...
Stanford
HPSG Group at CSLI: Sag, Flickinger, Copestake, Malouf, Carroll (Brighton), ...
Saarbrücken
LT Lab at DFKI and Dept. of CL: Oepen, Callmeier, Krieger, Kiefer, Ciortuz, Müller, ...
SLIDE 13
THE EVALUATION SETUP
[Figure: tsdb]
SLIDE 14
RESULTS
- All participating systems have benefitted from the
systematic comparative evaluation
- Currently the fastest system is the runtime parser PET
by Ulrich Callmeier (Saarbrücken)
- But the other parsers also improved drastically, e.g.:
  - LKB (Stanford, Cambridge)
  - LiLFeS (Tokyo)
  - PAGE (Saarbrücken)
SLIDE 15
RESULTS
- HPSG Parsing is now 2000 times faster than before
- Normal-length sentences parse in 0.1 - 1.0 seconds
- Steady increase in hardware efficiency will also help
SLIDE 16
REFERENCES
- D. Flickinger, S. Oepen, H. Uszkoreit, and J. Tsujii (eds.). 2000. Special Issue on Efficient Processing with HPSG: Methods, Systems, Evaluation. Journal of Natural Language Engineering 6(1). Cambridge University Press, Cambridge.
- A. Copestake. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford.
- S. Oepen, D. Flickinger, J. Tsujii, and H. Uszkoreit. 2002. Collaborative Language Engineering. A Case Study in Efficient Grammar-based Processing. CSLI Publications, Stanford.
SLIDE 17
THE CORE MACHINERY
[Figure: components shown: tsdb, LKB Development Platform, PET Runtime Parser, Application, Japanese Grammar, German Grammar, English Grammar]
Open Source, Public Domain
SLIDE 18
HOWEVER
- Back to the problems of
  - robustness
  - coverage
  - specificity
SLIDE 19
ASSUMPTIONS
- Information extraction is not an alternative to deep
processing but a continuum between classification and "full" semantic analysis
- Information Extraction via Text Enrichment
- We can detect topics, names, binary relations, complex
relations, answers, etc.
- Question: At what point is deep processing needed?
SLIDE 20
APPROACH
- Lack of robustness and coverage remains a
serious problem for deep processing.
- So we need to find applications where deep
processing can improve detection without degrading performance.
- Example: Relation extraction.
- Let deep processing assist shallow methods.
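One way to picture "deep assists shallow" for relation extraction is sketched below. Everything in it is an assumption made for illustration: the single surface pattern, the sentences, and in particular `toy_deep_analyze`, which is a lookup-table stand-in for a real deep parser and is not the Deep Thought API. The point is only the control flow: the shallow extractor always produces candidates, and the deep analysis may veto them when it is available.

```python
import re

# Shallow extractor: one hypothetical surface pattern for appointment relations.
APPOINT = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) was appointed "
    r"(?P<role>[a-z]+) of (?P<org>[A-Z][A-Za-z]+)"
)

def shallow_extract(sentence):
    """Return a candidate relation as a dict, or None if the pattern fails."""
    match = APPOINT.search(sentence)
    return match.groupdict() if match else None

def hybrid_extract(sentence, deep_analyze):
    """Keep the shallow candidate unless an available deep analysis rejects it."""
    candidate = shallow_extract(sentence)
    if candidate is None:
        return None
    analysis = deep_analyze(sentence)
    if analysis is None:
        # Deep parser has no analysis (out-of-grammar input): stay robust
        # by falling back to the shallow result.
        return candidate
    return candidate if analysis["factual"] else None

PLAIN = "Mary Hopp was appointed director of Acme."
EMBEDDED = "Nobody believes that Mary Hopp was appointed director of Acme."
UNPARSED = "Lisa Ronell was appointed chair of Initech."

def toy_deep_analyze(sentence):
    """Hypothetical stand-in for a deep parser; None means 'no parse found'."""
    known = {PLAIN: {"factual": True}, EMBEDDED: {"factual": False}}
    return known.get(sentence)

plain_result = hybrid_extract(PLAIN, toy_deep_analyze)
embedded_result = hybrid_extract(EMBEDDED, toy_deep_analyze)
unparsed_result = hybrid_extract(UNPARSED, toy_deep_analyze)
```

With this arrangement, a gap in grammar coverage never makes the system worse than the shallow baseline, while an embedded or negated context that the pattern cannot see can still be filtered out when a deep analysis exists.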
SLIDE 21
LT METHODS
[Figure: grid of LT methods labeled discrete, non-discrete, hybrid, shallow, deep. Example shown: HMM-based POS Tagger]
SLIDE 22
LT METHODS
[Figure: grid of LT methods labeled discrete, non-discrete, hybrid, shallow, deep. Example shown: HPSG Parser with MRS]
SLIDE 23
LT METHODS
[Figure: grid of LT methods labeled discrete, non-discrete, hybrid, shallow, deep. Example shown: PCF Parser]
SLIDE 24
LT METHODS
[Figure: grid of LT methods labeled discrete, non-discrete, hybrid, shallow, deep. Example shown: syntactic LFG parser with ME selection]
SLIDE 25