Grammars and Parsing


SLIDE 1

Ch.3 Grammars and Parsing 1

Grammars and Parsing

  • Grammars and Sentence Structure
  • What makes a good grammar
  • A Top-Down Parser
  • A Bottom-Up Parser
  • Transition Network Grammars
SLIDE 2

Grammars and Sentence Structure

  • Ex. John ate the cat

Tree representation: (S (NP (NAME John)) (VP (V ate) (NP (ART the) (N cat))))
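
The list notation maps naturally onto nested data structures. The following is a minimal sketch, assuming a tree is stored as nested Python tuples of the form (label, child, ...) with words as plain strings; the helper names `leaves` and `bracketed` are illustrative, not from the slides.

```python
# Parse tree for "John ate the cat" as nested tuples: (label, child, ...).
# A leaf is a plain string (the word itself).
tree = ("S",
        ("NP", ("NAME", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "cat"))))

def leaves(node):
    """Return the words at the bottom of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Render the tree in bracketed list notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [bracketed(c) for c in node[1:]]) + ")"

print(leaves(tree))
print(bracketed(tree))
```

Reading the leaves from left to right recovers the original sentence, which is a quick sanity check on any hand-written tree.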

SLIDE 3

Tree Terminology

  • A tree is a special form of graph, consisting of nodes connected by links
  • The node at the top is called the root
  • The nodes at the bottom are called leaves
  • An ancestor of a node N is N’s parent, or the parent of its parent, and so on

SLIDE 4

A Simple Grammar

  • 1. S -> NP VP
  • 2. VP -> V NP
  • 3. NP -> NAME
  • 4. NP -> ART N
  • 5. NAME -> John
  • 6. V -> ate
  • 7. ART -> the
  • 8. N -> cat
SLIDE 5

Context-Free Grammar (CFG)

  • It consists of:
    – Terminal symbols
    – Non-terminal symbols
    – Production rules
    – A start symbol
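
As an illustration, the four components can be held in one small record. This is a sketch only: the `CFG` class and `make_cfg` helper are hypothetical names, and the rules are those of the simple grammar on the previous slide. The helper assumes that every symbol appearing on a left-hand side is a non-terminal and everything else is a terminal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset
    nonterminals: frozenset
    rules: tuple      # (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str

RULES = (
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("NAME",)),
    ("NP", ("ART", "N")),
    ("NAME", ("John",)),
    ("V", ("ate",)),
    ("ART", ("the",)),
    ("N", ("cat",)),
)

def make_cfg(rules, start):
    """Derive the symbol sets from the rules (assumption: LHS symbols are the non-terminals)."""
    nonterminals = frozenset(lhs for lhs, _ in rules)
    symbols = {s for _, rhs in rules for s in rhs}
    return CFG(frozenset(symbols - nonterminals), nonterminals, tuple(rules), start)

grammar = make_cfg(RULES, "S")
print(sorted(grammar.terminals))
print(sorted(grammar.nonterminals))
```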

SLIDE 6

Derivation

  • A grammar is said to derive a sentence if there is a sequence of rules that allows you to rewrite the start symbol into the sentence
  • Two important processes are based on derivations: sentence generation and parsing
  • There are two basic methods of parsing: top-down and bottom-up
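
A derivation can be replayed mechanically. The sketch below assumes the eight numbered rules of the simple grammar from the earlier slide and always rewrites the leftmost occurrence of a rule's left-hand side; the function name `derive` and the chosen rule order are illustrative.

```python
# Rules of the simple grammar, keyed by their numbers on the slide.
RULES = {
    1: ("S", ["NP", "VP"]),
    2: ("VP", ["V", "NP"]),
    3: ("NP", ["NAME"]),
    4: ("NP", ["ART", "N"]),
    5: ("NAME", ["John"]),
    6: ("V", ["ate"]),
    7: ("ART", ["the"]),
    8: ("N", ["cat"]),
}

def derive(start, rule_numbers):
    """Apply the rules in order to the leftmost matching symbol.

    Returns every intermediate symbol list, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        i = form.index(lhs)        # raises ValueError if the rule does not apply
        form[i:i + 1] = rhs        # rewrite lhs in place by the rule's RHS
        steps.append(list(form))
    return steps

steps = derive("S", [1, 3, 5, 2, 6, 4, 7, 8])
for step in steps:
    print(" ".join(step))
```

Sentence generation chooses the rule sequence freely; parsing has to discover a rule sequence that ends in a given sentence.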

SLIDE 7

What makes a Good Grammar

  • Generality: the range of sentences the grammar analyzes correctly
  • Selectivity: the range of non-sentences it identifies as problematic
  • Understandability: the simplicity of the grammar itself

SLIDE 8

Writing a Grammar

  • Try to group words to form a constituent
  • Test the grouping by constructing a new sentence that conjoins that group of words with another group classified as the same type of constituent
    e.g. NP and NP: I ate a hamburger and a hot dog
    But if you define NP -> on NP, the sentence "I ate a hamburger and on the stove" is not acceptable, so the definition is wrong

SLIDE 9

Writing a Grammar (cont.)

  • Another test involves inserting the proposed constituent into other sentences that take the same category of constituent
    e.g. "John’s hitting of Mary" is an NP. It can be inserted in the following two sentences:
    John’s hitting of Mary alarmed Sue (S -> NP VP)
    I cannot explain John’s hitting of Mary (VP -> V NP)

SLIDE 10

Grammar Generative Capacity

  • Regular grammar, e.g. S -> aS1, S1 -> bS2, S2 -> d
    Regular grammars cannot generate the language ab, aabb, aaabbb, …
  • Context-free grammar, e.g. S -> ab, S -> aSb
    Context-free grammars cannot generate the language abc, aabbcc, aaabbbccc, …
  • Context-sensitive grammar: rules of the form xAy -> xzy, where A is a symbol, x and y are (possibly empty) sequences of symbols, and z is a nonempty sequence of symbols
  • Type 0 grammars are more general and allow arbitrary rewrite rules
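
To make the context-free example concrete, the two rules S -> ab and S -> aSb can be turned into a tiny generator and recognizer. This is a hand-written sketch for this one grammar, not a general CFG parser; both function names are illustrative.

```python
def generate_anbn(n):
    """Apply S -> aSb (n - 1) times, then S -> ab. Assumes n >= 1."""
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "aSb")
    return form.replace("S", "ab")

def derives_anbn(s):
    """True if the grammar S -> ab | aSb derives the string s."""
    if s == "ab":
        return True                       # S -> ab
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
        return derives_anbn(s[1:-1])      # S -> aSb
    return False

print(generate_anbn(3))
print(derives_anbn("aabb"), derives_anbn("abab"), derives_anbn("aabbcc"))
```

The recursion mirrors the nesting in S -> aSb: each call strips one matching a/b pair. Matching a third block of c's in step with the a's and b's is what lies beyond context-free rules.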

SLIDE 11

Top Down Parsers

  • A top-down parser starts with the start symbol and attempts to rewrite it into a sequence of terminals
  • The state of the parse at any given time can be represented as a list of symbols
    e.g. starting in the state (S) and applying the rule S -> NP VP, the symbol list becomes (NP VP)
  • The parser could continue until the state consisted entirely of terminal symbols and then check it against the input sentence. A better idea is to check the input as soon as possible
  • Rather than having a separate rule for each word, a structure called the lexicon is used to store the possible categories for each word, e.g. cried: V, dogs: N, the: ART

SLIDE 12

Top Down Parsers (cont.)

  • With the lexicon specified, the grammar need not contain any lexical rules
  • The current position in the sentence is included in the representation of the parse state, to record how many words have been matched
    e.g. with positions numbered 1 The 2 dogs 3 cried 4, a typical parse state would be ((N VP) 2)
  • A parsing algorithm that is guaranteed to find a parse if there is one must systematically explore every possible state (backtracking)

SLIDE 13

A Simple Top Down Parsing Algorithm (Example)

Grammar

  • 1. S -> NP VP
  • 2. NP -> ART N
  • 3. NP -> ART ADJ N
  • 4. VP -> V
  • 5. VP -> V NP

Lexicon

cried: V; dogs: N, V; the: ART; old: ADJ, N; man: N, V

Sentences

  • 1. The dogs cried
  • 2. The old man cried
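
The grammar, lexicon and sentences above are enough to run a small top-down parser. The following is a minimal sketch, assuming parse states of the form (symbol list, position) with positions counted from 1, and a possibilities list used as a stack so that failed paths are abandoned by backtracking. The function name `parse` and the returned trace are illustrative additions.

```python
GRAMMAR = [
    ("S", ["NP", "VP"]),
    ("NP", ["ART", "N"]),
    ("NP", ["ART", "ADJ", "N"]),
    ("VP", ["V"]),
    ("VP", ["V", "NP"]),
]
LEXICON = {
    "cried": {"V"}, "dogs": {"N", "V"}, "the": {"ART"},
    "old": {"ADJ", "N"}, "man": {"N", "V"},
}
LEXICAL = {"ART", "N", "ADJ", "V"}   # categories looked up in the lexicon

def parse(sentence):
    """Return (accepted, trace) where trace lists the states visited."""
    words = sentence.lower().split()
    possibilities = [(("S",), 1)]    # stack of (symbols, position) states
    trace = []
    while possibilities:
        symbols, pos = possibilities.pop()
        trace.append((symbols, pos))
        if not symbols:
            if pos == len(words) + 1:
                return True, trace   # all symbols and all words consumed
            continue                 # words left over: backtrack
        first, rest = symbols[0], symbols[1:]
        if first in LEXICAL:
            # Check the input immediately instead of expanding to words.
            if pos <= len(words) and first in LEXICON.get(words[pos - 1], ()):
                possibilities.append((rest, pos + 1))
        else:
            expansions = [(tuple(rhs) + rest, pos)
                          for lhs, rhs in GRAMMAR if lhs == first]
            # Reverse so that rules are tried in grammar order.
            possibilities.extend(reversed(expansions))
    return False, trace

for s in ["The dogs cried", "The old man cried"]:
    ok, trace = parse(s)
    print(s, "->", ok, "after", len(trace), "states")
```

On "The old man cried" the parser first treats "old" as a noun and "man" as a verb, fails when "cried" cannot start an NP, and only then backtracks to the rule NP -> ART ADJ N.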
SLIDE 14

Parsing as a Search procedure

  • Parsing can be viewed as a search problem, as in AI
  • Two search strategies: depth-first search (DFS) and breadth-first search (BFS)
  • DFS uses a stack for the possibilities list
  • BFS uses a queue for the possibilities list
  • Left recursion in the grammar causes a top-down DFS parser to enter an infinite loop
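
The only difference between the two strategies is which end of the possibilities list the next state is taken from. The sketch below shows this on a toy state space; the `search` function, the `TREE` dictionary and the rule NP -> NP PP used to illustrate left recursion are all hypothetical examples, not taken from the slides.

```python
from collections import deque

def search(start, successors, is_goal, strategy="dfs"):
    """Return the states in the order they were visited."""
    possibilities = deque([start])
    visited = []
    while possibilities:
        # Stack (take the newest state) for DFS, queue (take the oldest) for BFS.
        state = possibilities.pop() if strategy == "dfs" else possibilities.popleft()
        visited.append(state)
        if is_goal(state):
            break
        possibilities.extend(successors(state))
    return visited

TREE = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]}
dfs_order = search("S", lambda s: TREE.get(s, []), lambda s: s == "E", "dfs")
bfs_order = search("S", lambda s: TREE.get(s, []), lambda s: s == "E", "bfs")
print(dfs_order)
print(bfs_order)

# Left recursion: with a rule such as NP -> NP PP, expanding the first
# symbol never consumes input, so the symbol list only grows.
state = ("NP",)
growth = []
for _ in range(3):                 # a real DFS parser would never stop here
    state = ("NP", "PP") + state[1:]
    growth.append(state)
print(growth)
```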

SLIDE 15

A Bottom-Up Chart Parser

  • A bottom-up parser could be built simply by matching a sequence of symbols against the RHS of a production rule
  • This matching process can be formulated as a search process
  • The state would simply consist of a symbol list, starting with the words in the sentence
  • Successor states could be generated by exploring all possible ways to
    – rewrite a word by its possible lexical categories
    – replace a sequence of symbols that matches the RHS of a grammar rule by its LHS symbol
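
The two kinds of successor can be written down directly. This is a sketch of the naive search only, assuming the grammar and lexicon from the top-down example; the names `successors` and `recognizes` are illustrative, and the `seen` set is an addition to avoid revisiting identical states.

```python
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("ART", "N")),
    ("NP", ("ART", "ADJ", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
]
LEXICON = {
    "cried": {"V"}, "dogs": {"N", "V"}, "the": {"ART"},
    "old": {"ADJ", "N"}, "man": {"N", "V"},
}

def successors(state):
    """All states reachable by one lexical rewrite or one rule reduction."""
    result = set()
    for i, sym in enumerate(state):
        for cat in LEXICON.get(sym, ()):           # rewrite a word by a category
            result.add(state[:i] + (cat,) + state[i + 1:])
    for lhs, rhs in GRAMMAR:                       # replace an RHS match by its LHS
        n = len(rhs)
        for i in range(len(state) - n + 1):
            if state[i:i + n] == rhs:
                result.add(state[:i] + (lhs,) + state[i + n:])
    return result

def recognizes(sentence):
    """Depth-first search from the word list towards the single symbol S."""
    start = tuple(sentence.lower().split())
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state == ("S",):
            return True
        for nxt in successors(state) - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False

print(recognizes("The dogs cried"), recognizes("The old man cried"))
```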

SLIDE 16

A Bottom-Up Chart Parser (cont.)

  • Such a simple implementation is expensive, because the same matches are tried again and again
  • To avoid this problem, a data structure called a chart is introduced that allows the parser to store partial results
  • Matches are always considered from the point of view of one constituent, called the key

SLIDE 17

Example

  • 1. S -> NP VP
  • 2. NP -> ART ADJ N
  • 3. NP -> ART N
  • 4. NP -> ADJ N
  • 5. VP -> V
  • 6. VP -> V NP
  • Assume that you are parsing a sentence that starts with an ART. With this ART as the key, rules 2 and 3 are matched because they start with ART
  • To record this for analyzing the next key, you need to record that rules 2 and 3 could be continued at the point after the ART. Thus you record the following, where @ marks how much of the rule has been seen so far:
    2’. NP -> ART @ ADJ N
    3’. NP -> ART @ N

SLIDE 18

Example (cont.)

  • If the next input key is an ADJ, then rule 4 may be started, and the modified rule 2’ may be extended to give
    2’’. NP -> ART ADJ @ N
  • The chart maintains the constituents derived so far in the parse, in addition to the rules that have been partially matched. These partially matched rules are called active arcs
  • The basic operation of a chart-based parser involves combining an active arc with a completed constituent. The result is either a new completed constituent or a new active arc that is an extension of the original active arc. New completed constituents are maintained on a list called the agenda
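
The basic operation can be sketched as a single method on an arc record. The `ActiveArc` class below is a hypothetical representation: `dot` counts the RHS symbols matched so far, and `start`/`end` are the positions the arc spans. The positions in the example assume the ART sits between positions 1 and 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActiveArc:
    lhs: str
    rhs: tuple
    dot: int       # number of RHS symbols matched so far
    start: int
    end: int

    def next_symbol(self):
        return self.rhs[self.dot]

    def extend(self, category, start, end):
        """Combine this arc with a completed constituent.

        Returns a longer ActiveArc, a completed constituent
        (lhs, start, end), or None if the two do not combine.
        """
        if category != self.next_symbol() or start != self.end:
            return None
        if self.dot + 1 == len(self.rhs):
            return (self.lhs, self.start, end)       # rule fully matched
        return ActiveArc(self.lhs, self.rhs, self.dot + 1, self.start, end)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "@")
        return f"{self.lhs} -> {' '.join(symbols)}"

arc = ActiveArc("NP", ("ART", "ADJ", "N"), 1, 1, 2)   # rule 2 after seeing ART
longer = arc.extend("ADJ", 2, 3)
done = longer.extend("N", 3, 4)
print(arc, "|", longer, "|", done)
```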

SLIDE 19

A bottom-up charting algorithm

Do until there is no input left:

  • 1. If the agenda is empty, look up the interpretations for the next word in the input and add them to the agenda
  • 2. Select a constituent from the agenda (call it constituent C, from position p1 to p2)
  • 3. For each rule in the grammar of the form X -> C X1 … Xn, add an active arc of the form X -> @ C X1 … Xn from position p1 to p1 (the arc covers no input until step 4 extends it over C)
  • 4. Add C to the chart using the arc extension algorithm
SLIDE 20

Arc Extension Algorithm

To add a constituent C from position p1 to p2

  • 1. Insert C into the chart from position p1 to p2
  • 2. For any active arc of the form X -> X1 … @ C … Xn from position p0 to p1, add a new active arc X -> X1 … C @ … Xn from position p0 to p2
  • 3. For any active arc of the form X -> X1 … Xn @ C from position p0 to p1, add a new constituent of type X from p0 to p2 to the agenda
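
Putting the charting loop and the arc extension steps together gives a small working recognizer. The sketch below makes several assumptions that are not stated on the slides: new arcs are created with the dot before C and an empty span p1 to p1, so that arc extension itself consumes C; the loop runs until both the input and the agenda are exhausted; the agenda is used as a stack; and the lexicon is the one from the top-down example. Arcs are stored as (lhs, rhs, dot, start, end) tuples.

```python
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("ART", "ADJ", "N")),
    ("NP", ("ART", "N")),
    ("NP", ("ADJ", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
]
LEXICON = {
    "the": ["ART"], "old": ["ADJ", "N"], "man": ["N", "V"],
    "dogs": ["N", "V"], "cried": ["V"],
}

def chart_parse(sentence):
    """Return the set of completed constituents (category, start, end)."""
    words = sentence.lower().split()
    chart = set()     # completed constituents
    arcs = set()      # active arcs: (lhs, rhs, dot, start, end)
    agenda = []
    next_word = 0

    def add_constituent(cat, p1, p2):
        # Arc extension algorithm.
        chart.add((cat, p1, p2))                          # step 1
        for lhs, rhs, dot, p0, end in list(arcs):
            if end == p1 and rhs[dot] == cat:
                if dot + 1 == len(rhs):                   # step 3: arc completed
                    new = (lhs, p0, p2)
                    if new not in chart and new not in agenda:
                        agenda.append(new)
                else:                                     # step 2: longer arc
                    arcs.add((lhs, rhs, dot + 1, p0, p2))

    while agenda or next_word < len(words):
        if not agenda:                                    # charting step 1
            pos = next_word + 1                           # positions start at 1
            agenda.extend((cat, pos, pos + 1)
                          for cat in LEXICON.get(words[next_word], []))
            next_word += 1
            continue
        cat, p1, p2 = agenda.pop()                        # charting step 2
        if (cat, p1, p2) in chart:
            continue
        for lhs, rhs in GRAMMAR:                          # charting step 3
            if rhs[0] == cat:
                arcs.add((lhs, rhs, 0, p1, p1))
        add_constituent(cat, p1, p2)                      # charting step 4
    return chart

chart = chart_parse("The old man cried")
print(("S", 1, 5) in chart)
print(sorted(c for c in chart if c[0] in {"S", "NP", "VP"}))
```

A sentence of n words is accepted when the chart contains an S from position 1 to n + 1. Because every constituent is stored once, the ambiguity of "old" and "man" produces extra chart entries rather than repeated work.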