Context-free Grammars: CSCI 3130 Formal Languages and Automata Theory



slide-1
SLIDE 1

1/34

Context-free Grammars

CSCI 3130 Formal Languages and Automata Theory Siu On CHAN

Chinese University of Hong Kong

Fall 2015

slide-2
SLIDE 2

2/34

Precedence in Arithmetic Expressions

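bash$ python
Python 2.7.9 (default, Apr 2 2015, 15:33:21)
>>> 2+3*5
17

Two possible parse trees for 2+3*5:

* at the root, with children (2+3) and 5: value = 25

or

+ at the root, with children 2 and (3*5): value = 17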
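
The two readings can be checked mechanically. The following is a minimal sketch, assuming a tree is encoded as a nested tuple (operator, left, right); the encoding and the helper name `evaluate` are invented for this illustration and are not part of the slides.

```python
import operator

# Hypothetical encoding: a tree is either an int or a tuple (op, left, right).
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate an expression tree bottom-up."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

times_at_root = ("*", ("+", 2, 3), 5)   # reads 2+3*5 as (2+3)*5
plus_at_root = ("+", 2, ("*", 3, 5))    # reads 2+3*5 as 2+(3*5)

print(evaluate(times_at_root))  # 25
print(evaluate(plus_at_root))   # 17
```

Only the second tree agrees with what the Python interpreter prints, which is why the shape of the tree matters.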

slide-3
SLIDE 3

3/34

Grammars describe meaning

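EXPR → EXPR + TERM
EXPR → TERM
TERM → TERM * NUM
TERM → NUM
NUM → 0-9

rules for valid (simple) arithmetic expressions

[Parse tree of 2 + 3 * 5: the root EXPR expands to EXPR + TERM; the left EXPR yields 2 via TERM and NUM; the right TERM expands to TERM * NUM, yielding 3 * 5]

Rules always yield the correct meaning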

slide-4
SLIDE 4

4/34

Grammar of English

SENTENCE → NOUN-PHRASE VERB-PHRASE

  a girl (NOUN-PHRASE) likes the boy (VERB-PHRASE)

NOUN-PHRASE → A-NOUN
         or → A-NOUN PREP-PHRASE

  a girl (A-NOUN)
  a girl (A-NOUN) with a flower (PREP-PHRASE)
slide-5
SLIDE 5

5/34

Grammar of English

NOUN-PHRASE → A-NOUN
         or → A-NOUN PREP-PHRASE

  a girl (A-NOUN)
  a girl (A-NOUN) with a flower (PREP-PHRASE)

PREP-PHRASE → PREP NOUN-PHRASE

  with (PREP) a flower (NOUN-PHRASE)

Recursive structure

slide-7
SLIDE 7

6/34

Grammar of (parts of) English

SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → A-NOUN
NOUN-PHRASE → A-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP A-NOUN
A-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB
ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with
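
To see the rules at work, here is a small recognizer sketch for this grammar. It tries every production in place of the leftmost variable, which is enough here because the grammar has no left recursion and no ε-productions. The function names `derives` and `is_sentence` and the dictionary encoding are assumptions made for this illustration.

```python
from functools import lru_cache

# Productions from the slide; each body is a tuple of symbols.
RULES = {
    "SENTENCE": [("NOUN-PHRASE", "VERB-PHRASE")],
    "NOUN-PHRASE": [("A-NOUN",), ("A-NOUN", "PREP-PHRASE")],
    "VERB-PHRASE": [("CMPLX-VERB",), ("CMPLX-VERB", "PREP-PHRASE")],
    "PREP-PHRASE": [("PREP", "A-NOUN")],
    "A-NOUN": [("ARTICLE", "NOUN")],
    "CMPLX-VERB": [("VERB", "NOUN-PHRASE"), ("VERB",)],
    "ARTICLE": [("a",), ("the",)],
    "NOUN": [("boy",), ("girl",), ("flower",)],
    "VERB": [("likes",), ("touches",), ("sees",)],
    "PREP": [("with",)],
}

@lru_cache(maxsize=None)
def derives(symbols, words):
    """True if the symbol sequence can derive exactly the word sequence."""
    if not symbols:
        return not words
    head, rest = symbols[0], symbols[1:]
    if head not in RULES:   # terminal: it must match the next word
        return bool(words) and words[0] == head and derives(rest, words[1:])
    # variable: try each production body in its place
    return any(derives(body + rest, words) for body in RULES[head])

def is_sentence(text):
    return derives(("SENTENCE",), tuple(text.split()))

print(is_sentence("a girl with a flower likes the boy"))  # True
print(is_sentence("girl likes the boy"))                  # False
```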

slide-8
SLIDE 8

7/34

The meaning of sentences

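a girl with a flower likes the boy

Word level: ARTICLE NOUN PREP ARTICLE NOUN VERB ARTICLE NOUN

Building up the parse tree from the words:
A-NOUN: "a girl", "a flower", "the boy"
PREP-PHRASE: "with a flower"
NOUN-PHRASE: "a girl with a flower", "the boy"
CMPLX-VERB: "likes the boy"
VERB-PHRASE: "likes the boy"
SENTENCE: "a girl with a flower likes the boy"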

slide-12
SLIDE 12

8/34

Context-free grammar

A → 0A1
A → B
B → #

A, B are variables
0, 1, # are terminals
A → 0A1 is a production
A is the start variable

derivation: A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
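
A derivation is just repeated string rewriting, so it can be replayed step by step. The sketch below assumes single-character variables and uses hypothetical helper names (`apply_rule`, `derive`); it rewrites the leftmost occurrence of a variable at each step.

```python
# Productions of the example grammar, keyed by variable.
RULES = {"A": ["0A1", "B"], "B": ["#"]}

def apply_rule(form, variable, body):
    """Replace the leftmost occurrence of `variable` in `form` by `body`."""
    if body not in RULES[variable]:
        raise ValueError("not a production of the grammar")
    if variable not in form:
        raise ValueError("variable does not occur in the current string")
    return form.replace(variable, body, 1)

def derive(n):
    """Derivation of 0^n # 1^n: use A -> 0A1 n times, then A -> B, then B -> #."""
    steps = ["A"]
    for _ in range(n):
        steps.append(apply_rule(steps[-1], "A", "0A1"))
    steps.append(apply_rule(steps[-1], "A", "B"))
    steps.append(apply_rule(steps[-1], "B", "#"))
    return steps

print(" => ".join(derive(3)))
# A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111
```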

slide-13
SLIDE 13

9/34

Context-free grammar

A context-free grammar is given by (V, Σ, R, S) where

◮ V is a finite set of variables or non-terminals
◮ Σ is a finite set of terminals
◮ R is a set of productions or substitution rules of the form A → α, where A is a variable and α is a string of variables and terminals
◮ S ∈ V is a variable called the start variable
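
One way to make the definition concrete is to store the four components in a record and check the conditions. This is only a sketch: the class name `CFG`, the field names, and the `check` helper are invented here, and the requirement that V and Σ be disjoint is the usual textbook convention rather than something stated on the slide.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R, as (head, body) pairs; body is a tuple of symbols
    start: str             # S

def check(g):
    """Raise ValueError unless g satisfies the conditions of the definition."""
    if g.variables & g.terminals:
        raise ValueError("V and Sigma must be disjoint (assumed convention)")
    if g.start not in g.variables:
        raise ValueError("the start variable must belong to V")
    symbols = g.variables | g.terminals
    for head, body in g.rules:
        if head not in g.variables:
            raise ValueError("the left-hand side must be a single variable")
        if not all(s in symbols for s in body):
            raise ValueError("the right-hand side uses an unknown symbol")
    return g

# The grammar A -> 0A1 | B, B -> # written as a 4-tuple.
g = check(CFG(
    variables=frozenset("AB"),
    terminals=frozenset("01#"),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
))
```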

slide-14
SLIDE 14

10/34

Notation and conventions

E → E+E
E → (E)
E → N
N → 0N
N → 1N
N → 0
N → 1

Variables: E, N
Terminals: +, (, ), 0, 1
Start variable: E

shorthand:

E → E+E | (E) | N
N → 0N | 1N | 0 | 1

conventions:
variables in UPPERCASE
start variable comes first

slide-16
SLIDE 16

11/34

Derivation

derivation: a sequential application of productions

E ⇒ E+E ⇒ (E)+E ⇒ (E)+N ⇒ (E)+1 ⇒ (E+E)+1 ⇒ (N+E)+1 ⇒ (N+N)+1 ⇒ (N+1N)+1 ⇒ (N+10)+1 ⇒ (1+10)+1

using the grammar

E → E+E | (E) | N
N → 0N | 1N | 0 | 1

α ⇒ β: application of one production
α ⇒* β: derivation (a sequence of zero or more productions)

E ⇒* (1+10)+1

slide-17
SLIDE 17

12/34

Context-free languages

The language of a CFG is the set of all strings of terminals at the end of a derivation from the start variable

L(G) = {w ∈ Σ∗ | S ⇒* w}

Questions we will ask:
I give you a CFG, what is the language?
I give you a language, write a CFG for it
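
For small grammars, the first question can be explored by brute force: expand sentential forms breadth-first and collect those made of terminals only. The sketch below uses a hypothetical function name `language` and assumes single-character variables. It also assumes that every cycle of productions adds at least one terminal, so that the length bound prunes the search; this holds for A → 0A1 | B, B → # but not for grammars with rules such as S → SS | ε, where the search would not terminate.

```python
from collections import deque

def language(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    `rules` maps each variable (one character) to a list of bodies (strings);
    every other character counts as a terminal.
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, c in enumerate(form) if c in rules), None)
        if pos is None:                  # no variables left: a string of L(G)
            found.add(form)
            continue
        for body in rules[form[pos]]:    # expand the leftmost variable
            new = form[:pos] + body + form[pos + 1:]
            # terminals never disappear, so too many terminals means too long
            if sum(c not in rules for c in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))

print(language({"A": ["0A1", "B"], "B": ["#"]}, "A", 5))
# ['#', '0#1', '00#11']
```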

slide-21
SLIDE 21

13/34

Analysis example 1

A → 0A1 | B
B → #

L(G) = {0^n#1^n | n ≥ 0}

Can you derive:

00#11    Yes: A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
#        Yes: A ⇒ B ⇒ #
00#111   No: uneven number of 0s and 1s
00##11   No: too many #

slide-23
SLIDE 23

14/34

Analysis example 2

S → SS | (S) | ε

Can you derive:

()       S ⇒ (S) ⇒ ()
(()())   S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

slide-24
SLIDE 24

15/34

Parse trees

S → SS | (S) | ε

A parse tree gives a more compact representation

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

S
├─ (
├─ S
│  ├─ S
│  │  ├─ (
│  │  ├─ S ─ ε
│  │  └─ )
│  └─ S
│     ├─ (
│     ├─ S ─ ε
│     └─ )
└─ )

slide-25
SLIDE 25

16/34

Parse trees

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())

All four derivations have the same parse tree, written here in bracket form:

[S ( [S [S ( [S ε] ) ] [S ( [S ε] ) ] ] ) ]

One parse tree can represent many derivations

slide-26
SLIDE 26

17/34

Analysis example 2

S → SS | (S) | ε

Can you derive:

(()()    No: uneven number of ( and )
())(()   No: some prefix has too many )

slide-30
SLIDE 30

18/34

Analysis example 2

S → SS | (S) | ε

L(G) = {w | w has the same number of ( and ), and no prefix of w has more ) than (}

[Parse tree of ( ( ) ( ) ) ( ): the root S expands to SS; the first S yields the block (()()) and the second S yields the block ()]

Parsing rules:
Divide w into blocks with same number of ( and )
Each block is in L(G)
Parse each block recursively
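
Both the characterization of L(G) and the block-splitting step can be expressed with a running depth counter. This is a minimal sketch, assuming the input contains only the characters ( and ); the names `in_language` and `blocks` are made up for the illustration.

```python
def in_language(w):
    """Check both conditions: equal counts, and no prefix with more ) than (."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:       # some prefix has too many )
            return False
    return depth == 0       # same number of ( and )

def blocks(w):
    """Split a string of L(G) into its shortest balanced blocks."""
    out, depth, start = [], 0, 0
    for i, c in enumerate(w):
        depth += 1 if c == "(" else -1
        if depth == 0:      # a block closes whenever the depth returns to 0
            out.append(w[start:i + 1])
            start = i + 1
    return out

print(in_language("(()())()"))  # True
print(blocks("(()())()"))       # ['(()())', '()']
```

Each block has the form ( v ) with v again in L(G), so stripping the outer pair and repeating the split gives the recursive parse.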

slide-32
SLIDE 32

19/34

Design example 1

L = {0^n1^n | n ≥ 0}

These strings have recursive structure:

00001111
000111
0011
01
ε

S → 0S1 | ε

slide-34
SLIDE 34

20/34

Design example 2

L = {0^n1^n0^m1^m | n ≥ 0, m ≥ 0}

Examples: 010011, 00110011, 000111

These strings have two parts:

L = L1L2
L1 = {0^n1^n | n ≥ 0}
L2 = {0^m1^m | m ≥ 0}

rules for L1: S1 → 0 S1 1 | ε

L2 is the same as L1:

S → S1S1
S1 → 0 S1 1 | ε

slide-36
SLIDE 36

21/34

Design example 3

L = {0^n1^m0^m1^n | n ≥ 0, m ≥ 0}

Examples: 011001, 0011, 1100, 00110011

These strings have a nested structure:

outer part: 0^n ... 1^n
inner part: 1^m0^m

S → 0S1 | I
I → 1I0 | ε

slide-38
SLIDE 38

22/34

Design example 4

L = {x | x has two 0-blocks with the same number of 0s}

01011, 001011001, 10010101000 allowed
11001000, 01111 not allowed

Split x into three parts:

1001 (initial part A)   00110100 (middle part B)   10110 (final part C)

A: cannot end in 0
C: cannot begin with 0

slide-40
SLIDE 40

23/34

Design example 4

1001 (A)   00110100 (B)   10110 (C)

S → ABC
A → ε | U1
U → 0U | 1U | ε
C → ε | 1U
B → 0D0 | 0B0
D → 1U1 | 1

A: ε, or ends in 1
C: ε, or begins with 1
U: any string

B has recursive structure: 00 D 00, here with D = 1101
(same number of 0s on both sides, at least one 0)

D: begins and ends in 1
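
The language itself can be tested directly, which is handy for checking the slide's examples. This sketch assumes that a "0-block" means a maximal run of consecutive 0s, which is consistent with A not ending in 0, C not beginning with 0, and D beginning and ending in 1; the function name is hypothetical.

```python
import re
from collections import Counter

def has_equal_zero_blocks(x):
    """True if two maximal 0-blocks of x contain the same number of 0s."""
    lengths = Counter(len(block) for block in re.findall(r"0+", x))
    return any(count >= 2 for count in lengths.values())

for x in ["01011", "001011001", "10010101000", "11001000", "01111"]:
    print(x, has_equal_zero_blocks(x))
```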

slide-42
SLIDE 42

24/34

Context-free versus regular

Write a CFG for the language (0 + 1)∗111

S → U111
U → 0U | 1U | ε

Can you do so for every regular language?

Every regular language is context-free

[Diagram: regular expression, NFA, DFA]

slide-44
SLIDE 44

25/34

From regular to context-free

regular expression ⇒ CFG

∅                      grammar with no rules
ε                      S → ε
a (alphabet symbol)    S → a
E1 + E2                S → S1 | S2
E1E2                   S → S1S2
E1∗                    S → SS1 | ε

S becomes the new start variable
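
The table describes a recursive construction, which can be sketched as a function over a regular-expression syntax tree. The tuple encoding of the tree, the generated variable names S1, S2, ..., and the function name `regex_to_cfg` are all assumptions of this sketch; it also assumes no alphabet symbol is spelled like one of the generated variable names.

```python
import itertools

def regex_to_cfg(expr):
    """Translate a regular-expression tree into CFG rules.

    Tree nodes (an encoding assumed for this sketch):
      ("empty",)  ("eps",)  ("sym", a)
      ("union", e1, e2)  ("concat", e1, e2)  ("star", e1)
    Returns (start_variable, rules); rules is a list of (head, body) pairs,
    where each body is a tuple of symbols.
    """
    counter = itertools.count(1)
    rules = []

    def build(e):
        s = "S%d" % next(counter)        # fresh start variable for e
        kind = e[0]
        if kind == "empty":
            pass                         # grammar with no rules
        elif kind == "eps":
            rules.append((s, ()))
        elif kind == "sym":
            rules.append((s, (e[1],)))
        elif kind == "union":
            s1, s2 = build(e[1]), build(e[2])
            rules.extend([(s, (s1,)), (s, (s2,))])
        elif kind == "concat":
            s1, s2 = build(e[1]), build(e[2])
            rules.append((s, (s1, s2)))
        elif kind == "star":
            s1 = build(e[1])
            rules.extend([(s, (s, s1)), (s, ())])
        else:
            raise ValueError("unknown node: %r" % (kind,))
        return s

    return build(expr), rules

# (0 + 1)* as a tree
start, rules = regex_to_cfg(("star", ("union", ("sym", "0"), ("sym", "1"))))
for head, body in rules:
    print(head, "->", " ".join(body) or "eps")
```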

slide-46
SLIDE 46

26/34

Context-free versus regular

Is every context-free language regular?

S → 0S1 | ε
L = {0^n1^n | n ≥ 0}

L is context-free but not regular

[Diagram: the regular languages form a proper subset of the context-free languages]

slide-47
SLIDE 47

27/34

Ambiguity

slide-48
SLIDE 48

28/34

Ambiguity

E → E+E | E*E | (E) | N
N → 1N | 2N | 1 | 2

1+2*2 has two parse trees:

* at the root, with children (1+2) and 2: value = 6
+ at the root, with children 1 and (2*2): value = 5

A CFG is ambiguous if some string has more than one parse tree

slide-49
SLIDE 49

29/34

Example

Is S → SS | x ambiguous?

Yes, because xxx has two parse trees (in bracket form):

[S [S [S x] [S x]] [S x]]
[S [S x] [S [S x] [S x]]]

Two ways to derive xxx
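
The number of parse trees can be counted with a short recurrence: a tree for n copies of x either is S → x (when n = 1) or splits at the root into a left part and a right part. A sketch, with the function name `count_trees` chosen for this illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Number of parse trees of the string x^n (n >= 1) under S -> SS | x."""
    if n == 1:
        return 1   # S -> x
    # S -> SS: the left S yields the first k x's, the right S yields the rest
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))

print([count_trees(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
```

The count for xxx is 2, matching the two trees above, and it grows quickly for longer strings.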

slide-51
SLIDE 51

30/34

Disambiguation

S → SS | x   ⇒   S → Sx | x

With the new grammar, xxx has a single parse tree: [S [S [S x] x] x]

Sometimes we can rewrite the grammar to remove ambiguity

slide-52
SLIDE 52

31/34

Disambiguation

E → E+E | E*E | (E) | N
N → 1N | 2N | 1 | 2

+ and * have the same precedence!

Divide the expression into terms and factors:

2 * ( 1 + 2 * 2 )

[Diagram: the factors (F) and terms (T) of this expression are marked]

slide-53
SLIDE 53

32/34

Disambiguation

E → E+E | E*E | (E) | N
N → 1N | 2N | 1 | 2

An expression is a sum of one or more terms

E → T | E+T

Each term is a product of one or more factors

T → F | T*F

Each factor is a parenthesized expression or a number

F → (E) | 1 | 2

slide-54
SLIDE 54

33/34

Parsing example

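E → T | E+T
T → F | T*F
F → (E) | 1 | 2

Parse tree for 2+(1+1+2*2)+1

E
├─ E
│  ├─ E ─ T ─ F ─ 2
│  ├─ +
│  └─ T ─ F
│        ├─ (
│        ├─ E
│        │  ├─ E
│        │  │  ├─ E ─ T ─ F ─ 1
│        │  │  ├─ +
│        │  │  └─ T ─ F ─ 1
│        │  ├─ +
│        │  └─ T
│        │     ├─ T ─ F ─ 2
│        │     ├─ *
│        │     └─ F ─ 2
│        └─ )
├─ +
└─ T ─ F ─ 1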
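
The unambiguous grammar can be turned into a small evaluator. The sketch below is a hand-written recursive-descent routine, which is one possible technique and not something the slides prescribe. Since a recursive-descent parser cannot follow the left-recursive rules E → E+T and T → T*F literally, each is replaced by a loop that folds operands from the left, which produces the same grouping. The function name `evaluate` is invented, and input without spaces is assumed.

```python
def evaluate(text):
    """Evaluate a string of E -> T | E+T, T -> F | T*F, F -> (E) | 1 | 2."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError("expected %r at position %d" % (ch, pos))
        pos += 1

    def expr():      # E: one or more terms joined by +, grouped to the left
        value = term()
        while peek() == "+":
            eat("+")
            value += term()
        return value

    def term():      # T: one or more factors joined by *, grouped to the left
        value = factor()
        while peek() == "*":
            eat("*")
            value *= factor()
        return value

    def factor():    # F -> (E) | 1 | 2
        if peek() == "(":
            eat("(")
            value = expr()
            eat(")")
            return value
        if peek() in ("1", "2"):
            digit = peek()
            eat(digit)
            return int(digit)
        raise SyntaxError("unexpected %r at position %d" % (peek(), pos))

    result = expr()
    if pos != len(text):
        raise SyntaxError("trailing input at position %d" % pos)
    return result

print(evaluate("2+(1+1+2*2)+1"))  # 9
print(evaluate("1+2*2"))          # 5
```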

slide-56
SLIDE 56

34/34

Disambiguation

Disambiguation is not always possible because
There exist inherently ambiguous languages
There is no general procedure for disambiguation

In programming languages, ambiguity comes from the precedence rules, and we can resolve it as in the example

In English, ambiguity is sometimes a problem:

I look at the dog with one eye

(the sentence can be split as "I look at" + "the dog with one eye")