
SLIDE 1

CSE443 Compilers

Dr. Carl Alphonce

alphonce@buffalo.edu 343 Davis Hall

SLIDE 2

Phases of a compiler

Figure 1.6, page 5 of text

Syntactic structure

SLIDE 3

TOOLS

Lexical analysis: LEX/FLEX (regex -> lexer)
Syntactic analysis: YACC/BISON (grammar -> parser)

SLIDE 4

Top-down & bottom-up

A top-down parser builds a parse tree from the root to the leaves
    easier to construct by hand
A bottom-up parser builds a parse tree from the leaves to the root
    handles a larger class of grammars
    tools (yacc/bison) build bottom-up parsers

SLIDE 5

Our presentation: first top-down, then bottom-up

Present top-down parsing first. Introduce necessary vocabulary and data structures. Move on to bottom-up parsing second.

SLIDE 6

vocab: look-ahead

The current symbol being scanned in the input is called the lookahead symbol.

[Figure: a sequence of tokens feeding into the PARSER]

SLIDE 7

Top-down parsing

SLIDE 8

Top-down parsing

Start from the grammar's start symbol
Build a parse tree so that its yield matches the input
Predictive parsing: a simple form of recursive-descent parsing

SLIDE 9

FIRST(𝛽)

If 𝛽 ∈ (N ∪ T)* then FIRST(𝛽) is "the set of terminals that appear as the first symbols of one or more strings of terminals generated from 𝛽." [p. 64]

Ex: If A -> a 𝛾 then FIRST(A) = {a}

Ex: If A -> a 𝛾 | B then FIRST(A) = {a} ∪ FIRST(B)

SLIDE 10

FIRST(𝛽)

FIRST sets are considered when there are two (or more) productions to expand A ∈ N:
A -> 𝛽 | 𝛾
Predictive parsing requires that FIRST(𝛽) ∩ FIRST(𝛾) = ∅

SLIDE 11

𝜁 productions

If the lookahead symbol is not in the FIRST set of any other alternative, use the 𝜁 production: the lookahead symbol is not advanced, and the non-terminal is simply "discarded":

optexpr -> expr | 𝜁
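
The rule above can be sketched in Python as follows. This is only an illustration: it assumes that an expr is a single token, either `id` or `num`, so that FIRST(expr) can be written down directly. The function name and token spellings are hypothetical.

```python
# Assumption for illustration: expr is one token, so FIRST(expr) = {"id", "num"}.
FIRST_EXPR = {"id", "num"}

def parse_optexpr(tokens, pos):
    """Parse optexpr -> expr | 𝜁 at tokens[pos]; return the new position."""
    lookahead = tokens[pos] if pos < len(tokens) else "$"
    if lookahead in FIRST_EXPR:
        return pos + 1   # optexpr -> expr: consume the expression token
    return pos           # optexpr -> 𝜁: consume nothing, lookahead unchanged

print(parse_optexpr(["id", ";"], 0))  # expr is present
print(parse_optexpr([";"], 0))        # 𝜁 production is used
```

In the second call the lookahead `;` is not in FIRST(expr), so the position is returned unchanged.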

"While parsing optexpr, if the lookahead symbol is not in FIRST(expr), then the 𝜁 production is used" [p. 66]

SLIDE 12

Left recursion

Grammars with left recursion are problematic for top-down parsers, as they lead to infinite regress.

SLIDE 13

Left recursion example

Grammar:
expr -> expr + term | term
term -> id
FIRST sets for rule alternatives are not disjoint:
FIRST(expr) = { id }
FIRST(term) = { id }

[Figure: left-leaning parse tree in which expr is expanded repeatedly by expr -> expr + term and finally by expr -> term]

SLIDE 14


[Figure: parse tree for the grammar expr -> expr + term | term, annotated: each "+ term" is labeled 𝛽 and the innermost term is labeled 𝛾]

SLIDE 15

Rewriting grammar to remove left recursion

The expr rule is of the form
A -> A 𝛽 | 𝛾
Rewrite as two rules
A -> 𝛾 R
R -> 𝛽 R | 𝜁
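
A minimal Python sketch of this rewrite, handling only immediate left recursion in a single non-terminal. The list-of-lists grammar encoding, the function name, and the marker `eps` (standing for the empty string, written 𝜁 in these slides) are assumptions made for illustration.

```python
EPS = "eps"  # stands for the empty string (written 𝜁 in the slides)

def remove_left_recursion(nonterminal, alternatives, new_name):
    """Rewrite A -> A b | g as A -> g R and R -> b R | eps.

    alternatives is a list of right-hand sides, each a list of symbols.
    """
    # The "b" parts: what follows A in each left-recursive alternative.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The "g" parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}   # nothing to rewrite
    return {
        nonterminal: [alt + [new_name] for alt in others],
        new_name: [tail + [new_name] for tail in recursive] + [[EPS]],
    }

rules = remove_left_recursion("expr", [["expr", "+", "term"], ["term"]], "R")
for head, alts in rules.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
```

Applied to expr -> expr + term | term, the sketch yields expr -> term R and R -> + term R | eps.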

SLIDE 16

Back to example

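Grammar is rewritten as
expr -> term R
R -> + term R | 𝜁

[Figure: parse tree for the rewritten grammar; expr expands to term R (term is the 𝛾 part), each R expands to + term R (each "+ term" is a 𝛽 part), and the last R expands to 𝜁]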

SLIDE 17

Ambiguity

A grammar G is ambiguous if ∃ 𝛕 ∈ 𝓜(G) that has two or more distinct parse trees. Example - dangling 'else':

if <expr> then if <expr> then <stmt> else <stmt>

can be read as either of:

if <expr> then { if <expr> then <stmt> } else <stmt>
if <expr> then { if <expr> then <stmt> else <stmt> }

SLIDE 18

dangling else resolution

usually resolved so that else matches the closest if-then
we can rewrite the grammar to force this interpretation (ms = matched statement, os = open statement)
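
The "closest if-then" interpretation can be illustrated with a small Python sketch. Note that this does not implement the ms/os grammar rewrite; it uses a different, commonly seen technique in which a recursive-descent parser greedily attaches an else to the if-then it is currently parsing. It assumes, for illustration only, that conditions and non-if statements are single tokens.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement at tokens[pos]; return (tree, next_pos).

    Assumption: conditions and non-if statements are single tokens.
    """
    if pos < len(tokens) and tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy rule: an 'else' seen here belongs to this, the closest, if-then.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1

tokens = "if e1 then if e2 then s1 else s2".split()
tree, _ = parse_stmt(tokens)
print(tree)
```

The printed tree nests the else inside the inner if, matching the second reading of the ambiguous statement.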

SLIDE 19

Left factoring

If two (or more) rules share a prefix then their FIRST sets do not distinguish between rule alternatives.
If there is a choice point later in the rule, rewrite the rule by factoring out the common prefix.
Example: rewrite
A -> 𝛽 𝛾1 | 𝛽 𝛾2
as
A -> 𝛽 A'
A' -> 𝛾1 | 𝛾2
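
A minimal Python sketch of left factoring. It makes a simplifying assumption: all alternatives of the non-terminal share the prefix being factored. The grammar encoding, the function name, and the marker `eps` (the empty string, 𝜁 in the slides) are hypothetical.

```python
EPS = "eps"  # stands for the empty string (written 𝜁 in the slides)

def left_factor(nonterminal, alternatives, new_name):
    """Factor out the longest prefix common to all alternatives."""
    prefix = []
    for symbols in zip(*alternatives):      # walk the alternatives in lockstep
        if len(set(symbols)) != 1:
            break                           # first position where they differ
        prefix.append(symbols[0])
    if not prefix:
        return {nonterminal: alternatives}  # no common prefix, nothing to do
    return {
        nonterminal: [prefix + [new_name]],
        # An alternative that is exactly the prefix leaves an empty remainder.
        new_name: [alt[len(prefix):] or [EPS] for alt in alternatives],
    }

print(left_factor("A", [["b", "g1"], ["b", "g2"]], "A'"))
```

With `b` playing the role of 𝛽 and `g1`, `g2` the roles of 𝛾1, 𝛾2, the result corresponds to A -> 𝛽 A' and A' -> 𝛾1 | 𝛾2.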

SLIDE 20

Predictive parsing:

a special case of recursive-descent parsing that does not require backtracking

Each non-terminal A ∈ N has an associated procedure:

void A() {
    choose an A-production A -> X1 X2 … Xk
    for (i = 1 to k) {
        if (Xi ∈ N) {
            call Xi()
        } else if (Xi = current input symbol) {
            advance input to next symbol
        } else error
    }
}

SLIDE 21


There is non-determinism in the choice of production. If the "wrong" choice is made the parser will need to revisit its choice by backtracking. A predictive parser can always make the correct choice here.
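
As a concrete illustration, here is a small Python sketch of a predictive recursive-descent parser for the rewritten grammar expr -> term R, R -> + term R | 𝜁, term -> id. Each non-terminal becomes one method, and the production is chosen by inspecting the lookahead, so no backtracking is needed. The class and helper names are hypothetical, and `$` is assumed as the end marker.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Consume the lookahead if it is the expected terminal."""
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, got {self.lookahead!r}")
        self.pos += 1

    def expr(self):              # expr -> term R
        self.term()
        self.R()

    def R(self):                 # R -> + term R | 𝜁
        if self.lookahead == "+":
            self.match("+")
            self.term()
            self.R()
        # Otherwise choose R -> 𝜁: consume nothing.

    def term(self):              # term -> id
        self.match("id")

def accepts(tokens):
    """Return True if the whole token list is derivable from expr."""
    parser = Parser(tokens)
    try:
        parser.expr()
        parser.match("$")
        return True
    except SyntaxError:
        return False

print(accepts(["id", "+", "id"]))
print(accepts(["id", "+"]))
```

The only choice point is in `R`, where the lookahead `+` selects R -> + term R and anything else selects R -> 𝜁.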

SLIDE 22

FIRST(X)

if X ∈ T then FIRST(X) = { X }
if X ∈ N and X -> Y1 Y2 … Yk ∈ P for k≥1, then
    add a ∈ T to FIRST(X) if ∃i s.t. a ∈ FIRST(Yi) and 𝜁 ∈ FIRST(Yj) ∀ j < i (i.e. Y1 Y2 … Yi-1 ⇒* 𝜁)
    if 𝜁 ∈ FIRST(Yj) ∀ j ≤ k, add 𝜁 to FIRST(X)
if X -> 𝜁 ∈ P, add 𝜁 to FIRST(X)
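
These rules can be applied repeatedly until no FIRST set changes. The Python sketch below does this for the expression grammar that these slides use as their worked example. The dictionary encoding of the grammar and the marker `eps` (the empty string, 𝜁 in the slides) are assumptions made for illustration.

```python
EPS = "eps"  # stands for the empty string (written 𝜁 in the slides)

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    """Compute FIRST for every non-terminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in alt:
                    if sym == EPS:
                        continue
                    # A terminal's FIRST set is just the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break        # later symbols cannot contribute
                if all_nullable:
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first

for nt, symbols in first_sets(GRAMMAR).items():
    print(f"FIRST({nt}) = {sorted(symbols)}")
```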

SLIDE 23

FOLLOW(X)

Place $ in FOLLOW(S), where S is the start symbol ($ is an end marker)
if A -> 𝛽B𝛾 ∈ P, then everything in FIRST(𝛾) - {𝜁} is in FOLLOW(B)
if A -> 𝛽B ∈ P, or A -> 𝛽B𝛾 ∈ P where 𝜁 ∈ FIRST(𝛾), then everything in FOLLOW(A) is in FOLLOW(B)
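
A Python sketch of the FOLLOW rules, again iterating until nothing changes. To keep it short, the FIRST sets of the slides' example expression grammar are written in as data rather than computed. The encoding and the marker `eps` (the empty string, 𝜁 in the slides) are assumptions for illustration.

```python
EPS, END = "eps", "$"   # eps stands for the empty string (𝜁 in the slides)

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, entered by hand for this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}

def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)       # every symbol (or none at all) derives the empty string
    return result

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue   # FOLLOW is defined for non-terminals only
                    rest = first_of_string(alt[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:          # what follows can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

for nt, symbols in follow_sets(GRAMMAR, "E").items():
    print(f"FOLLOW({nt}) = {sorted(symbols)}")
```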

SLIDE 24

Table-driven predictive parsing Algorithm 4.32 (p. 224)

INPUT: Grammar G = (N,T,P,S)
OUTPUT: Parsing table M

For each production A -> 𝛽 of G:

1. For each terminal a ∈ FIRST(𝛽), add A -> 𝛽 to M[A,a]

2. If 𝜁 ∈ FIRST(𝛽), then for each terminal b in FOLLOW(A), add A -> 𝛽 to M[A,b]

3. If 𝜁 ∈ FIRST(𝛽) and $ ∈ FOLLOW(A), add A -> 𝛽 to M[A,$]
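
The three steps can be sketched in Python as below, using the example expression grammar from these slides with its FIRST and FOLLOW sets entered as data. Because `$` is stored inside the FOLLOW sets here, steps 2 and 3 collapse into one line. The dictionary-based table and the marker `eps` (the empty string, 𝜁 in the slides) are assumptions for illustration.

```python
EPS = "eps"  # stands for the empty string (written 𝜁 in the slides)

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def first_of_string(symbols):
    """FIRST of a right-hand side (a sequence of grammar symbols)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar):
    """Map (non-terminal, lookahead) to the production body to expand."""
    table = {}
    for head, alternatives in grammar.items():
        for alt in alternatives:
            first = first_of_string(alt)
            lookaheads = first - {EPS}            # step 1
            if EPS in first:
                lookaheads |= FOLLOW[head]        # steps 2 and 3 ("$" is in FOLLOW)
            for a in lookaheads:
                if (head, a) in table:
                    raise ValueError(f"two productions compete for M[{head},{a}]")
                table[head, a] = alt
    return table

M = build_table(GRAMMAR)
for (head, a), body in sorted(M.items()):
    print(f"M[{head},{a}] = {head} -> {' '.join(body)}")
```

A `ValueError` here would mean that some table cell receives two productions, in which case the lookahead alone cannot choose between them.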

SLIDE 25

Example

G given by its productions:
E -> T E'
E' -> + T E' | 𝜁
T -> F T'
T' -> * F T' | 𝜁
F -> ( E ) | id


SLIDE 26

FIRST SETS

FIRST(F) = { ( , id }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(E') = { + , 𝜁 }
FIRST(T') = { * , 𝜁 }


SLIDE 27


FOLLOW SETS

FOLLOW(E) = { ) , $ }
FOLLOW(E') = FOLLOW(E) = { ) , $ }
FOLLOW(T) = { + , ) , $ }
FOLLOW(T') = FOLLOW(T) = { + , ) , $ }
FOLLOW(F) = { + , * , ) , $ }

SLIDE 28

Parse-table M

NON-TERMINAL   id          +              *              (            )          $
E              E -> T E'                                 E -> T E'
E'                         E' -> + T E'                               E' -> 𝜁    E' -> 𝜁
T              T -> F T'                                 T -> F T'
T'                         T' -> 𝜁        T' -> * F T'                T' -> 𝜁    T' -> 𝜁
F              F -> id                                   F -> ( E )


SLIDE 29

Algorithm 4.34 [p. 226]

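INPUT: A string 𝜕 and a parsing table M for a grammar G=(N,T,P,S).
OUTPUT: If 𝜕∈𝓜(G), a leftmost derivation of 𝜕; error otherwise

[Figure: model of the table-driven parser. The stack initially holds S on top of $, the input holds 𝜕 followed by $, and the parser consults table M and produces the output.]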

SLIDE 30

Algorithm 4.34 [p. 226]

Let a be the first symbol of 𝜕
Let X be the top stack symbol
while (X ≠ $) {
    if (X == a) {
        pop the stack, let a be the next symbol of 𝜕
    } else if (X is a terminal) {
        error
    } else if (M[X,a] is blank) {
        error
    } else if (M[X,a] is X -> Y1 Y2 … Yk) {
        output X -> Y1 Y2 … Yk
        pop the stack
        push Yk … Y2 Y1 onto the stack (Y1 on top)
    }
    Let X be the top stack symbol
}
Accept if a == X == $
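
A Python sketch of this loop, using the parse table for the example expression grammar entered as a dictionary. The names, the list-based stack, and the marker `eps` (the empty string, 𝜁 in the slides) are assumptions for illustration; blank table entries are simply absent from the dictionary.

```python
EPS = "eps"  # stands for the empty string (written 𝜁 in the slides)
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parse table M for the example expression grammar.
M = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [EPS],  ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS],             ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS],             ("T'", "$"): [EPS],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}

def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]    # append the end marker
    pos = 0
    stack = ["$", start]             # the top of the stack is the end of the list
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                 # terminal on the stack matches the input
            stack.pop()
            pos += 1
        elif top not in NONTERMINALS:
            raise SyntaxError(f"expected {top!r}, got {a!r}")
        elif (top, a) not in M:      # blank table entry
            raise SyntaxError(f"no entry M[{top},{a}]")
        else:
            body = M[top, a]
            output.append(f"{top} -> {' '.join(body)}")
            stack.pop()
            # Push the body in reverse so that its first symbol ends up on top.
            stack.extend(sym for sym in reversed(body) if sym != EPS)
    if tokens[pos] != "$":
        raise SyntaxError("input remains after the stack was emptied")
    return output

for production in parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The printed productions, read top to bottom, form a leftmost derivation of id + id * id.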