CSE443 Compilers
- Dr. Carl Alphonce
alphonce@buffalo.edu
343 Davis Hall
Phases of a compiler: syntactic structure
Figure 1.6, page 5 of text
TOOLS
Lexical analysis: LEX/FLEX (regex -> lexer)
Syntactic analysis: YACC/BISON (grammar …
A top-down parser builds a parse tree from the root to the leaves
- easier to construct by hand
A bottom-up parser builds a parse tree from the leaves to the root
- handles a larger class of grammars
- tools (yacc/bison) build bottom-up parsers
[Figure: a sequence of tokens flowing into the PARSER]
If 𝛽 ∈ (N ∪ T)* then FIRST(𝛽) is "the set of terminals that appear as the first symbols of one or more strings of terminals generated from 𝛽." [p. 64]
Ex: If A -> a 𝛾 then FIRST(A) = {a}
If the lookahead symbol is not in the FIRST set, use the ε production: do not advance the lookahead symbol, but instead "discard" the non-terminal:
"While parsing optexpr, if the lookahead symbol is not in FIRST(expr), then the ε production is used" [p. 66]
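A minimal Python sketch of this rule may help. It assumes a production optexpr -> expr | ε and, purely for illustration, that expr derives only the token id; the function names and the token list format are hypothetical, not taken from the text.

```python
# Assumption for illustration: expr -> id, so FIRST(expr) = {"id"}.
FIRST_EXPR = {"id"}


def parse_expr(tokens, pos):
    """Parse expr -> id and return the position after it."""
    if tokens[pos] != "id":
        raise ValueError(f"expected id at position {pos}, found {tokens[pos]!r}")
    return pos + 1


def parse_optexpr(tokens, pos):
    """Parse optexpr -> expr | ε and return the new lookahead position."""
    if tokens[pos] in FIRST_EXPR:
        # The lookahead can start an expr, so use optexpr -> expr.
        return parse_expr(tokens, pos)
    # Otherwise use optexpr -> ε: the lookahead is left untouched and
    # the non-terminal is simply "discarded".
    return pos


print(parse_optexpr(["id", ";"], 0))  # lookahead advances past id
print(parse_optexpr([";"], 0))        # ε production: lookahead unchanged
```

The caller decides whether the symbol left in the lookahead is legal after optexpr; the ε production itself never reports an error.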
Grammar:
expr -> expr + term | term
term -> id
FIRST sets for rule alternatives are not disjoint:
FIRST(expr) = { id }
FIRST(term) = { id }
[Figure: parse tree for the left-recursive grammar, with expr expanding to expr + term repeatedly along the left edge]
[Figure: parse tree after replacing the left recursion with a right-recursive non-terminal R; nodes shown: expr, term, R, +, ε]
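The rewrite behind the non-terminal R appears to be the standard removal of immediate left recursion: A -> A α | β becomes A -> β R, R -> α R | ε. The sketch below is one hypothetical way to express that transformation in Python. The function name, the list-of-symbols encoding, and the use of an empty list for ε are assumptions made for illustration.

```python
def eliminate_left_recursion(head, alternatives, new_head):
    """Rewrite head -> head α | β as head -> β new_head, new_head -> α new_head | ε.

    Each alternative is a list of symbols; the empty list stands for ε.
    Only immediate left recursion is handled.
    """
    # α parts: the tails of alternatives that start with the head itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # β parts: alternatives that do not start with the head.
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}  # nothing to rewrite
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


rules = eliminate_left_recursion(
    "expr", [["expr", "+", "term"], ["term"]], "R"
)
for head, alternatives in rules.items():
    bodies = [" ".join(alt) if alt else "ε" for alt in alternatives]
    print(head, "->", " | ".join(bodies))
```

After the rewrite the alternatives of R start with + or are empty, so a single lookahead symbol is enough to choose between them.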
A grammar G is ambiguous if ∃ 𝛕 ∈ L(G) that has two or more distinct parse trees.
Example - dangling 'else':
if <expr> then if <expr> then <stmt> else <stmt>
    if <expr> then { if <expr> then <stmt> } else <stmt>
    if <expr> then { if <expr> then <stmt> else <stmt> }
Usually resolved so that each else matches the closest if-then.
We can rewrite the grammar to force this interpretation (ms = matched statement, os = …
<stmt> -> <ms> | <os>
<ms> -> if <expr> then <ms> else <ms> | …
<os> -> if <expr> then <stmt> | if <expr> then <ms> else <os>
If two (or more) rules share a prefix, then their FIRST sets do not distinguish between the rule alternatives.
If there is a choice point later in the rule, rewrite the rule by factoring out the common prefix.
Example: rewrite
A -> 𝛽 𝛾1 | 𝛽 𝛾2
as
A -> 𝛽 A'
A' -> 𝛾1 | 𝛾2
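As a rough illustration of left factoring, the sketch below factors the longest prefix shared by all alternatives of one non-terminal. The function name, the list-of-symbols encoding, the primed name for the new non-terminal, and the if/then/else example rule are assumptions for this example; a fuller version would also group alternatives that share a prefix only with some of their siblings.

```python
def left_factor(head, alternatives):
    """Rewrite head -> β γ1 | β γ2 | ... as head -> β head', head' -> γ1 | γ2 | ...

    Each alternative is a list of symbols; the empty list stands for ε.
    Only a prefix common to ALL alternatives is factored.
    """
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break  # alternatives diverge here: this is the choice point
        prefix.append(symbols[0])
    if not prefix or len(alternatives) < 2:
        return {head: alternatives}  # nothing to factor
    new_head = head + "'"
    return {
        head: [prefix + [new_head]],
        new_head: [alt[len(prefix):] for alt in alternatives],
    }


rules = left_factor(
    "stmt",
    [
        ["if", "expr", "then", "stmt", "else", "stmt"],
        ["if", "expr", "then", "stmt"],
    ],
)
for head, alternatives in rules.items():
    bodies = [" ".join(alt) if alt else "ε" for alt in alternatives]
    print(head, "->", " | ".join(bodies))
```

The decision between the two original alternatives is deferred until the parser has consumed the shared prefix and can see whether else follows.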
Predictive parsing: a special case of recursive-descent parsing that does not require backtracking
Each non-terminal A ∈ N has an associated procedure:
void A() {
    choose an A-production A -> X1 X2 … Xk
    for (i = 1 to k) {
        if (Xi ∈ N) {
            call Xi()
        } else if (Xi = current input symbol) {
            advance input to next symbol
        } else error
    }
}
There is non-determinism in the choice of production. If the "wrong" choice is made, the parser will need to revisit its choice by backtracking. A predictive parser can always make the correct choice here.
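The procedure-per-non-terminal idea can be sketched in Python as follows. The grammar used here is an assumption chosen for illustration: expr -> term R, R -> + term R | ε, term -> id. The class and method names are hypothetical. Because the alternatives of R are distinguished by one lookahead symbol, no backtracking is needed.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Consume the lookahead if it is the expected terminal."""
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):  # expr -> term R
        self.term()
        self.R()

    def R(self):  # R -> + term R | ε
        if self.lookahead == "+":
            self.match("+")
            self.term()
            self.R()
        # Otherwise choose R -> ε and consume nothing.

    def term(self):  # term -> id
        self.match("id")

    def parse(self):
        self.expr()
        self.match("$")  # the whole input must be consumed


def accepts(tokens):
    """Return True if the token list is derivable from expr."""
    try:
        Parser(tokens).parse()
        return True
    except ParseError:
        return False


print(accepts(["id", "+", "id"]))
print(accepts(["id", "+"]))
```

Each method mirrors one non-terminal of the assumed grammar, and the only choice point, in R, is settled by inspecting the lookahead.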
if X ∈ T then FIRST(X) = { X }
if X ∈ N and X -> Y1 Y2 … Yk ∈ P for k≥1, then
    add a ∈ T to FIRST(X) if ∃i s.t. a ∈ FIRST(Yi) and ε ∈ FIRST(Yj) ∀ j < i (i.e. Y1 Y2 … Yi-1 ⇒* ε)
    if ε ∈ FIRST(Yj) ∀ j ≤ k, add ε to FIRST(X)
if X -> ε ∈ P, then add ε to FIRST(X)
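These rules are usually applied repeatedly until no FIRST set changes. The following Python sketch does this for the expression grammar used later in these slides. The dictionary encoding of the grammar, the empty list standing for an ε production, and the function name are assumptions made for illustration.

```python
EPS = "ε"

# Each non-terminal maps to its alternatives; [] stands for an ε production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute FIRST for every non-terminal by fixed-point iteration."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                before = len(first[head])
                all_nullable = True
                for symbol in body:
                    # FIRST of a terminal is the terminal itself.
                    symbol_first = first[symbol] if symbol in grammar else {symbol}
                    first[head] |= symbol_first - {EPS}
                    if EPS not in symbol_first:
                        all_nullable = False
                        break  # later symbols cannot contribute
                if all_nullable:
                    # Every symbol can vanish (or the body is empty).
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


for head, symbols in first_sets(GRAMMAR).items():
    print(f"FIRST({head}) = {sorted(symbols)}")
```

The loop stops once a full pass over the productions adds nothing, which must happen because the sets only grow and the alphabet is finite.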
INPUT: Grammar G = (N,T,P,S)
OUTPUT: Parsing table M
For each production A -> 𝛽 of G:
    For each terminal a ∈ FIRST(𝛽), add A -> 𝛽 to M[A,a]
    If ε ∈ FIRST(𝛽), then for each terminal b in FOLLOW(A), add A -> 𝛽 to M[A,b]
    If ε ∈ FIRST(𝛽) and $ ∈ FOLLOW(A), add A -> 𝛽 to M[A,$]
Grammar:
E -> T E'
E' -> + T E' | ε
T -> F T'
T' -> * F T' | ε
F -> ( E ) | id
FIRST sets:
FIRST(F) = { ( , id }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(E') = { + , ε }
FIRST(T') = { * , ε }
Place $ in FOLLOW(S), where S is the start symbol ($ is an end marker)
if A -> 𝛽B𝛾 ∈ P, then everything in FIRST(𝛾) except ε is in FOLLOW(B)
if A -> 𝛽B ∈ P, or A -> 𝛽B𝛾 ∈ P where ε ∈ FIRST(𝛾), then everything in FOLLOW(A) is in FOLLOW(B)
FOLLOW sets:
FOLLOW(E) = { ) , $ }
FOLLOW(E') = FOLLOW(E) = { ) , $ }
FOLLOW(T) = { + , ) , $ }
FOLLOW(T') = FOLLOW(T) = { + , ) , $ }
FOLLOW(F) = { + , * , ) , $ }
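The FOLLOW rules can be applied in the same repeat-until-stable style as FIRST. The Python sketch below takes the FIRST sets listed in these slides as given and recomputes the FOLLOW sets. The data encoding (an empty list for an ε production) and the function names are assumptions for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

# FIRST sets copied from the slides rather than recomputed.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols; the empty sequence gives {ε}."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: itself
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, first, start):
    """Compute FOLLOW for every non-terminal by fixed-point iteration."""
    follow = {head: set() for head in grammar}
    follow[start].add("$")  # $ is the end marker
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue  # FOLLOW is defined for non-terminals only
                    before = len(follow[symbol])
                    rest_first = first_of_string(body[i + 1:], first)
                    follow[symbol] |= rest_first - {EPS}
                    if EPS in rest_first:
                        # Nothing, or only nullable symbols, follows here.
                        follow[symbol] |= follow[head]
                    if len(follow[symbol]) != before:
                        changed = True
    return follow


for head, symbols in follow_sets(GRAMMAR, FIRST, "E").items():
    print(f"FOLLOW({head}) = {sorted(symbols)}")
```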
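Parsing table M (rows: non-terminals, columns: input symbols; a blank entry means error)
NON-TERMINAL | id        | +            | *            | (          | )       | $
E            | E -> T E' |              |              | E -> T E'  |         |
E'           |           | E' -> + T E' |              |            | E' -> ε | E' -> ε
T            | T -> F T' |              |              | T -> F T'  |         |
T'           |           | T' -> ε      | T' -> * F T' |            | T' -> ε | T' -> ε
F            | F -> id   |              |              | F -> ( E ) |         |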
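The table for this grammar can be reproduced mechanically from the FIRST and FOLLOW sets. The Python sketch below is one possible rendering of the construction, with the sets copied from the slides. The dictionary keyed by (non-terminal, terminal) pairs and the function names are assumptions for illustration. Since $ is stored inside the FOLLOW sets here, the separate rule for M[A,$] is covered by the FOLLOW loop.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] stands for an ε production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols; the empty sequence gives {ε}."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: itself
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return M as a dict mapping (non-terminal, terminal) to a production body."""
    table = {}
    for head, alternatives in grammar.items():
        for body in alternatives:
            body_first = first_of_string(body, first)
            columns = body_first - {EPS}
            if EPS in body_first:
                columns |= follow[head]  # includes $ when $ ∈ FOLLOW(head)
            for terminal in columns:
                if (head, terminal) in table:
                    # Two productions in one cell: a single lookahead
                    # symbol cannot choose between them.
                    raise ValueError(f"conflict at M[{head},{terminal}]")
                table[(head, terminal)] = body
    return table


M = build_table(GRAMMAR, FIRST, FOLLOW)
for (head, terminal), body in sorted(M.items()):
    print(f"M[{head},{terminal}] = {head} -> {' '.join(body) or EPS}")
```

Keys that are absent from the dictionary correspond to the blank (error) entries of the table.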
INPUT: A string ω and a parsing table M for a grammar G=(N,T,P,S).
OUTPUT: If ω ∈ L(G), a leftmost derivation of ω, error otherwise
[Figure: the parser consults table M; initially the stack holds S above the end marker $, and the input holds ω followed by $]
Let a be the first symbol of ω
Let X be the top stack symbol
while (X ≠ $) {
    if (X == a) { pop the stack, let a be the next symbol of ω }
    else if (X is a terminal) { error }
    else if (M[X,a] is blank) { error }
    else if (M[X,a] is X -> Y1 Y2 … Yk) {
        pop the stack
        push Yk … Y2 Y1 onto the stack
    }
    Let X be the top stack symbol
}
Accept if a == X == $
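The loop above translates fairly directly into Python. In this sketch the parsing table for the expression grammar is written out by hand as a dictionary, with an empty list standing for an ε production. The names TABLE, NONTERMINALS, ParseError, and parse are hypothetical, and the input is assumed to be a list of token names that does not itself contain $.

```python
class ParseError(Exception):
    """Raised when the input is not in the language."""


NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] for the expression grammar; missing keys are blank (error) entries.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise ParseError."""
    remaining = list(tokens) + ["$"]   # input followed by the end marker
    pos = 0
    stack = ["$", start]               # top of the stack is the list's end
    derivation = []
    while stack[-1] != "$":
        top, a = stack[-1], remaining[pos]
        if top == a:
            stack.pop()                # terminal matched: advance the input
            pos += 1
        elif top not in NONTERMINALS:
            raise ParseError(f"expected {top!r}, found {a!r}")
        elif (top, a) not in TABLE:
            raise ParseError(f"no entry for M[{top},{a}]")
        else:
            body = TABLE[(top, a)]
            derivation.append((top, body))
            stack.pop()
            stack.extend(reversed(body))  # push Yk … Y1 so Y1 ends on top
    if remaining[pos] != "$":
        raise ParseError(f"unexpected trailing input {remaining[pos]!r}")
    return derivation


for head, body in parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

Recording each production as it is expanded yields the leftmost derivation that the algorithm is specified to output.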