- Top-Down
- eliminating left recursion
- left factoring
- recursive descent
- computing first and follow
- LL(1)
- Bottom-Up
- Shift-reduce method
- LR(0)
- SLR
- CLR
- LALR
Top Down Parsing
Eliminating Left Recursion
replace
E : E + T | T
with
E : TE'
E' : +TE' | eps
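The general pattern (standard textbook transformation): a left-recursive pair A : Aα | β (where β doesn't start with A) gets the same rewrite
A : βA'
A' : αA' | eps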
Left Factoring
replace
S : if (expr) S else S
| if (expr) S
| other
with (take common part out)
S : if (expr) S S'
| other
S' : else S
| eps
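The general pattern: for A : αβ1 | αβ2 with common prefix α, pull the prefix out
A : αA'
A' : β1 | β2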
Recursive Descent
import re

def tokenize(expression):
    """
    look for numbers, brackets, or operators
    """
    token_pattern = r"(\d+|[()\^+\-*/])"
    tokens = re.findall(token_pattern, expression)
    return tokens

# parse functions for each non-terminal
def parse_factor(tokens):
    """
    F -> (E) | number
    """
    token = tokens.pop(0)
    if token == "(":
        result = parse_expression(tokens)
        # consume the matching ")", complaining if it's missing
        if not tokens or tokens[0] != ")":
            raise SyntaxError("expected ')'")
        tokens.pop(0)
        return result
    else:
        return float(token)

def parse_exponent(tokens):
    """
    X -> F ^ X | F
    """
    left = parse_factor(tokens)
    # recurse on the right operand so "^" stays right-associative
    if tokens and tokens[0] == "^":
        tokens.pop(0)
        right = parse_exponent(tokens)
        left **= right
    return left

def parse_term(tokens):
    """
    T -> X*T | X/T | X
    """
    left = parse_exponent(tokens)
    # NOTE: iterate instead of recursing on T so that * and / stay left-associative;
    # the single-token lookahead (tokens[0]) keeps this predictive (LL(1)), no backtracking
    while tokens and tokens[0] in "*/":
        op = tokens.pop(0)
        right = parse_exponent(tokens)
        if op == "*":
            left *= right
        else:
            left /= right
    return left

def parse_expression(tokens):
    """
    E -> T + E | T - E | T
    """
    left = parse_term(tokens)
    while tokens and tokens[0] in "+-":
        op = tokens.pop(0)
        right = parse_term(tokens)
        if op == "+":
            left += right
        else:
            left -= right
    return left

def evaluate_expression(expression):
    tokens = tokenize(expression)
    result = parse_expression(tokens)
    return result

expr = "init"
while expr:
    try:
        print("parser: ", end='')
        expr = input()
        if not expr:            # empty line: quit instead of crashing the parser
            break
        print(evaluate_expression(expr))
    except EOFError:
        print()
        print("parser: exiting")
        break
First and Follow sets
First :
"""
A := BC
"""
if eps not in first(B):
first(A) = first(B)
else if eps in first(B):
first(A) = (first(B) - {eps}) U (first(C))
Follow:
"""
A := BC
"""
follow(C) = follow(A)
if eps not in first(C):
follow(B) = first(C)
else if eps in first(C):
follow(B) = (first(C) - {eps}) U (follow(A))
- compute first sets first
    - go from bottom-level productions to the top
- then compute follow sets (see the sketch after this list)
    - go from top to bottom, but it might require multiple scans
- examples: https://www.gatevidyalay.com/first-and-follow-compiler-design/
- theory: https://cap.ecn.purdue.edu/compilers/lectures/Week3/lecture-3.3.pdf
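A minimal sketch of that fixed-point computation, assuming a toy dict-based grammar representation (the grammar, symbol names, and helpers below are illustrative, not from these notes):

# hypothetical grammar: non-terminal -> list of productions (each a list of symbols)
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["eps"]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["eps"]],
    "F":  [["(", "E", ")"], ["id"]],
}
nonterminals = set(grammar)

def first_of(seq, first):
    """FIRST of a sequence of symbols, given the FIRST sets computed so far."""
    out = set()
    for sym in seq:
        if sym not in nonterminals:          # terminal (or eps): FIRST is itself
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}
        if "eps" not in first[sym]:
            return out
    out.add("eps")                           # every symbol in seq can vanish
    return out

def compute_first_follow(start="E"):
    first = {nt: set() for nt in nonterminals}
    follow = {nt: set() for nt in nonterminals}
    follow[start].add("$")
    changed = True
    while changed:                           # the "multiple scans": repeat until nothing grows
        changed = False
        for A, prods in grammar.items():
            for prod in prods:
                # FIRST(A) grows from FIRST of each production body
                f = first_of(prod, first)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                # FOLLOW: for A -> ... B rest, FOLLOW(B) gets FIRST(rest) - {eps},
                # plus FOLLOW(A) when rest can derive eps
                for i, B in enumerate(prod):
                    if B not in nonterminals:
                        continue
                    rest = first_of(prod[i + 1:], first)
                    add = (rest - {"eps"}) | (follow[A] if "eps" in rest else set())
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return first, follow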
LL(1) Grammars
- left to right leftmost derivation with 1 lookahead token
- A grammar is LL(1) iff, whenever A → α | β are two distinct productions, the following three conditions hold:
    - For no terminal a do both α and β derive strings beginning with a.
    - At most one of α and β can derive the empty string.
    - If β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(A); likewise, if α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(A).
for each production "A -> α":
    for each terminal "a" in first("α"):
        M["A"]["a"] = "A -> α"
    if eps in first("α"):
        for each "b" in follow("A"):
            M["A"]["b"] = "A -> α"
example 1
S : +SS
| *SS
| a
$\text{FOLLOW(S)} = \set{+, *, a, \$}$
Non-terminal | a | + | * | $ |
---|---|---|---|---|
S | S → a | S → +SS | S → *SS | |
example 2
S : (S)S
| eps
$\text{FOLLOW(S)} = \set{\rparen, \$}$
Non-terminal | ( | ) | $ |
---|---|---|---|
S | S → (S)S | S → eps | S → eps |
- theory:
- practice:
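For intuition, here is a table-driven predictive-parser driver using M (same assumed representation as the earlier sketches; a stack of grammar symbols against one lookahead token, with $ marking the end of input):

def ll1_parse(tokens, M, start="E"):
    tokens = tokens + ["$"]
    stack = ["$", start]
    i = 0
    while stack:
        top = stack.pop()
        if top == "eps":                      # nothing to match
            continue
        if top not in nonterminals:           # terminal: must match the lookahead
            if top != tokens[i]:
                raise SyntaxError(f"expected {top!r}, got {tokens[i]!r}")
            i += 1
        else:                                 # non-terminal: consult the table
            prod = M[top].get(tokens[i])
            if prod is None:
                raise SyntaxError(f"no rule for {top!r} on {tokens[i]!r}")
            stack.extend(reversed(prod))      # push the body, leftmost symbol on top
    return i == len(tokens)

# e.g. ll1_parse(["id", "+", "id"], build_ll1_table()) -> True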
Bottom Up Parsing
Shift-Reduce Method
- Shift: shift the next input symbol onto the top of the stack
- Reduce: the right end of the string to be reduced is at the top of the stack; locate its left end within the stack and replace the whole substring with the appropriate non-terminal
- Accept: announce successful completion of parsing
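The trace below uses the grammar E → T, T → T * F | F, F → a on the input a * a (the grammar is implied by the reductions); a sketch of the driver loop that executes these actions follows at the end of this subsection.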
stack | input | action |
---|---|---|
$ | a * a $ | shift |
$ a | * a $ | reduce F → a |
$ F | * a $ | reduce T → F |
$ T | * a $ | shift |
$ T * | a $ | shift |
$ T * a | $ | reduce F → a |
$ T * F | $ | reduce T → T * F |
$ T | $ | reduce E → T |
$ E | $ | accept |
- Bottom of stack always has $
- End of input always has $
CAUTION
not good with ambiguous grammars
- if you have time and want some more intuition on the automaton: https://blog.jeffsmits.net/lr-parsing-recursive-ascent/
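The trace above is exactly what a table-driven LR parser executes; here is a minimal driver sketch (the ACTION/GOTO dict encoding is my own assumption, and the constructions below only differ in how those tables get filled):

# ACTION[state][token] = ("shift", next_state) | ("reduce", lhs, rhs_len) | ("accept",)
# GOTO[state][lhs] = next_state
def lr_parse(tokens, ACTION, GOTO):
    tokens = tokens + ["$"]
    states = [0]                                 # state stack (start state at the bottom)
    i = 0
    while True:
        act = ACTION[states[-1]].get(tokens[i])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[i]!r}")
        if act[0] == "shift":
            states.append(act[1])
            i += 1
        elif act[0] == "reduce":
            _, lhs, rhs_len = act
            del states[len(states) - rhs_len:]   # pop |RHS| states
            states.append(GOTO[states[-1]][lhs]) # then GOTO on the reduced non-terminal
        else:                                    # accept
            return True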
LR Parsers (LR(0))
- automaton: https://www.youtube.com/watch?v=BxMFn7aelBk
- Items: productions that keep track of how much has been consumed
- Closure: include every possible production from a main production (sketch below)
- Augmented Grammar: extra state to add an accept
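A sketch of Closure (plus the goto that builds the automaton's edges), reusing the dict-based grammar from the LL(1) sketch and representing an item as (lhs, rhs_tuple, dot_position); this representation is my own choice, not a fixed convention:

def closure(items, grammar, nonterminals):
    """Whenever the dot sits before a non-terminal B, add every item B -> . gamma."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for prod in grammar[rhs[dot]]:
                    new = (rhs[dot], tuple(prod), 0)
                    if new not in items:
                        items.add(new)
                        changed = True
    return frozenset(items)

def goto(items, symbol, grammar, nonterminals):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar, nonterminals)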
LR(0) parsers
these parsers don't look ahead, so in any state that contains a completed item you reduce on every input token
Simple LR Parsers
Now you are looking ahead one token, so you only put a reduce where the lookahead is in the FOLLOW set of the non-terminal being reduced to, instead of filling up every cell in the row.
CLR Parsers
An example of an LR(1) parser: you build the LR(1) automaton, whose items carry a lookahead token derived from the FIRST/FOLLOW computation.
- peak: https://youtu.be/sh_X56otRdU?si=JH0TW6yPzWRzowZ3
- slightest touch of theory: https://cs.nyu.edu/~gottlieb/courses/2000s/2006-07-fall/compilers/lectures/lecture-07.html
similar to the SLR parser but more precise. Also takes a lot more memory, and everyone hates it
LALR Parsers
- bruh: https://www.youtube.com/playlist?list=PLxP0p--aBHmJKqrSFBatFypgviqGRxMKOi
- you're still doing the same work as CLR, but then you merge states (combine rows) that share the same core to get a much smaller parsing table, the same number of states as SLR
- just use SLR atp
- also, merging states may lead to random reduce/reduce conflicts on a grammar that was originally conflict-free as LR(1), wow
- it's still preferred over SLR because the merged lookaheads are more precise than plain FOLLOW sets lol