- Rikki Gibson
- Miles Van de Wetering
- Jaden Diefenbaugh
We implemented an object-oriented interface for PLY that removed all dependencies on globally-declared elements. This allows the user to interact with PLY alongside arbitrary Python code. To do so, we designed a system of classes that represent the previously necessary global elements for PLY to consume.
We decided against modifying PLY directly because the codebase for it was too messy and complex.
An encapsulation of an entire language file for PLY. Contains:
- A list of
GrammarRules, which contain rules for parsing and matching - A list of
TokenPatterns, the regex-like strings that match tokens and substrings of the input together A Grammar instance will have its FIRST and FOLLOW sets, as well as checks for if it is in LL(1) or not, performed when it is initialized.
This class contains all information necessary to parse a token. It includes:
- A
stringthat functions as a regex to match the token - A
functionthat consumes and produces a LexToken object (the PLY class for tokens) - A
stringname of the token
A GrammarRule represents one rule in a grammar. It has:
- A
stringrepresenting the LHS of the rule - A list of
Productions that represent one production of the rule - A
functioncalled when the rule is parsed
A Production is a series of symbols (each which represents a TokenPattern or a GrammarRule). It consists of:
- A list of
TokenPatterns or/andGrammarRules
To convert this class-based representation of a PLY to something the PLY engine can parse, we generate a Python module object from the given instances of the classes above, which we then pass to PLY. The module has attributes set on it based on the PLY conventions for names of lexing and parsing functions.
Our LL(1) parser has two steps: validating the grammar is indeed LL(1), and generating the LL(1) parse tree. The validation checks for three conditions: FIRST/FIRST conflicts, FIRST/FOLLOW conflicts, and left recursion. If any of the three are found, PLY's Yacc parser (LALR) is used instead. These conditions are checked in:
has_firstfirstconflicthas_firstfollowconflicthas_leftrecursion
The parsing algorithm recursively generates the parse tree using the given grammar, a stack of the current symbols, and the list of remaining input tokens. The current list of symbols are expanded according to the LL(1) parsing algorithm, in depth-first order. Once a terminal is reached, the terminal's token (the left-most token) is consumed and returned. A non-terminal, once parsed, returns an array containing the non-terminal's name and the parse trees of each of its expanded production's symbols' parse trees. The parsing continues at the next-lowest unparsed symbols.
While all three of us worked together on the entire project, Miles & Rikki did the majority of the PLY Interface and LL(1) Parser code. Jaden also wrote the report.
- Install PLY.
python example.pyto see our example language and syntax tree produced using our LL(1) parser.- See the
evil_rulesandgodly_rulesobjects inexample.pyto understand how the object-oriented interface to PLY works.