TPG grammars are contained in the doc string of the parser class. TPG grammars may contain three parts:
See figure 5.1 for a generic TPG grammar.
Comments in TPG start with # and run until the end of the line.
# This is a comment
Some options can be set at the beginning of TPG grammars. The syntax for options is:
The lexer option tells TPG which lexer to use.
The word_boundary option tells the lexer to search for word boundaries after identifiers.
The sre module accepts some options to define the behaviour of the compiled regular expressions. These options can be changed for each parser.
Python code sections are not parsed by TPG; they are copied verbatim to the generated Python parser. TPG won’t complain about syntax errors in Python code sections: detecting them is Python’s job.
Before TPG 3, Python code was enclosed in double curly brackets. This means that the Python code must not contain two consecutive closing brackets. You can avoid this by writing } } (with a space) instead of }} (without a space). This syntax is still available, but the new syntax may be more readable. The new syntax uses $ to delimit code sections. When several $ sections are consecutive, they are treated as a single section.
Python code can appear in several parts of a grammar. Since indentation has a special meaning in Python, it is important to know how TPG handles spaces and tabs at the beginning of lines.
When TPG encounters Python code, it removes from every non-blank line the leading spaces and tabs that are common to all of those lines. TPG treats a space and a tab as the same character, so it is important to use one indentation style consistently and not to mix spaces and tabs. The code is then reindented when it is generated, according to its location (in a class, in a method or in global space).
Figure 5.2 shows how TPG handles indentation.
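The indentation rule described above can be approximated in a few lines of Python. This is only a sketch of the behaviour as the text states it, not TPG's actual implementation; the function names strip_common_indent and reindent and the four-space indentation unit are assumptions made for illustration.

```python
def strip_common_indent(code):
    """Remove the leading whitespace shared by all non-blank lines.

    Spaces and tabs are counted as the same character (one column each),
    which mirrors the rule stated in the text. Hypothetical helper.
    """
    lines = code.splitlines()

    def indent_width(line):
        # Number of leading space/tab characters, each counted as 1.
        return len(line) - len(line.lstrip(" \t"))

    widths = [indent_width(line) for line in lines if line.strip()]
    common = min(widths) if widths else 0
    # Blank lines are emitted empty; other lines lose the common prefix.
    return "\n".join(line[common:] if line.strip() else "" for line in lines)


def reindent(code, level, unit="    "):
    """Re-indent dedented code for its target location (assumed 4-space unit)."""
    prefix = unit * level
    dedented = strip_common_indent(code).splitlines()
    return "\n".join(prefix + line if line else line for line in dedented)


source = "        if x:\n            y = 1\n\n        z = 2"
print(strip_common_indent(source))
print(reindent(source, 1))
```

Because each tab counts as a single character in this sketch, a block that indents one line with a tab and another with four spaces would lose only one character from each line, which is why mixing the two styles gives surprising results.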
TPG parsers are tpg.Parser classes. The grammar is the doc string of the class.
As TPG parsers are just Python classes, you can use them as normal classes. If you redefine the __init__ method, don’t forget to call tpg.Parser.__init__.
Each rule will be translated into a method of the parser.
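The advice about redefining __init__ follows the usual Python pattern for subclass initializers. The sketch below uses a hypothetical BaseParser class as a stand-in for tpg.Parser so that it runs without TPG installed; the Calc class, the ready attribute and the variables parameter are likewise invented for illustration. In a real parser the base class would be tpg.Parser and the docstring would hold the grammar.

```python
class BaseParser:
    """Hypothetical stand-in for tpg.Parser (not the real class)."""

    def __init__(self):
        # Stands in for whatever internal state the real base class prepares.
        self.ready = True


class Calc(BaseParser):
    """In a real TPG parser, the grammar would be written in this docstring."""

    def __init__(self, variables=None):
        # Call the base initializer explicitly, as the text advises,
        # before adding any state of our own.
        BaseParser.__init__(self)
        self.variables = dict(variables or {})


calc = Calc({"x": 1})
print(calc.ready, calc.variables)
```

If the call to the base initializer were omitted, the state prepared by the base class (here the ready attribute) would be missing from the instance.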