This document explains how the syntax matching works. You probably
don't want to know, but it has become so complex that I keep forgetting how
it works.
1. How it Works(tm)
Before parsing, the program reads all .syntax files for parsing
rules and all .list files for lists of registers etc. There
is no limit on the number of files or on the number of syntaxes in them.
All comparisons are case insensitive.
For every input line:
- Expand tabs to spaces and remove comments.
- Try to match it with every rule in main.syntax.
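The clean-up step can be sketched roughly as below. This is only an illustration in Python, not the program's own code: the function name `preprocess`, the ';' comment character and the tab width of 8 are all assumptions, and comment characters inside quoted strings are not handled.

```python
def preprocess(line, comment_char=";", tab_width=8):
    """Expand tabs to spaces and drop a trailing comment from one input line."""
    expanded = line.expandtabs(tab_width)
    # Assumption: a comment runs from comment_char to the end of the line.
    code, _, _ = expanded.partition(comment_char)
    return code.rstrip()
```

The cleaned line is what would then be tried against each rule in main.syntax.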
Parsing rules:
- Normal characters are matched exactly.
- A space in the parse description matches any amount of space in the input line.
- `text´ inside '<>' is considered a special tag and parsed accordingly;
after a successful match its value is stored in a variable of the
same name (`text´).
After matching the parse line, the syntax file may assign values to any
variables. Assignments are recognized by '@' as the first character of the line.
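The parsing rules listed above can be approximated by turning a parse rule into a regular expression. The sketch below is a simplification and not how the program is actually written: the names `rule_to_regex` and `match_rule` are made up, "any amount of space" is assumed to include none at all, and only plain text-style tags are handled (no list or syntax tags, no ':hh' masks, no tag name used twice in one rule).

```python
import re

def rule_to_regex(rule):
    """Translate a parse rule into a compiled, case-insensitive regex."""
    parts = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "<":
            # <name> captures characters up to the next char in the rule.
            end = rule.index(">", i)
            name = rule[i + 1:end]
            parts.append("(?P<%s>.*?)" % name)
            i = end + 1
        elif ch == " ":
            # Assumption: a space in the rule matches zero or more spaces.
            parts.append(r"\s*")
            i += 1
        else:
            # Normal characters are matched exactly (ignoring case).
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts) + r"\Z", re.IGNORECASE)

def match_rule(rule, line):
    """Return the captured variables as a dict, or None if the rule fails."""
    m = rule_to_regex(rule).match(line)
    return m.groupdict() if m else None
```

For instance, `match_rule("mov <dst>, <src>", "MOV eax, ebx")` yields a dict with `dst` and `src` filled in, while a line starting with a different word yields `None`.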
Output rules:
- Normal characters (including space) are copied to output.
- `text´ inside '<>' is replaced with the value of the variable called `text´.
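The output rules amount to a simple substitution pass. A minimal sketch, assuming that an unknown variable expands to an empty string (the document does not say what happens in that case); `render` is a hypothetical name, and the only special tag handled here is <nl>:

```python
import re

def render(output_format, variables):
    """Copy output_format, replacing each <name> with the variable's value."""
    def substitute(m):
        name = m.group(1)
        if name == "nl":
            # Special tag: outputs a newline character.
            return "\n"
        # Assumption: a variable that was never set expands to nothing.
        return variables.get(name, "")
    return re.sub(r"<([^<>]+)>", substitute, output_format)
```

Everything outside '<>' (including spaces) passes through unchanged, which matches the first output rule.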
2. Files
.syntax file syntax:
-
parse rule
[@varname=value]
output format
-
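Reading the .syntax layout above could look something like the following. This is a guess at a loader, not the real one: `parse_syntax_text` is a hypothetical name, blank lines are assumed to be ignorable, and any non-'@' lines after the parse rule are assumed to belong to the output format.

```python
def parse_syntax_text(text):
    """Split the text of a .syntax file into a list of rule records."""
    rules, block = [], []

    def flush():
        # Turn the lines collected since the last '-' into one record.
        if not block:
            return
        parse_rule, rest = block[0], block[1:]
        assigns, output = {}, []
        for line in rest:
            if line.startswith("@"):
                # '@varname=value' assignment line.
                name, _, value = line[1:].partition("=")
                assigns[name] = value
            else:
                output.append(line)
        rules.append({"parse": parse_rule,
                      "assign": assigns,
                      "output": "\n".join(output)})
        block.clear()

    for line in text.splitlines():
        if line.strip() == "-":
            flush()          # '-' separates rule records
        elif line.strip():
            block.append(line)
    flush()
    return rules
```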
.list file syntax:
[@varname=value]
-
val1
val2
...
4. Tags
The format of a tag is as follows: <tagnameN:hh>c, where
- tagname is the name of the tag and of the variable where the result is stored.
- N is an optional digit (0-9) that stores the result in a different variable.
- hh is an optional hex bitmask (00-ff) for list matches; it defaults to ff.
Every .list file with the same name should have a unique bit set in hh.
- c is the next character in the parse rule (it has meaning with some tags).
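To make the tag format concrete, here is a small sketch that splits a tag into its parts. It assumes that tag names themselves contain no digits and that the result variable is simply the name with N appended (e.g. `reg1`); neither is stated above, and `parse_tag` is a hypothetical helper.

```python
import re

# <tagnameN:hh> with optional single digit N and optional hex mask hh.
TAG_RE = re.compile(r"<([^<>:0-9]+)([0-9])?(?::([0-9a-fA-F]{1,2}))?>")

def parse_tag(rule, pos=0):
    """Parse the tag starting at rule[pos]; return its parts or None."""
    m = TAG_RE.match(rule, pos)
    if m is None:
        return None
    name, index, mask = m.groups()
    end = m.end()
    return {
        "name": name,
        # Assumption: N is appended to the name to form the variable name.
        "variable": name + (index or ""),
        # The mask defaults to ff when it is left out.
        "mask": int(mask, 16) if mask else 0xFF,
        # c: the character following the tag, or "" at the end of the rule.
        "next_char": rule[end] if end < len(rule) else "",
    }
```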
Tag parsing rules, when the tag is a:
- list name: find the longest match and store it in the variable.
- `text´: copy chars to the variable until the next char in the parse rule.
- syntax name:
* copy chars to a temp buffer until the next char in the parse rule.
* try to match the buffer contents with the syntax rules.
* all variables are preserved (no communication between syntaxes).
* the output of a matching rule is stored in the variable.
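The list-name case (longest match, filtered by the hh bitmask) might be sketched like this. The data layout is an assumption: the lists sharing one name are represented as a dict mapping each file's bit to its entries, and `match_list` is a made-up name.

```python
def match_list(lists, mask, text):
    """Return the longest list entry that text starts with, or None.

    lists maps a file's bit (e.g. 0x01, 0x02) to that file's entries;
    only files whose bit is set in mask are searched.
    """
    best = ""
    lowered = text.lower()          # all comparisons are case insensitive
    for bit, entries in lists.items():
        if not bit & mask:
            continue                # this .list file is masked out
        for entry in entries:
            if lowered.startswith(entry.lower()) and len(entry) > len(best):
                best = entry
    return best or None
```

Preferring the longest entry means that 'eax' wins over a shorter entry such as 'e' when both are present in the searched lists.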
Special tags:
<nl> output a new line character
<cm> matches a '#'
5. Current set of files
Syntax files:
main.syntax      the basic syntax file
num.syntax       converts hex, dec, oct and bin numbers
address.syntax   converts addresses inside '[]'
mathnum.syntax   matches complex immediate numbers
List files:
op.nn.list       opcodes
reg.nn.list      registers (sets 'r' to b/w/l to match the register size)
+-.nn.list       + or - sign (sets 'sgn' to empty or -)
size.nn.list     byte, dword... memory sizes (sets 'm' to the matching b/w/l)
data.nn.list     dd, db... sizes used in variable creation
discard.nn.list  short... (discarded)