The pyparsing module is an alternative approach to creating and
executing simple grammars, compared with the traditional lex/yacc
approach or the use of regular expressions. With pyparsing, you don't
need to learn a new syntax for defining grammars or matching
expressions; the pyparsing module provides a library of classes that
you use to construct the grammar directly in Python.
Here is a program to parse "Hello, World!" (or any greeting
of the form "<salutation>, <addressee>!"):
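A minimal sketch of such a program, assuming pyparsing is installed and that `Word`, `alphas`, and `parseString()` behave as documented on this page (the `print()` call assumes Python 3):

```python
from pyparsing import Word, alphas

# Grammar of a greeting: a word, a comma, another word, an exclamation mark.
greet = Word(alphas) + "," + Word(alphas) + "!"

hello = "Hello, World!"
tokens = greet.parseString(hello)
print(hello, "->", tokens.asList())
```

Run as a script, this prints `Hello, World! -> ['Hello', ',', 'World', '!']`.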
The Python representation of the grammar is quite readable, owing to
the self-explanatory class names, and the use of '+', '|' and '^'
operators.
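The three operators can be compared side by side. The sketch below assumes that '+' builds a sequence, '|' picks the first alternative that matches, and '^' picks the longest match; the sample strings are made up for illustration.

```python
from pyparsing import Literal, Word, alphas, nums

# '+' chains expressions in sequence (And).
pair = Word(alphas) + Word(nums)
sequence = pair.parseString("abc 123").asList()

# '|' takes the first alternative that matches (MatchFirst).
first = (Literal("in") | Literal("int")).parseString("integer").asList()

# '^' tries every alternative and keeps the longest match (Or).
longest = (Literal("in") ^ Literal("int")).parseString("integer").asList()

print(sequence, first, longest)
```

Here `sequence` is `['abc', '123']`, `first` is `['in']`, and `longest` is `['int']`.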
The parsed results returned from parseString() can be accessed as a
nested list, a dictionary, or an object with named attributes.
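A short sketch of these access styles, assuming `setResultsName()` is used to label the tokens; the names "salutation" and "addressee" are chosen for this example only.

```python
from pyparsing import Suppress, Word, alphas

# Label the two words; Suppress() keeps the punctuation out of the results.
greet = (Word(alphas).setResultsName("salutation")
         + Suppress(",")
         + Word(alphas).setResultsName("addressee")
         + Suppress("!"))

result = greet.parseString("Hello, World!")

as_list = result.asList()        # plain list of the matched tokens
by_key = result["addressee"]     # dictionary-style access
by_attr = result.salutation      # attribute-style access
print(as_list, by_key, by_attr)
```

Here `as_list` is `['Hello', 'World']`, `by_key` is `'World'`, and `by_attr` is `'Hello'`.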
The pyparsing module handles some of the problems that are typically
vexing when writing text parsers: extra or missing whitespace, quoted
strings, and embedded comments.
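Whitespace is one such problem. The sketch below assumes pyparsing's default behaviour of skipping whitespace between tokens; the input variants are made up for illustration.

```python
from pyparsing import Word, alphas

greet = Word(alphas) + "," + Word(alphas) + "!"

# Whitespace between tokens is skipped by default, so these all parse alike.
variants = ["Hello, World!", "Hello,World!", "Hello  ,  World  !"]
parsed = [greet.parseString(text).asList() for text in variants]
print(parsed)
```

Each entry of `parsed` is `['Hello', ',', 'World', '!']`.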
basestring
    str(object) -> string

_Constants

ParseBaseException
    Base exception class for all parsing runtime exceptions.

ParseException
    Exception thrown when a parse expression doesn't match; supported
    attributes by name are lineno, col, and line.

ParseFatalException
    User-throwable exception thrown when inconsistent parse content is
    found; stops all parsing immediately.

ParseSyntaxException
    Just like ParseFatalException, but thrown internally when an
    ErrorStop indicates that parsing is to stop immediately because an
    unbacktrackable syntax error has been found.

RecursiveGrammarException
    Exception thrown by validate() if the grammar could be improperly
    recursive.

_ParseResultsWithOffset

ParseResults
    Structured parse results, providing multiple means of access to
    the parsed data: as a list, by list index, and by attribute.

ParserElement
    Abstract base level parser element class.

Token
    Abstract ParserElement subclass, for defining atomic matching
    patterns.

Empty
    An empty token, will always match.

NoMatch
    A token that will never match.

Literal
    Token to exactly match a specified string.

_L
    Token to exactly match a specified string.

Keyword
    Token to exactly match a specified string as a keyword, that is, it
    must be immediately followed by a non-keyword character.

CaselessLiteral
    Token to match a specified string, ignoring case of letters.

CaselessKeyword

Word
    Token for matching words composed of allowed character sets.

Regex
    Token for matching strings that match a given regular expression.

QuotedString
    Token for matching strings that are delimited by quoting
    characters.

CharsNotIn
    Token for matching words composed of characters *not* in a given
    set.

White
    Special matching class for matching whitespace.

_PositionToken

GoToColumn
    Token to advance to a specific column of input text; useful for
    tabular report scraping.

LineStart
    Matches if current position is at the beginning of a line within
    the parse string.

LineEnd
    Matches if current position is at the end of a line within the
    parse string.

StringStart
    Matches if current position is at the beginning of the parse string.

StringEnd
    Matches if current position is at the end of the parse string.

WordStart
    Matches if the current position is at the beginning of a Word, and
    is not preceded by any character in a given set of wordChars
    (default=printables).

WordEnd
    Matches if the current position is at the end of a Word, and is not
    followed by any character in a given set of wordChars
    (default=printables).

ParseExpression
    Abstract subclass of ParserElement, for combining and
    post-processing parsed tokens.

And
    Requires all given ParseExpressions to be found in the given order.

Or
    Requires that at least one ParseExpression is found; if more than
    one matches, the longest match is used (the '^' operator).

MatchFirst
    Requires that at least one ParseExpression is found; if more than
    one matches, the first one listed is used (the '|' operator).

Each
    Requires all given ParseExpressions to be found, but in any order.

ParseElementEnhance
    Abstract subclass of ParserElement, for combining and
    post-processing parsed tokens.

FollowedBy
    Lookahead matching of the given parse expression.

NotAny
    Lookahead to disallow matching with the given parse expression.

ZeroOrMore
    Optional repetition of zero or more of the given expression.

OneOrMore
    Repetition of one or more of the given expression.

_NullToken

Optional
    Optional matching of the given expression.

SkipTo
    Token for skipping over all undefined text until the matched
    expression is found.

Forward
    Forward declaration of an expression to be defined later - used for
    recursive grammars, such as algebraic infix notation.

_ForwardNoRecurse

TokenConverter
    Abstract subclass of ParseElementEnhance, for converting parsed
    results.

Upcase
    Converter to upper case all matching tokens.

Combine
    Converter to concatenate all matching tokens to a single string.

Group
    Converter to return the matched tokens as a list - useful for
    returning tokens of ZeroOrMore and OneOrMore expressions.

Dict
    Converter to return a repetitive expression as a list, but also as
    a dictionary.

Suppress
    Converter for ignoring the results of a parsed expression.

OnlyOnce
    Wrapper for parse actions, to ensure they are only called once.

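Several of the classes listed above can be combined as in the following sketch. The "name = integer;" format and all variable names are invented for the example, and the error handling assumes that ParseException exposes lineno, col, and line as attributes.

```python
from pyparsing import (Group, Literal, OneOrMore, ParseException,
                       Suppress, Word, alphas, nums)

# Hypothetical mini-format: one or more "name = integer;" items.
name = Word(alphas)
integer = Word(nums)
assignment = Group(name + Suppress(Literal("=")) + integer + Suppress(";"))
settings = OneOrMore(assignment)

# Group() keeps each assignment in its own sub-list; Suppress() drops
# the "=" and ";" tokens from the results.
good = settings.parseString("width = 80; height = 24;").asList()
print(good)

error_line = None
try:
    settings.parseString("width = eighty;")
except ParseException as err:
    # The exception reports where matching stopped.
    error_line = err.lineno
    print("no match at line %d, column %d: %r" % (err.lineno, err.col, err.line))
```

Here `good` is `[['width', '80'], ['height', '24']]`, while the second input raises ParseException because "eighty" is not made of digits.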
_ustr(obj)
    Drop-in replacement for str(obj) that tries to be Unicode friendly.

unichr(i) -> character
    Return a string of one character with ordinal i; 0 <= i < 256.

col(loc, strg)
    Returns current column within a string, counting newlines as line
    separators.

lineno(loc, strg)
    Returns current line number within a string, counting newlines as
    line separators.

line(loc, strg)
    Returns the line of text containing loc within a string, counting
    newlines as line separators.

_defaultStartDebugAction(instring, loc, expr)

_defaultSuccessDebugAction(instring, startloc, endloc, expr, toks)

_defaultExceptionDebugAction(instring, loc, expr, exc)

nullDebugAction(*args)
    'Do-nothing' debug action, to suppress debugging output during
    parsing.

traceParseAction(f)
    Decorator for debugging parse actions.

delimitedList(expr, delim=',', combine=False)
    Helper to define a delimited list of expressions - the delimiter
    defaults to ','.

matchPreviousLiteral(expr)
    Helper to define an expression that is indirectly defined from the
    tokens matched in a previous expression, that is, it looks for a
    'repeat' of a previous expression.

matchPreviousExpr(expr)
    Helper to define an expression that is indirectly defined from the
    tokens matched in a previous expression, that is, it looks for a
    'repeat' of a previous expression.

oneOf(strs, caseless=False, useRegex=True)
    Helper to quickly define a set of alternative Literals, and makes
    sure to do longest-first testing when there is a conflict, regardless
    of the input order, but returns a MatchFirst for best performance.

dictOf(key, value)
    Helper to easily and clearly define a dictionary by specifying the
    respective patterns for the key and value.

srange(s)
    Helper to easily define string ranges for use in Word construction.

matchOnlyAtCol(n)
    Helper method for defining parse actions that require matching at a
    specific column in the input text.

upcaseTokens(s, l, t)
    Helper parse action to convert tokens to upper case.

downcaseTokens(s, l, t)
    Helper parse action to convert tokens to lower case.

keepOriginalText(s, startLoc, t)
    Helper parse action to preserve original parsed text, overriding any
    nested parse actions.

getTokensEndLoc()
    Method to be called from within a parse action to determine the end
    location of the parsed tokens.

_makeTags(tagStr, xml)
    Internal helper to construct opening and closing tag expressions,
    given a tag name.

makeHTMLTags(tagStr)
    Helper to construct opening and closing tag expressions for HTML,
    given a tag name.

makeXMLTags(tagStr)
    Helper to construct opening and closing tag expressions for XML,
    given a tag name.

withAttribute(*args, **attrDict)
    Helper to create a validating parse action to be used with start tags
    created with makeXMLTags or makeHTMLTags.

operatorPrecedence(baseExpr, opList)
    Helper method for constructing grammars of expressions made up of
    operators working in a precedence hierarchy.

nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString)
    Helper method for defining nested lists enclosed in opening and
    closing delimiters ("(" and ")" are the default).

indentedBlock(blockStatementExpr, indentStack, indent=True)
    Helper method for defining space-delimited indentation blocks, such
    as those used to define block statements in Python source code.

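A few of the helper functions above in use. This is a sketch with made-up input strings; it assumes `delimitedList`, `oneOf`, `nestedExpr`, `lineno`, and `col` are importable from the top-level pyparsing package under the names listed here.

```python
from pyparsing import Word, alphas, col, delimitedList, lineno, nestedExpr, oneOf

# delimitedList: comma-separated items; the delimiters are dropped.
names = delimitedList(Word(alphas)).parseString("red, green,blue").asList()

# oneOf: alternatives given as one space-separated string; longer
# alternatives are tested before shorter ones that they start with.
op = oneOf("< <= > >=").parseString("<= 5").asList()

# nestedExpr: nested lists enclosed in "(" and ")" by default.
tree = nestedExpr().parseString("(a (b c) d)").asList()

# lineno and col turn a string offset into 1-based line and column numbers.
text = "first\nsecond"
offset = text.index("second")
position = (lineno(offset, text), col(offset, text))

print(names, op, tree, position)
```

Here `names` is `['red', 'green', 'blue']`, `op` is `['<=']`, `tree` is `[['a', ['b', 'c'], 'd']]`, and `position` is `(2, 1)`.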
__doc__ = ...
__versionTime__ = '2 October 2008 00:44'
_PY3K = False
_MAX_INT = 2147483647
alphas = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
nums = '0123456789'
hexnums = '0123456789ABCDEFabcdef'
alphanums = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVW...
_bslash = '\\'
printables = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKL...
_optionalNotMatched = _NullToken()
empty = empty
lineStart = lineStart
lineEnd = lineEnd
stringStart = stringStart
stringEnd = stringEnd
_escapedPunc = W:(\,\[]-...)
_printables_less_backslash = '0123456789abcdefghijklmnopqrstuv...
_escapedHexChar = Combine:({Suppress:("\0x") W:(0123...)})
_escapedOctChar = Combine:({Suppress:("\") W:(0,0123...)})
_singleChar = {W:(\,\[]-...) | Combine:({Suppress:("\0x") W:(0...
_charRange = Group:({{W:(\,\[]-...) | Combine:({Suppress:("\0x...
_reBracketExpr = {"[" ["^"] Group:({{Group:({{W:(\,\[]-...) | ...
opAssoc = _Constants()
dblQuotedString = string enclosed in double quotes
sglQuotedString = string enclosed in single quotes
quotedString = quotedString using single or double quotes
unicodeString = Combine:({"u" quotedString using single or dou...
alphas8bit = u'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîï...
punc8bit = u'¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷'
commonHTMLEntity = Combine:({"&" Re:('gt|lt|amp|nbsp|quot') ";"})
_htmlEntityMap = {'amp': '&', 'gt': '>', 'lt': '<', 'nbsp': '...
cStyleComment = C style comment
htmlComment = Re:('<!--[\\s\\S]*?-->')
restOfLine = Re:('.*')
dblSlashComment = // comment
cppStyleComment = C++ style comment
javaStyleComment = C++ style comment
pythonStyleComment = Python style comment
_noncomma = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLM...
_commasepitem = commaItem
commaSeparatedList = commaSeparatedList
anyCloseTag = </W:(abcd...,abcd...)>
anyOpenTag = <W:(abcd...,abcd...)>
c = '~'
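The character-set strings and predefined expressions above are meant to be used as building blocks. The following sketch assumes that `quotedString` keeps the surrounding quotes in its result and that `ignore()` accepts `cStyleComment`; the sample inputs and variable names are made up.

```python
from pyparsing import Word, alphas, alphanums, cStyleComment, nums, quotedString

# alphas and alphanums feed Word(): an identifier starts with a letter
# and continues with letters or digits.
identifier = Word(alphas, alphanums)

# quotedString matches text in single or double quotes.
string_setting = identifier + "=" + quotedString
quoted = string_setting.parseString('name = "Hello, World!"').asList()

# cStyleComment can be passed to ignore() so that comments are skipped
# wherever they appear between tokens.
number_setting = (Word(alphas) + "=" + Word(nums)).ignore(cStyleComment)
commented = number_setting.parseString("count /* retries */ = 3").asList()

print(quoted, commented)
```

Here `quoted` is `['name', '=', '"Hello, World!"']` and `commented` is `['count', '=', '3']`.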