For the formal definition of XPath, see the XPath 1.0 Recommendation. Gentler introductions include the ZVON XPath Tutorial and the ProSolutions XPath tutorial.
4XPath can be used to either evaluate an XPath expression on the fly or to create a reusable pre-parsed object. The second approach is useful if you evaluate the same expression over and over again. You always need a context in order to evaluate an expression. This is at the minimum a DOM node. You can use any DOM implementation that conforms to the Python binding for DOM Level 2. Examples are 4DOM, minidom, cDomlette or pDomlette. The latter two have some specializations that 4XPath can take advantage of for optimization.
The discussion below uses the following two XML documents (set up as Python strings). The second is simply the first with all the elements in a namespace.
xmlstring = """<employees>
  <employee id="100">Memnon</employee>
  <employee id="101">Emathion</employee>
  <employee id="102">Castor</employee>
  <employee id="103">Polydeuces</employee>
</employees>
"""
nsxmlstring = """<employees xmlns='http://spam.com/foo'>
  <employee id="100">Memnon</employee>
  <employee id="101">Emathion</employee>
  <employee id="102">Castor</employee>
  <employee id="103">Polydeuces</employee>
</employees>
"""
To use a one-off XPath expression, start by importing the 4XPath module:
from xml import xpath
Then set up the DOM document to be used as the context.
from Ft.Lib.pDomlette import PyExpatReader
reader = PyExpatReader()
doc = reader.fromString(xmlstring)
Now just evaluate the XPaths you want.
result = xpath.Evaluate('/*', contextNode=doc)
This selects the document element. Note that the return value of this XPath expression is a node set, which is implemented as a Python list of DOM nodes. For instance, "print result" would display:
[<Domlette Element Node at 820fc74: name='employees' with 0 attributes and 9 children>]
Or similar (the "820fc74" represents the memory location of the Python object and will almost certainly be different for you). XPath object types map to the Python objects returned by evaluation as follows:
node-set | list of DOM nodes
string | string (a unicode object in Python 2.0 and later)
number | float
boolean | Ft.Lib.boolean
The following examples show each of these in turn:
>>> print repr(xpath.Evaluate('/employees/employee', contextNode=doc))
[<Domlette Element Node at 8217784: name='employee' with 1 attributes and 1 children>, <Domlette Element Node at 81ddbf4: name='employee' with 1 attributes and 1 children>, <Domlette Element Node at 810c234: name='employee' with 1 attributes and 1 children>, <Domlette Element Node at 82131c4: name='employee' with 1 attributes and 1 children>]
>>> print repr(xpath.Evaluate('/employees/employee[1]/@id', contextNode=doc))
[<Domlette Attribute Node at 82190fc: name='id', value='100'>]
>>> print repr(xpath.Evaluate('string(/employees/employee[1]/@id)', contextNode=doc))
u'100'
>>> print repr(xpath.Evaluate('number(/employees/employee[1]/@id)', contextNode=doc))
100.0
>>> print repr(xpath.Evaluate('/employees/employee[1]/@id = "100"', contextNode=doc))
1
Note the u'100' returned by one of the expressions. This is the new unicode type introduced in Python 2.0. In Python 1.5.2, the return value would be a simple string object: '100'.
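If calling code needs to branch on the kind of value that came back, the mapping from XPath types to Python objects can be turned around. The following is only a sketch, written for Python 3 and not part of 4XPath: the function name xpath_type_of is made up for illustration, and the built-in bool and str types stand in for the Ft.Lib boolean type and the Python 2 unicode type that 4XPath actually returns.

```python
def xpath_type_of(result):
    """Name the XPath type that a Python evaluation result corresponds to.

    Hypothetical helper: plain bool/str are stand-ins for the types
    4XPath itself returns.
    """
    if isinstance(result, list):
        return 'node-set'
    # bool must be tested before numbers: in Python, bool is a subclass of int.
    if isinstance(result, bool):
        return 'boolean'
    if isinstance(result, (int, float)):
        return 'number'
    if isinstance(result, str):
        return 'string'
    raise TypeError('not an XPath result: %r' % (result,))


for value in ([], '100', 100.0, True):
    print(repr(value), '->', xpath_type_of(value))
```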
In our examples so far, we've set the context node directly. This actually creates a context with the given node, a context list size of 1 and position of 1; no namespace mappings will be defined. But this isn't always what we want to do. Sometimes we want to specify the other context elements. The most common reason for this is to set up a namespace mapping. For instance, if we try to do the above processing on the nsxmlstring document:
nsdoc = reader.fromString(nsxmlstring)
print xpath.Evaluate('/employees/employee', contextNode=nsdoc)
We get an empty node-set. If you remember your XPath and think carefully, you'll see why. The node test "employees" strictly matches an element node with no namespace. Since the employees element in the nsxmlstring document (like its employee children) is actually in the "http://spam.com/foo" namespace, the node test fails. The solution is to use a namespace prefix in the XPath expression which is mapped to the right namespace. Remember that this is so even though we don't use a namespace prefix in the nsxmlstring document.
So how do we set up the namespace mappings we need? This is where the ability to set the full context comes in. We can create a 4XPath Context object with the node we want and the namespace mappings we want:
from xml.xpath.Context import Context
con = Context(nsdoc, processorNss={'x': 'http://spam.com/foo'})
The first argument is the context node. Then we specify a keyword argument "processorNss" which is a dictionary with the prefixes to map as keys and the namespace URIs as values. Note: don't try to set an empty string as a prefix: this is illegal. Now we can use the context object we created.
print xpath.Evaluate('/x:employees/x:employee', context=con)
And we get what we expect: a node set with four entries.
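The effect of the prefix mapping can be reproduced without 4XPath. The sketch below is not the 4XPath API: it uses the limited XPath support in xml.etree.ElementTree from the Python 3 standard library, purely to illustrate that an unprefixed name test does not match namespaced elements while a prefix mapped to the right URI does. The prefix "x" is an arbitrary choice, as in the example above.

```python
import xml.etree.ElementTree as ET

nsxmlstring = """<employees xmlns='http://spam.com/foo'>
  <employee id="100">Memnon</employee>
  <employee id="101">Emathion</employee>
  <employee id="102">Castor</employee>
  <employee id="103">Polydeuces</employee>
</employees>
"""

root = ET.fromstring(nsxmlstring)

# Unprefixed name test: only matches elements in no namespace.
unprefixed = root.findall('employee')

# Prefixed name test: 'x' is mapped to the namespace URI by the dictionary,
# even though the document itself never uses a prefix.
prefixed = root.findall('x:employee', {'x': 'http://spam.com/foo'})

print(len(unprefixed), len(prefixed))
```

The paths here are relative to the root element because ElementTree evaluates them from an element rather than from a document node.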
If your usage pattern is more along the lines of repeated evaluation of a particular expression against different contexts or documents, you probably want to pre-parse (compile) the expression once for the sake of performance:
from xml import xpath
from xml.xpath.Context import Context
from Ft.Lib.pDomlette import PyExpatReader
reader = PyExpatReader()
doc = reader.fromString(xmlstring)
expr = xpath.Compile('/employees/employee')
To evaluate it, we are required to have a full context object, not just a bare node. Then we can use the "evaluate" method of the parsed expression object, passing in the context:
con = Context(doc)
expr.evaluate(con)
Many XPath constructs require sorting nodes according to XML document order. This can be an expensive operation if the DOM implementation is not already primed for it, so 4XPath allows users of such implementations to pre-index documents for faster sorting (you needn't bother with this if you are using pDomlette or cDomlette). To index a document:
from xml.xpath import Util
...
Util.IndexDocument(document_node)
...XPath operations...
Util.FreeDocumentIndex(document_node)
Do be sure to free the index to avoid memory leaks. Also note that it's a bad idea to mutate any node in the document while it is indexed.
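One way to make sure the index is always freed, even when an XPath operation raises an exception, is to wrap the pair of calls in a context manager. This is a sketch under assumptions rather than part of 4XPath: the helper name indexed_document is hypothetical, it relies on contextlib from more recent Python versions, and the index and free functions are passed in as arguments so the helper does not depend on any particular module being importable.

```python
from contextlib import contextmanager


@contextmanager
def indexed_document(document_node, index, free):
    """Index document_node for the duration of a with-block, then free it.

    index and free are callables taking the document node, for example
    Util.IndexDocument and Util.FreeDocumentIndex.
    """
    index(document_node)
    try:
        yield document_node
    finally:
        # Runs even if an XPath operation inside the block raises.
        free(document_node)
```

With 4XPath this would presumably be used as "with indexed_document(doc, Util.IndexDocument, Util.FreeDocumentIndex):" followed by the XPath operations. The advice about not mutating the document while it is indexed still applies inside the block.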
4XPath core module: provides the basic API for 4XPath
Module Summary
Global Function Summary
Evaluate | Evaluates an XPath expression
Compile | Compile an XPath expression for quicker evaluation
RegisterExtensionModules | Register XPath extension functions contained in Python modules
Global Function Details
Evaluate(expr, contextNode, context)
Evaluates an XPath expression
Parameters
expr, of type string: XPath expression to be evaluated
contextNode, of type Python DOM binding node object: The context node, which will be used as the sole entry in the context node list. If None, the context argument must be an xml.xpath.Context object. Defaults to None
context, of type xml.xpath.Context: The user-specified context. If None, the contextNode argument must be a valid DOM node. Defaults to None
Return Value
Ft.Lib.boolean, float, string or list of DOM nodes: The result of the XPath expression evaluation.
Compile(expr)
Compile an XPath expression for quicker evaluation.
Parameters
expr, of type string: XPath expression to be compiled
Return Value
xml.xpath.ParsedExpr: A pre-compiled XPath expression object
RegisterExtensionModules(moduleList)
Register XPath extension functions contained in Python modules. Each module is imported, and any XPath extension functions contained therein are available to the 4XPath run-time.
Parameters
moduleList, of type list of strings: Each string represents a fully-qualified module name. Each module must follow the 4XPath extension protocol.
Return Value
None
XPath context
Module Summary
Class Summary
Context | Represents the context used for XPath processing at any given point
Attribute Summary
node | The context node, as used for computing XPath expressions
position | The context node's position in the context node list, as returned by the XPath position() function
size | The size of the context node list
varBindings | Maps variable and parameters by expanded name to the value of the variable
processorNss | Provides expansion from namespace prefixes to URIs for expanded names in name tests, variable names, etc.
Method Summary
__init__ |
nss | Get a dictionary representing namespace nodes defined at the context node
Method Details
__init__(node, position, size, varBindings, processorNss)
Parameters
node, of type Python DOM binding node object: The context node, as used for computing XPath expressions
position, of type positive integer: The context node's position in the context node list, as returned by the XPath position() function
size, of type positive integer: The size of the context node list
varBindings, of type dictionary with keys a tuple of two strings and value a string, integer, BooleanType or node set (list of nodes): Maps variable and parameters by expanded name to the value of the variable. Defaults to an empty dictionary.
processorNss, of type dictionary with string key and value: Provides expansion from namespace prefixes to URIs for expanded names in name tests, variable names, etc. Defaults to an empty dictionary.
Return Value
None
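To make the shape of the constructor arguments concrete, here is a small stand-in written as a Python 3 dataclass. It is not the xml.xpath.Context class: the name SketchContext is hypothetical, the defaults of 1 for position and size are an assumption taken from how Evaluate treats a bare contextNode, and raising ValueError for an empty-string prefix is our own way of enforcing the rule that such a prefix is illegal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class SketchContext:
    """Hypothetical stand-in mirroring the documented Context arguments."""
    node: Any
    position: int = 1  # assumed default, as for Evaluate(contextNode=...)
    size: int = 1      # assumed default, as for Evaluate(contextNode=...)
    # Keys are expanded names: a tuple of two strings.
    varBindings: Dict[Tuple[str, str], Any] = field(default_factory=dict)
    # Maps namespace prefixes to namespace URIs.
    processorNss: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if self.position < 1 or self.size < 1:
            raise ValueError('position and size must be positive integers')
        if '' in self.processorNss:
            raise ValueError('the empty string is not a legal namespace prefix')


con = SketchContext('some-node', processorNss={'x': 'http://spam.com/foo'})
print(con.position, con.size, con.processorNss)
```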
nss()
Get a dictionary representing namespace nodes defined at the context node
Parameters
None
Return Value
dictionary with string key and string value: Maps prefixes to namespace URIs
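As an illustration of what such a prefix-to-URI dictionary contains, the following sketch collects the namespace declarations in scope at a node using minidom from the Python 3 standard library. It is not how 4XPath implements nss(): the function name in_scope_namespaces is hypothetical, using the empty string as the key for a default namespace is an assumption, and the implicit "xml" prefix is left out.

```python
from xml.dom import minidom


def in_scope_namespaces(node):
    """Map prefixes to namespace URIs declared on node or its ancestors."""
    nss = {}
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            # setdefault keeps the nearest declaration when a prefix
            # is redeclared further up the tree.
            if attr.name == 'xmlns':
                nss.setdefault('', attr.value)
            elif attr.name.startswith('xmlns:'):
                nss.setdefault(attr.name[len('xmlns:'):], attr.value)
        node = node.parentNode
    return nss


doc = minidom.parseString(
    "<employees xmlns='http://spam.com/foo' xmlns:x='http://spam.com/bar'>"
    "<employee id='100'>Memnon</employee></employees>")
print(in_scope_namespaces(doc.documentElement.firstChild))
```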