The elementtree.HTMLTreeBuilder Module

Tools to build element trees from HTML files.

Module Contents

HTMLTreeBuilder(builder=None) (class)

ElementTree builder for HTML source code.

For more information about this class, see The HTMLTreeBuilder Class.

parse(source)

Parse an HTML document or document fragment.

source
A filename or file object containing HTML data.
Returns:
An ElementTree instance.
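
The following is a minimal sketch of how a parse-style helper might accept either a filename or a file object and return an ElementTree. It does not use this module: SketchBuilder and parse_sketch are hypothetical names, built on the Python 3 standard library (html.parser and xml.etree.ElementTree) so that the example runs on its own. It assumes text-mode input, UTF-8 encoding for files opened by name, and fully balanced tags.

```python
import io
from html.parser import HTMLParser
from xml.etree import ElementTree as ET


class SketchBuilder(HTMLParser):
    """Hypothetical stand-in for a tree builder (balanced tags only)."""

    def __init__(self):
        super().__init__()
        self._target = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (e.g. <input disabled>); store "" instead.
        self._target.start(tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        self._target.end(tag)

    def handle_data(self, data):
        self._target.data(data)

    def close(self):
        # Flush the parser buffers, then return the root element.
        super().close()
        return self._target.close()


def parse_sketch(source):
    """Accept a filename or a text-mode file object; return an ElementTree."""
    opened_here = not hasattr(source, "read")
    # Assumption: files opened by name are UTF-8 encoded.
    stream = open(source, encoding="utf-8") if opened_here else source
    try:
        builder = SketchBuilder()
        builder.feed(stream.read())
        return ET.ElementTree(builder.close())
    finally:
        if opened_here:
            stream.close()


tree = parse_sketch(io.StringIO("<html><body><p>Hello</p></body></html>"))
print(tree.getroot().tag)       # html
print(tree.findtext("body/p"))  # Hello
```

Only a stream that the helper opened itself is closed, so a caller-supplied file object stays usable afterwards.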

TreeBuilder (variable)

An alias for the HTMLTreeBuilder class.

The HTMLTreeBuilder Class

HTMLTreeBuilder(builder=None) (class)

ElementTree builder for HTML source code. This builder converts an HTML document or fragment to an ElementTree.

The parser is relatively picky, and requires balanced tags for most elements. However, elements belonging to the following group are closed automatically: P, LI, TR, TH, and TD. In addition, for elements in the following group, the parser inserts an end tag immediately after the start tag and ignores any explicit end tag: IMG, HR, META, and LINK.
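
The two tag groups described above can be illustrated with a short, self-contained sketch. This is not the module's own implementation: AutoCloseSketch, AUTOCLOSE, and IGNOREEND are names chosen for this example, and the code uses the Python 3 standard library (html.parser and xml.etree.ElementTree). The exact point at which an open P, LI, TR, TH, or TD is closed is an assumption of the sketch: it closes one when the same tag opens again, or when an enclosing element ends.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

AUTOCLOSE = {"p", "li", "tr", "th", "td"}  # closed implicitly
IGNOREEND = {"img", "hr", "meta", "link"}  # ended right after the start tag


class AutoCloseSketch(HTMLParser):
    """Hypothetical builder that applies the two tag groups."""

    def __init__(self):
        super().__init__()
        self._target = ET.TreeBuilder()
        self._open = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        # A repeated auto-closable tag ends the previous one first.
        if tag in AUTOCLOSE and self._open and self._open[-1] == tag:
            self._target.end(self._open.pop())
        self._target.start(tag, {k: v or "" for k, v in attrs})
        if tag in IGNOREEND:
            self._target.end(tag)  # insert the end tag immediately
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in IGNOREEND:
            return  # explicit end tags for this group are ignored
        # Implicitly close auto-closable elements still open inside `tag`.
        while self._open and self._open[-1] != tag and self._open[-1] in AUTOCLOSE:
            self._target.end(self._open.pop())
        if not self._open or self._open[-1] != tag:
            raise ValueError("unbalanced end tag: %r" % tag)
        self._target.end(self._open.pop())

    def handle_data(self, data):
        self._target.data(data)

    def close(self):
        super().close()
        return self._target.close()


builder = AutoCloseSketch()
builder.feed('<div><ul><li>one<li>two</ul><img src="a.png"></img><hr></div>')
root = builder.close()
print([li.text for li in root.iter("li")])  # ['one', 'two']
print([child.tag for child in root])        # ['ul', 'img', 'hr']
```

Any other unbalanced end tag raises ValueError here, which mirrors the "picky" behaviour only loosely; the error type is a choice made for the sketch.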

close()

Flush parser buffers, and return the root element.

Returns:
An Element instance.