Home | Trees | Indices | Help |
UnitClass: The class of units that will be instantiated and used by this class.
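The UnitClass attribute lets a store choose which unit type it creates without hard-coding the class. The Python sketch below is illustrative only. `Unit`, `Store`, `TaggedUnit`, `TaggedStore` and `add_unit` are hypothetical stand-ins, not this module's API.

```python
class Unit:
    """Hypothetical unit: holds one piece of extracted source text."""

    def __init__(self, source):
        self.source = source


class Store:
    """Hypothetical store that creates units through its UnitClass attribute."""

    UnitClass = Unit  # subclasses override this to change the unit type

    def __init__(self):
        self.units = []

    def add_unit(self, source):
        # Look the class up on the instance, so a subclass's UnitClass is used.
        unit = self.UnitClass(source)
        self.units.append(unit)
        return unit


class TaggedUnit(Unit):
    """Hypothetical unit that also remembers which tag the text came from."""

    def __init__(self, source, tag="p"):
        super().__init__(source)
        self.tag = tag


class TaggedStore(Store):
    UnitClass = TaggedUnit  # nothing else in Store needs to change


store = TaggedStore()
unit = store.add_unit("Hello")
print(type(unit).__name__, unit.source, unit.tag)  # TaggedUnit Hello p
```

Because `add_unit` never names a concrete class, overriding the single class attribute is enough to switch unit types.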
MARKINGTAGS: Text inside these tags is extracted from the HTML document.
MARKINGATTRS: Text from tags with these attributes is extracted from the HTML document.
INCLUDEATTRS: Text from these attributes is extracted.
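As a rough illustration of how these three class variables could drive extraction, the sketch below uses Python's standard `html.parser.HTMLParser`. The tag and attribute values are made-up placeholders, because the real lists are not shown on this page, and `TextExtractor` is a hypothetical name, not the class documented here.

```python
from html.parser import HTMLParser

# Placeholder values for illustration only; the real lists are not shown here.
MARKINGTAGS = {"h1", "p", "li"}
MARKINGATTRS = {"data-translate"}
INCLUDEATTRS = {"alt", "summary"}


class TextExtractor(HTMLParser):
    """Hypothetical extractor driven by the three sets above."""

    def __init__(self):
        super().__init__()
        self.open_marks = []  # stack of currently open marking elements
        self.buffer = []      # text collected for the outermost marking element
        self.extracted = []   # results, in the order they were completed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # INCLUDEATTRS: extract these attribute values wherever they appear.
        for name in sorted(INCLUDEATTRS):
            value = attrs.get(name)
            if value:
                self.extracted.append(value.strip())
        # MARKINGTAGS / MARKINGATTRS: start collecting text for this element.
        if tag in MARKINGTAGS or MARKINGATTRS.intersection(attrs):
            if not self.open_marks:
                self.buffer = []
            self.open_marks.append(tag)

    def handle_endtag(self, tag):
        if self.open_marks and self.open_marks[-1] == tag:
            self.open_marks.pop()
            if not self.open_marks:
                # Collapse runs of whitespace in the collected text.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.extracted.append(text)

    def handle_data(self, data):
        if self.open_marks:
            self.buffer.append(data)


parser = TextExtractor()
parser.feed('<body><h1>My page</h1>'
            '<div data-translate="yes">Hello <b>world</b></div>'
            '<img src="logo.png" alt="A logo"><p>Bye</p></body>')
parser.close()
print(parser.extracted)  # ['My page', 'Hello world', 'A logo', 'Bye']
```

Nested markup such as `<b>` stays part of the enclosing marked element's text, which is one plausible design; the documented class may group text differently.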
SELF_CLOSING_TAGS: HTML self-closing tags.
ENCODING_RE = re.compile(r'
Initialize and reset this instance.
Returns the encoding of the HTML text, found by looking for 'charset=' within a meta tag.
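A minimal sketch of meta-tag charset detection with a regular expression. `CHARSET_RE` and `guess_charset` are hypothetical. The real ENCODING_RE pattern is truncated on this page, so this pattern is only an assumption about what such a regex might look like.

```python
import re

# Hypothetical pattern: finds charset=NAME inside a <meta ...> tag, with or
# without quotes, e.g. <meta charset="utf-8"> or content="text/html; charset=...".
CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([-\w.:]+)", re.IGNORECASE)


def guess_charset(html_text, default="utf-8"):
    """Return the lower-cased charset named in a meta tag, or `default`."""
    match = CHARSET_RE.search(html_text)
    return match.group(1).lower() if match else default


print(guess_charset('<meta http-equiv="Content-Type" '
                    'content="text/html; charset=ISO-8859-1">'))  # iso-8859-1
print(guess_charset('<meta charset="UTF-8">'))                    # utf-8
print(guess_charset("<p>no meta tag</p>", default="ascii"))       # ascii
```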
Replaces every processing instruction with a placeholder and returns the new text together with a dictionary of tags. The current implementation replaces <?foo?> with <?md5(foo)?>. The hash-to-code mappings are stored in self.pidict so that the real PHP can be restored later. The purpose is to remove all potentially "tag-like" code from inside PHP. The hash looks nothing like an HTML tag, but to nearly any regex the PHP expression $a < $b ? $c : ($d > $e ? $f : $g) appears to contain the HTML tag < $b ? $c : ($d >. Replacing the contents of each PHP block with a simple string therefore keeps the regexes from misparsing it.
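The placeholder scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the documented method. The function names are hypothetical, the mapping is returned instead of being stored in self.pidict, and the simple non-greedy regex would be fooled by a literal `?>` inside a PHP string.

```python
import hashlib
import re

# A processing instruction such as <?php ... ?>; non-greedy and spanning lines.
PI_RE = re.compile(r"<\?(.*?)\?>", re.DOTALL)


def mask_processing_instructions(text):
    """Replace each <?foo?> with <?md5(foo)?>; return new text and hash -> code."""
    pidict = {}

    def substitute(match):
        code = match.group(1)
        digest = hashlib.md5(code.encode("utf-8")).hexdigest()
        pidict[digest] = code
        return "<?%s?>" % digest

    return PI_RE.sub(substitute, text), pidict


def restore_processing_instructions(text, pidict):
    """Put the original code back in place of each hashed placeholder."""
    for digest, code in pidict.items():
        text = text.replace("<?%s?>" % digest, "<?%s?>" % code)
    return text


src = "<p><?php echo $a < $b ? $c : ($d > $e ? $f : $g); ?></p>"
masked, pidict = mask_processing_instructions(src)
print(masked)  # the PHP body is now a 32-character hex digest
print(restore_processing_instructions(masked, pidict) == src)  # True
```

Since the digest contains only hex digits, no `<` or `>` from the PHP survives into the text that the HTML regexes see.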
Parses the given source string.
Handles numeric character references of the form &#NNNN;, e.g. &#8417;.
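With Python's standard `html.parser.HTMLParser`, numeric references reach `handle_charref` only when the parser is created with `convert_charrefs=False`. The argument is the text between `&#` and `;`, either decimal (`8417`) or hexadecimal (`x20E1`). A minimal sketch, with `CharrefParser` as a hypothetical name:

```python
from html.parser import HTMLParser


class CharrefParser(HTMLParser):
    """Hypothetical parser that decodes &#NNNN; and &#xHHHH; into characters."""

    def __init__(self):
        # Without convert_charrefs=False, HTMLParser decodes references itself
        # and never calls handle_charref.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_charref(self, name):
        # `name` is the text between '&#' and ';', e.g. '8417' or 'x20E1'.
        if name[:1] in ("x", "X"):
            codepoint = int(name[1:], 16)
        else:
            codepoint = int(name)
        self.chunks.append(chr(codepoint))


parser = CharrefParser()
parser.feed("<p>A&#8417;B&#x41;</p>")
parser.close()
print(ascii("".join(parser.chunks)))  # 'A\u20e1BA'
```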
Handles named entity references of the form &aaaa;, e.g. &rsquo;.
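Named references can be resolved in a similar sketch through `html.entities.name2codepoint`, which maps HTML 4 entity names such as `rsquo` to code points. `EntityrefParser` is hypothetical, and leaving unknown entities untouched is an assumed policy, not necessarily what this class does.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class EntityrefParser(HTMLParser):
    """Hypothetical parser that decodes named references such as &rsquo;."""

    def __init__(self):
        # convert_charrefs=False is needed for handle_entityref to be called.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_entityref(self, name):
        # `name` is the text between '&' and ';', e.g. 'rsquo'.
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: keep the reference verbatim instead of guessing.
            self.chunks.append("&%s;" % name)
        else:
            self.chunks.append(chr(codepoint))


parser = EntityrefParser()
parser.feed("<p>It&rsquo;s &amp; &bogus;</p>")
parser.close()
print(ascii("".join(parser.chunks)))  # 'It\u2019s & &bogus;'
```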
MARKINGTAGS: Text inside these tags is extracted from the HTML document.
SELF_CLOSING_TAGS: HTML self-closing tags, i.e. tags that should be written as <img /> but might appear as <img>.
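In practice a bare `<img>` never receives a closing tag, so a parser tracking open elements must not push it onto its stack. A sketch with the standard library parser, where the tag set is an assumed subset of void elements and `TagTracker` is hypothetical:

```python
from html.parser import HTMLParser

# Assumed subset of void elements; the real SELF_CLOSING_TAGS value is not shown.
SELF_CLOSING_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagTracker(HTMLParser):
    """Hypothetical tracker of open elements that accepts <img> and <img />."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # A bare <img> is never closed, so only non-void tags go on the stack.
        if tag not in SELF_CLOSING_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    # For <img /> HTMLParser calls handle_startendtag, whose default
    # implementation calls handle_starttag and then handle_endtag.


tracker = TagTracker()
tracker.feed('<div><img src="a.png"><br/><p>text</p></div>')
tracker.close()
print(tracker.open_tags)  # []
```

Without the membership check, the bare `<img>` would stay on the stack and every later closing tag would fail to match.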
ENCODING_RE
Generated by Epydoc 3.0.1 on Fri Nov 19 17:46:54 2010 | http://epydoc.sourceforge.net