Package translate :: Package storage :: Module html :: Class htmlfile

Class htmlfile

Nested Classes
  UnitClass
The class of units that will be instantiated and used by this class
Instance Methods
 
__init__(self, includeuntaggeddata=None, inputfile=None, callback=None)
Initialize and reset this instance.

_simple_callback(self, string)

guess_encoding(self, htmlsrc)
Returns the encoding of the html text.

do_encoding(self, htmlsrc)
Return the html text properly encoded based on a charset.

pi_escape(self, text)
Replaces all instances of processing instructions with placeholders, and returns the new text and a dictionary of tags.

pi_unescape(self, text)
Replaces the PHP placeholders in text with the real code.

parse(self, htmlsrc)
Parse the given source string.

addhtmlblock(self, text)

has_translatable_content(self, text)
Check if the supplied HTML snippet has any content that needs to be translated.

buildtag(self, tag, attrs=None, startend=False)
Create an HTML tag.

startblock(self, tag, attrs=None)

endblock(self)

handle_starttag(self, tag, attrs)

handle_startendtag(self, tag, attrs)

handle_endtag(self, tag)

handle_data(self, data)

handle_charref(self, name)
Handle entries in the form &#NNNN; e.g. &#8417;

handle_entityref(self, name)
Handle named entities of the form &aaaa; e.g. &rsquo;

handle_comment(self, data)

handle_pi(self, data)

Inherited from HTMLParser.HTMLParser: check_for_whole_start_tag, clear_cdata_mode, close, error, feed, get_starttag_text, goahead, handle_decl, parse_endtag, parse_pi, parse_starttag, reset, set_cdata_mode, unescape, unknown_decl

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name

Inherited from base.TranslationStore: __getstate__, __setstate__, __str__, add_unit_to_index, addsourceunit, addunit, detect_encoding, findid, findunit, findunits, getids, getprojectstyle, getsourcelanguage, gettargetlanguage, getunits, isempty, makeindex, remove_unit_from_index, require_index, save, savefile, setprojectstyle, setsourcelanguage, settargetlanguage, translate, unit_iter

Inherited from base.TranslationStore (private): _assignname

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__

Class Methods

Inherited from base.TranslationStore: parsefile, parsestring

Class Variables
  MARKINGTAGS = ['p', 'title', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6...
Text in these tags will be extracted from the HTML document
  MARKINGATTRS = []
Text from tags with these attributes will be extracted from the HTML document
  INCLUDEATTRS = ['alt', 'summary', 'standby', 'abbr', 'content']
Text from these attributes is extracted
  SELF_CLOSING_TAGS = [u'area', u'base', u'basefont', u'br', u'c...
HTML self-closing tags.
  ENCODING_RE = re.compile(r'(?ix)<meta.*content.*=.*?charset.*?...

Inherited from HTMLParser.HTMLParser: CDATA_CONTENT_ELEMENTS

Inherited from markupbase.ParserBase (private): _decl_otherchars

Inherited from base.TranslationStore: Extensions, Mimetypes, Name, suggestions_in_format

Inherited from base.TranslationStore (private): _binary

Properties

Inherited from object: __class__

Method Details

__init__(self, includeuntaggeddata=None, inputfile=None, callback=None)
(Constructor)

Initialize and reset this instance.

Overrides: object.__init__
(inherited documentation)

guess_encoding(self, htmlsrc)

Returns the encoding of the html text.

We look for 'charset=' within a meta tag to do this.
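The lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's own code: the pattern is copied from the ENCODING_RE value listed under Class Variable Details, on the assumption that its two wrapped display lines join into one pattern, and `guess_charset` is a hypothetical stand-in for the method.

```python
import re

# Pattern copied from the ENCODING_RE class variable, with the two wrapped
# display lines joined (an assumption about how the value was wrapped).
ENCODING_RE = re.compile(
    r'(?ix)<meta.*content.*=.*?charset.*?=\s*?([^\s]*)\s*?["\']\s*?>'
)


def guess_charset(htmlsrc):
    """Return the charset named in a meta tag, or None if none is found."""
    match = ENCODING_RE.search(htmlsrc)
    if match is None:
        return None
    return match.group(1)


sample = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
found = guess_charset(sample)                      # 'utf-8'
missing = guess_charset('<p>no meta tag here</p>')  # None
```

Because the pattern requires a `content` attribute, a bare `<meta charset="utf-8">` tag would not match it.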

pi_escape(self, text)

Replaces all processing instructions with placeholders, and returns the new text and a dictionary of tags. The current implementation replaces <?foo?> with <?md5(foo)?>. The hash => code mappings are stored in self.pidict so that the real PHP can be restored later.

The purpose of this is to remove all potential "tag-like" code from inside PHP. The hash looks nothing like an HTML tag, but the following PHP:

 $a < $b ? $c : ($d > $e ? $f : $g)

looks like it contains an HTML tag:

 < $b ? $c : ($d >

to nearly any regex. Hence, we replace all contents of PHP with simple strings to help our regexes out.
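The escape and restore round trip described above might look roughly like the following. It is a sketch under stated assumptions: the functions are free-standing and return the mapping instead of storing it in self.pidict, and `PI_RE` is a hypothetical pattern, not necessarily the one the class uses.

```python
import hashlib
import re

# Hypothetical pattern: anything between "<?" and the first "?>", across lines.
PI_RE = re.compile(r'<\?(.*?)\?>', re.DOTALL)


def pi_escape(text):
    """Replace each <?code?> with <?md5(code)?> and return (text, mapping)."""
    pidict = {}

    def replace(match):
        code = match.group(1)
        digest = hashlib.md5(code.encode('utf-8')).hexdigest()
        pidict[digest] = code  # remember hash => code for later restoring
        return '<?' + digest + '?>'

    return PI_RE.sub(replace, text), pidict


def pi_unescape(text, pidict):
    """Put the original code back in place of each hash placeholder."""
    for digest, code in pidict.items():
        text = text.replace('<?' + digest + '?>', '<?' + code + '?>')
    return text


source = '<p><?php echo $a < $b ? $c : ($d > $e ? $f : $g); ?></p>'
escaped, pidict = pi_escape(source)
restored = pi_unescape(escaped, pidict)
```

After escaping, the text between `<?` and `?>` is a 32-character hexadecimal digest, so no `<` or `>` from the PHP code remains to confuse later tag matching.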

parse(self, htmlsrc)

Parse the given source string.

Overrides: base.TranslationStore.parse
(inherited documentation)

handle_starttag(self, tag, attrs)

Overrides: HTMLParser.HTMLParser.handle_starttag

handle_startendtag(self, tag, attrs)

Overrides: HTMLParser.HTMLParser.handle_startendtag

handle_endtag(self, tag)

Overrides: HTMLParser.HTMLParser.handle_endtag

handle_data(self, data)

Overrides: HTMLParser.HTMLParser.handle_data

handle_charref(self, name)

Handle entries in the form &#NNNN; e.g. &#8417;

Overrides: HTMLParser.HTMLParser.handle_charref
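For illustration, the `name` argument of a numeric reference can be turned into a character as below. This sketch only shows the conversion; what the class itself does with the reference is not documented here, and `charref_to_text` is a hypothetical helper. It is written for Python 3, where the parser lives in `html.parser` and calls `handle_charref` only when `convert_charrefs` is False.

```python
def charref_to_text(name):
    """Convert the name part of &#NNNN; or &#xHHHH; to the character."""
    if name[:1] in ('x', 'X'):
        # Hexadecimal form, e.g. "x20E1"
        return chr(int(name[1:], 16))
    # Decimal form, e.g. "8417"
    return chr(int(name))


decimal_char = charref_to_text('8417')
hex_char = charref_to_text('x20E1')
```

Both calls yield the same code point, U+20E1, since 8417 in decimal equals 20E1 in hexadecimal.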

handle_entityref(self, name)

Handle named entities of the form &aaaa; e.g. &rsquo;

Overrides: HTMLParser.HTMLParser.handle_entityref
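A named entity can be resolved with the standard library's entity table, as in the sketch below. `entityref_to_text` is a hypothetical helper, and leaving unknown names untouched is an assumption, not documented behaviour of this class.

```python
from html.entities import name2codepoint


def entityref_to_text(name):
    """Convert the name part of &name; to a character, or keep it as-is."""
    codepoint = name2codepoint.get(name)
    if codepoint is None:
        # Unknown entity: leave the reference untouched.
        return '&' + name + ';'
    return chr(codepoint)


quote = entityref_to_text('rsquo')
unknown = entityref_to_text('nosuchentity')
```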

handle_comment(self, data)

Overrides: HTMLParser.HTMLParser.handle_comment

handle_pi(self, data)

Overrides: HTMLParser.HTMLParser.handle_pi

Class Variable Details

MARKINGTAGS

Text in these tags will be extracted from the HTML document

Value:
['p',
 'title',
 'h1',
 'h2',
 'h3',
 'h4',
 'h5',
 'h6',
...
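To show the general idea of extracting text by tag name, here is a minimal sketch built on Python 3's `html.parser` (the successor of the `HTMLParser` module this class inherits from). `BlockCollector` is hypothetical and far simpler than htmlfile: it uses only the tags visible in the truncated value above, drops inline markup, and does not create translation units.

```python
from html.parser import HTMLParser

# Only the tags visible in the truncated MARKINGTAGS value above.
VISIBLE_MARKING_TAGS = ['p', 'title', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']


class BlockCollector(HTMLParser):
    """Hypothetical helper: collect text found inside marking tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished text blocks, in document order
        self.current = None   # text parts of the open block, if any

    def handle_starttag(self, tag, attrs):
        if tag in VISIBLE_MARKING_TAGS:
            self.current = []

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag in VISIBLE_MARKING_TAGS and self.current is not None:
            text = ''.join(self.current).strip()
            if text:
                self.blocks.append(text)
            self.current = None


collector = BlockCollector()
collector.feed('<html><head><title>Hello</title></head>'
               '<body><h1>Welcome</h1><p>Some <b>bold</b> text.</p>'
               '<div>ignored</div></body></html>')
collector.close()
blocks = collector.blocks
```

Text inside the `div` is skipped because `div` is not among the visible marking tags; whether the full list includes it cannot be told from the truncated value.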

SELF_CLOSING_TAGS

HTML self-closing tags: tags that should be specified as <img /> but might appear as <img>.

Value:
[u'area',
 u'base',
 u'basefont',
 u'br',
 u'col',
 u'frame',
 u'hr',
 u'img',
...
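One way such a list could be used when serialising a tag is sketched below. `build_tag` is a hypothetical helper that mirrors the signature of buildtag(tag, attrs=None, startend=False); its output format and its use of the list are assumptions, and only the tags visible in the truncated value above are included.

```python
from html import escape

# Only the tags visible in the truncated SELF_CLOSING_TAGS value above.
VISIBLE_SELF_CLOSING = ['area', 'base', 'basefont', 'br',
                        'col', 'frame', 'hr', 'img']


def build_tag(tag, attrs=None, startend=False):
    """Hypothetical helper: serialise a tag from (name, value) pairs."""
    parts = [tag]
    for name, value in attrs or []:
        if value is None:
            parts.append(name)  # attribute without a value
        else:
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
    # Close the tag XHTML-style when asked to, or when it never has content.
    closing = ' />' if startend or tag in VISIBLE_SELF_CLOSING else '>'
    return '<' + ' '.join(parts) + closing


image = build_tag('img', [('src', 'a.png'), ('alt', 'A & B')])
paragraph = build_tag('p')
```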

ENCODING_RE

Value:
re.compile(r'(?ix)<meta.*content.*=.*?charset.*?=\s*?([^\s]*)\s*?["\']\
\s*?>')