au.id.jericho.lib.html
Class HTMLElements

java.lang.Object
  extended by HTMLElements
All Implemented Interfaces:
HTMLElementName

public final class HTMLElements
extends java.lang.Object
implements HTMLElementName

Contains static methods which group HTML element names by the characteristics of their associated elements.

An HTML element is a normal element with a name that matches one of the HTML element names (ignoring case). This type of element spans the logical HTML element as described in the HTML 4.01 specification section 3.2.1, and may be implicitly terminated if its end tag is optional.

The term Non-HTML element refers to a normal element with a name that does not match one of the HTML element names. This type of element must be either a single tag element or explicitly terminated.

All of the sets returned by the methods in this class may be modified to customise the behaviour of the parser. Care must be taken, however, to ensure that the sets contain only lower-case tag names.
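
As a minimal sketch of the lower-case requirement, the helper below normalises a tag name before adding it to a name set. The class `ElementNameSets`, the method `addLowerCase` and the tag name `MyWidget` are hypothetical and not part of this library. A plain `HashSet` stands in for a set returned by a method such as `getBlockLevelElementNames()` so that the example runs without the library on the classpath.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class ElementNameSets {
    private ElementNameSets() {}

    // Adds the name in lower case; returns true if the set changed.
    static boolean addLowerCase(Set<String> names, String name) {
        return names.add(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Stand-in for a set such as HTMLElements.getBlockLevelElementNames().
        Set<String> blockLevel = new HashSet<>();
        blockLevel.add("div");
        addLowerCase(blockLevel, "MyWidget");
        System.out.println(blockLevel.contains("mywidget")); // prints "true"
    }
}
```

Routing every modification through one normalising helper keeps mixed-case names out of the sets, which is the condition the paragraph above asks callers to maintain.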

Below is a table summarising the default characteristics of each HTML element. See also the index of elements in the HTML 4.01 specification for the official table containing similar information.

Name       | Box Type | Start Tag | End Tag            | Nest | Depr. | Description / Specification
A          | Inline   |           | Required           | NF   |       | anchor
ABBR       | Inline   |           | Required           |      |       | abbreviated form (e.g., WWW, HTTP, etc.)
ACRONYM    | Inline   |           | Required           |      |       | acronym
ADDRESS    | Block    |           | Required           | NF   |       | information on author
APPLET     | Inline   |           | Required           | NF   | D     | Java applet
AREA       |          |           | Forbidden          | NF   |       | client-side image map area
B          | Inline   |           | Required           |      |       | bold text style
BASE       |          |           | Forbidden          | NF   |       | document base URI
BASEFONT   | Inline   |           | Forbidden          | NF   | D     | base font size
BDO        | Inline   |           | Required           |      |       | I18N BiDi over-ride
BIG        | Inline   |           | Required           |      |       | large text style
BLOCKQUOTE | Block    |           | Required           |      |       | long quotation
BODY       |          | Optional  | Optional (details) | NF   |       | document body
BR         | Inline   |           | Forbidden          | NF   |       | forced line break
BUTTON     | Inline   |           | Required           | NF   |       | push button
CAPTION    |          |           | Required           | NF   |       | table caption
CENTER     | Block    |           | Required           |      | D     | shorthand for DIV align=center
CITE       | Inline   |           | Required           |      |       | citation
CODE       | Inline   |           | Required           |      |       | computer code fragment
COL        |          |           | Forbidden          | NF   |       | table column
COLGROUP   |          |           | Optional (details) | NF   |       | table column group
DD         |          |           | Optional (details) |      |       | definition description
DEL        | Inline   |           | Required           |      |       | deleted text
DFN        | Inline   |           | Required           |      |       | instance definition
DIR        | Block    |           | Required           |      | D     | directory list
DIV        | Block    |           | Required           |      |       | generic language/style container
DL         | Block    |           | Required           |      |       | definition list
DT         |          |           | Optional (details) |      |       | definition term
EM         | Inline   |           | Required           |      |       | emphasis
FIELDSET   | Block    |           | Required           |      |       | form control group
FONT       | Inline   |           | Required           |      | D     | local change to font
FORM       | Block    |           | Required           | NF   |       | interactive form
FRAME      |          |           | Forbidden          | NF   |       | subwindow
FRAMESET   |          |           | Required           |      |       | window subdivision
H1         | Block    |           | Required           |      |       | heading
H2         | Block    |           | Required           |      |       | heading
H3         | Block    |           | Required           |      |       | heading
H4         | Block    |           | Required           |      |       | heading
H5         | Block    |           | Required           |      |       | heading
H6         | Block    |           | Required           |      |       | heading
HEAD       |          | Optional  | Optional (details) | NF   |       | document head
HR         | Block    |           | Forbidden          | NF   |       | horizontal rule
HTML       |          | Optional  | Optional (details) | NF   |       | document root element
I          | Inline   |           | Required           |      |       | italic text style
IFRAME     | Inline   |           | Required           | NF   |       | inline subwindow
IMG        | Inline   |           | Forbidden          | NF   |       | Embedded image
INPUT      | Inline   |           | Forbidden          | NF   |       | form control
INS        | Inline   |           | Required           |      |       | inserted text
ISINDEX    | Block    |           | Forbidden          | NF   | D     | single line prompt
KBD        | Inline   |           | Required           |      |       | text to be entered by the user
LABEL      | Inline   |           | Required           | NF   |       | form field label text
LEGEND     |          |           | Required           | NF   |       | fieldset legend
LI         |          |           | Optional (details) |      |       | list item
LINK       |          |           | Forbidden          | NF   |       | a media-independent link
MAP        | Inline   |           | Required           |      |       | client-side image map
MENU       | Block    |           | Required           |      | D     | menu list
META       |          |           | Forbidden          | NF   |       | generic metainformation
NOFRAMES   | Block    |           | Required           |      |       | alternate content container for non frame-based rendering
NOSCRIPT   | Block    |           | Required           |      |       | alternate content container for non script-based rendering
OBJECT     | Inline   |           | Required           |      |       | generic embedded object
OL         | Block    |           | Required           |      |       | ordered list
OPTGROUP   |          |           | Required           | NF   |       | option group
OPTION     |          |           | Optional (details) | NF   |       | selectable choice
P          | Block    |           | Optional (details) | NF   |       | paragraph
PARAM      |          |           | Forbidden          | NF   |       | named property value
PRE        | Block    |           | Required           |      |       | preformatted text
Q          | Inline   |           | Required           |      |       | short inline quotation
S          | Inline   |           | Required           |      | D     | strike-through text style
SAMP       | Inline   |           | Required           |      |       | sample program output, scripts, etc.
SCRIPT     | Inline   |           | Required           | NF   |       | script statements
SELECT     | Inline   |           | Required           | NF   |       | option selector
SMALL      | Inline   |           | Required           |      |       | small text style
SPAN       | Inline   |           | Required           |      |       | generic language/style container
STRIKE     | Inline   |           | Required           |      | D     | strike-through text
STRONG     | Inline   |           | Required           |      |       | strong emphasis
STYLE      |          |           | Required           | NF   |       | style info
SUB        | Inline   |           | Required           |      |       | subscript
SUP        | Inline   |           | Required           |      |       | superscript
TABLE      | Block    |           | Required           |      |       | table
TBODY      |          | Optional  | Optional (details) |      |       | table body
TD         |          |           | Optional (details) |      |       | table data cell
TEXTAREA   | Inline   |           | Required           | NF   |       | multi-line text field
TFOOT      |          |           | Optional (details) |      |       | table footer
TH         |          |           | Optional (details) |      |       | table header cell
THEAD      |          |           | Optional (details) |      |       | table header
TITLE      |          |           | Required           | NF   |       | document title
TR         |          |           | Optional (details) |      |       | table row
TT         | Inline   |           | Required           |      |       | teletype or monospaced text style
U          | Inline   |           | Required           |      | D     | underlined text style
UL         | Block    |           | Required           |      |       | unordered list
VAR        | Inline   |           | Required           |      |       | instance of a variable or program argument

See Also:
HTMLElementName, Element

Field Summary
 
Fields inherited from interface HTMLElementName
A, ABBR, ACRONYM, ADDRESS, APPLET, AREA, B, BASE, BASEFONT, BDO, BIG, BLOCKQUOTE, BODY, BR, BUTTON, CAPTION, CENTER, CITE, CODE, COL, COLGROUP, DD, DEL, DFN, DIR, DIV, DL, DT, EM, FIELDSET, FONT, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HR, HTML, I, IFRAME, IMG, INPUT, INS, ISINDEX, KBD, LABEL, LEGEND, LI, LINK, MAP, MENU, META, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, P, PARAM, PRE, Q, S, SAMP, SCRIPT, SELECT, SMALL, SPAN, STRIKE, STRONG, STYLE, SUB, SUP, TABLE, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, TITLE, TR, TT, U, UL, VAR
 
Method Summary
static java.util.Set getBlockLevelElementNames()
          Returns a set containing the names of all the block-level elements.
static java.util.Set getDeprecatedElementNames()
          Returns a set containing the names of all deprecated elements in HTML 4.01.
static java.util.List getElementNames()
          Returns a list containing all of the HTML element names.
static java.util.Set getEndTagForbiddenElementNames()
          Returns a set containing the names of all of the HTML elements for which the end tag is forbidden.
static java.util.Set getEndTagOptionalElementNames()
          Returns a set containing the names of all of the HTML elements for which the end tag is optional.
static java.util.Set getEndTagRequiredElementNames()
          Returns a set containing the names of all of the HTML elements for which the end tag is required.
static java.util.Set getInlineLevelElementNames()
          Returns a set containing the names of all the inline-level elements.
static java.util.Set getNestingForbiddenElementNames()
          Returns a set containing the names of all of the HTML elements which should never contain elements of the same name, either as direct or indirect descendants.
static java.util.Set getNonterminatingElementNames(java.lang.String endTagOptionalElementName)
          Returns the names of elements that do NOT implicitly terminate an HTML element with the specified name.
static java.util.Set getStartTagOptionalElementNames()
          Returns a set containing the names of all of the HTML elements for which the start tag is optional.
static java.util.Set getTerminatingEndTagNames(java.lang.String endTagOptionalElementName)
          Returns the names of end tags that implicitly terminate an HTML element with the specified name.
static java.util.Set getTerminatingStartTagNames(java.lang.String endTagOptionalElementName)
          Returns the names of start tags that implicitly terminate an HTML element with the specified name.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getElementNames

public static final java.util.List getElementNames()
Returns a list containing all of the HTML element names.

The returned list is in alphabetical order.

Returns:
a list containing all of the HTML element names.

getBlockLevelElementNames

public static java.util.Set getBlockLevelElementNames()
Returns a set containing the names of all the block-level elements.

The element names contained in this set are:
ADDRESS, BLOCKQUOTE, CENTER, DIR, DIV, DL, FIELDSET, FORM, H1, H2, H3, H4, H5, H6, HR, ISINDEX, MENU, NOFRAMES, NOSCRIPT, OL, P, PRE, TABLE, UL

This set is defined in the HTML 4.01 Transitional DTD, but more detailed information can be found in the HTML 4.01 specification section 7.5.3 - Block-level and inline elements and the CSS2 specification section 9.2.1 - Block-level elements and block boxes.

The CSS2 display property can be used to override the normal box type of an element.

Returns:
a set containing the names of all the block-level elements.
See Also:
getInlineLevelElementNames()

getInlineLevelElementNames

public static java.util.Set getInlineLevelElementNames()
Returns a set containing the names of all the inline-level elements.

The element names contained in this set are:
A, ABBR, ACRONYM, APPLET, B, BASEFONT, BDO, BIG, BR, BUTTON, CITE, CODE, DEL, DFN, EM, FONT, I, IFRAME, IMG, INPUT, INS, KBD, LABEL, MAP, OBJECT, Q, S, SAMP, SCRIPT, SELECT, SMALL, SPAN, STRIKE, STRONG, SUB, SUP, TEXTAREA, TT, U, VAR

This set is defined in the HTML 4.01 Transitional DTD, but more detailed information can be found in the HTML 4.01 specification section 7.5.3 - Block-level and inline elements and the CSS2 specification section 9.2.2 - Inline-level elements and inline boxes.

The CSS2 display property can be used to override the normal box type of an element.

The HTML Document Type Definitions forbid the presence of block-level elements inside inline-level elements, but it is tolerated by all popular browsers in various situations, even in XHTML documents. The most notorious example of this is the common inclusion of block-level elements inside FONT elements.

Returns:
a set containing the names of all the inline-level elements.
See Also:
getBlockLevelElementNames()
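
The block-level and inline-level sets can be combined to look up the default box type of an element. The sketch below is illustrative only: the class `BoxTypes` and its method `boxType` are hypothetical, and the two hard-coded sets are small excerpts of the lists documented above, standing in for the sets returned by `getBlockLevelElementNames()` and `getInlineLevelElementNames()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical classifier, not part of the library.
final class BoxTypes {
    private BoxTypes() {}

    // Excerpts of the documented lists, stored in lower case.
    static final Set<String> BLOCK =
        new HashSet<>(Arrays.asList("div", "p", "table", "ul"));
    static final Set<String> INLINE =
        new HashSet<>(Arrays.asList("a", "em", "font", "span"));

    // Returns "block", "inline", or "unknown" for names in neither set.
    static String boxType(String elementName) {
        String key = elementName.toLowerCase(Locale.ROOT);
        if (BLOCK.contains(key)) return "block";
        if (INLINE.contains(key)) return "inline";
        return "unknown"; // e.g. TD or LI, which appear in neither list
    }
}
```

The third outcome matters because elements such as TD and LI have no box type in the table of default characteristics, so a lookup should not assume that every HTML element name is in one of the two sets.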

getDeprecatedElementNames

public static java.util.Set getDeprecatedElementNames()
Returns a set containing the names of all deprecated elements in HTML 4.01.

Returns:
a set containing the names of all deprecated elements in HTML 4.01.

getEndTagForbiddenElementNames

public static java.util.Set getEndTagForbiddenElementNames()
Returns a set containing the names of all of the HTML elements for which the end tag is forbidden.

See the element parsing rules for HTML elements with forbidden end tags for more information.

The index of elements in the HTML 4.01 specification includes the letter 'F' in the "End Tag" column for elements whose end tag is forbidden.

Returns:
a set containing the names of all of the HTML elements for which the end tag is forbidden.
See Also:
getEndTagOptionalElementNames(), getEndTagRequiredElementNames()

getEndTagOptionalElementNames

public static java.util.Set getEndTagOptionalElementNames()
Returns a set containing the names of all of the HTML elements for which the end tag is optional.

Elements with these names may be implicitly terminated by a subsequent terminating start tag or terminating end tag. A list of these terminating tags, and the names of non-terminating elements that can be nested within the element, can be found in the documentation of each relevant element in the HTMLElementName class.

See the element parsing rules for HTML elements with optional end tags for more information.

The index of elements in the HTML 4.01 specification includes the letter 'O' in the "End Tag" column for elements whose end tag is optional.

Returns:
a set containing the names of all of the HTML elements for which the end tag is optional.
See Also:
getEndTagForbiddenElementNames(), getEndTagRequiredElementNames()

getEndTagRequiredElementNames

public static java.util.Set getEndTagRequiredElementNames()
Returns a set containing the names of all of the HTML elements for which the end tag is required.

See the element parsing rules for HTML elements with required end tags for more information.

The index of elements in the HTML 4.01 specification leaves the "End Tag" column blank for elements whose end tag is required.

Returns:
a set containing the names of all of the HTML elements for which the end tag is required.
See Also:
getEndTagForbiddenElementNames(), getEndTagOptionalElementNames()
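
In the table of default characteristics, every HTML element has exactly one of the three end tag rules, so the forbidden, optional and required sets can drive a single lookup. The following is a sketch under that assumption. The class `EndTagRules`, the enum `EndTag` and the method `endTagRule` are hypothetical, and the hard-coded sets are short excerpts taken from the table rather than the sets the library returns.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical lookup, not part of the library.
final class EndTagRules {
    private EndTagRules() {}

    enum EndTag { FORBIDDEN, OPTIONAL, REQUIRED, UNKNOWN }

    // Excerpts of the table of default characteristics, in lower case.
    static final Set<String> FORBIDDEN =
        new HashSet<>(Arrays.asList("br", "hr", "img", "input"));
    static final Set<String> OPTIONAL =
        new HashSet<>(Arrays.asList("li", "p", "td", "tr"));
    static final Set<String> REQUIRED =
        new HashSet<>(Arrays.asList("a", "div", "span", "table"));

    // UNKNOWN means the name is in none of the excerpts used here.
    static EndTag endTagRule(String elementName) {
        String key = elementName.toLowerCase(Locale.ROOT);
        if (FORBIDDEN.contains(key)) return EndTag.FORBIDDEN;
        if (OPTIONAL.contains(key)) return EndTag.OPTIONAL;
        if (REQUIRED.contains(key)) return EndTag.REQUIRED;
        return EndTag.UNKNOWN;
    }
}
```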

getStartTagOptionalElementNames

public static java.util.Set getStartTagOptionalElementNames()
Returns a set containing the names of all of the HTML elements for which the start tag is optional.

Elements with optional start tags must be present in the document object model (DOM) in certain locations, either forming part of the structure of the HTML document as a whole (e.g. the HTML, HEAD, and BODY elements), or forming part of the structure of a TABLE element (e.g. the TBODY element). The location of an omitted start tag in the document's object model can be inferred from the surrounding elements.

This library does not use this property in any way when parsing documents, and does not construct a document object model from the source, so no implied element is created where an optional start tag is omitted.

When the start tag has been omitted in the document text, the corresponding end tag should also be omitted.

The index of elements in the HTML 4.01 specification includes the letter 'O' in the "Start Tag" column for elements whose start tag is optional.

Returns:
a set containing the names of all of the HTML elements for which the start tag is optional.

getTerminatingStartTagNames

public static java.util.Set getTerminatingStartTagNames(java.lang.String endTagOptionalElementName)
Returns the names of start tags that implicitly terminate an HTML element with the specified name.

This method is only relevant to HTML elements for which the end tag is optional. It returns null if
getEndTagOptionalElementNames().contains(endTagOptionalElementName.toLowerCase())==false.

Parameters:
endTagOptionalElementName - the name of an element for which the end tag is optional.
Returns:
the names of start tags that implicitly terminate an HTML element with the specified name, or null if the name does not identify an element for which the end tag is optional.
See Also:
getTerminatingEndTagNames(String endTagOptionalElementName), getNonterminatingElementNames(String endTagOptionalElementName)
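
Because this method returns null for names that do not identify an element with an optional end tag, callers may want a null-safe wrapper. The sketch below models that contract with a hypothetical class `TerminatingStartTags`. The table contents are illustrative assumptions, not the library's real data, and the `lookup` method only stands in for `getTerminatingStartTagNames(String)`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not part of the library.
final class TerminatingStartTags {
    private TerminatingStartTags() {}

    // Illustrative data only; the library's real tables may differ.
    private static final Map<String, Set<String>> TABLE = new HashMap<>();
    static {
        TABLE.put("li", new HashSet<>(Arrays.asList("li")));
        TABLE.put("dt", new HashSet<>(Arrays.asList("dt", "dd")));
    }

    // Mirrors the documented contract: null for names not in the table.
    static Set<String> lookup(String name) {
        return TABLE.get(name.toLowerCase(Locale.ROOT));
    }

    // Null-safe wrapper: callers get an empty set instead of null.
    static Set<String> lookupOrEmpty(String name) {
        Set<String> result = lookup(name);
        return result != null ? result : Collections.<String>emptySet();
    }
}
```

Returning an empty set lets calling code use `contains` without a separate null check. The trade-off is that it no longer distinguishes "end tag not optional" from "no terminating start tags".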

getTerminatingEndTagNames

public static java.util.Set getTerminatingEndTagNames(java.lang.String endTagOptionalElementName)
Returns the names of end tags that implicitly terminate an HTML element with the specified name.

This method is only relevant to HTML elements for which the end tag is optional. It returns null if
getEndTagOptionalElementNames().contains(endTagOptionalElementName.toLowerCase())==false.

Note that removing the end tag name that matches the specified element from the returned set has no effect on the behaviour of the parser, as a start tag is always assumed to be terminated by an end tag with a matching name.

Parameters:
endTagOptionalElementName - the name of an element for which the end tag is optional.
Returns:
the names of end tags that implicitly terminate an HTML element with the specified name, or null if the name does not identify an element for which the end tag is optional.
See Also:
getTerminatingStartTagNames(String endTagOptionalElementName), getNonterminatingElementNames(String endTagOptionalElementName)

getNonterminatingElementNames

public static java.util.Set getNonterminatingElementNames(java.lang.String endTagOptionalElementName)
Returns the names of elements that do NOT implicitly terminate an HTML element with the specified name. Tags nested inside any of these elements cannot implicitly terminate the specified element either, even if they are listed among its terminating start tags or terminating end tags.

This method is only relevant to HTML elements for which the end tag is optional. It returns null if
getEndTagOptionalElementNames().contains(endTagOptionalElementName.toLowerCase())==false.

Parameters:
endTagOptionalElementName - the name of an element for which the end tag is optional.
Returns:
the names of elements that do NOT implicitly terminate an HTML element with the specified name, or null if the name does not identify an element for which the end tag is optional.
See Also:
getTerminatingStartTagNames(String endTagOptionalElementName), getTerminatingEndTagNames(String endTagOptionalElementName)
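
The interaction between terminating tags and non-terminating elements can be sketched as a single scan over the tags that follow a start tag. This is not the parser's actual implementation. The class `ImplicitTermination`, the tag notation (`"name"` for a start tag, `"/name"` for an end tag) and the rule data for an open LI element are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the library's parser.
final class ImplicitTermination {
    private ImplicitTermination() {}

    // Illustrative rule data for an open LI element; not the library's tables.
    static final Set<String> TERMINATING_START = new HashSet<>(Arrays.asList("li"));
    static final Set<String> TERMINATING_END = new HashSet<>(Arrays.asList("li", "ul", "ol"));
    static final Set<String> NONTERMINATING = new HashSet<>(Arrays.asList("ul", "ol"));

    // Returns the index of the first tag that terminates the open element, or -1.
    static int findTerminator(List<String> tagsAfterStart) {
        int depth = 0; // greater than 0 while inside a non-terminating element
        for (int i = 0; i < tagsAfterStart.size(); i++) {
            String tag = tagsAfterStart.get(i);
            boolean isEnd = tag.startsWith("/");
            String name = isEnd ? tag.substring(1) : tag;
            if (depth > 0) {
                // Nested tags never terminate; only track the nesting depth.
                if (NONTERMINATING.contains(name)) depth += isEnd ? -1 : 1;
                continue;
            }
            if (!isEnd && NONTERMINATING.contains(name)) {
                depth++;
                continue;
            }
            Set<String> terminators = isEnd ? TERMINATING_END : TERMINATING_START;
            if (terminators.contains(name)) return i;
        }
        return -1;
    }
}
```

For the tag sequence `ul`, `li`, `/li`, `/ul`, `li` following an open LI, the scan skips the inner `li` because it is nested inside `ul`, and reports the final `li` at index 4 as the terminating start tag. The single depth counter assumes the non-terminating elements are themselves properly closed.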

getNestingForbiddenElementNames

public static java.util.Set getNestingForbiddenElementNames()
Returns a set containing the names of all of the HTML elements which should never contain elements of the same name, either as direct or indirect descendants.

Returns:
a set containing the names of all of the HTML elements which should never contain elements of the same name.
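
One way to apply this rule is to check a new start tag against the names of the elements that are currently open. The sketch below is hypothetical: the class `NestingCheck` and the method `violates` are not part of this library, and the hard-coded set is an excerpt of the names marked NF in the table of default characteristics, standing in for the set returned by `getNestingForbiddenElementNames()`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical check, not part of the library.
final class NestingCheck {
    private NestingCheck() {}

    // Excerpt of the names marked NF in the table, in lower case.
    static final Set<String> NESTING_FORBIDDEN =
        new HashSet<>(Arrays.asList("a", "button", "form", "p"));

    // True if opening an element called name inside the given open
    // ancestors would nest it within an element of the same name.
    static boolean violates(Deque<String> openAncestors, String name) {
        return NESTING_FORBIDDEN.contains(name) && openAncestors.contains(name);
    }

    public static void main(String[] args) {
        Deque<String> open = new ArrayDeque<>(Arrays.asList("form", "div"));
        System.out.println(violates(open, "form")); // prints "true"
        System.out.println(violates(open, "div"));  // prints "false"
    }
}
```

Checking the whole ancestor stack, rather than only the immediate parent, covers indirect descendants as well as direct ones.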