au.id.jericho.lib.html
Class SourceFormatter

java.lang.Object
  extended by SourceFormatter
All Implemented Interfaces:
CharStreamSource

public final class SourceFormatter
extends java.lang.Object
implements CharStreamSource

Formats HTML source by laying out each non-inline-level element on a new line with an appropriate indent.

Any indentation present in the original source text is removed.

Use one of the following methods to obtain the output: writeTo(Writer) or toString().

The output text is functionally equivalent to the original source and should be rendered identically unless specified below.

The following points describe the process in general terms. Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.

Formatting an entire Source object performs a full sequential parse automatically.


Constructor Summary
SourceFormatter(Segment segment)
          Constructs a new SourceFormatter based on the specified Segment.
 
Method Summary
 boolean getCollapseWhiteSpace()
          Indicates whether white space in the text between the tags is to be collapsed.
 long getEstimatedMaximumOutputLength()
          Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.
 boolean getIndentAllElements()
          Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.
 java.lang.String getIndentString()
          Returns the string to be used for indentation.
 java.lang.String getNewLine()
          Returns the string to be used to represent a newline in the output.
 boolean getTidyTags()
          Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
 SourceFormatter setCollapseWhiteSpace(boolean collapseWhiteSpace)
          Sets whether white space in the text between the tags is to be collapsed.
 SourceFormatter setIndentAllElements(boolean indentAllElements)
          Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.
 SourceFormatter setIndentString(java.lang.String indentString)
          Sets the string to be used for indentation.
 SourceFormatter setNewLine(java.lang.String newLine)
          Sets the string to be used to represent a newline in the output.
 SourceFormatter setTidyTags(boolean tidyTags)
          Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
 java.lang.String toString()
          Returns the output as a string.
 void writeTo(java.io.Writer writer)
          Writes the output to the specified Writer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
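Each setter listed in the summary above returns the SourceFormatter instance, so several properties can be set in one chained statement. The following minimal sketch shows that fluent pattern using a hypothetical stand-in class, FormatterOptionsSketch. It records the documented default values but performs no formatting, and it is not part of the Jericho library. Rejecting a null indent string with Objects.requireNonNull is an assumption of this sketch; the documentation only states that the argument must not be null.

```java
import java.util.Objects;

// Hypothetical stand-in mirroring the documented SourceFormatter properties.
// It is NOT the Jericho class and does not format anything.
class FormatterOptionsSketch {
    private String indentString = "\t";          // documented default: a single tab
    private boolean tidyTags = false;            // documented default
    private boolean collapseWhiteSpace = false;  // documented default
    private boolean indentAllElements = false;   // documented default
    private String configuredNewLine = null;     // null = use the source document's new line string

    FormatterOptionsSketch setIndentString(String indentString) {
        // Assumption: a null argument is rejected immediately.
        this.indentString = Objects.requireNonNull(indentString, "indentString");
        return this; // returning this is what makes chaining possible
    }

    FormatterOptionsSketch setTidyTags(boolean tidyTags) {
        this.tidyTags = tidyTags;
        return this;
    }

    FormatterOptionsSketch setCollapseWhiteSpace(boolean collapseWhiteSpace) {
        this.collapseWhiteSpace = collapseWhiteSpace;
        return this;
    }

    FormatterOptionsSketch setIndentAllElements(boolean indentAllElements) {
        this.indentAllElements = indentAllElements;
        return this;
    }

    FormatterOptionsSketch setNewLine(String newLine) {
        this.configuredNewLine = newLine; // null resets to the default behaviour
        return this;
    }

    String getIndentString() { return indentString; }
    boolean getTidyTags() { return tidyTags; }
    boolean getCollapseWhiteSpace() { return collapseWhiteSpace; }
    boolean getIndentAllElements() { return indentAllElements; }

    // This sketch returns null when nothing has been set, rather than resolving
    // the default new line string as the documented getNewLine() describes.
    String getConfiguredNewLine() { return configuredNewLine; }

    public static void main(String[] args) {
        FormatterOptionsSketch options = new FormatterOptionsSketch()
                .setIndentString("  ")
                .setTidyTags(true)
                .setCollapseWhiteSpace(true);
        System.out.println(options.getIndentString().length()); // prints 2
        System.out.println(options.getIndentAllElements());     // prints false
    }
}
```

With the real class, the same kind of chain would be applied to a SourceFormatter obtained from its constructor or from Source.getSourceFormatter(). You would then call writeTo(Writer) or toString() to obtain the output.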
 

Constructor Detail

SourceFormatter

public SourceFormatter(Segment segment)
Constructs a new SourceFormatter based on the specified Segment.

Parameters:
segment - the segment containing the HTML to be formatted.
See Also:
Source.getSourceFormatter()

Method Detail

writeTo

public void writeTo(java.io.Writer writer)
             throws java.io.IOException
Description copied from interface: CharStreamSource
Writes the output to the specified Writer.

Specified by:
writeTo in interface CharStreamSource
Parameters:
writer - the destination java.io.Writer for the output.
Throws:
java.io.IOException - if an I/O exception occurs.

getEstimatedMaximumOutputLength

public long getEstimatedMaximumOutputLength()
Description copied from interface: CharStreamSource
Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.

The returned value should be used as a guide for efficiency purposes only, for example to set an initial StringBuffer capacity. There is no guarantee that the length of the output is indeed less than this value, as classes implementing this method often use assumptions based on typical usage to calculate the estimate.

Although implementations of this method should never return a value less than -1, users of this method must not assume that this will always be the case. Standard practice is to interpret any negative value as meaning that no estimate is available.

Specified by:
getEstimatedMaximumOutputLength in interface CharStreamSource
Returns:
the estimated maximum number of characters in the output, or -1 if no estimate is available.
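As described above, the estimate is only a sizing hint. The following minimal sketch shows one way a caller might use it. A hypothetical interface, StreamSourceSketch, stands in for CharStreamSource and declares only the two methods described here. The render() helper sizes a StringWriter from the estimate, treats any negative value as "no estimate", and caps very large values; the 1 MiB cap is an arbitrary choice for this sketch. An estimate that turns out to be too small is harmless, because the underlying buffer grows as needed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class EstimateSketch {
    // Hypothetical stand-in for the CharStreamSource methods described above.
    interface StreamSourceSketch {
        void writeTo(Writer writer) throws IOException;
        long getEstimatedMaximumOutputLength();
    }

    // Arbitrary cap so that a huge estimate does not trigger a huge up-front allocation.
    static final int MAX_INITIAL_CAPACITY = 1 << 20;

    static String render(StreamSourceSketch source) {
        long estimate = source.getEstimatedMaximumOutputLength();
        // Any negative value means "no estimate", not just -1.
        StringWriter writer = estimate < 0
                ? new StringWriter()
                : new StringWriter((int) Math.min(estimate, MAX_INITIAL_CAPACITY));
        try {
            source.writeTo(writer);
        } catch (IOException e) {
            // StringWriter does not throw, but writeTo declares IOException.
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        StreamSourceSketch tooSmall = new StreamSourceSketch() {
            @Override
            public void writeTo(Writer writer) throws IOException {
                writer.write("<p>hello</p>");
            }

            @Override
            public long getEstimatedMaximumOutputLength() {
                return 4; // deliberately underestimated: the buffer simply grows
            }
        };
        System.out.println(render(tooSmall)); // prints <p>hello</p>
    }
}
```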

toString

public java.lang.String toString()
Description copied from interface: CharStreamSource
Returns the output as a string.

Specified by:
toString in interface CharStreamSource
Overrides:
toString in class java.lang.Object
Returns:
the output as a string.

setIndentString

public SourceFormatter setIndentString(java.lang.String indentString)
Sets the string to be used for indentation.

The default value is a string containing a single tab character (U+0009).

The most commonly used indent strings are "\t" (single tab), " " (single space), "  " (2 spaces), and "    " (4 spaces).

Parameters:
indentString - the string to be used for indentation, must not be null.
Returns:
this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also:
getIndentString()
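The documentation above specifies only the indent string itself, not how it is applied at each depth. This minimal sketch assumes the string is repeated once per nesting level, which is a common convention. IndentSketch and its methods are hypothetical and are not taken from the Jericho implementation. String.repeat requires Java 11 or later.

```java
class IndentSketch {
    // Assumption: the indent for a given depth is the indent string repeated depth times.
    static String indent(String indentString, int depth) {
        return indentString.repeat(depth); // throws IllegalArgumentException if depth < 0
    }

    // Joins already-unindented lines, prefixing each with the indent for its depth.
    static String layout(String indentString, String newLine, String[] lines, int[] depths) {
        if (lines.length != depths.length) {
            throw new IllegalArgumentException("lines and depths must have the same length");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append(newLine);
            }
            out.append(indent(indentString, depths[i])).append(lines[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] lines = {"<ul>", "<li>one</li>", "<li>two</li>", "</ul>"};
        int[] depths = {0, 1, 1, 0};
        // Two-space indent, one of the commonly used indent strings listed above.
        System.out.println(layout("  ", "\n", lines, depths));
    }
}
```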

getIndentString

public java.lang.String getIndentString()
Returns the string to be used for indentation.

See the setIndentString(String) method for a full description of this property.

Returns:
the string to be used for indentation.

setTidyTags

public SourceFormatter setTidyTags(boolean tidyTags)
Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.

The default value is false.

If this property is set to false, the original text of each tag is used, including all of its white space, except that any new lines within the tag are indented at a depth one greater than that of the element.

Parameters:
tidyTags - specifies whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
Returns:
this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also:
getTidyTags()
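The tidyTags = false behaviour described above can be sketched for a tag whose original text spans several lines. This is a minimal, hypothetical illustration; TagReindentSketch is not a Jericho class. It makes two assumptions. First, the indent for a depth is the indent string repeated that many times. Second, any indentation already present at the start of a continuation line is discarded, in line with the class description's statement that original indentation is removed. The real implementation may differ on both points.

```java
class TagReindentSketch {
    // Reindents the continuation lines of a multi-line tag one level deeper than its element.
    // Assumptions: indent = indentString repeated per level; existing leading white space
    // on continuation lines is discarded; the first line is positioned by the caller.
    static String reindentTag(String tagText, String indentString, int elementDepth, String newLine) {
        String[] lines = tagText.split("\r\n|\r|\n", -1);
        String continuationIndent = indentString.repeat(elementDepth + 1);
        StringBuilder out = new StringBuilder(lines[0]);
        for (int i = 1; i < lines.length; i++) {
            // White space inside each line (e.g. between attributes) is kept as-is.
            out.append(newLine).append(continuationIndent).append(lines[i].stripLeading());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "<input type=\"text\"\n        name=\"q\">";
        // Element at depth 1 with a tab indent: the second line gets two tabs.
        System.out.println(reindentTag(original, "\t", 1, "\n"));
    }
}
```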

getTidyTags

public boolean getTidyTags()
Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.

See the setTidyTags(boolean) method for a full description of this property.

Returns:
true if the original text of each tag is to be replaced with the output from its Tag.tidy() method, otherwise false.

setCollapseWhiteSpace

public SourceFormatter setCollapseWhiteSpace(boolean collapseWhiteSpace)
Sets whether white space in the text between the tags is to be collapsed.

The default value is false.

If this property is set to true, every run of one or more white space characters located outside of a tag is replaced with a single space in the output. White space adjacent to the tags of non-inline-level elements (other than server tags) may be removed.

Parameters:
collapseWhiteSpace - specifies whether white space in the text between the tags is to be collapsed.
Returns:
this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also:
getCollapseWhiteSpace()
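The basic collapsing rule can be sketched with a regular expression applied to each piece of text between tags. In this minimal sketch, CollapseSketch is hypothetical. It treats space, tab, line feed, form feed and carriage return as white space, which is an assumption. It also does not model the optional removal of white space next to non-inline-level element tags.

```java
import java.util.regex.Pattern;

class CollapseSketch {
    // Assumption: white space = space, tab, line feed, form feed, carriage return.
    private static final Pattern WHITE_SPACE_RUN = Pattern.compile("[ \\t\\n\\f\\r]+");

    // Applies only to text located between tags; tag text itself is left untouched.
    static String collapse(String textBetweenTags) {
        return WHITE_SPACE_RUN.matcher(textBetweenTags).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("Hello,\n\t   world  !") + "]"); // prints [Hello, world !]
    }
}
```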

getCollapseWhiteSpace

public boolean getCollapseWhiteSpace()
Indicates whether white space in the text between the tags is to be collapsed.

See the setCollapseWhiteSpace(boolean) method for a full description of this property.

Returns:
true if white space in the text between the tags is to be collapsed, otherwise false.

setIndentAllElements

public SourceFormatter setIndentAllElements(boolean indentAllElements)
Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.

The default value is false.

If this property is set to true, every element appears indented on a new line, including inline-level elements and elements with preformatted contents.

This produces output that closely reflects the actual document element hierarchy, but is very likely to introduce white space that compromises the functional equivalence of the document.

Parameters:
indentAllElements - specifies whether all elements are to be indented.
Returns:
this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also:
getIndentAllElements()
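The loss of functional equivalence mentioned above is easy to see with two adjacent inline elements. The sketch below uses a deliberately crude, hypothetical approximateRenderedText() helper as a stand-in for what a browser displays: it strips tags, collapses white space and trims the result. The second input is a hand-written example of indent-all style output, not actual output from this class.

```java
class EquivalenceSketch {
    // Very rough approximation of the visible text of inline-only markup:
    // drop tags, collapse white space runs to one space, trim the ends.
    // Not an HTML parser: assumes no '>' inside attribute values, comments, scripts, etc.
    static String approximateRenderedText(String html) {
        String text = html.replaceAll("<[^>]*>", "");
        return text.replaceAll("[ \\t\\n\\f\\r]+", " ").trim();
    }

    public static void main(String[] args) {
        String original = "<p><b>bold</b><i>italic</i></p>";
        // Hypothetical indent-all layout: new lines now separate the two inline elements.
        String indentedAll = "<p>\n\t<b>bold</b>\n\t<i>italic</i>\n</p>";
        System.out.println(approximateRenderedText(original));    // prints bolditalic
        System.out.println(approximateRenderedText(indentedAll)); // prints bold italic
    }
}
```

The extra space between "bold" and "italic" is exactly the kind of white space that makes the indented document render differently from the original.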

getIndentAllElements

public boolean getIndentAllElements()
Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.

See the setIndentAllElements(boolean) method for a full description of this property.

Returns:
true if all elements are to be indented, otherwise false.

setNewLine

public SourceFormatter setNewLine(java.lang.String newLine)
Sets the string to be used to represent a newline in the output.

The default is to use the same new line string as the source document, as determined by the Source.getNewLine() method. If the source document does not contain any new lines, a "best guess" is made, either by taking the new line string of a previously parsed document or by using the value of the static Config.NewLine property.

Specifying a null argument resets the property to its default value, which is to use the same new line string as is used in the source document.

Parameters:
newLine - the string to be used to represent a newline in the output, may be null.
Returns:
this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also:
getNewLine()
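The default resolution order described above can be sketched as follows. NewLineSketch is hypothetical. Its detectNewLine() method assumes that the source's new line string is the first "\r\n", "\n" or "\r" sequence found. The fallback parameter stands in for the "best guess" (a previously parsed document's new line string, or Config.NewLine), which this sketch does not reproduce.

```java
class NewLineSketch {
    // Assumption: returns the first new line sequence in the text, or null if there is none.
    static String detectNewLine(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                return (i + 1 < text.length() && text.charAt(i + 1) == '\n') ? "\r\n" : "\r";
            }
        }
        return null;
    }

    // configured: the value passed to setNewLine (null means "use the default").
    // fallback: hypothetical stand-in for the documented "best guess".
    static String resolveNewLine(String configured, CharSequence sourceText, String fallback) {
        if (configured != null) {
            return configured;
        }
        String detected = detectNewLine(sourceText);
        return detected != null ? detected : fallback;
    }

    public static void main(String[] args) {
        String resolved = resolveNewLine(null, "<p>\r\nhi</p>", "\n");
        System.out.println(resolved.equals("\r\n")); // prints true
    }
}
```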

getNewLine

public java.lang.String getNewLine()
Returns the string to be used to represent a newline in the output.

See the setNewLine(String) method for a full description of this property.

Returns:
the string to be used to represent a newline in the output.