





























An alternative parser integration
This note proposes an alternative method of integrating the
output of the SAX parsing of the Formatting Object (FO) tree into
FOP processing. The purpose of the proposed changes is to
provide for better decomposition of the process of analysing
and rendering an FO tree such as is represented in the output
from initial (XSLT) processing of an XML source document.
Figure 1 is a schematic representation of the process of SAX
parsing of an input source. SAX parsing involves the
registration, with an object implementing the
XMLReader interface, of a
ContentHandler which contains a callback
routine for each of the event types encountered by the
parser, e.g., startDocument(),
startElement(), characters(),
endElement() and endDocument().
Parsing is initiated by a call to the parse()
method of the XMLReader. Note that the call to
parse() and the calls to individual callback
methods are synchronous: parse() will only
return when the last callback method returns, and each
callback must complete before the next is called.
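The registration and the synchronous callback sequence described above can be sketched as follows. This is a minimal illustration, not FOP code: the class names RecordingHandler and SaxCallbackDemo are hypothetical, and the handler simply records each callback in the order the parser makes it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: logs every callback in the order it is invoked.
class RecordingHandler extends DefaultHandler {
    final List<String> log = new ArrayList<>();

    @Override public void startDocument() { log.add("startDocument"); }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        log.add("startElement " + qName);
    }

    // A parser may report one run of text in several characters() calls.
    @Override public void characters(char[] ch, int start, int length) {
        log.add("characters");
    }

    @Override public void endElement(String uri, String localName, String qName) {
        log.add("endElement " + qName);
    }

    @Override public void endDocument() { log.add("endDocument"); }
}

class SaxCallbackDemo {
    // Registers the handler, then parses. parse() returns only after the
    // endDocument() callback has returned, so the log is complete here.
    static List<String> parse(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            RecordingHandler handler = new RecordingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String entry : parse("<a><b>text</b></a>")) {
            System.out.println(entry);
        }
    }
}
```

Because the log is filled entirely inside the call to parse(), the caller sees no events until the whole document has been read.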
Figure 1

In the process of parsing, the hierarchical structure of the
original FO tree is flattened into a number of streams of
events of the same type which are reported in the sequence
in which they are encountered. Apart from that, the API
imposes no structure or constraint which expresses the
relationship between, e.g., a startElement event and the
endElement event for the same element. To the extent that
such relationship information is required, it must be
managed by the callback routines.
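As an illustration of relationship information being managed by the callbacks themselves, the sketch below keeps a stack of open elements so that each startElement event can be related to its ancestors. The names PathTrackingHandler and PathDemo are hypothetical, and the approach is only one of several possible.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: the SAX API itself does not link start and end
// events, so the handler tracks the currently open elements on a stack.
class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        open.addLast(qName);
        // Iteration runs from the root to the element just opened.
        paths.add(String.join("/", open));
    }

    @Override public void endElement(String uri, String localName, String qName) {
        // In a well-formed document the element being closed is the last opened.
        open.removeLast();
    }
}

class PathDemo {
    static List<String> paths(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PathTrackingHandler handler = new PathTrackingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.paths;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```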
The most direct approach here is to build the tree
"invisibly"; to bury within the callback routines the
necessary code to construct the tree. In the simplest case,
the whole of the FO tree is built within the call to
parse(), and that in-memory tree is subsequently
processed to (a) validate the FO structure, and (b)
construct the Area tree. The problem with this approach is
the potential size of the FO tree in memory. FOP has
suffered from this problem in the past.
On the other hand, the callback code may become increasingly
complex as tree validation and the triggering of the Area
tree processing and subsequent rendering are moved into the
callbacks, typically the endElement() method.
In order to overcome acute memory problems, the FOP code was
recently modified in this way, to trigger Area tree building
and rendering in the endElement() method, when
the end of a page-sequence was detected.
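A rough sketch of this kind of "side-effect" processing is given below. It is not the FOP code: PageSequenceHandler and PageSequenceDemo are hypothetical names, and the "processing" is reduced to recording how many elements were held before the data for the page-sequence is discarded. The namespace URI is the standard XSL-FO one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: holds data for one page-sequence at a time and
// processes and releases it from within endElement().
class PageSequenceHandler extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    private final List<String> pending = new ArrayList<>();
    private boolean inPageSequence = false;
    // Stand-in for real processing: the number of elements held at each flush.
    final List<Integer> flushedSizes = new ArrayList<>();

    private static boolean isPageSequence(String uri, String localName) {
        return FO_NS.equals(uri) && "page-sequence".equals(localName);
    }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        if (isPageSequence(uri, localName)) {
            inPageSequence = true;
        }
        if (inPageSequence) {
            pending.add(localName);
        }
    }

    @Override public void endElement(String uri, String localName, String qName) {
        if (isPageSequence(uri, localName)) {
            // Area tree building and rendering would be triggered here.
            flushedSizes.add(pending.size());
            pending.clear(); // release the memory held for this page-sequence
            inPageSequence = false;
        }
    }
}

class PageSequenceDemo {
    static List<Integer> flushedSizes(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for uri and localName
            XMLReader reader = factory.newSAXParser().getXMLReader();
            PageSequenceHandler handler = new PageSequenceHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.flushedSizes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Even in this reduced form, the point at which processing happens is visible only by reading the body of endElement(), which is the obscurity discussed in this section.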
The drawback with such a method is that it becomes difficult
to determine the order of events and the circumstances in
which any particular processing events are triggered. When
the processing events are inherently self-contained, this is
irrelevant. But the more complex and context-dependent the
relationships are among the processing elements, the more
obscurity is engendered in the code by such "side-effect"
processing.
The experimental code uses a class XMLEvent
to provide the objects which are placed in the queue.
XMLEvent includes a variety of methods to access
elements in the queue. Namespace URIs encountered in
parsing are maintained in a static
HashMap where they are associated with a unique
integer index. This integer value is used in the signature
of some of the access methods.
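The association of namespace URIs with integer indices might look something like the sketch below. The class and method names (NamespaceIndex, indexOf, uriOf) are assumptions made for illustration and are not taken from the experimental code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: each distinct namespace URI receives a unique
// integer index the first time it is seen.
class NamespaceIndex {
    private static final Map<String, Integer> indexByUri = new HashMap<>();
    private static final List<String> uriByIndex = new ArrayList<>();

    // Returns the index for uri, assigning the next free index if it is new.
    static synchronized int indexOf(String uri) {
        Integer index = indexByUri.get(uri);
        if (index == null) {
            index = uriByIndex.size();
            indexByUri.put(uri, index);
            uriByIndex.add(uri);
        }
        return index;
    }

    // Reverse lookup, from an index back to its URI.
    static synchronized String uriOf(int index) {
        return uriByIndex.get(index);
    }
}
```

Comparing two integers is cheaper than comparing two URI strings, which is presumably why the index appears in the access method signatures.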
XMLEvent getEvent(SyncedCircularBuffer events) -
This is the basis of all of the queue access methods. It
returns the next element from the queue, which may be a
pushback element.
XMLEvent getEndDocument(events) -
Get and discard elements from the queue until an
ENDDOCUMENT element is found, then return it.
XMLEvent expectEndDocument(events) -
If the next element on the queue is an ENDDOCUMENT event,
return it. Otherwise, push the element back and throw an
exception. Each of the get methods (except
getEvent() itself) has a corresponding
expect method.
XMLEvent get/expectStartElement(events) -
Return the next STARTELEMENT event from the queue.
XMLEvent get/expectStartElement(events, String qName) -
Return the next STARTELEMENT with a QName matching
qName.
XMLEvent get/expectStartElement(events, int uriIndex, String localName) -
Return the next STARTELEMENT with a URI indicated by the
uriIndex and a local name matching localName.
XMLEvent get/expectStartElement(events, LinkedList list) -
list contains instances of the nested class
UriLocalName, each of which holds a
uriIndex and a localName. Return
the next STARTELEMENT whose uriIndex and
localName match those of any element of
list.
XMLEvent get/expectEndElement(events) -
Return the next ENDELEMENT.
XMLEvent get/expectEndElement(events, qName) -
Return the next ENDELEMENT with QName
qName.
XMLEvent get/expectEndElement(events, uriIndex, localName) -
Return the next ENDELEMENT with a URI indicated by the
uriIndex and a local name matching
localName.
XMLEvent get/expectEndElement(events, XMLEvent event) -
Return the next ENDELEMENT whose uriIndex and
localName match those in the event argument. This
is intended as a quick way to find the ENDELEMENT matching
a previously returned STARTELEMENT.
XMLEvent get/expectCharacters(events) -
Return the next CHARACTERS event.
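The difference between the get and expect families can be sketched as below. This is a single-threaded stand-in, not the experimental code: SimpleEvent, EventQueue and EventAccess are hypothetical names, a plain deque replaces SyncedCircularBuffer, and the assumption that every get method discards non-matching events is drawn by analogy from the description of getEndDocument.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Hypothetical event type, much reduced from what XMLEvent would carry.
class SimpleEvent {
    static final int STARTELEMENT = 1, ENDELEMENT = 2, CHARACTERS = 3, ENDDOCUMENT = 4;
    final int type;
    final String qName; // null for events that have no name

    SimpleEvent(int type, String qName) {
        this.type = type;
        this.qName = qName;
    }
}

// Hypothetical single-threaded queue with pushback. A real buffer would
// block when empty; this one throws NoSuchElementException instead.
class EventQueue {
    private final Deque<SimpleEvent> events = new ArrayDeque<>();

    void put(SimpleEvent e) { events.addLast(e); }
    void pushBack(SimpleEvent e) { events.addFirst(e); }
    SimpleEvent getEvent() { return events.removeFirst(); }
    int size() { return events.size(); }
}

class EventAccess {
    private static boolean isStart(SimpleEvent e, String qName) {
        return e.type == SimpleEvent.STARTELEMENT && qName.equals(e.qName);
    }

    // get: discard events until a matching STARTELEMENT is found.
    static SimpleEvent getStartElement(EventQueue q, String qName) {
        SimpleEvent e = q.getEvent();
        while (!isStart(e, qName)) {
            e = q.getEvent();
        }
        return e;
    }

    // expect: the very next event must match; otherwise push it back and throw.
    static SimpleEvent expectStartElement(EventQueue q, String qName) {
        SimpleEvent e = q.getEvent();
        if (isStart(e, qName)) {
            return e;
        }
        q.pushBack(e);
        throw new NoSuchElementException("expected start of " + qName);
    }
}
```

A failed expect leaves the queue exactly as it was, so a caller can try a different expectation on the same event; a get consumes everything up to and including the match.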
This same principle can be extended to the other major
sub-systems of FOP processing. In each case, while it is
possible to hold a complete intermediate result in memory,
the memory costs of that approach are too high. The
sub-systems - XML parsing, FO tree construction, Area tree
construction and rendering - must run in parallel if the
footprint is to be kept manageable. By creating a series of
producer-consumer pairs linked by synchronized buffers,
logical isolation can be achieved while rates of processing
remain coupled. By introducing feedback loops conveying
information about the completion of processing of the
elements, sub-systems can dispose of or precis those
elements without having to be tightly coupled to downstream
processes.
Figure 3
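The chaining of producer-consumer pairs can be illustrated with the sketch below. It is an approximation under stated assumptions: the standard java.util.concurrent.ArrayBlockingQueue is substituted for the synchronized buffers of the experimental code, the stage names and string transformations are placeholders, and only the forward coupling is shown, not the feedback loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical three-stage pipeline: "parser" -> "FO tree builder" -> "renderer".
class PipelineSketch {
    // End-of-stream marker, recognised by identity rather than by content.
    private static final String END = new String("<end>");

    static List<String> run(List<String> input, int capacity) {
        // Bounded buffers: a fast producer blocks in put() until its consumer
        // catches up, which is what keeps the rates of processing coupled.
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> built = new ArrayBlockingQueue<>(capacity);
        List<String> rendered = new ArrayList<>();

        Thread parser = new Thread(() -> {
            try {
                for (String s : input) {
                    parsed.put(s);
                }
                parsed.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread builder = new Thread(() -> {
            try {
                for (String s = parsed.take(); s != END; s = parsed.take()) {
                    built.put("fo:" + s);
                }
                built.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        builder.start();
        try {
            // The calling thread plays the part of the final consumer.
            for (String s = built.take(); s != END; s = built.take()) {
                rendered.add("area:" + s);
            }
            parser.join();
            builder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rendered;
    }
}
```

With a capacity of 1, no stage can run more than one element ahead of the stage that follows it, so the amount of intermediate data held in memory is bounded by the buffer sizes rather than by the size of the document.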
