SGML provides a mechanism to indicate that particular pieces of the document should be processed in a special way. These are termed “marked sections”.
As you would expect of an SGML construct, a marked section starts with <!.
The first square bracket begins to delimit the marked section.
KEYWORD describes how this marked section should be processed by the parser.
The second square bracket indicates that the content of the marked section starts here.
The marked section is finished by closing the two square brackets, and then returning to the document context from the SGML context with >.
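Putting the pieces described above together, the general shape of a marked section can be sketched as follows. KEYWORD and the contents line are placeholders, not literal text.

```sgml
<![ KEYWORD [
  Contents of the marked section
]]>
```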
The CDATA and RCDATA keywords denote the marked section's content model, and allow you to change it from the default.
When an SGML parser is processing a document it keeps track of what is called the “content model”.
Briefly, the content model describes what sort of content the parser is expecting to see, and what it will do with it when it finds it.
The two content models you will probably find most useful are CDATA and RCDATA.
CDATA is for “Character Data”. If the parser is in this content model then it is expecting to see characters, and characters only. In this model the < and & symbols lose their special status, and will be treated as ordinary characters.
RCDATA is for “Entity references and character data”. If the parser is in this content model then it is expecting to see characters and entities. < loses its special status, but & will still be treated as the start of a general entity reference.
This is particularly useful if you are including some verbatim text that contains lots of < and & characters. While you could go through the text ensuring that every < is converted to &lt; and every & is converted to &amp;, it can be easier to mark the section as containing only CDATA. When the SGML parser encounters this it will treat the < and & symbols embedded in the content as ordinary characters.
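As a rough illustration of the difference between the two content models, the hypothetical fragment below places the same text in a CDATA and an RCDATA marked section. It assumes the DTD in use declares the amp general entity, as the HTML and DocBook DTDs do; the sample text itself is invented for illustration.

```sgml
<!-- CDATA: both < and &amp; are kept as literal characters -->
<![ CDATA [ if (a < b) then print "&amp;" ]]>

<!-- RCDATA: < is kept as a literal character,
     but &amp; is still expanded to a single & -->
<![ RCDATA [ if (a < b) then print "&amp;" ]]>
```

RCDATA is the better fit when the verbatim text should still be able to use entities; CDATA is the better fit when every character must come through untouched.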
When you use CDATA or RCDATA in examples of text marked up in SGML, keep in mind that the content of CDATA is not validated. You have to check the included SGML text by other means. You could, for example, write the example in another document, validate the example code, and then paste it into your CDATA content.
<para>Here is an example of how you would include some text
  that contained many <literal>&lt;</literal>
  and <literal>&amp;</literal> symbols. The sample
  text is a fragment of HTML. The surrounding elements (&lt;para&gt; and
  &lt;programlisting&gt;) are from DocBook.</para>

<programlisting>
  <![CDATA[
    <p>This is a sample that shows you some of the elements within
      HTML. Since the angle brackets are used so many times, it
      is simpler to say the whole example is a CDATA marked section
      than to use the entity names for the left and right angle
      brackets throughout.</p>

    <ul>
      <li>This is a listitem</li>
      <li>This is a second listitem</li>
      <li>This is a third listitem</li>
    </ul>

    <p>This is the end of the example.</p>
  ]]>
</programlisting>
If you look at the source for this document you will see this technique used throughout.
If the keyword is INCLUDE then the contents of the marked section will be processed. If the keyword is IGNORE then the marked section is ignored and will not be processed. It will not appear in the output.
INCLUDE and IGNORE in marked sections:
<![ INCLUDE [
  This text will be processed and included.
]]>

<![ IGNORE [
  This text will not be processed or included.
]]>
By itself, this is not too useful. If you wanted to remove text from your document you could cut it out, or wrap it in comments.
It becomes more useful when you realize you can use parameter entities to control this. Remember that parameter entities can only be used in SGML contexts, and the keyword of a marked section is an SGML context.
For example, suppose that you produced a hard-copy version of some documentation and an electronic version. In the electronic version you wanted to include some extra content that was not to appear in the hard-copy.
Create a parameter entity, and set its value to INCLUDE. Write your document, using marked sections to delimit content that should only appear in the electronic version. In these marked sections use the parameter entity in place of the keyword.
When you want to produce the hard-copy version of the document, change the parameter entity's value to IGNORE and reprocess the document.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
]>
...
<![ %electronic.copy [
  This content should only appear in the electronic version of the document.
]]>
When producing the hard-copy version, change the entity's definition to:
<!ENTITY % electronic.copy "IGNORE">
On reprocessing the document, the marked sections that use %electronic.copy as their keyword will be ignored.
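The same approach extends to content that should appear only in the hard-copy version. The sketch below assumes a second, hypothetical parameter entity named hard.copy, which is not part of the example above, and the paragraph text is invented for illustration. The two values would be swapped when switching output formats.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- hard.copy is a hypothetical companion to electronic.copy -->
<!ENTITY % electronic.copy "INCLUDE">
<!ENTITY % hard.copy       "IGNORE">
]>
...
<![ %electronic.copy [
  <p>This paragraph appears only in the electronic version.</p>
]]>

<![ %hard.copy [
  <p>This paragraph appears only in the hard-copy version.</p>
]]>
```

Keeping both declarations next to each other in the document type declaration makes it harder to change one and forget the other.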
Create a new file, section.xml, that contains the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "INCLUDE">
]>

<html>
  <head>
    <title>An example using marked sections</title>
  </head>

  <body>
    <p>This paragraph <![CDATA[contains many <
      characters (< < < < <) so it is easier
      to wrap it in a CDATA marked section ]]></p>

    <![IGNORE[
    <p>This paragraph will definitely not be included in the
      output.</p>
    ]]>

    <![%text.output [
    <p>This paragraph might appear in the output, or it
      might not.</p>

    <p>Its appearance is controlled by the %text.output
      parameter entity.</p>
    ]]>
  </body>
</html>
Normalize this file using sgmlnorm(1) and examine the output. Notice which paragraphs have appeared, which have disappeared, and what has happened to the content of the CDATA marked section.
Change the definition of the text.output entity from INCLUDE to IGNORE. Re-normalize the file, and examine the output to see what has changed.