XML:UnicodeCodec
Class List
Codec
Factory
Class Summary: Codec
+---XML:UnicodeCodec.Codec
     |
     +--XML:UnicodeCodec:UTF8.Codec
Inherited Fields
Method Summary
Decode(VAR ARRAY OF CHAR, LONGINT, LONGINT, VAR ARRAY OF LONGCHAR, LONGINT, LONGINT, VAR LONGINT, VAR LONGINT)
    Decodes the bytes in `source[sourceStart, sourceEnd[' into the Unicode sequence `dest[destStart, destEnd['.
Encode(VAR ARRAY OF LONGCHAR, LONGINT, LONGINT, VAR ARRAY OF CHAR, LONGINT, LONGINT, VAR LONGINT, VAR LONGINT)
    Encodes the Unicode characters in `source[sourceStart, sourceEnd[' into the byte sequence `dest[destStart, destEnd['.
Inherited Methods
From XML:UnicodeCodec.Codec:
Class Summary: Factory
+---XML:UnicodeCodec.Factory
     |
     +--XML:UnicodeCodec:UTF8.Factory
Inherited Fields
From XML:UnicodeCodec.Factory:
Method Summary
GetEncodingName(VAR ARRAY OF CHAR)
    Returns the preferred MIME name for the factory's encoding.
NewCodec(): Codec
    Creates a new codec from factory f.
NewCodecBOM(VAR ARRAY OF CHAR, LONGINT, LONGINT, VAR LONGINT): Codec
    Creates a new codec from factory f, taking the byte order mark into account.
Inherited Methods
From XML:UnicodeCodec.Factory:
Variable Summary
factory-: Factory
Class Detail: Codec
Method Detail
PROCEDURE (codec: Codec) Decode(VAR source: ARRAY OF CHAR; sourceStart: LONGINT; sourceEnd: LONGINT; VAR dest: ARRAY OF LONGCHAR; destStart: LONGINT; destEnd: LONGINT; VAR sourceDone: LONGINT; VAR destDone: LONGINT)
Decodes the bytes in `source[sourceStart, sourceEnd[' into the Unicode sequence `dest[destStart, destEnd['.
Pre-conditions:
sourceStart < sourceEnd, and the character sequence `source[sourceStart, sourceEnd[' holds the bytes that are to be decoded.
destEnd-destStart >= maxUCS2EncodingLength. In other words, there must be enough room in the destination sequence `dest[destStart, destEnd[' to hold at least one UCS-4 character, possibly split into a high and low surrogate pair.
sourceStart is the value of sourceDone of a previous call to this procedure (or to Factory.NewCodecBOM), or the address of `source[sourceStart]' is aligned on a 4-byte boundary. This ensures that the decoder functions can access the source sequence in chunks of 2 and 4 bytes without having to worry about the alignment of memory accesses.
sourceEnd-sourceStart >= maxUTF8EncodingLength, or sourceEnd designates the end of the byte sequence being decoded. This means that at least one complete character is encoded in the input sequence, or that the input sequence ends with a possibly incomplete character.
Post-conditions:
sourceStart < sourceDone <= sourceEnd and destStart < destDone <= destEnd. This means that at least one character has been decoded.
sourceDone > sourceEnd-maxUTF8EncodingLength or destDone > destEnd-maxUCS2EncodingLength. This implies that the decoding algorithm continues until it gets near the end of the source or destination buffer. An implementation may, however, stop as soon as the input or output sequence of the next character might not fit into the buffers. It is not required to decode the maximum number of bytes that would fit into the buffers.
If the procedure was started with sourceEnd-sourceStart < maxUTF8EncodingLength, and if there is enough room in the destination buffer to store the whole result, then all remaining bytes in the source sequence have been decoded and sourceDone equals sourceEnd.
Every malformed character, and every decoded character code that cannot be mapped onto a Unicode character (i.e., one or two UCS-2 values), is replaced with decodeError, and the counter codec.invalidChars is incremented by one. The output of the decoding function contains only valid characters. That is, all surrogate codes are properly paired, and the character codes U+FFFE and U+FFFF are replaced with decodeError.
`dest[destStart, destDone[' holds the result of decoding `source[sourceStart, sourceDone['.
[Description inherited from Decode]
Redefines: Decode
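The buffer contract of Decode can be easier to follow as executable code. The following is a minimal Python model of it, not the library's Oberon-2 implementation. The constant values (6 for maxUTF8EncodingLength, 2 for maxUCS2EncodingLength, U+FFFD for decodeError) and all function names are assumptions made for illustration. The model also treats 5- and 6-byte lead bytes as malformed, and it reports the invalid-character count as a return value instead of a codec field.

```python
MAX_UTF8 = 6            # assumed value of maxUTF8EncodingLength
MAX_UCS2 = 2            # assumed value of maxUCS2EncodingLength
DECODE_ERROR = 0xFFFD   # assumed value of decodeError


def decode_one(src, i, end):
    """Decode one character at src[i]; return (code point or None, bytes used)."""
    b0 = src[i]
    if b0 < 0x80:
        return b0, 1
    if 0xC2 <= b0 <= 0xDF:
        n, cp = 2, b0 & 0x1F
    elif 0xE0 <= b0 <= 0xEF:
        n, cp = 3, b0 & 0x0F
    elif 0xF0 <= b0 <= 0xF4:
        n, cp = 4, b0 & 0x07
    else:
        return None, 1                      # stray continuation or invalid lead byte
    for k in range(1, n):
        if i + k >= end or (src[i + k] & 0xC0) != 0x80:
            return None, k                  # truncated or broken sequence
        cp = (cp << 6) | (src[i + k] & 0x3F)
    smallest = (0, 0, 0x80, 0x800, 0x10000)[n]
    if (cp < smallest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF
            or cp in (0xFFFE, 0xFFFF)):
        return None, n                      # overlong, out of range, or U+FFFE/U+FFFF
    return cp, n


def decode_chunk(src, s_start, s_end, dest, d_start, d_end):
    """Model of one Decode call; returns (source_done, dest_done, invalid_chars)."""
    assert s_start < s_end and d_end - d_start >= MAX_UCS2
    final = (s_end - s_start) < MAX_UTF8    # a short input is treated as the tail
    i, j, invalid = s_start, d_start, 0
    while i < s_end and j <= d_end - MAX_UCS2:
        if not final and i > s_end - MAX_UTF8:
            break                           # the next character might be incomplete
        cp, used = decode_one(src, i, s_end)
        i += used
        if cp is None:
            cp, invalid = DECODE_ERROR, invalid + 1
        if cp >= 0x10000:                   # split into a high and low surrogate
            cp -= 0x10000
            dest[j], dest[j + 1] = 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)
            j += 2
        else:
            dest[j] = cp
            j += 1
    return i, j, invalid


def decode_all(data, dest_size=4):
    """Drive decode_chunk, resuming each call at the previous source_done."""
    units, pos, invalid = [], 0, 0
    while pos < len(data):
        dest = [0] * dest_size
        pos, d_done, bad = decode_chunk(data, pos, len(data), dest, 0, dest_size)
        units.extend(dest[:d_done])
        invalid += bad
    return units, invalid
```

In this model, decode_chunk(b"abcdefgh", 0, 8, [0] * 16, 0, 16) returns (3, 3, 0): it stops once fewer than six source bytes remain, even though the destination has room, and the caller resumes from the returned source position.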
PROCEDURE (codec: Codec) Encode(VAR source: ARRAY OF LONGCHAR; sourceStart: LONGINT; sourceEnd: LONGINT; VAR dest: ARRAY OF CHAR; destStart: LONGINT; destEnd: LONGINT; VAR sourceDone: LONGINT; VAR destDone: LONGINT)
Encodes the Unicode characters in `source[sourceStart, sourceEnd[' into the byte sequence `dest[destStart, destEnd['.
Pre-conditions:
sourceStart < sourceEnd, and the Unicode character sequence `source[sourceStart, sourceEnd[' holds the characters that are to be encoded.
destEnd-destStart >= maxUTF8EncodingLength. In other words, there must be enough room in the destination sequence `dest[destStart, destEnd[' to hold at least one UCS-4 character, possibly encoded as a sequence of 6 bytes.
destStart is the value of destDone of a previous call to this procedure (or to Factory.NewCodecBOM), or the address of `dest[destStart]' is aligned on a 4-byte boundary. This ensures that the encoder functions can access the destination sequence in chunks of 2 and 4 bytes without having to worry about the alignment of memory accesses.
sourceEnd-sourceStart >= maxUCS2EncodingLength, or sourceEnd designates the end of the character sequence being encoded. This means that at least one complete character is in the input sequence, or that the input sequence ends with a possibly incomplete character.
Post-conditions:
sourceStart < sourceDone <= sourceEnd and destStart < destDone <= destEnd. This means that at least one character has been encoded.
sourceDone > sourceEnd-maxUCS2EncodingLength or destDone > destEnd-maxUTF8EncodingLength. This implies that the encoding algorithm continues until it gets near the end of the source or destination buffer. An implementation may, however, stop as soon as the input or output sequence of the next character might not fit into the buffers. It is not required to encode the maximum number of characters that would fit into the buffers.
If the procedure was started with sourceEnd-sourceStart < maxUCS2EncodingLength, and if there is enough room in the destination buffer to store the whole result, then all remaining characters in the source sequence have been encoded and sourceDone equals sourceEnd.
Every malformed character, and every character code that cannot be mapped onto a valid encoding, is replaced with encodeError, and the counter codec.invalidChars is incremented by one. Out-of-range Unicode characters encoded as a (high, low) surrogate pair are recognized as a single invalid character. The character codes U+FFFE and U+FFFF are also mapped to encodeError.
`dest[destStart, destDone[' holds the result of encoding `source[sourceStart, sourceDone['.
[Description inherited from Encode]
Redefines: Encode
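The error-substitution rules of Encode can be illustrated with a small Python model. This is a sketch only: the value of encodeError is not given on this page, so "?" is an assumption, the function name is hypothetical, and buffer chunking is left out to keep the focus on surrogate handling. In this UTF-8 model every properly paired surrogate is encodable, so the "out of range pair" case does not arise here.

```python
ENCODE_ERROR = ord("?")   # assumed value of encodeError


def encode_units(units):
    """Encode UCS-2 code units to UTF-8; return (bytes, invalid_chars)."""
    out, invalid, i = bytearray(), 0, 0
    while i < len(units):
        u = units[i]
        i += 1
        if (0xD800 <= u <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            # A (high, low) surrogate pair forms one character.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xD800 <= u <= 0xDFFF:
            cp = None                       # unpaired surrogate: malformed
        else:
            cp = u
        if cp is None or cp in (0xFFFE, 0xFFFF):
            out.append(ENCODE_ERROR)        # one replacement per invalid character
            invalid += 1
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out), invalid
```

For example, encode_units([0xD83D, 0x41, 0xFFFF]) returns (b"?A?", 2): the unpaired high surrogate and U+FFFF are each replaced and counted once.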
Class Detail: Factory
Method Detail
PROCEDURE (f: Factory) GetEncodingName(VAR name: ARRAY OF CHAR)
Returns the preferred MIME name for the factory's encoding.
[Description inherited from GetEncodingName]
Redefines: GetEncodingName
PROCEDURE (f: Factory) NewCodec(): Codec
Creates a new codec from factory f. This should not be called for factories with a Factory.bom of bomOptional or bomRequired.
[Description inherited from NewCodec]
Redefines: NewCodec
PROCEDURE (f: Factory) NewCodecBOM(VAR source: ARRAY OF CHAR; sourceStart: LONGINT; sourceEnd: LONGINT; VAR sourceDone: LONGINT): Codec
Creates a new codec from factory f, taking the byte order mark into account. The exact behaviour of this procedure depends on the value of f.bom:
bomNotApplicable: Any byte order mark is ignored, and sourceDone is set to sourceStart.
bomOptional: If the source begins with a byte order mark, it is removed from the input, the corresponding codec is returned, and sourceDone is set to the position just after the byte order mark. If there is no byte order mark, sourceDone is set to sourceStart and the default codec is returned.
bomRequired: In the presence of a byte order mark, this behaves just like bomOptional. Without a byte order mark, the returned codec's Codec.invalidChars counter is set to one and sourceDone is set to sourceStart.
Pre-condition: sourceEnd-sourceStart >= maxUTF8EncodingLength, or sourceEnd designates the end of the byte sequence being decoded. This means that at least one complete character is encoded in the input sequence, or that the input sequence ends with a possibly incomplete character.
[Description inherited from NewCodecBOM]
Redefines: NewCodecBOM
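The three f.bom cases can be summarized as a short Python model. It is a sketch under assumptions: the numeric values of the mode constants are placeholders, the function name is hypothetical, the returned codec is represented only by its invalidChars counter, and the byte order mark is taken to be the UTF-8 signature EF BB BF. Which mode the UTF-8 factory actually uses is not stated on this page.

```python
UTF8_BOM = b"\xef\xbb\xbf"                  # UTF-8 encoding of U+FEFF

# Placeholder values; the real constants are defined by the library.
BOM_NOT_APPLICABLE, BOM_OPTIONAL, BOM_REQUIRED = range(3)


def new_codec_bom(bom_mode, source, source_start, source_end):
    """Return (invalid_chars, source_done) as a stand-in for the new codec."""
    if bom_mode == BOM_NOT_APPLICABLE:
        return 0, source_start              # any byte order mark is ignored
    if source[source_start:source_end].startswith(UTF8_BOM):
        return 0, source_start + len(UTF8_BOM)   # skip the mark
    # No mark present: only bomRequired counts this as an invalid character.
    return (1 if bom_mode == BOM_REQUIRED else 0), source_start
```

The returned source position is what a caller would pass as sourceStart to the first Decode call.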
Variable Detail
VAR factory-: Factory