This module provides a registry for Unicode, transport, and encryption codecs. A codec can be registered under several names. Case is ignored when looking up a codec by its name.
Import List: ADT:StringBuffer, Exception, IO, Object, RT0
Class List | |
Codec | A codec provides functions to convert a sequence of 8-bit characters into the Unicode representation used by the Object.String class, and vice versa. |
CryptoDecoder | |
CryptoEncoder | |
Decoder | |
Encoder | |
EncodingError | |
ExceptionEncoder |
Class Summary: Codec [Detail] | |
+---RT0.Object | +---Object.Object | +--Codec.Codec
A codec provides functions to convert a sequence of 8-bit characters into the Unicode representation used by the Object.String class, and vice versa. There are two kinds of functions: convenience functions that translate self-contained entities, and streaming functions that encode or decode chunked data. For the convenience functions, decoding and encoding start at the beginning of the passed entity in the default state, and it is an error if the codec does not reach a "clean" state at the end of the entity. Errors should be signaled by something like an exception. Right now, this is approximated (badly) by a failed ASSERT. For chunked encoding or decoding, separate functions are provided that take a state "token", and update it as part of their work. The token typically
| |
Field Summary | |
class-: CodecClass The type of the codec. | |
preferredName-: STRING The preferred name for this codec. | |
Constructor Summary | |
Get(STRING): Codec | |
Method Summary | |
Decode(ARRAY OF CHAR): STRING Equivalent to `codec.DecodeRegion(data,0,String.Length(data))'. | |
DecodeRegion(ARRAY OF CHAR, LONGINT, LONGINT): STRING Decode the 8-bit character sequence in `data[start, end-1]'. | |
Encode(STRING): String8 Equivalent to `codec.EncodeRegion(s,0,s.length)'. | |
EncodeRegion(STRING, LONGINT, LONGINT): String8 Encode the Unicode sequence in `s[start, end-1]' into an 8-bit character sequence. | |
INIT(CodecClass, ARRAY OF CHAR) | |
NewDecoder(): Decoder Creates a decoder object for the codec codec. | |
NewEncoder(): Encoder Creates an encoder object for the codec codec. | |
Inherited Methods | |
From RT0.Object: From Object.Object: |
Class Summary: CryptoDecoder [Detail] | |
+---Codec.Decoder | +--Codec.CryptoDecoder | |
Method Summary | |
INIT() | |
SetKey(String8) | |
Inherited Methods | |
Class Summary: CryptoEncoder [Detail] | |
+---Codec.Encoder | +--Codec.CryptoEncoder | |
Method Summary | |
INIT(Codec) | |
SetKey(String8) | |
Inherited Methods | |
From Codec.Encoder: |
Class Summary: Decoder [Detail] | |
+--Codec.Decoder | |
Method Summary | |
Decode(ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer) Decode the 8-bit character sequence in `data[start, end-1]' starting with the decoder state dec. | |
End() The complement operation to dec.Start, freeing all resources allocated earlier. | |
INIT() | |
Reset() Resets the decoder's state to that created by the initial dec.Start. | |
Start() Allocates and initializes all resources required for the decoder instance. |
Class Summary: Encoder [Detail] | |
+--Codec.Encoder | |
Method Summary | |
Closure(StringBuffer) If the encoder still holds any partial data from previous calls to enc.Encode, then flush this data to the buffer b. | |
Encode(STRING, LONGINT, LONGINT, StringBuffer) Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. | |
EncodeLatin1(ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer) Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc. | |
EncodeUTF16(ARRAY OF LONGCHAR, LONGINT, LONGINT, StringBuffer) Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. | |
End() The complement operation to enc.Start, freeing all resources allocated earlier. | |
INIT(Encoder) | |
Reset() Resets the encoder's state to that created by the initial enc.Start. | |
SetEscapeEncoder(Encoder) For character sequences that cannot be handled by this encoder, the encoder escape is called. | |
Start() Allocates and initializes all resources required for the encoder instance. |
Class Summary: EncodingError [Detail] | |
+---Exception.Exception | +---Exception.Checked | +---IO.Error | +--Codec.EncodingError | |
Method Summary | |
INIT(LONGINT, LONGINT) Initialize exception e and set start as its message. | |
Inherited Methods | |
Class Summary: ExceptionEncoder [Detail] | |
+---Codec.Encoder | +--Codec.ExceptionEncoder | |
Method Summary | |
EncodeLatin1(ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer) Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc. | |
EncodeUTF16(ARRAY OF LONGCHAR, LONGINT, LONGINT, StringBuffer) Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. | |
Inherited Methods | |
From Codec.Encoder: |
Type Summary | |
BufferLatin1 = ARRAY n OF CHAR | |
BufferUCS4 = ARRAY n OF UCS4CHAR | |
CodecClass = SHORTINT |
Procedure Summary | |
EscapeLatin1(Encoder, ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer) | |
EscapeUTF16(Encoder, ARRAY OF LONGCHAR, LONGINT, LONGINT, StringBuffer) | |
Register(Codec, STRING) |
Variable Summary | |
exceptionEncoder-: ExceptionEncoder |
Constant Summary | |
compression A compression codec tries to translate an 8-bit character sequence into a shorter 8-bit representation. | |
encryption Encrypts an 8-bit character string into another 8-bit string. | |
invalidChar The character cannot be mapped into the character range of the target encoding. | |
invalidData The input data of an operation is malformed. | |
transport A transport codec transforms an 8-bit character string into another 8-bit representation, typically escaping some character codes on the way. | |
unicode A Unicode codec translates a sequence of 32-bit Unicode code points into an 8-bit character sequence, and vice versa. |
Class Detail: Codec |
Field Detail |
FIELD class-: CodecClass
The type of the codec. One of Codec.unicode, Codec.transport, Codec.encryption, or Codec.compression.
FIELD preferredName-: STRING
The preferred name for this codec. This is an ASCII string. A codec may be known under any number of names. If the codec has a preferred MIME name, then this value should be used here.
Constructor Detail |
PROCEDURE Get(name: STRING): Codec
Method Detail |
PROCEDURE (codec: Codec) Decode(data: ARRAY OF CHAR): STRING
Equivalent to `codec.DecodeRegion(data,0,String.Length(data))'.
PROCEDURE (codec: Codec) DecodeRegion(data: ARRAY OF CHAR; start: LONGINT; end: LONGINT): STRING
Decode the 8-bit character sequence in `data[start, end-1]'. For successful completion, the byte sequence `data[start, end-1]' must be well formed with respect to the decoder, and the resulting Unicode code points must all be valid.
PROCEDURE (codec: Codec) Encode(s: STRING): String8 RAISES EncodingError;
Equivalent to `codec.EncodeRegion(s,0,s.length)'.
PROCEDURE (codec: Codec) EncodeRegion(s: STRING; start: LONGINT; end: LONGINT): String8 RAISES EncodingError;
Encode the Unicode sequence in `s[start, end-1]' into an 8-bit character sequence. The result is stored in a string holding only code points in the range `[U+0000, U+00FF]'.
Pre-condition: `0 <= start <= end <= s.length'. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
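The pre-condition above can be modeled outside the library. The following is a minimal Python sketch, not part of the Codec module: the function name `check_region` and the representation of the string as a list of integer code points are assumptions made for illustration. It treats a code point as valid when it lies in `[U+0000, U+10FFFF]` and outside the surrogate range `[U+D800, U+DFFF]`.

```python
def check_region(code_points, start, end):
    """Return True if the documented pre-condition holds for the region.

    Hypothetical helper: code_points is a list of integers standing in
    for the code points of the string s.
    """
    # Index pre-condition: 0 <= start <= end <= s.length
    if not (0 <= start <= end <= len(code_points)):
        return False
    # Only the half-open region [start, end) is inspected.
    for cp in code_points[start:end]:
        out_of_range = cp < 0 or cp > 0x10FFFF
        surrogate = 0xD800 <= cp <= 0xDFFF
        if out_of_range or surrogate:
            return False
    return True
```

Note that only the code points inside `[start, end-1]` are checked, so an invalid code point outside the region does not violate the pre-condition.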
PROCEDURE (codec: Codec) INIT(class: CodecClass; preferredName: ARRAY OF CHAR)
PROCEDURE (codec: Codec) NewDecoder(): Decoder
Creates a decoder object for the codec codec. Note: Some decoders, like those implementing a decryption algorithm, require additional settings before they can be used.
PROCEDURE (codec: Codec) NewEncoder(): Encoder
Creates an encoder object for the codec codec. By default, any character sequences the encoder cannot handle cause it to raise an EncodingError exception.
Note: Some encoders, like those implementing an encryption algorithm, require additional settings before they can be used.
Class Detail: CryptoDecoder |
Method Detail |
PROCEDURE (dec: CryptoDecoder) INIT()
Redefines: INIT
PROCEDURE (dec: CryptoDecoder) SetKey(key: String8)
Class Detail: CryptoEncoder |
Method Detail |
PROCEDURE (enc: CryptoEncoder) INIT(codec: Codec)
Redefines: INIT
PROCEDURE (enc: CryptoEncoder) SetKey(key: String8)
Class Detail: Decoder |
Method Detail |
PROCEDURE (dec: Decoder) Decode(data: ARRAY OF CHAR; start: LONGINT; end: LONGINT; b: StringBuffer)
Decode the 8-bit character sequence in `data[start, end-1]' starting with the decoder state dec. The result is appended to the string buffer b. On completion, dec is updated to reflect the decoder's state after the last byte of the sequence has been processed.
Pre-condition: `0 <= start <= end <= LEN(data)'. dec.Start has been called.
PROCEDURE (dec: Decoder) End()
The complement operation to dec.Start, freeing all resources allocated earlier. After this method has been called, no method of this decoder other than dec.Start may be called. The default implementation is a no-op.
PROCEDURE (dec: Decoder) INIT()
PROCEDURE (dec: Decoder) Reset()
Resets the decoder's state to that created by the initial dec.Start. All allocated resources are kept. Using this method, it is possible to use one and the same decoder for several different data streams.
PROCEDURE (dec: Decoder) Start()
Allocates and initializes all resources required for the decoder instance. This method must be called once before dec.Decode. The default implementation is a no-op.
The amount of memory allocated in this step differs significantly across the different kinds of decoders. For example, the space requirements of a Unicode codec are virtually none, while a compression decoder may require several hundred KBytes.
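The idea of a decoder that carries state from one chunk to the next can be tried out with Python's standard `codecs` module. This is an analogy only, not the Codec API: constructing the incremental decoder plays roughly the role of dec.Start, `reset()` corresponds to dec.Reset, and Python has no explicit counterpart to dec.End.

```python
import codecs

# Creating the decoder object sets up its initial state.
decoder = codecs.getincrementaldecoder("utf-8")()
out = []

# "é" is the two-byte sequence C3 A9 in UTF-8; split it across two chunks.
out.append(decoder.decode(b"caf\xc3"))  # the trailing C3 is held back as state
out.append(decoder.decode(b"\xa9"))     # completes the pending character
first = "".join(out)

# Return to the initial state and reuse the same decoder for a new stream.
decoder.reset()
second = decoder.decode(b"ok", final=True)
```

The first call returns only the three complete characters. The byte that starts the fourth character stays inside the decoder until the next chunk arrives.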
Class Detail: Encoder |
Method Detail |
PROCEDURE (enc: Encoder) Closure(b: StringBuffer)
If the encoder still holds any partial data from previous calls to enc.Encode, then flush this data to the buffer b. This method must be called at the end of the data stream for codecs that operate on blocks of data, and for which the last and possibly incomplete block must be handled specially.
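Why a block-oriented codec needs a final flush can be illustrated with base64, which works on blocks of 3 input bytes. The Python sketch below is a toy model, not the Codec implementation: the class `Base64ChunkEncoder` and its method names are hypothetical, and a plain list of byte strings stands in for the string buffer b.

```python
import base64


class Base64ChunkEncoder:
    """Toy block encoder (hypothetical): base64 consumes 3-byte blocks."""

    def __init__(self):
        self._pending = b""  # bytes of an incomplete block

    def encode(self, data, out):
        data = self._pending + data
        whole = len(data) - len(data) % 3  # largest multiple of 3
        out.append(base64.b64encode(data[:whole]))
        self._pending = data[whole:]       # 0..2 bytes kept for later

    def closure(self, out):
        # Flush the incomplete final block; base64 pads it with "=".
        if self._pending:
            out.append(base64.b64encode(self._pending))
            self._pending = b""


enc = Base64ChunkEncoder()
out = []
enc.encode(b"ab", out)   # fewer than 3 bytes: nothing can be emitted yet
enc.encode(b"cde", out)  # emits the block "abc", keeps "de"
enc.closure(out)         # emits the padded final block
result = b"".join(out)
```

Without the call to `closure`, the last two input bytes would never reach the output.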
PROCEDURE (enc: Encoder) Encode(s: STRING; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.
Pre-condition: `0 <= start <= end <= s.length'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
PROCEDURE (enc: Encoder) EncodeLatin1(s: ARRAY OF CHAR; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.
Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
PROCEDURE (enc: Encoder) EncodeUTF16(s: ARRAY OF LONGCHAR; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.
Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
PROCEDURE (enc: Encoder) End()
The complement operation to enc.Start, freeing all resources allocated earlier. After this method has been called, no method of this encoder other than enc.Start may be called. The default implementation is a no-op.
PROCEDURE (enc: Encoder) INIT(escape: Encoder)
PROCEDURE (enc: Encoder) Reset()
Resets the encoder's state to that created by the initial enc.Start. All allocated resources are kept. Using this method, it is possible to use one and the same encoder for several different data streams.
PROCEDURE (enc: Encoder) SetEscapeEncoder(escape: Encoder)
For character sequences that cannot be handled by this encoder, the encoder escape is called. It either raises an EncodingError exception, or translates the characters into a format that can be handled by enc. An example for this is an encoder that creates XML character references from code points that cannot be mapped by enc.
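The division of labor between an encoder and its escape encoder can be sketched in Python. Everything in this example is a stand-in invented for illustration: `EncodingError` is a local class, the escape encoders are plain functions called once per unmappable character, and the encoder is a Latin-1 encoder. It is not the Codec implementation.

```python
class EncodingError(Exception):
    """Local stand-in for the documented EncodingError."""


def raise_escape(ch):
    # Models the default behaviour: unmappable characters raise an error.
    raise EncodingError(f"cannot encode U+{ord(ch):04X}")


def xml_escape(ch):
    # Models an escape encoder that emits an XML character reference,
    # which consists only of characters the Latin-1 encoder can handle.
    return f"&#{ord(ch)};"


def encode_latin1(s, escape=raise_escape):
    parts = []
    for ch in s:
        if ord(ch) <= 0xFF:
            parts.append(ch)          # directly representable
        else:
            parts.append(escape(ch))  # delegate to the escape encoder
    return "".join(parts).encode("latin-1")
```

With `xml_escape`, the euro sign U+20AC comes out as the reference `&#8364;`. With the default `raise_escape`, the same input raises the error.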
PROCEDURE (enc: Encoder) Start()
Allocates and initializes all resources required for the encoder instance. This method must be called once before enc.Encode. The default implementation is a no-op.
The amount of memory allocated in this step differs significantly across the different kinds of encoders. For example, the space requirements of a Unicode codec are virtually none, while a compression encoder may require several MBytes of memory.
Class Detail: EncodingError |
Method Detail |
PROCEDURE (e: EncodingError) INIT(start: LONGINT; end: LONGINT)
Initialize exception e and set start as its message. start may be NIL, but in this case Exception.GetMessage must be redefined to provide a non-NIL message.
Pre-condition: e is not NIL.
[Description inherited from INIT]
Class Detail: ExceptionEncoder |
Method Detail |
PROCEDURE (enc: ExceptionEncoder) EncodeLatin1(s: ARRAY OF CHAR; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.
Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
[Description inherited from EncodeLatin1]
Redefines: EncodeLatin1
PROCEDURE (enc: ExceptionEncoder) EncodeUTF16(s: ARRAY OF LONGCHAR; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.
Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
[Description inherited from EncodeUTF16]
Redefines: EncodeUTF16
Type Detail |
TYPE BufferLatin1 = ARRAY n OF CHAR
TYPE BufferUCS4 = ARRAY n OF UCS4CHAR
TYPE CodecClass = SHORTINT
Procedure Detail |
PROCEDURE EscapeLatin1(enc: Encoder; s: ARRAY OF CHAR; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
PROCEDURE EscapeUTF16(enc: Encoder; s: ARRAY OF LONGCHAR; start: LONGINT; end: LONGINT; b: StringBuffer) RAISES EncodingError;
PROCEDURE Register(codec: Codec; name: STRING)
Pre-condition: name is an ASCII string.
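The registry behavior described for this module (one codec under several names, case ignored on lookup) can be modeled in a few lines of Python. This is a sketch of the semantics only. The class `CodecRegistry` is hypothetical, and returning `None` for an unknown name is an assumption, since the source does not say what Get does in that case.

```python
class CodecRegistry:
    """Toy model of a codec registry with case-insensitive names."""

    def __init__(self):
        self._by_name = {}

    def register(self, codec, name):
        # Mirror the documented pre-condition: names are ASCII strings.
        if not name.isascii():
            raise ValueError("codec names must be ASCII")
        # Store under a case-folded key so lookups ignore case.
        self._by_name[name.lower()] = codec

    def get(self, name):
        # Assumption: unknown names yield None.
        return self._by_name.get(name.lower())


registry = CodecRegistry()
latin1 = object()  # stand-in for a codec instance
registry.register(latin1, "ISO-8859-1")  # first name
registry.register(latin1, "latin1")      # an additional alias
```

Both `registry.get("Latin1")` and `registry.get("iso-8859-1")` then return the same codec object.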
Variable Detail |
VAR exceptionEncoder-: ExceptionEncoder
Constant Detail |
CONST compression
A compression codec tries to translate an 8-bit character sequence into a shorter 8-bit representation.
CONST encryption
Encrypts an 8-bit character string into another 8-bit string. Because encryption needs parameters like the encryption key and an initialization vector as input, the shorthand notations like Codec.DecodeRegion and Codec.EncodeRegion do not work.
CONST invalidChar
The character cannot be mapped into the character range of the target encoding.
CONST invalidData
The input data of an operation is malformed. For example, a decode instruction operating on 32-bit values is called with a number of bytes that is not a multiple of 4.
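The multiple-of-4 example can be made concrete with a small Python sketch. The function name `decode_ucs4_be`, the big-endian byte order, and the use of `ValueError` to signal malformed input are all assumptions for illustration. The Codec module reports this condition through the invalidData constant instead.

```python
def decode_ucs4_be(data):
    """Decode big-endian 32-bit values into a list of code points.

    Hypothetical helper: each code point occupies exactly 4 bytes.
    """
    if len(data) % 4 != 0:
        # Corresponds to the "invalid data" condition described above.
        raise ValueError("invalid data: length is not a multiple of 4")
    return [
        int.from_bytes(data[i:i + 4], "big")
        for i in range(0, len(data), 4)
    ]
```

An 8-byte input yields two code points, while a 5-byte input is rejected before any decoding takes place.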
CONST transport
A transport codec transforms an 8-bit character string into another 8-bit representation, typically escaping some character codes on the way.
CONST unicode
A Unicode codec translates a sequence of 32-bit Unicode code points into an 8-bit character sequence, and vice versa.