Codec

This module provides a registry for Unicode, transport, and encryption codecs. A codec can be registered under several names. Case is ignored when looking up a codec by its name.
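
The lookup rule can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not the module's implementation: `register`, `get`, and `_registry` are hypothetical stand-ins for Codec.Register and Codec.Get, and returning `None` for an unknown name is an assumption.

```python
# Hypothetical model of a codec registry with case-insensitive names.
_registry = {}

def register(codec, name):
    """Register `codec` under `name`; one codec may have several names."""
    if not name.isascii():
        raise ValueError("codec names must be ASCII")
    # Fold case once at registration so lookups can ignore case.
    _registry[name.lower()] = codec

def get(name):
    """Return the codec registered under `name` (assumed: None if unknown)."""
    return _registry.get(name.lower())

latin1 = object()  # placeholder for a real codec instance
register(latin1, "ISO-8859-1")
register(latin1, "Latin1")
```

With this model, `get("iso-8859-1")` and `get("LATIN1")` return the same object.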

Import List

    ADT:StringBuffer
    Exception
    IO
    Object
    RT0
 
Class List
Codec

          A codec provides functions to convert a sequence of 8-bit characters into the Unicode representation used by the Object.String class, and vice versa.
CryptoDecoder
CryptoEncoder
Decoder
Encoder
EncodingError
ExceptionEncoder
Class Summary: Codec
  +---RT0.Object
       |
       +---Object.Object
            |
            +--Codec.Codec

A codec provides functions to convert a sequence of 8-bit characters into the Unicode representation used by the Object.String class, and vice versa. There are two kinds of functions: convenience functions for translating self-contained entities, and streaming functions for encoding or decoding chunked data.

For the convenience functions, decoding and encoding start at the beginning of the passed entity in the default state, and it is an error if the codec does not reach a "clean" state at the end of the entity. Errors should be signaled by something like an exception. Right now, this is approximated (badly) by a failed ASSERT.

For chunked encoding or decoding, separate functions are provided that take a state "token", and update it as part of their work. The token typically

  • contains information on some internal state (for example, holding the mode of operation if the encoding employs switches),

  • holds partially decoded data (for example, if the previously decoded chunk ended in an incomplete byte sequence),

  • specifies how to react to error conditions (like replace, discard, or abort),

  • keeps track of any errors (for example, by counting the number of discarded characters).
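
The idea of a state token can be illustrated with Python's standard incremental decoders, which bundle the same kind of information in one object. This is an analogy in another language, not this module's API. It shows partially decoded data carried between chunks and an error policy ("replace"); error counting is not modeled.

```python
import codecs

# Python's incremental decoder plays the role of the state token here.
dec = codecs.getincrementaldecoder("utf-8")(errors="replace")

# The euro sign U+20AC is E2 82 AC in UTF-8; split it across two chunks.
first = dec.decode(b"A\xe2\x82")   # incomplete sequence is kept in the state
second = dec.decode(b"\xac")       # completed by the next chunk

# A lone continuation byte is malformed; "replace" substitutes U+FFFD.
third = dec.decode(b"\x80", final=True)
```

After the first call only "A" has been produced. The two pending bytes stay in the decoder until the second chunk completes the character.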

Field Summary
class-: CodecClass

          The type of the codec.
preferredName-: STRING

          The preferred name for this codec.
Constructor Summary
Get(STRING): Codec

          
Method Summary
Decode(ARRAY OF CHAR): STRING

          Equivalent to `codec.DecodeRegion(data,0,String.Length(data))'.
DecodeRegion(ARRAY OF CHAR, LONGINT, LONGINT): STRING

          Decode the 8-bit character sequence in `data[start, end-1]'.
Encode(STRING): String8

          Equivalent to `codec.EncodeRegion(s,0,s.length)'.
EncodeRegion(STRING, LONGINT, LONGINT): String8

          Encode the Unicode sequence in `s[start, end-1]' into an 8-bit character sequence.
INIT(CodecClass, ARRAY OF CHAR)

          
NewDecoder(): Decoder

          Creates a decoder object for the codec codec.
NewEncoder(): Encoder

          Creates an encoder object for the codec codec.
Inherited Methods

From RT0.Object:

          Finalize

From Object.Object:

          Equals, HashCode, ToString

 
Class Summary: CryptoDecoder
  +---Codec.Decoder
       |
       +--Codec.CryptoDecoder
Method Summary
INIT()

          
SetKey(String8)

          
Inherited Methods

From Codec.Decoder:

          Decode, End, INIT, Reset, Start

 
Class Summary: CryptoEncoder
  +---Codec.Encoder
       |
       +--Codec.CryptoEncoder
Method Summary
INIT(Codec)

          
SetKey(String8)

          
Inherited Methods

From Codec.Encoder:

          Closure, Encode, EncodeLatin1, EncodeUTF16, End, INIT, Reset, SetEscapeEncoder, Start

 
Class Summary: Decoder
  +--Codec.Decoder
Method Summary
Decode(ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer)

          Decode the 8-bit character sequence in `data[start, end-1]' starting with the decoder state dec.
End()

          The complement operation to dec.Start, freeing all resources allocated earlier.
INIT()

          
Reset()

          Resets the decoder's state to that created by the initial dec.Start.
Start()

          Allocates and initializes all resources required for the decoder instance.
 
Class Summary: Encoder
  +--Codec.Encoder
Method Summary
Closure(StringBuffer)

          If the encoder still holds any partial data from previous calls to enc.Encode, then flush this data to the buffer b.
Encode(STRING, LONGINT, LONGINT, StringBuffer)

          Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc.
EncodeLatin1(ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer)

          Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc.
EncodeUTF16(ARRAY OF LONGCHAR, LONGINT, LONGINT, StringBuffer)

          Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc.
End()

          The complement operation to enc.Start, freeing all resources allocated earlier.
INIT(Encoder)

          
Reset()

          Resets the encoder's state to that created by the initial enc.Start.
SetEscapeEncoder(Encoder)

          For character sequences that cannot be handled by this encoder, the encoder escape is called.
Start()

          Allocates and initializes all resources required for the encoder instance.
 
Class Summary: EncodingError
  +---Exception.Exception
       |
       +---Exception.Checked
            |
            +---IO.Error
                 |
                 +--Codec.EncodingError
Method Summary
INIT(LONGINT, LONGINT)

          Initialize exception e and set start as its message.
Inherited Methods

From Exception.Exception:

          GetMessage, Name, WriteBacktrace

From IO.Error:

          INIT

 
Class Summary: ExceptionEncoder
  +---Codec.Encoder
       |
       +--Codec.ExceptionEncoder
Method Summary
EncodeLatin1(ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer)

          Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc.
EncodeUTF16(ARRAY OF LONGCHAR, LONGINT, LONGINT, StringBuffer)

          Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc.
Inherited Methods

From Codec.Encoder:

          Closure, Encode, EncodeLatin1, EncodeUTF16, End, INIT, Reset, SetEscapeEncoder, Start

 
Type Summary
BufferLatin1 = ARRAY n OF CHAR

          
BufferUCS4 = ARRAY n OF UCS4CHAR

          
CodecClass = SHORTINT

          
Procedure Summary
EscapeLatin1(Encoder, ARRAY OF CHAR, LONGINT, LONGINT, StringBuffer)

          
EscapeUTF16(Encoder, ARRAY OF LONGCHAR, LONGINT, LONGINT, StringBuffer)

          
Register(Codec, STRING)

          
Variable Summary
exceptionEncoder-: ExceptionEncoder

          
Constant Summary
compression

          A compression codec tries to translate an 8-bit character sequence into a shorter 8-bit representation.
encryption

          Encrypts an 8-bit character string into another 8-bit string.
invalidChar

          The character cannot be mapped into the character range of the target encoding.
invalidData

          The input data of an operation is malformed.
transport

          A transport codec transforms an 8-bit character string into another 8-bit representation, typically escaping some character codes on the way.
unicode

          A Unicode codec translates a sequence of 32-bit Unicode code points into an 8-bit character sequence, and vice versa.

Class Detail: Codec
Field Detail

class

FIELD class-: CodecClass

The type of the codec. One of Codec.unicode, Codec.transport, Codec.encryption, or Codec.compression.


preferredName

FIELD preferredName-: STRING

The preferred name for this codec. This is an ASCII string. A codec may be known under any number of names. If the codec has a preferred MIME name, then this value should be used here.

Constructor Detail

Get

PROCEDURE Get(name: STRING): Codec
Method Detail

Decode

PROCEDURE (codec: Codec) Decode(data: ARRAY OF CHAR): STRING

Equivalent to `codec.DecodeRegion(data,0,String.Length(data))'.


DecodeRegion

PROCEDURE (codec: Codec) DecodeRegion(data: ARRAY OF CHAR; 
                       start: LONGINT; 
                       end: LONGINT): STRING

Decode the 8-bit character sequence in `data[start, end-1]'. For successful completion, the byte sequence `data[start, end-1]' must be well formed with respect to the decoder, and the resulting Unicode code points must all be valid.

Pre-condition: `0 <= start <= end <= LEN(data)'.
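
The relation between the whole-input and region forms can be sketched in Python. This is a minimal model, with UTF-8 standing in for an arbitrary Unicode codec. The function names are hypothetical, and Python's `len` replaces both `LEN(data)` and `String.Length(data)`, so the handling of a terminating 0X character is not modeled.

```python
def decode_region(data: bytes, start: int, end: int) -> str:
    """Decode data[start:end] as UTF-8 (a stand-in for a Unicode codec)."""
    # Mirrors the pre-condition 0 <= start <= end <= LEN(data).
    if not (0 <= start <= end <= len(data)):
        raise ValueError("region out of bounds")
    # Strict decoding: a malformed region raises UnicodeDecodeError.
    return data[start:end].decode("utf-8")

def decode(data: bytes) -> str:
    """Convenience form covering the whole input."""
    return decode_region(data, 0, len(data))
```

A region that cuts a multi-byte sequence in half, such as `decode_region(b"\xc3\xa9", 0, 1)`, is not well formed and fails in this model.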


Encode

PROCEDURE (codec: Codec) Encode(s: STRING): String8
  RAISES EncodingError;

Equivalent to `codec.EncodeRegion(s,0,s.length)'.


EncodeRegion

PROCEDURE (codec: Codec) EncodeRegion(s: STRING; 
                       start: LONGINT; 
                       end: LONGINT): String8
  RAISES EncodingError;

Encode the Unicode sequence in `s[start, end-1]' into an 8-bit character sequence. The result is stored in a string holding only code points in the range `[U+0000, U+00FF]'.

Pre-condition: `0 <= start <= end <= s.length'. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.
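
The following Python sketch illustrates the two constraints above: the input must not contain surrogate code points, and the output holds only code points up to U+00FF. It uses UTF-8 as a stand-in codec. The function name is hypothetical, and the indexing details of the real STRING type are not modeled.

```python
def encode_region(s: str, start: int, end: int) -> str:
    """Encode s[start:end] as UTF-8, returned as a string of 8-bit values."""
    if not (0 <= start <= end <= len(s)):
        raise ValueError("region out of bounds")
    region = s[start:end]
    # Python strings cannot hold out-of-range code points, so only the
    # surrogate check of the pre-condition needs to be made explicit.
    for ch in region:
        if 0xD800 <= ord(ch) <= 0xDFFF:
            raise ValueError("surrogate code point in input")
    # Encode to UTF-8 bytes, then store each byte as one code point <= U+00FF.
    return region.encode("utf-8").decode("latin-1")

euro = encode_region("a\u20acb", 1, 2)
```

Here `euro` has three characters, one for each UTF-8 byte of U+20AC.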


INIT

PROCEDURE (codec: Codec) INIT(class: CodecClass; 
               preferredName: ARRAY OF CHAR)

NewDecoder

PROCEDURE (codec: Codec) NewDecoder(): Decoder

Creates a decoder object for the codec codec. Note: Some decoders, like those implementing a decryption algorithm, require additional settings before they can be used.


NewEncoder

PROCEDURE (codec: Codec) NewEncoder(): Encoder

Creates an encoder object for the codec codec. By default, any character sequences the encoder cannot handle cause it to raise an exception EncodingError.

Note: Some encoders, like those implementing an encryption algorithm, require additional settings before they can be used.

 
Class Detail: CryptoDecoder
Method Detail

INIT

PROCEDURE (dec: CryptoDecoder) INIT()

Redefines: INIT


SetKey

PROCEDURE (dec: CryptoDecoder) SetKey(key: String8)
 
Class Detail: CryptoEncoder
Method Detail

INIT

PROCEDURE (enc: CryptoEncoder) INIT(codec: Codec)

Redefines: INIT


SetKey

PROCEDURE (enc: CryptoEncoder) SetKey(key: String8)
 
Class Detail: Decoder
Method Detail

Decode

PROCEDURE (dec: Decoder) Decode(data: ARRAY OF CHAR; 
                 start: LONGINT; 
                 end: LONGINT; 
                 b: StringBuffer)

Decode the 8-bit character sequence in `data[start, end-1]' starting with the decoder state dec. The result is appended to the string buffer b. On completion, dec is updated to reflect the decoder's state after the last byte of the sequence has been processed.

Pre-condition: `0 <= start <= end <= LEN(data)'. dec.Start has been called.


End

PROCEDURE (dec: Decoder) End()

The complement operation to dec.Start, freeing all resources allocated earlier. After this method has been called, no other method of this decoder may be called, except for dec.Start. The default implementation is a no-op.


INIT

PROCEDURE (dec: Decoder) INIT()

Reset

PROCEDURE (dec: Decoder) Reset()

Resets the decoder's state to that created by the initial dec.Start. All allocated resources are kept. Using this method, it is possible to use one and the same decoder for several different data streams.


Start

PROCEDURE (dec: Decoder) Start()

Allocates and initializes all resources required for the decoder instance. This method must be called once before dec.Decode. The default implementation is a no-op.

The amount of memory allocated in this step differs significantly across the different kinds of decoders. For example, the space requirements of a Unicode codec are virtually none, while a compression decoder may require several hundred KBytes.
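
The Start / Decode / Reset / End life cycle can be modeled in Python as follows. This is a sketch under assumptions, not the module's code: the class `Utf8Decoder` is hypothetical, a Python list stands in for the StringBuffer, and Python's incremental UTF-8 decoder stands in for the allocated resources.

```python
import codecs

class Utf8Decoder:
    """Hypothetical decoder following the Start/Decode/Reset/End protocol."""

    def __init__(self):
        self._dec = None            # nothing allocated until start()

    def start(self):
        self._dec = codecs.getincrementaldecoder("utf-8")()

    def decode(self, data, start, end, out):
        if self._dec is None:
            raise RuntimeError("start() has not been called")
        if not (0 <= start <= end <= len(data)):
            raise ValueError("region out of bounds")
        out.append(self._dec.decode(data[start:end]))

    def reset(self):
        self._dec.reset()           # keep the object, clear its state

    def end(self):
        self._dec = None            # release; only start() is valid now

dec = Utf8Decoder()
dec.start()
first, second = [], []
dec.decode(b"caf\xc3", 0, 4, first)   # ends inside a two-byte sequence
dec.reset()                           # drop the pending byte, new stream
dec.decode(b"ok", 0, 2, second)
dec.end()
```

The reset between the two streams discards the pending byte from the first one, so the second stream starts in the initial state without a new allocation.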

 
Class Detail: Encoder
Method Detail

Closure

PROCEDURE (enc: Encoder) Closure(b: StringBuffer)

If the encoder still holds any partial data from previous calls to enc.Encode, then flush this data to the buffer b. This method must be called at the end of the data stream for codecs that operate on blocks of data, and for which the last and possibly incomplete block must be handled specially.
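
Base64 is a convenient illustration of why a closing step is needed, since it works on 3-byte blocks and pads the last one. The Python sketch below is a hypothetical model of such an encoder: the class and method names are illustrative, and a `bytearray` stands in for the StringBuffer.

```python
import base64

class Base64Encoder:
    """Hypothetical block encoder: base64 works on 3-byte blocks."""

    def __init__(self):
        self._pending = b""

    def encode(self, data: bytes, out: bytearray):
        buf = self._pending + data
        whole = len(buf) - len(buf) % 3      # largest multiple of 3
        out.extend(base64.b64encode(buf[:whole]))
        self._pending = buf[whole:]          # 0, 1, or 2 leftover bytes

    def closure(self, out: bytearray):
        # Flush the incomplete final block, which needs '=' padding.
        if self._pending:
            out.extend(base64.b64encode(self._pending))
            self._pending = b""

enc = Base64Encoder()
out = bytearray()
enc.encode(b"hell", out)
enc.encode(b"o", out)
enc.closure(out)
```

Without the final `closure` call, the last two input bytes would remain in the encoder and the output would be truncated.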


Encode

PROCEDURE (enc: Encoder) Encode(s: STRING; 
                 start: LONGINT; 
                 end: LONGINT; 
                 b: StringBuffer)
  RAISES EncodingError;

Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.

Pre-condition: `0 <= start <= end <= s.length'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.


EncodeLatin1

PROCEDURE (enc: Encoder) EncodeLatin1(s: ARRAY OF CHAR; 
                       start: LONGINT; 
                       end: LONGINT; 
                       b: StringBuffer)
  RAISES EncodingError;

Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.

Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.


EncodeUTF16

PROCEDURE (enc: Encoder) EncodeUTF16(s: ARRAY OF LONGCHAR; 
                      start: LONGINT; 
                      end: LONGINT; 
                      b: StringBuffer)
  RAISES EncodingError;

Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.

Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.


End

PROCEDURE (enc: Encoder) End()

The complement operation to enc.Start, freeing all resources allocated earlier. After this method has been called, no other method of this encoder may be called, except for enc.Start. The default implementation is a no-op.


INIT

PROCEDURE (enc: Encoder) INIT(escape: Encoder)

Reset

PROCEDURE (enc: Encoder) Reset()

Resets the encoder's state to that created by the initial enc.Start. All allocated resources are kept. Using this method, it is possible to use one and the same encoder for several different data streams.


SetEscapeEncoder

PROCEDURE (enc: Encoder) SetEscapeEncoder(escape: Encoder)

For character sequences that cannot be handled by this encoder, the encoder escape is called. It either raises an EncodingError exception, or translates the characters into a format that can be handled by enc. An example of this is an encoder that creates XML character references from code points that cannot be mapped by enc.
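
The escape mechanism can be sketched in Python with an ASCII encoder that delegates unmappable characters to a pluggable escape function. All names below are hypothetical. The two escape functions model the two behaviours described above: raising an error, and emitting XML character references.

```python
def encode_ascii_with_escape(s: str, escape) -> bytes:
    """Encode `s` as ASCII, delegating unmappable characters to `escape`."""
    out = bytearray()
    for ch in s:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            out.extend(escape(ch))
    return bytes(out)

def xml_charref_escape(ch: str) -> bytes:
    """Translate the character into a decimal XML character reference."""
    return ("&#%d;" % ord(ch)).encode("ascii")

def exception_escape(ch: str) -> bytes:
    """Model of the default behaviour: refuse the character."""
    raise UnicodeEncodeError("ascii", ch, 0, 1, "character cannot be mapped")

escaped = encode_ascii_with_escape("a\u20ac", xml_charref_escape)
```

The result matches what Python's built-in `"xmlcharrefreplace"` error handler produces for the same input.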


Start

PROCEDURE (enc: Encoder) Start()

Allocates and initializes all resources required for the encoder instance. This method must be called once before enc.Encode. The default implementation is a no-op.

The amount of memory allocated in this step differs significantly across the different kinds of encoders. For example, the space requirements of a Unicode codec are virtually none, while a compression encoder may require several MBytes of memory.

 
Class Detail: EncodingError
Method Detail

INIT

PROCEDURE (e: EncodingError) INIT(start: LONGINT; 
               end: LONGINT)

Initialize exception e and set start as its message. start may be NIL, but in this case Exception.GetMessage must be redefined to provide a non-NIL message.

Pre-condition: e is not NIL.

[Description inherited from INIT]

Redefines: INIT, INIT, INIT

 
Class Detail: ExceptionEncoder
Method Detail

EncodeLatin1

PROCEDURE (enc: ExceptionEncoder) EncodeLatin1(s: ARRAY OF CHAR; 
                       start: LONGINT; 
                       end: LONGINT; 
                       b: StringBuffer)
  RAISES EncodingError;

Encode the Latin1 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.

Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.

[Description inherited from EncodeLatin1]

Redefines: EncodeLatin1


EncodeUTF16

PROCEDURE (enc: ExceptionEncoder) EncodeUTF16(s: ARRAY OF LONGCHAR; 
                      start: LONGINT; 
                      end: LONGINT; 
                      b: StringBuffer)
  RAISES EncodingError;

Encode the UTF-16 character sequence in `s[start, end-1]' starting with the encoder state enc. The result is a string holding only code points in the range `[U+0000, U+00FF]', which is appended to the string buffer b. On completion, enc is updated to reflect the encoder's state after the last byte of the sequence has been processed.

Pre-condition: `0 <= start <= end <= LEN(s)'. enc.Start has been called. All code points in `s[start, end-1]' are valid. That is, none is out of range or from the surrogate areas.

[Description inherited from EncodeUTF16]

Redefines: EncodeUTF16

 
Type Detail

BufferLatin1

TYPE BufferLatin1 = ARRAY n OF CHAR

BufferUCS4

TYPE BufferUCS4 = ARRAY n OF UCS4CHAR

CodecClass

TYPE CodecClass = SHORTINT
Procedure Detail

EscapeLatin1

PROCEDURE EscapeLatin1(enc: Encoder; 
                       s: ARRAY OF CHAR; 
                       start: LONGINT; 
                       end: LONGINT; 
                       b: StringBuffer)
  RAISES EncodingError;

EscapeUTF16

PROCEDURE EscapeUTF16(enc: Encoder; 
                      s: ARRAY OF LONGCHAR; 
                      start: LONGINT; 
                      end: LONGINT; 
                      b: StringBuffer)
  RAISES EncodingError;

Register

PROCEDURE Register(codec: Codec; 
                   name: STRING)

Pre-condition: name is an ASCII string.

Variable Detail

exceptionEncoder

VAR exceptionEncoder-: ExceptionEncoder
Constant Detail

compression

CONST compression 

A compression codec tries to translate an 8-bit character sequence into a shorter 8-bit representation.


encryption

CONST encryption 

Encrypts an 8-bit character string into another 8-bit string. Because encryption needs parameters such as the encryption key and an initialization vector as input, the shorthand methods Codec.DecodeRegion and Codec.EncodeRegion cannot be used with encryption codecs.


invalidChar

CONST invalidChar 

The character cannot be mapped into the character range of the target encoding.


invalidData

CONST invalidData 

The input data of an operation is malformed. For example, a decode instruction operating on 32-bit values is called with a number of bytes that is not a multiple of 4.
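
The length check from this example can be sketched in Python for big-endian 32-bit code units. The function name and the use of `ValueError` are illustrative assumptions. How this module reports invalidData to the caller is not shown here.

```python
import struct

def decode_ucs4_be(data: bytes) -> str:
    """Decode big-endian 32-bit code units into a string."""
    # Each code point occupies exactly 4 bytes; anything else is malformed.
    if len(data) % 4 != 0:
        raise ValueError("invalid data: length is not a multiple of 4")
    code_points = struct.unpack(">%dI" % (len(data) // 4), data)
    return "".join(chr(cp) for cp in code_points)

text = decode_ucs4_be(b"\x00\x00\x00A\x00\x00\x20\xac")
```

An input of three bytes is rejected before any decoding is attempted.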


transport

CONST transport 

A transport codec transforms an 8-bit character string into another 8-bit representation, typically escaping some character codes on the way.


unicode

CONST unicode 

A Unicode codec translates a sequence of 32-bit Unicode code points into an 8-bit character sequence, and vice versa.