Cross-Platform C++

ot::cvt
class CodeConverter

#include "ot/cvt/CodeConverter.h"

Base classes: ot::ManagedObject, ot::CodeConverterBase

Common base class for all code converters. A CodeConverter translates Unicode characters into byte sequences and vice versa. The design of CodeConverter is based on the std::codecvt class from the C++ standard library.

OpenTop is supplied with CodeConverters for many common encodings, including UTF-8, UTF-16 and Latin1.

See also:
CodeConverterFactory



Constructor/Destructor Summary
CodeConverter()
         Creates a CodeConverter with default values.

Method Summary
 virtual bool alwaysNoConversion() const
         Tests if this CodeConverter is using the same encoding as the OpenTop internal encoding.
 virtual Result decode(const Byte* from, const Byte* from_end, const Byte*& from_next, CharType* to, CharType* to_limit, CharType*& to_next)
         Decodes an array of bytes into an array of CharType characters that represent Unicode characters in the internal OpenTop encoding.
 virtual Result encode(const CharType* from, const CharType* from_end, const CharType*& from_next, Byte* to, Byte* to_limit, Byte*& to_next)
         Encodes an array of CharType characters, representing Unicode characters in the internal OpenTop encoding, into an array of bytes.
 virtual size_t getDecodedLength(const Byte* from, const Byte* from_end) const
         Returns the number of Unicode characters that would be created by decoding the array of bytes starting at from.
 virtual String getEncodingName() const
         Returns the canonical name for the encoding handled by this CodeConverter.
 CharAction getInvalidCharAction() const
         Returns the policy for dealing with invalid byte sequences.
 UCS4Char getInvalidCharReplacement() const
         Returns the Unicode character that will be used when this CodeConverter detects an invalid byte sequence.
 virtual size_t getMaxEncodedLength() const
         Returns the maximum number of bytes used to encode a single Unicode character up to U+10FFFF.
 CharAction getUnmappableCharAction() const
         Returns the policy for dealing with Unicode characters that cannot be mapped into the target encoding.
 UCS4Char getUnmappableCharReplacement() const
         Returns the Unicode character that will be used when this CodeConverter detects an unmappable Unicode character.
protected  void handleInvalidByteSequence(const Byte* from, size_t len) const
         Helper function that simply throws a MalformedInputException.
protected  virtual Result handleUnmappableCharacter(UCS4Char ch, Byte* to, Byte* to_limit, Byte*& to_next)
         Helper function called by derived classes' encode() method when it encounters an unmappable Unicode character.
protected  void internalEncodingError(const CharType* from, size_t len) const
         Helper function called by derived classes when they encounter a badly encoded internal CharType array.
 void setInvalidCharAction(CharAction eAction)
         Sets the policy for dealing with badly encoded byte sequences.
 void setInvalidCharReplacement(UCS4Char ch)
         Sets the replacement Unicode character used when the CodeConverter detects an invalid byte sequence.
 void setUnmappableCharAction(CharAction eAction)
         Sets the policy for dealing with Unicode characters that cannot be mapped into the target encoding.
 void setUnmappableCharReplacement(UCS4Char ch)
         Sets the replacement Unicode character used when the CodeConverter detects a Unicode character that cannot be encoded into the target encoding.
protected  void throwUnsupported(unsigned long illegalChar) const
        

Methods inherited from class ot::CodeConverterBase
IsLegalUTF16, IsLegalUTF8, UTF8Decode, UTF8Encode

Methods inherited from class ot::ManagedObject
addRef, getRefCount, onFinalRelease, operator=, release

Enumerations

enum CharAction { abort, replace }


Constructor/Destructor Detail

CodeConverter

 CodeConverter()
Creates a CodeConverter with default values.


Method Detail

alwaysNoConversion

virtual bool alwaysNoConversion() const
Tests if this CodeConverter is using the same encoding as the OpenTop internal encoding. If so, the reading and writing of characters can be optimized to bypass the encoding process.

Returns:
true if this CodeConverter encodes Unicode characters into the OpenTop internal encoding; false otherwise

decode

virtual Result decode(const Byte* from,
                      const Byte* from_end,
                      const Byte*& from_next,
                      CharType* to,
                      CharType* to_limit,
                      CharType*& to_next)
Decodes an array of bytes into an array of CharType characters that represent Unicode characters in the internal OpenTop encoding.

Parameters:
from - pointer to the start of the byte array to decode
from_end - pointer to the next byte past the end of the byte array
from_next - return parameter which holds a pointer to the next byte in the array which has yet to be processed
to - pointer to the start of a CharType array which will hold the result of the decoding operation
to_limit - pointer to the next CharType past the end of the result array
to_next - return parameter which holds a pointer to the next CharType in the result array
Returns:
a Result code indicating the success of the operation.
Exceptions:
MalformedInputException - if an invalid byte sequence is detected and the policy for this CodeConverter is to abort in this situation.
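
Example:
The pointer convention follows std::codecvt: the caller supplies half-open input and output ranges, and the converter reports how far it got through from_next and to_next. The following is a minimal sketch of that convention, not OpenTop code. It assumes a Latin-1 source encoding and a 32-bit CharType; the names Byte, Ch, Res, decodeLatin1 and decodeAll are illustrative stand-ins, and the two-value Res enum is an assumption rather than the library's actual Result type.

```cpp
#include <cstddef>
#include <string>

// Stand-ins for the OpenTop types; every name here is an assumption.
typedef unsigned char Byte;
typedef char32_t Ch;                      // stand-in for CharType
enum Res { res_ok, res_output_full };     // stand-in for Result

// Decodes Latin-1, where each byte maps directly to one code point.
// Uses the same six-pointer convention as decode().
Res decodeLatin1(const Byte* from, const Byte* from_end, const Byte*& from_next,
                 Ch* to, Ch* to_limit, Ch*& to_next)
{
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        if (to_next == to_limit)
            return res_output_full;       // caller must drain the buffer and call again
        *to_next++ = static_cast<Ch>(*from_next++);
    }
    return res_ok;
}

// Drives the decoder with a deliberately small output buffer,
// resuming from from_next after each call.
std::u32string decodeAll(const std::string& bytes)
{
    const Byte* p = reinterpret_cast<const Byte*>(bytes.data());
    const Byte* end = p + bytes.size();
    std::u32string out;
    Ch buf[4];
    Res r;
    do {
        const Byte* next;
        Ch* bufNext;
        r = decodeLatin1(p, end, next, buf, buf + 4, bufNext);
        out.append(buf, bufNext);         // keep whatever was produced this round
        p = next;                         // continue with the unprocessed bytes
    } while (r == res_output_full);
    return out;
}
```

The loop in decodeAll shows why both return pointers matter: to_next tells the caller how much output to consume, and from_next tells it where to restart.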

encode

virtual Result encode(const CharType* from,
                      const CharType* from_end,
                      const CharType*& from_next,
                      Byte* to,
                      Byte* to_limit,
                      Byte*& to_next)
Encodes an array of CharType characters, representing Unicode characters in the internal OpenTop encoding, into an array of bytes.

Parameters:
from - pointer to the start of the CharType array to encode
from_end - pointer to the next CharType past the end of the input array
from_next - return parameter which holds a pointer to the next CharType in the array which has yet to be processed
to - pointer to the start of a byte array which will hold the result of the encoding operation
to_limit - pointer to the next byte past the end of the result array
to_next - return parameter which holds a pointer to the next byte in the result array
Returns:
a Result code indicating the success of the operation.
Exceptions:
UnmappableCharacterException - if an unmappable Unicode character is detected and the policy for this CodeConverter is to abort in this situation.
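
Example:
When the output array fills up part-way through the input, an encoder following this convention would be expected to stop on a character boundary so that the caller can resume from from_next. The sketch below illustrates this for UTF-8 output. It is not OpenTop code: it assumes a 32-bit CharType holding valid Unicode scalar values, and the names Byte, Ch, Res, utf8Length and encodeUtf8 are hypothetical.

```cpp
#include <cstddef>

typedef unsigned char Byte;               // stand-in for ot::Byte
typedef char32_t Ch;                      // stand-in for CharType (assumed UTF-32)
enum Res { res_ok, res_output_full };     // stand-in for Result

// Number of UTF-8 bytes needed for one code point (assumes a valid scalar value).
std::size_t utf8Length(Ch c)
{
    return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}

// Encodes code points as UTF-8 using the encode() pointer convention.
// A character is written only if all of its bytes fit before to_limit.
Res encodeUtf8(const Ch* from, const Ch* from_end, const Ch*& from_next,
               Byte* to, Byte* to_limit, Byte*& to_next)
{
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        const Ch c = *from_next;
        const std::size_t n = utf8Length(c);
        if (static_cast<std::size_t>(to_limit - to_next) < n)
            return res_output_full;       // from_next still points at c
        if (n == 1) {
            *to_next++ = static_cast<Byte>(c);
        } else {
            // Lead byte: marker bits for an n-byte sequence plus the top bits of c.
            static const Byte lead[5] = { 0, 0, 0xC0, 0xE0, 0xF0 };
            *to_next++ = static_cast<Byte>(lead[n] | (c >> (6 * (n - 1))));
            // Continuation bytes: 10xxxxxx, six bits each, most significant first.
            for (std::size_t i = n - 1; i > 0; --i)
                *to_next++ = static_cast<Byte>(0x80 | ((c >> (6 * (i - 1))) & 0x3F));
        }
        ++from_next;
    }
    return res_ok;
}
```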

getDecodedLength

virtual size_t getDecodedLength(const Byte* from,
                                const Byte* from_end) const
Returns the number of Unicode characters that would be created by decoding the array of bytes starting at from. Depending on the internal encoding in use by OpenTop, this is not necessarily the same as the number of CharType characters required to represent those Unicode characters.

Parameters:
from - pointer to the start of an encoded array of bytes
from_end - pointer to the next byte after the end of the array
Returns:
the number of Unicode characters represented by the byte sequence
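
Example:
The difference between Unicode characters and CharType characters can be seen with UTF-8 input and a UTF-16 internal encoding. The sketch below is not OpenTop code; it assumes well-formed UTF-8 input, and countUtf8Chars and countUtf16Units are hypothetical helper names.

```cpp
#include <cstddef>

typedef unsigned char Byte;   // stand-in for ot::Byte

// Counts the Unicode characters in well-formed UTF-8 by counting
// every byte that is not a continuation byte (10xxxxxx).
std::size_t countUtf8Chars(const Byte* from, const Byte* from_end)
{
    std::size_t n = 0;
    for (; from != from_end; ++from)
        if ((*from & 0xC0) != 0x80)
            ++n;
    return n;
}

// Counts the UTF-16 code units needed for the same text: a 4-byte
// UTF-8 sequence (lead byte 11110xxx) becomes a surrogate pair,
// so it contributes two units instead of one.
std::size_t countUtf16Units(const Byte* from, const Byte* from_end)
{
    std::size_t n = 0;
    for (; from != from_end; ++from) {
        if ((*from & 0xC0) != 0x80)
            ++n;
        if ((*from & 0xF8) == 0xF0)
            ++n;
    }
    return n;
}
```

For input containing a character outside the Basic Multilingual Plane, the second count exceeds the first, which is the case the description above warns about.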

getEncodingName

virtual String getEncodingName() const
Returns the canonical name for the encoding handled by this CodeConverter.


getInvalidCharAction

CharAction getInvalidCharAction() const
Returns the policy for dealing with invalid byte sequences.

See also:
setInvalidCharAction()

getInvalidCharReplacement

UCS4Char getInvalidCharReplacement() const
Returns the Unicode character that will be used when this CodeConverter detects an invalid byte sequence.

See also:
getInvalidCharAction()

getMaxEncodedLength

virtual size_t getMaxEncodedLength() const
Returns the maximum number of bytes used to encode a single Unicode character up to U+10FFFF.
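
Example:
One use of this value is sizing an output buffer so that a single encode() call cannot run out of room. The sketch below is illustrative only; makeEncodeBuffer is a hypothetical helper, and the value passed as maxEncodedLength should be queried from the converter in use rather than assumed.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned char Byte;   // stand-in for ot::Byte

// Returns a buffer large enough to hold the encoded form of charCount
// Unicode characters, given the value reported by getMaxEncodedLength().
std::vector<Byte> makeEncodeBuffer(std::size_t charCount, std::size_t maxEncodedLength)
{
    return std::vector<Byte>(charCount * maxEncodedLength);
}
```

Because every Unicode character occupies at least one CharType in UTF-8, UTF-16 or UTF-32, passing the length of the CharType array as charCount gives a safe upper bound even when the exact character count is not known.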


getUnmappableCharAction

CharAction getUnmappableCharAction() const
Returns the policy for dealing with Unicode characters that cannot be mapped into the target encoding.

See also:
setUnmappableCharAction()

getUnmappableCharReplacement

UCS4Char getUnmappableCharReplacement() const
Returns the Unicode character that will be used when this CodeConverter detects an unmappable Unicode character.

See also:
getUnmappableCharAction()

handleInvalidByteSequence

protected void handleInvalidByteSequence(const Byte* from,
                                         size_t len) const
Helper function that simply throws a MalformedInputException.

Exceptions:
MalformedInputException - always

handleUnmappableCharacter

protected virtual Result handleUnmappableCharacter(UCS4Char ch,
                                                   Byte* to,
                                                   Byte* to_limit,
                                                   Byte*& to_next)
Helper function called by derived classes' encode() method when it encounters an unmappable Unicode character.

Parameters:
ch - the unmappable Unicode character
to - pointer to the next byte in the output byte array for the current encoding operation
to_limit - pointer to the next byte after the end of the output byte buffer
to_next - return parameter which holds a pointer to the next byte in the result array
Returns:
a Result code indicating the success of the operation

internalEncodingError

protected void internalEncodingError(const CharType* from,
                                     size_t len) const
Helper function called by derived classes when they encounter a badly encoded internal CharType array.

Parameters:
from - pointer to the start of the array
len - length of the array

setInvalidCharAction

void setInvalidCharAction(CharAction eAction)
Sets the policy for dealing with badly encoded byte sequences. Two policies are supported: replace or abort.

When the action is set to CodeConverter::abort, a MalformedInputException is thrown by decode() when an invalid byte sequence is decoded. When the action is set to CodeConverter::replace, the invalid byte sequence is decoded as the replacement character returned from getInvalidCharReplacement().

Parameters:
eAction - the required action to take.
See also:
getInvalidCharAction()
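
Example:
The two policies can be illustrated with a decoder for US-ASCII, where any byte above 0x7F is invalid. This is a sketch under stated assumptions, not OpenTop code: Action stands in for CharAction, std::runtime_error stands in for MalformedInputException, decodeAscii is a hypothetical function, and the default replacement of U+FFFD is an assumption rather than the documented library default.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

typedef unsigned char Byte;               // stand-in for ot::Byte
enum Action { act_abort, act_replace };   // stand-in for CharAction

// Decodes US-ASCII; any byte above 0x7F is treated as an invalid sequence.
// std::runtime_error stands in for MalformedInputException.
std::u32string decodeAscii(const std::string& bytes, Action action,
                           char32_t replacement = 0xFFFD)
{
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const Byte b = static_cast<Byte>(bytes[i]);
        if (b < 0x80)
            out.push_back(static_cast<char32_t>(b));
        else if (action == act_replace)
            out.push_back(replacement);   // substitute and keep going
        else
            throw std::runtime_error("invalid byte sequence");
    }
    return out;
}
```

With act_replace the output always has one character per input byte; with act_abort the caller must be prepared to catch the exception.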

setInvalidCharReplacement

void setInvalidCharReplacement(UCS4Char ch)
Sets the replacement Unicode character used when the CodeConverter detects an invalid byte sequence.

See also:
setInvalidCharAction()

setUnmappableCharAction

void setUnmappableCharAction(CharAction eAction)
Sets the policy for dealing with Unicode characters that cannot be mapped into the target encoding. Two policies are supported: replace or abort.

When the action is set to CodeConverter::abort, an UnmappableCharacterException is thrown by encode() when an unmappable Unicode character is encoded. When the action is set to CodeConverter::replace, the unmappable character is replaced by the character returned from getUnmappableCharReplacement().

Parameters:
eAction - the required action to take.
See also:
getUnmappableCharAction()
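
Example:
The encoding direction behaves the same way. The sketch below uses ISO-8859-1 (Latin1) as the target, where code points above U+00FF cannot be represented. It is not OpenTop code: Action stands in for CharAction, std::runtime_error stands in for UnmappableCharacterException, encodeLatin1 is a hypothetical function, and the default replacement of '?' is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

enum Action { act_abort, act_replace };   // stand-in for CharAction

// Encodes to ISO-8859-1; code points above U+00FF are unmappable.
// std::runtime_error stands in for UnmappableCharacterException.
// Assumes the replacement character is itself representable in Latin-1.
std::string encodeLatin1(const std::u32string& text, Action action,
                         char32_t replacement = U'?')
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t c = text[i];
        if (c > 0xFF) {
            if (action == act_abort)
                throw std::runtime_error("unmappable character");
            c = replacement;              // substitute and keep going
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(c)));
    }
    return out;
}
```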

setUnmappableCharReplacement

void setUnmappableCharReplacement(UCS4Char ch)
Sets the replacement Unicode character used when the CodeConverter detects a Unicode character that cannot be encoded into the target encoding.

See also:
setUnmappableCharAction()

throwUnsupported

protected void throwUnsupported(unsigned long illegalChar) const



Copyright © 2000-2003 ElCel Technology