Cross-Platform C++

ot::io
class InputStreamReader

#include "ot/io/InputStreamReader.h"

Inheritance graph: ot::io::Reader, ot::SynchronizedObject, ot::ManagedObject

An InputStreamReader reads raw bytes from an InputStream and translates them into Unicode characters using an instance of the CodeConverter class to perform the translation. The encoding of the underlying byte input stream may be specified by name or by providing an instance of a CodeConverter. If no encoding is specified, the system will use a default encoding.

A CodeConverter can be configured to behave in one of two specified ways when it encounters an encoding error. By default, encoding errors are silently dealt with by skipping the invalid byte sequence and returning the replacement character U+FFFD to the application. A stricter scheme is also available that treats encoding errors as non-recoverable and throws a MalformedInputException. Overloaded versions of the InputStreamReader constructors can be used to explicitly set the required policy.
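
The difference between the two policies can be illustrated without the library. The following is a minimal sketch in standard C++, not the CodeConverter implementation: ErrorPolicy, MalformedInput and decodeUtf8 are hypothetical names, and the sketch replaces each invalid byte individually, which may differ from how a real CodeConverter resynchronizes after an error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these names belong to the ot::io API.
enum class ErrorPolicy { Replace, Strict };

struct MalformedInput : std::runtime_error {
    MalformedInput() : std::runtime_error("malformed UTF-8 sequence") {}
};

// Decodes UTF-8 bytes into UTF-32 code points.
// Replace: each invalid byte becomes U+FFFD and decoding resumes at the next byte.
// Strict:  the first invalid sequence throws MalformedInput.
std::u32string decodeUtf8(const std::string& bytes, ErrorPolicy policy)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;   // stays 0 when the lead byte is invalid
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        bool ok = (len != 0) && (i + len <= bytes.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;       // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        static const char32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (ok && (cp < minForLen[len] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) { out.push_back(cp); i += len; }
        else if (policy == ErrorPolicy::Strict) throw MalformedInput();
        else { out.push_back(0xFFFD); ++i; }
    }
    return out;
}

// Usage: the same malformed input under both policies.
void demoPolicies()
{
    const std::string bad("A\xFFZ");   // 0xFF can never appear in UTF-8
    assert(decodeUtf8(bad, ErrorPolicy::Replace) == U"A\uFFFDZ");
    bool threw = false;
    try { decodeUtf8(bad, ErrorPolicy::Strict); }
    catch (const MalformedInput&) { threw = true; }
    assert(threw);
}
```

The lenient policy suits display or logging of untrusted text, while the strict policy suits parsers that must not accept silently altered input.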

To improve efficiency, the InputStreamReader contains a byte buffer into which it reads bytes from the underlying input stream. Therefore more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
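
The practical consequence of read-ahead is that the position of the underlying stream is not a reliable indicator of how many characters the application has consumed. The sketch below shows the effect with two hypothetical classes (CountingSource and BufferedAsciiReader are not part of the library); the 8-byte buffer size and the ASCII-only decoding are assumptions made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical byte source that records how far it has been consumed.
class CountingSource {
public:
    explicit CountingSource(std::string data) : m_data(std::move(data)), m_pos(0) {}
    // Copies up to n bytes into dst and returns the number copied (0 at end).
    std::size_t read(char* dst, std::size_t n) {
        const std::size_t avail = std::min(n, m_data.size() - m_pos);
        std::copy(m_data.begin() + m_pos, m_data.begin() + m_pos + avail, dst);
        m_pos += avail;
        return avail;
    }
    std::size_t position() const { return m_pos; }
private:
    std::string m_data;
    std::size_t m_pos;
};

// Hypothetical reader with an 8-byte internal buffer (ASCII only, for brevity).
class BufferedAsciiReader {
public:
    explicit BufferedAsciiReader(CountingSource& src) : m_src(src), m_len(0), m_next(0) {}
    // Returns the next character, or -1 at end of input.
    int read() {
        if (m_next == m_len) {                 // buffer exhausted: refill it
            m_len = m_src.read(m_buf, sizeof(m_buf));
            m_next = 0;
            if (m_len == 0) return -1;
        }
        return static_cast<unsigned char>(m_buf[m_next++]);
    }
private:
    CountingSource& m_src;
    char m_buf[8];
    std::size_t m_len, m_next;
};

// Usage: one character is requested, yet the source advances by a full buffer.
void demoReadAhead() {
    CountingSource src("hello, world");
    BufferedAsciiReader rdr(src);
    assert(rdr.read() == 'h');
    assert(src.position() == 8);
}
```

For this reason, code that wraps an InputStream in a reader should generally not continue to read from the InputStream directly.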

The following example demonstrates a simple transcoding function. It decodes a UTF-8 encoded file into a stream of Unicode characters and then writes the characters out as bytes using the UTF-16 encoding:

    File in(OT_T("utf8.txt"));
    File out(OT_T("utf16.txt"));

    RefPtr<Reader> rpRdr  = new InputStreamReader(
                            new FileInputStream(in), OT_T("UTF-8") );

    RefPtr<Writer> rpWtr = new OutputStreamWriter(
                           new FileOutputStream(out), OT_T("UTF-16") );

    CharType buffer[1024];
    long count;
    // bufLen is a count of CharType positions, not bytes
    while( (count=rpRdr->read(buffer, sizeof(buffer)/sizeof(buffer[0]))) != Reader::EndOfFile)
    {
        rpWtr->write(buffer, count);
    }
    rpWtr->flush();

Multi-threaded considerations:
As shown on the inheritance graph, InputStreamReader derives from SynchronizedObject, which gives it the ability to protect its internal state from concurrent access from multiple threads. All public methods are synchronized for safe concurrent access.
See also:
OutputStreamWriter



Constructor/Destructor Summary
InputStreamReader(InputStream* pInputStream)
         Constructs an InputStreamReader with pInputStream as the contained InputStream.
InputStreamReader(InputStream* pInputStream, const String& encoding)
         Constructs an InputStreamReader with pInputStream as the contained InputStream and encoding as the specified encoding name.
InputStreamReader(InputStream* pInputStream, CodeConverter* pDecoder)
         Constructs an InputStreamReader with pInputStream as the contained InputStream and pDecoder as the CodeConverter which will translate bytes from the input stream into Unicode characters.
InputStreamReader(InputStream* pInputStream, const String& encoding, bool bStrict)
         Constructs an InputStreamReader with pInputStream as the contained InputStream and encoding as the specified encoding name, using bStrict to select the policy for malformed byte sequences.
~InputStreamReader()
         The destructor frees resources associated with this InputStreamReader.

Method Summary
 virtual void close()
         Closes the Reader and its associated InputStream.
 RefPtr< CodeConverter > getDecoder() const
         Returns the CodeConverter used by this InputStreamReader to decode bytes into Unicode characters.
 String getEncoding() const
         Returns the canonical name of the encoding employed by the underlying byte stream.
 virtual long read(CharType* pBuffer, size_t bufLen)
         Reads up to bufLen characters into the supplied buffer.
 virtual Character readAtomic()
         Reads a single Unicode Character.
 virtual long readAtomic(CharType* pBuffer, size_t bufLen)
         Reads an integral number of Unicode characters into the supplied CharType buffer.
static String SenseEncoding(InputStream* pInputStream, size_t& BOMSize)
         A static helper function that attempts to guess the encoding used by an InputStream by checking the initial byte sequence for a Byte Order Mark (BOM).

Methods inherited from class ot::ManagedObject
addRef, getRefCount, onFinalRelease, operator=, release

Methods inherited from class ot::io::Reader
getLock, mark, markSupported, read, reset, skip, skipAtomic

Methods inherited from class ot::SynchronizedObject
lock, unlock

Constructor/Destructor Detail

InputStreamReader

 InputStreamReader(InputStream* pInputStream)
Constructs an InputStreamReader with pInputStream as the contained InputStream. The InputStream will be decoded using the default CodeConverter returned from the CodeConverterFactory. Malformed byte sequences will be silently converted into replacement characters.

Parameters:
pInputStream - the contained InputStream.
Exceptions:
NullPointerException - if pInputStream is null.
See also:
CodeConverterFactory::getDefaultConverter()

InputStreamReader

 InputStreamReader(InputStream* pInputStream,
                   const String& encoding)
Constructs an InputStreamReader with pInputStream as the contained InputStream and encoding as the specified encoding name. The InputStream will be decoded using a CodeConverter obtained from the CodeConverterFactory. Malformed byte sequences will be silently converted into replacement characters.

Parameters:
pInputStream - the contained InputStream.
encoding - the name of the encoding (e.g. "UTF-8")
Exceptions:
NullPointerException - if pInputStream is null.
UnsupportedEncodingException - if the CodeConverterFactory is unable to create a CodeConverter for the specified encoding

InputStreamReader

 InputStreamReader(InputStream* pInputStream,
                   CodeConverter* pDecoder)
Constructs an InputStreamReader with pInputStream as the contained InputStream and pDecoder as the CodeConverter which will translate bytes from the input stream into Unicode characters. The policy for dealing with malformed byte sequences can be specified using the CodeConverter::setInvalidCharAction() method.

Parameters:
pInputStream - the contained InputStream.
pDecoder - the CodeConverter to use for decoding bytes into Unicode characters
Exceptions:
NullPointerException - if either pInputStream or pDecoder is null.

InputStreamReader

 InputStreamReader(InputStream* pInputStream,
                   const String& encoding,
                   bool bStrict)
Constructs an InputStreamReader with pInputStream as the contained InputStream and encoding as the specified encoding name. The InputStream will be decoded using a CodeConverter obtained from the CodeConverterFactory. The policy for the treatment of malformed byte sequences is specified using the bStrict parameter.

Parameters:
pInputStream - the contained InputStream.
encoding - the name of the encoding (e.g. "UTF-8")
bStrict - when true, the CodeConverter is instructed to throw a MalformedInputException when it encounters an invalid byte sequence; otherwise invalid byte sequences are silently converted into a replacement character
Exceptions:
NullPointerException - if pInputStream is null.
UnsupportedEncodingException - if the CodeConverterFactory is unable to create a CodeConverter for the specified encoding

~InputStreamReader

virtual ~InputStreamReader()
The destructor frees resources associated with this InputStreamReader. The underlying InputStream is not explicitly closed, but it will be automatically closed when no further references to it exist.


Method Detail

close

virtual void close()
Closes the Reader and its associated InputStream. Once a Reader is closed, all system resources associated with the Reader are released, preventing any further read(), mark(), reset() or skip() operations. However, further calls to close() have no effect.

Exceptions:
IOException - if an I/O error occurs.
Multi-threaded considerations:
Synchronized for safe access from multiple concurrent threads.

getDecoder

RefPtr< CodeConverter > getDecoder() const
Returns the CodeConverter used by this InputStreamReader to decode bytes into Unicode characters.

Multi-threaded considerations:
Synchronized for safe access from multiple concurrent threads.

getEncoding

String getEncoding() const
Returns the canonical name of the encoding employed by the underlying byte stream.

Multi-threaded considerations:
Synchronized for safe access from multiple concurrent threads.

read

virtual long read(CharType* pBuffer,
                  size_t bufLen)
Reads up to bufLen characters into the supplied buffer. The CharType characters read into the supplied buffer may not make up an integral number of Unicode characters. For example, in the case where the internal character encoding is UTF-16, if the passed buffer has room for just one CharType, and the next Unicode character is higher than U+FFFF, then only the first half of the UTF-16 surrogate pair will be returned. The second half of the pair will be returned on the next read operation.

Parameters:
pBuffer - A pointer to the buffer into which the characters will be copied. This must be capable of holding at least bufLen CharType positions.
bufLen - The maximum number of CharType characters to read into the passed buffer. If this exceeds the maximum value that can be represented by a long integer, it is reduced to a value that can be so represented.
Returns:
The number of CharType characters read or Reader::EndOfFile if the end of the stream has been reached.
Exceptions:
IllegalArgumentException - if bufLen is zero
NullPointerException - if pBuffer is null
IOException - if an error occurs while reading from the character stream
See also:
readAtomic(CharType*, size_t)
Multi-threaded considerations:
Synchronized for safe access from multiple concurrent threads.
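
The surrogate-pair case described above can be reproduced in isolation. This is a sketch in standard C++ that assumes a 16-bit code unit (char16_t standing in for CharType); toUtf16 and readUnits are hypothetical helpers rather than library functions, and -1 stands in for Reader::EndOfFile.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encodes one code point as UTF-16: one unit, or a surrogate pair above U+FFFF.
std::u16string toUtf16(char32_t cp) {
    std::u16string s;
    if (cp < 0x10000) {
        s.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        s.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        s.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return s;
}

// Hypothetical non-atomic read: copies up to bufLen code units and ignores
// pair boundaries. Returns the number copied, or -1 at end of input.
long readUnits(const std::u16string& src, std::size_t& pos,
               char16_t* pBuffer, std::size_t bufLen) {
    assert(pBuffer != nullptr && bufLen > 0);
    if (pos == src.size()) return -1;
    std::size_t n = 0;
    while (n < bufLen && pos < src.size()) pBuffer[n++] = src[pos++];
    return static_cast<long>(n);
}

// Usage: a one-unit buffer receives the two halves of U+1F600 on separate calls.
void demoSplitPair() {
    const std::u16string text = toUtf16(0x1F600);   // D83D DE00
    std::size_t pos = 0;
    char16_t one[1];
    assert(readUnits(text, pos, one, 1) == 1 && one[0] == 0xD83D);
    assert(readUnits(text, pos, one, 1) == 1 && one[0] == 0xDE00);
}
```

Callers that process one Unicode character at a time therefore need either a buffer of at least two code units plus their own pair handling, or one of the atomic read operations.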

readAtomic

virtual Character readAtomic()
Reads a single Unicode Character. The Character class is capable of representing all Unicode characters up to U+10FFFF.

Returns:
The Character read or Character::EndOfFileCharacter if the end of the stream has been reached.
Exceptions:
AtomicReadException - if the next CharType is not on a character sequence boundary (i.e. a non-atomic read operation has been performed previously which resulted in an incomplete multi-character sequence being read)
IOException - if an error occurs while reading from the character stream
See also:
read()
Multi-threaded considerations:
Synchronized for safe access from multiple concurrent threads.

readAtomic

virtual long readAtomic(CharType* pBuffer,
                        size_t bufLen)
Reads an integral number of Unicode characters into the supplied CharType buffer. Reads as many characters as are available and will fit into the supplied CharType buffer. Unicode characters that are encoded internally into multi-character sequences are either read in their entirety or not at all.

A return value of zero indicates that the supplied buffer was not large enough to hold the multi-character sequence for one Unicode character.

Parameters:
pBuffer - A pointer to the buffer into which the characters will be copied. This must be capable of holding at least bufLen CharType positions.
bufLen - The maximum number of CharType characters to read into the passed buffer. If this exceeds the maximum value that can be represented by a long integer, it is reduced to a value that can be so represented.
Returns:
The number of CharType characters read or Reader::EndOfFile if the end of the stream has been reached.
Exceptions:
IllegalArgumentException - if bufLen is zero
NullPointerException - if pBuffer is null
AtomicReadException - if the next CharType is not on a character sequence boundary (i.e. a non-atomic read operation has been performed previously which resulted in an incomplete multi-character sequence being read)
IOException - if an error occurs while reading from the character stream
See also:
readAtomic()
Multi-threaded considerations:
Synchronized for safe access from multiple concurrent threads.
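
The all-or-nothing rule, including the zero return value, can be sketched as follows. The example again assumes 16-bit code units (char16_t standing in for CharType); readWhole and isHighSurrogate are hypothetical helpers, -1 stands in for Reader::EndOfFile, and the AtomicReadException case is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Hypothetical atomic read over UTF-16: a character is copied whole or not at all.
// Returns the number of code units copied, 0 if the next character does not fit
// in the buffer, or -1 at end of input.
long readWhole(const std::u16string& src, std::size_t& pos,
               char16_t* pBuffer, std::size_t bufLen) {
    assert(pBuffer != nullptr && bufLen > 0);
    if (pos == src.size()) return -1;
    std::size_t n = 0;
    while (pos < src.size()) {
        // A high surrogate followed by another unit is treated as a two-unit character.
        const std::size_t width =
            (isHighSurrogate(src[pos]) && pos + 1 < src.size()) ? 2 : 1;
        if (n + width > bufLen) break;          // the whole character does not fit
        for (std::size_t k = 0; k < width; ++k) pBuffer[n++] = src[pos++];
    }
    return static_cast<long>(n);
}

// Usage: with room for two units, the pair is never split across calls.
void demoAtomicRead() {
    const std::u16string text = u"a\U0001F600b";   // 'a', D83D, DE00, 'b'
    std::size_t pos = 0;
    char16_t buf[2];
    assert(readWhole(text, pos, buf, 2) == 1);     // 'a' only; the pair would not fit
    assert(readWhole(text, pos, buf, 2) == 2);     // the complete pair
    assert(readWhole(text, pos, buf, 2) == 1);     // 'b'
    assert(readWhole(text, pos, buf, 2) == -1);    // end of input
}
```

A caller that receives zero should retry with a larger buffer rather than loop on the same call.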

SenseEncoding

static String SenseEncoding(InputStream* pInputStream,
                            size_t& BOMSize)
A static helper function that attempts to guess the encoding used by an InputStream by checking the initial byte sequence for a Byte Order Mark (BOM). This function uses mark() and reset() to re-position the input stream to its original location, so an InputStream that supports these operations must be used.

Parameters:
pInputStream - the byte input stream to test
BOMSize - a return parameter containing the size of the BOM detected
Returns:
a String containing the encoding name
Exceptions:
NullPointerException - if pInputStream is null.
IOException - if pInputStream does not support the mark operation.
See also:
InputStream::mark()
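
BOM detection itself amounts to a comparison against a small table of byte signatures. The sketch below is a hypothetical helper (senseBom), not the library's SenseEncoding(): it inspects bytes that the caller has already peeked from the stream, and its behaviour when no BOM is present (an empty name and a size of zero) is an assumption, since the return value for that case is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns an encoding name for a recognized BOM and sets
// bomSize to its length, or returns an empty string with bomSize == 0.
std::string senseBom(const std::string& head, std::size_t& bomSize) {
    struct Bom { const char* bytes; std::size_t size; const char* name; };
    // Longer signatures come first: the UTF-32LE BOM begins with the UTF-16LE BOM.
    static const Bom table[] = {
        { "\x00\x00\xFE\xFF", 4, "UTF-32BE" },
        { "\xFF\xFE\x00\x00", 4, "UTF-32LE" },
        { "\xEF\xBB\xBF",     3, "UTF-8"    },
        { "\xFE\xFF",         2, "UTF-16BE" },
        { "\xFF\xFE",         2, "UTF-16LE" },
    };
    for (const Bom& b : table) {
        if (head.size() >= b.size && head.compare(0, b.size, b.bytes, b.size) == 0) {
            bomSize = b.size;
            return b.name;
        }
    }
    bomSize = 0;
    return std::string();
}

// Usage: a UTF-8 BOM in front of an XML declaration.
void demoSenseBom() {
    std::size_t n = 0;
    assert(senseBom("\xEF\xBB\xBF<?xml", n) == "UTF-8");
    assert(n == 3);
}
```

The reported size lets the caller skip the BOM bytes before handing the stream to a decoder.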


Found a bug or missing feature? Please email us at support@elcel.com

Copyright © 2000-2003 ElCel Technology   Trademark Acknowledgements