6.5 Advanced Byte-to-Character Conversion
In Example 6-4 we saw a basic loop for copying bytes from one
channel to another. Another commonly seen loop in programs that use
the New I/O API is one that combines reading or writing bytes with
decoding bytes to characters, or encoding characters to bytes. In
Example 6-3 we saw the Charset.decode( ) method
for decoding a buffer of bytes into a buffer of
characters. This is actually a high-level convenience method, and
we'll see similar convenience methods elsewhere in
this chapter. For better streaming performance, however, you can use
the lower-level CharsetDecoder and
CharsetEncoder classes, as is done in Example 6-5. This example is the
ChannelToWriter class, which defines a single
static copy( ) method. This method reads bytes
from a specified channel, decodes them to characters using the
specified Charset, and then writes them to the
specified Writer. (Note that this is not the same
function performed by Channels.newReader( ),
Channels.newWriter( ), or
Channels.newChannel( ). The factory methods of the
Channels class allow you to wrap a channel around
a stream or a stream around a channel, but do not perform a copy.)
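To make the distinction concrete, here is a minimal sketch that contrasts the one-shot Charset.decode( ) convenience method with a Reader obtained from Channels.newReader( ). The class name ConvenienceDecoding and its two helper methods are hypothetical names chosen for illustration, and the sketch assumes Java 7 or later (for StandardCharsets and try-with-resources), which is newer than the platform the chapter's examples target.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the je3.nio package.
class ConvenienceDecoding {
    // One-shot decoding: all input bytes must already be in the buffer.
    static String decodeAll(byte[] data, Charset charset) {
        CharBuffer chars = charset.decode(ByteBuffer.wrap(data));
        return chars.toString();
    }

    // Wrapping: Channels.newReader() only returns a Reader view of the
    // channel. No bytes move until the caller reads from that Reader.
    static String readViaWrapper(byte[] data, Charset charset) {
        ReadableByteChannel channel =
            Channels.newChannel(new ByteArrayInputStream(data));
        StringBuilder sb = new StringBuilder();
        // -1 asks for an implementation-dependent default buffer size
        try (Reader reader =
                 Channels.newReader(channel, charset.newDecoder(), -1)) {
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf)) != -1) sb.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAll(data, StandardCharsets.UTF_8));
        System.out.println(readViaWrapper(data, StandardCharsets.UTF_8));
    }
}
```

Both helpers produce the same string here. The difference is who drives the I/O: decodeAll( ) needs the complete input up front, while the wrapped Reader pulls bytes from the channel only as the caller asks for characters.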
The read/decode/write loop shown in this example is a common one in
java.nio code, but is more complex than you might
expect. One reason for the complexity is that in many character
encodings, there is not a one-to-one correspondence between bytes and
characters. This means that there is no guarantee that all bytes in a
buffer can be decoded into characters each time through the
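loop: the last byte or bytes in the buffer might be only part of a
multibyte character. Such leftover bytes must stay in the buffer (the
example uses compact( ) for this) until the next read supplies the
rest of the character. Note also that before entering the loop,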
we tell the CharsetDecoder to ignore bad input. If
we don't do this, the decoder uses its default action of reporting
errors, and we must examine the CoderResult returned
by each decode( ) call to ensure that it was
successful.
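The two points above can be sketched in isolation. The following is an illustrative example only: PartialDecodeDemo and its method names are hypothetical, and it assumes Java 7 or later for StandardCharsets. The first method decodes input that stops in the middle of a two-byte UTF-8 character and reports how many bytes were left undecoded. The second shows the difference between the IGNORE action and the default REPORT action when the input contains an invalid byte.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the je3.nio package.
class PartialDecodeDemo {
    // Decode with endOfInput == false and return the number of bytes the
    // decoder could not consume yet (a partial character at the end).
    static int leftoverAfterDecode(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.wrap(data);
        CharBuffer chars = CharBuffer.allocate(16);
        decoder.decode(bytes, chars, false); // false: more input may follow
        return bytes.remaining();
    }

    // Decode a complete input using the given error action. With REPORT the
    // CoderResult must be checked; with IGNORE bad bytes are dropped.
    static String decodeWith(CodingErrorAction action, byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(action)
            .onUnmappableCharacter(action);
        ByteBuffer bytes = ByteBuffer.wrap(data);
        CharBuffer chars = CharBuffer.allocate(16);
        CoderResult result = decoder.decode(bytes, chars, true);
        if (result.isError()) {
            return "error near byte " + bytes.position();
        }
        decoder.flush(chars);
        chars.flip();
        return chars.toString();
    }

    public static void main(String[] args) {
        // 'a' followed by only the first byte of a two-byte character
        byte[] partial = { 'a', (byte) 0xC3 };
        System.out.println(leftoverAfterDecode(partial));

        // 0xFF can never appear in well-formed UTF-8
        byte[] bad = { 'a', (byte) 0xFF, 'b' };
        System.out.println(decodeWith(CodingErrorAction.IGNORE, bad));
        System.out.println(decodeWith(CodingErrorAction.REPORT, bad));
    }
}
```

A copy loop that keeps the default REPORT action would need a check like result.isError( ) after every decode( ) call, which is the extra work the example avoids by choosing IGNORE.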
Example 6-5. ChannelToWriter.java
package je3.nio;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class ChannelToWriter {
    /**
     * Read bytes from the specified channel, decode them using the specified
     * Charset, and write the resulting characters to the specified writer
     */
    public static void copy(ReadableByteChannel channel, Writer writer,
                            Charset charset)
        throws IOException
    {
        // Get and configure the CharsetDecoder we'll use
        CharsetDecoder decoder = charset.newDecoder( );
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);

        // Get the buffers we'll use and the backing array for the CharBuffer.
        ByteBuffer bytes = ByteBuffer.allocateDirect(2*1024);
        CharBuffer chars = CharBuffer.allocate(2*1024);
        char[ ] array = chars.array( );

        while(channel.read(bytes) != -1) { // Read from channel until EOF
            bytes.flip( );  // Switch to drain mode for decoding
            // Decode the byte buffer into the char buffer.
            // Pass false to indicate that we're not done.
            decoder.decode(bytes, chars, false);

            // Put the char buffer into drain mode, and write its contents
            // to the Writer, reading them from the backing array.
            chars.flip( );
            writer.write(array, chars.position( ), chars.remaining( ));

            // Discard all bytes we decoded, and put the byte buffer back into
            // fill mode. Since all characters were output, clear that buffer.
            bytes.compact( );   // Discard decoded bytes
            chars.clear( );     // Clear the character buffer
        }

        // At this point there may still be some bytes in the buffer to decode
        // So put the buffer into drain mode, call decode( ) a final time, and
        // finish with a flush( ).
        bytes.flip( );
        decoder.decode(bytes, chars, true);  // True means final call
        decoder.flush(chars);                // Flush any buffered chars
        // Write these final chars (if any) to the writer.
        chars.flip( );
        writer.write(array, chars.position( ), chars.remaining( ));
        writer.flush( );
    }

    // A test method: copy a UTF-8 file to standard out
    public static void main(String[ ] args) throws IOException {
        FileChannel c = new FileInputStream(args[0]).getChannel( );
        OutputStreamWriter w = new OutputStreamWriter(System.out);
        Charset utf8 = Charset.forName("UTF-8");
        ChannelToWriter.copy(c, w, utf8);
        c.close( );
        w.close( );
    }
}
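One way to see why the loop calls compact( ) rather than clear( ) on the byte buffer is to run the same read/decode/write steps with a deliberately tiny buffer, so that multibyte characters regularly straddle two reads. The sketch below is a condensed variant of the loop written for that purpose, not a replacement for Example 6-5. The class name SmallBufferCopy, the bufferSize parameter, and the decodeInChunks( ) helper are hypothetical, and the code assumes Java 7 or later. It also assumes bufferSize is at least as large as the longest byte sequence for one character (4 for UTF-8); a smaller buffer could fill up with an incomplete character and never make progress.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the je3.nio package.
class SmallBufferCopy {
    // Same read/decode/write steps as the chapter's copy(), but with a
    // caller-supplied buffer size (assumed >= 4 when decoding UTF-8).
    static void copy(ReadableByteChannel channel, Writer writer,
                     Charset charset, int bufferSize) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.IGNORE)
            .onUnmappableCharacter(CodingErrorAction.IGNORE);
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);

        while (channel.read(bytes) != -1) {
            bytes.flip();                         // drain mode
            decoder.decode(bytes, chars, false);  // more input may follow
            chars.flip();
            writer.write(chars.array(), chars.position(), chars.remaining());
            bytes.compact();  // keep any undecoded trailing bytes
            chars.clear();
        }
        bytes.flip();
        decoder.decode(bytes, chars, true);       // final call
        decoder.flush(chars);
        chars.flip();
        writer.write(chars.array(), chars.position(), chars.remaining());
        writer.flush();
    }

    // Encode text as UTF-8, then decode it again through the loop above.
    static String decodeInChunks(String text, int bufferSize) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        StringWriter out = new StringWriter();
        try {
            copy(Channels.newChannel(new ByteArrayInputStream(data)),
                 out, StandardCharsets.UTF_8, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Text with 2-, 3-, and 4-byte UTF-8 characters
        String text = "na\u00efve caf\u00e9 \u20ac100 \ud83d\ude00";
        System.out.println(text.equals(decodeInChunks(text, 8)));
    }
}
```

With an 8-byte buffer, several characters in the sample text are split between reads. Because compact( ) moves the unconsumed bytes to the front of the buffer instead of discarding them, the text should survive the round trip unchanged.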