Java Canning: Character Streams in the Official Guide to I/O Streams

Keywords: Java Oracle encoding socket

Part of this series: Overview and Index of I/O Stream Official Chinese Guide Series

Most of the content comes from the official Java™ Tutorials; the rest comes from other sources such as ifeve translations, imooc, an Android interview book, and so on.
Author: @youyuge
Personal blog: https://youyuge.cn

I. What Is a Character Stream

What is a byte stream, and how does it relate to a character stream? Before answering, please read my article below carefully so that you have a thorough understanding of character encoding.

Java Canned Opening: Character Coding Full Resolution

Once you have read it, the topic should be clear. Here is my own summary.

  • First, files on a computer are stored as binary bytes, and a CPU can only read binary, that is, 0s and 1s (if you wonder why, look it up on Baidu). But how do 0s and 1s represent English letters and Chinese characters?
  • To represent the characters we use every day in binary, people defined rules, namely character encodings. These rules are man-made and there are many of them, so the same Chinese character is represented differently under different rules:

    Chinese character 尤 ("you") ---> 5C24 (Unicode code point U+5C24, also its UTF-16 code unit)
    Chinese character 尤 ("you") ---> E5B0A4 (UTF-8 encoding)
    Chinese character 尤 ("you") ---> D3C8 (GBK encoding)

  • So when we save a txt file as UTF-8, the computer actually encodes our characters into binary 0s and 1s and stores them. When we open the file as UTF-8, the text editor decodes that long run of 0s and 1s back into characters according to the UTF-8 rules and displays them.

  • This also explains the garbled text you sometimes see when opening a txt file. If we use the UTF-8 rules to open a txt file written with the GBK rules, the text comes out garbled. Put simply, the two encodings are two different ways of translating the same run of binary 0s and 1s.

  • For a more down-to-earth example: I take the Chinese phrase "squid is the tastiest" and write it out in letters (encoding): you yu zui hao chi. I store that and send it to someone else. He opens the file and sees "you yu zui hao chi". He first tries to read (decode) it as English, cannot make sense of it, and is puzzled. Then he tries reading it as Chinese, realizes it is just Pinyin, and roughly gets the meaning. In this analogy, Chinese is one encoding rule (such as GBK), English is another encoding rule (such as UTF-8), and the letters are the bytes, the underlying binary.
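
The points above can be tried out directly in Java. The following is a minimal sketch, not part of the official tutorial: the class name EncodingDemo and its helper methods are made up for illustration, and it assumes the running JDK provides the GBK charset (standard JDK distributions do). It encodes the character U+5C24 under several rule sets, then decodes GBK bytes with the wrong rules to reproduce garbled text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Format a byte array as upper-case hex, e.g. {0xD3, 0xC8} -> "D3C8".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode text with one charset, then decode the stored bytes with another.
    static String recode(String text, Charset writeAs, Charset readAs) {
        byte[] stored = text.getBytes(writeAs);
        return new String(stored, readAs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String you = "\u5C24";                // the character with code point U+5C24

        // The same character produces different bytes under different rules.
        System.out.println("UTF-16BE: " + toHex(you.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-8:    " + toHex(you.getBytes(StandardCharsets.UTF_8)));
        System.out.println("GBK:      " + toHex(you.getBytes(gbk)));

        // Same bytes, wrong rules: the text no longer matches what was written.
        System.out.println("GBK bytes read as UTF-8: " + recode(you, gbk, StandardCharsets.UTF_8));
        System.out.println("GBK bytes read as GBK:   " + recode(you, gbk, gbk));
    }
}
```

Only the last line prints the original character back, because it is the only case where the bytes are decoded with the same rules that were used to encode them.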

II. Official Definition of Character Streams

The Java platform stores character values using Unicode conventions. Character stream I/O automatically translates this internal format to and from the local character set. In Western locales, the local character set is usually an 8-bit superset of ASCII.

In other words, the Java platform stores character values using the Unicode standard, and character stream I/O automatically converts between this internal format and the local character set. In Western locales, the local character set is usually an 8-bit superset of ASCII.

III. Using Character Streams

All character stream classes are descended from Reader and Writer. As with byte streams, there are character stream classes that specialize in file I/O: FileReader and FileWriter. The CopyCharacters example illustrates these classes.

In other words, every character stream class inherits from Reader or Writer. As with byte streams, there are character stream classes dedicated to reading and writing files: FileReader and FileWriter. The following example copies a file using character streams:

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {

        FileReader inputStream = null;
        FileWriter outputStream = null;

        try {
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("characteroutput.txt");

            int c;
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}

CopyCharacters is very similar to CopyBytes. The most important difference is that CopyCharacters uses FileReader and FileWriter for input and output in place of FileInputStream and FileOutputStream. Notice that both CopyBytes and CopyCharacters use an int variable to read to and write from. However, in CopyCharacters, the int variable holds a character value in its last 16 bits; in CopyBytes, the int variable holds a byte value in its last 8 bits.

  • CopyCharacters is very similar to CopyBytes. The most important difference is that CopyCharacters uses FileReader and FileWriter instead of FileInputStream and FileOutputStream.

  • Note that both CopyBytes and CopyCharacters read into and write from an int variable (4 bytes). In CopyCharacters, however, only the low 2 bytes of the int carry data and the high 2 bytes are all 0, because a Java char is a 16-bit UTF-16 code unit. CopyBytes handles one byte at a time, so only the lowest byte of the int carries data.
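
The difference in the int value can be observed without touching the file system. The sketch below is illustrative only: ReadWidthDemo and its helpers are hypothetical names, and it uses in-memory streams (StringReader, ByteArrayInputStream) in place of the file streams from the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadWidthDemo {

    // One read() on a character stream: a value in 0..0xFFFF, or -1 at end of stream.
    static int firstChar(String text) {
        try (Reader reader = new StringReader(text)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One read() on a byte stream: a value in 0..0xFF, or -1 at end of stream.
    static int firstByte(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String you = "\u5C24";

        // Character stream: the whole 16-bit char arrives in one int.
        System.out.printf("char read: 0x%08X%n", firstChar(you));

        // Byte stream: only the first of the three UTF-8 bytes arrives.
        System.out.printf("byte read: 0x%08X%n", firstByte(you.getBytes(StandardCharsets.UTF_8)));

        // Both kinds of stream signal end of input with -1, which is why int is used.
        System.out.println("end of stream: " + firstChar(""));
    }
}
```

Returning an int rather than a char or byte is what leaves room for the -1 end-of-stream marker, since every valid char or byte value is non-negative once stored in the low bits.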

IV. Character streams use byte streams

Character streams are often "wrappers" for byte streams. The character stream uses the byte stream to perform the physical I/O, while the character stream handles translation between characters and bytes. FileReader, for example, uses FileInputStream, while FileWriter uses FileOutputStream.

In other words, character streams are usually wrappers around byte streams. The byte stream performs the physical I/O, while the character stream handles the conversion between characters and bytes. FileReader, for example, uses FileInputStream, while FileWriter uses FileOutputStream.

There are two general-purpose byte-to-character "bridge" streams: InputStreamReader and OutputStreamWriter. Use them to create character streams when there are no prepackaged character stream classes that meet your needs. The sockets lesson in the networking trail shows how to create character streams from the byte streams provided by socket classes.

In other words, there are two general-purpose byte-to-character "bridge" streams: InputStreamReader and OutputStreamWriter. When no prepackaged character stream class meets your needs, use them to create character streams. The sockets lesson in the networking trail shows how to create character streams from the byte streams that the socket classes provide.
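
As a rough illustration of the bridge classes, the sketch below wraps in-memory byte streams and names the charset explicitly instead of relying on the platform default. BridgeDemo, writeText, and readText are hypothetical names chosen for this example; a socket's input and output streams could be wrapped in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BridgeDemo {

    // Characters -> bytes: OutputStreamWriter encodes and hands the bytes to the byte stream.
    static byte[] writeText(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // closing the writer above flushed everything
    }

    // Bytes -> characters: InputStreamReader pulls bytes from the byte stream and decodes them.
    static String readText(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \u5C24";
        byte[] stored = writeText(text, StandardCharsets.UTF_8);
        System.out.println("bytes written: " + stored.length);
        System.out.println("round trip ok: " + text.equals(readText(stored, StandardCharsets.UTF_8)));
    }
}
```

Passing the charset explicitly keeps the result independent of the machine's default encoding, which is the usual source of garbled text when files move between systems.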

V. Line-oriented I/O

Character I/O usually occurs in bigger units than single characters. One common unit is the line: a string of characters with a line terminator at the end. A line terminator can be a carriage-return/line-feed sequence ("\r\n"), a single carriage-return ("\r"), or a single line-feed ("\n"). Supporting all possible line terminators allows programs to read text files created on any of the widely used operating systems.

Sometimes we need to read or write one line at a time. A line is usually defined as a string of characters ending with a line terminator. The terminator can be a carriage return followed by a line feed ("\r\n", used on Windows), a single carriage return ("\r", used by classic Mac OS), or a single line feed ("\n", used on Unix-like systems including Mac OS X). Supporting all of them lets a program split text files created on any of these systems into lines correctly.
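
A short sketch of line-oriented reading follows. It relies on BufferedReader.readLine, which treats "\r\n", "\r", and "\n" as line terminators and returns each line without its terminator. The class name ReadLinesDemo and the sample text are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesDemo {

    // Collect every line from the source; readLine() returns null at end of stream.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // One string that mixes all three terminator styles.
        String mixed = "one\r\ntwo\rthree\nfour";
        for (String line : readLines(new StringReader(mixed))) {
            System.out.println("[" + line + "]");
        }
    }
}
```

To read a real file, the StringReader could be swapped for a FileReader or for an InputStreamReader with an explicit charset.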

Note: The println method appends the line terminator of the current operating system. So for portable code, do not hard-code "\r\n" as the line break in Java; obtain it from the platform instead:

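 // Look up the line separator of the current platform ("\r\n" on Windows, "\n" on Unix-like systems)
 // System.lineSeparator() (Java 7 and later) returns the same value
 String lineSeparator = System.getProperty("line.separator", "\n");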
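
To see the note in action on the writing side, here is a small sketch that writes lines with PrintWriter.println into an in-memory StringWriter. WriteLinesDemo and writeLines are hypothetical names used only for this illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class WriteLinesDemo {

    // Write each line with println, which appends the platform line separator.
    static String writeLines(String... lines) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            for (String line : lines) {
                out.println(line);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        String written = writeLines("first", "second");

        // Show the separator as escaped text so it is visible on the console.
        String visible = sep.replace("\r", "\\r").replace("\n", "\\n");
        System.out.println("platform separator: " + visible);
        System.out.println("ends with separator: " + written.endsWith(sep));
    }
}
```

Because println picks the separator at run time, the same program produces "\r\n" line endings on Windows and "\n" line endings on Unix-like systems without any code change.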

Posted by alexk1781 on Mon, 07 Jan 2019 09:03:09 -0800