This post records a problem encountered while working on a project.
When Java reads a UTF-8 file, a frustrating problem can appear: if the file was saved by UE in UTF-8 format, Java reads an invisible character at the start of the first line.
The test code is as follows:
package test;

import java.io.*;

public class HelloWorld {

    public static void main(String[] args) {
        String filePath = "C:\\Users\\16223\\Desktop\\hahaha2.txt";
        // Get the encoding format of the file
        String codeString = codeString(filePath);
        System.out.println(codeString);
        File file = new File(filePath);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), codeString))) {
            String tempchar;
            while ((tempchar = reader.readLine()) != null) {
                // When a UTF-8 file with a BOM is read, the first character is invisible
                char c = tempchar.charAt(0);
                System.out.println(c);
                /*if (c == 65279) { // 65279 (U+FEFF) is the BOM character
                    System.out.println("The first character is empty");
                    tempchar = tempchar.substring(1);
                }*/
                System.out.println(tempchar);
                System.out.println(tempchar.startsWith("create table"));
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException and FileNotFoundException
            e.printStackTrace();
        }
    }

    // Guess the file encoding from the first two bytes of the file
    public static String codeString(String filePath) {
        String encoding = null;
        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(filePath))) {
            int p = (bis.read() << 8) + bis.read();
            switch (p) {
                case 0xefbb: // first two bytes of the UTF-8 BOM (EF BB BF)
                    encoding = "UTF-8";
                    break;
                case 0xfffe: // FF FE: UTF-16 little-endian BOM
                    encoding = "Unicode";
                    break;
                case 0xfeff: // FE FF: UTF-16 big-endian BOM
                    encoding = "UTF-16";
                    break;
                case 0x5c75: // file starts with a backslash followed by the letter u
                    encoding = "ASCII";
                    break;
                default:
                    encoding = "GBK";
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return encoding;
    }
}
The results printed on the console are as follows:
UTF-8

create table aaa 'Ha ha ha';
false
The results show that the second line of output is blank: executing System.out.println(c) prints what looks like an empty line, because c, the first character of the line, is invisible. For the same reason, startsWith("create table") returns false even though the line appears to start with that text.
The cause is how Java handles the BOM (Byte Order Mark). A UTF-8 file may begin with the three bytes EF BB BF to mark it as UTF-8, and these bytes can be used to detect the file's encoding. The BOM is optional, so a UTF-8 file may also omit it. Java's UTF-8 decoder does not strip the BOM; it decodes the three bytes as the character U+FEFF (decimal 65279), which is the invisible character at the start of the first line.
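The effect can be reproduced without a file. The following is a minimal sketch, not part of the original project: the class name BomDemo, the helper decodeWithBom, and the sample text are illustrative assumptions. It prepends EF BB BF to some text, decodes the bytes as UTF-8, and inspects the first character.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {

    // Prepends the UTF-8 BOM (EF BB BF) to the text and decodes the result as UTF-8.
    public static String decodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] bytes = new byte[body.length + 3];
        bytes[0] = (byte) 0xEF;
        bytes[1] = (byte) 0xBB;
        bytes[2] = (byte) 0xBF;
        System.arraycopy(body, 0, bytes, 3, body.length);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = decodeWithBom("create table aaa");
        // The decoder keeps the BOM as the character U+FEFF
        System.out.println((int) line.charAt(0));            // 65279
        System.out.println(line.startsWith("create table")); // false
    }
}
```

The decoded string is one character longer than the visible text, which is why the prefix comparison fails.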
Solutions:
1. Save the file without a BOM.
GBK
c
create table aaa 'Crucible�';
true
The console output above comes from the file saved without a BOM. The Chinese text is garbled because only the saved format was changed: without a BOM, codeString falls back to GBK, so the UTF-8 bytes are decoded with the wrong charset.
2. If the file has to keep its BOM, check whether the first character of the content read is the BOM character (U+FEFF, decimal 65279) rather than real content. If it is, cut off the first character:
// When a UTF-8 file with a BOM is read, the first character is invisible
char c = tempchar.charAt(0);
System.out.println(c);
if (c == 65279) { // 65279 (U+FEFF) is the BOM character
    System.out.println("The first character is empty");
    tempchar = tempchar.substring(1);
}
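The per-line check above runs on every line, although a BOM can only appear at the very start of the file. One possible alternative is to skip the BOM at the stream level before any decoding happens. The following is a sketch under that assumption; the class name BomSkippingReader and the method open are hypothetical and do not come from the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class BomSkippingReader {

    // Returns a UTF-8 reader positioned after the BOM if the stream starts with one.
    public static BufferedReader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the reader sees them again
            pin.unread(head, 0, n);
        }
        return new BufferedReader(new InputStreamReader(pin, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 'r', 'e', 'a', 't', 'e'};
        try (BufferedReader reader = open(new ByteArrayInputStream(data))) {
            System.out.println(reader.readLine().startsWith("create")); // true
        }
    }
}
```

In the test program, the FileInputStream could be passed to open in place of the direct InputStreamReader construction. This sketch only handles UTF-8; files in other encodings would still need their own detection.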