When Java reads or writes a file without an explicit character set, it falls back on the operating system's default character set (JDK 18 and later default to UTF-8 instead). A file containing Chinese text that was created on Windows may therefore come out garbled when a Java program reads it on Linux. The reason is that on Chinese-language Windows the default character set is GBK (code page 936, which GB18030 extends), while on Linux it is typically UTF-8. Unless some other tool is involved, a file has to be read with the same character set it was written in, or the text will be decoded incorrectly. The exception is when the two character sets are compatible for the content in question; that case does not produce garbled text and is not discussed here. In short, the garbling goes away if you specify the character set explicitly whenever you read or write a file.
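The mismatch described above can be reproduced without two machines. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the running JRE ships the GB18030 charset (most full JDK builds do). It encodes a Chinese string with one character set and decodes the bytes with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class CharsetMismatchDemo {

    // Encodes text with writeCs, decodes the bytes with readCs,
    // and reports whether the text came back unchanged.
    static boolean survives(String text, Charset writeCs, Charset readCs) {
        byte[] bytes = text.getBytes(writeCs);
        String decoded = new String(bytes, readCs);
        return decoded.equals(text);
    }

    public static void main(String[] args) {
        // Assumption: GB18030 is available in this JRE.
        Charset gb18030 = Charset.forName("GB18030");
        String text = "\u4e2d\u6587"; // the two characters of "Chinese (language)"

        // What Java uses when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same charset on both sides: the text is preserved.
        System.out.println(survives(text, gb18030, gb18030));

        // Written as GB18030, read as UTF-8: the text is garbled.
        System.out.println(survives(text, gb18030, StandardCharsets.UTF_8));
    }
}
```

The same thing happens with files: the bytes on disk are whatever the writer's character set produced, and the reader has to use a matching character set to get the original text back.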
Reading a file:
InputStreamReader isr = new InputStreamReader(
        new FileInputStream(filePath), charsetName);
Writing a file:
OutputStreamWriter osw = new OutputStreamWriter(
        new FileOutputStream(filePath), charsetName);
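The two constructors above can be put together into a complete round trip. This is a sketch rather than a definitive implementation: the class name, the helper names, and the temporary file are assumptions made for illustration. It uses the constructor overloads that take a Charset object, which avoid the UnsupportedEncodingException thrown by the String-name overloads.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration.
public class ExplicitCharsetIo {

    // Writes text to filePath using the given character set.
    static void write(String filePath, String text, Charset cs) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(filePath), cs)) {
            w.write(text);
        }
    }

    // Reads the whole file at filePath using the given character set.
    static String read(String filePath, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(filePath), cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Writes text to a temporary file and reads it back with the same charset.
    static String roundTrip(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                write(tmp.toString(), text, cs);
                return read(tmp.toString(), cs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587 abc";
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));
    }
}
```

Because both sides name the character set, the result no longer depends on which platform the program runs on.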
In many cases, however, the file comes from some other tool and we do not know its character set. This is especially common with import and export features in web applications: the client is usually Windows, while the server runs Linux, Solaris, AIX, or something else. When a file generated on the client is uploaded to the server and its contents have to be parsed, garbled text is a likely result.

Is there a way to determine a file's character set from its content? It seems that no method is completely accurate. A common suggestion online is to inspect the first few bytes of the file: a text file that begins with 0xEF 0xBB 0xBF is treated as UTF-8. But this only works for files that carry a BOM (Byte Order Mark, a small marker at the start of the file that identifies the encoding). A UTF-8 file without a BOM does not necessarily begin with EF BB BF, so this method alone cannot determine a file's character set.

A better option is "chardet", an open-source package that came out of Mozilla. It can be downloaded from sourceforge.net, but at the moment that site appears to be blocked, so the package has to be found through Google or Baidu. The package reports the character sets a file may be using, though its guesses are not guaranteed to be right. (Windows Notepad also guesses a file's character set and is not always accurate either. The best-known case is the word "Unicom" (联通): type it in Notepad, save, and reopen the file, and it shows up garbled unless you specify the character set.) Anyone interested can take a look. Overall, chardet is probably the better solution.
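The BOM check mentioned above can be sketched as follows. The class and method names are hypothetical, and the UTF-16 cases are an addition beyond the UTF-8 marker the text describes. The method returns null when no BOM is found, which is exactly the situation where this approach gives no answer and a statistical detector such as chardet is needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration.
public class BomSniffer {

    // Returns the charset implied by a leading BOM, or null if there is none.
    static Charset detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            // Note: a UTF-32LE BOM also starts with FF FE; not handled in this sketch.
            return StandardCharsets.UTF_16LE;
        }
        return null; // no BOM: the character set cannot be told this way
    }

    // Compares the first bytes of data against prefix, treating bytes as unsigned.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] withoutBom = "hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(withBom));    // a UTF-8 BOM is present
        System.out.println(detect(withoutBom)); // valid UTF-8, but no BOM
    }
}
```

In practice the first few bytes of the uploaded file would be passed to detect, and a null result would fall through to another strategy such as a detector library or a user-supplied character set.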