Specifying the character set when reading and writing files in Java

2010-03-29  Source: original to this site  Category: Java

When Java reads or writes a file without an explicit character set, it uses the operating system's default character set. If a file containing Chinese text is created on Windows and later read by Java on Linux, the text may come out garbled. The reason is that on Chinese-language Windows the default character set is a GB-family encoding (GBK, which GB18030 extends), while on Linux it is usually UTF-8. Unless some other tool converts the file, it has to be read with the same character set it was written in, or with a compatible one (compatible character sets do not cause garbling and are not discussed here). In short, garbled text can be avoided by specifying the character set explicitly both when reading and when writing the file.
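The mismatch described above can be reproduced in memory without touching the file system. This is only a minimal sketch: the class and method names are made up for illustration, and ISO-8859-1 stands in for the "wrong" character set because every JVM is guaranteed to support it. A GBK/UTF-8 mismatch behaves in the same way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class CharsetMismatchDemo {

    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset); // what the writer stores
        return new String(bytes, readCharset);      // what the reader sees
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two characters of the word "Chinese"

        // The charset Java falls back to when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Different charsets: the text comes back garbled.
        System.out.println(roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Relying on `Charset.defaultCharset()` is exactly what makes the result depend on the machine the code runs on.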
Reading a file:
InputStreamReader isr = new InputStreamReader(
        new FileInputStream(filePath), charsetName);

Writing a file:
OutputStreamWriter osw = new OutputStreamWriter(
        new FileOutputStream(filePath), charsetName);
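Put together, the two constructors above might be used as in the following sketch. The helper class `CharsetFileIo` and its method names are assumptions made for illustration, and try-with-resources assumes Java 7 or later. `filePath` and `charsetName` play the same roles as in the snippets above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class, not part of any library.
final class CharsetFileIo {

    /** Writes text to filePath, encoding it with the named charset. */
    static void writeText(String filePath, String charsetName, String text)
            throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(filePath), charsetName))) {
            out.write(text);
        }
    }

    /** Reads the whole file at filePath, decoding it with the named charset. */
    static String readText(String filePath, String charsetName)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(filePath), charsetName))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("charset-demo", ".txt");
        tmp.deleteOnExit();
        String text = "\u4e2d\u6587 text";
        writeText(tmp.getPath(), "UTF-8", text);
        // Reading with the same charset gives back the original text.
        System.out.println(text.equals(readText(tmp.getPath(), "UTF-8")));
    }
}
```

Both constructors throw `UnsupportedEncodingException` (a subclass of `IOException`) if the charset name is not recognized, so a misspelled name fails early instead of silently garbling the data.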

In many cases, however, the file comes from another tool and we do not know its character set. This is especially common with import and export features in web applications: the client is usually Windows, while the server runs Linux, Solaris, AIX, or something else. When a file generated on the client is uploaded to the server and its contents need to be parsed, garbled text is a likely result.

Is there a way to determine a file's character set from its contents? It seems that no method is completely accurate. A common suggestion online is to look at the first few bytes of the file: a text file that begins with 0xEF 0xBB 0xBF is treated as UTF-8. This only works for files that carry a BOM (Byte Order Mark, a marker placed at the start of the file to signal the encoding). A UTF-8 file without a BOM does not necessarily start with EF BB BF, so this method cannot be relied on to determine a file's character set.

A better option is "chardet", a package open-sourced by Mozilla. It can be downloaded from sourceforge.net, although at the time of writing that site appeared to be blocked, so you may have to find it through Google or Baidu. The package reports the character sets a file may be using, though the result is not guaranteed to be correct. (Windows Notepad also guesses a file's character set and is not always right either. The best-known case is the word "Unicom" (联通): type it in Notepad, save, and reopen the file, and it shows up garbled unless you specify the character set.) Those who are interested can take a look. Overall, chardet is probably the better solution.
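The BOM check mentioned above can be sketched as follows. The class `BomSniffer` and its methods are hypothetical names used only for illustration, and the sketch covers only the UTF-8 BOM. As noted above, a `false` result says nothing about the file's character set, since a UTF-8 file need not carry a BOM.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of any library.
final class BomSniffer {

    /** Returns true if the bytes start with the UTF-8 BOM: EF BB BF. */
    static boolean hasUtf8Bom(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    /** Reads up to three bytes from the file and checks them for the BOM. */
    static boolean fileHasUtf8Bom(String filePath) throws IOException {
        byte[] head = new byte[3];
        int total = 0;
        try (InputStream in = new FileInputStream(filePath)) {
            int n;
            // A single read() may return fewer bytes than requested.
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
        }
        return total == head.length && hasUtf8Bom(head);
    }
}
```

If the BOM is present, Java's UTF-8 decoder does not strip it, so the first character read will be `\uFEFF` and usually has to be skipped by hand.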
