Java / JSP garbled Chinese text: problem-solving experience (reposted)

2010-06-09  Source: original to this site  Category: Java

From: http://www.xici.net/u9206704/d56632455.htm

Ever since I started working with Java and JSP I have had to deal with garbled Chinese text. The problem is now completely solved, so I am sharing my experience and solutions here.

I. Where Java's Chinese-text problems come from

Java class files are internally based on Unicode. This gives Java programs good cross-platform behaviour, but it also causes trouble with Chinese text. There are two main sources of garbled text: garbling produced when Java and JSP files are compiled, and garbling produced when a Java program interacts with other media.

First, Java (and JSP) source files may contain Chinese, and source files are saved as byte streams. If the encoding used to compile a Java or JSP file into a class differs from the encoding the source file was saved in, the text will be garbled. To avoid this kind of garbling, try not to write Chinese in Java source files (Chinese in comments, which do not end up in the class file, is generally harmless). If you must, compile manually with the parameter -encoding GBK or -encoding gb2312. For JSP, add <%@ page contentType="text/html; charset=GBK" %> or <%@ page contentType="text/html; charset=gb2312" %> at the top of the file. This basically solves this type of garbling.
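To make the mismatch concrete, here is a minimal sketch (not how javac itself is implemented) that takes the bytes a GBK-saved source file would contain for the literal "中文" and decodes them twice: once as GBK, and once as ISO-8859-1, which stands in for a compiler run with the wrong -encoding. The class name SourceEncodingDemo is hypothetical, the literal is written with Unicode escapes so the demo compiles identically whatever encoding it is read with, and the sketch assumes the JDK includes the GBK charset (standard JDK builds do, although the Java platform does not require it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {

    // Assumption: the JDK includes GBK (standard JDK builds do).
    public static final Charset GBK = Charset.forName("GBK");

    // Decode raw source-file bytes with a given charset, the job javac's -encoding option controls.
    public static String readSource(byte[] sourceBytes, Charset sourceEncoding) {
        return new String(sourceBytes, sourceEncoding);
    }

    public static void main(String[] args) {
        // U+4E2D U+6587: the literal as stored in a GBK-encoded source file.
        byte[] saved = "\u4e2d\u6587".getBytes(GBK);

        String right = readSource(saved, GBK);                         // matching encoding
        String wrong = readSource(saved, StandardCharsets.ISO_8859_1); // mismatched encoding

        System.out.println(saved.length);   // 4: two bytes per character
        System.out.println(right.length()); // 2: the original characters
        System.out.println(wrong.length()); // 4: one Latin-1 character per byte
    }
}
```

With the wrong encoding each Chinese character turns into two unrelated Latin-1 characters, which is what later shows up as garbled text.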

This article focuses on the second kind: garbled text produced when Java programs interact with other storage media. Many storage media, such as databases, files and streams, are byte-based, so whenever a Java program interacts with them it converts between characters (char) and bytes (byte), as follows:

Form data submitted from a page to the Java program: byte -> char
Java program output displayed on a page: char -> byte

Database to Java program: byte -> char
Java program to database: char -> byte

File to Java program: byte -> char
Java program to file: char -> byte

Stream to Java program: byte -> char
Java program to stream: char -> byte

If the encoding used in any of these conversions differs from the encoding of the original bytes (or of the bytes to be produced), garbled text is likely.

II. Solutions

As mentioned earlier, when Java programs interact with other media, characters and bytes are converted, and this conversion easily produces garbled text. The key to solving these problems is to make sure the conversion uses the same encoding as the original (or target) bytes. Each case is discussed separately below (for garbling produced by compiling Java or JSP files, see Part I).
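The rule above can be checked with a small round trip. The sketch below is an illustration with a hypothetical roundTrip helper: it encodes a string with the charset of whoever writes the bytes and decodes it with the charset of whoever reads them, standing in for any of the page, database, file or stream conversions listed earlier. It assumes the JDK provides GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Simulates one trip through a byte-based medium (page, database, file or stream).
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs); // char -> byte on the way out
        return new String(bytes, readAs);      // byte -> char on the way back
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: the JDK includes GBK
        String text = "\u4e2d\u6587";         // U+4E2D U+6587

        // Same encoding on both sides: the text survives.
        System.out.println(roundTrip(text, gbk, gbk).equals(text));                    // true
        // Written as UTF-8 but read as GBK: the text is garbled.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, gbk).equals(text)); // false
    }
}
```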

1. Garbled text in JSP page parameters
JSP page parameters are generally decoded with a default encoding (the system's or the container's), not necessarily the page's own encoding. If the page's encoding differs from that default, the parameters are likely to be garbled. The basic fix is to specify the request encoding before reading any parameters: request.setCharacterEncoding("GBK") or request.setCharacterEncoding("gb2312").
If variables written to the JSP page come out garbled, set response.setContentType("text/html; charset=GBK") or response.setContentType("text/html; charset=gb2312").
If you do not want to write these two statements in every file, a more concise approach is to set the encoding in a filter, as defined by the Servlet specification. A typical web.xml configuration and the main code are as follows:
web.xml:

<filter>
    <filter-name>CharacterEncodingFilter</filter-name>
    <filter-class>net.vschool.web.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>GBK</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

CharacterEncodingFilter.java:

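package net.vschool.web;

import java.io.IOException;
import javax.servlet.*;

public class CharacterEncodingFilter implements Filter
{
    protected String encoding = null;

    public void init(FilterConfig filterConfig) throws ServletException
    {
        // Encoding configured by the <init-param> in web.xml, e.g. GBK
        this.encoding = filterConfig.getInitParameter("encoding");
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException
    {
        if (encoding != null) {
            // Must happen before any request parameter is read
            request.setCharacterEncoding(encoding);
            // Default response charset; a page or servlet may still override it
            response.setContentType("text/html; charset=" + encoding);
        }
        chain.doFilter(request, response);
    }

    public void destroy()
    {
        // Nothing to release
    }
}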

2. Garbled text between Java and the database
Most databases support Unicode, so the most sensible way to avoid garbled text between Java and the database is to exchange data with the database in Unicode. Many database drivers support Unicode automatically, such as Microsoft's SQL Server driver. For most other drivers, the encoding can be specified with parameters in the driver URL, for example with the mm.mysql driver for MySQL: jdbc:mysql://localhost/WEBCLDB?useUnicode=true&characterEncoding=GBK.
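As a small illustration, the hypothetical helper below assembles the MySQL URL from the example above. The parameter names useUnicode and characterEncoding are taken from the text; check the documentation of the driver version you actually use, since driver options change between releases. If the URL is placed in an XML configuration file, the & separator must be written as &amp;.

```java
public class MySqlUrlDemo {

    // Hypothetical helper: appends the two driver parameters named in the text.
    public static String mysqlUrl(String host, String database, String encoding) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=" + encoding;
    }

    public static void main(String[] args) {
        String url = mysqlUrl("localhost", "WEBCLDB", "GBK");
        // prints jdbc:mysql://localhost/WEBCLDB?useUnicode=true&characterEncoding=GBK
        System.out.println(url);
        // With the MySQL driver on the classpath, the URL would then be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```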

3. Garbled text between Java and files/streams
The Java classes most commonly used to read and write files are FileInputStream/FileOutputStream and FileReader/FileWriter. FileInputStream and FileOutputStream work on byte streams and are typically used for binary files. For text files the character-based FileReader and FileWriter are recommended, because they spare you the conversion between bytes and characters. However, the constructors of these two classes use the system default encoding, so if the file's content uses a different encoding, the text may be garbled. In that case, use their parent classes InputStreamReader/OutputStreamWriter, which are also character-based but let you specify the encoding in the constructor: InputStreamReader(InputStream in, Charset cs) and OutputStreamWriter(OutputStream out, Charset cs).

4. Other cases
The methods above should solve most garbled-text problems. If garbled text still appears elsewhere, you may need to modify code by hand. The key to Java's garbled-text problems is the conversion between bytes and characters: you must know the encoding of the original bytes (or of the bytes to be produced), and the conversion must use that same encoding. We used to run the Resin server and upload files with the smartUpload component, and Chinese parameters passed along with uploaded files were not garbled. After we set Resin up as a service on Linux, however, Chinese parameters sent with uploads became garbled. This troubled us for a long time until we analysed the smartUpload source. A file upload is transmitted as a byte stream, and the parameter names and values travel inside that same byte stream; smartUpload reads the stream and parses the names and values out of it. The problem was that smartUpload converted those bytes to strings using the system default encoding, and running Resin as a service could change the system default encoding, which produced the garbled text. We therefore modified the smartUpload source, adding a charset attribute and a setCharset(String) method, and changed the statement in the upload() method that extracts parameters from
String value = new String(m_binArray, m_startData, (m_endData - m_startData) + 1);
to
String value = new String(m_binArray, m_startData, (m_endData - m_startData) + 1, charset);
which finally solved the garbled-text problem.
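The effect of the smartUpload change can be reproduced in isolation. The sketch below is not smartUpload code: the class UploadFieldDemo and its fieldValue method are hypothetical, and only the names m_binArray, m_startData, m_endData, charset and setCharset come from the description above. It decodes a slice of a raw byte array with an explicit charset, treating m_endData as an inclusive index (hence the + 1), as the original statement does.

```java
import java.io.UnsupportedEncodingException;

public class UploadFieldDemo {

    // Mirrors the charset attribute added to smartUpload; "GBK" is an assumed default.
    private String charset = "GBK";

    public void setCharset(String charset) {
        this.charset = charset;
    }

    // Decode bytes m_startData..m_endData (both inclusive) with the configured charset
    // instead of the platform default.
    public String fieldValue(byte[] m_binArray, int m_startData, int m_endData)
            throws UnsupportedEncodingException {
        return new String(m_binArray, m_startData, (m_endData - m_startData) + 1, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] value = "\u4e2d\u6587".getBytes("GBK");  // field value as the browser sent it
        byte[] body = new byte[value.length + 4];        // stand-in for the surrounding upload bytes
        System.arraycopy(value, 0, body, 2, value.length);

        UploadFieldDemo parser = new UploadFieldDemo();
        parser.setCharset("GBK");
        String decoded = parser.fieldValue(body, 2, 2 + value.length - 1);
        System.out.println(decoded.equals("\u4e2d\u6587")); // true
    }
}
```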

III. Postscript
It has been more than a year since I started working with Java and JSP. My biggest gain this year is that I have come to love Java more: I now study problems with pleasure rather than the fear I used to feel, and I believe I will keep going. This year I learned a great deal from the valuable experience of peers online, and I am grateful to them. This is my first summary of my own Java learning experience; my knowledge is limited, so please point out any bias or mistakes in this article. If you find it of some value, it may be reprinted anywhere provided the original source information is retained.
References: before writing this article I consulted many articles on Java and Chinese text. The one that influenced me most was owen1944's "Here is a summary of some of our solutions and experience with garbled Chinese text, shared with you!", published on the Java Research Organization site. The solutions described here have been applied to projects such as the "Web-based Collaborative Learning System (WebCL)", where resource bundles are used to switch the platform between its Chinese and English versions in real time. Google's automatic choice of language based on the browser, pages that show an internationalized application in several languages at once, and Che Dong's article "Java Chinese Study Notes: Hello Unicode" have sparked my strong interest; I would like to keep exploring Java internationalization, and discussion is welcome.

From: http://hi.baidu.com/lsjlym/blog/item/d50914af2d6f86fffbed5014.html
A discussion of character-encoding issues

1. Topic: Chinese-text problems in Java
Chinese-text problems in Java are quite conspicuous, mainly in console output, JSP page output and database access.
This article avoids font issues and discusses only encoding. It explains where Java's Chinese-text problems come from and how to solve them, and touches on accessing databases with JDBC.

2. Problem description
1) Compiled and run in a Chinese console window on Chinese Windows 2000 with the international version of the JDK, connecting to a SQL Server database on Chinese Windows 2000 that uses the Cp936 encoding:

J:\exercise\demo\encode\HelloWorld>make
Created by XCompiler. PhiloSoft All Rights Reserved.
Wed May 30 02:54:45 CST 2001

J:\exercise\demo\encode\HelloWorld>run
Created by XRunner. PhiloSoft All Rights Reserved.
Wed May 30 02:51:33 CST 2001
中文
[B@7bc8b569
[B@7b08b569
[B@7860b569
中文
中文
????
中文
中文
????
??
??
??

2) If the program is compiled in a Western (code page 437) console window on Chinese Windows 2000, running it in that window cannot display the Chinese properly because no suitable font is available. If, as above, it is then run in a Chinese console window on Chinese Windows 2000, the output is:

J:\exercise\demo\encode\HelloWorld>run
Created by XRunner. PhiloSoft All Rights Reserved.
Wed May 30 02:51:33 CST 2001
????
[B@7bc0b66a
[B@7b04b66a
[B@7818b66a
????
????
????
????
????
????
中文
中文
????

3. Analysis

1) Garbled text appears (as "?"). Because only question marks appear and no small boxes, this is an encoding problem rather than a font problem. When text is converted from one character set to another, typically from GB2312 to ISO8859_1 (ISO-8859-1, also called Latin-1, a single-byte superset of ASCII), many characters (generally the Chinese ones) cannot be mapped to Western characters, and the system replaces them with "?". The same happens whenever text is converted from a larger character set to a smaller one; I will not go into the specific reasons here.
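The "?" substitution can be observed directly. This minimal sketch encodes "中文" into ISO-8859-1, as conversion 4 does, and prints the bytes with Arrays.toString; printing the array itself, as conversions 2 to 4 do, only shows its type and identity hash (the [B@... lines). String.getBytes always replaces characters the target charset cannot represent with that charset's replacement byte, which for ISO-8859-1 is '?'. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnmappableDemo {

    // Encode to ISO-8859-1; characters it cannot represent become its replacement byte '?'.
    public static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = toLatin1("\u4e2d\u6587");                              // U+4E2D U+6587
        System.out.println(Arrays.toString(bytes));                           // [63, 63] (63 is '?')
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));   // ??
    }
}
```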

2) When compiling and running in the Chinese environment, some Chinese characters display correctly and others do not; the same happens when compiling in the Western environment and running in the Chinese one. This is the result of transcoding, either automatic (using the default encoding) or explicit (new String(bytes[, encode]) and getBytes([encode])).

2.1) Along the chain Java source file -> javac -> class file -> java -> getBytes() -> new String() -> display, every step involves an encoding conversion. The conversion always happens; sometimes it simply uses default parameters. The following analyses, step by step, why the output above looks the way it does.

2.2) Here is the source code:

HelloWorld.java:
------------------------
public class HelloWorld
{
    public static void main(String[] argv) {
        try {
            System.out.println("中文"); // 1
            System.out.println("中文".getBytes()); // 2
            System.out.println("中文".getBytes("GB2312")); // 3
            System.out.println("中文".getBytes("ISO8859_1")); // 4

            System.out.println(new String("中文".getBytes())); // 5
            System.out.println(new String("中文".getBytes(), "GB2312")); // 6
            System.out.println(new String("中文".getBytes(), "ISO8859_1")); // 7

            System.out.println(new String("中文".getBytes("GB2312"))); // 8
            System.out.println(new String("中文".getBytes("GB2312"), "GB2312")); // 9
            System.out.println(new String("中文".getBytes("GB2312"), "ISO8859_1")); // 10

            System.out.println(new String("中文".getBytes("ISO8859_1"))); // 11
            System.out.println(new String("中文".getBytes("ISO8859_1"), "GB2312")); // 12
            System.out.println(new String("中文".getBytes("ISO8859_1"), "ISO8859_1")); // 13
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The string literal is the two-character word "中文" ("Chinese"). For convenience, each conversion is numbered 1, 2, ..., 13 in a trailing comment.

2.3) Note that javac reads the source file using the system default encoding and converts it to Unicode. At run time Java works internally in Unicode, while default input and output use the operating system's default encoding. For new String(bytes[, encode]), the system assumes the input is a byte stream encoded in encode: decoding the bytes according to encode yields the correct characters, which Java then stores as Unicode. The conversion is therefore bytes -> encode characters -> Unicode characters. For String.getBytes([encode]), the conversion runs the other way: Unicode characters -> encode characters -> bytes.

In this example the default encoding is GBK everywhere except in the Western (English) console window (for now we treat GBK and GB2312 as equivalent).
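To see which default applies on a given machine, a minimal check is shown below (the class name is hypothetical). Note that since Java 18 the default charset is UTF-8 on all platforms (JEP 400), so the GBK defaults described in this 2001 experiment will not appear on a modern JDK without extra configuration.

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {

    // Name of the charset used by getBytes() and new String(byte[]) without an encoding argument.
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // The system property the default has traditionally been derived from.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```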

2.4) When encode is not specified in the two conversion methods above, the system uses the default encoding (GBK here). Conversions 5, 6 and 7 are therefore equivalent to 8, 9 and 10, conversion 8 is equivalent to 9, and 11 is equivalent to 12, so only conversions 1, 9, 10, 12 and 13 are discussed. Conversions 2, 3 and 4 are only for testing and are outside the scope of this discussion (printing a byte array shows only its type and identity hash code, such as [B@7bc8b569, not its contents).

2.5) Below we trace how the character "中" (the first character of "中文") is converted, starting with compiling and running in the Chinese window. Note the subscripts on the letters in the tables: I deliberately use numbers to show whether items are the same, different or related.
2.5.1) First, take conversion 9 of the 13 as an example:

Step  Content   Location             Explanation
01    C1        HelloWorld.java      C1 is one GBK character
02    U1        read by javac        U1 is one Unicode character
03    C1        getBytes() step 1    Java exchanges data with the operating system
04    B1, B2    getBytes() step 2    returns the byte array
05    C1        new String() step 1  Java exchanges data with the operating system
06    U1        new String() step 2  returns the character
07    C1        println(String)      displays "中", the same as the original

2.5.2) Next, take conversion 10 as an example:

Step  Content   Location             Explanation
01    C1        HelloWorld.java      C1 is one GBK character
02    U1        read by javac        U1 is one Unicode character
03    C1        getBytes() step 1    Java exchanges data with the operating system
04    B1, B2    getBytes() step 2    returns the byte array
05    C3, C4    new String() step 1  Java exchanges data with the operating system; parsing goes wrong here
06    U5, U6    new String() step 2  returns the characters
07    C3, C4    println(String)      "中" has been split into two halves that map only to ISO8859_1 characters, so it displays as "??"

In the example above, "中文" is therefore displayed as "????".
2.5.3) The other cases in the all-Chinese environment are similar, so I will not go through them.
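Conversions 9 and 10 can be replayed with explicit charsets so that the result no longer depends on the console or the default encoding. In this sketch (hypothetical class name, GB2312 assumed to be available in the JDK) conversion 9 returns the original string, while conversion 10 turns each Chinese character into two separate ISO-8859-1 characters, the U5, U6 of the table above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Conversion9And10 {

    // Assumption: the JDK includes GB2312.
    public static final Charset GB2312 = Charset.forName("GB2312");

    // Conversion 9: encode and decode with the same charset.
    public static String conversion9(String s) {
        return new String(s.getBytes(GB2312), GB2312);
    }

    // Conversion 10: encode as GB2312, decode as ISO-8859-1 (one char per byte).
    public static String conversion10(String s) {
        return new String(s.getBytes(GB2312), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String zhongwen = "\u4e2d\u6587";                            // U+4E2D U+6587
        System.out.println(conversion9(zhongwen).equals(zhongwen)); // true
        System.out.println(conversion10(zhongwen).length());        // 4: each character split in two
    }
}
```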

2.6) Now let's see why classes compiled in the Western DOS window behave similarly when run in the Chinese window and, in particular, why some cases actually display the correct characters.

2.6.1) Again, start with conversion 9:

Step  Content   Location             Explanation
01    C1C2      HelloWorld.java      C1 and C2 are each read as an ISO8859_1 character; "中" has been split apart
02    U3U4      read by javac        U3 and U4 are Unicode characters
03    C5C6      getBytes() step 1    Java exchanges data with the operating system; parsing goes wrong here
04    B5B6B7B8  getBytes() step 2    returns the byte array
05    C5C6      new String() step 1  Java exchanges data with the operating system
06    U3U4      new String() step 2  returns the characters
07    C5C6      println(String)      the same two characters, but now two GBK characters rather than the original two ISO8859_1 characters, so "中" displays as "??"

and "中文" displays as "????".

2.6.2) Next, conversion 12, because it displays the Chinese characters correctly:

Step  Content   Location             Explanation
01    C1C2      HelloWorld.java      C1 and C2 are each read as an ISO8859_1 character; "中" has been split apart
02    U3U4      read by javac        U3 and U4 are Unicode characters
03    C1C2      getBytes() step 1    Java exchanges data with the operating system (still correct at this point!)
04    B5B6      getBytes() step 2    returns the byte array (a crucial step!)
05    C12       new String() step 1  an even more crucial step: Java now knows to decode B5B6 as one character!
06    U7        new String() step 2  returns the character (the two key steps! U7 carries the information of U3U4)
07    C12       println(String)      this is the original "中": javac had mangled it, but the programmer set things right. Naturally, "中文" also displays correctly!
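Conversion 12 can undo the damage because ISO-8859-1 maps each of the 256 byte values to exactly one character and back, so a string that was wrongly decoded as ISO-8859-1 still carries the original bytes. The sketch below (hypothetical names, GBK assumed available) simulates that wrong decoding and repairs it. The trick only works when the wrong decoding was lossless like this; if the bytes had been decoded with, say, UTF-8, invalid sequences would have been replaced and the original could not be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RepairDemo {

    // Assumption: the JDK includes GBK.
    public static final Charset GBK = Charset.forName("GBK");

    // Undo a wrong ISO-8859-1 decode: recover the original bytes, then decode them as GBK.
    public static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";                                      // U+4E2D U+6587
        // Simulates javac reading GBK source bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original));         // false
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```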

3) Why is the output sometimes garbled when JDBC is used with calls such as these?
new String(ResultSet.getBytes(int)[, encode])
ResultSet.getString(int)
PreparedStatement.setBytes(int, String.getBytes([encode]))
PreparedStatement.setString(int, String)

In fact, the problem lies in how the JDBC driver was written. The driver also has to deal with encoding: when it reads data from the database it may silently convert from GB2312 (the default encoding) to Unicode. The WebLogic JDBC driver for SQL Server that I was using behaves this way: when reading strings it could not read Chinese characters correctly, yet annoyingly it could write them, which is hard to accept!
In other words, transcoding happens whenever we read or write, even when it is not obvious, because it uses the default encoding. The JDBC driver has performed the conversion, and only by reading its source code could we know exactly what it does, couldn't we?
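When a driver transcodes silently, one cheap safeguard is to check whether the text can be represented in the database encoding at all before writing it. This sketch is hypothetical and does not reveal what a particular driver does internally; it only uses CharsetEncoder.canEncode, which reports whether every character has a mapping in the charset. The database encoding has to be known from the database configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DbEncodingCheck {

    // True if every character of text has a mapping in the database encoding;
    // false means some characters would be lost or replaced during conversion.
    public static boolean survives(String text, Charset databaseEncoding) {
        return databaseEncoding.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";                                    // U+4E2D U+6587
        System.out.println(survives(text, Charset.forName("GBK")));      // true (assumes GBK is available)
        System.out.println(survives(text, StandardCharsets.ISO_8859_1)); // false
    }
}
```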
