Handling Chinese text encoding in Python code

2010-07-29  Source: original to this site  Category: Python  Views: 184

Today I tried Python's CGI module and ran into a problem: Chinese characters were not displayed correctly, which was very frustrating.
After a careful search of the Internet I finally solved it. The solution is recorded below so that I do not make the same mistake again.

The page source code is as follows:

# -*- coding: utf8 -*-

import cgitb, cgi
cgitb.enable()  # show tracebacks in the browser

form = cgi.FieldStorage()
# print both fields only when both were submitted
if "name" in form and "addr" in form:
    print "<p> name:", form["name"].value
    print "<p> addr:", form["addr"].value

(Only the addr parameter was tested with Chinese here.) With ASCII input the script runs fine, but Chinese characters come out garbled.
Switching the browser to GB2312 encoding displays them properly, but I wanted the page to be displayed as UTF-8.

Changing the line to print "<p> addr:", form["addr"].value.encode('utf-8') produced the following error:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

After reading http://blog.chinaunix.net/u2/68206/showart_668359.html I finally understood:

In Python, encoding and decoding are conversions between the two types unicode and str.
Encoding is unicode -> str; conversely, decoding is str -> unicode.
The remaining question is when to encode and when to decode. First, there is the "coding declaration" at the top of the file,
that is, the # -*- coding: -*- statement. By default Python 2 treats a script file as ASCII,
so when the file contains characters outside the ASCII range, the coding declaration is needed to state the file's real encoding.
As for sys.defaultencoding (the value returned by sys.getdefaultencoding()), it is the codec Python uses when it has to decode and no codec was specified explicitly.
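To make the two directions concrete, here is a minimal sketch. It is written with explicit u"" and b"" literals so that it should run on both Python 2 and Python 3 (where Python 2's str corresponds to bytes); the variable names are my own and are not from the article referenced above.

```python
# -*- coding: utf-8 -*-
# Sketch: unicode text vs. encoded bytes, always with an explicit codec.
text = u"\u4e2d\u6587"              # the two characters "中文" as unicode text

# Encoding: unicode -> bytes (a Python 2 str)
utf8_bytes = text.encode("utf-8")   # b"\xe4\xb8\xad\xe6\x96\x87"
gb_bytes = text.encode("gb2312")    # b"\xd6\xd0\xce\xc4"

# Decoding: bytes -> unicode. The codec must match how the bytes were produced.
from_utf8 = utf8_bytes.decode("utf-8")
from_gb = gb_bytes.decode("gb2312")
```

The same text has two different byte representations, which is why the bytes alone are not enough: you also have to know which encoding produced them.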

For example, I have the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文'  # note that s here is a str, not unicode
s.encode('gb18030')

This code re-encodes s as gb18030, which is a unicode -> str conversion.
Since s is itself a str, Python first decodes s to unicode automatically
and then encodes the result as gb18030. Because the decoding is automatic and we did not say which codec to use,
Python decodes with the codec named by sys.defaultencoding.
In many cases sys.defaultencoding is ASCII, so if s holds non-ASCII data an error occurs.
In the case above, my sys.defaultencoding is ascii,
while s is encoded as utf8, the same as the file, so it fails:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
There are two ways to correct this error.
First, state the encoding of s explicitly:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

s = '中文'
s.decode('utf-8').encode('gb18030')
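The failure and the first fix can be reproduced step by step. The sketch below spells out the implicit ASCII decode as an explicit call, so it should behave the same on Python 2 and Python 3; the names raw, failed and bad_byte_position are only illustrative.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes of "中文", as a Python 2 str would hold them in a UTF-8 source file.
raw = b"\xe4\xb8\xad\xe6\x96\x87"

# What Python 2 does behind the scenes when the default encoding is ASCII.
try:
    raw.decode("ascii")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    bad_byte_position = exc.start   # index of the first byte ASCII cannot decode

# The explicit fix: name the real source codec, then encode to the target codec.
gb18030_bytes = raw.decode("utf-8").encode("gb18030")
```

Naming the source codec in the decode step removes any dependence on the interpreter's default encoding.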

Second, change sys.defaultencoding to the encoding of the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
reload(sys)  # Python 2.5 removes sys.setdefaultencoding during initialization, so sys must be reloaded
sys.setdefaultencoding('utf-8')

s = '中文'  # named s rather than str to avoid shadowing the built-in type
s.encode('gb18030')

After reading this, I changed the line to
print "<p> addr:", form["addr"].value.decode('gb2312').encode('utf-8')
and it worked.

To summarize what I learned:

1. When the data you receive is in a different encoding from the one declared in the current script, it has to be transcoded.

2. To transcode, first decode the data from its own encoding to unicode, then encode the unicode as utf8.

3. As for why my browser sends gb2312-encoded data to the server, it is probably related to the encoding of the client system.
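Points 1 and 2 can be wrapped in a small helper. This is only a sketch: transcode is a hypothetical name of my own, and form_value stands in for the bytes that form["addr"].value returned in my test.

```python
# -*- coding: utf-8 -*-
def transcode(data, source_encoding, target_encoding="utf-8"):
    """Decode bytes from source_encoding to unicode, then re-encode them."""
    return data.decode(source_encoding).encode(target_encoding)

# Stand-in for form["addr"].value when the browser submits "中文" as GB2312.
form_value = b"\xd6\xd0\xce\xc4"
utf8_value = transcode(form_value, "gb2312")
```

The source encoding is a required argument on purpose: guessing it is exactly what caused the garbled output in the first place.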

While I am at it, here is a reposted note about Chinese problems when working with MySQL from Python:

Garbled Chinese when operating MySQL from Python

The following measures ensure that MySQL output is not garbled:
1 Set the encoding of the Python file to utf-8 (put # encoding=utf-8 at the top of the file)
2 Set the MySQL database to charset=utf8
3 Connect to MySQL from Python with the parameter charset='utf8'
4 Set Python's default encoding to utf-8 (sys.setdefaultencoding('utf-8'))

# encoding=utf-8
import sys
import MySQLdb

# make utf-8 the default codec for implicit str <-> unicode conversions
reload(sys)
sys.setdefaultencoding('utf-8')

db = MySQLdb.connect(user='root', charset='utf8')
cur = db.cursor()
cur.execute('use mydb')
cur.execute('select * from mytb limit 100')

f = file("/home/user/work/tem.txt", 'w')

# write one row per line
for i in cur.fetchall():
    f.write(str(i))
    f.write("\n")

f.close()
cur.close()
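The file-writing step can also be done without touching the default encoding. The sketch below is an alternative I am suggesting, not part of the reposted note: it uses io.open with an explicit encoding, a hypothetical rows list in place of cur.fetchall(), a throwaway temporary path, and tab-joined columns instead of str(i).

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for cur.fetchall(): tuples that contain unicode text.
rows = [(1, u"\u4e2d\u6587"), (2, u"Python")]

# Throwaway location for the sketch; the original script writes to tem.txt.
path = os.path.join(tempfile.mkdtemp(), "tem.txt")

# io.open encodes on write, so the codec is stated once, at the file boundary.
with io.open(path, "w", encoding="utf-8") as f:
    for row in rows:
        f.write(u"\t".join(u"%s" % (col,) for col in row))
        f.write(u"\n")

# Read the file back with the same codec to check the round trip.
with io.open(path, "r", encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Stating the codec where the file is opened keeps the rest of the program working with unicode only, which is the same decode-early, encode-late idea as in the CGI fix.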

