Character set conversion in Python (handling garbled MySQL data)

2024-08-25 12:16:00

This article draws on: http://blog.csdn.net/crazyhacking/article/details/39375535 (thanks for compiling it!)
chardet module: http://blog.csdn.net/tianzhu123/article/details/8187470
Character set conversion: http://blog.chinaunix.net/uid-26249349-id-2846894.html

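1. The MySQL garbled-data problem:

Background: there are two MySQL databases, both using the gbk character set. Data has to be read from database A and inserted into database B, and some of the field values are Chinese.
Code: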

#!/usr/bin/env python
# _*_ encoding:utf-8 _*_

'''
author: tiantiandas
'''

import sys
reload(sys)
sys.setdefaultencoding('gbk')
import MySQLdb

def Connect_Mysql(sql,host):
    db_info = {'host': host,
               'user': 'test',
               'db': 'TestDB',
               'passwd': 'dnstest',
               'charset':'gbk'} # crucial: the connection character set
    try:
        connect = MySQLdb.connect(**db_info)
        cursor = connect.cursor()
        cursor.execute(sql)
        connect.commit()
        result = cursor.fetchone()
        return result
    except Exception as e:
        print e
        sys.exit(10)

def main():
    domain = sys.argv[1]
    # Note: building SQL with format() is open to SQL injection;
    # only use it with trusted input.
    query = 'select Name,AdminDesc from EmailBox where Domain="{0}"'.format(domain)
    try:
        Name, AdminDesc = Connect_Mysql(sql=query,host="host1")
        # All three placeholders need a value, and every quote must be closed.
        update = "update EmailBox set Name='{0}',AdminDesc='{1}' where Domain='{2}'".format(Name,AdminDesc,domain)
        try:
            print update
            Connect_Mysql(sql=update,host='host2')
        except Exception as e:
            print e
    except Exception as e:
        print e

if __name__ == '__main__':
    main()

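A few key points:
- sys.setdefaultencoding('gbk'): this makes Python treat the data pulled from database A as gbk. It replaces the default ascii codec that Python 2 uses for implicit conversions between str and unicode, so the Chinese values can be formatted into the SQL string. (That is roughly the idea.)
- MySQL encoding, charset: gbk: this sets the connection character set, so the data written to the database is encoded as gbk.
So if the data is only pulled out to be looked at, the sys.setdefaultencoding('gbk') line is not needed.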
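
The update statement above is assembled with format(), which is where both the quoting mistakes and the dependence on the default encoding come from. As a hedged alternative, here is a minimal Python 3 sketch that keeps the SQL text and the values separate. The helper name build_update and the sample values are assumptions made for illustration; the only driver feature relied on is that MySQLdb's cursor.execute accepts %s placeholders together with a sequence of parameters.

```python
def build_update(name, admin_desc, domain):
    """Return (sql, params) for a parameterized UPDATE.

    Hypothetical helper, not part of the original script.
    """
    sql = "UPDATE EmailBox SET Name=%s, AdminDesc=%s WHERE Domain=%s"
    return sql, (name, admin_desc, domain)


# Sample values are made up for the example.
sql, params = build_update("天天", "管理员", "example.com")

# With an open MySQLdb connection `conn` (assumed, not created here):
#     cursor = conn.cursor()
#     cursor.execute(sql, params)   # the driver quotes and encodes the values
#     conn.commit()
print(sql)
print(params)
```

Because the values never pass through string formatting, no quote has to be balanced by hand, and the driver takes care of encoding the text for the connection.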

2. About encoding and decoding

The chardet module

chardet is a module for detecting character encodings. It is used like this:

#!/usr/bin/env python
# _*_ encoding:utf-8 _*_
import chardet
  
a="天天"
print chardet.detect(a)

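Result:
{'confidence': 0.75249999999999995, 'encoding': 'utf-8'}

To detect the encoding of a large file, the following approach speeds things up, because the data is fed in piece by piece and detection stops as soon as the detector is confident (it really is faster than the first approach):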

import urllib
from chardet.universaldetector import UniversalDetector
usock = urllib.urlopen('http://www.baidu.com/')
# create a detector object
detector = UniversalDetector()
for line in usock.readlines():
    # feed the data piece by piece until the confidence threshold is reached
    detector.feed(line)
    if detector.done: break
# close the detector, which finalizes the result
detector.close()
usock.close()
# print the detection result
print detector.result

Output:
{'confidence': 0.99, 'encoding': 'GB2312'}

With the chardet module you can identify the character set of the data you receive, and then convert the data to whatever character set you need.
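
The detect-then-convert flow can be sketched in Python 3 as follows. This is only an illustration under stated assumptions: the helper name convert_bytes is made up, and the source encoding is passed in as a plain string standing in for the 'encoding' value that chardet reports, so the example runs without chardet installed.

```python
def convert_bytes(raw, source_encoding, target_encoding="utf-8"):
    """Re-encode raw bytes from source_encoding to target_encoding.

    Hypothetical helper: source_encoding stands in for the value
    found under the 'encoding' key of a chardet result.
    """
    text = raw.decode(source_encoding)    # bytes -> str (Unicode)
    return text.encode(target_encoding)   # str -> bytes in the target encoding


gbk_bytes = b"\xcc\xec\xcc\xec"                    # 天天 encoded as GBK
utf8_bytes = convert_bytes(gbk_bytes, "GB2312")    # "GB2312" as reported above
print(utf8_bytes)
```

If the detected encoding is wrong, decode may raise UnicodeDecodeError or silently produce wrong characters, so a low confidence value is worth checking before converting.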

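Two functions:
- decode: decodes a byte string from the named character set into unicode
- encode: encodes unicode into a byte string in the named character set
- Python works with unicode internally, so first use decode to convert the data to unicode, then use encode to convert it to the character set you want.
Test code: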

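>>> name="天天"
>>> name
'\xe5\xa4\xa9\xe5\xa4\xa9'  # UTF-8 bytes of 天天 (the terminal's encoding here), not GBK

>>> b=name.decode('gbk')   # wrong codec: UTF-8 bytes decoded as gbk
>>> b
u'\u6fb6\u2541\u3049'

>>> c=b.encode('utf8')     # so the result is garbled text encoded as UTF-8
>>> c
'\xe6\xbe\xb6\xe2\x95\x81\xe3\x81\x89'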

------------------------------

>>> '\xcc\xec\xcc\xec'.decode('gbk')   # the actual GBK bytes of 天天
u'\u5929\u5929'
>>> '\xcc\xec\xcc\xec'.decode('gbk').encode('utf8')
'\xe5\xa4\xa9\xe5\xa4\xa9'
>>> '天天'
'\xe5\xa4\xa9\xe5\xa4\xa9'
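
The sessions above are Python 2, where str holds bytes. For comparison, here is a minimal Python 3 sketch of the same experiment, where str is always Unicode and bytes is a separate type. The variable names are made up for the example; the byte values are the ones shown in the sessions above.

```python
text = "天天"

utf8_bytes = text.encode("utf-8")   # b'\xe5\xa4\xa9\xe5\xa4\xa9'
gbk_bytes = text.encode("gbk")      # b'\xcc\xec\xcc\xec'

# Decoding UTF-8 bytes with the wrong codec raises no error here;
# it silently yields garbled text (three unrelated characters).
mojibake = utf8_bytes.decode("gbk")

# Undoing the wrong step recovers the original text.
recovered = mojibake.encode("gbk").decode("utf-8")

print(utf8_bytes)
print(gbk_bytes)
print(recovered == text)
```

The recovery step only works while the garbled text is still intact; once unmappable characters have been dropped or replaced, the original bytes cannot be restored.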
