
The information_schema series: character sets and collations (CHARACTER_SETS, COLLATIONS, COLLATION_CHARACTER_SET_APPLICABILITY)

1:CHARACTER_SETS
First, look at the ten rows with the largest MAXLEN:
root@localhost [information_schema]>select * from CHARACTER_SETS order by MAXLEN DESC limit 10;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| utf32 | utf32_general_ci | UTF-32 Unicode | 4 |
| utf16le | utf16le_general_ci | UTF-16LE Unicode | 4 |
| gb18030 | gb18030_chinese_ci | China National Standard GB18030 | 4 |
| utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 |
| utf16 | utf16_general_ci | UTF-16 Unicode | 4 |
| eucjpms | eucjpms_japanese_ci | UJIS for Windows Japanese | 3 |
| ujis | ujis_japanese_ci | EUC-JP Japanese | 3 |
| utf8 | utf8_general_ci | UTF-8 Unicode | 3 |
| gbk | gbk_chinese_ci | GBK Simplified Chinese | 2 |
| ucs2 | ucs2_general_ci | UCS-2 Unicode | 2 |
+--------------------+----------------------+---------------------------------+--------+
Here is how the official documentation describes the columns:
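+-------------------------+-------------------+-----------------------------+-----------------+
| INFORMATION_SCHEMA Name | SHOW Name         | Meaning                     | Remarks         |
+-------------------------+-------------------+-----------------------------+-----------------+
| CHARACTER_SET_NAME      | Charset           | Character set name          |                 |
| DEFAULT_COLLATE_NAME    | Default collation | Default collation           |                 |
| DESCRIPTION             | Description       | Description                 | MySQL extension |
| MAXLEN                  | Maxlen            | Maximum bytes per character | MySQL extension |
+-------------------------+-------------------+-----------------------------+-----------------+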
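This table lists every character set the MySQL server supports, 41 in total on this server. Take utf8 as an example: its default collation is utf8_general_ci, and one character occupies at most three bytes. A Chinese character, for instance, takes three bytes in UTF-8.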
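To make MAXLEN concrete, here is a small sketch that counts the bytes a few sample characters need in several encodings. It runs in plain Python rather than against MySQL, and it assumes that the Python codecs utf-8, gbk, gb18030, utf-16-le and utf-32-le are reasonable stand-ins for the MySQL character sets with similar names. The helper byte_length is a name made up for this example.

```python
# Bytes needed per character in several encodings.
# Assumption: these Python codec names stand in for the MySQL
# character sets of similar name (utf8mb4, gbk, gb18030, utf16le, utf32).
SAMPLES = ["A", "汉", "😀"]
CODECS = ["utf-8", "gbk", "gb18030", "utf-16-le", "utf-32-le"]


def byte_length(char, codec):
    """Return the encoded length of char, or None if the codec cannot encode it."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None


for char in SAMPLES:
    lengths = {codec: byte_length(char, codec) for codec in CODECS}
    print(repr(char), lengths)
```

The character 汉 comes out as three bytes in UTF-8 and two bytes in GBK, which matches the MAXLEN values of utf8 and gbk in the table. The emoji needs four bytes in UTF-8, which is more than the three-byte limit of MySQL's utf8 and is the case utf8mb4 (MAXLEN 4) exists for.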
Running SHOW CREATE TABLE on it gives:
| CHARACTER_SETS | CREATE TEMPORARY TABLE `CHARACTER_SETS` (
`CHARACTER_SET_NAME` varchar(32) NOT NULL DEFAULT '',
`DEFAULT_COLLATE_NAME` varchar(32) NOT NULL DEFAULT '',
`DESCRIPTION` varchar(60) NOT NULL DEFAULT '',
`MAXLEN` bigint(3) NOT NULL DEFAULT '0'
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
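As we can see from ENGINE=MEMORY, the table uses the MEMORY engine. Its contents are generated by the server itself rather than read from disk, so an identical table is produced again after every restart.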
2:COLLATIONS
First, look at the first ten rows ordered by ID:
root@localhost [information_schema]>select * from COLLATIONS order by id limit 10;
+-------------------+--------------------+----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+-------------------+--------------------+----+------------+-------------+---------+
| big5_chinese_ci | big5 | 1 | Yes | Yes | 1 |
| latin2_czech_cs | latin2 | 2 | | Yes | 4 |
| dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 |
| cp850_general_ci | cp850 | 4 | Yes | Yes | 1 |
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| hp8_english_ci | hp8 | 6 | Yes | Yes | 1 |
| koi8r_general_ci | koi8r | 7 | Yes | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin2_general_ci | latin2 | 9 | Yes | Yes | 1 |
| swe7_swedish_ci | swe7 | 10 | Yes | Yes | 1 |
+-------------------+--------------------+----+------------+-------------+---------+
As usual, here is the official description:
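+-------------------------+-----------+--------------------------------------+-----------------+
| INFORMATION_SCHEMA Name | SHOW Name | Meaning                              | Remarks         |
+-------------------------+-----------+--------------------------------------+-----------------+
| COLLATION_NAME          | Collation | Collation name                       |                 |
| CHARACTER_SET_NAME      | Charset   | Character set it belongs to          | MySQL extension |
| ID                      | Id        | Collation ID (assigned by MySQL)     | MySQL extension |
| IS_DEFAULT              | Default   | Default for its character set?       | MySQL extension |
| IS_COMPILED             | Compiled  | Character set compiled into server?  | MySQL extension |
| SORTLEN                 | Sortlen   | Related to memory needed for sorting | MySQL extension |
+-------------------------+-----------+--------------------------------------+-----------------+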
In everyday use, the SHOW COLLATION statement gives the same information.
Running SHOW CREATE TABLE on it gives:
| COLLATIONS | CREATE TEMPORARY TABLE `COLLATIONS` (
`COLLATION_NAME` varchar(32) NOT NULL DEFAULT '',
`CHARACTER_SET_NAME` varchar(32) NOT NULL DEFAULT '',
`ID` bigint(11) NOT NULL DEFAULT '0',
`IS_DEFAULT` varchar(3) NOT NULL DEFAULT '',
`IS_COMPILED` varchar(3) NOT NULL DEFAULT '',
`SORTLEN` bigint(3) NOT NULL DEFAULT '0'
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
Again a MEMORY table, generated automatically by the server; its contents do not change.
3:COLLATION_CHARACTER_SET_APPLICABILITY
Look at ten rows again, this time filtered by a condition:
root@localhost [information_schema]>select * from COLLATION_CHARACTER_SET_APPLICABILITY where CHARACTER_SET_NAME like '%utf%' limit 10;
+-------------------+--------------------+
| COLLATION_NAME | CHARACTER_SET_NAME |
+-------------------+--------------------+
| utf8_general_ci | utf8 |
| utf8_bin | utf8 |
| utf8_unicode_ci | utf8 |
| utf8_icelandic_ci | utf8 |
| utf8_latvian_ci | utf8 |
| utf8_romanian_ci | utf8 |
| utf8_slovenian_ci | utf8 |
| utf8_polish_ci | utf8 |
| utf8_estonian_ci | utf8 |
| utf8_spanish_ci | utf8 |
+-------------------+--------------------+
10 rows in set (0.00 sec)
As usual, here is the official description:
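+-------------------------+-----------+---------+
| INFORMATION_SCHEMA Name | SHOW Name | Remarks |
+-------------------------+-----------+---------+
| COLLATION_NAME          | Collation |         |
| CHARACTER_SET_NAME      | Charset   |         |
+-------------------------+-----------+---------+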
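Clearly, this is nothing more than a mapping between collations and the character sets they apply to. It is also a MEMORY table, generated automatically at initialization according to the server version.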
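The mapping can be pictured with a few lines of Python. This is only a sketch: the rows are copied by hand from the ten COLLATIONS rows shown in section 2 (name, character set, IS_DEFAULT), and group_by_charset is a name made up for this example, not something MySQL provides.

```python
from collections import defaultdict

# (COLLATION_NAME, CHARACTER_SET_NAME, IS_DEFAULT), copied from the
# COLLATIONS output shown earlier in this post.
ROWS = [
    ("big5_chinese_ci", "big5", True),
    ("latin2_czech_cs", "latin2", False),
    ("dec8_swedish_ci", "dec8", True),
    ("cp850_general_ci", "cp850", True),
    ("latin1_german1_ci", "latin1", False),
    ("hp8_english_ci", "hp8", True),
    ("koi8r_general_ci", "koi8r", True),
    ("latin1_swedish_ci", "latin1", True),
    ("latin2_general_ci", "latin2", True),
    ("swe7_swedish_ci", "swe7", True),
]


def group_by_charset(rows):
    """Group collations by character set and record each set's default."""
    collations = defaultdict(list)
    defaults = {}
    for collation, charset, is_default in rows:
        collations[charset].append(collation)
        if is_default:
            defaults[charset] = collation
    return dict(collations), defaults


collations, defaults = group_by_charset(ROWS)
print(collations["latin1"])
print(defaults["latin1"])
```

Within these ten rows, latin1 and latin2 each have two collations, and exactly one of the two is marked as the default.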
 
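Now for the difference between character sets and collations:
A character set determines how strings are stored. A character is the smallest meaningful symbol in a human language, such as 'A' or 'B', and a character set is a collection of such characters together with their encodings;
A collation determines how strings are compared: it is the set of rules for comparing characters within one character set.
Each collation belongs to exactly one character set, but a character set can have several collations, one of which is its default collation.
Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to and end with _ci (case-insensitive), _cs (case-sensitive), or _bin (compare by encoded value). For example, under the collation "utf8_general_ci" the characters "a" and "A" are equal.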
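The naming convention can be tried out in Python. What follows is a rough model, not how the server actually compares strings: real collations use their own weight tables, while this sketch approximates _ci with str.casefold and _bin with a byte comparison of the UTF-8 encoding. It also assumes that character set names never contain an underscore. The function names describe_collation and equal_under are made up for this example.

```python
def describe_collation(name):
    """Split a collation name into (character set, suffix).

    Assumption: the character set name is everything before the first
    underscore, and the comparison style is the last "_"-separated part.
    """
    charset = name.split("_", 1)[0]
    suffix = name.rsplit("_", 1)[1]
    return charset, suffix


def equal_under(collation, a, b):
    """Approximate whether a and b compare equal under the given collation."""
    _, suffix = describe_collation(collation)
    if suffix == "ci":
        # Case-insensitive: fold case before comparing (an approximation).
        return a.casefold() == b.casefold()
    if suffix == "bin":
        # Binary: compare the encoded values byte by byte.
        return a.encode("utf-8") == b.encode("utf-8")
    # _cs and anything else: plain case-sensitive comparison.
    return a == b


print(describe_collation("utf8_general_ci"))
print(equal_under("utf8_general_ci", "a", "A"))
print(equal_under("utf8_bin", "a", "A"))
```

Under this model "a" and "A" are equal for utf8_general_ci and different for utf8_bin, which is the behaviour the naming convention describes.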
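Here are the MySQL system variables related to character sets and collations:
– character_set_server: the default character set for internal operations
– character_set_client: the character set of data arriving from the client
– character_set_connection: the connection-layer character set
– character_set_results: the character set of query results
– character_set_database: the default character set of the currently selected database
– character_set_system: the character set of system metadata (column names and so on)
Next, the character set conversion process in MySQL:
1. When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection;
2. Before performing internal operations, it converts the request data from character_set_connection to the internal character set, which is determined as follows:
• use the CHARACTER SET of each column;
• if that is not set, use the DEFAULT CHARACTER SET of the table (a MySQL extension, not part of the SQL standard);
• if that is not set, use the DEFAULT CHARACTER SET of the database;
• if that is not set, use character_set_server.
3. It converts the result from the internal character set to character_set_results.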
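The three steps above can be sketched in Python. This is a simplified model under stated assumptions, not the server's implementation: Python codec names such as "gbk" and "utf-8" stand in for MySQL character set names, every conversion is a plain decode followed by an encode, and the names internal_charset, convert and round_trip are made up for this example.

```python
def internal_charset(column=None, table=None, database=None, server="latin-1"):
    """Pick the internal character set: column, then table, then database, then server."""
    for candidate in (column, table, database, server):
        if candidate is not None:
            return candidate


def convert(data, src, dst):
    """Re-encode bytes from one character set to another."""
    return data.decode(src).encode(dst)


def round_trip(request, client, connection, results, **levels):
    """Follow request bytes through the three conversion steps."""
    # Step 1: character_set_client -> character_set_connection
    in_connection = convert(request, client, connection)
    # Step 2: character_set_connection -> internal character set
    internal = internal_charset(**levels)
    stored = convert(in_connection, connection, internal)
    # Step 3: internal character set -> character_set_results
    return convert(stored, internal, results)


# A GBK client talking to a utf-8 connection and a utf-8 database.
reply = round_trip("汉".encode("gbk"), client="gbk", connection="utf-8",
                   results="utf-8", database="utf-8")
print(reply.decode("utf-8"))
```

If the internal character set cannot represent a character (for example, leaving every level unset so that the latin-1 fallback is used for 汉), this sketch simply raises UnicodeEncodeError. What the real server does in that situation is not modelled here.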
 
Parts of this post draw on someone else's blog. The address is below for reference, with thanks to its author for sharing:
http://www.laruence.com/2008/01/05/12.html
