MySQL运维实战(5.5) 数据导入导出时的字符集问题
mysql可以使用load data/select into outfile或mysqldump工具进行数据导入导出。下面分别分析数据导入导出时的字符集相关问题。
准备测试数据
创建测试表,2个字段分别使用gbk和utf8字符集
mysql> create table test_load(c_gbk varchar(100) character set gbk, c_utf8 varchar(100) character set utf8); Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> show create table test_load\G *************************** 1. row *************************** Table: test_load Create Table: CREATE TABLE `test_load` ( `c_gbk` varchar(100) CHARACTER SET gbk COLLATE gbk_chinese_ci DEFAULT NULL, `c_utf8` varchar(100) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec)
写入测试数据
mysql> insert into test_load values('列列列列列AAA', '列列列列列AAA'); Query OK, 1 row affected (0.00 sec) mysql> select c_gbk, hex(c_gbk), c_utf8, hex(c_utf8) from test_load; +--------------------+----------------------------+--------------------+--------------------------------------+ | c_gbk | hex(c_gbk) | c_utf8 | hex(c_utf8) | +--------------------+----------------------------+--------------------+--------------------------------------+ | 列列列列列AAA | C1D0C1D0C1D0C1D0C1D0414141 | 列列列列列AAA | E58897E58897E58897E58897E58897414141 | +--------------------+----------------------------+--------------------+--------------------------------------+
使用select into outfile导出数据
mysql> select * into outfile 'test_load.txt' from test_load; Query OK, 1 row affected (0.00 sec)
查看数据文件:
$ cat /usr/local/var/mysql/test/test_load.txt ����������AAA 列列列列列AAA $ od -t x1 /usr/local/var/mysql/test/test_load.txt 0000000 c1 d0 c1 d0 c1 d0 c1 d0 c1 d0 41 41 41 09 e5 88 0000020 97 e5 88 97 e5 88 97 e5 88 97 e5 88 97 41 41 41 0000040 0a 0000041
使用select into outfile时,默认不做字符集转换。
指定导出文件字符集
mysql> select * into outfile 'test_load_x.txt' character set utf8 from test_load; Query OK, 1 row affected, 1 warning (0.00 sec) lazybug:~ lazybug$ cat /usr/local/var/mysql/test/test_load_x.txt 列列列列列AAA 列列列列列AAA lazybug:~ lazybug$ od -t x1 /usr/local/var/mysql/test/test_load_x.txt 0000000 e5 88 97 e5 88 97 e5 88 97 e5 88 97 e5 88 97 41 0000020 41 41 09 e5 88 97 e5 88 97 e5 88 97 e5 88 97 e5 0000040 88 97 41 41 41 0a 0000046
如果指定select into outfile字符集,则会进行字符集的转换。
load data
mysql> create table test_load2 like test_load; Query OK, 0 rows affected (0.01 sec) mysql> load data local infile '/usr/local/var/mysql/test/test_load.txt' into table test_load2; ERROR 1300 (HY000): Invalid utf8mb4 character string: '' mysql> load data local infile '/usr/local/var/mysql/test/test_load.txt' into table test_load2 character set gbk; Query OK, 1 row affected (0.00 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select * from test_load2; +--------------------+----------------------------+ | c_gbk | c_utf8 | +--------------------+----------------------------+ | 列列列列列AAA | 鍒楀垪鍒楀垪鍒桝AA | +--------------------+----------------------------+ 1 row in set (0.00 sec)
由于文件中存在2种编码的数据,可以使用binary字符集导入数据:
mysql> truncate table test_load2; Query OK, 0 rows affected (0.01 sec) mysql> load data local infile '/usr/local/var/mysql/test/test_load.txt' into table test_load2。 character set binary; Query OK, 1 row affected (0.00 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select c_gbk, hex(c_gbk), c_utf8, hex(c_utf8) from test_load2; +--------------------+----------------------------+--------------------+--------------------------------------+ | c_gbk | hex(c_gbk) | c_utf8 | hex(c_utf8) | +--------------------+----------------------------+--------------------+--------------------------------------+ | 列列列列列AAA | C1D0C1D0C1D0C1D0C1D0414141 | 列列列列列AAA | E58897E58897E58897E58897E58897414141 | +--------------------+----------------------------+--------------------+--------------------------------------+ 1 row in set (0.00 sec)
总结
MySQL允许一个表的多个字段使用不同的字符集。如果一个表多个字段使用了不同的字符集,那么在数据导入导出时需要注意,要么导出时将所有数据都转换成同一种编码,要么导入时使用binary字符集,不然容易出现乱码问题。