mysqlimport show warnings

Friday, October 10, 2008

Pumping Out Warnings When Using Load Data From the Command Line

I’m currently writing a SQL Server SSIS package that leverages the Load data infile command of mysql, to import data. The problem I’ve been running into is that I need to find out any warnings that occur and can be reviewed. Since I’m calling the mysql executable from the command line, I’m unable to capture this data. The command currently looks like this:

C:\>mysql -uuser -ppassword --execute="load data local infile ‘c:\\test.txt'
Replace into table test Fields terminated by ',' " -hServer DB

To work around this I’ve added a second command inside the batch that leverages the “show warnings” command, and now looks like this.

C:\>mysql -uuser -ppassword --execute="load data local infile ‘c:\\test.txt'
Replace into table test Fields terminated by ','; show warnings " -hServer DB

This now pumps out the results. In my case within ssis I pump this out to a text file and email it as part of the results such as


C:\>mysql -uuser -ppassword --execute="load data local infile ‘c:\\test.txt'
Replace into table test Fields terminated by ',' " -hServer DB >c:\testout.txt

0 comments:

MySQL:一波三折的数据导入经历,mysqlimport and MySQL load data
2010-08-03 13:30

本文代码运行环境:MySQL:5.1.26-rc-community,Windows 2003

无意中在 emule 的安装目录下看到了个 ip-to-country.csv 文件。打开后,发现是世界各国及IP段对照的文件。格式如下:

33996344,33996351,GB,GBR,英国
50331648,69956103,US,USA,美国
69956104,69956111,BM,BMU,百慕达群岛
69956112,83886079,US,USA,美国
94585424,94585439,SE,SWE,瑞典
100663296,121195295,US,USA,美国
121195296,121195327,IT,ITA,意大利
121195328,152305663,US,USA,美国

真是个好东东!正好一个项目要用到,就准备把数据导入到 MySQL 数据库中。首先在 MySQL 数据库中建立表结构:

use testdb;

create table ip_to_country
(
   ip1      int unsigned  not null
  ,ip2      int unsigned  not null
  ,cname1   varchar(10)   not null
  ,cname2   varchar(10)   not null
  ,cname3   varchar(50)   not null
)
  engine=innodb default charset=utf8;

我准备用 MySQL 导入工具:mysqlimport 来完成导入任务。首先把数据文件 ip-to-country.csv copy 到 d:\,为了使其和 MySQL 中表名匹配,重命名为 ip_to_country.csv。然后根据数据文件格式,编写并执行下面的 mysqlimport 脚本:

mysqlimport --local
            --user=root
            --password=123456
            --delete
            --default-character-set="gb2312"
            --fields-terminated-by=","
            --fields-enclosed-by=""
            --fields-escaped-by=""
            --lines-terminated-by="\r\n"
            testdb
            "d:/ip_to_country.csv"

注意:上面的 mysqlimport 参数应写成一行,为了便于阅读我有意分成多行。执行上面的 mysqlimport 命令后,发现有 Warnings:

testdb.ip_to_country: Records: 65290  Deleted: 0  Skipped: 0  Warnings: 15

数据已经导入,不管那么多了, 先进 MySQL 数据库看下效果。首先设置 character_set_results=gb2312;然后查询 10 条记录出来看看,步骤如下:

mysql> set character_set_results=gb2312;

mysql> show variables like '%char%';
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_client     | utf8                     |
| character_set_connection | utf8                     |
| character_set_database   | utf8                     |
| character_set_filesystem | binary                   |
| character_set_results    | gb2312                   |
| character_set_server     | utf8                     |
| character_set_system     | utf8                     |
| character_sets_dir       | D:\MySQL\share\charsets\ |
+--------------------------+--------------------------+
mysql> select * from ip_to_country limit 10;

+-----------+-----------+--------+--------+--------+
| ip1       | ip2       | cname1 | cname2 | cname3 |
+-----------+-----------+--------+--------+--------+
|  33996344 |  33996351 | GB     | GBR    | ?      |
|  50331648 |  69956103 | US     | USA    |        |
|  69956104 |  69956111 | BM     | BMU    |        |
|  69956112 |  83886079 | US     | USA    |        |
|  94585424 |  94585439 | SE     | SWE    |        |
| 100663296 | 121195295 | US     | USA    |        |
| 121195296 | 121195327 | IT     | ITA    |        |
| 121195328 | 152305663 | US     | USA    |        |
| 152305664 | 152338431 | GB     | GBR    | ?      |
| 152338432 | 167772159 | US     | USA    |        |
+-----------+-----------+--------+--------+--------+

结果发现国家的中文名称都是乱码。奇怪,已经把 mysqlimport 的 default-character-set 参数设为:gb2312,为什么会有乱码?最后不得以,只好在 MySQL 数据库中,把表 ip_to_country 的字符集改为 gb2312。

mysql> alter table ip_to_country default character set gb2312;
mysql> alter table ip_to_country convert to character set gb2312;

然后重新执行导入命令 mysqlimport,这时候发现 MySQL 乱码问题已解决,中文国家名字可以正常显示:

mysql> select * from ip_to_country limit 10;

+-----------+-----------+--------+--------+------------+
| ip1       | ip2       | cname1 | cname2 | cname3     |
+-----------+-----------+--------+--------+------------+
|  33996344 |  33996351 | GB     | GBR    | 英国         |
|  50331648 |  69956103 | US     | USA    | 美国         |
|  69956104 |  69956111 | BM     | BMU    | 百慕达群岛   |
|  69956112 |  83886079 | US     | USA    | 美国         |
|  94585424 |  94585439 | SE     | SWE    | 瑞典         |
| 100663296 | 121195295 | US     | USA    | 美国         |
| 121195296 | 121195327 | IT     | ITA    | 意大利       |
| 121195328 | 152305663 | US     | USA    | 美国         |
| 152305664 | 152338431 | GB     | GBR    | 英国         |
| 152338432 | 167772159 | US     | USA    | 美国         |
+-----------+-----------+--------+--------+------------+

留下一个问题:mysqlimport 到底能不能把文本文件中的 gb2312 字符转换成 utf8 导入到 MySQL 数据库中?

虽然问题看起来已经解决了,但我还想试下 MySQL load data 命令。 mysqlimport 虽然把数据导入数据库了,但还有 15 Warnings 在闹心。我本想利用 mysqlimport 自身的功能来查看这些 Warnings 到底是怎么回事,但翻翻手册,仍无计可施。MySQL 中有个 show warnings 给我一线希望。我这样想:先在 MySQL 中执行 Load data,然后 show warnings 不就可以找到问题所在了吗?

mysql> truncate table ip_to_country;

mysql>
mysql> load data infile "d:/ip_to_country.csv"
          replace into table ip_to_country
          character set gb2312
          fields terminated by "," enclosed by ""
          lines terminated by "\r\n";

ERROR 1262 (01000): Row 6737 was truncated; it contained more data than there were input columns

晕,又出现个拦路虎:ERROR 1262 (01000): Row 6737 was truncated; it contained more data than there were input columns. 最后发现问题是 sql_mode 的问题。

mysql> show variables like '%sql_mode%';

+---------------+----------------------------------------------------------------+
| Variable_name | Value                                                          |
+---------------+----------------------------------------------------------------+
| sql_mode      | strict_trans_tables,no_auto_create_user,no_engine_substitution |
+---------------+----------------------------------------------------------------+

mysql> set sql_mode='no_auto_create_user,no_engine_substitution';

把 strict_trans_tables 从 sql_mode 中去掉,再次执行 MySQL Load data

mysql>  load data infile "d:/ip_to_country.csv"
               replace into table ip_to_country
               character set gb2312
               fields terminated by "," enclosed by ""
               lines terminated by "\r\n";

Query OK, 65290 rows affected, 15 warnings (0.63 sec)
Records: 65290  Deleted: 0  Skipped: 0  Warnings: 15

接下来,用 MySQL show warnings 命令,来找警告的详细描述:

mysql> show warnings;

+---------+------+-------------------------------------------------------------------------------+
| Level   | Code | Message                                                                       |
+---------+------+-------------------------------------------------------------------------------+
| Warning | 1262 | Row 6737 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 6817 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 6914 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 6916 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 6918 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 6988 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 7028 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 7226 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 7569 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 7791 was truncated; it contained more data than there were input columns  |
| Warning | 1262 | Row 47856 was truncated; it contained more data than there were input columns |
| Warning | 1262 | Row 47885 was truncated; it contained more data than there were input columns |
| Warning | 1262 | Row 49331 was truncated; it contained more data than there were input columns |
| Warning | 1262 | Row 49539 was truncated; it contained more data than there were input columns |
| Warning | 1262 | Row 49547 was truncated; it contained more data than there were input columns |
+---------+------+-------------------------------------------------------------------------------+

根据行号 Row 6737 到 ip_to_country.csv 中查看,发现果然有问题,国家中文名中间多了个逗号 “,”

1089579216,1089579223,VI,VIR,维京群岛,美国

看来,需要在表 ip_to_country 中再增加一列,来存放多出的内容。于是修改表结构:

mysql> alter table ip_to_country add column cname4 varchar(50) null;

再次执行 mysql load data,数据顺利导入。这时仍有警告,这些警告是因为文件中的大部分数据行只有 5 列,而表中有 6 列,因此 MySQL 才 Warning。

把表 ip_to_country 的字符集改为 utf8,看有没有乱码:

truncate table ip_to_country;
alter table ip_to_country default character set utf8;
alter table ip_to_country convert to character set utf8;

再次,执行 MySQL Load data 命令:

mysql>  load data infile "d:/ip_to_country.csv"
            replace into table ip_to_country
            character set gb2312
            fields terminated by "," enclosed by ""
            lines terminated by "\r\n";

Query OK, 65290 rows affected, 65275 warnings (0.64 sec)
Records: 65290  Deleted: 0  Skipped: 0  Warnings: 65275

怀着激动的心情,select:

mysql> select * from ip_to_country where cname4 is not null limit 10;

+------------+------------+--------+--------+----------+--------+
| ip1        | ip2        | cname1 | cname2 | cname3   | cname4 |
+------------+------------+--------+--------+----------+--------+
| 1089579216 | 1089579223 | VI     | VIR    | 维京群岛 | 美国   |
| 1093062144 | 1093062399 | VI     | VIR    | 维京群岛 | 美国   |
| 1097896192 | 1097897215 | VI     | VIR    | 维京群岛 | 美国   |
| 1097947136 | 1097949183 | VI     | VIR    | 维京群岛 | 美国   |
| 1097951232 | 1097953279 | VI     | VIR    | 维京群岛 | 美国   |
| 1101625344 | 1101625407 | VI     | VIR    | 维京群岛 | 美国   |
| 1101971072 | 1101971079 | VI     | VIR    | 维京群岛 | 美国   |
| 1113864768 | 1113864783 | VI     | VIR    | 维京群岛 | 美国   |
| 1119428608 | 1119432703 | VI     | VIR    | 维京群岛 | 美国   |
| 1123590144 | 1123594239 | VI     | VIR    | 维京群岛 | 美国   |
+------------+------------+--------+--------+----------+--------+

可见,MySQL load data infile 指令,可以实现不同字符集之间的转换。

你可能感兴趣的:(import)