1. Exporting data into a single (non-sharded) table with Sqoop:
a) Create the table b_top_item_test1:
CREATE TABLE b_top_item_test1(
num_iid varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(4000),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100));
b) The Sqoop script is as follows:
export --connect "jdbc:mysql://xxxx.drds.aliyuncs.com/ccms_anta?useUnicode=true&characterEncoding=utf-8" --username ccms_anta --password xxxx --table b_top_item_test1 --export-dir /user/admin/b_top_item_test2 --input-fields-terminated-by "\001" -m 10 --lines-terminated-by "\n"
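c) Sqoop successfully exports the data in the HDFS files into DRDS (3 million rows took 328 seconds). The execution result is as follows:
2. Exporting into a table sharded across databases and tables with Sqoop
a) Create the sharded table: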
CREATE TABLE b_top_item_test2(
num_iid varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(4000),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100))
ENGINE=InnoDB DEFAULT CHARSET=utf8 dbpartition by HASH(num_iid)
tbpartition by HASH (num_iid) tbpartitions 2;
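b) The export fails with an error:
Conclusion: Sqoop is only suitable for exporting HDFS data into a single (non-sharded) DRDS table; it is not suitable for exporting into a table that is sharded across databases and tables. The single-table data can then be moved into the sharded table through the DRDS console.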
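The export command from step 1 b) can also be assembled from a script, which keeps the delimiter escapes out of the shell's hands. The following is a minimal sketch only: the helper name `build_sqoop_export_args` is hypothetical, the leading `sqoop` executable name is an assumption (the script above starts at `export`), and the `xxxx` placeholders are kept exactly as they appear in the original.

```python
import shlex


def build_sqoop_export_args(jdbc_url, username, password, table, export_dir, mappers=10):
    """Hypothetical helper: return the argv list for the export shown in step 1 b)."""
    return [
        "sqoop", "export",                       # assumption: the binary is named `sqoop`
        "--connect", jdbc_url,
        "--username", username,
        "--password", password,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", r"\001",  # literal backslash-001, as in the script
        "-m", str(mappers),
        "--lines-terminated-by", r"\n",           # literal backslash-n, as in the script
    ]


args = build_sqoop_export_args(
    jdbc_url="jdbc:mysql://xxxx.drds.aliyuncs.com/ccms_anta?useUnicode=true&characterEncoding=utf-8",
    username="ccms_anta",
    password="xxxx",
    table="b_top_item_test1",
    export_dir="/user/admin/b_top_item_test2",
)

# Print a shell-quoted form for inspection; running it is left to the caller,
# e.g. subprocess.run(args, check=True) on a host where Sqoop is installed.
print(shlex.join(args))
```

Passing the arguments as a list avoids a second round of shell quoting, so the field and line terminators reach Sqoop unchanged.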
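First, create a table that is sharded across databases and tables with range_hash(col_1,col_2,N):
b_top_item_test5 is a table sharded by range_hash; its CREATE TABLE statement is as follows: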
CREATE TABLE b_top_item_test5(
num_iid varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(4000),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100))
ENGINE=InnoDB DEFAULT CHARSET=utf8 dbpartition by RANGE_HASH(num_iid,dp_id,5)
tbpartition by RANGE_HASH (num_iid,dp_id,5) tbpartitions 2;
1. Insert a row into b_top_item_test5 with a given num_iid and dp_id, then query by num_iid and by dp_id respectively. The query results are as follows:
insert into b_top_item_test5 values ('9842703120a','63784415a','1','','','','','','','','','');
SELECT * from b_top_item_test5 where num_iid='9842703120a';
select * from b_top_item_test5 where dp_id='63784415a';
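2. Insert a row into b_top_item_test5 with a given num_iid and dp_id (both the same as in step 1). Querying by num_iid and querying by dp_id both return the newly inserted row. The query results are as follows: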
insert into b_top_item_test5 values ('9842703120a','63784415a','2','','','','','','','','','');
SELECT * from b_top_item_test5 where num_iid='9842703120a';
select * from b_top_item_test5 where dp_id='63784415a';
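3. Insert a row into b_top_item_test5 with a given num_iid and dp_id (num_iid differs from step 1, dp_id is the same as in step 1). Querying by num_iid returns the newly inserted row, but querying by dp_id does not. The query results are as follows: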
insert into b_top_item_test5 values ('9842703120b','63784415a','3','','','','','','','','','');
select * from b_top_item_test5 where num_iid='9842703120b';
select * from b_top_item_test5 where dp_id='63784415a';
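4. Insert a row into b_top_item_test5 with a given num_iid and dp_id (num_iid is the same as in step 1, dp_id differs from step 1). Querying by num_iid returns the newly inserted row, but querying by dp_id does not. The query results are as follows: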
insert into b_top_item_test5 values ('9842703120a','63784415b','4','','','','','','','','','');
select * from b_top_item_test5 where num_iid='9842703120a';
select * from b_top_item_test5 where dp_id='63784415b';
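Overall conclusion: with range_hash(col_1,col_2,N) sharding, the last N characters of the two shard keys in newly inserted data must be kept consistent with the existing data; if they are not, the newly inserted row cannot be found when querying by one of the shard keys.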
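The behaviour observed in steps 1 to 4 can be illustrated with a toy routing model. This is a sketch under stated assumptions, not DRDS's actual implementation. It assumes that an insert is routed by the last N characters of the first shard key, and that a query is routed by the last N characters of whichever shard key appears in the WHERE clause. The hash function `toy_hash` and the shard count of 8 are stand-ins chosen for illustration.

```python
N = 5            # third argument of RANGE_HASH in the CREATE TABLE statement
SHARD_COUNT = 8  # assumption: illustrative number of physical shards


def suffix(key, n=N):
    """Return the last n characters of a shard key."""
    return key[-n:]


def toy_hash(text):
    """Deterministic stand-in hash; NOT the hash DRDS really uses."""
    h = 0
    for ch in text:
        h = h * 31 + ord(ch)
    return h


def route(key):
    """Map a shard key to a shard index using only its last N characters."""
    return toy_hash(suffix(key)) % SHARD_COUNT


def visible_via_dp_id(num_iid, dp_id):
    """True if a row stored by num_iid is on the shard a dp_id lookup scans.

    Assumption: inserts route by the first key (num_iid), and a query with
    only dp_id in the WHERE clause routes by dp_id.
    """
    return route(num_iid) == route(dp_id)


rows = [
    ("9842703120a", "63784415a"),  # steps 1 and 2
    ("9842703120b", "63784415a"),  # step 3
    ("9842703120a", "63784415b"),  # step 4
]
for num_iid, dp_id in rows:
    print(num_iid, dp_id, suffix(num_iid), suffix(dp_id), visible_via_dp_id(num_iid, dp_id))
```

In this model a row is always reachable through either key when both keys end in the same N characters, because both lookups hash the same suffix. When the suffixes differ, reachability through the second key depends on whether the two hashes happen to land on the same shard, so it is not guaranteed.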