如在文章《Trafodion 数据加载介绍》中提到,Trafodion的Bulk Load与Tricke Load不同,Bulk Load主要是针对大数据量的且一般是批量装载的方式。Bulk Load通常使用LOAD语句实现。
下面介绍几种Bulk Load方式,
(1)创建两个表结构相同的测试表
SQL>showddl test_tbl;
CREATE TABLE TRAFODION.SEABASE.TEST_TBL
(
ID INT NO DEFAULT NOT NULL NOT DROPPABLE
SERIALIZED
, NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL SERIALIZED
)
;
--- SQL operation complete.
SQL>showddl test_tbl2;
CREATE TABLE TRAFODION.SEABASE.TEST_TBL2
(
ID INT NO DEFAULT NOT NULL NOT DROPPABLE
SERIALIZED
, NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL SERIALIZED
)
;
--- SQL operation complete.
SQL>select * from test_tbl;
ID NAME
----------- ----------
1 LIU
2 zhang
3 LIU
4 wu
5 feng
--- 5 row(s) selected.
SQL>select * from test_tbl2;
--- 0 row(s) selected.
SQL>load into test_tbl2 select * from test_tbl;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_TBL2
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_TBL2
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_TBL2
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_TBL2
Rows Processed: 5
Task: PREPARATION Status: Ended ET: 00:00:00.263
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_TBL2
Task: COMPLETION Status: Ended ET: 00:00:00.531
--- SQL operation complete.
SQL>select * from test_tbl2;
ID NAME
----------- ----------
1 LIU
2 zhang
3 LIU
4 wu
5 feng
--- 5 row(s) selected.
(1)创建测试文件并利用scp或者其他方式上传到Hadoop Cluster上,内容如下
[centos@cent-1 tmp]$ cat test_table
Lucy,20,F
Jason,22,M
Stephen,21,M
Lily,19,F
Vincent,25,M
(2)将测试文件导入HDFS中
[hdfs@cent-1 ~]$ hadoop fs -mkdir -p /hive/test/test_tbl
[hdfs@cent-1 ~]$ hadoop fs -copyFromLocal /tmp/test_table /hive/test/test_tbl
[hdfs@cent-1 ~]$ hadoop fs -ls /hive/test/test_tbl
Found 1 items
-rw-r--r-- 3 hdfs supergroup 57 2016-09-28 02:31 /hive/test/test_tbl/test_table
(3)在Trafodion中创建测试表
SQL>create table test_table ( name varchar(10), age int, sex varchar(1));
--- SQL operation complete.
SQL>showddl test_table;
CREATE TABLE TRAFODION.SEABASE.TEST_TABLE
(
NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL SERIALIZED
, AGE INT DEFAULT NULL SERIALIZED
, SEX VARCHAR(1) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL SERIALIZED
)
;
--- SQL operation complete.
(4)在Hive中创建Hive外部表,指向前面对应的HDFS文件
[hdfs@cent-1 ~]$ hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/hive-common-1.1.0-cdh5.4.8.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> create external table test_tbl (name string,age int,sex string) row format delimited fields terminated by ',' location '/hive/test/test_tbl';
OK
Time taken: 0.085 seconds
hive> show tables;
OK
test_tbl
Time taken: 0.364 seconds, Fetched: 1 row(s)
(5)加载数据到Trafodion表
SQL>load into test_table
+>select * from hive.hive.test_tbl;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_TABLE
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Rows Processed: 5
Task: PREPARATION Status: Ended ET: 00:00:01.272
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Task: COMPLETION Status: Ended ET: 00:00:00.254
--- SQL operation complete.
SQL>select * from test_table;
NAME AGE SEX
---------- ----------- ----
Lucy 20 F
Jason 22 M
Stephen 21 M
Lily 19 F
Vincent 25 M
--- 5 row(s) selected.
(1)在Hive中创建测试表并插入几条数据
hive> describe hive_tbl;
OK
name string
age int
sex string
Time taken: 0.107 seconds, Fetched: 3 row(s)
hive> select * from hive_tbl;
OK
Lucy 20 F
Jason 22 M
Stephen 21 M
Time taken: 0.093 seconds, Fetched: 3 row(s)
(2)Trafodion中的表继续沿用上述测试表,先清空数据
SQL>purgedata test_table;
--- SQL operation complete.
SQL>showddl test_table;
CREATE TABLE TRAFODION.SEABASE.TEST_TABLE
(
NAME VARCHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL SERIALIZED
, AGE INT DEFAULT NULL SERIALIZED
, SEX VARCHAR(1) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL SERIALIZED
)
;
--- SQL operation complete.
(注:在做从hive到trafodion数据加载的时候有几个参数可以配置,用来提升加载性能,如HIVE_MAX_STRING_LENGTH、ALLOW_INCOMPATIBLE_ASSIGNMENT
典型的用法如下:
CQD HIVE_MAX_STRING_LENGTH '1000'; --用于当列的最大长度为1KB时
CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'ON'; --用于当有时间相关的列的时候)
SQL>load into test_table select * from hive.hive.hive_tbl;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TEST_TABLE
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Rows Processed: 3
Task: PREPARATION Status: Ended ET: 00:00:00.390
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TEST_TABLE
Task: COMPLETION Status: Ended ET: 00:00:00.447
--- SQL operation complete.
SQL>select * from test_table;
NAME AGE SEX
---------- ----------- ----
Lucy 20 F
Jason 22 M
Stephen 21 M
--- 3 row(s) selected.
(1)在集群上安装并启动Sqoop,Sqoop安装步骤请参照其官方文档 点击打开链接
(2)下载mysql connector Jar包,并放到/var/lib/sqoop目录下
(3)在某台机器上安装Mysql,安装Mysql步骤此处不作详细说明
(4)在mysql数据中创建测试表,并插入几条数据
mysql> use test;
Database changed
mysql> create table test_tbl(name varchar(10), age int, sex char(1));
Query OK, 0 rows affected (0.00 sec)
mysql> select * from test;
ERROR 1146 (42S02): Table 'test.test' doesn't exist
mysql> select * from test_tbl;
Empty set (0.00 sec)
mysql> insert into test_tbl values('Wang wei', 30, 'M');
Query OK, 1 row affected (0.00 sec)
mysql> insert into test_tbl values('Li ping', 29, 'F');
Query OK, 1 row affected (0.00 sec)
mysql>
mysql>
mysql> select * from test_tbl;
+----------+------+------+
| name | age | sex |
+----------+------+------+
| Wang wei | 30 | M |
| Li ping | 29 | F |
+----------+------+------+
2 rows in set (0.00 sec)
(5)使用sqoop命令从mysql往hive导入数据
[hdfs@cent-2 ~]$ sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username centos --table test_tbl --split-by name --hive-import --create-hive-table --hive-table test_tabl_hive
Warning: /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
...
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/hive-common-1.1.0-cdh5.4.8.jar!/hive-log4j.properties
OK
Time taken: 3.052 seconds
Loading data to table default.test_tabl_hive
Table default.test_tabl_hive stats: [numFiles=4, totalSize=27]
OK
Time taken: 0.748 seconds
(6)从Trafodion或Hive Shell中查看数据是否导入成功
SQL>select * from hive.hive.test_tabl_hive;
NAME AGE SEX
-------------------------------------------------------------------------------------------------------------------------------- ----------- --------------------------------------------------------------------------------------------------------------------------------
Li ping 29 F
Wang wei 30 M
--- 2 row(s) selected.