Apache Sqoop™ is a tool designed to efficiently transfer bulk data between Apache Hadoop and structured data stores such as relational databases.
Sqoop graduated from the Apache Incubator in March 2012 and is now a top-level Apache project.
[root@node1 module]# tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/software/
[root@node1 module]# cd /opt/software/
[root@node1 software]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop-1.4.6
export SQOOP_HOME=/opt/software/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
Make it take effect:
source /etc/profile
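A quick sanity check that the PATH change took effect (the exact path depends on where you unpacked Sqoop):
[root@node1 software]# which sqoop
/opt/software/sqoop-1.4.6/bin/sqoop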
[root@node1 sqoop-1.4.6]# cp conf/sqoop-env-template.sh conf/sqoop-env.sh
[root@node1 sqoop-1.4.6]# vi conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/software/hadoop-2.7.0
export HADOOP_MAPRED_HOME=/opt/software/hadoop-2.7.0
export HIVE_HOME=/opt/software/hive
[root@node1 sqoop-1.4.6]# cp /opt/module/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./lib
For installing Hive and MySQL, see my other blog post:
Big Data Hadoop Learning (7): Hive Installation
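With the connector jar in place, you can verify the Sqoop install and MySQL connectivity in one step with the list-databases tool (this assumes the MySQL root password 123456 used throughout this post):
[root@node1 sqoop-1.4.6]# sqoop list-databases \
--connect jdbc:mysql://node1:3306/ \
--username root --password 123456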
Under the hood, Sqoop translates the import or export command into a MapReduce program that performs the actual transfer.
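You can see part of this translation with the codegen tool, which emits the Java record class the generated MapReduce job uses to serialize each row (illustrated here against the emp table created in the next section; the generated emp.java and emp.jar typically land under /tmp/sqoop-root/compile/):
sqoop codegen \
--connect jdbc:mysql://node1:3306/userdb \
--username root --password 123456 \
--table emp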
Import: MySQL to HDFS
mysql> create database userdb;
mysql> use userdb;
DROP TABLE IF EXISTS `emp`;
CREATE TABLE `emp` (
`id` int(11) NOT NULL,
`name` varchar(100) DEFAULT NULL,
`deg` varchar(100) DEFAULT NULL,
`salary` int(11) DEFAULT NULL,
`dept` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `emp` VALUES ('1201', 'gopal', 'manager', '50000', 'TP');
INSERT INTO `emp` VALUES ('1202', 'manisha', 'Proof reader', '50000', 'TP');
INSERT INTO `emp` VALUES ('1203', 'khalil', 'php dev', '30000', 'AC');
INSERT INTO `emp` VALUES ('1204', 'prasanth', 'php dev', '30000', 'AC');
INSERT INTO `emp` VALUES ('1205', 'kranthi', 'admin', '20000', 'TP');
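A quick check that the five seed rows are in place before importing:
mysql> select count(*) from emp;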
[root@node1 ~]# cd /opt/software/hadoop-2.7.0/
[root@node1 hadoop-2.7.0]# sbin/start-dfs.sh
[root@node2 ~]# cd /opt/software/hadoop-2.7.0/
[root@node2 hadoop-2.7.0]# sbin/start-yarn.sh
Import the data from table emp into HDFS:
sqoop import \
--connect jdbc:mysql://node1:3306/userdb \
--username root --password 123456 \
--target-dir /sqoopresult \
--table emp \
--num-mappers 1
If you hit an error like
20/05/15 01:56:16 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/staging/root/.staging/job_1589478333948_0001.Name node is in safe mode.
exit safe mode:
[root@node1 hadoop-2.7.0]# bin/hdfs dfsadmin -safemode leave
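(If you'd rather not force it, you can poll the state instead; the NameNode leaves safe mode on its own once enough blocks have been reported:)
[root@node1 hadoop-2.7.0]# bin/hdfs dfsadmin -safemode get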
[root@node1 hadoop-2.7.0]# bin/hdfs dfs -cat /sqoopresult/part-m-00000
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
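The import above uses a single mapper, so everything lands in one part file. For larger tables you can raise the parallelism; with more than one mapper Sqoop needs a column to split the input on (it defaults to the primary key, but naming one explicitly is clearer). A sketch, writing to a hypothetical /sqoopresult4 directory:
sqoop import \
--connect jdbc:mysql://node1:3306/userdb \
--username root --password 123456 \
--target-dir /sqoopresult4 \
--table emp \
--split-by id \
--num-mappers 4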
Create the Hive table
[root@node1 hive]# bin/hive
create table staff_hive(
id int,
name string,
deg string,
salary int,
dept string
)
row format delimited fields terminated by "\t";
Import into Hive
hive> quit;
[root@node1 hive]# sqoop import \
--connect jdbc:mysql://node1:3306/userdb \
--username root --password 123456 \
--table emp \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive
Note: this runs in two steps. First the data is imported into HDFS, then the data on HDFS is moved into the Hive warehouse.
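A common stumbling block with this two-step flow: if an earlier run died after step one, the intermediate directory (under your HDFS home, named after the table) is left behind and the next run fails with a FileAlreadyExistsException. Removing it clears the retry (path assumed from the default layout for the root user):
[root@node1 hive]# hdfs dfs -rm -r /user/root/emp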
Check the result
[root@node1 hive]# hdfs dfs -ls /user/hive/warehouse
Found 2 items
drwxrwxr-x - root supergroup 0 2020-05-15 02:26 /user/hive/warehouse/staff_hive
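And the rows themselves, queried non-interactively with hive -e:
[root@node1 hive]# bin/hive -e "select * from staff_hive limit 5;"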
In Sqoop, an "export" means transferring data from the big-data cluster (HDFS, Hive, HBase) to a non-big-data store (an RDBMS), using the export tool.
sqoop export \
--connect jdbc:mysql://node1:3306/userdb \
--username root --password 123456 \
--table staff \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"
Note: if the target table does not exist in MySQL, it will not be created automatically, so create the staff table in MySQL first:
mysql> use userdb;
mysql> CREATE TABLE `staff` (
`id` int(11) NOT NULL,
`name` varchar(100) DEFAULT NULL,
`deg` varchar(100) DEFAULT NULL,
`salary` int(11) DEFAULT NULL,
`dept` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
);
mysql> \q
Check the result
[root@node1 hive]# mysql -uroot -p123456 -e "select * from userdb.staff"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------+----------+--------------+--------+------+
| id | name | deg | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | gopal | manager | 50000 | TP |
| 1202 | manisha | Proof reader | 50000 | TP |
| 1203 | khalil | php dev | 30000 | AC |
| 1204 | prasanth | php dev | 30000 | AC |
| 1205 | kranthi | admin | 20000 | TP |
+------+----------+--------------+--------+------+
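Re-running this export as-is will fail with duplicate-key errors, because the rows already exist in staff. For a repeatable export you can switch to update mode instead (a sketch using Sqoop's --update-key/--update-mode options; allowinsert updates matching rows and inserts new ones):
sqoop export \
--connect jdbc:mysql://node1:3306/userdb \
--username root --password 123456 \
--table staff \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t" \
--update-key id \
--update-mode allowinsert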
You can package a Sqoop command in an .opt options file and then execute it.
1) Create a .opt file
mkdir opt
touch opt/job_HDFS2RDBMS.opt
2) Write the Sqoop script
$ vi opt/job_HDFS2RDBMS.opt
Contents (note: in an options file the tool name comes first, each option and its value go on their own lines, and backslash line continuations are not supported):
export
--connect
jdbc:mysql://node1:3306/userdb
--username
root
--password
123456
--table
staff
--num-mappers
1
--export-dir
/user/hive/warehouse/staff_hive
--input-fields-terminated-by
"\t"
3) Execute the script
$ bin/sqoop --options-file opt/job_HDFS2RDBMS.opt
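Lines beginning with # inside an options file are treated as comments, and you can append extra arguments on the command line after --options-file. For example, to run the same job with debug logging:
$ bin/sqoop --options-file opt/job_HDFS2RDBMS.opt --verbose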