Installing, Configuring, and Using Sqoop: A Detailed Guide

I. Installing and Configuring Sqoop

1. Start ZooKeeper
2. Start the cluster services
3. Edit the configuration file sqoop-env.sh (in Sqoop's conf directory) as follows:

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/modules/cdh5.3.6/hadoop-2.5.0-cdh5.3.6

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/modules/cdh5.3.6/hadoop-2.5.0-cdh5.3.6

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/modules/cdh5.3.6/hive-0.13.1-cdh5.3.6

export ZOOKEEPER_HOME=/opt/modules/cdh5.3.6/zookeeper-3.4.5-cdh5.3.6
#Set the path to where the zookeeper config dir is
export ZOOCFGDIR=/opt/modules/cdh5.3.6/zookeeper-3.4.5-cdh5.3.6
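
Before moving on, it can help to confirm that every directory named in sqoop-env.sh actually exists, since a wrong path usually only shows up later as a confusing startup error. The following is a minimal sketch, not part of Sqoop; the helper name check_home is made up for illustration, and the paths are the ones used in this article.

```shell
#!/bin/sh
# check_home NAME DIR: report whether DIR exists.
# Hypothetical helper for illustration; it is not a Sqoop command.
check_home() {
    if [ -d "$2" ]; then
        echo "OK $1=$2"
    else
        echo "MISSING $1=$2"
        return 1
    fi
}

CDH=/opt/modules/cdh5.3.6   # install prefix used in this article
missing=0
check_home HADOOP_COMMON_HOME "$CDH/hadoop-2.5.0-cdh5.3.6"   || missing=1
check_home HADOOP_MAPRED_HOME "$CDH/hadoop-2.5.0-cdh5.3.6"   || missing=1
check_home HIVE_HOME          "$CDH/hive-0.13.1-cdh5.3.6"    || missing=1
check_home ZOOKEEPER_HOME     "$CDH/zookeeper-3.4.5-cdh5.3.6" || missing=1

if [ "$missing" -eq 0 ]; then
    echo "all paths found"
else
    echo "fix the MISSING entries in sqoop-env.sh"
fi
```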

4. Copy the JDBC driver into Sqoop's lib directory
Change into the mysql-connector-java-5.1.27 directory and run:

cp -a mysql-connector-java-5.1.27-bin.jar /opt/modules/cdh5.3.6/sqoop-1.4.5-cdh5.3.6/lib/

5. Run Sqoop
(1) View the help

bin/sqoop help

(2) Test whether Sqoop can connect to the database

bin/sqoop list-databases --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/metastore --username root --password ******
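
Passing the password with --password leaves it in the shell history and the process list. Sqoop also accepts -P (prompt on the console) and --password-file. Below is a sketch of the same connectivity test using a password file. The file location and both helper names are assumptions made for illustration. On a cluster whose default filesystem is HDFS, a local file generally needs the file:// prefix shown here.

```shell
#!/bin/sh
# Sketch: keep the MySQL password out of the command line.
PWD_FILE="${PWD_FILE:-$HOME/.sqoop/mysql.pwd}"   # assumed location, not a Sqoop default

# write_password_file SECRET: store SECRET with no trailing newline,
# readable only by the owner. (A trailing newline would become part
# of the password that Sqoop sends.)
write_password_file() {
    mkdir -p "$(dirname "$PWD_FILE")"
    rm -f "$PWD_FILE"
    printf '%s' "$1" > "$PWD_FILE"
    chmod 400 "$PWD_FILE"
}

# print_list_databases_cmd: show the command to run; nothing is executed here.
print_list_databases_cmd() {
    echo "bin/sqoop list-databases" \
         "--connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/metastore" \
         "--username root" \
         "--password-file file://$PWD_FILE"
}

print_list_databases_cmd
```

To use it, call write_password_file once with the real password, then run the printed command from the Sqoop home directory.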

II. Import and Export

Moving data from outside the Hadoop cluster into HDFS is called an import; moving data out of HDFS to an external system is called an export.

1. import: use Sqoop to import data from MySQL into HDFS (RDBMS --> HDFS)

(1) Full-table import

bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --table edu_score --target-dir /user/company --delete-target-dir --fields-terminated-by "\t"

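(2) Import by query (query)
Note: the WHERE clause of the query must contain $CONDITIONS. Sqoop replaces this placeholder with the condition that selects the slice of data each mapper reads. Keep the query in single quotes so the shell does not expand $CONDITIONS; inside double quotes, write \$CONDITIONS instead.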

bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --target-dir /user/company --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --query 'select course_no from edu_score where stu_id = 1001 and $CONDITIONS;'
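
To make the role of $CONDITIONS concrete: when an import runs with several mappers, Sqoop substitutes a different range predicate on the split column into each mapper's copy of the query. The sketch below only imitates that substitution with sed so the resulting SQL can be inspected. The filter stu_id > 1000, the ranges, and the helper name expand_conditions are invented for illustration; the real boundaries come from Sqoop's own min/max query on the split column.

```shell
#!/bin/sh
# Query template as it would be passed to --query.
# Single quotes keep the shell from expanding $CONDITIONS.
QUERY='select course_no from edu_score where stu_id > 1000 and $CONDITIONS'

# expand_conditions LOW HIGH: print the query one mapper would run for [LOW, HIGH).
# Hypothetical helper; it mimics, but is not, Sqoop's internal substitution.
expand_conditions() {
    printf '%s\n' "$QUERY" |
        sed "s/\\\$CONDITIONS/( stu_id >= $1 ) AND ( stu_id < $2 )/"
}

expand_conditions 1000 1500   # what mapper 1 might run
expand_conditions 1500 2000   # what mapper 2 might run
```

A real parallel run of such a query would add --split-by stu_id and --num-mappers 2 to the import command; with --query, Sqoop requires --split-by whenever more than one mapper is used.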

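(3) Import selected columns (columns)
Note: the comma-separated column list must not contain spaces; otherwise the shell splits it into separate arguments.

bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --target-dir /user/company --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --columns stu_id,course_no --table edu_score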

(4) Filter the imported rows with Sqoop's own options; this is essentially the query from (2) split into separate arguments

bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --target-dir /user/company --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --table edu_score --where "stu_id = 1001"

2. import: use Sqoop to import data from MySQL into Hive (RDBMS --> Hive)
Note: the Hive table does not need to exist beforehand; Sqoop creates it during the import.

bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --table edu_student --num-mappers 1 --hive-import --fields-terminated-by "\t" --hive-overwrite --hive-table company.edu_student

3. export: use Sqoop to export data from Hive/HDFS to MySQL (Hive/HDFS --> RDBMS)

(1) Create the target table in MySQL
Unlike an import, an export requires the target table to exist beforehand.
(2) Run the export

bin/sqoop export --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --table dept --num-mappers 1 --export-dir /user/hive/warehouse/company.db/dept --input-fields-terminated-by "\t" 
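
For step (1), the target table can be created from a small DDL file. The sketch below generates one. The article does not show the schema of dept, so the column names and types here are assumptions; they must be adjusted to match the field order of the files under --export-dir. The helper name write_dept_ddl and the output path are likewise illustrative.

```shell
#!/bin/sh
# write_dept_ddl FILE: write a CREATE TABLE statement for the export target.
# Column names and types are assumed for illustration only.
write_dept_ddl() {
    cat > "$1" <<'EOF'
CREATE TABLE IF NOT EXISTS dept (
    deptno INT,
    dname  VARCHAR(50),
    loc    VARCHAR(50)
);
EOF
}

DDL_FILE="${TMPDIR:-/tmp}/create_dept.sql"   # illustrative output path
write_dept_ddl "$DDL_FILE"
echo "wrote $DDL_FILE"
echo "load it with: mysql -h hadoop-senior01.halearn.cn -u root -p company < $DDL_FILE"
```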

III. Running Sqoop from an Options File

Approach: put the sqoop command's arguments in an .opt file, then run Sqoop against that file.

1. Create an .opt file
For example, create a file named job_hffs2rdbms.opt in the opt directory.

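2. Write the Sqoop options
Note: Sqoop reads an options file one token per line, so each option and each value goes on its own line, in the same order as on the command line.

export
--connect
jdbc:mysql://hadoop-senior01.halearn.cn:3306/company
--username
root
--password
******
--table
dept
--num-mappers
1
--export-dir
/user/hive/warehouse/company.db/dept
--input-fields-terminated-by
"\t"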

3. Run the options file

bin/sqoop --options-file opt/job_hffs2rdbms.opt
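
Because Sqoop expects one token per line in an options file, converting an existing command line by hand is error-prone. The sketch below writes each argument it receives on its own line. The helper name write_opts and the output path are hypothetical, and the password is deliberately left out of the file on the assumption that it is supplied separately (for example with -P on the command line).

```shell
#!/bin/sh
# write_opts FILE ARG...: write every ARG on its own line of FILE.
# Hypothetical helper for illustration; it is not part of Sqoop.
write_opts() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

OPT_FILE="${TMPDIR:-/tmp}/job_export_sketch.opt"   # illustrative path
write_opts "$OPT_FILE" \
    export \
    --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company \
    --username root \
    --table dept \
    --num-mappers 1 \
    --export-dir /user/hive/warehouse/company.db/dept \
    --input-fields-terminated-by '\t'

cat "$OPT_FILE"   # inspect the generated file before running Sqoop
```

The generated file could then be run as bin/sqoop --options-file "$OPT_FILE" -P, which prompts for the password instead of storing it.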
