1. Start ZooKeeper
2. Start the cluster services
3. Edit the configuration file sqoop-env.sh as follows:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/modules/cdh5.3.6/hadoop-2.5.0-cdh5.3.6
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/modules/cdh5.3.6/hadoop-2.5.0-cdh5.3.6
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/modules/cdh5.3.6/hive-0.13.1-cdh5.3.6
export ZOOKEEPER_HOME=/opt/modules/cdh5.3.6/zookeeper-3.4.5-cdh5.3.6
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/opt/modules/cdh5.3.6/zookeeper-3.4.5-cdh5.3.6
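Before moving on, it can help to confirm that every path exported in sqoop-env.sh really exists on this machine. The following is a minimal sketch, not something shipped with Sqoop; the function name check_sqoop_env is hypothetical, and it only looks at the five variables set above.

```shell
# Hypothetical helper (not part of Sqoop): report which of the variables
# from sqoop-env.sh are unset or point at a directory that does not exist.
check_sqoop_env() {
  local status=0 name value
  for name in HADOOP_COMMON_HOME HADOOP_MAPRED_HOME HIVE_HOME ZOOKEEPER_HOME ZOOCFGDIR; do
    # Look up the variable whose name is stored in $name (fixed list, so eval is safe here).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "UNSET   $name"
      status=1
    elif [ ! -d "$value" ]; then
      echo "MISSING $name=$value"
      status=1
    else
      echo "OK      $name=$value"
    fi
  done
  return "$status"
}
```

One way to use it, assuming the file lives at conf/sqoop-env.sh under the Sqoop home, is to source that file in the current shell and then call check_sqoop_env; a non-zero return value means at least one path needs fixing.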
4. Copy the JDBC driver into Sqoop's lib directory
Change into the mysql-connector-java-5.1.27 directory and run:
cp -a mysql-connector-java-5.1.27-bin.jar /opt/modules/cdh5.3.6/sqoop-1.4.5-cdh5.3.6/lib/
5. Start Sqoop
(1) Show the help
bin/sqoop help
(2) Test whether Sqoop can connect to the database
bin/sqoop list-databases --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/metastore --username root --password ******
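Passing --password on the command line leaves the password in the shell history, and Sqoop itself prints a warning suggesting -P instead. As a rough sketch, the same connectivity test can be written with -P, which prompts for the password on the console. The variable names SQOOP_BIN and JDBC_URL and the function name are illustrative only; the function prints the command rather than running it, so it can be reviewed first.

```shell
# Sketch: build the connectivity test with -P (prompt for the password)
# instead of --password. SQOOP_BIN and JDBC_URL are illustrative names.
SQOOP_BIN="${SQOOP_BIN:-bin/sqoop}"
JDBC_URL="jdbc:mysql://hadoop-senior01.halearn.cn:3306/metastore"

list_databases_cmd() {
  # $1 is the database user; the command is printed, not executed.
  printf '%s list-databases --connect %s --username %s -P\n' \
    "$SQOOP_BIN" "$JDBC_URL" "$1"
}

list_databases_cmd root
```

Copy the printed line into the terminal from the Sqoop home directory to run it; Sqoop then asks for the password interactively.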
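Moving data from outside the Hadoop cluster into HDFS is called an import; moving data out of HDFS is called an export.
1. import: use Sqoop to import MySQL data into HDFS (RDBMS --> HDFS)
(1) Import a whole table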
bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --table edu_score --target-dir /user/company --delete-target-dir --fields-terminated-by "\t"
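(2) Import the result of a query (query)
Note: the WHERE clause of the query must contain the $CONDITIONS token. Sqoop replaces it with a condition of its own for each map task, so that the tasks can divide the result set between them. Keep the query in single quotes as shown; inside double quotes the token has to be written as \$CONDITIONS so that the shell does not expand it.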
bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --target-dir /user/company --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --query 'select course_no from edu_score where stu_id = 1001 and $CONDITIONS;'
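To make the role of $CONDITIONS more concrete, the sketch below mimics in plain shell how a placeholder can be swapped for a per-task predicate. It is an illustration only and not Sqoop's implementation: the function name fill_conditions and the example predicates are made up.

```shell
# Illustration only: replace the literal $CONDITIONS token with a predicate,
# roughly the way each map task receives its own slice of the query.
fill_conditions() {
  local query="$1" predicate="$2"
  # \$ makes sed match a literal dollar sign rather than end-of-line.
  printf '%s\n' "$query" | sed "s/\\\$CONDITIONS/($predicate)/"
}

# Single quotes keep $CONDITIONS literal, just as in the sqoop command.
QUERY='select course_no from edu_score where stu_id = 1001 and $CONDITIONS'

fill_conditions "$QUERY" "1 = 1"
fill_conditions "$QUERY" "course_no >= 1 AND course_no < 50"
```

The first call shows the shape of the query when no real split is needed, the second shows a hypothetical range for one task. Either way the rest of the WHERE clause is left untouched, which is why the token has to be joined to it with "and".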
(3) Import specific columns (columns); separate the column names with commas and no spaces
bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --target-dir /user/company --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --columns stu_id,course_no --table edu_score
(4) Filter the imported rows with Sqoop's own option (--where); in essence this splits the query logic from (2) into separate options
bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --target-dir /user/company --delete-target-dir --num-mappers 1 --fields-terminated-by "\t" --table edu_score --where "stu_id = 1001"
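The relationship between the option-based form and the query form can be sketched as below. This only assembles the SELECT statement that a --table / --columns / --where combination roughly corresponds to; it is not the SQL that Sqoop actually generates, and the function name equivalent_query is hypothetical.

```shell
# Illustration only: build the SELECT that --table, --columns and --where
# roughly correspond to. Not Sqoop's actual SQL generation.
equivalent_query() {
  local table="$1" columns="${2:-*}" where="${3:-}"
  if [ -n "$where" ]; then
    printf 'select %s from %s where %s\n' "$columns" "$table" "$where"
  else
    printf 'select %s from %s\n' "$columns" "$table"
  fi
}

equivalent_query edu_score "course_no" "stu_id = 1001"
```

With these arguments the output is the query from (2) without the $CONDITIONS token, which Sqoop takes care of itself when --table is used.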
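2. import: use Sqoop to import MySQL data into Hive (RDBMS --> Hive)
Note: the table does not have to be created in Hive beforehand. Just run the import and Sqoop creates the Hive table itself.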
bin/sqoop import --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --table edu_student --num-mappers 1 --hive-import --fields-terminated-by "\t" --hive-overwrite --hive-table company.edu_student
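3. export: use Sqoop to export Hive/HDFS data to MySQL (Hive/HDFS --> RDBMS)
(1) Create a table in MySQL
An import does not need the target to exist beforehand, but an export does: Sqoop does not create the MySQL table, so create it first.
(2) Run the command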
bin/sqoop export --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company --username root --password ****** --table dept --num-mappers 1 --export-dir /user/hive/warehouse/company.db/dept --input-fields-terminated-by "\t"
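For step (1), the target table could be prepared with a small DDL file like the one generated below. This is only a sketch: the real columns of dept are not given here, so dept_id and dept_name are placeholders that must be replaced with columns matching the tab-separated fields in the exported files. The function name write_dept_ddl is also hypothetical.

```shell
# Sketch: write a DDL file for the export target table.
# The column list is a placeholder and must match the exported data.
write_dept_ddl() {
  cat > "$1" <<'SQL'
CREATE TABLE IF NOT EXISTS dept (
  dept_id   INT,          -- hypothetical column
  dept_name VARCHAR(64)   -- hypothetical column
);
SQL
}

write_dept_ddl "${TMPDIR:-/tmp}/create_dept.sql"
```

The generated file can then be fed to the mysql client, for example with mysql -u root -p company < "${TMPDIR:-/tmp}/create_dept.sql", before running the export.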
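Approach: put the sqoop command into an .opt options file, then run that file
1. Create an .opt file
For example, create a file named job_hffs2rdbms.opt in the opt directory
2. Write the Sqoop options into the file, one option or value per line
export
--connect
jdbc:mysql://hadoop-senior01.halearn.cn:3306/company
--username
root
--password
******
--table
dept
--num-mappers
1
--export-dir
/user/hive/warehouse/company.db/dept
--input-fields-terminated-by
"\t"
3. Run the options file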
bin/sqoop --options-file opt/job_hffs2rdbms.opt
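An options file is read line by line, with each line holding a single option or value in the order it would appear on the command line. Writing such a file by hand is easy to get wrong, so here is a minimal sketch that generates one from an ordinary argument list. The function name write_options_file and the file name job_demo.opt are made up for the example, and the password is left as the same ****** placeholder used above.

```shell
# Sketch: write every argument on its own line, the layout an options file expects.
write_options_file() {
  local file="$1"
  shift
  # printf repeats the format for each argument; %s does not interpret "\t".
  printf '%s\n' "$@" > "$file"
}

OPT_FILE="${TMPDIR:-/tmp}/job_demo.opt"

write_options_file "$OPT_FILE" \
  export \
  --connect jdbc:mysql://hadoop-senior01.halearn.cn:3306/company \
  --username root \
  --password '******' \
  --table dept \
  --num-mappers 1 \
  --export-dir /user/hive/warehouse/company.db/dept \
  --input-fields-terminated-by '"\t"'

cat "$OPT_FILE"
```

The generated file is used exactly like a hand-written one, by passing its path to bin/sqoop --options-file.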