Installing and Using Sqoop

Sqoop is a tool for transferring data between Hadoop and relational databases such as MySQL.

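Setting Up the Sqoop Environment
(1) Upload the installation package sqoop-1.4.6-cdh5.14.2.tar.gz to /opt/software
(2) Extract the package (run from /opt/software): tar -zxf sqoop-1.4.6-cdh5.14.2.tar.gz -C /opt/install/
(3) Create a symbolic link: ln -s /opt/install/sqoop-1.4.6-cdh5.14.2/ /opt/install/sqoop
(4) Configure the environment variables: run vi /etc/profile and add the following lines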
   export SQOOP_HOME=/opt/install/sqoop
   export PATH=$SQOOP_HOME/bin:$PATH
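(5) Apply the new settings: source /etc/profile
(6) Change to the conf directory under the Sqoop root, then copy the template under a new name: cp sqoop-env-template.sh sqoop-env.sh
(7) Edit sqoop-env.sh and append the following lines to the end of the file: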
export HADOOP_COMMON_HOME=/opt/install/hadoop
export HADOOP_MAPRED_HOME=/opt/install/hadoop
export HIVE_HOME=/opt/install/hive
export ZOOCFGDIR=/opt/install/zookeeper
export HBASE_HOME=/opt/install/hbase
(8) Copy the following files into Sqoop's lib directory:
   mysql-connector-java-5.1.27-bin.jar
   java-json.jar
   hive-common-1.1.0-cdh5.14.2.jar
   hive-exec-1.1.0-cdh5.14.2.jar
(9) Verify that Sqoop is configured correctly: sqoop help
(10) Test whether Sqoop can connect to the database:
    sqoop list-databases --connect jdbc:mysql://hadoop101:3306/ --username root --password 123
(11) Take a snapshot of the virtual machine
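
Step (8) is easy to get wrong, and a missing jar usually only shows up later as a class-not-found error. The following is a minimal sketch of a pre-flight check, not part of the original setup: the function name check_sqoop_libs is hypothetical, and the jar list simply mirrors step (8).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the jars from step (8) are missing
# from a Sqoop lib directory. Prints one "missing: <jar>" line per absent
# file and returns non-zero if anything is missing.
check_sqoop_libs() {
  local lib_dir="$1"
  local missing=0
  local jar
  for jar in \
    mysql-connector-java-5.1.27-bin.jar \
    java-json.jar \
    hive-common-1.1.0-cdh5.14.2.jar \
    hive-exec-1.1.0-cdh5.14.2.jar
  do
    if [ ! -f "$lib_dir/$jar" ]; then
      echo "missing: $jar"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the path is assumed from step (3)):
# check_sqoop_libs /opt/install/sqoop/lib && echo "all jars present"
```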

# Connect to the MySQL server and list the available databases
sqoop list-databases \
--connect jdbc:mysql://hadoop101:3306 \
--username root \
--password 123
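
Passing --password on the command line leaves the password in the shell history and the process list. Sqoop also accepts -P (prompt on the console) and --password-file. The sketch below shows one way to prepare such a file; the function name write_password_file and the file location are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a password to a new file with no trailing
# newline (Sqoop uses the entire file content as the password) and make it
# readable by the owner only.
write_password_file() {
  local password="$1"
  local path="$2"
  # Create the file with restrictive permissions from the start.
  ( umask 077; printf '%s' "$password" > "$path" )
  chmod 400 "$path"
}

# Usage sketch (the file location is an assumption; the file:// prefix
# points Sqoop at the local filesystem instead of HDFS):
# write_password_file '123' "$HOME/.mysql.pwd"
# sqoop list-databases \
#   --connect jdbc:mysql://hadoop101:3306 \
#   --username root \
#   --password-file "file://$HOME/.mysql.pwd"
```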

# Connect to the MySQL server and list all tables in the specified database
sqoop list-tables \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123

# Import all rows of a table from MySQL into HDFS
# Start the required services (HDFS and YARN daemons)
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
# Run the import
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--table customers \
--target-dir /data/retail_db/customers \
--num-mappers 1

# Import only the rows of a table that match a condition from MySQL into HDFS
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--table orders \
--where 'order_id<500' \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--num-mappers 1
# View the data
hdfs dfs -cat /data/retail_db/orders/*

# Import selected columns of the rows that match a condition from MySQL into HDFS
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--table orders \
--where 'order_id<500' \
--columns order_id,order_date,order_customer_id \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--num-mappers 1
# View the data
hdfs dfs -cat /data/retail_db/orders/*

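# Import the result of a free-form query from MySQL into HDFS
# [Note: the query must have a WHERE clause that contains the $CONDITIONS token, here appended with "and".
# Inside single quotes write $CONDITIONS as-is; inside double quotes escape it as \$CONDITIONS so the shell does not expand it.]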
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--query 'select * from orders where order_status!="CLOSED" and $CONDITIONS' \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--num-mappers 1

sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--query "select * from orders where order_status!='CLOSED' and \$CONDITIONS" \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--num-mappers 3 \
--split-by order_id
# View the data
hdfs dfs -cat /data/retail_db/orders/*
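
The quoting rule for --query is easy to get wrong: with an unescaped $CONDITIONS inside double quotes, the shell replaces the token with the value of a (normally unset) variable before Sqoop ever sees it. As a minimal sketch, the hypothetical helper has_conditions_token below checks that a query string still carries the literal token before it is handed to sqoop import.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the query still contains the
# literal text $CONDITIONS.
has_conditions_token() {
  case "$1" in
    *'$CONDITIONS'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Single quotes: the shell leaves $CONDITIONS untouched.
query_single='select * from orders where order_status!="CLOSED" and $CONDITIONS'

# Double quotes: the dollar sign is escaped so the token survives.
query_double="select * from orders where order_status!='CLOSED' and \$CONDITIONS"

has_conditions_token "$query_single" && echo "single-quoted query is fine"
has_conditions_token "$query_double" && echo "double-quoted query is fine"
```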

# Incremental import
# Create the table in MySQL
use test;
create table student
(
  id int,
  name varchar(20),
  sex varchar(20)
); 
insert into student values(1,'tom','male'),(2,'jack','male');
select * from student;
# Import with Sqoop
# First run: full import
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/test \
--username root \
--password 123 \
--table student \
--target-dir /data/retail_db/student \
--delete-target-dir \
--num-mappers 1
# View the data
hdfs dfs -cat /data/retail_db/student/*
# Result
1,tom,male
2,jack,male
# Add rows in MySQL
insert into student values(3,'tim','male'),(4,'jim','male');
select * from student;
# Second run: incremental import with Sqoop [only rows whose check column is strictly greater than --last-value are imported]
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/test \
--username root \
--password 123 \
--table student \
--target-dir /data/retail_db/student \
--incremental append \
--check-column id \
--last-value 2 \
--num-mappers 1
# View the data
hdfs dfs -cat /data/retail_db/student/*
# Result: the rows with id 3 and 4 are appended to the two rows from the full import
# Add more rows in MySQL
insert into student values(5,'tim','male'),(6,'jim','male');
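
Each incremental run needs the highest id already imported as its --last-value. Rather than tracking that number by hand, it can be derived from the files already in HDFS. This is a minimal sketch that assumes the default comma-separated text output with the check column first; the function name max_first_column is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read comma-separated rows on stdin and print the
# largest numeric value found in the first column. Prints nothing when
# the input is empty.
max_first_column() {
  awk -F',' 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { if (NR > 0) print max }'
}

# Usage sketch against the target directory used for the student table:
# last_value=$(hdfs dfs -cat '/data/retail_db/student/*' | max_first_column)
# sqoop import ... --incremental append --check-column id --last-value "$last_value"
```

Sqoop can also keep this bookkeeping itself: a saved job created with sqoop job --create stores the last imported value between runs.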
# Example from the slides
# First run: full import
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--query "select * from orders where order_date between '2013-07-01' and '2014-04-15' and \$CONDITIONS" \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--num-mappers 3 \
--split-by order_id
# Second run: incremental import
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://hadoop101:3306/retail_db \
--username root \
--password 123 \
--table orders \
--incremental append \
--check-column order_date \
--last-value 2014-04-15 \
--target-dir /data/retail_db/orders \
--num-mappers 3 \
--split-by order_id
