Sqoop Installation & Import/Export

Sqoop

Concept

Sqoop moves data back and forth between traditional relational databases and Hadoop. Under the hood it runs as a MapReduce job, but one with only map tasks and no reduce tasks, since no aggregation is needed.

Use Cases

1) The data sits in an RDBMS and you want to process it with Hive.
2) The analysis has been done in Hive and the results are still there; how do you move them into MySQL so the statistics can ultimately be visualized in reports?

  • Solution
    You could write the output files with hand-rolled MapReduce jobs, but that is tedious, which is why a tool like Sqoop exists.

  • Required parameters
    RDBMS: url, driver, db, table, user, password
    HDFS: path
    Hive: database, table, partition

Versions

1.4.*  Sqoop1 ***
1.99.* Sqoop2
The 1.4.x line (Sqoop1) is the one in mainstream use.

Reference Point

Hadoop is the reference point / baseline (a minimal command sketch for each direction follows this list):

  • Import: import
    RDBMS ==> Hadoop
  • Export: export
    Hadoop ==> RDBMS
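
A rough sketch of the two directions (host, database, table, credentials, and HDFS paths are placeholders here, not taken from a real environment):

# Import: RDBMS ==> Hadoop
sqoop import \
--connect jdbc:mysql://<host>:3306/<db> \
--username <user> --password <password> \
--table <table> \
--target-dir /path/on/hdfs

# Export: Hadoop ==> RDBMS
sqoop export \
--connect jdbc:mysql://<host>:3306/<db> \
--username <user> --password <password> \
--table <table> \
--export-dir /path/on/hdfs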

Installation and Deployment

Download
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.16.2.tar.gz

Move the tarball into the software directory
[root@JD ~]# mv sqoop-1.4.6-cdh5.16.2.tar.gz /home/hadoop/software/

Switch to the hadoop user
[root@JD software]# su - hadoop

Extract the tarball
[hadoop@JD software]$ tar -zxvf sqoop-1.4.6-cdh5.16.2.tar.gz -C ~/app/

After extracting, the following jars need to be added to Sqoop's lib directory (one way to copy them in is sketched after the list):
mysql-connector-java-5.1.27-bin.jar
java-json.jar
hive-common-1.1.0-cdh5.16.2.jar
hive-exec-1.1.0-cdh5.16.2.jar


Configure environment variables
[hadoop@JD ~]$ vi ~/.bashrc
export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.16.2
export PATH=$SQOOP_HOME/bin:$PATH

Reload the environment variables
[hadoop@JD ~]$ source ~/.bashrc

In Sqoop's conf directory, copy sqoop-env-template.sh to sqoop-env.sh
[hadoop@JD conf]$ cp sqoop-env-template.sh sqoop-env.sh 

Add the Hadoop and Hive paths to the configuration
[hadoop@JD conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.16.2

Copy the MySQL driver jar into the lib directory
[hadoop@JD lib]$ cp mysql-connector-java-5.1.27-bin.jar /home/hadoop/app/sqoop-1.4.6-cdh5.16.2/lib
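
To sanity-check the installation, run sqoop version; it should print the Sqoop version (1.4.6-cdh5.16.2 here). Warnings about HBASE_HOME, HCAT_HOME, ACCUMULO_HOME or ZOOKEEPER_HOME not being set can typically be ignored.

[hadoop@JD ~]$ sqoop version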


Basic Usage

View the command help
[hadoop@JD sqoop-1.4.6-cdh5.16.2]$ sqoop help
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
  
  
View the usage of a specific command
[hadoop@JD sqoop-1.4.6-cdh5.16.2]$ sqoop help list-databases

List the databases in MySQL
[hadoop@JD sqoop-1.4.6-cdh5.16.2]$ sqoop list-databases --connect jdbc:mysql://JD:3306 --username root --password xxx
information_schema
bigdata
mysql
performance_schema
ruozedata_erp
ruozedata_hive
test

List the tables in a given MySQL database
[hadoop@JD sqoop-1.4.6-cdh5.16.2]$ sqoop list-tables --connect jdbc:mysql://JD:3306/bigdata --username root --password xxx
dept
emp
sal
salgrade
testdata


Data Import (from a MySQL table to HDFS)

Parameter overview
--connect                JDBC URL of the MySQL database
--username               database user
--password               database password
--table                  table name
--delete-target-dir      delete the target directory on HDFS before importing
--mapreduce-job-name     name of the MapReduce job
--columns                columns to pull from MySQL
--target-dir             target path on HDFS
--fields-terminated-by   field delimiter, defaults to ,
--null-string            value to substitute for NULL in string columns
--null-non-string        value to substitute for NULL in non-string columns
-m                       number of map tasks, defaults to 4
--query                  import the result of a custom SQL query

--incremental append
--check-column EMPNO
--last-value 7788
Append only rows whose EMPNO is greater than 7788.

sqoop --options-file <file>   run a Sqoop command whose arguments are stored in a file

1. Import the columns EMPNO,ENAME,JOB,SAL,COMM from the emp table, delimited by \t; replace NULL with '' for string columns and with 0 for non-string columns; run a single task; name the MapReduce job FromMySQL2HDFS.
sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin --username root \
--table emp  \
--delete-target-dir --mapreduce-job-name FromMySQL2HDFS \
--columns "EMPNO,ENAME,JOB,SAL,COMM" \
--target-dir EMP_COLUMN_QUERY \
--fields-terminated-by '\t' \
--null-string '' \
--null-non-string '0' \
-m 1

View the files imported into HDFS
[hadoop@JD bin]$ hadoop fs -cat /user/hadoop/EMP_COLUMN_QUERY/part*

7369    SMITH   CLERK   800.00  0
7499    ALLEN   SALESMAN        1600.00 300.00
7521    WARD    SALESMAN        1250.00 500.00
7566    JONES   MANAGER 2975.00 0
7654    MARTIN  SALESMAN        1250.00 1400.00
7698    BLAKE   MANAGER 2850.00 0
7782    CLARK   MANAGER 2450.00 0
7788    SCOTT   ANALYST 3000.00 0
7839    KING    PRESIDENT       5000.00 0
7844    TURNER  SALESMAN        1500.00 0.00
7876    ADAMS   CLERK   1100.00 0
7902    FORD    ANALYST 3000.00 0


2. When importing with a hand-written SQL statement, the statement must contain $CONDITIONS and should be wrapped in single quotes (with double quotes, $CONDITIONS has to be escaped as \$CONDITIONS).

sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin --username root \
--delete-target-dir --mapreduce-job-name FromMySQL2HDFS \
--target-dir JOIN \
--fields-terminated-by '\t' \
--null-string '' \
--null-non-string '0' \
--query 'select e.empno,e.ename,e.deptno,d.dname from emp e join dept d on e.deptno=d.deptno and $CONDITIONS' \
-m 1

View the data
[hadoop@JD bin]$ hadoop fs -cat JOIN/part*
7369    SMITH   20      RESEARCH
7499    ALLEN   30      SALES
7521    WARD    30      SALES
7566    JONES   20      RESEARCH
7654    MARTIN  30      SALES
7698    BLAKE   30      SALES
7782    CLARK   10      ACCOUNTING
7788    SCOTT   20      RESEARCH
7839    KING    10      ACCOUNTING
7844    TURNER  30      SALES
7876    ADAMS   20      RESEARCH
7902    FORD    20      RESEARCH

3. Append new rows to the existing emp data on HDFS (incremental import)
sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin --username root \
--table emp  \
--mapreduce-job-name FromMySQL2HDFS \
--target-dir EMP_APPEND \
--fields-terminated-by '\t' \
--null-string '' \
--incremental append \
--check-column EMPNO \
--last-value 7788 \
--null-non-string '0' \
-m 1

View the data
[hadoop@JD bin]$ hadoop fs -cat emp/part*
7839    KING    PRESIDENT       0       1981-11-17 00:00:00.0   5000.00 0       10
7844    TURNER  SALESMAN        7698    1981-09-08 00:00:00.0   1500.00 0.00    30
7876    ADAMS   CLERK   7788    1983-01-12 00:00:00.0   1100.00 0       20
7902    FORD    ANALYST 7566    1981-12-03 00:00:00.0   3000.00 0       20


4. For a table without a primary key, the import must specify which column the map tasks split on (--split-by). Since -m is not given here, the default of 4 map tasks applies, which is why four part files appear below.

sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--delete-target-dir \
--table salgrade \
--split-by 'GRADE'

[hadoop@JD bin]$ hadoop fs -ls salgrade/
-rw-r--r--   1 hadoop supergroup          0 2019-12-26 23:09 salgrade/_SUCCESS
-rw-r--r--   1 hadoop supergroup         11 2019-12-26 23:09 salgrade/part-m-00000
-rw-r--r--   1 hadoop supergroup         12 2019-12-26 23:09 salgrade/part-m-00001
-rw-r--r--   1 hadoop supergroup         12 2019-12-26 23:09 salgrade/part-m-00002
-rw-r--r--   1 hadoop supergroup         24 2019-12-26 23:09 salgrade/part-m-00003



5. Original data in MySQL
mysql> select * from dept;
+--------+------------+----------+
| deptno | dname      | loc      |
+--------+------------+----------+
|     10 | ACCOUNTING | NEW YORK |
|     20 | RESEARCH   | DALLAS   |
|     30 | SALES      | CHICAGO  |
|     40 | OPERATIONS | BOSTON   |
+--------+------------+----------+
4 rows in set (0.00 sec)

Use sqoop eval to insert a row into a MySQL table
[hadoop@JD bin]$ sqoop eval --connect jdbc:mysql://JD:3306/bigdata --password mysqladmin --username root --query "insert into dept values (60,'RD', 'BEIJING')"

Check the data in MySQL
mysql> select * from dept;
+--------+------------+----------+
| deptno | dname      | loc      |
+--------+------------+----------+
|     10 | ACCOUNTING | NEW YORK |
|     20 | RESEARCH   | DALLAS   |
|     30 | SALES      | CHICAGO  |
|     40 | OPERATIONS | BOSTON   |
|     60 | RD         | BEIJING  |
+--------+------------+----------+
5 rows in set (0.01 sec)

Options File

Create a file named emp.opt with the following content
import
--connect
jdbc:mysql://ruozedata001:3306/bigdata
--password
mysqladmin
--username
root
--target-dir
EMP_OPTIONS_FILE2
--delete-target-dir
--table
emp
-m
2

Run it
[hadoop@JD bin]$ sqoop --options-file ./emp.opt

Exporting from HDFS to MySQL

The target table must be created in MySQL beforehand; if it does not exist, Sqoop will not create it automatically.
sqoop export                            export data from HDFS to an RDBMS
-Dsqoop.export.records.per.statement    batch export: number of records bundled into each INSERT statement
--export-dir                            HDFS directory to export
--columns                               columns to export
--fields-terminated-by                  field delimiter of the exported data
-m                                      number of map tasks

In MySQL, create the emp_demo table
mysql> create table emp_demo as select * from emp where 1=2;

Export the data
sqoop export \
-Dsqoop.export.records.per.statement=10 \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp_demo \
--export-dir /user/hadoop/EMP_COLUMN_QUERY \
--columns "EMPNO,ENAME,JOB,SAL,COMM" \
--fields-terminated-by '\t' \
-m 1

Query MySQL to verify the export succeeded
mysql> select * from emp_demo;
+-------+--------+-----------+------+----------+---------+---------+--------+
| empno | ename  | job       | mgr  | hiredate | sal     | comm    | deptno |
+-------+--------+-----------+------+----------+---------+---------+--------+
|  7369 | SMITH  | CLERK     | NULL | NULL     |  800.00 |    0.00 |   NULL |
|  7499 | ALLEN  | SALESMAN  | NULL | NULL     | 1600.00 |  300.00 |   NULL |
|  7521 | WARD   | SALESMAN  | NULL | NULL     | 1250.00 |  500.00 |   NULL |
|  7566 | JONES  | MANAGER   | NULL | NULL     | 2975.00 |    0.00 |   NULL |
|  7654 | MARTIN | SALESMAN  | NULL | NULL     | 1250.00 | 1400.00 |   NULL |
|  7698 | BLAKE  | MANAGER   | NULL | NULL     | 2850.00 |    0.00 |   NULL |
|  7782 | CLARK  | MANAGER   | NULL | NULL     | 2450.00 |    0.00 |   NULL |
|  7788 | SCOTT  | ANALYST   | NULL | NULL     | 3000.00 |    0.00 |   NULL |
|  7839 | KING   | PRESIDENT | NULL | NULL     | 5000.00 |    0.00 |   NULL |
|  7844 | TURNER | SALESMAN  | NULL | NULL     | 1500.00 |    0.00 |   NULL |
|  7876 | ADAMS  | CLERK     | NULL | NULL     | 1100.00 |    0.00 |   NULL |
|  7902 | FORD   | ANALYST   | NULL | NULL     | 3000.00 |    0.00 |   NULL |
+-------+--------+-----------+------+----------+---------+---------+--------+
12 rows in set (0.00 sec)

Importing from MySQL into Hive

When importing from MySQL into Hive, the Hive table must be created first.
--hive-overwrite          overwrite existing data (the default is to append)
--hive-import             flag that marks this as an import into Hive
--hive-table              Hive table name
--hive-partition-key      partition column name
--hive-partition-value    partition column value
--fields-terminated-by    field delimiter, defaults to ,

1. Import into a regular table

Create the table in Hive
CREATE TABLE emp_import(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Import the MySQL data into the regular Hive table
sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database db_hive1 \
--hive-table emp_import \
--fields-terminated-by '\t' \
-m 1

Query it in Hive
hive (db_hive1)> select * from emp_import;
OK
emp_import.empno        emp_import.ename        emp_import.job  emp_import.mgr  emp_import.hiredate     emp_import.sal  emp_import.comm emp_import.deptno
7369    SMITH   CLERK   7902    1980-12-17 00:00:00.0   800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-02-20 00:00:00.0   1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-02-22 00:00:00.0   1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-04-02 00:00:00.0   2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-09-28 00:00:00.0   1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-05-01 00:00:00.0   2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-06-09 00:00:00.0   2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1982-12-09 00:00:00.0   3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17 00:00:00.0   5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-09-08 00:00:00.0   1500.0  0.0     30
7876    ADAMS   CLERK   7788    1983-01-12 00:00:00.0   1100.0  NULL    20
7902    FORD    ANALYST 7566    1981-12-03 00:00:00.0   3000.0  NULL    20


2. Import MySQL data into a Hive partitioned table

Create the partitioned table in Hive
CREATE TABLE emp_import_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
partitioned by (pt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Import the data
sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database db_hive1 \
--hive-table emp_import_partition \
--hive-partition-key 'pt' \
--hive-partition-value '2019-12-30' \
--fields-terminated-by '\t' \
-m 1

View the data
hive (db_hive1)> select * from emp_import_partition where pt='2019-12-30';

emp_import_partition.empno      emp_import_partition.ename      emp_import_partition.job        emp_import_partition.mgr        emp_import_partition.hiredate   emp_import_partition.sal    emp_import_partition.comm       emp_import_partition.deptno     emp_import_partition.pt
7369    SMITH   CLERK   7902    1980-12-17 00:00:00.0   800.0   NULL    20      2019-12-30
7499    ALLEN   SALESMAN        7698    1981-02-20 00:00:00.0   1600.0  300.0   30      2019-12-30
7521    WARD    SALESMAN        7698    1981-02-22 00:00:00.0   1250.0  500.0   30      2019-12-30
7566    JONES   MANAGER 7839    1981-04-02 00:00:00.0   2975.0  NULL    20      2019-12-30
7654    MARTIN  SALESMAN        7698    1981-09-28 00:00:00.0   1250.0  1400.0  30      2019-12-30
7698    BLAKE   MANAGER 7839    1981-05-01 00:00:00.0   2850.0  NULL    30      2019-12-30
7782    CLARK   MANAGER 7839    1981-06-09 00:00:00.0   2450.0  NULL    10      2019-12-30
7788    SCOTT   ANALYST 7566    1982-12-09 00:00:00.0   3000.0  NULL    20      2019-12-30
7839    KING    PRESIDENT       NULL    1981-11-17 00:00:00.0   5000.0  NULL    10      2019-12-30
7844    TURNER  SALESMAN        7698    1981-09-08 00:00:00.0   1500.0  0.0     30      2019-12-30
7876    ADAMS   CLERK   7788    1983-01-12 00:00:00.0   1100.0  NULL    20      2019-12-30
7902    FORD    ANALYST 7566    1981-12-03 00:00:00.0   3000.0  NULL    20      2019-12-30



Exporting from Hive to MySQL

The target table must be created in MySQL first.
By default the export appends to the existing data.
--export-dir   HDFS location of the data to export

Export the Hive table's data (from its HDFS warehouse directory)
sqoop export \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp_demo \
--export-dir /user/hive/warehouse/db_hive1.db/emp \
-m 1

Check the data in MySQL
mysql> select * from emp_demo;
+-------+--------+-----------+------+---------------------+---------+---------+--------+
| empno | ename  | job       | mgr  | hiredate            | sal     | comm    | deptno |
+-------+--------+-----------+------+---------------------+---------+---------+--------+
|  7369 | SMITH  | CLERK     | NULL | NULL                |  800.00 |    0.00 |   NULL |
|  7499 | ALLEN  | SALESMAN  | NULL | NULL                | 1600.00 |  300.00 |   NULL |
|  7521 | WARD   | SALESMAN  | NULL | NULL                | 1250.00 |  500.00 |   NULL |
|  7566 | JONES  | MANAGER   | NULL | NULL                | 2975.00 |    0.00 |   NULL |
|  7654 | MARTIN | SALESMAN  | NULL | NULL                | 1250.00 | 1400.00 |   NULL |
|  7698 | BLAKE  | MANAGER   | NULL | NULL                | 2850.00 |    0.00 |   NULL |
|  7782 | CLARK  | MANAGER   | NULL | NULL                | 2450.00 |    0.00 |   NULL |
|  7788 | SCOTT  | ANALYST   | NULL | NULL                | 3000.00 |    0.00 |   NULL |
|  7839 | KING   | PRESIDENT | NULL | NULL                | 5000.00 |    0.00 |   NULL |
|  7844 | TURNER | SALESMAN  | NULL | NULL                | 1500.00 |    0.00 |   NULL |
|  7876 | ADAMS  | CLERK     | NULL | NULL                | 1100.00 |    0.00 |   NULL |
|  7902 | FORD   | ANALYST   | NULL | NULL                | 3000.00 |    0.00 |   NULL |
|  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 00:00:00 |  800.00 |    NULL |     20 |
|  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 00:00:00 | 1600.00 |  300.00 |     30 |
|  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 00:00:00 | 1250.00 |  500.00 |     30 |
|  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 00:00:00 | 2975.00 |    NULL |     20 |
|  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 00:00:00 | 1250.00 | 1400.00 |     30 |
|  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 00:00:00 | 2850.00 |    NULL |     30 |
|  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 00:00:00 | 2450.00 |    NULL |     10 |
|  7788 | SCOTT  | ANALYST   | 7566 | 1982-12-09 00:00:00 | 3000.00 |    NULL |     20 |
|  7839 | KING   | PRESIDENT | NULL | 1981-11-17 00:00:00 | 5000.00 |    NULL |     10 |
|  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 00:00:00 | 1500.00 |    0.00 |     30 |
|  7876 | ADAMS  | CLERK     | 7788 | 1983-01-12 00:00:00 | 1100.00 |    NULL |     20 |
|  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 00:00:00 | 3000.00 |    NULL |     20 |
+-------+--------+-----------+------+---------------------+---------+---------+--------+

2. Wrap a Sqoop command in a saved job and run it

Create the job
sqoop job --create bigdata-sqoop-job -- \
import --connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp \
--delete-target-dir

Run the job
sqoop job --exec bigdata-sqoop-job

Delete the job
sqoop job --delete bigdata-sqoop-job

List the jobs
[hadoop@JD bin]$ sqoop job --list
Available jobs:
  bigdata-sqoop-job
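
A saved job's definition can also be inspected with sqoop job --show (Sqoop may prompt for the database password when showing or executing a job unless credentials are stored in its metastore):

sqoop job --show bigdata-sqoop-job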

End-to-End Example

1. Requirement:
The emp and dept tables live in MySQL; pull them into Hive, run the analysis there, and write the result back to MySQL.
2. Approach:
Hive: create emp_etl, dept_etl and a result table result_etl; statistics: select e.empno, e.ename, e.deptno, d.dname from emp_etl e join dept_etl d on e.deptno=d.deptno; MySQL: create a result table (etl_result) to receive the output.

Create the emp_etl and dept_etl tables in Hive
CREATE TABLE emp_etl(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE dept_etl(
deptno int,
dname string,
loc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Import the data from MySQL into Hive
sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table dept \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database db_hive1 \
--hive-table dept_etl \
--fields-terminated-by '\t' \
-m 1

sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database db_hive1 \
--hive-table emp_etl \
--fields-terminated-by '\t' \
-m 1

Create the intermediate result table in Hive
CREATE TABLE result_etl(
empno int,
ename string,
deptno int,
dname string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Insert the statistics into the result table
insert overwrite table result_etl select e.empno, e.ename, e.deptno, d.dname from emp_etl e join dept_etl d on e.deptno=d.deptno;	

Create the result table in MySQL
create table etl_result(
empno int,
ename varchar(10),
deptno int,
dname varchar(20)
);

Export the result from Hive to MySQL
sqoop export \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table etl_result \
--export-dir /user/hive/warehouse/db_hive1.db/result_etl \
--fields-terminated-by '\t' \
-m 1

Check the data; the export succeeded
mysql> select * from etl_result;
+-------+--------+--------+------------+
| empno | ename  | deptno | dname      |
+-------+--------+--------+------------+
|  7369 | SMITH  |     20 | RESEARCH   |
|  7499 | ALLEN  |     30 | SALES      |
|  7521 | WARD   |     30 | SALES      |
|  7566 | JONES  |     20 | RESEARCH   |
|  7654 | MARTIN |     30 | SALES      |
|  7698 | BLAKE  |     30 | SALES      |
|  7782 | CLARK  |     10 | ACCOUNTING |
|  7788 | SCOTT  |     20 | RESEARCH   |
|  7839 | KING   |     10 | ACCOUNTING |
|  7844 | TURNER |     30 | SALES      |
|  7876 | ADAMS  |     20 | RESEARCH   |
|  7902 | FORD   |     20 | RESEARCH   |
+-------+--------+--------+------------+
12 rows in set (0.00 sec)

4. Wrap everything in a shell script
4.1 Script content

#!/bin/sh
set -x

sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table dept \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database db_hive1 \
--hive-table dept_etl \
--fields-terminated-by '\t' \
-m 1

sqoop import \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database db_hive1 \
--hive-table emp_etl \
--fields-terminated-by '\t' \
-m 1

sql="insert overwrite table result_etl select e.empno, e.ename, e.deptno, d.dname from emp_etl e join dept_etl d on e.deptno=d.deptno"

hive -e "$sql"

mysql -uroot -pruozedata <
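
The script is truncated at this point. Based on the commands shown earlier in this section, the remainder presumably creates the MySQL result table and then exports result_etl back to MySQL. A sketch of how it might continue (a reconstruction, not the original script; the target MySQL database is assumed to be bigdata and the local mysql root password ruozedata, as in the truncated line above):

mysql -uroot -pruozedata bigdata <<EOF
create table if not exists etl_result(
empno int,
ename varchar(10),
deptno int,
dname varchar(20)
);
EOF

sqoop export \
--connect jdbc:mysql://JD:3306/bigdata \
--password mysqladmin \
--username root \
--table etl_result \
--export-dir /user/hive/warehouse/db_hive1.db/result_etl \
--fields-terminated-by '\t' \
-m 1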
