Sqoop Basic Syntax Explained and Errors You May Encounter

0 Introduction to Sqoop

Apache Sqoop is a tool designed to transfer data efficiently between Apache Hadoop and structured data stores such as relational databases. You can use Sqoop to import data from an external structured data store into the Hadoop Distributed File System or related systems such as Hive and HBase. Conversely, Sqoop can extract data from Hadoop and export it to external structured data stores such as relational databases and enterprise data warehouses.
Sqoop is designed for bulk transfer of big data: it can split a dataset into chunks and create a Hadoop task to process each chunk.
For downloading and installing Sqoop, refer to this link.

1 Use Cases

Under the hood, Sqoop is just MapReduce, so using Sqoop inevitably introduces some data latency. My personal suggestion: when the data volume is not very large, or the real-time requirements are not strict, Sqoop is very convenient, since imports and exports of all kinds of data can be done through configuration alone. If you have studied Spark, you know that Spark SQL is also easy to use and provides external data sources for reading and writing various data types. Which one to use is up to you; there is no single best tool, only the one that fits your needs best.

Official website
User guide

2 Viewing Command Help

View command help (sqoop help):

[hadoop@zhangyu lib]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

The output tells us to run sqoop help COMMAND (with the command you want to look up) to get detailed help for that command.
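For example, to see the detailed options for the import tool:

[hadoop@zhangyu lib]$ sqoop help import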

3 list-databases

[hadoop@zhangyu lib]$ sqoop help list-databases
  1. --connect jdbc:mysql://hostname:port/database : specifies the MySQL host name, port (default 3306), and database name;
  2. --username root : specifies the database username;
  3. --password 123456 : specifies the database password.
[hadoop@zhangyu lib]$ sqoop list-databases \
> --connect jdbc:mysql://localhost:3306 \
> --username root \
> --password 123456

Result:
information_schema
basic01
mysql
performance_schema
sqoop
test

4 list-tables

[hadoop@zhangyu lib]$ sqoop list-tables \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456
Result:
stu

5 Importing MySQL Data into HDFS (import)

(By default, data is imported under the current user's home directory on HDFS: /user/<username>/<table name>.)
A small side note here: the difference between hdfs dfs -ls and hdfs dfs -ls /. (Test it yourself; a quick sketch follows.)
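A quick illustration (user and prompt follow the examples in this article):

# With no path argument, -ls lists the current user's HDFS home directory
[hadoop@zhangyu lib]$ hdfs dfs -ls        # equivalent to: hdfs dfs -ls /user/hadoop
# With an explicit "/", -ls lists the root of the filesystem
[hadoop@zhangyu lib]$ hdfs dfs -ls /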

sqoop import --connect jdbc:mysql://localhost/database --username root --password 123456 --table example -m 1
  1. --table example : the MySQL table to import;
  2. -m 1 : start one map task; for a large table you can start several (the default is 4).

Two errors may come up here:

  • The first error
18/01/14 16:01:19 ERROR tool.ImportTool: Error during import: No primary key could be found for table stu. Please specify one with --split-by or perform a sequential import with '-m 1'.
  • As the message shows, the table we are importing from MySQL has no primary key, so Sqoop asks us to either specify --split-by or run a sequential import with -m 1. You may well ask: why is that?
  1. Sqoop lets you specify the split column with --split-by and the number of mappers with -m. From these two parameters it generates m WHERE clauses and queries the table in segments.
  2. split-by uses different splitting strategies for different column types. Say a table has 100 rows, id is an int column, and we specify --split-by id without setting the mapper count, so the default of 4 is used. Sqoop first fetches MIN() and MAX() of the --split-by column, then divides that range by the mapper count, so the values are split across four map tasks: (1-25) (26-50) (51-75) (76-100).
  3. Different splitting strategies are used depending on the type of MIN and MAX; supported types include Date, Text, Float, Integer, Boolean, NText, BigDecimal, and so on.
  4. So if the imported table has no primary key, either set -m to 1 or specify --split-by. With -m 1 only one map task runs, and the drawback is that the data cannot be loaded in parallel. (Note: whenever -m is greater than 1, --split-by must name a column; see the sketch after this list.)
  5. Even if the split-by column is an int, if its values are not continuous and evenly distributed, the data assigned to each mapper will be unbalanced: some mappers may be very busy while others have almost nothing to process.
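A minimal sketch of such a parallel import, using the stu table and its id column from this article's examples (adjust to your own schema):

# Split the stu table on the int column id across 4 map tasks;
# each mapper receives one id range derived from MIN(id)..MAX(id)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root --password 123456 \
--table stu \
--split-by id \
-m 4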
  • The second error
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
        at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
        at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
        at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
        at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 15 more

Here we need the java-json.jar package (download link); add java-json.jar to the …/sqoop/lib directory.
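For example, assuming the jar was downloaded to the current directory and Sqoop is installed under /opt/software/sqoop (adjust to your installation):

# Put the missing org.json classes on Sqoop's classpath
cp java-json.jar /opt/software/sqoop/lib/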

  • With all that said, let's look at our first import statement:
[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu

Be sure to read and understand the log output this generates.

Now check the files on HDFS:

[hadoop@zhangyu lib]$ hdfs dfs -ls /user/hadoop/stu
Found 4 items
-rw-r--r--   1 hadoop supergroup          0 2018-01-14 17:07 /user/hadoop/stu/_SUCCESS
-rw-r--r--   1 hadoop supergroup         11 2018-01-14 17:07 /user/hadoop/stu/part-m-00000
-rw-r--r--   1 hadoop supergroup          7 2018-01-14 17:07 /user/hadoop/stu/part-m-00001
-rw-r--r--   1 hadoop supergroup          9 2018-01-14 17:07 /user/hadoop/stu/part-m-00002
[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/stu/"part*"
1,zhangsan
2,lisi
3,wangwu

  • Add the -m parameter
[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu -m 1

  • The third error
    This third error is one I hit while importing data from Postgres into HDFS:
[hdfs@hik003 /apps/sqoop]$ sqoop import \
> --connect jdbc:postgresql://localhost:5432/test \
> --username root \
> --password 123456 \
> --table test_info \
> --target-dir /test/ \
> --mapreduce-job-name FromPgSQL2HDFS \
> -m 1
Warning: /opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/05/17 20:38:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.1
19/05/17 20:38:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/17 20:38:15 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/17 20:38:15 INFO tool.CodeGenTool: Beginning code generation
19/05/17 20:38:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "camera_govern_info" AS t LIMIT 1
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1389)
        at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1414)
        at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1540)
        at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1503)
        at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1958)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1749)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
(Reference: https://yq.aliyun.com/articles/578819)

This is a column type problem: the table has a PostgreSQL geometry column, a type used to store latitude/longitude coordinates, and Sqoop fails when converting it. One fix is to change the column type in the database.
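Alternatively, a hedged sketch, assuming plain text is acceptable on the Hadoop side: Sqoop's --map-column-java option overrides the Java type used for a column, so the unsupported geometry column can be imported as a string:

# Force the unsupported geom column to map to a Java String
sqoop import \
--connect jdbc:postgresql://localhost:5432/test \
--username root \
--password 123456 \
--table test_info \
--map-column-java geom=String \
--target-dir /test/ \
--mapreduce-job-name FromPgSQL2HDFS \
-m 1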

Another error you may run into is that the target directory already exists on HDFS:

18/01/14 17:52:47 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.137.200:9000/user/hadoop/stu already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
  • Delete the target directory before importing, and give the MapReduce job a name
    Parameters: --delete-target-dir, --mapreduce-job-name
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --table stu \
> -m 1
  • Import into a specified directory

Parameter: --target-dir /directory

[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu -m 1 --target-dir /sqoop/

Check the files on HDFS:

[hadoop@zhangyu lib]$ hdfs dfs -ls /sqoop
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-01-14 18:07 /sqoop/_SUCCESS
-rw-r--r--   1 hadoop supergroup         27 2018-01-14 18:07 /sqoop/part-m-00000
[hadoop@zhangyu lib]$ hdfs dfs -cat /sqoop/part-m-00000
1,zhangsan
2,lisi
3,wangwu
  • Specify the delimiter between fields
    Parameter: --fields-terminated-by
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 

Result on HDFS:
[hadoop@zhangyu lib]$ hdfs dfs -ls /user/hadoop/stu/ 
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2018-01-14 19:47 /user/hadoop/stu/_SUCCESS
-rw-r--r--   1 hadoop supergroup         27 2018-01-14 19:47 /user/hadoop/stu/part-m-00000

[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/stu/part-m-00000   
1       zhangsan
2       lisi
3       wangwu
(The fields are now separated by tabs.)
  • Replace NULL fields in the table with 0

Parameters: --null-string, --null-non-string

  1. --null-string : for string columns, replace NULL values with the specified string;
  2. --null-non-string : for non-string columns, replace NULL values with the specified string.
First, import the salary table:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table sal \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1

Result:
[hadoop@zhangyu lib]$  hdfs dfs -cat /user/hadoop/sal/part-m-00000
zhangsan        1000
lisi    2000
wangwu  null

Now add the --null-string parameter:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table sal \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0

Check the result:
[hadoop@zhangyu lib]$  hdfs dfs -cat /user/hadoop/sal/part-m-00000
zhangsan        1000
lisi    2000
wangwu  0
  • Import only some of the table's columns
    Parameter: --columns
[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" 

Result:
[hadoop@zhangyu ~]$ hdfs dfs -cat /user/hadoop/stu/part-m-00000
zhangsan
lisi
wangwu
  • Import data that matches a condition
    Parameter: --where
[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" \
> --target-dir STU_COLUMN_WHERE \
> --where 'id<3'

Result:

zhangsan
lisi
  • Import using a SQL statement
    Parameter: --query
    When --query is used, --table and --columns cannot be used with it.
    The WHERE clause of the custom SQL must contain the literal $CONDITIONS; $CONDITIONS is a placeholder variable that Sqoop uses to divide the query range among the map tasks.
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root --password 123456 \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 \
--null-string 0 \
--target-dir STU_COLUMN_QUERY \
--query "select * from stu where id>1 and \$CONDITIONS"
(Alternatively, --query can use single quotes, in which case $CONDITIONS needs no escaping: --query 'select * from emp where id>1 and $CONDITIONS')

Result:

2       lisi
3       wangwu

6 Running from an Options File

  • Create the file sqoop-import-hdfs.txt
[hadoop@zhangyu data]$ vi sqoop-import-hdfs.txt                                   
import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
root
--password
123456
--table
stu
--target-dir 
STU_option_file
  • Execute
[hadoop@zhangyu data]$ sqoop --options-file /home/hadoop/data/sqoop-import-hdfs.txt

Result:
[hadoop@zhangyu data]$ hdfs dfs -cat STU_option_file/"part*"
1,zhangsan
2,lisi
3,wangwu

7 eval

The help output describes this command as "Evaluate a SQL statement and display the results"; that is, it executes a SQL statement and displays the result set.

[hadoop@zhangyu data]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --query "select * from stu" 
Warning: /opt/software/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/software/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/01/14 21:35:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/14 21:35:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 21:35:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id          | name                 | 
--------------------------------------
| 1           | zhangsan             | 
| 2           | lisi                 | 
| 3           | wangwu               | 
--------------------------------------

8 Exporting HDFS Data to MySQL (Loading Hive Data into MySQL)

We will export the sal data that is on HDFS; first inspect it:

[hadoop@zhangyu data]$ hdfs dfs -cat sal/part-m-00000
zhangsan        1000
lisi    2000
wangwu  0
  • Before running the export statement, create the sal_demo table (without it the export fails):
mysql> create table sal_demo like sal;
  • Export statement:
[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/
  1. --table sal_demo : the name of the target table for the export;
  2. --input-fields-terminated-by : the delimiter of the files on HDFS; the default is a comma. (From the data inspected above you can see I used \t, so \t is specified here. Be careful: a mismatched delimiter is a common cause of export errors.)
  3. --export-dir : the directory containing the data to export.
Result:
mysql> select * from sal_demo;
+----------+--------+
| name     | salary |
+----------+--------+
| zhangsan | 1000   |
| lisi     | 2000   |
| wangwu   | 0      |
+----------+--------+
3 rows in set (0.00 sec)

(If you run the export again, the rows are appended to the table.)
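If appending is not what you want, here is a hedged sketch using Sqoop's --update-key / --update-mode export options (assuming the name column can serve as the match key in sal_demo):

# Rows whose name matches an existing row are UPDATEd;
# with --update-mode allowinsert, non-matching rows are INSERTed
sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table sal_demo \
--input-fields-terminated-by '\t' \
--export-dir /user/hadoop/sal/ \
--update-key name \
--update-mode allowinsert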

  • Garbled Chinese characters on insert: fix by adding encoding parameters to the JDBC URL:
sqoop export --connect "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table sal -m 1 --export-dir /user/hadoop/sal/

  • Specify the columns to export
    Parameter: --columns
[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo3 \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/ \
> --columns name

Result:

mysql> select * from sal_demo3  
    -> ;
+----------+--------+
| name     | salary |
+----------+--------+
| zhangsan | NULL   |
| lisi     | NULL   |
| wangwu   | NULL   |
+----------+--------+
3 rows in set (0.00 sec)

9 Importing MySQL Data into Hive

  • Run the import statement
[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0

--create-hive-table : create the target table; fails if it already exists;
--hive-database : the Hive database to import into;
--hive-import : import into Hive (without this flag, data goes to HDFS);
--hive-overwrite : overwrite existing data in the table;
--hive-table stu_import : the Hive table name; if omitted, the source table's name is used.

You may hit an error here, as follows:

18/01/15 01:29:28 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/01/15 01:29:28 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
        at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
        ... 12 more

Most advice found online says to configure your personal environment variables, which did not help at all; copying a few jars from Hive's lib directory is what actually solved the problem:

[hadoop@zhangyu lib]$ cp hive-common-1.1.0-cdh5.7.0.jar /opt/software/sqoop/lib/
[hadoop@zhangyu lib]$ cp hive-shims* /opt/software/sqoop/lib/
  • Check the imported data in Hive
hive> show tables;
OK
stu_import
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select * from stu_import
    > ;
OK
1       zhangsan
2       lisi
3       wangwu

When importing into Hive, I do not recommend using --create-hive-table; it is better to create the Hive table in advance.
If you let --create-hive-table create the table and then inspect the column types, you will find some are not what you want, so define the table structure yourself before importing the data; a sketch follows.
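A minimal sketch of pre-creating the table (column types inferred from the stu data used above; adjust to your schema, and drop --create-hive-table from the import once the table exists):

hive> CREATE TABLE stu_import (
    >   id INT,
    >   name STRING
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';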

  • Import into a specified Hive partition
--hive-partition-key <partition-key>             Sets the partition key
                                                    to use when importing
                                                    to hive
 --hive-partition-value <partition-value>         Sets the partition
                                                    value to use when
                                                    importing to hive
  • Example:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import1 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0 \
> --hive-partition-key dt \
> --hive-partition-value "2018-08-08"

  • Query it in Hive
hive> select * from stu_import1;
OK
1       zhangsan        2018-08-08
2       lisi    2018-08-08
3       wangwu  2018-08-08
Time taken: 0.121 seconds, Fetched: 3 row(s)

10 Using sqoop job

This turns a Sqoop statement into a job. The job is not executed when it is created; you can inspect it, run it at any time, and delete it, which makes task scheduling convenient.

--create   Create a new job
--delete   Delete a job
--exec     Execute a job
--show     Show a job's parameters
--list     List all jobs
  • Create a job
sqoop job --create person_job1 -- import --connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table sal_demo3 \
-m 1 \
--delete-target-dir 
  • List the available jobs
[hadoop@zhangyu lib]$ sqoop job --list
Available jobs:
  person_job1
  • Execute person_job1 to complete the import
[hadoop@zhangyu lib]$ sqoop job --exec person_job1

[hadoop@zhangyu lib]$ hdfs dfs -ls
Found 6 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:40 EMP_COLUMN_WHERE
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:49 STU_COLUMN_QUERY
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:45 STU_COLUMN_WHERE
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 21:10 STU_option_file
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 20:24 sal
drwxr-xr-x   - hadoop supergroup          0 2018-01-15 03:08 sal_demo3

Question: when executing person_job1, you must type the database password. How can we avoid entering the password?

  • Configure sqoop-site.xml
 <property>
     <name>sqoop.metastore.client.record.password</name>
     <value>true</value>
     <description>If true, allow saved passwords in the metastore.
     </description>
</property> 
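Another approach, sketched here with an illustrative file name and job name: keep the password in a permission-restricted file and pass --password-file instead of --password, so the password never appears on the command line:

# Write the password to a file that only its owner can read
echo -n "123456" > /home/hadoop/.mysql.password
chmod 400 /home/hadoop/.mysql.password
# Reference the file when creating the job
sqoop job --create person_job2 -- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password-file file:///home/hadoop/.mysql.password \
--table sal_demo3 \
-m 1 \
--delete-target-dir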

Finally, let's look at a commonly used script:

#!/bin/bash

#Hive2Pg

IP=localhost
Database=default
Driver=jdbc:postgresql

pg_connection=${Driver}://${IP}/${Database}
pg_username=root
pg_password=123456

hive_db=default
hive_table=hive_table
hive_columns=id,name,age
 
 
# Name of the destination PG table for the export
export_table_name=pg_test
 
 
# Columns of the destination PG table
export_table_columns=id,name,age

 
# Temporary HDFS directory for the data to be exported to PG
sqoop_export_data_dir=/tmp/sqoop/export/${export_table_name}
 
hdfs dfs -rm -r ${sqoop_export_data_dir};
# Generate the temporary data for export to PG
hive -v -e "
use ${hive_db};
insert overwrite directory '${sqoop_export_data_dir}'
select
${hive_columns}
from ${hive_db}.${hive_table};";

# First delete existing data from the destination table
sqoop eval --connect ${pg_connection}  \
--username ${pg_username}  \
--password ${pg_password} \
--verbose --query "delete from ${export_table_name}";

 
# Then export the data
sqoop export --connect ${pg_connection} \
--username ${pg_username} \
--password ${pg_password} \
--export-dir ${sqoop_export_data_dir} \
--verbose --num-mappers 1 \
--table ${export_table_name} \
--columns ${export_table_columns} \
--input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n' \
--input-null-string '\\N' \
--input-null-non-string '\\N'
