Apache Sqoop is a tool designed for efficiently transferring data between Apache Hadoop and structured data stores such as relational databases. You can use Sqoop to import data from external structured stores into the Hadoop Distributed File System or related systems such as Hive and HBase. Conversely, Sqoop can extract data from Hadoop and export it to external structured stores such as relational databases and enterprise data warehouses.
Sqoop is built for bulk transfer of big data: it splits the data set and creates Hadoop tasks to process each chunk.
For Sqoop download and installation, refer to this address.
Under the hood Sqoop is just MapReduce, so some latency is unavoidable. My personal take: when the data volume is not huge or real-time requirements are relaxed, Sqoop is very convenient, because imports and exports of all kinds of data can be done through configuration alone. If you have studied Spark, you know Spark SQL is also very convenient: its external data sources make it easy to read and write many kinds of data. Which one to use is up to you; there is no single best tool, only the one that best fits your needs.
Official website
User guide
View the command help (sqoop help):
[hadoop@zhangyu lib]$ sqoop help
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
The output tells us to use sqoop help COMMAND to see the detailed help for a specific command:
[hadoop@zhangyu lib]$ sqoop help list-databases
[hadoop@zhangyu lib]$ sqoop list-databases \
> --connect jdbc:mysql://localhost:3306 \
> --username root \
> --password 123456
Result:
information_schema
basic01
mysql
performance_schema
sqoop
test
[hadoop@zhangyu lib]$ sqoop list-tables \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456
Result:
stu
(By default the data is imported into /user/<username>/<table name> under the current user's HDFS home directory.)
A small side note here: the difference between hdfs dfs -ls and hdfs dfs -ls / (try it yourself).
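A quick way to see the difference (a sketch, assuming the HDFS home directory /user/hadoop exists):
hdfs dfs -ls      # no path: lists the current user's home directory, /user/hadoop
hdfs dfs -ls /    # lists the HDFS root directory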
sqoop import --connect jdbc:mysql://localhost/database --username root --password 123456 --table example -m 1
Two errors may come up here. The first:
18/01/14 16:01:19 ERROR tool.ImportTool: Error during import: No primary key could be found for table stu. Please specify one with --split-by or perform a sequential import with '-m 1'.
Fix: specify a split column with --split-by, or set -m to 1. Why? Without a primary key or an explicit split column, Sqoop cannot divide the table into ranges for parallel map tasks, so it has to fall back to a single mapper (see the sketch after the jar fix below). The second error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more
To fix this we need the java-json.jar package (download link); copy java-json.jar into the …/sqoop/lib directory.
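For the first error (no primary key), an import with an explicit split column might look like this; this is only a sketch, and --split-by id assumes stu has a numeric id column:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop \
  --username root --password 123456 \
  --table stu \
  --split-by id \
  -m 2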
[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu
Make sure you read the generated log output carefully and understand what it is doing.
Check the files on HDFS:
[hadoop@zhangyu lib]$ hdfs dfs -ls /user/hadoop/stu
Found 4 items
-rw-r--r-- 1 hadoop supergroup 0 2018-01-14 17:07 /user/hadoop/stu/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 11 2018-01-14 17:07 /user/hadoop/stu/part-m-00000
-rw-r--r-- 1 hadoop supergroup 7 2018-01-14 17:07 /user/hadoop/stu/part-m-00001
-rw-r--r-- 1 hadoop supergroup 9 2018-01-14 17:07 /user/hadoop/stu/part-m-00002
[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/stu/"part*"
1,zhangsan
2,lisi
3,wangwu
[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu -m 1
[hdfs@hik003 /apps/sqoop]$ sqoop import \
> --connect jdbc:postgresql://localhost:5432/test \
> --username root \
> --password 123456 \
> --table test_info \
> --target-dir /test/ \
> --mapreduce-job-name FromPgSQL2HDFS \
> -m 1
Warning: /opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/05/17 20:38:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.1
19/05/17 20:38:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/17 20:38:15 INFO manager.SqlManager: Using default fetchSize of 1000
19/05/17 20:38:15 INFO tool.CodeGenTool: Beginning code generation
19/05/17 20:38:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "camera_govern_info" AS t LIMIT 1
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: Cannot resolve SQL type 1111
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR orm.ClassWriter: No Java type for SQL type 1111 for column geom
19/05/17 20:38:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.orm.ClassWriter.parseNullVal(ClassWriter.java:1389)
at org.apache.sqoop.orm.ClassWriter.parseColumn(ClassWriter.java:1414)
at org.apache.sqoop.orm.ClassWriter.myGenerateParser(ClassWriter.java:1540)
at org.apache.sqoop.orm.ClassWriter.generateParser(ClassWriter.java:1503)
at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1958)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1749)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
https://yq.aliyun.com/articles/578819
This is a column-type problem: the table has a PostgreSQL geometry column (used to store coordinates), which maps to SQL type 1111 (OTHER) and cannot be converted to a Java type. One fix is to change the column type in the database.
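If you cannot change the schema, another option is to cast the problematic column in a free-form query instead of using --table; a sketch only, where the column list and the geom::text cast are assumptions about the table:
sqoop import \
  --connect jdbc:postgresql://localhost:5432/test \
  --username root --password 123456 \
  --query "select id, name, geom::text as geom from test_info where \$CONDITIONS" \
  --target-dir /test/ \
  -m 1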
You may also hit an error when the target directory already exists on HDFS:
18/01/14 17:52:47 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.137.200:9000/user/hadoop/stu already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
The fix is --delete-target-dir, which deletes the target directory before the import runs.
You can also name the MapReduce job with --mapreduce-job-name:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --table stu \
> -m 1
The --target-dir /directory parameter specifies the HDFS directory to import into:
[hadoop@zhangyu lib]$ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password 123456 --table stu -m 1 --target-dir /sqoop/
Check the files on HDFS:
[hadoop@zhangyu lib]$ hdfs dfs -ls /sqoop
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-01-14 18:07 /sqoop/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 27 2018-01-14 18:07 /sqoop/part-m-00000
[hadoop@zhangyu lib]$ hdfs dfs -cat /sqoop/part-m-00000
1,zhangsan
2,lisi
3,wangwu
--fields-terminated-by sets the field delimiter of the output files:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1
Result on HDFS:
[hadoop@zhangyu lib]$ hdfs dfs -ls /user/hadoop/stu/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-01-14 19:47 /user/hadoop/stu/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 27 2018-01-14 19:47 /user/hadoop/stu/part-m-00000
[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/stu/part-m-00000
1 zhangsan
2 lisi
3 wangwu
(The fields are now separated by a tab.)
Parameters --null-string and --null-non-string replace NULL values during import (--null-string for string columns, --null-non-string for non-string columns).
Import the salary table first to see the default behaviour:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table sal \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1
Result:
[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/sal/part-m-00000
zhangsan 1000
lisi 2000
wangwu null
Add the --null-string parameter:
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table sal \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0
Check the result:
[hadoop@zhangyu lib]$ hdfs dfs -cat /user/hadoop/sal/part-m-00000
zhangsan 1000
lisi 2000
wangwu 0
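Note that --null-string only covers string columns; if the salary column is numeric, the matching flag is --null-non-string. A sketch against the same table:
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root --password 123456 \
  --table sal \
  --delete-target-dir \
  --fields-terminated-by '\t' \
  -m 1 \
  --null-string 0 \
  --null-non-string 0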
--columns imports only the listed columns:
[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name"
Result:
[hadoop@zhangyu ~]$ hdfs dfs -cat /user/hadoop/stu/part-m-00000
zhangsan
lisi
wangwu
--where adds a filter condition to the import:
[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-string 0 \
> --columns "name" \
> --target-dir STU_COLUMN_WHERE \
> --where 'id<3'
Result:
zhangsan
lisi
--query lets you import the result of a free-form SQL query. When --query is used you cannot also use --table or --columns, and the query must contain $CONDITIONS: Sqoop replaces this placeholder with the range predicate it uses to divide the work among the map tasks.
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root --password 123456 \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 \
--null-string 0 \
--target-dir STU_COLUMN_QUERY \
--query "select * from stu where id>1 and \$CONDITIONS"
(Alternatively, wrap the query in single quotes: --query 'select * from stu where id>1 and $CONDITIONS'; with single quotes the $ in $CONDITIONS does not need to be escaped.)
Result:
2 lisi
3 wangwu
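If you run a free-form --query with more than one mapper, Sqoop also requires --split-by so it knows how to partition the ranges. A minimal sketch, assuming stu has a numeric id column:
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root --password 123456 \
  --query "select * from stu where \$CONDITIONS" \
  --split-by id \
  --target-dir STU_QUERY_PARALLEL \
  --delete-target-dir \
  -m 2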
You can also put all the arguments in an options file and pass it to Sqoop with --options-file:
[hadoop@zhangyu data]$ vi sqoop-import-hdfs.txt
import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
root
--password
123456
--table
stu
--target-dir
STU_option_file
[hadoop@zhangyu data]$ sqoop --options-file /home/hadoop/data/sqoop-import-hdfs.txt
Result:
[hadoop@zhangyu data]$ hdfs dfs -cat STU_option_file/"part*"
1,zhangsan
2,lisi
3,wangwu
The help output describes the eval command as "Evaluate a SQL statement and display the results", i.e. it runs a SQL statement against the database and prints the result set:
[hadoop@zhangyu data]$ sqoop eval \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --query "select * from stu"
Warning: /opt/software/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/software/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/software/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/software/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/01/14 21:35:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/14 21:35:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 21:35:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id | name |
--------------------------------------
| 1 | zhangsan |
| 2 | lisi |
| 3 | wangwu |
--------------------------------------
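eval is not limited to SELECT; any statement can be executed, which is handy for quick checks or clean-ups. A sketch, inserting a hypothetical row:
sqoop eval \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root --password 123456 \
  --query "insert into stu values (4, 'zhaoliu')"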
Exporting data: export the sal data from HDFS back to MySQL. First check the data on HDFS:
[hadoop@zhangyu data]$ hdfs dfs -cat sal/part-m-00000
zhangsan 1000
lisi 2000
wangwu 0
Create the target table in MySQL:
mysql> create table sal_demo like sal;
[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/
Result:
mysql> select * from sal_demo;
+----------+--------+
| name | salary |
+----------+--------+
| zhangsan | 1000 |
| lisi | 2000 |
| wangwu | 0 |
+----------+--------+
3 rows in set (0.00 sec)
(If you run the export again, the rows are appended to the table.)
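If you want to update existing rows instead of appending duplicates, export also supports --update-key, and --update-mode allowinsert turns it into an upsert. A sketch, assuming name can serve as the key column:
sqoop export \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root --password 123456 \
  --table sal_demo \
  --input-fields-terminated-by '\t' \
  --export-dir /user/hadoop/sal/ \
  --update-key name \
  --update-mode allowinsert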
Exporting with useUnicode=true&characterEncoding=utf-8 in the JDBC URL (commonly added to avoid character-encoding problems with non-ASCII data):
sqoop export --connect "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table sal -m 1 --export-dir /user/hadoop/sal/
--columns exports only the listed columns:
[hadoop@zhangyu data]$ sqoop export \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password 123456 \
> --table sal_demo3 \
> --input-fields-terminated-by '\t' \
> --export-dir /user/hadoop/sal/ \
> --columns name
Result:
mysql> select * from sal_demo3
-> ;
+----------+--------+
| name | salary |
+----------+--------+
| zhangsan | NULL |
| lisi | NULL |
| wangwu | NULL |
+----------+--------+
3 rows in set (0.00 sec)
Importing from MySQL into Hive:
[hadoop@zhangyu ~]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0
--create-hive-table: create the target table; the job fails if it already exists;
--hive-database: the target Hive database;
--hive-import: import into Hive (without it the data only lands on HDFS);
--hive-overwrite: overwrite existing data in the Hive table;
--hive-table stu_import: the Hive table name; if omitted, the source table name is used.
This may fail with the following error:
18/01/15 01:29:28 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/01/15 01:29:28 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
... 12 more
Most of the advice online says to set personal environment variables, which did not help at all in my case. Copying a few jars from Hive's lib directory into Sqoop's lib directory solved the problem:
[hadoop@zhangyu lib]$ cp hive-common-1.1.0-cdh5.7.0.jar /opt/software/sqoop/lib/
[hadoop@zhangyu lib]$ cp hive-shims* /opt/software/sqoop/lib/
hive> show tables;
OK
stu_import
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select * from stu_import
> ;
OK
1 zhangsan
2 lisiw
3 wangwu
For Hive imports I do not recommend --create-hive-table; create the Hive table yourself beforehand.
If you let Sqoop create the table, check the column types it generates: they are often not what you want, so define the table structure first and then import the data (see the sketch below).
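For example, you might create the table yourself and then import into it; a sketch only, where the table name stu_pre and the column types are assumptions based on the data shown above:
hive -e "
use hive;
create table stu_pre (id int, name string)
row format delimited fields terminated by '\t';"

sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root --password 123456 \
  --table stu \
  --hive-import \
  --hive-database hive \
  --hive-table stu_pre \
  --hive-overwrite \
  --fields-terminated-by '\t' \
  --delete-target-dir \
  -m 1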
--hive-partition-key <partition-key>        Sets the partition key to use when importing to hive
--hive-partition-value <partition-value>    Sets the partition value to use when importing to hive
[hadoop@zhangyu lib]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root --password 123456 \
> --table stu \
> --create-hive-table \
> --hive-database hive \
> --hive-import \
> --hive-overwrite \
> --hive-table stu_import1 \
> --mapreduce-job-name FromMySQL2HDFS \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> -m 1 \
> --null-non-string 0 \
> --hive-partition-key dt \
> --hive-partition-value "2018-08-08"
hive> select * from stu_import1;
OK
1 zhangsan 2018-08-08
2 lisi 2018-08-08
3 wangwu 2018-08-08
Time taken: 0.121 seconds, Fetched: 3 row(s)
sqoop job turns a Sqoop statement into a saved job: it is not executed when the job is created. You can list, inspect, execute, and delete jobs at any time, which makes task scheduling much easier.
--create  Create a new job
--delete  Delete a saved job
--exec    Execute a saved job
--show    Show the parameters of a saved job
--list    List all saved jobs
sqoop job --create person_job1 -- import --connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password 123456 \
--table sal_demo3 \
-m 1 \
--delete-target-dir
[hadoop@zhangyu lib]$ sqoop job --list
Available jobs:
person_job1
[hadoop@zhangyu lib]$ sqoop job --exec person_job1
[hadoop@zhangyu lib]$ hdfs dfs -ls
Found 6 items
drwxr-xr-x - hadoop supergroup 0 2018-01-14 20:40 EMP_COLUMN_WHERE
drwxr-xr-x - hadoop supergroup 0 2018-01-14 20:49 STU_COLUMN_QUERY
drwxr-xr-x - hadoop supergroup 0 2018-01-14 20:45 STU_COLUMN_WHERE
drwxr-xr-x - hadoop supergroup 0 2018-01-14 21:10 STU_option_file
drwxr-xr-x - hadoop supergroup 0 2018-01-14 20:24 sal
drwxr-xr-x - hadoop supergroup 0 2018-01-15 03:08 sal_demo3
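The other sub-commands work the same way, for example to inspect or remove the saved job:
sqoop job --show person_job1     # print the saved parameters
sqoop job --delete person_job1   # remove the saved job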
Question: executing person_job prompts for the database password. How can we avoid typing it every time? Add the following property to sqoop-site.xml:
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
<description>If true, allow saved passwords in the metastore.
</description>
</property>
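With this property set, jobs created afterwards store the password in the metastore and --exec no longer prompts. Another option is --password-file, which reads the password from a protected file; a sketch, with hypothetical paths:
echo -n "123456" > /home/hadoop/.mysql.pwd
hdfs dfs -put /home/hadoop/.mysql.pwd /user/hadoop/.mysql.pwd
hdfs dfs -chmod 400 /user/hadoop/.mysql.pwd
sqoop job --create person_job2 -- import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root \
  --password-file /user/hadoop/.mysql.pwd \
  --table sal_demo3 \
  -m 1 \
  --delete-target-dir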
#!/bin/bash
#Hive2Pg
IP=localhost
Database=default
Driver=jdbc:postgresql
pg_connection=${Driver}://${IP}/${Database}
pg_username=root
pg_password=123456
hive_db=default
hive_table=hive_table
hive_columns=id,name,age
# PostgreSQL table to export into
export_table_name=pg_test
# PostgreSQL columns to export into
export_table_columns=id,name,age
# temporary HDFS directory holding the data to be exported to PostgreSQL
sqoop_export_data_dir=/tmp/sqoop/export/${export_table_name}
# remove the temporary directory if it already exists
hadoop dfs -rm -r ${sqoop_export_data_dir};
# write the Hive data into the temporary export directory
hive -v -e "
use ${hive_db};
insert overwrite directory '${sqoop_export_data_dir}'
select
${hive_columns}
from ${hive_db}.${hive_table};";
# delete the existing rows in the target table first
sqoop eval --connect ${pg_connection} \
--username ${pg_username} \
--password ${pg_password} \
--verbose --query "delete from ${export_table_name}";
# then export the data
sqoop export --connect ${pg_connection} \
--username ${pg_username} \
--password ${pg_password} \
--export-dir ${sqoop_export_data_dir} \
--verbose --num-mappers 1 \
--table ${export_table_name} \
--columns ${export_table_columns} \
--input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n' \
--input-null-string '\\N' \
--input-null-non-string '\\N'
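A possible way to run the script above (the file name, paths, and schedule are assumptions):
chmod +x hive2pg_export.sh
./hive2pg_export.sh
# or schedule it, e.g. daily at 02:00 via cron:
# 0 2 * * * /home/hadoop/scripts/hive2pg_export.sh >> /home/hadoop/logs/hive2pg.log 2>&1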