Kettle与Hadoop(七)执行Sqoop作业

目录

一、Sqoop export

二、Sqoop import


一、Sqoop export

1. 建立一个作业,将HDFS文件导出到MySQL数据库。
(1)打开PDI,新建一个作业,如图1所示。

图1

 

(2)编辑'Sqoop export'作业项,如图2所示。

Kettle与Hadoop(七)执行Sqoop作业_第1张图片 图2

        说明:

  • CDH631是已经建好的Hadoop集群连接,参见“https://wxy0327.blog.csdn.net/article/details/106406702#%E4%BA%8C%E3%80%81%E8%BF%9E%E6%8E%A5Hadoop%E9%9B%86%E7%BE%A4”。
  • mysql_node3是已经建好的MySQL数据库连接,参见“https://wxy0327.blog.csdn.net/article/details/106406702#%E4%BA%94%E3%80%81%E5%90%8E%E7%BB%AD%EF%BC%88%E5%BB%BA%E7%AB%8BMySQL%E6%95%B0%E6%8D%AE%E5%BA%93%E8%BF%9E%E6%8E%A5%EF%BC%89”。
  • /user/root/sales_order_ext/part-m-00000是HDFS上一个CSV格式的文件,具有102行记录,如图3所示。
Kettle与Hadoop(七)执行Sqoop作业_第2张图片 图3
  • sales_order是MySQL中的目标表,结构与part-m-00000文件匹配,如图4所示。
Kettle与Hadoop(七)执行Sqoop作业_第3张图片 图4

2. 保存并执行作业,日志如下所示。

2020/06/08 09:28:40 - Spoon - Starting job...
2020/06/08 09:28:40 - sqoop_export - Start of job execution
2020/06/08 09:28:40 - sqoop_export - Starting entry [Sqoop export]
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Running Sqoop version: 1.4.7-cdh6.1.0
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Setting your password on the command-line is insecure. Consider using -P instead.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Preparing to use a MySQL streaming resultset.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Beginning code generation
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $HADOOP_MAPRED_HOME is not set
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Writing jar file: /tmp/sqoop-root/compile/7c09fc949771ba78a584cc07c1cd6b3f/sales_order.jar
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Beginning export of sales_order
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - SQOOP_HOME is unset. May not be able to find all job dependencies.
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Connecting to ResourceManager at manager/172.16.1.124:8032
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Disabling Erasure Coding for path: /user/root/.staging/job_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Total input files to process : 1
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Total input files to process : 1
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - number of splits:4
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Submitting tokens for job: job_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Executing with tokens: []
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Submitted application application_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - The url to track the job: http://manager:8088/proxy/application_1591323999364_0046/
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Running job: job_1591323999364_0046
2020/06/08 09:28:51 - Sqoop export - 2020/06/08 09:28:51 - Job job_1591323999364_0046 running in uber mode : false
2020/06/08 09:28:51 - Sqoop export - 2020/06/08 09:28:51 -  map 0% reduce 0%
2020/06/08 09:28:58 - Sqoop export - 2020/06/08 09:28:58 -  map 75% reduce 0%
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 -  map 100% reduce 0%
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Job job_1591323999364_0046 completed successfully
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Counters: 34
2020/06/08 09:28:59 - Sqoop export -     File System Counters
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of bytes read=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of bytes written=968088
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of read operations=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of large read operations=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of write operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes read=15868
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes written=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of read operations=16
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of large read operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of write operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes read erasure-coded=0
2020/06/08 09:28:59 - Sqoop export -     Job Counters 
2020/06/08 09:28:59 - Sqoop export -         Launched map tasks=4
2020/06/08 09:28:59 - Sqoop export -         Data-local map tasks=2
2020/06/08 09:28:59 - Sqoop export -         Rack-local map tasks=2
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all maps in occupied slots (ms)=18912
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all reduces in occupied slots (ms)=0
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all map tasks (ms)=18912
2020/06/08 09:28:59 - Sqoop export -         Total vcore-milliseconds taken by all map tasks=18912
2020/06/08 09:28:59 - Sqoop export -         Total megabyte-milliseconds taken by all map tasks=19365888
2020/06/08 09:28:59 - Sqoop export -     Map-Reduce Framework
2020/06/08 09:28:59 - Sqoop export -         Map input records=102
2020/06/08 09:28:59 - Sqoop export -         Map output records=102
2020/06/08 09:28:59 - Sqoop export -         Input split bytes=556
2020/06/08 09:28:59 - Sqoop export -         Spilled Records=0
2020/06/08 09:28:59 - Sqoop export -         Failed Shuffles=0
2020/06/08 09:28:59 - Sqoop export -         Merged Map outputs=0
2020/06/08 09:28:59 - Sqoop export -         GC time elapsed (ms)=351
2020/06/08 09:28:59 - Sqoop export -         CPU time spent (ms)=5660
2020/06/08 09:28:59 - Sqoop export -         Physical memory (bytes) snapshot=857292800
2020/06/08 09:28:59 - Sqoop export -         Virtual memory (bytes) snapshot=10344894464
2020/06/08 09:28:59 - Sqoop export -         Total committed heap usage (bytes)=739770368
2020/06/08 09:28:59 - Sqoop export -         Peak Map Physical memory (bytes)=228241408
2020/06/08 09:28:59 - Sqoop export -         Peak Map Virtual memory (bytes)=2586439680
2020/06/08 09:28:59 - Sqoop export -     File Input Format Counters 
2020/06/08 09:28:59 - Sqoop export -         Bytes Read=0
2020/06/08 09:28:59 - Sqoop export -     File Output Format Counters 
2020/06/08 09:28:59 - Sqoop export -         Bytes Written=0
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Transferred 15.4961 KB in 18.7657 seconds (845.5859 bytes/sec)
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Exported 102 records.
2020/06/08 09:28:59 - sqoop_export - Finished job entry [Sqoop export] (result=[true])
2020/06/08 09:28:59 - sqoop_export - Job execution finished
2020/06/08 09:28:59 - Spoon - Job has ended.

        从日志中可以看到,作业已经成功执行。在作业所在目录,会生成一个名为sales_order.java的文件。

3. 检查MySQL表,结果如图5所示。

Kettle与Hadoop(七)执行Sqoop作业_第4张图片 图5

        从图5可以看到,通过Sqoop export作业项将HDFS文件内容导出到MySQL表中。
 

二、Sqoop import

1. 建立一个作业,将MySQL表数据导入到HDFS。
(1)打开PDI,新建一个作业,如图6所示。

图6

(2)编辑'Sqoop import'作业项,如图7所示。

Kettle与Hadoop(七)执行Sqoop作业_第5张图片 图7

        说明:

  • /user/root/sales_order是HDFS的目标目录,该目录应该尚不存在。

2. 保存并执行作业,日志如下所示。

2020/06/08 10:00:52 - Spoon - Starting job...
2020/06/08 10:00:52 - sqoop_import - Start of job execution
2020/06/08 10:00:52 - sqoop_import - Starting entry [Sqoop import]
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Running Sqoop version: 1.4.7-cdh6.1.0
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Setting your password on the command-line is insecure. Consider using -P instead.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Preparing to use a MySQL streaming resultset.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Beginning code generation
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $HADOOP_MAPRED_HOME is not set
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Writing jar file: /tmp/sqoop-root/compile/7c09fc949771ba78a584cc07c1cd6b3f/sales_order.jar
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Setting zero DATETIME behavior to convertToNull (mysql)
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Beginning import of sales_order
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - SQOOP_HOME is unset. May not be able to find all job dependencies.
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Connecting to ResourceManager at manager/172.16.1.124:8032
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Disabling Erasure Coding for path: /user/root/.staging/job_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Using read commited transaction isolation
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - BoundingValsQuery: SELECT MIN(`order_number`), MAX(`order_number`) FROM `sales_order`
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Split size: 25; Num splits: 4 from: 101 to: 202
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - number of splits:4
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Submitting tokens for job: job_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Executing with tokens: []
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Submitted application application_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - The url to track the job: http://manager:8088/proxy/application_1591323999364_0048/
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Running job: job_1591323999364_0048
2020/06/08 10:01:02 - Sqoop import - 2020/06/08 10:01:02 - Job job_1591323999364_0048 running in uber mode : false
2020/06/08 10:01:02 - Sqoop import - 2020/06/08 10:01:02 -  map 0% reduce 0%
2020/06/08 10:01:09 - Sqoop import - 2020/06/08 10:01:09 -  map 75% reduce 0%
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 -  map 100% reduce 0%
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Job job_1591323999364_0048 completed successfully
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Counters: 33
2020/06/08 10:01:11 - Sqoop import -     File System Counters
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of bytes read=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of bytes written=969236
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of read operations=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of large read operations=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of write operations=0
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes read=481
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes written=6120
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of read operations=24
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of large read operations=0
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of write operations=8
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes read erasure-coded=0
2020/06/08 10:01:11 - Sqoop import -     Job Counters 
2020/06/08 10:01:11 - Sqoop import -         Launched map tasks=4
2020/06/08 10:01:11 - Sqoop import -         Other local map tasks=4
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all maps in occupied slots (ms)=19048
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all reduces in occupied slots (ms)=0
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all map tasks (ms)=19048
2020/06/08 10:01:11 - Sqoop import -         Total vcore-milliseconds taken by all map tasks=19048
2020/06/08 10:01:11 - Sqoop import -         Total megabyte-milliseconds taken by all map tasks=19505152
2020/06/08 10:01:11 - Sqoop import -     Map-Reduce Framework
2020/06/08 10:01:11 - Sqoop import -         Map input records=102
2020/06/08 10:01:11 - Sqoop import -         Map output records=102
2020/06/08 10:01:11 - Sqoop import -         Input split bytes=481
2020/06/08 10:01:11 - Sqoop import -         Spilled Records=0
2020/06/08 10:01:11 - Sqoop import -         Failed Shuffles=0
2020/06/08 10:01:11 - Sqoop import -         Merged Map outputs=0
2020/06/08 10:01:11 - Sqoop import -         GC time elapsed (ms)=500
2020/06/08 10:01:11 - Sqoop import -         CPU time spent (ms)=5680
2020/06/08 10:01:11 - Sqoop import -         Physical memory (bytes) snapshot=783245312
2020/06/08 10:01:11 - Sqoop import -         Virtual memory (bytes) snapshot=10357563392
2020/06/08 10:01:11 - Sqoop import -         Total committed heap usage (bytes)=659554304
2020/06/08 10:01:11 - Sqoop import -         Peak Map Physical memory (bytes)=202842112
2020/06/08 10:01:11 - Sqoop import -         Peak Map Virtual memory (bytes)=2591666176
2020/06/08 10:01:11 - Sqoop import -     File Input Format Counters 
2020/06/08 10:01:11 - Sqoop import -         Bytes Read=0
2020/06/08 10:01:11 - Sqoop import -     File Output Format Counters 
2020/06/08 10:01:11 - Sqoop import -         Bytes Written=6120
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Transferred 5.9766 KB in 18.733 seconds (326.697 bytes/sec)
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Retrieved 102 records.
2020/06/08 10:01:11 - sqoop_import - Finished job entry [Sqoop import] (result=[true])
2020/06/08 10:01:11 - sqoop_import - Job execution finished
2020/06/08 10:01:11 - Spoon - Job has ended.

        从日志中可以看到,作业已经成功执行。在作业所在目录,会生成一个名为sales_order.java的文件。

3. 检查HDFS目标目录,结果如图8所示。

Kettle与Hadoop(七)执行Sqoop作业_第6张图片 图8

        从图8可以看到,通过Sqoop import作业项将MySQL表中数据导入到HDFS中。

你可能感兴趣的:(Kettle与Hadoop(七)执行Sqoop作业)