sqoop2 shell 数据导入导出


title: sqoop2 shell 使用
tags: 新建,模板,小书匠
grammar_cjkRuby: true


从MySQL数据库中导入数据到HDFS中

显示 可用创建link的connector

sqoop:000> show connector
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+

创建mysqllink

sqoop:000> update link -n mysqllink
Updating link with name mysqllink
Please update link:
Name: mysqllink

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master:3306/forsqoop?useSSL=false
Username: root
Password: ******
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose: `
link was successfully updated with status OK

创建 hdfslink

sqoop:000> update link -n hdfslink
Updating link with name hdfslink
Please update link:
Name: hdfslink

HDFS cluster

URI: hdfs://master:9000
Conf directory: /usr/hadoop-2.6.4/etc/hadoop/
Additional configs:: 
There are currently 0 values in the map:
entry# 
link was successfully updated with status OK

显示创建成功的link

sqoop:000> show link
+-----------+------------------------+---------+
|   Name    |     Connector Name     | Enabled |
+-----------+------------------------+---------+
| mysqllink | generic-jdbc-connector | true    |
| hdfslink  | hdfs-connector         | true    |
+-----------+------------------------+---------+

更新 job

sqoop:000> update job -n frommysqljob
Updating job with name frommysqljob
Please update job:
Name: frommysqljob

Database source

Schema name: forsqoop
Table name: sds
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: SD_ID
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec: 
Output directory: /user/root/sqoop/frommysql
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
Job was successfully updated with status OK

开启任务

sqoop:000> start job -n frommysqljob
Submission details
Job Name: frommysqljob
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-06-25 14:43:47 CST
Lastly updated by: root
External ID: job_1498352893725_0013
        http://master:8088/proxy/application_1498352893725_0013/
2017-06-25 14:43:47 CST: BOOTING  - Progress is not available

查看执行任务的结果

enter description here

易出现问题的地方

1 mysql 没有打开远程链接,导致can't get a connector 错误的发生
进入mysql开启远程访问
@'192.168.137.121'可以替换为@‘%’就可任意ip访问,当然我们也可以直接用 UPDATE 更新 root 用户 Host, 但不推荐,123456是密码 SQL如下:

GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.137.121' IDENTIFIED BY '123456' WITH GRANT OPTION;
flush privileges;

2 配置 generic-jdbc-connector link时 Identifier enclose: `参数要设置为`或者是空格

3 mysql导入hdfs时表结构如果没有主键需要显示指定Partition column: 参数为表中的一个字段分区,否则会因为没有主键导致导入报错。

4 MySQL中的Connection String:参数配置路径不能使用localhost因为你不知道yarn会分配任务给那一台机器执行此操作,使用master从机后都配过master的IP地址,这样就不用担心从机获取不到master主机上的数据库数据了。

从HDFS导出数据到MySQL数据库

创建 tomysqllink

sqoop:000> create link -n generic-jdbc-connector
Invalid command invocation: Unknown option encountered: -n
sqoop:000> create link  generic-jdbc-connector
Invalid command invocation: Missing required option: c
sqoop:000> create link -c  generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: tomysqllink

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master:3306/forsqoop?useSSL=false
Username: root
Password: ******
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose: `
New link was successfully created with validation status OK and name tomysqllink

创建fromhdfslink link

sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: fromhdfslink

HDFS cluster

URI: hdfs://master:9000
Conf directory: /usr/hadoop-2.6.4/etc/hadoop/
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name fromhdfslink

展示创建成功的link

sqoop:000> show link
+--------------+------------------------+---------+
|     Name     |     Connector Name     | Enabled |
+--------------+------------------------+---------+
| mysqllink    | generic-jdbc-connector | true    |
| tomysqllink  | generic-jdbc-connector | true    |
| hdfslink     | hdfs-connector         | true    |
| fromhdfslink | hdfs-connector         | true    |
+--------------+------------------------+---------+

创建 tomysqljob job

sqoop:000> create job --f fromhdfslink --t tomysqllink
Creating job for links with from name fromhdfslink and to name tomysqllink
Please fill following values to create new job object
Name: tomysqljob

Input configuration

Input directory: /user/root/sqoop/frommysql
Override null value: N
Null value: ture

Incremental import

Incremental type: 
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date: 

Database target

Schema name: forsqoop
Table name: sds
Column names: 
There are currently 0 values in the list:
element# 
Staging table: 
Clear stage table: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name tomysqljob

展示创建成功的job

sqoop:000> show job
+----+--------------+------------------------------------+--------------------------------------+---------+
| Id |     Name     |           From Connector           |             To Connector             | Enabled |
+----+--------------+------------------------------------+--------------------------------------+---------+
| 1  | frommysqljob | mysqllink (generic-jdbc-connector) | hdfslink (hdfs-connector)            | true    |
| 2  | tomysqljob   | fromhdfslink (hdfs-connector)      | tomysqllink (generic-jdbc-connector) | true    |
+----+--------------+------------------------------------+--------------------------------------+---------+

执行导入MySQL数据库表的job

sqoop:000> start job -n tomysqljob
Submission details
Job Name: tomysqljob
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-06-25 15:03:10 CST
Lastly updated by: root
External ID: job_1498352893725_0014
        http://master:8088/proxy/application_1498352893725_0014/
2017-06-25 15:03:10 CST: BOOTING  - Progress is not available

查看job的执行情况

enter description here

相关链接

Command Line Shell 命令

Connector-GenericJDBC 配置
Connector-HDFS 配置
其中的 FROM Job ConfigurationTO Job Configuration 分别会在创建job的时候用到里面的参数

自定义 Connector Development 默认支持导入导出CSV

你可能感兴趣的:(sqoop2 shell 数据导入导出)