Sqoop导入mysql数据到HDFS

Sqoop导入mysql数据到HDFS

打算把mysql中爬虫爬到的数据导入到HDFS中供后续学习kylin以及研究使用,以前导数据经常会自己写一段java脚本去导入,这次打算用sqoop试一试效果

首先看了下mysql的数据量,不到900万条数据,不多不少

Sqoop导入mysql数据到HDFS_第1张图片

从官网下载好sqoop进行安装

http://sqoop.apache.org/docs/1.99.7/admin/Installation.html
中间也经历了一点波折,一个是hadoop的conf文件目录要配置正确
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/apps/hadoop-2.7.7/etc/hadoop
将mysql connector jar包丢到/apps/sqoop-1.99.7-bin-hadoop200/server/lib目录

启动sqoop

root@hadoop:/apps/sqoop-1.99.7-bin-hadoop200/bin# ./sqoop2-server start
Setting conf dir: /apps/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /apps/sqoop-1.99.7-bin-hadoop200
Starting the Sqoop2 server...
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
15   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Sqoop2 server started.

使用sqoopshell

http://sqoop.apache.org/docs/1.99.7/user/Sqoop5MinutesDemo.html

root@hadoop:/apps/sqoop-1.99.7-bin-hadoop200/bin# ./sqoop2-shell 
Setting conf dir: /apps/sqoop-1.99.7-bin-hadoop200/bin/../conf
Sqoop home directory: /apps/sqoop-1.99.7-bin-hadoop200
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> set server --host hadoop --port 12000 --webapp sqoop
Server is set successfully

创建source link

sqoop:000> create link -connector generic-jdbc-connector
1    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql-source

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://10.0.0.20:3306/jd
Username: root
Password: ***********
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry# 

SQL Dialect

Identifier enclose:

创建sink link

sqoop:000> create link -connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-sink

HDFS cluster

URI: hdfs://hadoop:9000
Conf directory: /apps/hadoop-2.7.7/etc/hadoop
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name hdfs-sink

创建job

sqoop:000> create job -f "mysql-source" -t "hdfs-sink"
Creating job for links with from name mysql-source and to name hdfs-sink
Please fill following values to create new job object
Name: jd_trans

Database source

Schema name: jd
Table name: items
SQL statement: select * from items//此处有错误,必须加条件
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec: 
Output directory: /sqoop/jd/items
Append mode: 

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 

 最后报错了,后来发现是select语句必须带条件,于是干脆删了语句

执行任务

开始导数据,cpu一下满了

sqoop:000> start job -name jd_trans
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Sat Aug 11 05:25:46 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sat Aug 11 05:25:48 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2018-08-11 05:25:53,723 [LocalJobRunner Map Task Executor #0] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Starting progress service
2018-08-11 05:25:53,726 [LocalJobRunner Map Task Executor #0] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Running extractor class org.apache.sqoop.connector.jdbc.GenericJdbcExtractor
Sat Aug 11 05:25:53 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2018-08-11 05:25:53,766 [LocalJobRunner Map Task Executor #0] INFO  org.apache.sqoop.connector.jdbc.GenericJdbcExtractor  - Creating PreparedStatement with query: SELECT * FROM  jd . items  WHERE ? <=  id  AND  id  < ?

你可能感兴趣的:(大数据)