Sqoop: importing from MySQL into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/chen \
--username root \
--password 123 \
--table test

Error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)

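Solution:

Download java-json.jar and put it on Sqoop's classpath, typically by copying it into the lib directory of the Sqoop installation.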
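
Sqoop's launcher script adds the jars under its lib directory to the classpath, so copying the downloaded jar there is usually enough to clear the NoClassDefFoundError. A minimal sketch, assuming the jar has already been downloaded and that SQOOP_HOME points at the installation; install_sqoop_jar is a made-up helper name, not part of Sqoop:

```shell
# install_sqoop_jar JAR [SQOOP_HOME]
# Hypothetical helper: copies a jar into the lib directory of a Sqoop installation.
install_sqoop_jar() {
  jar="$1"
  sqoop_home="${2:-${SQOOP_HOME:-}}"

  # Refuse to continue if the jar has not been downloaded yet.
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # The lib directory must already exist; this sketch does not create it.
  if [ -z "$sqoop_home" ] || [ ! -d "$sqoop_home/lib" ]; then
    echo "no lib directory under SQOOP_HOME: '$sqoop_home'" >&2
    return 1
  fi

  cp "$jar" "$sqoop_home/lib/" &&
    echo "installed $(basename "$jar") into $sqoop_home/lib"
}

# Example (paths are assumptions):
# install_sqoop_jar ./java-json.jar "$SQOOP_HOME"
```

After the copy, rerunning the same sqoop import command should get past SqoopJsonUtil.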

Location of the imported data: hdfs dfs -ls /user/hadoop
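
When no --target-dir is given, Sqoop writes a table import into a directory named after the table under the importing user's HDFS home directory, and a map-only job names its output files part-m-00000, part-m-00001, and so on. A small sketch for locating that output; import_dir is a hypothetical helper, and the hdfs commands are left as comments because they need a running cluster:

```shell
# import_dir USER TABLE
# Hypothetical helper: prints the default HDFS directory of a table import.
import_dir() {
  printf '/user/%s/%s\n' "$1" "$2"
}

# For the import above (HDFS user hadoop, table test):
import_dir hadoop test

# On a machine with the Hadoop client configured:
# hdfs dfs -ls "$(import_dir hadoop test)"
# hdfs dfs -cat "$(import_dir hadoop test)"/part-m-*
```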

Import log:

18/10/30 09:46:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/10/30 09:46:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/10/30 09:46:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/10/30 09:46:59 INFO tool.CodeGenTool: Beginning code generation
18/10/30 09:46:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM test AS t LIMIT 1
18/10/30 09:46:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM test AS t LIMIT 1
18/10/30 09:46:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop
Note: /tmp/sqoop-hadoop/compile/c09c40ce6aefc16447052c26316ab4af/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/10/30 09:47:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/c09c40ce6aefc16447052c26316ab4af/test.jar
18/10/30 09:47:01 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/10/30 09:47:01 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/10/30 09:47:01 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/10/30 09:47:01 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/10/30 09:47:01 INFO mapreduce.ImportJobBase: Beginning import of test
18/10/30 09:47:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/30 09:47:02 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/10/30 09:47:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/10/30 09:47:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/30 09:47:04 INFO db.DBInputFormat: Using read commited transaction isolation
18/10/30 09:47:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(user), MAX(user) FROM test
18/10/30 09:47:04 WARN db.TextSplitter: Generating splits for a textual index column.
18/10/30 09:47:04 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
18/10/30 09:47:04 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
18/10/30 09:47:04 INFO mapreduce.JobSubmitter: number of splits:4
18/10/30 09:47:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540200454240_0014
18/10/30 09:47:04 INFO impl.YarnClientImpl: Submitted application application_1540200454240_0014
18/10/30 09:47:04 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1540200454240_0014/
18/10/30 09:47:04 INFO mapreduce.Job: Running job: job_1540200454240_0014
18/10/30 09:47:12 INFO mapreduce.Job: Job job_1540200454240_0014 running in uber mode : false
18/10/30 09:47:12 INFO mapreduce.Job: map 0% reduce 0%
18/10/30 09:47:22 INFO mapreduce.Job: map 25% reduce 0%
18/10/30 09:47:24 INFO mapreduce.Job: map 50% reduce 0%
18/10/30 09:47:25 INFO mapreduce.Job: map 75% reduce 0%
18/10/30 09:47:26 INFO mapreduce.Job: map 100% reduce 0%
18/10/30 09:47:26 INFO mapreduce.Job: Job job_1540200454240_0014 completed successfully
18/10/30 09:47:26 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=550136
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=493
        HDFS: Number of bytes written=51
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=4
        Other local map tasks=4
        Total time spent by all maps in occupied slots (ms)=35952
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=35952
        Total vcore-seconds taken by all map tasks=35952
        Total megabyte-seconds taken by all map tasks=36814848
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=493
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=184
        CPU time spent (ms)=3950
        Physical memory (bytes) snapshot=703848448
        Virtual memory (bytes) snapshot=6232666112
        Total committed heap usage (bytes)=457179136
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=51
18/10/30 09:47:26 INFO mapreduce.ImportJobBase: Transferred 51 bytes in 23.7328 seconds (2.1489 bytes/sec)
18/10/30 09:47:26 INFO mapreduce.ImportJobBase: Retrieved 4 records
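
The log warns that passing the password on the command line is insecure and suggests -P, which prompts for it. For unattended runs Sqoop also accepts --password-file. Sqoop uses the entire file content as the password, so a trailing newline would become part of it. A sketch of preparing such a file; write_password_file and the file path are assumptions made for illustration:

```shell
# write_password_file FILE
# Hypothetical helper: reads one line from standard input and stores it
# without a trailing newline, readable by the file owner only.
write_password_file() {
  (
    umask 077                          # new file gets no group/other permissions
    rm -f "$1"                         # a previous read-only copy would block the write
    IFS= read -r pw || [ -n "$pw" ]    # accept input with or without a final newline
    printf '%s' "$pw" > "$1"           # printf '%s' adds no newline
    chmod 400 "$1"
  )
}

# Example (path is an assumption):
# write_password_file "$HOME/.sqoop-mysql.pw"      # then type the password and press Enter
# sqoop import \
#   --connect jdbc:mysql://localhost:3306/chen \
#   --username root \
#   --password-file "file://$HOME/.sqoop-mysql.pw" \
#   --table test
```

The file:// prefix assumes the file sits on the local filesystem; without a scheme the path is likely to be resolved against the cluster's default filesystem, so check which one applies in your setup.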

sqoop import \
--connect jdbc:mysql://localhost:3306/chen \
--username root \
--password 123 \
--table test \
--mapreduce-job-name FromMySQLToHDFS \
--delete-target-dir \
-m 1
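
With -m 1 there is a single map task, so Sqoop does not need a split column. The first run split on the textual column user, and its log strongly encourages an integral split column instead; --split-by selects one. A sketch of a parallel variant, assuming the table test had an integer column named id (hypothetical, since the schema is not shown here). The SQOOP variable is also only an illustration device: setting SQOOP=echo prints the command instead of running it.

```shell
# import_test_parallel: hypothetical wrapper around the import command.
# "id" is an assumed integer column; replace it with a real one from the table.
import_test_parallel() {
  "${SQOOP:-sqoop}" import \
    --connect jdbc:mysql://localhost:3306/chen \
    --username root \
    --password 123 \
    --table test \
    --mapreduce-job-name FromMySQLToHDFS \
    --delete-target-dir \
    --split-by id \
    -m 4
}

# Dry run, prints the arguments without contacting MySQL or the cluster:
# SQOOP=echo import_test_parallel
```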

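Common options:

--mapreduce-job-name   sets the name of the MapReduce job
-m   number of map tasks; the default is 4. Use the smallest value whose run time you can tolerate, and do not set it too high
--columns   columns to import
--target-dir   HDFS target directory
--null-non-string '0'   value written for NULL in non-string columns
--null-string ''   value written for NULL in string columns
--where 'SAL>2000'   WHERE condition that filters the imported rows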
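
Options that stay the same from run to run can also be kept in a file and passed with --options-file, where each option and each value sits on its own line and lines starting with # are comments. A sketch that generates such a file from the values used in this post; write_options_file and the file name import-common.txt are assumptions:

```shell
# write_options_file FILE
# Hypothetical helper: writes a Sqoop options file with the shared settings.
write_options_file() {
  cat > "$1" <<'EOF'
# Tool to invoke
import

# Connection settings taken from the examples in this post
--connect
jdbc:mysql://localhost:3306/chen
--username
root

# Flags shared by every run
--mapreduce-job-name
FromMySQLToHDFS
--delete-target-dir
-m
1
EOF
}

# Example (file name is an assumption):
# write_options_file import-common.txt
# sqoop --options-file import-common.txt --table test -P
```

Only the options that differ per run, such as --table or --where, then remain on the command line.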

Examples:

sqoop import \
--connect jdbc:mysql://localhost:3306/chen \
--username root \
--password 123 \
--table chen \
--mapreduce-job-name FromMySQLToHDFS \
--delete-target-dir \
-m 1 \
--null-non-string '0' \
--null-string '0'

sqoop import \
--connect jdbc:mysql://localhost:3306/chen \
--username root \
--password 123 \
--table chen \
--mapreduce-job-name FromMySQLToHDFS \
--delete-target-dir \
-m 1 \
--null-non-string '0' \
--null-string '0' \
--where "user='chen'"

sqoop import \
--connect jdbc:mysql://localhost:3306/chen \
--username root \
--password 123 \
--mapreduce-job-name FromMySQLToHDFS \
-m 1 \
--delete-target-dir \
--target-dir EMP_COLUMN_QUERY \
--fields-terminated-by '\t' \
--null-non-string '0' \
--null-string '' \
-e 'select * from chen where host is not null and $CONDITIONS'

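Or, when the query is wrapped in double quotes, escape the dollar sign so the shell does not expand it: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"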
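
Sqoop replaces the literal token $CONDITIONS in a free-form query with the range condition of each map task, so the token has to reach Sqoop unexpanded. Such a query also needs --target-dir, and --split-by as well when more than one mapper is used. The sketch below shows that both quoting styles yield the same string; has_conditions_token is a hypothetical pre-flight check, not a Sqoop feature:

```shell
# Single quotes: the shell leaves $CONDITIONS untouched.
q_single='select * from chen where host is not null and $CONDITIONS'
# Double quotes: the dollar sign must be escaped to stay literal.
q_double="select * from chen where host is not null and \$CONDITIONS"

# has_conditions_token QUERY
# Succeeds only if the literal token $CONDITIONS survived shell quoting.
has_conditions_token() {
  case "$1" in
    *'$CONDITIONS'*) return 0 ;;
    *) return 1 ;;
  esac
}

for q in "$q_single" "$q_double"; do
  if has_conditions_token "$q"; then
    echo "ok: $q"
  else
    echo "missing \$CONDITIONS: $q" >&2
  fi
done
```

An unescaped $CONDITIONS inside double quotes is expanded by the shell, usually to an empty string, and the query then reaches Sqoop without the token.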

sqoop import \
--connect jdbc:mysql://localhost:3306/chen \
--username root \
--password 123 \
--table d5_emp \
--delete-target-dir \
--fields-terminated-by '\t' \
--target-dir EMP_COLUMN_columns \
--columns 'EMPNO,ENAME,JOB,SAL,COMM' \
-m 1
