Installation
Import: a MySQL pitfall
We import the DBS table from the hive database in MySQL:
➜ sqoop git:(master) ✗ sqoop import --connect jdbc:mysql://localhost:3306/hive --table DBS --username root -password root
Warning: /Users/chenxiaokang/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/chenxiaokang/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/08/07 10:52:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
18/08/07 10:52:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/07 10:52:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/07 10:52:24 INFO tool.CodeGenTool: Beginning code generation
18/08/07 10:52:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `DBS` AS t LIMIT 1
18/08/07 10:52:25 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3901d134 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3901d134 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
This is caused by a bug in the old MySQL JDBC driver. Replacing the connector JAR mysql-connector-java-5.1.13-bin.jar in the lib directory with mysql-connector-java-5.1.32.jar fixes it.
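Before rerunning the import, it can help to confirm which connector JARs are actually present, since an old JAR left next to the new one may still be picked up. The helper below is a minimal sketch: list_mysql_connectors is a hypothetical name, and SQOOP_HOME is assumed to point at the Sqoop installation.

```shell
# List the MySQL connector JARs found in a directory.
# list_mysql_connectors is a hypothetical helper, not part of Sqoop.
list_mysql_connectors() {
  local lib_dir=$1
  local jar
  for jar in "$lib_dir"/mysql-connector-java-*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    basename "$jar"
  done
}

# Usage, assuming SQOOP_HOME is set for your installation:
# list_mysql_connectors "$SQOOP_HOME/lib"
```

If the 5.1.13 JAR still shows up in the output, move it out of the directory so that only the newer driver remains.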
➜ lib git:(master) ✗ sqoop import --connect jdbc:mysql://localhost:3306/hive --table DBS --username root -password root
Warning: /Users/chenxiaokang/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/chenxiaokang/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/08/07 11:01:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
18/08/07 11:01:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/07 11:01:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/07 11:01:47 INFO tool.CodeGenTool: Beginning code generation
18/08/07 11:01:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `DBS` AS t LIMIT 1
18/08/07 11:01:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `DBS` AS t LIMIT 1
18/08/07 11:01:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/chenxiaokang/hadoop-2.7.6
Note: /tmp/sqoop-chenxiaokang/compile/3ecfbbea71dfb1dd1314eba358b9a7d7/DBS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/08/07 11:01:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-chenxiaokang/compile/3ecfbbea71dfb1dd1314eba358b9a7d7/DBS.jar
18/08/07 11:01:52 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/08/07 11:01:52 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/08/07 11:01:52 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/08/07 11:01:52 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/08/07 11:01:52 INFO mapreduce.ImportJobBase: Beginning import of DBS
18/08/07 11:01:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/07 11:01:53 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/08/07 11:01:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/08/07 11:01:56 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
18/08/07 11:02:02 INFO db.DBInputFormat: Using read commited transaction isolation
18/08/07 11:02:02 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`DB_ID`), MAX(`DB_ID`) FROM `DBS`
18/08/07 11:02:02 INFO mapreduce.JobSubmitter: number of splits:4
18/08/07 11:02:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533537460397_0001
18/08/07 11:02:04 INFO impl.YarnClientImpl: Submitted application application_1533537460397_0001
18/08/07 11:02:05 INFO mapreduce.Job: The url to track the job: http://172.20.10.3:8088/proxy/application_1533537460397_0001/
18/08/07 11:02:05 INFO mapreduce.Job: Running job: job_1533537460397_0001
18/08/07 11:02:23 INFO mapreduce.Job: Job job_1533537460397_0001 running in uber mode : false
18/08/07 11:02:23 INFO mapreduce.Job: map 0% reduce 0%
18/08/07 11:02:38 INFO mapreduce.Job: map 50% reduce 0%
18/08/07 11:02:39 INFO mapreduce.Job: map 100% reduce 0%
18/08/07 11:02:40 INFO mapreduce.Job: Job job_1533537460397_0001 completed successfully
18/08/07 11:02:40 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=565736
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=417
HDFS: Number of bytes written=158
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Killed map tasks=1
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=50716
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=50716
Total vcore-milliseconds taken by all map tasks=50716
Total megabyte-milliseconds taken by all map tasks=51933184
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=417
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=454
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=440926208
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=158
18/08/07 11:02:40 INFO mapreduce.ImportJobBase: Transferred 158 bytes in 44.4693 seconds (3.553 bytes/sec)
18/08/07 11:02:40 INFO mapreduce.ImportJobBase: Retrieved 2 records.
The data imported from MySQL is now in HDFS:
0: jdbc:hive2://localhost:10000> dfs -ls /user/hdfs;
+----------------------------------------------------------------------------+--+
| DFS Output |
+----------------------------------------------------------------------------+--+
| Found 1 items |
| drwxr-xr-x - hdfs supergroup 0 2018-08-07 11:02 /user/hdfs/DBS |
+----------------------------------------------------------------------------+--+
2 rows selected (0.01 seconds)
0: jdbc:hive2://localhost:10000> dfs -cat /user/hdfs/DBS/part-m-00000;
+----------------------------------------------------------------------------------------+--+
| DFS Output |
+----------------------------------------------------------------------------------------+--+
| 1,Default Hive database,hdfs://localhost:9000/user/hive/warehouse,default,public,ROLE |
+----------------------------------------------------------------------------------------+--+
1 row selected (0.024 seconds)
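Import process
Sqoop splits the table into ranges on the split column, and each map task reads one range with its own query, for example:
SELECT col1,col2,... FROM table WHERE id >= 0 AND id < 50000;
SELECT col1,col2,... FROM table WHERE id >= 50000 AND id < 100000;
To import into Hive, first create the Hive table:
sqoop create-hive-table --connect jdbc:mysql://master:3306/hive --table DBS --fields-terminated-by ',' --username [username] --password [password]
Then LOAD the data. --fields-terminated-by ',' specifies the column delimiter of the DBS table in Hive.
The import can also write to Hive directly with --hive-import:
sqoop import --connect jdbc:mysql://master:3306/hive --table DBS --username [username] --password [password] -m [num] --hive-import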
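The range queries above follow from the MIN/MAX bounds that Sqoop fetches first (the BoundingValsQuery line in the import log) and the number of map tasks given with -m. The sketch below only illustrates that idea for an integer split column. It is a simplification, not Sqoop's actual splitter code, and split_ranges is a hypothetical name.

```shell
# Print one WHERE clause per map task for an integer split column.
# Simplified illustration of range splitting; not Sqoop's real algorithm.
split_ranges() {
  local col=$1 min=$2 max=$3 n=$4
  # Range width: ceil((max - min + 1) / n) using integer arithmetic.
  local size=$(( ($max - $min + $n) / $n ))
  local lo=$min hi
  while [ "$lo" -le "$max" ]; do
    hi=$(( $lo + $size ))
    if [ "$hi" -gt "$max" ]; then
      echo "WHERE $col >= $lo AND $col <= $max"   # last range includes max
    else
      echo "WHERE $col >= $lo AND $col < $hi"
    fi
    lo=$hi
  done
}

split_ranges id 0 99999 2
# WHERE id >= 0 AND id < 50000
# WHERE id >= 50000 AND id <= 99999
```

When the range is narrower than the number of map tasks, the sketch simply emits fewer ranges. In Sqoop the split column defaults to the primary key and can be chosen with --split-by.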
Export process
Example: export the sc table to MySQL
0: jdbc:hive2://localhost:10000> DESC sc;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| id | bigint | |
| courseid | bigint | |
| account | string | |
+-----------+------------+----------+--+
3 rows selected (0.185 seconds)
0: jdbc:hive2://localhost:10000>
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table sc(id bigint, courseid bigint, account varchar(32));
Query OK, 0 rows affected (0.03 sec)
➜ bin git:(master) ✗ sqoop export --connect jdbc:mysql://localhost:3306/test --table sc --export-dir /user/hive/warehouse/sc --username root --password root -m 1 --fields-terminated-by ',';
Warning: /Users/chenxiaokang/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/chenxiaokang/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/08/07 12:23:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
18/08/07 12:23:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/07 12:23:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/08/07 12:23:32 INFO tool.CodeGenTool: Beginning code generation
18/08/07 12:23:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sc` AS t LIMIT 1
18/08/07 12:23:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sc` AS t LIMIT 1
18/08/07 12:23:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/chenxiaokang/hadoop-2.7.6
Note: /tmp/sqoop-chenxiaokang/compile/fb350fa941a369d077323a7b646b5380/sc.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/08/07 12:23:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-chenxiaokang/compile/fb350fa941a369d077323a7b646b5380/sc.jar
18/08/07 12:23:35 INFO mapreduce.ExportJobBase: Beginning export of sc
18/08/07 12:23:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/07 12:23:35 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/08/07 12:23:37 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/08/07 12:23:37 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/08/07 12:23:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/08/07 12:23:37 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
18/08/07 12:23:42 INFO input.FileInputFormat: Total input paths to process : 1
18/08/07 12:23:42 INFO input.FileInputFormat: Total input paths to process : 1
18/08/07 12:23:43 INFO mapreduce.JobSubmitter: number of splits:1
18/08/07 12:23:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
18/08/07 12:23:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533537460397_0002
18/08/07 12:23:44 INFO impl.YarnClientImpl: Submitted application application_1533537460397_0002
18/08/07 12:23:44 INFO mapreduce.Job: The url to track the job: http://172.20.10.3:8088/proxy/application_1533537460397_0002/
18/08/07 12:23:44 INFO mapreduce.Job: Running job: job_1533537460397_0002
18/08/07 12:23:57 INFO mapreduce.Job: Job job_1533537460397_0002 running in uber mode : false
18/08/07 12:23:57 INFO mapreduce.Job: map 0% reduce 0%
18/08/07 12:24:04 INFO mapreduce.Job: map 100% reduce 0%
18/08/07 12:24:05 INFO mapreduce.Job: Job job_1533537460397_0002 completed successfully
18/08/07 12:24:05 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=141082
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1044
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4797
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4797
Total vcore-milliseconds taken by all map tasks=4797
Total megabyte-milliseconds taken by all map tasks=4912128
Map-Reduce Framework
Map input records=82
Map output records=82
Input split bytes=132
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=74
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=121110528
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
18/08/07 12:24:05 INFO mapreduce.ExportJobBase: Transferred 1.0195 KB in 28.176 seconds (37.0528 bytes/sec)
18/08/07 12:24:05 INFO mapreduce.ExportJobBase: Exported 82 records.
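Both runs above log the warning that setting the password on the command line is insecure. Besides the -P prompt the warning suggests, Sqoop also accepts --password-file. The sketch below prints the same export command with that option instead of --password. build_export_cmd is a hypothetical helper and the password file path is a placeholder, not something from the original run.

```shell
# Print a sqoop export command that reads the password from a file
# (--password-file) instead of taking it on the command line.
# build_export_cmd and its arguments are hypothetical names for this sketch.
build_export_cmd() {
  local table=$1 export_dir=$2 password_file=$3
  printf "sqoop export --connect jdbc:mysql://localhost:3306/test --table %s --export-dir %s --username root --password-file %s -m 1 --fields-terminated-by ','\n" \
    "$table" "$export_dir" "$password_file"
}

# The password file path is a placeholder; adjust it to your setup.
build_export_cmd sc /user/hive/warehouse/sc /user/hdfs/mysql.password
```

The password file should be readable only by its owner and should hold the password without a trailing newline.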