Notes on importing data into Hive with Sqoop

Reposted from: http://blog.csdn.net/chaiyiping/article/details/40622507

Importing data from Oracle into Hive with Sqoop, for example:

  sqoop import --connect jdbc:oracle:thin:@oracle-host:port:orcl --username name --password passwd --hive-import --table tablename

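Without any other options, the imported data uses '\001' as the default field (column) delimiter and '\n' as the default line (row) delimiter.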

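This is where the trouble starts: if the imported data itself contains '\n', Hive treats the row as finished at that point and pushes the rest of the record into the next row. After such an import the Hive table has more rows than the source table, and the data no longer matches the source.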
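
The effect is easy to reproduce locally. The following is a minimal sketch in plain shell, not Sqoop or Hive itself: it writes one made-up three-field record (the field values are assumptions for illustration) in the default text layout and counts how many rows a reader that splits on '\n' would see.

```shell
# One source record with three fields, separated by \001 (Ctrl-A).
# The second field contains an embedded newline; the final \n ends the record.
# The field values are made up for illustration.
rows=$(printf '1\001first half\nsecond half\0012014-10-30\n' | wc -l | tr -d ' ')

# A reader that splits on every \n sees two rows instead of one.
echo "source records: 1, rows after splitting on newline: $rows"
```

The second "row" starts in the middle of a field, which is why the columns no longer line up after the import.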

Sqoop does provide the options --fields-terminated-by and --lines-terminated-by for setting custom field and line delimiters.

But when you actually try this... o(╯□╰)o you get the following error:

  INFO hive.HiveImport: FAILED: SemanticException 1:381 LINES TERMINATED BY only supports newline '\n' right now.

In other words, even though you specified a different character as the line delimiter with --lines-terminated-by, Hive only supports '\n' as the line delimiter.

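The simple fix is to add the option --hive-drop-import-delims, which strips Hive's default delimiter characters (\n, \r, and \01) from the string fields of the imported data.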
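
Put together, the import command might look like the sketch below. The connection string, user name, and table name are the placeholders from the example at the top, and `-P` (prompt for the password) is used instead of `--password`. The `run` wrapper and its `DRY_RUN` variable are a hypothetical helper added here, not part of Sqoop, so that the snippet only prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Placeholders to replace with real values: oracle-host, port, orcl, name, tablename.
run sqoop import \
  --connect "jdbc:oracle:thin:@oracle-host:port:orcl" \
  --username name -P \
  --table tablename \
  --hive-import \
  --hive-drop-import-delims
```

If the characters should be replaced rather than dropped, Sqoop also has --hive-delims-replacement, which substitutes a string of your choice for them.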





3. The null placeholder must be specified

When the null placeholder is not specified for the export, the fields end up misaligned:
[hadoop@hs11 ~]$ sqoop export --connect jdbc:mysql://10.10.20.11/test --username root --password admin --table test --export-dir /user/hive/warehouse/actmp --input-fields-terminated-by '\001'
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/08/21 09:21:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/08/21 09:21:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/08/21 09:21:07 INFO tool.CodeGenTool: Beginning code generation
13/08/21 09:21:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/08/21 09:21:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/08/21 09:21:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-1.1.2
Note: /tmp/sqoop-hadoop/compile/04d183c9e534cdb8d735e1bdc4be3deb/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/08/21 09:21:08 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/04d183c9e534cdb8d735e1bdc4be3deb/test.jar
13/08/21 09:21:08 INFO mapreduce.ExportJobBase: Beginning export of test
13/08/21 09:21:09 INFO input.FileInputFormat: Total input paths to process : 1
13/08/21 09:21:09 INFO input.FileInputFormat: Total input paths to process : 1
13/08/21 09:21:09 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/21 09:21:09 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/21 09:21:10 INFO mapred.JobClient: Running job: job_201307251523_0061
13/08/21 09:21:11 INFO mapred.JobClient:  map 0% reduce 0%
13/08/21 09:21:17 INFO mapred.JobClient:  map 25% reduce 0%
13/08/21 09:21:19 INFO mapred.JobClient:  map 50% reduce 0%
13/08/21 09:21:21 INFO mapred.JobClient: Task Id : attempt_201307251523_0061_m_000001_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NumberFormatException: For input string: "665A5FFA-32C9-9463-1943-840A5FEAE193"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.valueOf(Integer.java:554)
at test.__loadFromFields(test.java:264)
at test.parse(test.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
===========

4. Success

[hadoop@hs11 ~]$ sqoop export --connect jdbc:mysql://10.10.20.11/test --username root --password admin --table test --export-dir /user/hive/warehouse/actmp --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N'
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/08/21 09:36:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/08/21 09:36:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/08/21 09:36:13 INFO tool.CodeGenTool: Beginning code generation
13/08/21 09:36:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/08/21 09:36:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/08/21 09:36:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-1.1.2
Note: /tmp/sqoop-hadoop/compile/e22d31391498b790d799897cde25047d/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/08/21 09:36:14 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e22d31391498b790d799897cde25047d/test.jar
13/08/21 09:36:14 INFO mapreduce.ExportJobBase: Beginning export of test
13/08/21 09:36:15 INFO input.FileInputFormat: Total input paths to process : 1
13/08/21 09:36:15 INFO input.FileInputFormat: Total input paths to process : 1
13/08/21 09:36:15 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/21 09:36:15 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/21 09:36:16 INFO mapred.JobClient: Running job: job_201307251523_0064
13/08/21 09:36:17 INFO mapred.JobClient:  map 0% reduce 0%
13/08/21 09:36:23 INFO mapred.JobClient:  map 25% reduce 0%
13/08/21 09:36:25 INFO mapred.JobClient:  map 100% reduce 0%
13/08/21 09:36:27 INFO mapred.JobClient: Job complete: job_201307251523_0064
13/08/21 09:36:27 INFO mapred.JobClient: Counters: 18
13/08/21 09:36:27 INFO mapred.JobClient:   Job Counters
13/08/21 09:36:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13151
13/08/21 09:36:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/21 09:36:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/08/21 09:36:27 INFO mapred.JobClient:     Rack-local map tasks=2
13/08/21 09:36:27 INFO mapred.JobClient:     Launched map tasks=4
13/08/21 09:36:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/08/21 09:36:27 INFO mapred.JobClient:   File Output Format Counters
13/08/21 09:36:27 INFO mapred.JobClient:     Bytes Written=0
13/08/21 09:36:27 INFO mapred.JobClient:   FileSystemCounters
13/08/21 09:36:27 INFO mapred.JobClient:     HDFS_BYTES_READ=1519
13/08/21 09:36:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=234149
13/08/21 09:36:27 INFO mapred.JobClient:   File Input Format Counters
13/08/21 09:36:27 INFO mapred.JobClient:     Bytes Read=0
13/08/21 09:36:27 INFO mapred.JobClient:   Map-Reduce Framework
13/08/21 09:36:27 INFO mapred.JobClient:     Map input records=6
13/08/21 09:36:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=663863296
13/08/21 09:36:27 INFO mapred.JobClient:     Spilled Records=0
13/08/21 09:36:27 INFO mapred.JobClient:     CPU time spent (ms)=3720
13/08/21 09:36:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=2013790208
13/08/21 09:36:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=5583151104
13/08/21 09:36:27 INFO mapred.JobClient:     Map output records=6
13/08/21 09:36:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=571
13/08/21 09:36:27 INFO mapreduce.ExportJobBase: Transferred 1.4834 KB in 12.1574 seconds (124.9446 bytes/sec)
13/08/21 09:36:27 INFO mapreduce.ExportJobBase: Exported 6 records.
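
The file format itself shows why the two null options matter. Hive's text storage writes NULL as the two-character sequence \N by default, and --input-null-string / --input-null-non-string tell Sqoop which string to read back as SQL NULL. The sketch below is plain shell working on a made-up row: the column layout is an assumption, and the UUID is borrowed from the error message above. It makes the delimiters and the null marker visible.

```shell
# A made-up row as it would sit in a Hive text file:
# field 1 = 7, field 2 = NULL (stored as \N), field 3 = a UUID string.
row=$(printf '7\001\\N\001665A5FFA-32C9-9463-1943-840A5FEAE193')

# Show \001 as '|' so the field boundaries are visible.
visible=$(printf '%s\n' "$row" | tr '\001' '|')
printf '%s\n' "$visible"

# Count the fields that hold the NULL marker.
nulls=$(printf '%s\n' "$row" | tr '\001' '\n' | grep -c '^\\N$')
printf 'fields holding the NULL marker: %s\n' "$nulls"
```

On a real cluster the same `tr` filter could be applied to the output of `hadoop fs -cat` on the files under the export directory, to check which marker the data actually uses before choosing the option values.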
