1. When importing data from MySQL into Hive with Sqoop, the job fails with:
20/09/18 11:20:33 INFO mapreduce.Job: Job job_1600395587790_0002 failed with state FAILED due to: Application application_1600395587790_0002 failed 2 times due to AM Container for appattempt_1600395587790_0002_000002 exited with exitCode: -104
Failing this attempt.Diagnostics: [2020-09-18 11:20:32.442]Container [pid=69122,containerID=container_e106_1600395587790_0002_02_000001] is running 58400768B beyond the 'PHYSICAL' memory limit. Current usage: 225.7 MB of 170 MB physical memory used; 2.0 GB of 357.0 MB virtual memory used. Killing container.
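Solution:
The key part of the error is "Current usage: 225.7 MB of 170 MB physical memory used; 2.0 GB of 357.0 MB virtual memory used": the container used 225.7 MB of physical memory against a 170 MB limit, and 2.0 GB of virtual memory against a 357 MB limit.
Setting yarn.scheduler.minimum-allocation-mb to 256 (MB) in yarn-site.xml, so that no container is allocated less than 256 MB, solved the problem.
If the error is about virtual memory, you can turn that check off in the same file by setting yarn.nodemanager.vmem-check-enabled to false. Restart YARN afterwards so the ResourceManager and NodeManagers pick up the changes.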
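The figures in the diagnostic fit YARN's default virtual-to-physical ratio, yarn.nodemanager.vmem-pmem-ratio = 2.1 (an assumption: the post does not show this cluster's value). The virtual limit is the physical limit times the ratio, 170 MB × 2.1 = 357 MB, and the reported overrun of 58400768 B is about 55.7 MB, i.e. 225.7 MB used minus the 170 MB limit. The awk sketch below just redoes that arithmetic; the helper names are made up for illustration.

```shell
# Hypothetical helpers that reproduce the arithmetic in the YARN diagnostic.
# Assumption: yarn.nodemanager.vmem-pmem-ratio is left at its default of 2.1.

# Virtual memory limit (MB) = physical limit (MB) * vmem-pmem ratio.
vmem_limit_mb() {
    awk -v pmem="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", pmem * ratio }'
}

# Convert a byte count to MB (1 MB = 1048576 bytes), one decimal place.
bytes_to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

vmem_limit_mb 170 2.1    # prints 357.0, the "357.0 MB virtual memory" limit
bytes_to_mb 58400768     # prints 55.7, i.e. 225.7 MB used - 170 MB allowed
vmem_limit_mb 256 2.1    # prints 537.6, the virtual limit of a 256 MB container
```

With the same ratio, a 256 MB container still only gets 537.6 MB of virtual memory, well below the 2.0 GB this container used, which is why the vmem-check-enabled switch may be needed as well. Raising yarn.nodemanager.vmem-pmem-ratio is another way to loosen the same check.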
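2. All data imported into a Hive table by Sqoop is NULL
Solution:
The field delimiter declared when the table was created (fields terminated by '\001') did not match the one used by the import. Setting both to '\001' (in Sqoop: --fields-terminated-by '\001') fixes it.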
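Why a delimiter mismatch shows up as NULLs rather than as an error: Hive's text format splits each line on the table's declared field delimiter, and a column that receives no field (or a value that does not parse as the column's type) is read as NULL. The sketch below counts how many fields one line yields under two different delimiters; the sample row and the count_fields helper are hypothetical.

```shell
# Hypothetical helper: count the fields in one line when split on a delimiter.
# Hive's text format does a similar split on "fields terminated by".
count_fields() {
    # $1 = the line, $2 = the delimiter character
    printf '%s\n' "$1" | awk -F "$2" '{ print NF }'
}

SOH=$(printf '\001')   # Ctrl-A, written by --fields-terminated-by '\001'

# A sample row as Sqoop would write it with '\001' between fields.
row="1${SOH}2020-09-18${SOH}alice"

count_fields "$row" "$SOH"   # prints 3: table and data agree on the delimiter
count_fields "$row" ","      # prints 1: the whole line lands in one field
```

When the delimiters disagree, the whole line goes into the first column (which is NULL too if that column is numeric) and every other column is NULL, matching the symptom above.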
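3. The row count grows after a sqoop import into Hive:
After importing from Sqoop into Hive, select count(*) returns more rows than the source MySQL table holds, and also more than the "Retrieved 52136 records." line in the Sqoop log.
Solution (two possible causes):
1) The column passed to --split-by is not an int column, which produced duplicate rows. Details: sqoop import 导入到hive后数据量变多的问题_IKnowNothinglee的博客-CSDN博客
2) A delimiter problem. Details: 关于在sqoop导入数据的时候,数据量变多的解决方案。_weixin_30693183的博客-CSDN博客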
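One common form of the delimiter problem: a MySQL string column contains a line break (or the \001 delimiter) inside its value. Hive's text format ends a row at every newline, so that single record is read as two or more rows. Sqoop's --hive-drop-import-delims option drops \n, \r and \01 from string fields on Hive imports, and --hive-delims-replacement replaces them with a string of your choice. The sketch below imitates that per-field cleanup with tr; the sample value and helper names are hypothetical, and this is not Sqoop's actual code.

```shell
SOH=$(printf '\001')   # Ctrl-A, Hive's default field delimiter

# Drop \n, \r and \001 from one field value, similar in spirit to
# Sqoop's --hive-drop-import-delims (NOT Sqoop's implementation).
clean_field() {
    printf '%s' "$1" | tr -d '\n\r\001'
}

# Count lines on stdin; Hive's text format reads one row per line.
count_lines() {
    awk 'END { print NR }'
}

# One MySQL record (id=2) whose text column contains a line break.
comment=$(printf 'line one\nline two')

printf '2%s%s\n' "$SOH" "$comment" | count_lines                  # 2 rows in Hive
printf '2%s%s\n' "$SOH" "$(clean_field "$comment")" | count_lines  # 1 row
```

Dropping the characters glues "line one" and "line two" together; --hive-delims-replacement ' ' would keep the words apart instead.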
4. sqoop export to MySQL fails with java.io.FileNotFoundException: Path is not a file
Running the following export command
sqoop export --connect jdbc:mysql://192.56.1.111:3306/bigdata --username root --password 2342344 --table user_tb_summary --fields-terminated-by '\001' --update-key date_str --update-mode allowinsert --export-dir /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/
fails with this error:
20/10/10 18:34:03 ERROR tool.ExportTool: Encountered IOException running export job:
java.io.FileNotFoundException: Path is not a file: /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/delta_0000053_0000053_0000
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:90)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
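Listing the user_summary table directory shows an extra level of subdirectories. After a lot of digging I found the cluster runs Hive 3.1; my guess was that Hive 3.x runs on Tez rather than MapReduce and that Tez creates this extra level. A more likely explanation is that managed tables in Hive 3 are transactional (ACID) by default, so each write goes into its own delta_* directory, while sqoop export expects files directly under --export-dir.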
[hdfs@hadoop05 sqoop_job]$ hdfs dfs -ls /warehouse/tablespace/managed/hive/taxbook1.db/user_summary
Found 10 items
drwxrwx---+ - hive hadoop 0 2020-10-10 14:56 /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/delta_0000053_0000053_0000
drwxrwx---+ - hive hadoop 0 2020-10-10 14:57 /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/delta_0000054_0000054_0000
Solution: use a wildcard in the export path so the delta directories themselves are read:
sqoop export --connect jdbc:mysql://192.56.1.111:3306/bigdata --username root --password 2342344 --table user_tb_summary --fields-terminated-by '\001' --update-key date_str --update-mode allowinsert --export-dir /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/delta*
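Before exporting, it can help to check which subdirectories the delta* pattern will pick up. The sketch below filters hdfs dfs -ls output (here, the sample listing above pasted into a variable) for directory entries whose last path component starts with delta_; the function name is hypothetical. On the cluster you would pipe hdfs dfs -ls for the table directory into it instead. Quoting the --export-dir argument is also a sensible precaution, since it stops the local shell from trying to expand delta* against the local filesystem before Sqoop sees it.

```shell
# Hypothetical helper: print the delta_* directories from `hdfs dfs -ls` output.
# Directory lines start with "d"; the path is the 8th whitespace-separated column.
list_delta_dirs() {
    awk '$1 ~ /^d/ && $8 ~ "/delta_[^/]*$" { print $8 }'
}

# Sample copied from the listing above; on a cluster: hdfs dfs -ls <dir> | list_delta_dirs
listing='Found 10 items
drwxrwx---+ - hive hadoop 0 2020-10-10 14:56 /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/delta_0000053_0000053_0000
drwxrwx---+ - hive hadoop 0 2020-10-10 14:57 /warehouse/tablespace/managed/hive/taxbook1.db/user_summary/delta_0000054_0000054_0000'

printf '%s\n' "$listing" | list_delta_dirs
```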
5. The job hangs when connecting to HiveServer2 (hive2)
21/05/25 14:02:42 INFO hive.HiveImport: Connecting to jdbc:hive2://hadoop02.com:2181,hadoop01.com:2181,hadoop03.com:2181,hadoop04.com:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
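On first use the job hangs at this point. The fix:
In the Hive conf directory (usually /etc/hive/conf), create a beeline-hs2-connection.xml file if one does not already exist, with the hive user's credentials as below, then run the job again.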
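<configuration>
  <property>
    <name>beeline.hs2.connection.user</name>
    <value>hive</value>
  </property>
  <property>
    <name>beeline.hs2.connection.password</name>
    <value>hive</value>
  </property>
</configuration>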
6. A Sqoop task fails with a MapReduce job error:
21/06/11 09:40:08 INFO impl.YarnClientImpl: Submitted application application_1622620346132_0051
21/06/11 09:40:08 INFO mapreduce.Job: The url to track the job: http://hadoop01.com:8088/proxy/application_1622620346132_0051/
21/06/11 09:40:08 INFO mapreduce.Job: Running job: job_1622620346132_0051
21/06/11 09:49:33 INFO mapreduce.Job: Job job_1622620346132_0051 running in uber mode : false
21/06/11 09:49:33 INFO mapreduce.Job: map 0% reduce 0%
21/06/11 09:49:38 INFO mapreduce.Job: map 100% reduce 0%
21/06/11 09:52:37 INFO mapreduce.Job: Job job_1622620346132_0051 failed with state FAILED due to: Task failed task_1622620346132_0051_m_000003
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
21/06/11 09:52:37 INFO mapreduce.Job: Counters: 12
    Job Counters
        Failed map tasks=1
        Killed map tasks=3
        Launched map tasks=4
        Rack-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=10237
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=10237
        Total vcore-milliseconds taken by all map tasks=10237
        Total megabyte-milliseconds taken by all map tasks=10482688
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
21/06/11 09:52:37 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21/06/11 09:52:37 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 774.6075 seconds (0 bytes/sec)
21/06/11 09:52:37 INFO mapreduce.ExportJobBase: Exported 0 records.
21/06/11 09:52:37 ERROR mapreduce.ExportJobBase: Export job failed!
21/06/11 09:52:37 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:930)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:94)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:113)
at org.apache.sqoop.Sqoop.run(Sqoop.java:151)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:187)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:241)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:250)
at org.apache.sqoop.Sqoop.main(Sqoop.java:259)
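There is no detailed error message, but the output does show that task task_1622620346132_0051_m_000003 failed. In that case, check the YARN logs: take the applicationId the task belongs to, then run the following command on a host with YARN to see the detailed log output: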
yarn logs -applicationId application_1622620346132_0051
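The failed task ID already contains what is needed: MapReduce task and job IDs embed the same cluster timestamp and sequence number as the YARN application ID (compare job_1622620346132_0051 and application_1622620346132_0051 in the log above). The sketch below derives the applicationId from either kind of ID with sed; the function name is hypothetical.

```shell
# Hypothetical helper: derive the YARN application ID from a MapReduce
# task ID or job ID. Both embed the same <clusterTimestamp>_<sequence> pair,
# e.g. task_1622620346132_0051_m_000003 -> application_1622620346132_0051.
to_application_id() {
    printf '%s\n' "$1" | sed -E 's/^(task|job)_([0-9]+)_([0-9]+).*/application_\2_\3/'
}

app_id=$(to_application_id task_1622620346132_0051_m_000003)
echo "$app_id"    # application_1622620346132_0051

# On a host with the YARN client configured (not run here):
# yarn logs -applicationId "$app_id"
```

For a finished application, yarn logs can only fetch the container logs if log aggregation (yarn.log-aggregation-enable) is turned on; otherwise they stay in each NodeManager's local log directories.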