Pitfalls encountered using Sqoop to move data between Hive and MySQL

When Sqoop exports from Hive to MySQL, or imports from MySQL into Hive, the work is actually translated into a MapReduce job, so there are two places to check the logs: http://node1:8088 (the ResourceManager web UI) and http://node1:19888 (the JobHistory web UI); at least that is how my cluster is configured. Below are the pitfalls I ran into during this process:
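For context, here is a minimal sketch of the kind of export command this post is about; the connection string, credentials, table name, and export directory are placeholders for my setup, not values taken from the logs below:

    # Export a Hive table's warehouse directory into a MySQL table.
    # Host, database, table, and password are hypothetical placeholders.
    sqoop export \
      --connect jdbc:mysql://node1:3306/testdb \
      --username root \
      --password 123456 \
      --table my_table \
      --export-dir /user/hive/warehouse/my_table \
      --input-fields-terminated-by '\001'   # Hive's default field delimiter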

1.

18/04/18 04:50:06 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/04/18 04:50:07 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/04/18 04:50:08 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/04/18 04:50:08 ERROR tool.ExportTool: Encountered IOException running export job: 
java.io.IOException: java.net.ConnectException: Call From node1/172.16.15.130 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:338)
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:423)
	at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:576)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
	at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:623)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1330)
	at org.apache.sqoop.mapreduce.ExportJobBase.doSubmitJob(ExportJobBase.java:324)
	at org.apache.sqoop.mapreduce.ExportJobBase.runJob(ExportJobBase.java:301)
	at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:442)
	at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.net.ConnectException: Call From node1/172.16.15.130 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
	at org.apache.hadoop.ipc.Client.call(Client.java:1474)
	at org.apache.hadoop.ipc.Client.call(Client.java:1401)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy14.getJobReport(Unknown Source)
	at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:324)
	... 22 more

 

This is caused by the JobHistory server not being started. The fix is to run mr-jobhistory-daemon.sh start historyserver on the host where you configured jobhistory. Then check with jps; once a JobHistoryServer process shows up, re-run the Sqoop command and it will work.
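In concrete terms (assuming the script is on your PATH; see note b below if it is not):

    # On the host that is supposed to run the JobHistory server (node1 for me):
    mr-jobhistory-daemon.sh start historyserver

    # Verify the daemon came up; the output should include a JobHistoryServer entry:
    jps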

 

Notes:

a. First, configure the JobHistory RPC port and web port in mapred-site.xml:

        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>node1:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>node1:19888</value>
        </property>

b. If you have not set up the environment variables, you need to prefix the mr-jobhistory-daemon.sh start historyserver command with its path, for example:
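A sketch, assuming Hadoop is installed under $HADOOP_HOME (adjust the path to your layout):

    # The daemon script lives in Hadoop's sbin directory:
    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver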

2.

18/04/18 04:42:13 INFO mapreduce.Job: Counters: 12
	Job Counters 
		Failed map tasks=1
		Killed map tasks=3
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=69773
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=69773
		Total vcore-milliseconds taken by all map tasks=69773
		Total megabyte-milliseconds taken by all map tasks=71447552
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
18/04/18 04:42:13 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/04/18 04:42:13 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 59.7689 seconds (0 bytes/sec)
18/04/18 04:42:13 INFO mapreduce.ExportJobBase: Exported 0 records.
18/04/18 04:42:13 ERROR mapreduce.ExportJobBase: Export job failed!
18/04/18 04:42:13 ERROR tool.ExportTool: Error during export: 
Export job failed!
	at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
	at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

 

This error can have many causes, for example the Hive table columns not matching the MySQL columns, or a wrong Hive field delimiter. My cause was neither. From the counters above you can see that one map task failed and three were killed; looking the killed tasks up in jobhistory, the reason given was "Aggregation is not enabled." The fix is to add the following to yarn-site.xml:

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
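Once log aggregation is enabled and YARN has been restarted, you can also pull a finished job's task logs from the command line; the application id below is a placeholder you would copy from the ResourceManager UI:

    # Fetch the aggregated logs of a completed application:
    yarn logs -applicationId application_1523968497299_0005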

The failed task's reason was: Error: java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure ...

That looks like a mysql-jdbc-connector version problem, but after I made the configuration change above and restarted Hadoop, the error was gone.
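If it persists for you, a quick sanity check is to open the same JDBC connection the map tasks use, from the worker node itself; host and credentials are placeholders for my setup:

    # Ask Sqoop to list databases over JDBC (-P prompts for the password):
    sqoop list-databases \
      --connect jdbc:mysql://node1:3306 \
      --username root -P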

3. Some connection refused problems are caused by MySQL remote login not being set up properly. I cover that separately in another blog post; the usual shape of the fix is sketched below.
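This is only a sketch (MySQL 5.x syntax; user and password are placeholders), and the details are in the other post:

    # Allow the given user to connect from any host, then reload privileges:
    mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'sqoop_user'@'%' IDENTIFIED BY 'secret'; FLUSH PRIVILEGES;"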

 
