Recently I ran into the following problem while executing a Python script on Hive. In the Hive CLI, the error output was:
hive> from records
> select transform(year,temperature,quality)
> using 'python /user/hive/script/is_good_quality.py'
> as year,temperature;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201112291016_0023, Tracking URL = http://10.200.187.26:50030/jobdetails.jsp?jobid=job_201112291016_0023
Kill Command = /opt/hadoop-0.20.205.0/libexec/../bin/hadoop job -Dmapred.job.tracker=10.200.187.26:9001 -kill job_201112291016_0023
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2011-12-29 14:56:34,192 Stage-1 map = 0%, reduce = 0%
2011-12-29 14:57:16,405 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201112291016_0023 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201112291016_0023_m_000002 (and more) from job job_201112291016_0023
Exception in thread "Thread-248" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
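The script referenced by USING was not included above, so for readers following along, here is a minimal hypothetical sketch of what is_good_quality.py presumably does. The records(year, temperature, quality) table matches the NCDC weather example from Hadoop: The Definitive Guide; the quality codes and the 9999 missing-value sentinel below are assumptions taken from that example, not from the original post:

#!/usr/bin/env python
# is_good_quality.py -- hypothetical sketch of the transform script.
# Reads tab-separated (year, temperature, quality) records from stdin
# and emits (year, temperature) for readings that pass the quality check.
import sys

for line in sys.stdin:
    (year, temperature, quality) = line.strip().split('\t')
    # Assumed from the NCDC example: 9999 is the missing-temperature
    # sentinel, and 0/1/4/5/9 are the "good" quality codes.
    if temperature != '9999' and quality in ('0', '1', '4', '5', '9'):
        print('%s\t%s' % (year, temperature))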
In the Hadoop log file (/opt/hadoop-0.20.205.0/logs/hadoop-root-jobtracker-chenyi3.log), the error message was:
2011-12-29 14:57:06,865 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201112291016_0023_m_000000_3: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 7 more
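The "Hit error while closing" raised in ScriptOperator.close() typically means the child script exited abnormally or could not be launched at all. Before suspecting the cluster, it is worth piping one sample row through the script locally; assuming the sketch above, something like:

$ echo -e '1950\t22\t1' | python is_good_quality.py
1950    22

A Python traceback or empty output here points at the script itself rather than at Hive.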
A solution I had found earlier suggested the following:
A few things I'd check for if I were debugging this:
1) Is the Python file set to be executable (chmod +x file.py)?
2) Make sure the Python file is in the same place on all machines. Probably better: put the file in HDFS, then you can refer to it as "using 'hdfs://path/to/file.py'" instead of a local path (a more standard alternative, ADD FILE, is sketched after this list).
3) Take a look at your job on the Hadoop dashboard (http://master-node:9100); if you click on a failed task, it will give you the actual Java error and stack trace, so you can see what actually went wrong with the execution.
4) Make sure Python is installed on all the slave nodes! (I always overlook this one.)
Hope that helps.
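As promised in point 2 above, the standard way to ship a TRANSFORM script is Hive's ADD FILE, which distributes the file to every task node via the distributed cache so it can be invoked by bare name. A sketch, assuming the script exists at that path on the machine running the Hive CLI and has a shebang line plus execute permission (point 1):

hive> ADD FILE /user/hive/script/is_good_quality.py;
hive> FROM records
    > SELECT TRANSFORM(year, temperature, quality)
    > USING 'is_good_quality.py'
    > AS year, temperature;

Note the difference from the failing query: with ADD FILE, the path is resolved on the client, and the task nodes run the shipped copy by name, so nothing has to pre-exist at the same local path on every slave.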
Even after all that, the job still would not run successfully, and for the time being I was stuck (if any reader knows the answer, please let me know)...
Update: after several days of effort, I finally solved the problem. The cause was the configuration of the /etc/hosts file; for details, see my other article, "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.".
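For readers hitting the same wall: both the "Connection refused" while Hive fetched task logs and the shuffle fetch failures in the referenced article are classic symptoms of bad hostname resolution, e.g. a node's own hostname mapped to 127.0.0.1 so that other machines cannot reach it. The usual remedy is a consistent /etc/hosts on every node, along the lines of the sketch below (chenyi3 and its address come from the logs above; the slave lines are invented for illustration):

127.0.0.1      localhost
10.200.187.26  chenyi3      # jobtracker (from the logs above)
10.200.187.27  chenyi4      # slave (illustrative)
10.200.187.28  chenyi5      # slave (illustrative)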