运行 Giraph 提示 too many counters
在加入 -ca mapreduce.job.counters.limit=1000
后,仍然运行失败
16/10/20 08:56:08 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
16/10/20 08:56:38 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer s3:22181 --zkNode /_hadoopBsp/job_1476868823433_0017/_haltComputation'
16/10/20 08:56:38 INFO mapreduce.Job: Running job: job_1476868823433_0017
16/10/20 08:56:39 INFO mapreduce.Job: Job job_1476868823433_0017 running in uber mode : false
16/10/20 08:56:39 INFO mapreduce.Job: map 50% reduce 0%
16/10/20 08:56:47 INFO mapreduce.Job: map 100% reduce 0%
16/10/20 08:56:47 INFO mapreduce.Job: Job job_1476868823433_0017 failed with state FAILED due to: Task failed task_1476868823433_0017_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/10/20 08:56:47 INFO mapreduce.Job: Counters: 34
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=97529
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=76
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Failed map tasks=1
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=33269
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=33269
Total vcore-seconds taken by all map tasks=33269
Total megabyte-seconds taken by all map tasks=34067456
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=44
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=130
CPU time spent (ms)=7280
Physical memory (bytes) snapshot=186077184
Virtual memory (bytes) snapshot=823398400
Total committed heap usage (bytes)=200802304
Zookeeper base path
/_hadoopBsp/job_1476868823433_0017=0
Zookeeper halt node
/_hadoopBsp/job_1476868823433_0017/_haltComputation=0
Zookeeper server:port
s3:22181=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Log 日志显示:
2016-10-20 08:56:38,569 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 199 took 0.016 seconds ended with state ALL_SUPERSTEPS_DONE and is now on superstep 200
2016-10-20 08:56:38,573 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on superstep 200
2016-10-20 08:56:38,574 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1}
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x236f zxid:0x143b txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NoNode for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710001 type:create cxid:0xd8f zxid:0x143c txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_masterJobState
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir/0_master
2016-10-20 08:56:38,575 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x2375 zxid:0x143f txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Node /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir already exists, no need to create.
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 1 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanedUpZooKeeper: Waiting for the children of /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir to change since only got 1 nodes.
2016-10-20 08:56:40,710 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737)
at java.lang.Thread.run(Thread.java:745)
2016-10-20 08:56:40,879 INFO [main-EventThread] org.apache.giraph.bsp.BspService: process: cleanedUpChildrenChanged signaled
2016-10-20 08:56:40,880 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 2 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:40,880 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x157df96ea710001
2016-10-20 08:56:40,882 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:22181] org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /219.223.239.57:49390 which had sessionid 0x157df96ea710001
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Removed HDFS checkpoint directory (_bsp/_checkpoints//job_1476868823433_0017) with return = false since the job Giraph: cost.Test succeeded
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Halting netty client
2016-10-20 08:56:40,890 INFO [netty-client-worker-0] org.apache.giraph.comm.netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Netty client halted
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Halting netty server
2016-10-20 08:56:43,106 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Start releasing resources
2016-10-20 08:56:43,780 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737)
at java.lang.Thread.run(Thread.java:745)
2016-10-20 08:56:43,793 INFO [communication thread] org.apache.hadoop.mapred.Task: Process Thread Dump: Communication exception
46 active threads
Thread 56 (netty-server-worker-15):
State: RUNNABLE
Blocked count: 0
Waited count: 1
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:596)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:306)
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
java.lang.Thread.run(Thread.java:745)
Thread 55 (netty-server-worker-14):
State: RUNNABLE
Blocked count: 0
Waited count: 1
还是 Counters 的问题
网上查好像是说在命令行设置 mapreduce.job.counters.limit
属性没有作用
于是设置 $HADOOP_HOME/conf/mapred-site.xml 中的属性
<property>
<name>mapreduce.job.counters.limitname>
20000
Limit on the number of counters allowed per job. The default value is 200.
property>
一切正常~