A summary of problems encountered in day-to-day Hadoop cluster maintenance

 Connection reset by peer

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1335)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1256)
        at java.lang.Thread.run(Thread.java:745)

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)

The DataNode reset the connection. The client is stuck in an RPC to the NameNode; currently, RPCs can wait for a long time if the server is busy.

The following parameters can be tuned to mitigate this:

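dfs.namenode.handler.count (increase): the number of NameNode server threads that handle RPC requests.

dfs.namenode.replication.interval (decrease): how often, in seconds, the NameNode periodically computes the replication status of the DataNodes.

dfs.client.failover.connection.retries (increase recommended): expert setting. The number of times the IPC client retries after a failed connection. Raise this value when the network is unstable.

dfs.client.failover.connection.retries.on.timeouts (increase recommended on an unstable network): expert setting. The number of times the IPC client retries after a failure, counting timeout failures only. Raise this value when the network is unstable.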
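As a rough sketch, the four parameters above could be set in hdfs-site.xml as shown below. The values are illustrative assumptions, not recommendations from the original notes; check hdfs-default.xml for the defaults in your Hadoop version and adjust for cluster size and NameNode load.

```xml
<!-- hdfs-site.xml fragment; all values are placeholder assumptions -->
<configuration>
  <!-- More NameNode RPC handler threads (raise above your current value) -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>
  <!-- Replication computation interval in seconds (lower than your current value) -->
  <property>
    <name>dfs.namenode.replication.interval</name>
    <value>2</value>
  </property>
  <!-- IPC client retries after a failed connection -->
  <property>
    <name>dfs.client.failover.connection.retries</name>
    <value>3</value>
  </property>
  <!-- IPC client retries after a connection timeout -->
  <property>
    <name>dfs.client.failover.connection.retries.on.timeouts</name>
    <value>3</value>
  </property>
</configuration>
```

The two dfs.namenode.* settings are read by the NameNode, so they would normally take effect only after the NameNode is restarted; the two dfs.client.* settings belong in the configuration used by the clients.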

Reference: https://issues.apache.org/jira/browse/HADOOP-3657
