When the filesystem has missing blocks, the NameNode web UI on port 9870 already shows which blocks are lost.
The same information is available from the command line:
hdfs fsck / -list-corruptfileblocks
hdfs fsck <path> -delete
The delete command first checks whether the file contains corrupt blocks: if the file is marked corrupt it is deleted; if it is diagnosed as healthy, it is left untouched.
The NameNode log at the time of the incident looked like this:
src=/flink/job/sjzt/checkpoints/5a4be983e14b9538d344b7fb9584cded/chk-1435 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:19,021 INFO [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.23 cmd=mkdirs src=/flink/job/sjzt/checkpoints/4aafc4e3232f8ffa015aa42500c3ac7f/chk-24012 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:19,847 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:20,648 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:20,795 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:20,795 INFO [IPC Server handler 2 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.22 cmd=mkdirs src=/flink/job/sjzt/checkpoints/d16a0d39cd2693e12af0d2abbdf7b2fb/chk-26898 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:20,848 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:21,395 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:21,649 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:21,848 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:22,110 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:22,110 INFO [IPC Server handler 0 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.35 cmd=mkdirs src=/flink/job/sjzt/checkpoints/74eb2abea93b52545b7e6ba10a962df1/chk-14970 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:22,587 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:22,650 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:22,848 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:23,075 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:23,075 INFO [IPC Server handler 9 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.51 cmd=create src=/user/hive/warehouse/xy_ods.db/ods_bigquery_contract_new/pk_year=2022/pk_month=2022-08/pk_day=2022-08-22/bigquery_contract_new.1661142083054.tmp dst=null perm=hadoop:hadoop:rw-r--r-- proto=rpc
2022-08-22 12:21:23,651 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 9004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:23,850 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:24,203 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:24,652 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 10005 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:24,850 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:25,068 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:25,166 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:25,167 INFO [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.21 cmd=delete src=/user/hive/warehouse/iceberg_ods.db/ods_nft_listing/metadata/19b9ce1f689833f8b96925446296d8e8-00000-46952-370054-00001.avro dst=null perm=null proto=rpc
2022-08-22 12:21:25,652 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 11006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:25,765 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:25,765 INFO [IPC Server handler 4 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.34 cmd=mkdirs src=/flink/job/sjzt/checkpoints/353ca0c605b10433daf2e88ce9a9feb1/chk-601 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:25,850 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:26,654 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 12007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:26,851 INFO [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:27,655 INFO [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 13008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:28,186 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:28,186 INFO [IPC Server handler 8 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.35 cmd=mkdirs src=/flink/job/sjzt/checkpoints/8e31c6b93b5ff61a0c811066fb666dab/chk-1494 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:28,656 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 14009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:29,259 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:29,259 INFO [IPC Server handler 3 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.35 cmd=delete src=/flink/job/sjzt/checkpoints/19b9ce1f689833f8b96925446296d8e8/chk-370055 dst=null perm=null proto=rpc
2022-08-22 12:21:29,649 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:29,649 INFO [IPC Server handler 5 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.51 cmd=mkdirs src=/flink/job/sjzt/checkpoints/fe154b52ce78ac3c9568314c424cc0eb/chk-44218 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:29,657 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 15010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:29,789 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:29,789 INFO [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.60 cmd=mkdirs src=/flink/job/sjzt/checkpoints/b99c1b359b92a91573d25107708f62ef/chk-1460 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:30,658 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 16011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:30,951 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:31,659 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 17012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:32,660 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 18013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:32,927 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:33,087 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:33,661 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 19014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:34,229 INFO [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:34,229 INFO [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true ugi=hadoop (auth:SIMPLE) ip=/172.20.192.59 cmd=mkdirs src=/flink/job/sjzt/checkpoints/fa0629a9682efc2a685d5f29b665a5fc/chk-157 dst=null perm=hadoop:hadoop:rwxr-xr-x proto=rpc
2022-08-22 12:21:34,647 FATAL [FSEditLogAsync] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [172.20.192.56:8485, 172.20.192.57:8485, 172.20.192.58:8485], stream=QuorumOutputStream starting at txid 1010629910))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:115)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:713)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:243)
at java.lang.Thread.run(Thread.java:748)
2022-08-22 12:21:34,647 WARN [FSEditLogAsync] client.QuorumJournalManager (QuorumOutputStream.java:abort(73)) - Aborting QuorumOutputStream starting at txid 1010629910
2022-08-22 12:21:34,652 INFO [FSEditLogAsync] util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [172.20.192.56:8485, 172.20.192.57:8485, 172.20.192.58:8485], stream=QuorumOutputStream starting at txid 1010629910))
2022-08-22 12:21:34,656 INFO [shutdown-hook-0] namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cdh192-56/172.20.192.56
************************************************************/
As the log shows, once QJM communication exceeded the 20 s maximum, the connection was cut off:
client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 19014 ms (timeout=20000 ms)
The NameNode went down because a full GC pause lasted too long, breaking its communication with the JournalNodes.
Reference:
https://blog.csdn.net/weixin_39445556/article/details/104712157
Solutions:
1) Increase the QJM communication timeout from the default 20 s to 2 minutes.
Edit hdfs-site.xml and add the following property:
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>120000</value>
</property>
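After restarting the NameNode it is easy to confirm the new timeout actually took effect, using the standard `hdfs getconf -confKey` tool. In this sketch the command used to query the config is passed in as a parameter purely so the check is easy to exercise; in practice it is just `hdfs`:

```shell
# Returns success if the resolved QJM write timeout equals 120000 ms (2 minutes).
# "$@": the command used to query the config (normally just `hdfs`).
check_qjm_timeout() {
    expected_ms=120000
    actual_ms=$("$@" getconf -confKey dfs.qjournal.write-txns.timeout.ms)
    [ "$actual_ms" = "$expected_ms" ]
}

# Usage on the NameNode host:
#   check_qjm_timeout hdfs && echo "timeout OK"
```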
2) Increase the NameNode heap from the default to 80 GB.
Since we put Flink + Iceberg into production, small-file counts have exploded: the block count has surged past 30 million.
Edit hadoop-env.sh (around line 52):
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Xms80G -Xmx80G -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms10G -Xmx10G -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
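With an 80 GB heap, the thing to keep watching is exactly the full-GC pause that caused the crash. One way (a sketch, assuming the standard JDK `jps`/`jstat` tools are on the PATH) is to find the NameNode pid and sample GC utilization:

```shell
# Print the NameNode pid from jps-style output ("<pid> <class>" per line);
# the match on field 2 is exact, so SecondaryNameNode is excluded.
find_namenode_pid() {
    "$@" | awk '$2 == "NameNode" { print $1 }'
}

# Usage: sample GC stats every 5 s and watch the FGC/FGCT columns for long full GCs:
#   jstat -gcutil "$(find_namenode_pid jps)" 5000
```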
Later, the following errors appeared in the DataNode logs:
2022-08-22 18:32:06,108 INFO [Async disk worker #1384 for volume /dfs/data5] impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(333)) - Deleted BP-1555553207-10.0.50.200-1625229209582 blk_1201618387_127897013 URI file:/dfs/data5/current/BP-1555553207-10.0.50.200-1625229209582/current/finalized/subdir31/subdir29/blk_1201618387
2022-08-22 18:32:06,108 WARN [BP-1555553207-10.0.50.200-1625229209582 heartbeating to cdh192-56/172.20.192.56:8020] datanode.DataNode (BPServiceActor.java:run(855)) -Unexpected exception in block pool Block pool BP-1555553207-10.0.50.200-1625229209582 (Datanode Uuid 7a182b3f-caa2-4f13-8f87-6781fe3d9e46) service to cdh192-56/172.20.192.56:8020
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2115)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:734)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:881)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:676)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
at java.lang.Thread.run(Thread.java:748)
2022-08-22 18:32:06,108 WARN [BP-1555553207-10.0.50.200-1625229209582 heartbeating to cdh192-56/172.20.192.56:8020] datanode.DataNode (BPServiceActor.java:run(858)) -Ending block pool service for: Block pool BP-1555553207-10.0.50.200-1625229209582 (Datanode Uuid 7a182b3f-caa2-4f13-8f87-6781fe3d9e46) service to cdh192-56/172.20.192.56:8020
2022-08-22 18:32:06,128 INFO [PacketResponder: BP-1555553207-10.0.50.200-1625229209582:blk_1201617661_127896282, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.20.192.60:9866]] DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1533)) - src: /172.20.192.51:45422, dest: /172.20.192.59:9866, bytes: 7783726, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1109828295_43, offset: 0, srvID: 7a182b3f-caa2-4f13-8f87-6781fe3d9e46, blockid: BP-1555553207-10.0.50.200-1625229209582:blk_1201617661_127896282, duration(ns): 126218643103
2022-08-22 18:32:10,372 INFO [DataXceiver for client DFSClient_NONMAPREDUCE_-585719459_66 at /172.20.192.51:46504 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618421_127897047]] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-08-22 18:32:10,373 ERROR [DataXceiver for client DFSClient_NONMAPREDUCE_-585719459_66 at /172.20.192.51:46504 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618421_127897047]] datanode.DataNode (DataXceiver.java:run(324)) - cdh192-59:9866:DataXceiver error processing WRITE_BLOCK operation src: /172.20.192.51:46504 dst: /172.20.192.59:9866
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:968)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:908)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
at java.lang.Thread.run(Thread.java:748)
2022-08-22 18:32:11,329 INFO [DataXceiver for client DFSClient_NONMAPREDUCE_821955818_84 at /172.20.192.37:40696 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618425_127897052]] datanode.DataNode (DataXceiver.java:writeBlock(747)) - Receiving BP-1555553207-10.0.50.200-1625229209582:blk_1201618425_127897052 src: /172.20.192.37:40696 dest: /172.20.192.59:9866
2022-08-22 18:32:11,329 INFO [DataXceiver for client DFSClient_NONMAPREDUCE_821955818_84 at /172.20.192.37:40696 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618425_127897052]] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false,remoteHostTrusted = false
Likewise, increase the DataNode heap to 10 GB; the change follows the same pattern as above (see the HADOOP_DATANODE_OPTS line in hadoop-env.sh).
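One caveat: `unable to create new native thread` usually means the operating system refused to start another thread (per-user process limit or system-wide thread cap) rather than that the Java heap is exhausted, so it may be worth checking those limits for the hadoop user as well. A Linux-only sketch:

```shell
# Max processes/threads the current user may create (often the real bottleneck)
ulimit -u
# System-wide thread cap
cat /proc/sys/kernel/threads-max

# Usage on a DataNode host, to see how many threads the process already holds
# (pid discovery via jps is an assumption about the host setup):
#   ps -o nlwp= -p "$(jps | awk '$2 == "DataNode" { print $1 }')"
```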