A Round-Up of Day-to-Day Hadoop Operations Issues

Checking for missing blocks

When the cluster has missing blocks, the NameNode web UI on port 9870 already shows which blocks are affected.
The same information is available from the command line:

hdfs fsck / -list-corruptfileblocks


Deleting corrupt block data

hdfs fsck <path> -delete

fsck first checks whether the files under the path are damaged: files diagnosed as corrupt are deleted, while files diagnosed as healthy are left untouched.
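The check-then-delete flow above can be sketched as a small script. This is a minimal illustration, not an official tool: the fsck invocations themselves need a live cluster and are shown as comments, while the `decide_action` helper (our own naming) just parses the summary line a typical fsck report ends with.

```shell
# Reads an fsck report on stdin; prints "delete" if the path is corrupt.
decide_action() {
  if grep -q "is CORRUPT"; then echo delete; else echo keep; fi
}

# On a live cluster (the path is an example placeholder):
#   report=$(hdfs fsck /user/hive/warehouse)
#   [ "$(echo "$report" | decide_action)" = delete ] && \
#       hdfs fsck /user/hive/warehouse -delete
```

Gating the destructive `-delete` run on an explicit check keeps a healthy tree from ever being touched.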

NameNode service crash

The log shows the following:

src=/flink/job/sjzt/checkpoints/5a4be983e14b9538d344b7fb9584cded/chk-1435	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:19,021 INFO  [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.23	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/4aafc4e3232f8ffa015aa42500c3ac7f/chk-24012	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:19,847 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:20,648 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:20,795 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:20,795 INFO  [IPC Server handler 2 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.22	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/d16a0d39cd2693e12af0d2abbdf7b2fb/chk-26898	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:20,848 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:21,395 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:21,649 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 7002 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:21,848 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:22,110 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:22,110 INFO  [IPC Server handler 0 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.35	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/74eb2abea93b52545b7e6ba10a962df1/chk-14970	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:22,587 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:22,650 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 8003 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:22,848 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:23,075 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:23,075 INFO  [IPC Server handler 9 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.51	cmd=create	src=/user/hive/warehouse/xy_ods.db/ods_bigquery_contract_new/pk_year=2022/pk_month=2022-08/pk_day=2022-08-22/bigquery_contract_new.1661142083054.tmp	dst=null	perm=hadoop:hadoop:rw-r--r--	proto=rpc
2022-08-22 12:21:23,651 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 9004 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:23,850 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:24,203 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:24,652 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 10005 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:24,850 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:25,068 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:25,166 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:25,167 INFO  [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.21	cmd=delete	src=/user/hive/warehouse/iceberg_ods.db/ods_nft_listing/metadata/19b9ce1f689833f8b96925446296d8e8-00000-46952-370054-00001.avro	dst=null	perm=null	proto=rpc
2022-08-22 12:21:25,652 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 11006 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:25,765 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:25,765 INFO  [IPC Server handler 4 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.34	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/353ca0c605b10433daf2e88ce9a9feb1/chk-601	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:25,850 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:26,654 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 12007 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]
2022-08-22 12:21:26,851 INFO  [Logger channel (from single-thread executor) to cdh192-57/172.20.192.57:8485] ipc.Client (Client.java:handleConnectionFailure(958)) - Retrying connect to server: cdh192-57/172.20.192.57:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-22 12:21:27,655 INFO  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(188)) - Waited 13008 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:28,186 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:28,186 INFO  [IPC Server handler 8 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.35	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/8e31c6b93b5ff61a0c811066fb666dab/chk-1494	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:28,656 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 14009 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:29,259 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:29,259 INFO  [IPC Server handler 3 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.35	cmd=delete	src=/flink/job/sjzt/checkpoints/19b9ce1f689833f8b96925446296d8e8/chk-370055	dst=null	perm=null	proto=rpc
2022-08-22 12:21:29,649 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:29,649 INFO  [IPC Server handler 5 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.51	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/fe154b52ce78ac3c9568314c424cc0eb/chk-44218	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:29,657 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 15010 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:29,789 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:29,789 INFO  [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.60	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/b99c1b359b92a91573d25107708f62ef/chk-1460	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:30,658 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 16011 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:30,951 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:31,659 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 17012 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:32,660 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 18013 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:32,927 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:33,087 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:33,661 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 19014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [172.20.192.56:8485]. Exceptions so far: [172.20.192.57:8485: Journal disabled until next roll]
2022-08-22 12:21:34,229 INFO  [Socket Reader #1 for port 8020] authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(138)) - Authorization successful for hadoop (auth:SIMPLE) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-08-22 12:21:34,229 INFO  [IPC Server handler 7 on default port 8020] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7990)) - allowed=true	ugi=hadoop (auth:SIMPLE)	ip=/172.20.192.59	cmd=mkdirs	src=/flink/job/sjzt/checkpoints/fa0629a9682efc2a685d5f29b665a5fc/chk-157	dst=null	perm=hadoop:hadoop:rwxr-xr-x	proto=rpc
2022-08-22 12:21:34,647 FATAL [FSEditLogAsync] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [172.20.192.56:8485, 172.20.192.57:8485, 172.20.192.58:8485], stream=QuorumOutputStream starting at txid 1010629910))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:138)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:113)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:115)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:109)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:713)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:243)
	at java.lang.Thread.run(Thread.java:748)
2022-08-22 12:21:34,647 WARN  [FSEditLogAsync] client.QuorumJournalManager (QuorumOutputStream.java:abort(73)) - Aborting QuorumOutputStream starting at txid 1010629910
2022-08-22 12:21:34,652 INFO  [FSEditLogAsync] util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [172.20.192.56:8485, 172.20.192.57:8485, 172.20.192.58:8485], stream=QuorumOutputStream starting at txid 1010629910))
2022-08-22 12:21:34,656 INFO  [shutdown-hook-0] namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cdh192-56/172.20.192.56
************************************************************/

From the log we can see that the QJM connection was dropped once it exceeded the 20 s maximum:
client.QuorumJournalManager (QuorumCall.java:waitFor(186)) - Waited 19014 ms (timeout=20000 ms)
The NameNode went down because a long full GC pause broke its communication with the JournalNodes.
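A long full GC can be confirmed before (or after) a crash by watching the GC counters on the NameNode JVM. `jps` and `jstat` are standard JDK tools; the `namenode_pid` helper below is our own illustration of extracting the pid from `jps` output.

```shell
# Pull the NameNode pid out of `jps`-style "pid ClassName" lines on stdin.
namenode_pid() { awk '$2 == "NameNode" {print $1}'; }

# On the NameNode host:
#   pid=$(jps | namenode_pid)
#   jstat -gcutil "$pid" 1000   # rapidly growing FGC/FGCT columns point to full-GC pressure
```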

Reference:
https://blog.csdn.net/weixin_39445556/article/details/104712157
Fix:
1) Increase the QJM write timeout from the default 20 s to 2 minutes.
Add the following to hdfs-site.xml:

<property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>120000</value>
</property>

2) Increase the NameNode heap; we raised it from the default to 80 GB.
Ever since we adopted Flink + Iceberg, the small-file count has skyrocketed and the block count has ballooned to more than 30 million.
Edit hadoop-env.sh (around line 52):

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Xms80G -Xmx80G -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms10G -Xmx10G -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
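After restarting the services it is worth confirming the new flags actually took effect, since hadoop-env.sh edits are easy to get wrong. A quick sanity check, assuming a Linux host with the JDK tools installed; the `heap_flags` helper is our own illustration, not a Hadoop command.

```shell
# Extract -Xms/-Xmx flags from a JVM command line on stdin.
heap_flags() { grep -oE -- '-Xm[sx][0-9]+[GgMmKk]'; }

# On a live node, against the running NameNode process:
#   ps -o args= -p "$(jps | awk '$2 == "NameNode" {print $1}')" | heap_flags
```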


DataNode service occasionally dies

The error message looks like this:

2022-08-22 18:32:06,108 INFO  [Async disk worker #1384 for volume /dfs/data5] impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(333)) - Deleted BP-1555553207-10.0.50.200-1625229209582 blk_1201618387_127897013 URI file:/dfs/data5/current/BP-1555553207-10.0.50.200-1625229209582/current/finalized/subdir31/subdir29/blk_1201618387
2022-08-22 18:32:06,108 WARN  [BP-1555553207-10.0.50.200-1625229209582 heartbeating to cdh192-56/172.20.192.56:8020] datanode.DataNode (BPServiceActor.java:run(855)) -Unexpected exception in block pool Block pool BP-1555553207-10.0.50.200-1625229209582 (Datanode Uuid 7a182b3f-caa2-4f13-8f87-6781fe3d9e46) service to cdh192-56/172.20.192.56:8020
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2115)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:734)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:881)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:676)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
        at java.lang.Thread.run(Thread.java:748)
2022-08-22 18:32:06,108 WARN  [BP-1555553207-10.0.50.200-1625229209582 heartbeating to cdh192-56/172.20.192.56:8020] datanode.DataNode (BPServiceActor.java:run(858)) -Ending block pool service for: Block pool BP-1555553207-10.0.50.200-1625229209582 (Datanode Uuid 7a182b3f-caa2-4f13-8f87-6781fe3d9e46) service to cdh192-56/172.20.192.56:8020
2022-08-22 18:32:06,128 INFO  [PacketResponder: BP-1555553207-10.0.50.200-1625229209582:blk_1201617661_127896282, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[172.20.192.60:9866]] DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1533)) - src: /172.20.192.51:45422, dest: /172.20.192.59:9866, bytes: 7783726, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1109828295_43, offset: 0, srvID: 7a182b3f-caa2-4f13-8f87-6781fe3d9e46, blockid: BP-1555553207-10.0.50.200-1625229209582:blk_1201617661_127896282, duration(ns): 126218643103


2022-08-22 18:32:10,372 INFO  [DataXceiver for client DFSClient_NONMAPREDUCE_-585719459_66 at /172.20.192.51:46504 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618421_127897047]] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-08-22 18:32:10,373 ERROR [DataXceiver for client DFSClient_NONMAPREDUCE_-585719459_66 at /172.20.192.51:46504 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618421_127897047]] datanode.DataNode (DataXceiver.java:run(324)) - cdh192-59:9866:DataXceiver error processing WRITE_BLOCK operation  src: /172.20.192.51:46504 dst: /172.20.192.59:9866
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:968)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:908)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
        at java.lang.Thread.run(Thread.java:748)
2022-08-22 18:32:11,329 INFO  [DataXceiver for client DFSClient_NONMAPREDUCE_821955818_84 at /172.20.192.37:40696 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618425_127897052]] datanode.DataNode (DataXceiver.java:writeBlock(747)) - Receiving BP-1555553207-10.0.50.200-1625229209582:blk_1201618425_127897052 src: /172.20.192.37:40696 dest: /172.20.192.59:9866
2022-08-22 18:32:11,329 INFO  [DataXceiver for client DFSClient_NONMAPREDUCE_821955818_84 at /172.20.192.37:40696 [Receiving block BP-1555553207-10.0.50.200-1625229209582:blk_1201618425_127897052]] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false,remoteHostTrusted = false

Likewise, increase the DataNode heap, to 10 GB, using the same hadoop-env.sh change shown above.
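One caveat on the error above: "java.lang.OutOfMemoryError: unable to create new native thread" is often hit when the OS-level thread/process limit for the hadoop user is exhausted, not only when the heap is too small, so it is worth checking those limits alongside the heap change. A hypothetical diagnostic, assuming a Linux host; `thread_count` is our own helper for reading /proc status output.

```shell
# Print the Threads: value from a /proc/<pid>/status dump on stdin.
thread_count() { awk '/^Threads:/ {print $2}'; }

# On the DataNode host:
#   ulimit -u                               # nproc limit for the current user
#   thread_count < /proc/<datanode-pid>/status
```

If the thread count sits near the `ulimit -u` value, raising `nproc` in /etc/security/limits.conf for the hadoop user is the more direct fix.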
