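This article walks through the FISCO BCOS PBFT consensus process using real logs from a running environment. The material comes from troubleshooting a case in which concurrent contract execution prevented consensus from being reached.
The affected environment had 4 nodes, each with 4 CPU cores, 16 GB of memory and 500 GB of disk. The symptom was that consensus could not be reached. Preliminary analysis suggested that concurrent (parallel) execution caused nodes to compute different hashes for the same block of transactions.
This article shows a normal consensus round. A companion article shows the failing case.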
3.1 The leader packs the block
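In the PBFT consensus algorithm, consensus nodes take turns producing blocks. In each round only one leader packs a block, and the leader index is computed as (block_number + current_view) % consensus_node_num.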
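A minimal sketch of the leader-selection formula. It assumes block_number is the height of the latest committed block (25137 in the logs below) rather than the block being sealed; with that reading the formula reproduces nodeIdx=2 from the "Generating seal on" log line. The function name leader_index is hypothetical and not part of the FISCO BCOS API.

```python
def leader_index(block_number: int, current_view: int, consensus_node_num: int) -> int:
    """Index of the node expected to pack the next block in the given view."""
    if consensus_node_num <= 0:
        raise ValueError("consensus_node_num must be positive")
    return (block_number + current_view) % consensus_node_num


# Values from the node-167 log: latest committed block 25137,
# view 25 (after the change from view 24), 4 consensus nodes.
print(leader_index(25137, 25, 4))  # 2, matching nodeIdx=2 in "Generating seal on"

# Had the view stayed at 24, a different node would have been leader.
print(leader_index(25137, 24, 4))  # 1
```

Because the view is part of the sum, every view change rotates leadership to the next node even when the block height has not advanced.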
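When a node finds that the current leader index equals its own index, it starts packing a block. Block packing is done mainly by the PBFTSealer thread, whose main work is shown in the figure below: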
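Log from node 167 (nodeIdx=2, myNode=b40c45de...):
Note: the nodes reached consensus on a view change (view 24 → 25)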
info|2020-06-11 05:48:37.916823|[g:1][CONSENSUS][PBFT]checkAndChangeView: Reach consensus,org_view=24,cur_changeCycle=20,to_view=25
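Note: the ViewChange message was handled successfully. A view change at all suggests that something abnormal happened earlier in this round.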
debug|2020-06-11 05:48:37.917612|[g:1][CONSENSUS][PBFT]handleViewChangeMsg Succ ,reqNum=25137,curNum=25137,GenIdx=1,Cview=25,view=24,fromIdx=1,fromNode=6d8de0ea...,fromIp=172.16.18.168:53130,hash=ad451194...,nodeIdx=2,myNode=b40c45de...
Note: enough transactions are available, so sealing starts
debug|2020-06-11 05:48:37.924489|[g:1][CONSENSUS][SEALER][checkTxsEnough] Tx enough: [txNum]: 2
Note: seal generated: block number 25138, 2 transactions, node index 2, hash e952...
info|2020-06-11 05:48:37.925729|[g:1][CONSENSUS][SEALER]++++++++++++++++ Generating seal on,blkNum=25138,tx=2,nodeIdx=2,hash=e952d39e...
3.2 Pre-prepare phase
After receiving a Prepare packet, a consensus node enters the pre-prepare phase. The main steps in this phase are:
# Cache the Prepare request
info|2020-06-11 05:48:37.928782|[g:1][CONSENSUS]addRawPrepare,height=25138,reqIdx=2,hash=e952d39e...,time=0
# Execute the block carried in the Prepare request
info|2020-06-11 05:48:37.929061|[g:1][BLOCKVERIFIER][executeBlock]Executing block,txNum=2,num=25138,parentHash=ad45119443cdbf6ac46f130d7c05019acf757346640b87ff334482309b36ad71,parentNum=25137,parentStateRoot=49b56f731da59cb104a6f04f6c5e5c0561033bbace066f9c58f0f7ee70c330d9
# Details of executing the Prepare block
debug|2020-06-11 05:48:37.941330|[g:1][BLOCKVERIFIER][executeBlock]Para execute block takes,time(ms)=12,txNum=2,blockNumber=25138,blockHash=03b7b162e64602593f2d5dae1b2c85c162eb96ea2c3025c5529cc825f74559ad,stateRoot=4a4b62461bda534a4d374b62aca0c94286807154fbd8a36778ff15da2adc154c,dbHash=4a4b62461bda534a4d374b62aca0c94286807154fbd8a36778ff15da2adc154c,transactionRoot=af096a97ee83e7aeb5ae403a4aab2a85a2064fa889fb41c42abe695b04dfdaee,receiptRoot=c673b2f49d39a6fd9fe9cc19a4392d7a31f7716a3a06091a193f7c78a5524bee,initExeCtxTimeCost=1,perpareBlockTimeCost=0,initDagTimeCost=2,exeTimeCost=8,getRootHashTimeCost=0,setAllReceiptTimeCost=0,getReceiptRootTimeCost=1,setStateRootTimeCost=0
# Execution summary reported by the PBFT engine (execBlock)
info|2020-06-11 05:48:37.941400|[g:1][CONSENSUS][PBFT]execBlock,blkNum=25138,reqIdx=2,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...,decodeCost=0,checkCost=0,notifyCost=0,noteSealingCost=0,currentCycle=20,verifyAndSetSenderCost=0,execCost=13,execPerTx=6.5,totalCost=13
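# Re-generate the prepare request because the block has now been executed. Execution fills in stateRoot, receiptRoot and the other roots, so the block hash changes from e952d39e... to 03b7b162...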
debug|2020-06-11 05:48:37.941811|[g:1]Re-generate prepare_requests since block has been executed, time = 1591825717941 , block_hash: 03b7b162...
# Cache the executed block and broadcast the sign request
debug|2020-06-11 05:48:37.942747|[g:1][CONSENSUS][PBFT]handlePrepareMsg: add prepare cache and broadcastSignReq,reqNum=25138,hash=03b7b162...,nodeIdx=2,addPrepareTime=1,myNode=b40c45de...
# Prepare message handled successfully
info|2020-06-11 05:48:37.943363|[g:1][CONSENSUS][PBFT]handlePrepareMsg Succ,Timecost=14.567,INFO=handlePrepareMsg,reqIdx=2,view=25,reqNum=25138,curNum=25137,consNum=25138,curView=25,fromIp=self,hash=e952d39e...,nodeIdx=2,myNode=b40c45de...,curChangeCycle=20
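The hash change in these logs, from e952d39e... for the sealed block to 03b7b162... after execution, is the crux of the problem this article investigates: execution results are part of the hashed header. The toy model below illustrates this, and why nodes whose parallel execution diverges can no longer agree. ToyHeader, execute and the SHA-256 hashing are assumptions for illustration only; FISCO BCOS uses its own header encoding and hash algorithm, and real headers carry more fields.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyHeader:
    """Simplified block header (hypothetical; not FISCO BCOS's header layout)."""
    number: int
    parent_hash: str
    transactions_root: str
    state_root: str = ""
    receipts_root: str = ""

    def block_hash(self) -> str:
        # Every field is hashed, so changing any root changes the block hash.
        payload = "|".join([str(self.number), self.parent_hash,
                            self.transactions_root, self.state_root,
                            self.receipts_root])
        return hashlib.sha256(payload.encode()).hexdigest()


def execute(header: ToyHeader, state_root: str, receipts_root: str) -> ToyHeader:
    """Stand-in for block execution: fill in the roots produced by running the txs."""
    return replace(header, state_root=state_root, receipts_root=receipts_root)


sealed = ToyHeader(25138, "ad451194", "af096a97")
executed = execute(sealed, state_root="4a4b6246", receipts_root="c673b2f4")
print(sealed.block_hash() != executed.block_hash())  # True, like e952d39e... -> 03b7b162...

# A node whose non-deterministic execution yields another stateRoot
# (hypothetical value) ends up signing a different block hash.
diverged = execute(sealed, state_root="deadbeef", receipts_root="c673b2f4")
print(executed.block_hash() == diverged.block_hash())  # False
```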
3.3 Prepare phase
After receiving a Sign packet, a consensus node enters the Prepare phase. The main steps in this phase are:
# Generate the local prepare
info|2020-06-11 05:48:37.943408|[g:1][CONSENSUS][PBFT]generateLocalPrepare,hash=e952d39e...,H=25138,nodeIdx=2,myNode=b40c45de...
# Sign message handled successfully
info|2020-06-11 05:48:37.950536|[g:1][CONSENSUS][PBFT]handleSignMsg Succ,Timecost=1.33,INFO=handleSignMsg,num=25138,curNum=25137,GenIdx=3,Sview=25,view=25,fromIdx=3,fromNode=f25acaf6...,fromIp=172.16.18.175:58230,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...
# Enough sign messages collected (sigSize=3)
debug|2020-06-11 05:48:37.950973|[g:1][CONSENSUS][PBFT]checkAndCommit, SignReq enough,number=25138,sigSize=3,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...
# Back up and update the committed prepare
info|2020-06-11 05:48:37.950989|[g:1][CONSENSUS][PBFT]checkAndCommit: backup/updateCommittedPrepare,reqNum=25138,hash=e952d39e...,nodeIdx=2,myNode=b40c45de...
# Broadcast the commit packet
debug|2020-06-11 05:48:37.951003|[g:1][CONSENSUS][PBFT]checkAndCommit: broadcastCommitReq,prepareHeight=25138,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...
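The "SignReq enough" check can be pictured as counting signatures per block hash. This sketch assumes the standard PBFT bound of f = (n - 1) // 3 tolerated faults and a quorum of n - f, which gives 3 for the 4-node network and is consistent with sigSize=3 in the log; it is a simplified model, not the engine's checkAndCommit code. The hash "aaaa0001" is a hypothetical divergent result.

```python
from collections import defaultdict


def min_valid_nodes(node_num: int) -> int:
    """PBFT quorum: with f = (n - 1) // 3 tolerated faults, n - f votes are required."""
    f = (node_num - 1) // 3
    return node_num - f


def hashes_with_quorum(signs, node_num):
    """Group (from_idx, block_hash) sign messages by hash; return hashes that reach quorum."""
    voters = defaultdict(set)
    for from_idx, block_hash in signs:
        voters[block_hash].add(from_idx)
    need = min_valid_nodes(node_num)
    return sorted(h for h, idxs in voters.items() if len(idxs) >= need)


# Normal round: all four nodes executed block 25138 to the same hash.
normal = [(2, "03b7b162"), (3, "03b7b162"), (1, "03b7b162"), (0, "03b7b162")]
print(min_valid_nodes(4))             # 3
print(hashes_with_quorum(normal, 4))  # ['03b7b162']

# Failing round: execution diverged, signatures split across two hashes,
# neither reaches 3, and no commit is ever broadcast.
split = [(2, "03b7b162"), (3, "03b7b162"), (1, "aaaa0001"), (0, "aaaa0001")]
print(hashes_with_quorum(split, 4))   # []
```

Keying votes by hash is what turns an execution mismatch into a consensus stall: the signatures are all valid, they simply vote for different blocks.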
3.4 Commit phase
After receiving a Commit packet, a consensus node enters the Commit phase. The steps in this phase are:
# Commit message handled successfully
info|2020-06-11 05:48:37.951799|[g:1][CONSENSUS][PBFT]handleCommitMsg Succ,INFO=handleCommitMsg,reqNum=25138,curNum=25137,GenIdx=3,Cview=25,view=25,fromIdx=3,fromNode=f25acaf6...,fromIp=172.16.18.175:58230,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...,Timecost=0.457
# Enough commit messages collected
info|2020-06-11 05:48:37.958677|[g:1][CONSENSUS][PBFT]checkAndSave: CommitReq enough,prepareHeight=25138,commitSize=3,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...
# Write the transactions
debug|2020-06-11 05:48:37.959202|[g:1][BLOCKCHAIN][WriteTxOnCommit]Write tx to block time record,openTableTimeCost=0,constructVectorTimeCost=0,insertTableTimeCost=0,encodeNonceVectorTimeCost=0,insertNonceVectorTimeCost=0,totalTimeCost=0
# Block committed successfully
info|2020-06-11 05:48:37.961355|[g:1][CONSENSUS][PBFT]CommitBlock Succ,prepareHeight=25138,reqIdx=2,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...,genSigTimeCost=0,commitBlockTimeCost=2,dropTxsTimeCost=1,noteSealingTimeCost=0,totalTimeCost=3
# Delete the cache
debug|2020-06-11 05:48:37.962180|[g:1][CONSENSUS]delCache,hash=03b7b162...
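# Commit from 168 handled successfully. Commits should be collected in parallel, so why does this line appear after the block was committed? The timestamps suggest an answer: handling of this message started at about 05:48:37.958229 (.962205 minus Timecost=3.976 ms), just before checkAndSave. Presumably it was the third commit (together with this node's own and the one from 175, giving commitSize=3), so handling it triggered checkAndSave and CommitBlock, and the "Succ" line is written only after all of that finished.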
info|2020-06-11 05:48:37.962205|[g:1][CONSENSUS][PBFT]handleCommitMsg Succ,INFO=handleCommitMsg,reqNum=25138,curNum=25137,GenIdx=1,Cview=25,view=25,fromIdx=1,fromNode=6d8de0ea...,fromIp=172.16.18.168:53130,hash=03b7b162...,nodeIdx=2,myNode=b40c45de...,Timecost=3.976
# Report the new block
info|2020-06-11 05:48:37.962575|[g:1][CONSENSUS][PBFT]^^^^^^^^Report,num=25138,sealerIdx=2,hash=03b7b162...,next=25139,tx=2,nodeIdx=2
# Reset the sealing state now that block 25138 has been reported
debug|2020-06-11 05:48:37.962592|[g:1][CONSENSUS][SEALER][reportNewBlock] Reset sealing: [number]: 25138, sealing number:25138
# Reset the block being sealed
debug|2020-06-11 05:48:37.962602|[g:1][CONSENSUS][SEALER][resetSealingBlock],blkNum=25138,sealingNum=25138
# Move on to sealing the next block (25139)
debug|2020-06-11 05:48:37.962694|[g:1][CONSENSUS][SEALER]resetCurrentBlock to,sealingNum=25139
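The log ordering in this phase can be reproduced with a toy model. It assumes, as the timestamps suggest, that handling a commit message runs the checkAndSave quorum check before writing its own "handleCommitMsg Succ" line, and that the node's own commit counts toward commitSize. CommitTracker and its methods are hypothetical names, not FISCO BCOS code.

```python
class CommitTracker:
    """Toy model of the commit phase on a single node (hypothetical)."""

    def __init__(self, node_num: int, my_idx: int, height: int, block_hash: str):
        f = (node_num - 1) // 3
        self.quorum = node_num - f          # 3 for a 4-node network
        self.height = height
        self.block_hash = block_hash
        self.commits = {my_idx}             # assumption: own commit is counted
        self.saved = False
        self.log = []

    def check_and_save(self) -> None:
        """Commit the block once enough matching commits have arrived."""
        if not self.saved and len(self.commits) >= self.quorum:
            self.log.append(f"checkAndSave: CommitReq enough,commitSize={len(self.commits)}")
            self.log.append(f"CommitBlock Succ,prepareHeight={self.height}")
            self.saved = True

    def handle_commit_msg(self, from_idx: int, block_hash: str) -> None:
        if block_hash != self.block_hash:
            self.log.append(f"reject commit from {from_idx}: hash mismatch")
            return
        self.commits.add(from_idx)
        self.check_and_save()               # may commit the block right here...
        self.log.append(f"handleCommitMsg Succ,fromIdx={from_idx}")  # ...so this is logged last


tracker = CommitTracker(node_num=4, my_idx=2, height=25138, block_hash="03b7b162")
tracker.handle_commit_msg(3, "03b7b162")    # from 172.16.18.175
tracker.handle_commit_msg(1, "03b7b162")    # from 172.16.18.168
print("\n".join(tracker.log))
```

The printed order matches the real log: the "Succ" line for 175 first, then "CommitReq enough" and "CommitBlock Succ", and only then the "Succ" line for the 168 message that triggered the commit.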
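3.5 View change handling
When a PBFT three-phase consensus round times out, or a node receives an empty block, PBFTEngine tries to switch to a higher view (incrementing the target view toView by one) and triggers the ViewChange procedure. A node that receives a ViewChange packet also triggers the ViewChange procedure: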
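A simplified sketch of the flow just described: a timeout (or empty block) bumps toView by one and casts the node's own vote, ViewChange messages are collected per target view, and the view switches once enough nodes agree. The class and method names are hypothetical, and the n - f quorum for view changes is an assumption carried over from the earlier phases; the real engine's bookkeeping (for example changeCycle) is omitted.

```python
from collections import defaultdict


class ViewChangeModel:
    """Toy model of the view-change flow (hypothetical, not FISCO BCOS's code)."""

    def __init__(self, node_num: int, my_idx: int, view: int):
        f = (node_num - 1) // 3
        self.quorum = node_num - f          # assumed quorum, as in the other phases
        self.my_idx = my_idx
        self.view = view
        self.to_view = view
        self.votes = defaultdict(set)       # target view -> nodes asking for it

    def on_timeout_or_empty_block(self) -> int:
        """Ask for a higher view: increment toView and record our own vote."""
        self.to_view += 1
        self.handle_view_change_msg(self.my_idx, self.to_view)
        return self.to_view

    def handle_view_change_msg(self, from_idx: int, to_view: int) -> None:
        if to_view <= self.view:
            return                          # stale request for a view already reached
        self.votes[to_view].add(from_idx)
        self.check_and_change_view(to_view)

    def check_and_change_view(self, to_view: int) -> None:
        if len(self.votes[to_view]) >= self.quorum:
            self.view = to_view
            self.to_view = max(self.to_view, to_view)
            for v in [v for v in self.votes if v <= to_view]:
                del self.votes[v]           # votes for reached views are no longer needed


node = ViewChangeModel(node_num=4, my_idx=2, view=24)
node.on_timeout_or_empty_block()            # toView becomes 25, own vote recorded
node.handle_view_change_msg(3, 25)
print(node.view)                            # 24: only 2 of the 3 required votes
node.handle_view_change_msg(1, 25)
print(node.view)                            # 25, cf. "Reach consensus,org_view=24,...,to_view=25"
print((25137 + node.view) % 4)              # 2: node 2 becomes leader for block 25138
```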