Understanding the Clique consensus flow and the fix for bug #17620

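1 Properties of the Clique consensus mechanism

Clique is Ethereum's implementation of proof-of-authority (PoA) consensus. It was proposed after the attack on Ropsten to support an Ethereum test network (you can also use it to build your own consortium chain or private chain). Clique has the following properties:

  • No mining is required; a set of pre-designated nodes take turns producing blocks
  • Node management: nodes can be added or removed by voting
  • The block period is fixed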

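2 Reading the core Clique source code

The version used is the latest go-ethereum, 1.8.7. The Clique source lives in the go-ethereum/consensus/clique directory and consists of api.go, clique.go and snapshot.go. api.go mainly contains the RPC methods, clique.go holds the core of the consensus algorithm, and snapshot.go implements block snapshots, which act as a second-level cache. Below we read the source to see how Clique implements the properties listed above.

Related structures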

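type Clique struct {
    config *params.CliqueConfig // Consensus engine configuration parameters, see CliqueConfig below
    db     ethdb.Database       // Database to store and retrieve snapshot checkpoints

    recents    *lru.ARCCache // Snapshots for recent blocks, to speed up reorgs
    signatures *lru.ARCCache // Signatures of recent blocks, to speed up mining

    proposals map[common.Address]bool // Proposals we are currently pushing: address -> authorize (true) or drop (false)

    signer common.Address // Ethereum address of the signing key
    signFn SignerFn       // Signer function used to authorize hashes
    lock   sync.RWMutex   // Protects the signer fields
}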

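// CliqueConfig holds the consensus engine configuration for PoA sealing.
type CliqueConfig struct {
    Period uint64 `json:"period"` // Number of seconds to enforce between blocks (time elapsed since the previous block)
    Epoch  uint64 `json:"epoch"`  // Epoch length, after which votes are reset and a checkpoint is made
}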

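// Snapshot is the state of the authorization voting at a given point in time.
type Snapshot struct {
    config   *params.CliqueConfig // Configuration parameters
    sigcache *lru.ARCCache        // Cache of recent block signatures, to speed up ecrecover

    Number  uint64                      `json:"number"`  // Block number where the snapshot was created
    Hash    common.Hash                 `json:"hash"`    // Block hash where the snapshot was created
    Signers map[common.Address]struct{} `json:"signers"` // Set of authorized signers at this moment
    Recents map[uint64]common.Address   `json:"recents"` // Recent signers, keyed by the block number they signed
    Votes   []*Vote                     `json:"votes"`   // List of votes cast, in chronological order
    Tally   map[common.Address]Tally    `json:"tally"`   // Current vote tally, to avoid recalculating
}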

Clique's Seal method

// Seal implements consensus.Engine; it tries to create a sealed block using the local signing credentials
func (c *Clique) Seal(chain consensus.ChainReader, block *types.Block, results chan<- *types.Block, stop <-chan struct{}) error {  
    header := block.Header()  
  
    // Sealing the genesis block is not supported
    number := header.Number.Uint64()  
    if number == 0 {  
        return errUnknownBlock  
    }  
    // For 0-period chains, refuse to seal blocks that contain no transactions
    if c.config.Period == 0 && len(block.Transactions()) == 0 {  
        log.Info("Sealing paused, waiting for transactions")  
        return nil  
    }  
    // Don't hold the signer fields (and their lock) for the entire sealing procedure
    c.lock.RLock()  
    signer, signFn := c.signer, c.signFn  
    c.lock.RUnlock()  
  
    // Fetch the snapshot of the parent block with the snapshot method
    snap, err := c.snapshot(chain, number-1, header.ParentHash, nil)  
    if err != nil {  
        return err  
    }  
          
    // Use the snapshot to check that the signer is authorized
    if _, authorized := snap.Signers[signer]; !authorized {  
        return errUnauthorizedSigner  
    }  
    // If we signed a block recently, wait for a later block
    for seen, recent := range snap.Recents {
        if recent == signer {
            // Signer is among the recent signers, only wait if the current block doesn't shift it out
            if limit := uint64(len(snap.Signers)/2 + 1); number < limit || seen > number-limit {  
                log.Info("Signed recently, must wait for others")  
                return nil  
            }  
        }  
    }  
    // All checks passed: the protocol permits us to sign this block, wait for our time slot
    delay := time.Unix(header.Time.Int64(), 0).Sub(time.Now()) // nolint: gosimple  
    if header.Difficulty.Cmp(diffNoTurn) == 0 {  
        // It's not our turn explicitly to sign, delay it a bit  
        wiggle := time.Duration(len(snap.Signers)/2+1) * wiggleTime  
        delay += time.Duration(rand.Int63n(int64(wiggle)))  
  
        log.Trace("Out-of-turn signing requested", "wiggle", common.PrettyDuration(wiggle))  
    }  
    // Sign the header
    sighash, err := signFn(accounts.Account{Address: signer}, sigHash(header).Bytes())  
    if err != nil {  
        return err  
    }  
    copy(header.Extra[len(header.Extra)-extraSeal:], sighash)  
    // Wait until sealing is stopped or the delay elapses
    log.Trace("Waiting for slot to sign and propagate", "delay", common.PrettyDuration(delay))  
    go func() {  
        select {  
        case <-stop:  
            return  
        case <-time.After(delay):  
        }  
  
        select {  
        // Send the sealed block to the results channel
        case results <- block.WithSeal(header):  
        default:  
            log.Warn("Sealing result is not read by miner", "sealhash", c.SealHash(header))  
        }  
    }()  
  
    return nil  
}  

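Seal first checks whether the node is currently allowed to seal a block. It fetches the snapshot snap of the parent block and uses it to check that the local signer is authorized. It then checks whether the local signer has signed a block recently; if so, it returns without sealing and waits until enough blocks from other signers have pushed it out of the recent-signers window. Keep in mind the snapshot function used here to fetch the snapshot.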

for seen, recent := range snap.Recents {
		if recent == signer {
			// Signer is among recents, only wait if the current block doesn't shift it out
			if limit := uint64(len(snap.Signers)/2 + 1); number < limit || seen > number-limit {
				log.Info("Signed recently, must wait for others")
				return nil
			}
		}
	}

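In this excerpt, if the current signer is in the recent-signers set, it has to wait. With limit = len(snap.Signers)/2+1, a signer may seal at most one block in any run of limit consecutive blocks. This gives every signer an equal chance to produce blocks and prevents a malicious signer from sealing blocks back to back.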
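The waiting rule can be isolated into a small sketch. The helper name mustWait and the use of plain strings as addresses are assumptions made for illustration; go-ethereum inlines this check in Seal, and only the condition itself is taken from the excerpt above.

```go
package main

import "fmt"

// mustWait reports whether signer has to skip block `number`.
// recents maps a block number to the address that signed it, like Snapshot.Recents.
// Hypothetical helper: not part of go-ethereum.
func mustWait(recents map[uint64]string, signer string, numSigners int, number uint64) bool {
	limit := uint64(numSigners/2 + 1)
	for seen, recent := range recents {
		if recent != signer {
			continue
		}
		// The signer stays blocked unless block `number` shifts its last block out of the window.
		if number < limit || seen > number-limit {
			return true
		}
	}
	return false
}

func main() {
	// Three signers, so limit = 3/2 + 1 = 2: at most one block out of every two.
	recents := map[uint64]string{10: "A"}
	fmt.Println(mustWait(recents, "A", 3, 11)) // true: A signed block 10 and must skip 11
	fmt.Println(mustWait(recents, "A", 3, 12)) // false: block 12 shifts block 10 out of the window
	fmt.Println(mustWait(recents, "B", 3, 11)) // false: B has not signed recently
}
```

With three signers, a signer that sealed block 10 is blocked for block 11 only, which matches the "one block per window of limit blocks" reading.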

When producing a new block, the signer delays sealing. For the in-turn (high-priority) signer, the block time is:

header.Time = new(big.Int).Add(parent.Time, new(big.Int).SetUint64(c.config.Period))

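This is set in the Prepare method in clique.go.

For an out-of-turn signer, a random extra delay is added to the block time. The delay is drawn from zero up to: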

time.Duration(len(snap.Signers)/2+1) * wiggleTime

Here wiggleTime is set to 500 ms.
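As a rough sketch of the delay calculation, assuming a hypothetical helper sealDelay that receives the time left until the header timestamp and a random-number function (both are illustrative names, not go-ethereum's API):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// wiggleTime matches the 500 ms value used by Clique.
const wiggleTime = 500 * time.Millisecond

// sealDelay mirrors the delay logic in Seal. untilSlot is the time left until the
// timestamp that Prepare wrote into the header; pick(n) must return a value in [0, n).
// Hypothetical helper: not part of go-ethereum.
func sealDelay(untilSlot time.Duration, inTurn bool, numSigners int, pick func(n int64) int64) time.Duration {
	delay := untilSlot
	if !inTurn {
		// Out-of-turn signers add a random extra delay in [0, wiggle).
		wiggle := time.Duration(numSigners/2+1) * wiggleTime
		delay += time.Duration(pick(int64(wiggle)))
	}
	return delay
}

func main() {
	untilSlot := 15 * time.Second // assumes Period = 15 and a parent sealed just now

	fmt.Println("in-turn:    ", sealDelay(untilSlot, true, 5, rand.Int63n))
	fmt.Println("out-of-turn:", sealDelay(untilSlot, false, 5, rand.Int63n))
}
```

With five signers the wiggle is 3 × 500 ms, so an out-of-turn signer waits up to 1.5 s longer than the in-turn signer, which normally lets the in-turn block propagate first.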

Clique's snapshot method

// snapshot retrieves the authorization snapshot at a given point in time
func (c *Clique) snapshot(chain consensus.ChainReader, number uint64, hash common.Hash, parents []*types.Header) (*Snapshot, error) {
	// Search for a snapshot in memory or on disk for checkpoints
	var (
		headers []*types.Header
		snap    *Snapshot
	)
	for snap == nil {
		// If an in-memory snapshot is found, use it
		if s, ok := c.recents.Get(hash); ok {
			snap = s.(*Snapshot)
			break
		}
		// If an on-disk checkpoint snapshot is found, use it
		if number%checkpointInterval == 0 {
			if s, err := loadSnapshot(c.config, c.signatures, c.db, hash); err == nil {
				log.Trace("Loaded voting snapshot from disk", "number", number, "hash", hash)
				snap = s
				break
			}
		}
		// If we're at the genesis block, or at a checkpoint whose parent is unknown, create a snapshot from it
		if number == 0 || (number%c.config.Epoch == 0 && chain.GetHeaderByNumber(number-1) == nil) {
			checkpoint := chain.GetHeaderByNumber(number)
			if checkpoint != nil {
				hash := checkpoint.Hash()

				signers := make([]common.Address, (len(checkpoint.Extra)-extraVanity-extraSeal)/common.AddressLength)
				for i := 0; i < len(signers); i++ {
					copy(signers[i][:], checkpoint.Extra[extraVanity+i*common.AddressLength:])
				}
				snap = newSnapshot(c.config, c.signatures, number, hash, signers)
				if err := snap.store(c.db); err != nil {
					return nil, err
				}
				log.Info("Stored checkpoint snapshot to disk", "number", number, "hash", hash)
				break
			}
		}
		// No snapshot for this header, gather the header and move backward
		var header *types.Header
		if len(parents) > 0 {
			// If explicit parents were supplied, pick the header from there
			header = parents[len(parents)-1]
			if header.Hash() != hash || header.Number.Uint64() != number {
				return nil, consensus.ErrUnknownAncestor
			}
			parents = parents[:len(parents)-1]
		} else {
			// No explicit parents (or none left), fetch the header from the database
			header = chain.GetHeader(hash, number)
			if header == nil {
				return nil, consensus.ErrUnknownAncestor
			}
		}
		headers = append(headers, header)
		number, hash = number-1, header.ParentHash
	}
	// Previous snapshot found; reverse the pending headers (oldest first) and apply them on top of it
	for i := 0; i < len(headers)/2; i++ {
		headers[i], headers[len(headers)-1-i] = headers[len(headers)-1-i], headers[i]
	}
	snap, err := snap.apply(headers) // build a new Snapshot object from the headers
	if err != nil {
		return nil, err
	}
	c.recents.Add(snap.Hash, snap) // cache the new snapshot in recents, keyed by its block hash

	// If we've generated a new checkpoint snapshot, save it to disk
	if snap.Number%checkpointInterval == 0 && len(headers) > 0 {
		if err = snap.store(c.db); err != nil {
			return nil, err
		}
		log.Trace("Stored voting snapshot to disk", "number", snap.Number, "hash", snap.Hash)
	}
	return snap, err
}

When new block headers arrive, snap.apply is used to build a snapshot for them on top of the nearest existing snapshot.

The apply method
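The overall pattern (walk back until a known snapshot is found, then replay the collected headers oldest first) can be shown with a toy in-memory model. The header and snapshot types and the snapshotAt function below are simplified stand-ins invented for this sketch; the real code also consults the disk, checkpoints and explicit parents.

```go
package main

import "fmt"

// header and snapshot are simplified stand-ins for types.Header and clique's Snapshot.
type header struct {
	number uint64
	signer string
}

type snapshot struct {
	number  uint64
	signers []string // signers of the blocks applied so far, oldest first
}

// snapshotAt walks back from `number` until a cached snapshot is found, then
// replays the skipped headers on top of it. chain is indexed by block number.
func snapshotAt(cache map[uint64]*snapshot, chain []header, number uint64) *snapshot {
	var pending []header
	n := number
	for cache[n] == nil {
		if n == 0 {
			cache[0] = &snapshot{number: 0} // the genesis block is trusted as-is
			break
		}
		pending = append(pending, chain[n]) // collected newest first
		n--
	}
	snap := cache[n]
	// Reverse so that the headers are applied oldest first.
	for i := 0; i < len(pending)/2; i++ {
		pending[i], pending[len(pending)-1-i] = pending[len(pending)-1-i], pending[i]
	}
	for _, h := range pending {
		signers := append(append([]string{}, snap.signers...), h.signer)
		snap = &snapshot{number: h.number, signers: signers}
	}
	cache[number] = snap
	return snap
}

func main() {
	chain := []header{{0, ""}, {1, "A"}, {2, "B"}, {3, "C"}}
	cache := map[uint64]*snapshot{}

	snap := snapshotAt(cache, chain, 3)
	fmt.Println(snap.number, snap.signers) // 3 [A B C]
}
```

A second call for the same block number returns the cached snapshot without touching the chain, which is the role the recents cache plays in the real method.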


// apply creates a new authorization snapshot by applying the given headers to the original one
func (s *Snapshot) apply(headers []*types.Header) (*Snapshot, error) {
	// No headers: return the snapshot unchanged
	if len(headers) == 0 {
		return s, nil
	}
	// Sanity check that the headers are contiguous and directly follow this snapshot
	for i := 0; i < len(headers)-1; i++ {
		if headers[i+1].Number.Uint64() != headers[i].Number.Uint64()+1 {
			return nil, errInvalidVotingChain
		}
	}
	if headers[0].Number.Uint64() != s.Number+1 {
		return nil, errInvalidVotingChain
	}
	// Work on a copy of the snapshot
	snap := s.copy()

	// Iterate over the headers
	for _, header := range headers {
		// Remove any votes on checkpoint blocks
		number := header.Number.Uint64()
		// At an Epoch checkpoint, clear the votes and the tally
		if number%s.config.Epoch == 0 {
			snap.Votes = nil
			snap.Tally = make(map[common.Address]Tally)
		}
		// Delete the oldest signer from the recent list so that it is allowed to sign again
		if limit := uint64(len(snap.Signers)/2 + 1); number >= limit {
			delete(snap.Recents, number-limit)
		}
		// Recover the signer address from the signature in the header
		signer, err := ecrecover(header, s.sigcache)
		if err != nil {
			return nil, err
		}
		// Check that the signer is authorized
		if _, ok := snap.Signers[signer]; !ok {
			return nil, errUnauthorizedSigner
		}
		// Reject the header if the signer has signed recently
		for _, recent := range snap.Recents {
			if recent == signer {
				return nil, errRecentlySigned
			}
		}
		snap.Recents[number] = signer

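		// Header authorized, discard any previous vote by this signer on the same address
		for i, vote := range snap.Votes {
			if vote.Signer == signer && vote.Address == header.Coinbase {
				// Uncast the vote from the cached tally
				snap.uncast(vote.Address, vote.Authorize)

				// Remove the vote from the chronological list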
				snap.Votes = append(snap.Votes[:i], snap.Votes[i+1:]...)
				break // only one vote allowed
			}
		}
		// Tally up the new vote
		var authorize bool
		switch {
		case bytes.Equal(header.Nonce[:], nonceAuthVote):
			authorize = true
		case bytes.Equal(header.Nonce[:], nonceDropVote):
			authorize = false
		default:
			return nil, errInvalidVote
		}
		if snap.cast(header.Coinbase, authorize) {
			snap.Votes = append(snap.Votes, &Vote{
				Signer:    signer,
				Block:     number,
				Address:   header.Coinbase,
				Authorize: authorize,
			})
		}
		// A vote passes once more than half of the signers support it; update the signer set accordingly
		if tally := snap.Tally[header.Coinbase]; tally.Votes > len(snap.Signers)/2 {
			if tally.Authorize {
				snap.Signers[header.Coinbase] = struct{}{}
			} else {
				delete(snap.Signers, header.Coinbase)

				// Signer list shrunk, delete any leftover recent caches
				if limit := uint64(len(snap.Signers)/2 + 1); number >= limit {
					delete(snap.Recents, number-limit)
				}
				// Discard any previous votes the deauthorized signer cast
				for i := 0; i < len(snap.Votes); i++ {
					if snap.Votes[i].Signer == header.Coinbase {
						// Uncast the vote from the cached tally
						snap.uncast(snap.Votes[i].Address, snap.Votes[i].Authorize)

						// Uncast the vote from the chronological list
						snap.Votes = append(snap.Votes[:i], snap.Votes[i+1:]...)

						i--
					}
				}
			}
			// Discard any previous votes around the just changed account
			for i := 0; i < len(snap.Votes); i++ {
				if snap.Votes[i].Address == header.Coinbase {
					snap.Votes = append(snap.Votes[:i], snap.Votes[i+1:]...)
					i--
				}
			}
			delete(snap.Tally, header.Coinbase)
		}
	}
	snap.Number += uint64(len(headers))
	snap.Hash = headers[len(headers)-1].Hash()

	return snap, nil
}

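This method updates the members of the snapshot from the block headers. One important part is the management of signers: the oldest signer is deleted from Recents, and the signer of the current block is added to it. The other part is vote handling, which happens entirely inside apply. As the code shows, existing votes are cleared at every Epoch checkpoint; Epoch defaults to 30000 blocks, which is also Clique's voting period. A proposal takes effect only once more than half of the current signers have voted for it.

The inturn method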
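The majority rule can be sketched in isolation. The tally type and the castVote helper below are simplified assumptions for illustration: they skip the checks that apply performs (authorized voter, one vote per signer and target, epoch reset) and only show the threshold votes > len(signers)/2.

```go
package main

import "fmt"

// tally is a simplified stand-in for the per-address count kept in Snapshot.Tally.
type tally struct {
	authorize bool
	votes     int
}

// castVote records one vote about target and returns true once the proposal has
// passed, i.e. once votes > len(signers)/2. Hypothetical helper, not go-ethereum's API.
func castVote(signers map[string]struct{}, tallies map[string]tally, target string, authorize bool) bool {
	t := tallies[target]
	t.authorize = authorize
	t.votes++
	tallies[target] = t
	if t.votes <= len(signers)/2 {
		return false
	}
	// Majority reached: update the signer set and drop the finished tally.
	if authorize {
		signers[target] = struct{}{}
	} else {
		delete(signers, target)
	}
	delete(tallies, target)
	return true
}

func main() {
	signers := map[string]struct{}{"A": {}, "B": {}, "C": {}}
	tallies := map[string]tally{}

	fmt.Println(castVote(signers, tallies, "D", true)) // false: 1 vote, need more than 3/2 = 1
	fmt.Println(castVote(signers, tallies, "D", true)) // true: 2 votes pass
	fmt.Println(len(signers))                          // 4
}
```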

// inturn returns if a signer at a given block height is in-turn or not.
func (s *Snapshot) inturn(number uint64, signer common.Address) bool {
	signers, offset := s.signers(), 0
	for offset < len(signers) && signers[offset] != signer {
		offset++
	}
	return (number % uint64(len(signers))) == uint64(offset)
}

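This method decides whether it is the given signer's turn to seal the block at the given height. s.signers() returns the authorized signers sorted in ascending order, so the signers simply take turns in round-robin order.

The CalcDifficulty function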
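A small sketch of the rotation, assuming plain strings in place of common.Address and a non-empty signer list (the standalone inturn function here is a simplified copy for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// inturn is a simplified version of Snapshot.inturn that uses strings as addresses.
// It assumes signers is non-empty.
func inturn(signers []string, number uint64, signer string) bool {
	sorted := append([]string{}, signers...)
	sort.Strings(sorted) // the real code also orders signers ascending
	offset := 0
	for offset < len(sorted) && sorted[offset] != signer {
		offset++
	}
	return number%uint64(len(sorted)) == uint64(offset)
}

func main() {
	signers := []string{"0xcc", "0xaa", "0xbb"}
	for number := uint64(1); number <= 6; number++ {
		for _, s := range signers {
			if inturn(signers, number, s) {
				fmt.Printf("block %d: in-turn signer %s\n", number, s)
			}
		}
	}
}
```

With these three addresses the in-turn signer cycles 0xbb, 0xcc, 0xaa for blocks 1, 2, 3 and then repeats. An address that is not in the list is never in turn, because its offset equals len(signers).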

func CalcDifficulty(snap *Snapshot, signer common.Address) *big.Int {
	if snap.inturn(snap.Number+1, signer) {
		return new(big.Int).Set(diffInTurn)
	}
	return new(big.Int).Set(diffNoTurn)
}

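If it is the node's turn to produce the block, the block's difficulty is 2 (diffInTurn); otherwise it is 1 (diffNoTurn). The chain with the highest total difficulty is chosen as the current chain.

3 Clique bug #17620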
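To see why this favors in-turn blocks, here is a minimal sketch that sums per-block difficulties for two competing forks. The totalDifficulty helper and the integer constants are illustrative assumptions (the real values are big.Int), not go-ethereum code.

```go
package main

import "fmt"

const (
	diffInTurn = 2 // difficulty of a block sealed by the in-turn signer
	diffNoTurn = 1 // difficulty of a block sealed out of turn
)

// totalDifficulty sums the per-block difficulties of a fork; inTurn[i] says
// whether block i was sealed by its in-turn signer. Hypothetical helper.
func totalDifficulty(inTurn []bool) int {
	td := 0
	for _, ok := range inTurn {
		if ok {
			td += diffInTurn
		} else {
			td += diffNoTurn
		}
	}
	return td
}

func main() {
	forkA := []bool{true, true, true}  // every block sealed in turn
	forkB := []bool{true, false, true} // one block sealed out of turn

	fmt.Println(totalDifficulty(forkA)) // 6
	fmt.Println(totalDifficulty(forkB)) // 5
}
```

Two forks of equal length differ in total difficulty as soon as one of them contains an out-of-turn block, so nodes converge on the fork with more in-turn blocks.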

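The bug appears in go-ethereum 1.8.14 and 1.8.15. A private chain created with Clique runs normally, but when a new node tries to join the chain, synchronization fails. In my case the error was reported at block 90001: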

########## BAD BLOCK #########
Chain config: {ChainID: 115 Homestead: 1 DAO:  DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople:  Engine: clique}

Number: 90001
Hash: 0xdcccdcf756f7c9e3fb5c8360bb98b2303c763126db14fb8ac499cb18ee71cd59


Error: unauthorized
##############################

This problem is discussed online:

https://ethereum.stackexchange.com/questions/60023/synchronisation-failed-dropping-peer-err-retrieved-hash-chain-is-invalid-me

The go-ethereum developer karalabe describes the bug as follows:

This is the fix for the Rinkeby consensus split.

When adding the light client checkpoint sync support for Rinkeby (Clique), we needed to relax the requirement that signing/voting snapshots are generated from previous blocks, and rather trust a standalone epoch block in itself, similar to how we trust the genesis (so light nodes can sync from there instead of verifying the entire header chain).

The oversight however was that the genesis block doesn't have previous signers (who can't sign currently), whereas checkpoint blocks do have previous signers. The checkpoint sync extension caused Clique nodes to discard previous signers at epoch blocks, allowing any authorized signer to seal the next block.

This caused signers running on v1.8.14 and v1.8.15 to create an invalid block, sealed by a node that already sealed recently and shouldn't have been allowed to do so, causing a consensus split between new nodes and old nodes.

This PR fixes the issue by making the checkpoint snapshot trust more strict, only ever trusting a snapshot block blindly if it's the genesis or if its parent is missing (i.e. we're starting sync from the middle of the chain, not the genesis). For all other scenarios, we still regenerate the snapshot ourselves along with the recent signer list.

Note, this hotfix does still mean that light clients are susceptible for the same bug - whereby they accept blocks signed by the wrong signers for a couple blocks - following a LES checkpoint, but that's fine because as long as full nodes correctly enforce the good chain, light clients can only ever import a couple bad blocks before the get stuck or switch to the properly validated chain. After len(signers) / 2 blocks after initial startup, light clients become immune tho this "vulnerability" as well.

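In short, versions v1.8.14 and v1.8.15 introduced this bug. It let a signer that had sealed recently, and should not have been allowed to sign yet, seal a block and thereby produce an invalid block. The other nodes on the chain accepted this invalid block at the time and wrote it to the chain, but a newly joining node reports an error when it verifies the block. The fix applies a stricter check when creating a snapshot: a snapshot is created directly from a block only for the genesis block, or when the parent block is missing (for example, when sync starts from the middle of the chain rather than from genesis). Upgrading to 1.8.16 resolves the problem.

Let's look at the snapshot-creation code in 1.8.15.

In the snapshot method in clique.go: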

func (c *Clique) snapshot(chain consensus.ChainReader, number uint64, hash common.Hash, parents []*types.Header) (*Snapshot, error) {
    ........
    
    // If we're at an checkpoint block, make a snapshot if it's known
		if number%c.config.Epoch == 0 {
			checkpoint := chain.GetHeaderByNumber(number)
			if checkpoint != nil {
				hash := checkpoint.Hash()

				signers := make([]common.Address, (len(checkpoint.Extra)-extraVanity-extraSeal)/common.AddressLength)
				for i := 0; i < len(signers); i++ {
					copy(signers[i][:], checkpoint.Extra[extraVanity+i*common.AddressLength:])
				}
				snap = newSnapshot(c.config, c.signatures, number, hash, signers)
				if err := snap.store(c.db); err != nil {
					return nil, err
				}
				log.Info("Stored checkpoint snapshot to disk", "number", number, "hash", hash)
				break
			}
		}

    ........
}

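Whenever the block is at an Epoch checkpoint, this code recreates the snapshot from the checkpoint header alone. The new snapshot starts with an empty list of recent signers, so a signer that has just signed is allowed to sign again right away. The fixed code, as it appears in v1.8.17, is: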

// snapshot retrieves the authorization snapshot at a given point in time.
func (c *Clique) snapshot(chain consensus.ChainReader, number uint64, hash common.Hash, parents []*types.Header) (*Snapshot, error) {
    ......
    
    // If we're at an checkpoint block, make a snapshot if it's known
		if number == 0 || (number%c.config.Epoch == 0 && chain.GetHeaderByNumber(number-1) == nil) {
			checkpoint := chain.GetHeaderByNumber(number)
			if checkpoint != nil {
				hash := checkpoint.Hash()

				signers := make([]common.Address, (len(checkpoint.Extra)-extraVanity-extraSeal)/common.AddressLength)
				for i := 0; i < len(signers); i++ {
					copy(signers[i][:], checkpoint.Extra[extraVanity+i*common.AddressLength:])
				}
				snap = newSnapshot(c.config, c.signatures, number, hash, signers)
				if err := snap.store(c.db); err != nil {
					return nil, err
				}
				log.Info("Stored checkpoint snapshot to disk", "number", number, "hash", hash)
				break
			}
		}

    ......
}

A snapshot is now created from scratch only for the genesis block, or at an Epoch checkpoint whose parent block is missing.
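The difference between the two conditions can be checked with a small sketch. The function names below are made up for illustration, and parentKnown stands in for chain.GetHeaderByNumber(number-1) != nil.

```go
package main

import "fmt"

// trustCheckpointOld mirrors the 1.8.15 condition: any epoch block is trusted blindly.
func trustCheckpointOld(number, epoch uint64) bool {
	return number%epoch == 0
}

// trustCheckpointFixed mirrors the fixed condition: only the genesis block, or an
// epoch block whose parent header is unknown, is trusted blindly.
func trustCheckpointFixed(number, epoch uint64, parentKnown bool) bool {
	return number == 0 || (number%epoch == 0 && !parentKnown)
}

func main() {
	const epoch = 30000

	// A node syncing from genesis reaches block 90000 with block 89999 already stored.
	fmt.Println(trustCheckpointOld(90000, epoch))           // true: recent signers are discarded
	fmt.Println(trustCheckpointFixed(90000, epoch, true))   // false: snapshot is rebuilt from headers
	fmt.Println(trustCheckpointFixed(90000, epoch, false))  // true: sync started mid-chain
	fmt.Println(trustCheckpointFixed(0, epoch, false))      // true: genesis
}
```

With Epoch = 30000, block 90000 is a checkpoint, which is consistent with the error above showing up at block 90001, the first block sealed after the recent-signer list was wrongly reset.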
