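P2P Peer Node Source Code Analysis

The p2p module

The p2p (peer-to-peer) module is responsible for bringing a node into the network and passing messages between nodes, which keeps the whole network running.

The p2p module consists of the following components: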

  • Peer
  • Connection
  • AddrBook
  • Switch
  • Transport
  • PexReactor

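The UML diagram below lists some of the basic fields and methods of each type; it covers only the parts I consider most important. The sections that follow walk through each component in turn.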

UML.jpg

The basic call relationships within the P2P module are shown in the following diagram:

p2p 流程.jpg

peer

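A peer represents a remote node in the p2p network. It is also the object through which the application's other modules exchange messages with that node. The peer type implements the Peer interface; its methods are described below.

There are two kinds of peer: inbound and outbound.
An inbound peer is a node that connected to us; an outbound peer is a node that we connected to.

The peer constructor takes a peerConn value, which it stores as a field. peerConn is defined as follows: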

// peerConn contains the raw connection and its config
type peerConn struct {
    outbound        bool
    persistent      bool
    conn            net.Conn // source connection
    socketAddr      *NetAddress
    // cached RemoteIP()
    ip              net.IP
}

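peerConn wraps net.Conn together with fields such as outbound, so the same type can represent either a server-side (inbound) or a client-side (outbound) connection. The type is used by newOutboundPeerConn and newInboundPeerConn, both of which are called from the Switch component.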

// Peer interface
FlushStop()
ID() ID // the node's ID
RemoteIP() net.IP  // the node's IP address
RemoteAddr() net.Addr  // the node's network address (tcp, ip:port)
IsOutbound() bool // whether this is an outbound peer
IsPersistent() bool // whether the peer is persistent (reconnected automatically after a disconnect)
CloseConn() error  // close the connection
NodeInfo() NodeInfo // node information
Status() fconn.ConnectionStatus  // the connection's current status
SocketAddr() *NetAddress // the node's actual socket address (NetAddress)
Send(byte, []byte) bool // send a message (blocking); delegates to MConnection
TrySend(byte, []byte) bool // try to send a message (non-blocking); returns false immediately if it cannot be queued; delegates to MConnection


// peer
func newPeer(pc peerConn, mConfig fconn.MConnConfig, nodeInfo NodeInfo, reactorByCh map[byte]Reactor,   chDescs []*fconn.ChannelDescriptor, onPeerError func(Peer, interface{}),    options ...PeerOption) *peer  // create a new peer
func (p *peer) FlushStop()  // flush the pending send buffer and close the connection (delegates to MConnection)
func (p *peer) CloseConn() error  // close the connection

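The important methods are covered in detail next. Peer itself holds little logic; most of the actual work is done by MConnection.

  • Creating a peer. peerConn carries extra properties of the connection, such as whether it is persistent and whether it is outbound.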
func newPeer(
    pc peerConn,
    mConfig fconn.MConnConfig,
    nodeInfo NodeInfo,
    reactorByCh map[byte]Reactor,
    chDescs []*fconn.ChannelDescriptor,
    onPeerError func(Peer, interface{}),
    options ...PeerOption,
    ) *peer {
    // peerConn carries extra connection properties (persistent, outbound)
    p := &peer{
        peerConn:   pc,
        nodeInfo:   nodeInfo,
        channels:   nodeInfo.(DefaultNodeInfo).Channels,
        Data:       cmn.NewCMap(),
        //metrics:      NopMetrics(),
        //metricsTicker: time.NewTicker(metricsTickerDuration),
    }
    // create the MConnection
    p.mconn = createMConnection(
        pc.conn,
        p,
        reactorByCh,
        chDescs,
        onPeerError,
        mConfig,
        )

    p.BaseService = *cmn.NewBaseService(nil, "Peer", p)
    for _, option := range options {
        option(p)
    }
    return p
}
  • Starting a peer mainly means starting its mConn.
func (p *peer) OnStart() error {
    if err := p.BaseService.OnStart(); err != nil {
        return err
    }
    if err := p.mconn.Start(); err != nil {
        return err
    }
    return nil
}
  • Sending a message from a peer ultimately calls mConn's Send.
func (p *peer) Send(chID byte, msgBytes []byte) bool {
    if !p.IsRunning() {
        return false
    } else if !p.hasChannel(chID) {
        return false
    }
    res := p.mconn.Send(chID, msgBytes)
    return res
}
  • Trying to send a message; similar to Send, but non-blocking.
func (p *peer) TrySend(chID byte, msgBytes []byte) bool {
    if !p.IsRunning() {
        return false
    } else if !p.hasChannel(chID) {
        return false
    }
    res := p.mconn.TrySend(chID, msgBytes)
    return res
}
  • Checking whether the peer has a given channel ID.
func (p *peer) hasChannel(chID byte) bool {
    for _, ch := range p.channels {
        if ch == chID {
            return true
        }
    }
  ...
  return false
}
  • Flushing (sending what is buffered) and stopping the peer.
func (p *peer) FlushStop() {
    p.BaseService.OnStop()
    p.mconn.FlushStop()
}

MConnection

MConnection is the object that actually wraps the raw connection; a peer's sending and receiving of data is implemented by MConnection.

MConnection has the following methods:

func NewMConnectionWithConfig(conn net.Conn, chDescs []*ChannelDescriptor, onReceive receiveCbFunc,
    onError errorCbFunc, config MConnConfig) *MConnection // create a new MConnection
func (c *MConnection) OnStart() error  // start the connection
func (c *MConnection) stopServices() (alreadyStopped bool) // stop the connection's services
func (c *MConnection) FlushStop() // flush the pending send buffer and close the connection (same as on peer)
func (c *MConnection) Send(chID byte, msgBytes []byte) bool // send a message (blocking)
func (c *MConnection) TrySend(chID byte, msgBytes []byte) bool // try to send a message (non-blocking); returns false immediately on failure
func (c *MConnection) CanSend(chID byte) bool // whether more data can be queued on the channel
func (c *MConnection) sendSomePacketMsgs() bool // send a batch of packets (large messages are split into several packets)
func (c *MConnection) sendPacketMsg() bool // send a single packet
func (c *MConnection) recvRoutine() // read and handle packets arriving from the remote peer
func (c *MConnection) sendRoutine() // write queued messages to the remote peer
func (c *MConnection) stopForError(r interface{}) // stop the connection because of an error
func (c *MConnection) Status() ConnectionStatus // the connection's current status

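The implementation of MConnection is expanded below.

  • Creating an MConnection. createMConnection defines the callbacks that handle received messages and errors. The constructor builds a []*Channel from the chDescs []*ChannelDescriptor it is given and stores it on the connection.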
func createMConnection(
    conn        net.Conn,
    p           *peer,
    reactorsByCh map[byte]Reactor,
    chDescs     []*fconn.ChannelDescriptor,
    onPeerError func(Peer, interface{}),
    config      fconn.MConnConfig) *fconn.MConnection {

    onReceive := func(chID byte, msgBytes []byte) {
        reactor := reactorsByCh[chID]
        if reactor == nil {
            panic(fmt.Sprintf("Unknown channel %X", chID))
        }
        reactor.Receive(chID, p, msgBytes)
    }

    onError := func(r interface{}) {
        onPeerError(p, r)
    }
    return fconn.NewMConnectionWithConfig(
        conn,
        chDescs,
        onReceive,
        onError,
        config,
        )
}

func NewMConnectionWithConfig(conn net.Conn, chDescs []*ChannelDescriptor, onReceive receiveCbFunc,
    onError errorCbFunc, config MConnConfig) *MConnection {
    if config.PongTimeout >= config.PingInterval {
        panic("pongTimeout must be less than pingInterval (otherwise, next ping will reset pong timer)")
    }
    mconn := &MConnection{
        conn:           conn,
        bufConnReader:  bufio.NewReaderSize(conn, minReadBufferSize),
        bufConnWriter:  bufio.NewWriterSize(conn, minWriteBufferSize),
        sendMonitor:    flow.New(0, 0),
        recvMonitor:    flow.New(0, 0),
        send:           make(chan struct{}, 1),
        pong:           make(chan struct{}, 1),
        onReceive:      onReceive,
        onError:        onError,
        config:         config,
        created:        time.Now(),
    }
    var channelsIdx = map[byte]*Channel{}
    var channels = []*Channel{}
    // create one Channel per descriptor
    for _, desc := range chDescs {
        channel := newChannel(mconn, *desc)
        channelsIdx[channel.desc.ID] = channel
        channels = append(channels, channel)
    }
    mconn.channels = channels
    mconn.channelsIdx = channelsIdx
    mconn.BaseService = *cmn.NewBaseService(nil, "MConnection", mconn)
    // cache the maximum packet message size
    mconn._maxPacketMsgSize = mconn.maxPacketMsgSize()
    return mconn
}
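
The onReceive callback above routes each complete message to the reactor registered for its channel ID. The sketch below isolates that lookup. The reactor interface, the recorder type, and the channel IDs are hypothetical stand-ins, and the sketch returns an error for an unknown channel where the excerpt panics.

```go
package main

import "fmt"

// reactor is a hypothetical, minimal stand-in for the Reactor interface.
type reactor interface {
	Receive(chID byte, msgBytes []byte)
}

// recorder remembers what it received so the routing can be inspected.
type recorder struct {
	name string
	got  []string
}

func (r *recorder) Receive(chID byte, msgBytes []byte) {
	r.got = append(r.got, fmt.Sprintf("%s<-%#x:%s", r.name, chID, msgBytes))
}

// dispatch routes one complete message to the reactor that owns chID.
// Unlike the excerpt, it returns an error instead of panicking.
func dispatch(reactorsByCh map[byte]reactor, chID byte, msgBytes []byte) error {
	r, ok := reactorsByCh[chID]
	if !ok {
		return fmt.Errorf("unknown channel %X", chID)
	}
	r.Receive(chID, msgBytes)
	return nil
}

func main() {
	a, b := &recorder{name: "A"}, &recorder{name: "B"}
	// Two made-up channel IDs owned by A, one owned by B.
	byCh := map[byte]reactor{0x20: a, 0x21: a, 0x30: b}

	err := dispatch(byCh, 0x21, []byte("vote"))
	fmt.Println(err, a.got, b.got)

	err = dispatch(byCh, 0x99, nil)
	fmt.Println(err)
}
```

One reactor may own several channel IDs, which is why the map is keyed by channel rather than by reactor name.
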
  • Starting an MConnection: it initializes several timers and starts two goroutines, one that listens for incoming messages and one that sends messages.
func (c *MConnection) OnStart() error {
    if err := c.BaseService.OnStart(); err != nil {
        return err
    }
    c.flushTimer = cmn.NewThrottleTimer("flush", c.config.FlushThrottle)
    c.pingTimer = time.NewTicker(c.config.PingInterval)
    c.pongTimeoutCh = make(chan bool, 1)
    c.chStatsTimer = time.NewTicker(updateStats)
    c.quitSendRoutine = make(chan struct{})
    c.doneSendRoutine = make(chan struct{})
    go c.sendRoutine() // start the send routine
    go c.recvRoutine() // start the receive routine
    return nil
}
  • Stopping the services: stop the timers and shut down the routines (called from OnStop or FlushStop).
func (c *MConnection) stopServices() (alreadyStopped bool) {
    c.stopMtx.Lock()
    defer c.stopMtx.Unlock()

    select {
    case <-c.quitSendRoutine:
        return true
    default:
    }
    c.BaseService.OnStop()
    c.flushTimer.Stop()
    c.pingTimer.Stop()
    c.chStatsTimer.Stop()
    close(c.quitSendRoutine)
    return false
}
  • Sending a message: the message is first queued on the corresponding channel, and then the send signal is raised to wake sendRoutine.
func (c *MConnection) Send(chID byte, msgBytes[]byte) bool {
    if !c.IsRunning() {
        return false
    }
    c.Logger.Debug("Send", "channel", chID, "conn", c, "msgBytes", fmt.Sprintf("%X", msgBytes))

    // look up the corresponding channel
    channel, ok := c.channelsIdx[chID]
    if !ok {
        c.Logger.Error(fmt.Sprintf("Cannot send bytes, unknown channel %X", chID))
        return false
    }
    // queue the message on that channel first
    success := channel.sendBytes(msgBytes)
    if success {
        select {
        // raise the send signal (never blocks)
        case c.send <- struct{}{}:
        default:
        }
    } else {
        c.Logger.Debug("Send failed", "channel", chID, "conn", c, "msgBytes", fmt.Sprintf("%X", msgBytes))
    }
    return success
}
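
The select with a default branch in Send is what keeps the wake-up signal from ever blocking: send is a buffered channel of capacity 1, so any number of signals raised before sendRoutine wakes up collapse into one. A minimal sketch of that pattern follows; the notifier type and its signal method are hypothetical names used only for illustration.

```go
package main

import "fmt"

// notifier mimics the role of the `send` field: a buffered channel of
// capacity 1 used purely as a "there is work" flag.
type notifier struct {
	send chan struct{}
}

func newNotifier() *notifier {
	return &notifier{send: make(chan struct{}, 1)}
}

// signal never blocks. It reports whether the signal was stored; when a
// wake-up is already pending, the new one is simply dropped.
func (n *notifier) signal() bool {
	select {
	case n.send <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	n := newNotifier()
	fmt.Println(n.signal()) // stored: the slot was empty
	fmt.Println(n.signal()) // dropped: a wake-up is already pending
	<-n.send                // the consumer wakes up once
	fmt.Println(n.signal()) // stored again: the slot is free
}
```

Dropping the extra signal is safe because the consumer drains every pending message once it is awake, so a single pending wake-up is enough.
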
  • TrySend and CanSend follow much the same logic.
func (c *MConnection) TrySend(chID byte, msgBytes []byte) bool {
    // Send message to channel.
    channel, ok := c.channelsIdx[chID]
    if !ok {
        c.Logger.Error(fmt.Sprintf("Cannot send bytes, unknown channel %X", chID))
        return false
    }
    ok = channel.trySendBytes(msgBytes)
    if ok {
        // Wake up sendRoutine if necessary
        select {
        case c.send <- struct{}{}:
        default:
        }
    }
    return ok
}

// CanSend returns true if you can send more data onto the chID, false otherwise.
func (c *MConnection) CanSend(chID byte) bool {
    if !c.IsRunning() {
        return false
    }
    channel, ok := c.channelsIdx[chID]
    if !ok {
        c.Logger.Error(fmt.Sprintf("Unknown channel %X", chID))
        return false
    }
    return channel.canSend()
}
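  • Back to the send routine. It handles heartbeats (ping/pong), periodic flushing of the write buffer, pong timeouts, and the sending of queued messages.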
func (c *MConnection) sendRoutine() {
    defer c._recover()

FOR_LOOP:
    for {
        var _n int64
        var err error
    SELECTION:
        select {
        // flush the write buffer when the (throttled) flush timer fires
        case <-c.flushTimer.Ch:
            c.flush()
        // periodically update each channel's statistics
        case <-c.chStatsTimer.C:
            for _, channel := range c.channels{
                channel.updateStats()
            }
        // send a ping
        case <-c.pingTimer.C:
            c.Logger.Debug("Send ping")
            _n, err = cdc.MarshalBinaryLengthPrefixedWriter(c.bufConnWriter, PacketPing{})
            if err != nil {
                break SELECTION
            }
            c.Logger.Debug("Starting pong timer", "dur", c.config.PongTimeout)
            c.pongTimer = time.AfterFunc(c.config.PongTimeout, func() {
                select {
                case c.pongTimeoutCh<-true:
                default:
                    // never block
                }
            })
            c.flush()
        // after a ping: either the pong arrived (false) or the wait timed out (true)
        case timeout := <- c.pongTimeoutCh:
            if timeout {
                c.Logger.Debug("pong timeout")
                err = errors.New("pong timeout")
            } else {
                c.stopPongTimer()
            }
        // a ping was received; send a pong
        case <-c.pong:
            c.Logger.Debug("Send pong")
            _n, err = cdc.MarshalBinaryLengthPrefixedWriter(c.bufConnWriter, PacketPong{})
            if err != nil {
                break SELECTION
            }
            c.sendMonitor.Update(int(_n))
            c.flush()
        // quit the routine
        case <-c.quitSendRoutine:
            break FOR_LOOP
        // send a batch of packets; if data is still pending, re-signal to keep the routine awake
        case <-c.send:
            eof := c.sendSomePacketMsgs()
            if !eof {
                select {
                case c.send <- struct{}{}:
                default:
                }
            }
        }
        ....
    }
    c.stopPongTimer()
    close(c.doneSendRoutine)
}
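
The heartbeat in sendRoutine uses one channel, pongTimeoutCh, for two outcomes: the timer armed after a ping pushes true (timeout), while recvRoutine pushes false when a pong arrives. The sketch below reproduces only that handshake. awaitPong and its millisecond parameters are hypothetical, and a second timer stands in for the remote peer's pong.

```go
package main

import (
	"fmt"
	"time"
)

// awaitPong arms a pong timer and reports whether the timeout fired (true)
// or a pong arrived first (false). A negative pongDelayMs means that no
// pong is ever delivered.
func awaitPong(pongTimeoutMs, pongDelayMs int) bool {
	pongTimeoutCh := make(chan bool, 1)

	// Send side: after writing a ping, arm the timeout.
	timer := time.AfterFunc(time.Duration(pongTimeoutMs)*time.Millisecond, func() {
		select {
		case pongTimeoutCh <- true:
		default: // never block
		}
	})
	defer timer.Stop()

	// Receive side: a pong pushes false into the same channel.
	if pongDelayMs >= 0 {
		pong := time.AfterFunc(time.Duration(pongDelayMs)*time.Millisecond, func() {
			select {
			case pongTimeoutCh <- false:
			default: // never block
			}
		})
		defer pong.Stop()
	}
	return <-pongTimeoutCh
}

func main() {
	fmt.Println("timed out:", awaitPong(200, 0)) // pong arrives well before the timeout
	fmt.Println("timed out:", awaitPong(20, -1)) // no pong at all
}
```

Because the channel has capacity 1 and every writer uses a default branch, neither the timer callback nor the receive side can block.
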

  • Next, the receive routine.
func (c *MConnection) recvRoutine() {
    defer c._recover()

FOR_LOOP:
    for{
        var packet Packet
        var _n int64
        var err error
        _n, err = cdc.UnmarshalBinaryLengthPrefixedReader(c.bufConnReader, &packet, int64(c._maxPacketMsgSize))
        if err != nil {
            if c.IsRunning() {
                c.Logger.Error("Connection failed @ recvRoutine (reading byte)", "conn", c, "err", err)
                c.stopForError(err)
            }
            break FOR_LOOP
        }
        switch pkt := packet.(type) {
        // a ping was received; trigger a pong
        case PacketPing:
            c.Logger.Debug("Receive Ping")
            select {
            case c.pong <- struct{}{}:
            default:
                // never block
            }
        // a pong was received; the heartbeat did not time out
        case PacketPong:
            c.Logger.Debug("Receive Pong")
            select {
            case c.pongTimeoutCh <- false:
            default:
                // never block
            }
        // an ordinary message packet was received
        case PacketMsg:
            // look up the corresponding channel
            channel, ok := c.channelsIdx[pkt.ChannelID]
            if !ok || channel == nil {
                err = fmt.Errorf("Unknown channel %X", pkt.ChannelID)
                c.Logger.Error("Connection failed @ recvRoutine", "conn", c, "err", err)
                c.stopForError(err)
                break FOR_LOOP
            }
            msgBytes, err := channel.recvPacketMsg(pkt)
            if err != nil {
                if c.IsRunning() {
                    c.Logger.Error("Connection failed @ recvRoutine", "conn", c, "err", err)
                    c.stopForError(err)
                }
                break FOR_LOOP
            }
            if msgBytes != nil {
                c.Logger.Debug("Receive bytes","chID", pkt.ChannelID, "msgBytes", fmt.Sprintf("%X", msgBytes))
                // hand the complete message to the corresponding reactor
                c.onReceive(pkt.ChannelID, msgBytes)
            }
        default:
            err := fmt.Errorf("Unknown message type %v", reflect.TypeOf(packet))
            c.Logger.Error("Connection failed @ recvRoutine", "conn", c, "err", err)
            c.stopForError(err)
            break FOR_LOOP
        }
    }
}

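  • The actual sending logic. Both functions return true once there is nothing left to send (or the connection has failed), and false while data is still pending.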
func (c *MConnection) sendSomePacketMsgs() bool {
    for i := 0; i < numBatchPacketMsgs; i++ {
        if c.sendPacketMsg() {
            return true
        }
    }
    return false
}


func (c *MConnection) sendPacketMsg() bool {
    var leastRatio float32 = math.MaxFloat32
    var leastChannel *Channel
    // pick the pending channel with the lowest recentlySent/Priority ratio
    for _, channel := range c.channels {
        if !channel.isSendPending() {
            continue
        }
        ratio := float32(channel.recentlySent) / float32(channel.desc.Priority)
        if ratio < leastRatio {
            leastRatio = ratio
            leastChannel = channel
        }
    }
    // nothing to send; return true
    if leastChannel == nil {
        return true
    }
    // write the packet into the write buffer
    _, err := leastChannel.writePacketMsgTo(c.bufConnWriter)
    if err != nil {
        c.Logger.Error("Failed to write PacketMsg", "err", err)
        c.stopForError(err)
        return true
    }
    // schedule a (throttled) flush of the write buffer
    c.flushTimer.Set()
    return false
}

Channel

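A channel's main job is to queue data and hand it out. The send scheduling policy, applied in sendPacketMsg, is ratio := float32(channel.recentlySent) / float32(channel.desc.Priority): the pending channel with the smallest ratio is served first. In other words, a channel is favoured when its Priority is high and its recentlySent is low. recentlySent is decayed by a timer, so a channel that has not sent for a while gradually regains its turn, and data on high-priority channels that has waited longest tends to go out first.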
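
The scheduling rule can be tried in isolation. The sketch below uses a hypothetical, stripped-down ch type with only the fields the rule needs. The decay factor 0.8 is an assumption chosen for illustration, since the excerpt does not show updateStats.

```go
package main

import "fmt"

// ch is a hypothetical, simplified channel: only what the rule needs.
type ch struct {
	id           byte
	priority     int
	recentlySent int64
	pending      bool
}

// pickLeastRatio returns the pending channel with the smallest
// recentlySent/priority ratio, or nil when nothing is pending.
func pickLeastRatio(chs []*ch) *ch {
	var least *ch
	var leastRatio float32
	for _, c := range chs {
		if !c.pending {
			continue
		}
		ratio := float32(c.recentlySent) / float32(c.priority)
		if least == nil || ratio < leastRatio {
			least, leastRatio = c, ratio
		}
	}
	return least
}

// decay shrinks recentlySent on every channel; 0.8 is an assumed factor.
func decay(chs []*ch) {
	for _, c := range chs {
		c.recentlySent = int64(float64(c.recentlySent) * 0.8)
	}
}

func main() {
	chs := []*ch{
		{id: 0x20, priority: 5, recentlySent: 1000, pending: true}, // ratio 200
		{id: 0x30, priority: 1, recentlySent: 100, pending: true},  // ratio 100
		{id: 0x40, priority: 10, recentlySent: 0, pending: false},  // nothing queued
	}
	fmt.Printf("next channel: %#x\n", pickLeastRatio(chs).id)

	decay(chs)
	fmt.Println("after decay:", chs[0].recentlySent, chs[1].recentlySent)
}
```

Note that the low-priority channel wins here because it has sent far less recently; priority alone does not decide the order.
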

  • Sending (queuing) a message on a channel.
// non-blocking: queue a message to send on this channel
func (ch *Channel) trySendBytes(bytes []byte) bool {
    select {
    case ch.sendQueue <- bytes:
        atomic.AddInt32(&ch.sendQueueSize, 1)
        return true
    default:
        return false
    }
}
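
Send and TrySend differ only in how they wait for room in the queue. The excerpt shows the non-blocking variant; the sketch below puts it next to a blocking variant that gives up after a timeout. The queue type, the timeoutMs parameter, and the use of time.After are assumptions made for illustration, because the blocking sendBytes is not shown in the excerpt.

```go
package main

import (
	"fmt"
	"time"
)

// queue is a hypothetical stand-in for a channel's sendQueue.
type queue struct {
	sendQueue chan []byte
}

// sendBytes waits up to timeoutMs milliseconds for room in the queue.
func (q *queue) sendBytes(b []byte, timeoutMs int) bool {
	select {
	case q.sendQueue <- b:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

// trySendBytes returns immediately when the queue is full.
func (q *queue) trySendBytes(b []byte) bool {
	select {
	case q.sendQueue <- b:
		return true
	default:
		return false
	}
}

func main() {
	q := &queue{sendQueue: make(chan []byte, 1)}
	fmt.Println(q.trySendBytes([]byte("a")))  // the queue has room
	fmt.Println(q.trySendBytes([]byte("b")))  // the queue is full
	fmt.Println(q.sendBytes([]byte("c"), 10)) // still full, gives up after 10 ms
	<-q.sendQueue                             // a consumer takes one message
	fmt.Println(q.sendBytes([]byte("d"), 10)) // room again
}
```
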
  • Checking whether the channel has data pending: either a message that is partly sent, or messages still waiting in the queue.
func (ch *Channel) isSendPending() bool {
    if len(ch.sending) == 0 {
        if len(ch.sendQueue) == 0 {
            return false
        }
        // sending is empty but the queue is not: move the next message into sending
        ch.sending = <-ch.sendQueue
    }
    // sending holds data
    return true
}
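  • mConnection takes the next packet from the channel. EOF marks whether this packet completes the message; if it does not, the remainder stays in sending for the next call.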
func (ch *Channel) nextPacketMsg() PacketMsg {
    packet := PacketMsg{}
    packet.ChannelID = byte(ch.desc.ID)
    maxSize := ch.maxPacketMsgPayloadSize
    packet.Bytes = ch.sending[:cmn.MinInt(maxSize, len(ch.sending))]
    if len(ch.sending) <= maxSize {
        packet.EOF = byte(0x01)
        ch.sending = nil
        atomic.AddInt32(&ch.sendQueueSize, -1)
    } else {
        packet.EOF = byte(0x00)
        ch.sending = ch.sending[cmn.MinInt(maxSize, len(ch.sending)):]
    }
    return packet
}
  • When mConnection receives a packet it also passes it to the channel, which returns the message only once it is complete (EOF = 1).
func (ch *Channel) recvPacketMsg(packet PacketMsg) ([]byte, error) {
    ch.Logger.Debug("Read PacketMsg", "conn", ch.conn, "packet", packet)
    var recvCap, recvReceived = ch.desc.RecvMessageCapacity, len(ch.recving) + len(packet.Bytes)
    if recvCap < recvReceived {
        return nil, fmt.Errorf("Received message exceeds available capacity: %v < %v", recvCap, recvReceived)
    }
    // append the payload to the recving buffer
    ch.recving = append(ch.recving, packet.Bytes...)
    if packet.EOF == byte(0x01) {
        // return the complete message
        msgBytes := ch.recving
        // clear the slice without re-allocating
        ch.recving = ch.recving[:0]
        return msgBytes, nil
    }
    return nil, nil
}
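
nextPacketMsg and recvPacketMsg are two halves of one framing scheme: the sender cuts a message into packets of at most maxSize bytes and sets EOF only on the last one, and the receiver appends payloads until it sees EOF. A self-contained round trip, using a hypothetical trimmed-down packet type and the helper names split and reassemble, might look like this:

```go
package main

import (
	"bytes"
	"fmt"
)

// packet is a hypothetical, trimmed-down PacketMsg.
type packet struct {
	EOF   byte
	Bytes []byte
}

// split cuts msg into packets carrying at most maxSize bytes each
// (maxSize must be positive); only the last packet has EOF set to 1.
func split(msg []byte, maxSize int) []packet {
	var out []packet
	sending := msg
	for {
		n := len(sending)
		if n > maxSize {
			n = maxSize
		}
		p := packet{Bytes: sending[:n]}
		if len(sending) <= maxSize {
			p.EOF = 0x01
			return append(out, p)
		}
		out = append(out, p)
		sending = sending[n:]
	}
}

// reassemble appends payloads until a packet with EOF arrives.
// It returns nil while the message is still incomplete.
func reassemble(pkts []packet) []byte {
	var recving []byte
	for _, p := range pkts {
		recving = append(recving, p.Bytes...)
		if p.EOF == 0x01 {
			return recving
		}
	}
	return nil
}

func main() {
	msg := []byte("hello, mconnection")
	pkts := split(msg, 8)
	fmt.Println("packets:", len(pkts))
	fmt.Println("round trip ok:", bytes.Equal(reassemble(pkts), msg))
	fmt.Println("incomplete:", reassemble(pkts[:2]) == nil)
}
```
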

Switch

func NewSwitch(cfg *config.P2PConfig, transport Transport, options ...SwitchOption) *Switch // create a new Switch
func (sw *Switch) AddReactor(name string, reactor Reactor) Reactor // add a Reactor and set its switch
func (sw *Switch) Broadcast(chID byte, msgBytes []byte) chan bool // broadcast a message
func (sw *Switch) NumPeers() (outbound, inbound, dialing int) // number of peers of each kind
func (sw *Switch) StopPeerForError(peer Peer, reason interface{}) // stop a peer because of an error
func (sw *Switch) StopPeerGracefully(peer Peer) // stop a peer gracefully
func (sw *Switch) reconnectToPeer(addr *NetAddress)  // reconnect to the node at this address
func (sw *Switch) MarkPeerAsGood(peer Peer)  // mark the peer (its address) as good
func (sw *Switch) DialPeersAsync(peers []string) error // dial a list of peers asynchronously
func (sw *Switch) DialPeerWithAddress(addr *NetAddress) error  // dial a single peer
func (sw *Switch) AddPersistentPeers(addrs []string) error // register persistent peer addresses
func (sw *Switch) addOutboundPeerWithConfig(addr *NetAddress, cfg *config.P2PConfig) error // add an outbound peer
func (sw *Switch) addPeer(p Peer) error  // add a peer
func (sw *Switch) acceptRoutine()  // accept connections from other nodes

The implementation is as follows.

Switch holds all of the channel descriptors (chDesc), the reactors, and the peers.

type Switch struct {
    cmn.BaseService
    config      *config.P2PConfig
    reactors    map[string]Reactor
    chDesc      []*fconn.ChannelDescriptor
    reactorByCh map[byte]Reactor
    peers       *PeerSet
    dialing     *cmn.CMap
    reconnecting *cmn.CMap
    nodeInfo    NodeInfo
    nodeKey     *NodeKey
    addrBook    AddrBook
    persistentPeersAddrs []*NetAddress
    transport   Transport
    ...
}
  • First, the start function: it starts every reactor and then launches a goroutine that handles connections from inbound peers.
func (sw *Switch) OnStart() error {
    for _, reactor := range sw.reactors {
        err := reactor.Start()
        if err != nil {
            return cmn.ErrorWrap(err, "failed to start %v", reactor)
        }
    }
    go sw.acceptRoutine()
    return nil
}
  • How an inbound connection is handled.
func (sw *Switch) acceptRoutine() {
    for {
        p, err := sw.transport.Accept(peerConfig{
            chDesc:         sw.chDesc,
            onPeerError:    sw.StopPeerForError,
            reactorsByCh:   sw.reactorByCh,
            isPersistent:   sw.isPeerPersistentFn(),
            //metrics:      sw.metrics,
        })
        if err != nil {
            switch err := err.(type) {
            case ErrRejected:
            
            ....
            
            }
            break
        }
        _, in, _ := sw.NumPeers()
        // the inbound peer count has reached the limit: ignore this connection
        if in >= sw.config.MaxNumInboundPeers {
            sw.Logger.Info(
                "Ignoring inbound connection: already have enough inbound peers",
                "address", p.SocketAddr(),
                "have", in,
                "max", sw.config.MaxNumInboundPeers,
            )
            sw.transport.Cleanup(p)
            continue
        }
        // the peer added here is an inbound peer
        if err := sw.addPeer(p); err != nil {
            sw.transport.Cleanup(p)
            if p.IsRunning(){
                _ = p.Stop()
            }
            sw.Logger.Info(
                "Ignoring inbound connection: error while adding peer",
                "err", err,
                "id", p.ID(),
            )
        }
    }
} 
  • Next, addPeer.
func (sw *Switch) addPeer(p Peer) error {
    // filter the peer through the externally injected filter functions
    if err := sw.filterPeer(p); err != nil {
        return err
    }
    p.SetLogger(sw.Logger.With("peer", p.SocketAddr()))
    if !sw.IsRunning() {
        sw.Logger.Error("Won't start a peer - switch is not running", "peer", p)
        return nil
    }
    // let each reactor initialize the peer (attach the data that reactor needs)
    for _, reactor := range sw.reactors {
        p = reactor.InitPeer(p)
    }
    // start the peer
    err := p.Start()
    if err != nil {
        sw.Logger.Error("Error starting peer", "err", err, "peer", p)
        return err
    }

    if err := sw.peers.Add(p); err != nil {
        return err
    }
    // add the peer to every reactor
    for _, reactor := range sw.reactors {
        reactor.AddPeer(p)
    }
    sw.Logger.Info("Added Peer", "peer", p)
    return nil
}
  • Adding an outbound peer.
func (sw *Switch) addOutboundPeerWithConfig(
    addr *NetAddress,
    cfg *config.P2PConfig,
) error {
    sw.Logger.Info("Dialing peer", "address", addr)
    // simulate a dial failure (testing only) and schedule a reconnect
    if cfg.TestDialFail {
        go sw.reconnectToPeer(addr)
        return fmt.Errorf("dial err (peerConfig.DialFail == true)")
    }
    p, err := sw.transport.Dial(*addr, peerConfig{
        chDesc:         sw.chDesc,
        onPeerError:    sw.StopPeerForError,
        isPersistent:   sw.isPeerPersistentFn(),
        reactorsByCh:   sw.reactorByCh,
        //metrics:      sw.metrics,
    })
    if err != nil {
        switch e := err.(type) {
        case ErrRejected:
            if e.IsSelf() {
                sw.addrBook.RemoveAddress(addr)
                sw.addrBook.AddOurAddress(addr)
                return err
            }
        }
        if sw.isPeerPersistentFn()(addr) {
            go sw.reconnectToPeer(addr)
        }
        return  err
    }
    // addPeer is called here, so the peer added is an outbound peer
    if err := sw.addPeer(p); err != nil {
        sw.transport.Cleanup(p)
        if p.IsRunning() {
            _ = p.Stop()
        }
        return err
    }
    return nil
}
  • Registering persistent addresses. When dialing fails for an address that is persistent, the switch retries the connection several times.
func (sw *Switch) AddPersistentPeers(addrs []string) error {
    sw.Logger.Info("Adding persistent peers", "addrs", addrs)
    netAddrs, errs := NewNetAddressStrings(addrs)

    for _, err := range errs {
        sw.Logger.Error("Error in peer's address", "err", err)
    }
    // return first non-ErrNetAddressLookup error
    for _, err := range errs {
        if _, ok := err.(ErrNetAddressLookup); ok {
            continue
        }
        return err
    }
    sw.persistentPeersAddrs = netAddrs
    return nil
}
  • The reconnect method.

func (sw *Switch) reconnectToPeer(addr *NetAddress) {
    if sw.reconnecting.Has(string(addr.ID)) {
        return
    }
    sw.reconnecting.Set(string(addr.ID), addr)
    defer sw.reconnecting.Delete(string(addr.ID))

    start := time.Now()
    sw.Logger.Info("Reconnecting to peer", "addr", addr)
    // retry up to reconnectAttempts times
    for i := 0; i < reconnectAttempts; i++ {
        if !sw.IsRunning() {
            return
        }
        err := sw.DialPeerWithAddress(addr)
        if err == nil {
            return
        } else if _, ok := err.(ErrCurrentlyDialingOrExistingAddress); ok {
            return
        }
        sw.Logger.Info("Error reconnecting to peer. Try again", "tries", i, "err", err, "addr", addr)
        sw.randomSleep(reconnectInterval)
        continue
    }

    sw.Logger.Error("Failed to reconnect to peer. Beginning exponential backoff", "addr", addr, "elapsed", time.Since(start))
    // the attempts above all failed; keep trying with exponentially growing intervals
    for i := 0; i < reconnectBackOffAttempts; i++ {
        if !sw.IsRunning() {
            return
        }
        sleepIntervalSeconds := math.Pow(reconnectBackOffBaseSeconds, float64(i))
        sw.randomSleep(time.Duration(sleepIntervalSeconds) * time.Second)

        err := sw.DialPeerWithAddress(addr)
        if err == nil {
            return // success
        } else if _, ok := err.(ErrCurrentlyDialingOrExistingAddress); ok {
            return
        }
        sw.Logger.Info("Error reconnecting to peer. Try again", "tries", i, "err", err, "addr", addr)
    }
    sw.Logger.Error("Failed to reconnect to peer. Giving up", "addr", addr, "elapsed", time.Since(start))
}
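
In the second loop of reconnectToPeer the wait before attempt i is reconnectBackOffBaseSeconds raised to the power i. The sketch below prints such a schedule. The base of 3 seconds and the 5 attempts are illustrative values rather than constants taken from the source, and the random jitter that randomSleep adds is left out.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// backoffSchedule returns the base sleep before each backoff attempt:
// base^0, base^1, base^2, ... seconds.
func backoffSchedule(baseSeconds float64, attempts int) []time.Duration {
	out := make([]time.Duration, 0, attempts)
	for i := 0; i < attempts; i++ {
		secs := math.Pow(baseSeconds, float64(i))
		out = append(out, time.Duration(secs)*time.Second)
	}
	return out
}

func main() {
	// Base 3 and 5 attempts are assumptions for the demonstration.
	var total time.Duration
	for i, d := range backoffSchedule(3, 5) {
		total += d
		fmt.Printf("attempt %d: sleep about %v\n", i, d)
	}
	fmt.Println("total wait:", total)
}
```

The total grows geometrically, so even a modest number of backoff attempts spreads the retries over a long period.
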

  • Dialing peers.

func (sw *Switch) DialPeersAsync(peers []string) error {
    netAddrs, errs := NewNetAddressStrings(peers)

    for _, err := range errs {
        sw.Logger.Error("Error in peer's address", "err", err)
    }
    for _, err := range errs {
        if _, ok := err.(ErrNetAddressLookup); ok {
            continue
        }
        return err
    }
    sw.dialPeersAsync(netAddrs)
    return nil
}

func (sw *Switch) dialPeersAsync(netAddrs []*NetAddress) {
    ourAddr := sw.NetAddress()

    if sw.addrBook != nil {
        for _, netAddr := range netAddrs {
            // skip our own address
            if !netAddr.Same(ourAddr) {
                // add it to the address book
                if err := sw.addrBook.AddAddress(netAddr, ourAddr); err != nil {
                    if isPrivateAddr(err) {
                        sw.Logger.Debug("Won't add peer's address to addrbook", "err", err)
                    } else {
                        sw.Logger.Debug("Can't add peer's address to addrbook", "err", err)
                    }
                }
            }
        }
        sw.addrBook.Save()
    }
    perm := sw.rng.Perm(len(netAddrs))
    // shuffle the address order, then dial each address in its own goroutine
    for i := 0; i < len(perm); i++ {
        go func(i int) {
            j := perm[i]
            addr := netAddrs[j]
            if addr.Same(ourAddr) {
                sw.Logger.Debug("Ignore attempt to connect to ourselves", "addr", addr, "ourAddr", ourAddr)
                return
            }
            sw.randomSleep(0)

            err := sw.DialPeerWithAddress(addr)
            if err != nil {
                switch err.(type) {
                case ErrSwitchConnectToSelf, ErrSwitchDuplicatePeerID, ErrCurrentlyDialingOrExistingAddress:
                    sw.Logger.Debug("Error dialing peer", "err", err)
                default:
                    sw.Logger.Error("Error dialing peer", "err", err)
                }
            }
        }(i)
    }
}

  • Another important method is Broadcast, which sends a message to every connected peer.
func (sw *Switch) Broadcast(chID byte, msgBytes []byte) chan bool {
    sw.Logger.Debug("Broadcast", "channel", chID, "msgBytes", fmt.Sprintf("%X", msgBytes))
    peers := sw.peers.List()
    var wg sync.WaitGroup
    wg.Add(len(peers))
    successChan := make(chan bool, len(peers))
    for _, peer := range peers {
        go func(p Peer) {
            defer wg.Done()
            success := p.Send(chID, msgBytes)
            successChan <- success
        }(peer)
    }
    go func() {
        wg.Wait()
        close(successChan)
    }()
    return successChan
}
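
Broadcast returns immediately. The caller learns the per-peer results by draining the returned channel, which is closed once every Send has finished. The sketch below shows one way a caller might consume it. The sender interface, fakePeer, and countSuccess are hypothetical, and broadcast is a reduced copy of the method above without the Switch.

```go
package main

import (
	"fmt"
	"sync"
)

// sender is a hypothetical minimal slice of the Peer interface.
type sender interface {
	Send(chID byte, msgBytes []byte) bool
}

// fakePeer always answers with a fixed result.
type fakePeer struct{ ok bool }

func (f fakePeer) Send(byte, []byte) bool { return f.ok }

// broadcast has the same shape as Switch.Broadcast: one goroutine per
// peer, results collected in a buffered channel that is closed at the end.
func broadcast(peers []sender, chID byte, msgBytes []byte) chan bool {
	var wg sync.WaitGroup
	wg.Add(len(peers))
	successChan := make(chan bool, len(peers))
	for _, p := range peers {
		go func(p sender) {
			defer wg.Done()
			successChan <- p.Send(chID, msgBytes)
		}(p)
	}
	go func() {
		wg.Wait()
		close(successChan)
	}()
	return successChan
}

// countSuccess drains the result channel until it is closed.
func countSuccess(results chan bool) (ok, failed int) {
	for success := range results {
		if success {
			ok++
		} else {
			failed++
		}
	}
	return ok, failed
}

func main() {
	peers := []sender{fakePeer{true}, fakePeer{false}, fakePeer{true}}
	ok, failed := countSuccess(broadcast(peers, 0x20, []byte("hi")))
	fmt.Println("ok:", ok, "failed:", failed)
}
```

Sizing the channel to len(peers) means no sending goroutine blocks even if the caller never reads the results.
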

AddrBook

func (a *addrBook) AddOurAddress(addr *p2p.NetAddress)  // add one of our own addresses
func (a *addrBook) AddPrivateIDs(IDs []string)  // add private peer IDs
func (a *addrBook) AddAddress(addr *p2p.NetAddress, src *p2p.NetAddress) error // add another node's address (a candidate for outbound connections)
func (a *addrBook) NeedMoreAddrs() bool  // whether more addresses are needed; true when the book holds fewer addresses than a threshold
func (a *addrBook) PickAddress(biasTowardsNewAddrs int) *p2p.NetAddress // pick one address from the book; biasTowardsNewAddrs (0-100) biases the choice toward the new buckets
func (a *addrBook) MarkGood(id p2p.ID) // mark an address as good and move it from a new bucket to an old bucket
func (a *addrBook) MarkAttempt(addr *p2p.NetAddress) // record a connection attempt; the address's attempt count is incremented
func (a *addrBook) GetSelection() []*p2p.NetAddress // get a random batch of addresses (new and old)
func (a *addrBook) GetSelectionWithBias(biasTowardsNewAddrs int) []*p2p.NetAddress  // get a random batch of addresses; biasTowardsNewAddrs is the percentage taken from new addresses
func (a *addrBook) saveRoutine() // long-running goroutine that periodically saves the in-memory address book to disk
func (a *addrBook) addToNewBucket(ka *knownAddress, bucketIdx int) // add to a new bucket
func (a *addrBook) addToOldBucket(ka *knownAddress, bucketIdx int) bool // add to an old bucket; returns false if the bucket is full
func (a *addrBook) removeFromBucket(ka *knownAddress, bucketType byte, bucketIdx int) // remove from a bucket; ka itself records its bucketType
func (a *addrBook) pickOldest(bucketType byte, bucketIdx int) *knownAddress // get the oldest address in a bucket
func (a *addrBook) randomPickAddresses(bucketType byte, num int) []*p2p.NetAddress // randomly pick num addresses from buckets of the given type

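The implementation of the main flows follows.

  • Picking one address from the address book; biasTowardsNewAddrs (0-100) controls how strongly the choice leans toward the new buckets.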
func (a *addrBook) PickAddress(biasTowardsNewAddrs int) *p2p.NetAddress {
    a.mtx.Lock()
    defer a.mtx.Unlock()

    bookSize := a.size()
    if bookSize <= 0 {
        if bookSize < 0 {
            panic(fmt.Sprintf("Addrbook size %d (new: %d + old: %d) is less than 0", a.nNew+a.nOld, a.nNew, a.nOld))
        }
        return nil
    }
    if biasTowardsNewAddrs > 100 {
        biasTowardsNewAddrs = 100
    }
    if biasTowardsNewAddrs < 0 {
        biasTowardsNewAddrs = 0
    }

    // Bias between new and old addresses.
    oldCorrelation := math.Sqrt(float64(a.nOld)) * (100.0 - float64(biasTowardsNewAddrs))
    newCorrelation := math.Sqrt(float64(a.nNew)) * float64(biasTowardsNewAddrs)

    // pick a random peer from a random bucket
    var bucket map[string]*knownAddress
    pickFromOldBucket := (newCorrelation+oldCorrelation) * a.rand.Float64() < oldCorrelation
    if (pickFromOldBucket && a.nOld == 0) ||
        (!pickFromOldBucket && a.nNew == 0) {
        return nil
    }
    // loop until we pick a random non-empty bucket
    for len(bucket) == 0 {
        if pickFromOldBucket {
            bucket = a.bucketsOld[a.rand.Intn(len(a.bucketsOld))]
        } else {
            bucket = a.bucketsNew[a.rand.Intn(len(a.bucketsNew))]
        }
    }
    // return the address at a random index within the bucket
    randIndex := a.rand.Intn(len(bucket))
    for _, ka := range bucket {
        if randIndex == 0 {
            return ka.Addr
        }
        randIndex--
    }
    return nil
}
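
The bias in PickAddress is not a plain percentage: each side is weighted by the square root of its population, and the old side wins with probability oldCorrelation / (oldCorrelation + newCorrelation). The sketch below computes that probability directly. probPickOld is a hypothetical helper, and it returns 0 when both weights are zero, a case the original handles by returning nil earlier.

```go
package main

import (
	"fmt"
	"math"
)

// probPickOld returns the probability that the weighting used by
// PickAddress selects an old bucket, for a bias in the range 0..100.
func probPickOld(nOld, nNew, biasTowardsNewAddrs int) float64 {
	if biasTowardsNewAddrs > 100 {
		biasTowardsNewAddrs = 100
	}
	if biasTowardsNewAddrs < 0 {
		biasTowardsNewAddrs = 0
	}
	oldCorrelation := math.Sqrt(float64(nOld)) * (100.0 - float64(biasTowardsNewAddrs))
	newCorrelation := math.Sqrt(float64(nNew)) * float64(biasTowardsNewAddrs)
	total := oldCorrelation + newCorrelation
	if total == 0 {
		return 0
	}
	return oldCorrelation / total
}

func main() {
	fmt.Printf("equal sizes, bias 50:   %.3f\n", probPickOld(100, 100, 50))
	fmt.Printf("100 old, 400 new, b 30: %.3f\n", probPickOld(100, 400, 30))
	fmt.Printf("bias 100:               %.3f\n", probPickOld(100, 100, 100))
	fmt.Printf("bias 0:                 %.3f\n", probPickOld(100, 100, 0))
}
```

The square root dampens the effect of bucket size: with four times as many new addresses as old ones, the new side's weight only doubles.
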

  • Marking an address as good.
// MarkGood implements AddrBook: it marks the address as good and moves it to an old bucket
func (a *addrBook) MarkGood(id p2p.ID) {
    a.mtx.Lock()
    defer a.mtx.Unlock()

    ka := a.addrLookup[id]
    if ka == nil {
        return
    }
    ka.markGood()
    if ka.isNew() {
        a.moveToOld(ka)
    }
}
  • Marking an address as attempted; the attempt count is incremented.
// MarkAttempt implements AddrBook.
// It records that we tried to connect to this address.
func (a *addrBook) MarkAttempt(addr *p2p.NetAddress) {
    a.mtx.Lock()
    defer a.mtx.Unlock()

    ka := a.addrLookup[addr.ID]
    if ka == nil {
        return
    }
    ka.markAttempt()
}

  • Once an address is marked bad, it is removed from the address book.
// MarkBad implements AddrBook.
// It removes the address.
func (a *addrBook) MarkBad(addr *p2p.NetAddress) {
    a.RemoveAddress(addr)
}
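  • Getting a random batch of addresses. The batch size is computed inside the function; addresses are chosen at random from both newBucket and oldBucket.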
// GetSelection implements AddrBook.
// It randomly selects some addresses (old & new). Suitable for peer-exchange protocols.
// Must never return a nil address.
func (a *addrBook) GetSelection() []*p2p.NetAddress {
    a.mtx.Lock()
    defer a.mtx.Unlock()

    bookSize := a.size()
    if bookSize <= 0 {
        if bookSize < 0 {
            panic(fmt.Sprintf("Addrbook size %d (new: %d + old: %d) is less than 0", a.nNew+a.nOld, a.nNew, a.nOld))
        }
        return nil
    }
    // how the number of addresses is computed
    numAddresses := cmn.MaxInt(
        cmn.MinInt(minGetSelection, bookSize),
        bookSize*getSelectionPercent/100)
    numAddresses = cmn.MinInt(maxGetSelection, numAddresses)

    allAddr := make([]*p2p.NetAddress, bookSize)
    i := 0
    for _, ka := range a.addrLookup {
        allAddr[i] = ka.Addr
        i++
    }
    // Fisher-Yates shuffle the array. We only need to do the first
    // `numAddresses' since we are throwing the rest.
    for i := 0; i < numAddresses; i++ {
        // pick a number between current index and the end
        j := cmn.RandIntn(len(allAddr)-i) + i
        allAddr[i], allAddr[j] = allAddr[j], allAddr[i]
    }
    return allAddr[:numAddresses]
}
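
GetSelection does two separate things: it clamps the batch size between a minimum, a percentage of the book, and a maximum, and it runs a partial Fisher-Yates shuffle that stops after numAddresses swaps. Both steps are sketched below. The three constants are assumed values for illustration (the excerpt does not show them), and pickRandom with its seed parameter is a hypothetical helper that works on a copy of the input.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Assumed values for illustration; the real constants are not shown above.
const (
	minGetSelection     = 32
	maxGetSelection     = 250
	getSelectionPercent = 23
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// selectionSize mirrors the numAddresses computation in GetSelection.
func selectionSize(bookSize int) int {
	n := maxInt(minInt(minGetSelection, bookSize), bookSize*getSelectionPercent/100)
	return minInt(maxGetSelection, n)
}

// pickRandom returns n distinct items using a partial Fisher-Yates
// shuffle: only the first n positions are shuffled.
func pickRandom(all []string, n int, seed int64) []string {
	rng := rand.New(rand.NewSource(seed))
	items := append([]string(nil), all...) // leave the input untouched
	if n > len(items) {
		n = len(items)
	}
	for i := 0; i < n; i++ {
		j := rng.Intn(len(items)-i) + i // index between i and the end
		items[i], items[j] = items[j], items[i]
	}
	return items[:n]
}

func main() {
	for _, size := range []int{10, 100, 1000, 2000} {
		fmt.Printf("book size %4d -> selection of %d\n", size, selectionSize(size))
	}
	fmt.Println(pickRandom([]string{"a", "b", "c", "d", "e"}, 3, 42))
}
```

Stopping the shuffle after n swaps is enough because only the first n positions are returned; the rest of the slice is discarded.
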
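  • Random selection with a bias: biasTowardsNewAddrs is the percentage of new addresses in the result. Otherwise the effect is similar to the method above.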
func (a *addrBook) GetSelectionWithBias(biasTowardsNewAddrs int) []*p2p.NetAddress {
    a.mtx.Lock()
    defer a.mtx.Unlock()

    bookSize := a.size()
    if bookSize <= 0 {
        if bookSize < 0 {
            panic(fmt.Sprintf("Addrbook size %d (new: %d + old: %d) is less than 0", a.nNew+a.nOld, a.nNew, a.nOld))
        }
        return nil
    }

    if biasTowardsNewAddrs > 100 {
        biasTowardsNewAddrs = 100
    }
    if biasTowardsNewAddrs < 0 {
        biasTowardsNewAddrs = 0
    }

    numAddresses := cmn.MaxInt(
        cmn.MinInt(minGetSelection, bookSize),
        bookSize*getSelectionPercent/100)
    numAddresses = cmn.MinInt(maxGetSelection, numAddresses)
    
    // number of new addresses that, if possible, should be in the beginning of the selection
    // if there are no enough old addrs, will choose new addr instead.
    numRequiredNewAdd := cmn.MaxInt(percentageOfNum(biasTowardsNewAddrs, numAddresses), numAddresses-a.nOld)
    selection := a.randomPickAddresses(bucketTypeNew, numRequiredNewAdd)
    selection = append(selection, a.randomPickAddresses(bucketTypeOld, numAddresses-len(selection))...)
    return selection
}
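
The split between new and old addresses is easy to misread, so here is a small sketch of the arithmetic. It assumes that percentageOfNum rounds p percent of n to the nearest integer and that randomPickAddresses never returns more addresses than a bucket type holds; splitSelection is a hypothetical helper written only for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// percentageOfNum returns p percent of n, rounded to the nearest integer.
// This definition is an assumption; the original helper is not shown.
func percentageOfNum(p, n int) int {
	return int(math.Round(float64(p) / 100 * float64(n)))
}

// splitSelection reports how many new and old addresses end up in a
// selection of numAddresses, given nNew new and nOld old addresses.
func splitSelection(bias, numAddresses, nNew, nOld int) (newCount, oldCount int) {
	// Ask for the biased share of new addresses, or more if there are
	// not enough old addresses to fill the remainder.
	required := maxInt(percentageOfNum(bias, numAddresses), numAddresses-nOld)
	newCount = minInt(required, nNew)
	// Old addresses fill whatever the new ones did not cover.
	oldCount = minInt(numAddresses-newCount, nOld)
	return newCount, oldCount
}

func main() {
	n, o := splitSelection(30, 10, 20, 4)
	fmt.Println(n, o) // 6 4: only 4 old addresses exist, so 6 must be new
	n, o = splitSelection(30, 10, 2, 50)
	fmt.Println(n, o) // 2 8: only 2 new addresses exist, old ones fill the rest
}
```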

  • Persistence routine: periodically saves the in-memory addresses to disk, and saves once more on shutdown
func (a *addrBook) saveRoutine() {
    defer a.wg.Done()
    saveFileTicker := time.NewTicker(dumpAddressInterval)
out:
    for {
        select {
        case <-saveFileTicker.C:
            a.saveToFile(a.filePath)
        case <-a.Quit():
            break out
        }
    }
    saveFileTicker.Stop()
    a.saveToFile(a.filePath)
}
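  • Get the oldest address in a bucket, that is, the one whose last connection attempt is furthest in the past
// pickOldest returns the oldest address in the given bucket of the given bucketType.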
func (a *addrBook) pickOldest(bucketType byte, bucketIdx int) *knownAddress {
    bucket := a.getBucket(bucketType, bucketIdx)
    var oldest *knownAddress
    for _, ka := range bucket {
        if oldest == nil || ka.LastAttempt.Before(oldest.LastAttempt) {
            oldest = ka
        }
    }
    return oldest
}
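  • Evict an address from a bucket. In a new bucket, an address that has failed many connection attempts (a bad entry) is evicted first; if there is none, the oldest entry is evicted. Old buckets likewise evict their oldest entry.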
func (a *addrBook) expireNew(bucketIdx int) {
    for addrStr, ka := range a.bucketsNew[bucketIdx] {
        // If an entry is bad, throw it away
        if ka.isBad() {
            a.Logger.Info(fmt.Sprintf("expiring bad address %v", addrStr))
            a.removeFromBucket(ka, bucketTypeNew, bucketIdx)
            return
        }
    }
    // If we haven't thrown out a bad entry, throw out the oldest entry
    oldest := a.pickOldest(bucketTypeNew, bucketIdx)
    a.removeFromBucket(oldest, bucketTypeNew, bucketIdx)
}

Transport

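Transport is responsible for dialing remote nodes (Dial) and for accepting incoming connection requests (Accept).
A successful Dial adds an outbound peer; a successful Accept adds an inbound peer.

  • Start with the Accept method. It waits on <-mt.acceptc; as soon as a connection arrives, it calls wrapPeer() to wrap it as an inbound peer and returns that peer to the Switch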
func (mt *MultiplexTransport) Accept(cfg peerConfig) (Peer, error) {
    select {
    case a := <-mt.acceptc:
        if a.err != nil {
            return nil, a.err
        }
        cfg.outbound = false
        return mt.wrapPeer(a.conn, a.nodeInfo, cfg, a.netAddr), nil
    case <-mt.closec:
        return nil, ErrTransportClosed{}
    }
}

  • The listening routine, which is started when the Switch is created
// Listen implements transportLifecycle.
func (mt *MultiplexTransport) Listen(addr NetAddress) error {
    ln, err := net.Listen("tcp", addr.DialString())
    if err != nil {
        return err
    }

    mt.netAddr = addr
    mt.listener = ln

    go mt.acceptPeers()

    return nil
}

func (mt *MultiplexTransport) acceptPeers() {
    for {
        c, err := mt.listener.Accept()
        if err != nil {
            select {
            case _, ok := <-mt.closec:
                if !ok {
                    return
                }
            default:

            }
            mt.acceptc <- accept{err: err}
            return
        }

        go func(c net.Conn) {
            var (
                nodeInfo        NodeInfo
                secretConn      net.Conn
                netAddr         *NetAddress
            )
            err := mt.filterConn(c)
            if err == nil {
                //secretConn, nodeInfo, err := mt.upgrade(c, nil)
                secretConn = c
                // Handshake: obtain the remote node's nodeInfo
                nodeInfo, err = handshake(secretConn, mt.handshakeTimeout, mt.nodeInfo)
                if err == nil {
                    addr := secretConn.RemoteAddr()
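                    // NOTE: this derives the ID from the local node key; upstream Tendermint
                    // uses the remote public key of the upgraded secret connection instead.
                    id := PubKeyToID(mt.nodeKey.PubKey())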
                    netAddr = NewNetAddress(id, addr)
                }
            }
            select {
            // Wrap the result in an accept value and hand it to the Accept function
            case mt.acceptc <- accept{netAddr, secretConn, nodeInfo, err}:
            case <-mt.closec:
                _ = c.Close()
                return
            }
        }(c)
    }
}
  • Next, the Dial method, which returns a peer to the Switch
func (mt *MultiplexTransport) Dial(
    addr NetAddress,
    cfg  peerConfig,
    ) (Peer, error) {
    // Call DialTimeout() on the NetAddress
    c, err := addr.DialTimeout(mt.dialTimeout)
    if err != nil {
        return nil, err
    }
    // upgrade calls handshake internally to obtain the remote node's nodeInfo
    nodeInfo, err := mt.upgrade(c, &addr)
    if err != nil {
        return nil, err
    }

    cfg.outbound = true
    // Create a new outbound peer
    p := mt.wrapPeer(c, nodeInfo, cfg, &addr)
    return p, nil
}

// DialTimeout calls net.DialTimeout on the address.
func (na *NetAddress) DialTimeout(timeout time.Duration) (net.Conn, error) {
    conn, err := net.DialTimeout("tcp", na.DialString(), timeout)
    if err != nil {
        return nil, err
    }
    return conn, nil
}

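  • The handshake function is also important. It starts two goroutines, one that writes our NodeInfo and one that reads the peer's, and the handshake succeeds only when both finish without error. handshake is called from two places, Dial and Accept, so once two nodes are connected the handshake can complete.
func handshake(c net.Conn, timeout time.Duration, nodeInfo NodeInfo) (NodeInfo, error) {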
    if err := c.SetDeadline(time.Now().Add(timeout)); err != nil {
        return nil, err
    }
    var (
        errc = make(chan error, 2)
        peerNodeInfo DefaultNodeInfo
        ourNodeInfo = nodeInfo.(DefaultNodeInfo)
    )
    go func(errc chan<- error, c net.Conn) {
        // Write the local peer's nodeInfo to the connection
        _, err := cdc.MarshalBinaryLengthPrefixedWriter(c, ourNodeInfo)
        errc <- err
    }(errc, c)
    go func(errc chan<- error, c net.Conn) {
        // Read the nodeInfo written by the remote side
        _, err := cdc.UnmarshalBinaryLengthPrefixedReader(
            c,
            &peerNodeInfo,
            int64(MaxNodeInfoSize()),
        )
        errc <- err
    }(errc, c)

    for i := 0; i < cap(errc); i++ {
        err := <-errc
        if err != nil {
            return nil, err
        }
    }
    // Return the remote node's nodeInfo and clear the deadline
    return peerNodeInfo, c.SetDeadline(time.Time{})
}

PEXReactor

The main job of PEXReactor is peer discovery, an essential part of any p2p network. PEXReactor is also a concrete implementation of Reactor.

func (r *PEXReactor) AddPeer(p Peer) // add a peer and record it in the address book
func (r *PEXReactor) RemovePeer(p Peer, reason interface{}) // remove a peer
func (r *PEXReactor) GetChannels() []*conn.ChannelDescriptor // return this reactor's channel descriptors
func (r *PEXReactor) Receive(chID byte, src Peer, msgBytes []byte) // handle a message received from a peer

func NewPEXReactor(b AddrBook, config *PEXReactorConfig) *PEXReactor // create a new reactor
func (r *PEXReactor) receiveRequest(src Peer) error // validate an incoming request (a peer asking this node for more addresses)
func (r *PEXReactor) SendAddrs(p Peer, netAddrs []*p2p.NetAddress) // send a list of addresses to a peer
func (r *PEXReactor) ReceiveAddrs(addrs []*p2p.NetAddress, src Peer) error // handle a list of addresses sent by a peer
func (r *PEXReactor) ensurePeersRoutine()  // runs in its own goroutine; keeps picking addresses from the address book to maintain a target number of connections
func (r *PEXReactor) ensurePeers() // called by ensurePeersRoutine
func (r *PEXReactor) AttemptsToDial(addr *p2p.NetAddress) int // number of dial attempts made for this address
func (r *PEXReactor) crawlPeersRoutine() // enabled in seed mode; crawls for more peer addresses
func (r *PEXReactor) dialPeer(addr *p2p.NetAddress) error // dial the given address
func (r *PEXReactor) dialSeeds() // dial the seed nodes, if any are configured
func (r *PEXReactor) crawlPeers(addrs []*p2p.NetAddress)  // ask these addresses for more addresses
func (r *PEXReactor) attemptDisconnects() // disconnect peers connected for longer than SeedDisconnectWaitPeriod

  • First, the service start-up function
func (r *PEXReactor) OnStart() error {
    // Start the address book first
    err := r.book.Start()
    if err != nil && err != cmn.ErrAlreadyStarted {
        return err
    }
    numOnline, seedAddrs, err := r.checkSeeds()
    if err != nil {
        return err
    } else if numOnline == 0 && r.book.Empty() {
        return errors.New("Address book is empty and can't resolve any seed nodes")
    }
    r.seedAddrs = seedAddrs
    if r.config.SeedMode {
        // In seed mode, crawl other nodes for more addresses
        go r.crawlPeersRoutine()
    } else {
        // Otherwise connect to peers taken from the address book, without crawling for more
        go r.ensurePeersRoutine()
    }
    return nil
}
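  • Channel lookup: the ChannelDescriptor defined by PEXReactor describes its channel, and the Switch uses the channel ID to route incoming and outgoing messages to the right reactor.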
// GetChannels implements Reactor
func (r *PEXReactor) GetChannels() []*conn.ChannelDescriptor {
    return []*conn.ChannelDescriptor{
        {
            ID:                PexChannel,
            Priority:          1,
            SendQueueCapacity: 10,
        },
    }
}
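  • Keep a target number of peers connected (provided the address book holds enough addresses)
// ensurePeersRoutine calls r.ensurePeers periodically.
func (r *PEXReactor) ensurePeersRoutine() {
    // Run once immediately, then on every tick
    r.ensurePeers()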

    ticker := time.NewTicker(r.ensurePeersPeriod)
    for {
        select {
        case <-ticker.C:
            r.ensurePeers()
        case <-r.Quit():
            ticker.Stop()
            return
        }
    }
}

func (r *PEXReactor) ensurePeers() {
    var (
        // the inbound count is not used in this excerpt
        out, _, dial = r.Switch.NumPeers()
        // number of outbound slots still available
        numToDial = r.Switch.MaxNumOutboundPeers() - (out + dial)
    )
    if numToDial <= 0 {
        return
    }
    // newBias is in the range [10, 90]
    newBias := cmn.MinInt(out, 8)*10 + 10
    toDial := make(map[p2p.ID]*p2p.NetAddress)
    // upper bound on the number of pick attempts
    maxAttempts := numToDial * 3
    for i := 0; i < maxAttempts && len(toDial) < numToDial; i++ {
        // pick one address
        try := r.book.PickAddress(newBias)
        if try == nil {
            continue
        }
        if _, selected := toDial[try.ID]; selected {
            continue
        }
        if r.Switch.IsDialingOrExistingAddress(try) {
            continue
        }
        r.Logger.Info("Will dial address", "addr", try)
        toDial[try.ID] = try
    }
    // Dial each selected address in its own goroutine
    for _, addr := range toDial {
        go func(addr *p2p.NetAddress) {
            err := r.dialPeer(addr)
            if err != nil {
                switch err.(type) {
                case errMaxAttemptsToDial, errTooEarlyToDial:
                    r.Logger.Debug(err.Error(), "addr", addr)
                default:
                    r.Logger.Error(err.Error(), "addr", addr)
                }
            }
        }(addr)
    }
}
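
The selection logic in ensurePeers boils down to three small pieces: the bias formula, the number of free outbound slots, and a bounded loop that collects distinct candidates. The sketch below isolates them; newBias, numToDial, and pickDistinct are hypothetical helpers, and plain strings stand in for NetAddress values.

```go
package main

import "fmt"

// newBias maps the current number of outbound peers to a value in [10, 90].
func newBias(out int) int {
	if out > 8 {
		out = 8
	}
	return out*10 + 10
}

// numToDial returns how many outbound slots are still free.
func numToDial(maxOutbound, out, dialing int) int {
	n := maxOutbound - (out + dialing)
	if n < 0 {
		return 0
	}
	return n
}

// pickDistinct calls pick at most want*3 times and collects up to want
// distinct, non-empty addresses, mirroring the bounded loop in ensurePeers.
func pickDistinct(pick func() string, want int) []string {
	seen := make(map[string]bool)
	var out []string
	for i := 0; i < want*3 && len(out) < want; i++ {
		addr := pick()
		if addr == "" || seen[addr] {
			continue // nothing returned, or already selected
		}
		seen[addr] = true
		out = append(out, addr)
	}
	return out
}

func main() {
	for _, out := range []int{0, 3, 8, 20} {
		fmt.Printf("out=%d -> newBias=%d\n", out, newBias(out))
	}
	fmt.Println("slots:", numToDial(10, 3, 2))

	candidates := []string{"a", "a", "b", "", "c"}
	i := 0
	pick := func() string {
		addr := candidates[i%len(candidates)]
		i++
		return addr
	}
	fmt.Println(pickDistinct(pick, 3)) // [a b c]
}
```

Capping the loop at three picks per free slot keeps the routine from spinning when the address book is small or mostly made up of peers that are already connected.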
  • Dial a single address
func (r *PEXReactor) dialPeer(addr *p2p.NetAddress) error {
    attempts, lastDialed := r.dialAttemptsInfo(addr)
    // Give up, and mark the address bad, once the attempt limit is exceeded
    if attempts > maxAttemptsToDial {
        r.book.MarkBad(addr)
        return errMaxAttemptsToDial{}
    }
    if attempts > 0 {
        jitterSeconds := time.Duration(cmn.RandFloat64() * float64(time.Second))
        backoffDuration := jitterSeconds + ((1 << uint(attempts)) * time.Second)
        sinceLastDialed := time.Since(lastDialed)
        if sinceLastDialed < backoffDuration {
            return errTooEarlyToDial{backoffDuration, lastDialed}
        }
    }
    // Call the Switch's dial function
    err := r.Switch.DialPeerWithAddress(addr)
    if err != nil {
        ....
        return errors.Wrapf(err, "dialing failed (attempts: %d)", attempts + 1)
    }
    r.attemptsToDial.Delete(addr.DialString())
    return nil
}
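  • Periodically request addresses from other nodes
func (r *PEXReactor) crawlPeersRoutine() {
    // If seeds are configured, dial the seed nodes
    if len(r.seedAddrs) > 0 {
        r.dialSeeds()
    } else {
        // Otherwise start crawling from the peers in the address book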
        r.crawlPeers(r.book.GetSelection())
    }
    ticker := time.NewTicker(crawlPeerPeriod)
    for {
        select {
        case <-ticker.C:
            r.attemptDisconnects()
            r.crawlPeers(r.book.GetSelection())
            r.cleanupCrawlPeerInfos()
        case <-r.Quit():
            return
        }
    }
}
  • Dial the seed nodes in random order, stopping at the first success
func (r *PEXReactor) dialSeeds() {
    perm := cmn.RandPerm(len(r.seedAddrs))
    for _, i := range perm {
        seedAddr := r.seedAddrs[i]
        err := r.Switch.DialPeerWithAddress(seedAddr)
        if err == nil {
            return
        }
        r.Switch.Logger.Error("Error dialing seed", "err", err, "seed", seedAddr)
    }
    r.Switch.Logger.Error("Couldn't connect to any seeds")
}
  • Request more addresses

func (r *PEXReactor) crawlPeers(addrs []*p2p.NetAddress) {
    now := time.Now()
    for _, addr := range addrs {
        peerInfo, ok := r.crawlPeerInfos[addr.ID]

        ....
        // Dial the node first
        err := r.dialPeer(addr)
        if err != nil {
            switch err.(type) {
            case errMaxAttemptsToDial, errTooEarlyToDial:
                r.Logger.Debug(err.Error(), "addr", addr)
            default:
                r.Logger.Error(err.Error(), "addr", addr)
            }
            continue
        }
        peer := r.Switch.Peers().Get(addr.ID)
        if peer != nil {
            // Connected: look up the peer and ask it for more addresses
            r.RequestAddrs(peer)
        }
    }
}
  • The periodic routine calls attemptDisconnects to drop peers that have been connected for a long time
// Disconnect non-persistent peers connected for longer than SeedDisconnectWaitPeriod.
func (r *PEXReactor) attemptDisconnects() {
    for _, peer := range r.Switch.Peers().List() {
        if peer.Status().Duration < r.config.SeedDisconnectWaitPeriod {
            continue
        }
        if peer.IsPersistent() {
            continue
        }
        r.Switch.StopPeerGracefully(peer)
    }
}
  • One more important method, called whenever a message is received
// Receive implements Reactor by handling incoming PEX messages.
func (r *PEXReactor) Receive(chID byte, src Peer, msgBytes []byte) {
    msg, err := decodeMsg(msgBytes)
    if err != nil {
        r.Logger.Error("Error decoding message", "src", src, "chId", chID, "msg", msg, "err", err, "bytes", msgBytes)
        r.Switch.StopPeerForError(src, err)
        return
    }
    r.Logger.Debug("Received message", "src", src, "chId", chID, "msg", msg)

    switch msg := msg.(type) {
    case *pexRequestMessage:
        // In seed mode, serve the inbound peer's address request; a peer may request only once per connection
        if r.config.SeedMode && !src.IsOutbound() {
            id := string(src.ID())
            // If this peer has already made a request, return immediately
            v := r.lastReceivedRequests.Get(id)
            if v != nil {
                return
            }
            r.lastReceivedRequests.Set(id, time.Now())
            // Reply with a selection of addresses, then disconnect
            r.SendAddrs(src, r.book.GetSelectionWithBias(biasToSelectNewPeers))
            go func() {
                src.FlushStop()
                r.Switch.StopPeerGracefully(src)
            }()
        } else {
            // First check that the peer is not requesting too frequently
            if err := r.receiveRequest(src); err != nil {
                r.Switch.StopPeerForError(src, err)
                return
            }
            r.SendAddrs(src, r.book.GetSelection())
        }
    case *pexAddrsMessage:
        // Received a list of addresses
        if err := r.ReceiveAddrs(msg.Addrs, src); err != nil {
            r.Switch.StopPeerForError(src, err)
            return
        }
    default:
        r.Logger.Error(fmt.Sprintf("Unknown message type %v", reflect.TypeOf(msg)))
    }
}

Almost Done !!!
