使用 Porcupine 进行线性一致性测试

在测试分布式系统的线性一致性这篇文章里面，提到了一个用 Go 实现的线性一致性验证系统：Porcupine，这里简单介绍一下。Porcupine 是基于 Faster linearizability checking via P-compositionality 这篇 paper，具体这篇 paper 是怎样的，我数学不好，没法深入理解，如果有谁研究了，欢迎交流 :sob:。

Porcupine 的使用非常简单，他就两个验证函数，CheckOperations 和 CheckEvents，其实这两个差不多，我一般直接使用了 Event，所以这里就用 CheckEvents 来举例说明了。

Event

Event 的定义如下：

type Event struct {
    Kind  EventKind
    Value interface{}
    Id    uint
}

Kind 是 event 的类型，就两种，Call 和 Return，因为我们的任何操作，其实就是先发起一个 request，也就是 Call，然后收到 response，也就是 Return。Value 是一个 interface，也就是我们自己实际操作的 request 或者 response。Id 就是这个 event 的标识，request 和 response 必须用相同的 Id。假设我们实现一个简单的 Key-Value store，就只会对一个 key 操作，定义 request 和 response 如下：

type Request struct {
    // 0 for read, 1 for put
    Op int
    // put value
    Value int
}

type Resposne struct {
    // for read value
    Value int
    // for put ok
    Ok bool
}

那么，一个 event history 可能如下：

Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 1 }
Event {EventKind: CallEvent, Value: Request { Op: 1, Value: 10 }, Id: 2 } 
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 1 }
Event {EventKind: ReturnEvent, Value: Response { Ok: true }, Id: 2 }

然后，我们将对应的 event list 扔给 CheckEvents 函数，就能知道这个 history 是否是线性一致性的了。那么，CheckEvents 是如何判断的呢？这里，我们就需要为相应的测试 case 定制自己的 model。

Model

在 Porcupine 里面，Model 的定义如下:

type Model struct {
    // Initial state of the system.
    Init func() interface{}
    // Step function for the system. Returns whether or not the system
    // could take this step with the given inputs and outputs and also
    // returns the new state. This should not mutate the existing state.
    Step func(state interface{}, input interface{}, output interface{}) (bool, interface{})
}

上面我们去掉了一些无关的 field function，主要关注两个，Init 和 Step。Init 是整个系统初始的状态，譬如对于我们上面的 case，我们可以让 Init 函数 return 5。重点关注的是 Step 函数，Step 其实就是一个状态机变更，它会根据当前 event 的 input 和 output 尝试从一个 state 变更到另一个 state。返回 true 就标明变更成功，进入一个新的状态了。那对我们上面的 key-value case 来说，Step 函数可以写成：

func Step(state interface{}, input interface{}, output interface{}) (bool, interface{})
        st := state.(int)
        inp := input.(Request)
        out := output.(Response)
        // for read
        if inp == 0 {
            return st == out.Value, st
        }
        
        // for write
        if !out.ok {
            return true, st
        }
        return out.ok, inp.value

对于 read request 来说，只有读出来的值跟上一次的状态值一致，我们才返回 true，而对于 write 来说，如果成功了，就返回新的状态。

对于前面的 event history，显而易见，是线性一致的，但如果我们换成这样：

Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 1 }
Event {EventKind: CallEvent, Value: Request { Op: 1, Value: 10 }, Id: 2 } 
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 1 }
Event {EventKind: ReturnEvent, Value: Response { Ok: true }, Id: 2 }
Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 3 }
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 3 }

那么显而易见，这并不是线性一致性的，因为操作 3 在读取之前，2 一定写入了新的值。

Timeout

上面的例子很简单，但在实际的分布式环境中，一个操作，我们有三种状态，成功，失败，当然还有超时。譬如一个写操作超时了，那么它到底写没有写进去，我们是没有办法判断的。对于这样的 event，我们只能让 Porcupine 认为，它会在 invoke time 或者 infinite time 之间的任何一个点完成，也就是，如果我们碰到了一个是 timeout 的 response，我们应该暂存起来，然后在整个 history 的最后，才把这个 response 加进去，这样 Porcupine 才能正确处理。

具体到上面的 key-value 例子，我们在 Response 上面添加一个 Unknown 字段，如果一个 response 是 unknown 的，我们也会认为它在 Step 的时候能返回 true，所以对于 write，就是 return out.Ok || out.Unknown, inp.Value

我们在最后才将这种 Unknown 的 response 放到 history 里面，譬如：

Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 1 }
Event {EventKind: CallEvent, Value: Request { Op: 1, Value: 10 }, Id: 2 } 
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 1 }
Event {EventKind: ReturnEvent, Value: Response { Unknown: true }, Id: 2 }

如果把 Unknown 放到中间加入到 history 里面，Porcupine 是不知道如何去检测的。

小结

上面简单了介绍了 Porcupine 的使用，可以看到非常的简单。既然我们现在有了一个用 Go 实现的分布式线性一致性验证框架，那么下一步事情就非常的明了了，将 Porcupine 加入到一个分布式测试框架里面，提供一套类似 jepsen 的 Go 解决方案。虽然 Jepsen 很强大，而且 TiDB 也有 Jepsen 的 case，但相对于 Jepsen 用的 clojure，还是 Go 最舒服。

使用 Porcupine 进行线性一致性测试

Event

Model

Timeout

小结

你可能感兴趣的:(使用 Porcupine 进行线性一致性测试)