by Tal Kol
From time to time you may find yourself facing a daunting task: building a server that really isn’t allowed to fail, a project where the cost of error is extraordinarily high. What is the methodology for approaching such a task?
Before diving into this excessive workflow, you should ask yourself — does my server really need to be bulletproof? There’s a lot of overhead involved in preparing for the worst, and it’s not always worth it.
If the cost of error isn’t extraordinarily high, a perfectly valid approach is to make a reasonable best effort for things to work, and if your server breaks, just deal with it. Monitoring tools today and modern workflows of continuous delivery allow us to spot problems in production quickly and fix them almost immediately. For many cases, this is good enough.
In the project I’m working on today, it isn’t. I’m working on implementing a blockchain — a distributed server infrastructure for executing code securely under consensus in a low trust environment. One of the applications of this technology is digital currencies. This is a textbook example where the cost of error is literally high. We naturally want its implementation to be as bulletproof as possible.
There are other cases though, even when not dealing with currencies, where bulletproof code makes sense. The cost of maintenance skyrockets quickly for a codebase that fails frequently. Being able to identify problems earlier in the development cycle, when the cost of fixing them is still low, has a good chance of paying back the upfront investment in a bulletproof methodology.
Test Driven Development (TDD) is often hailed as the silver bullet against malfunctioning code. It is a puristic development methodology where new code isn’t added unless it satisfies a failing test. This process guarantees test coverage of 100 percent and often gives the illusion that your code is tested against every possible scenario.
This isn’t the case. TDD is a great methodology that works well for some, but by itself it still isn’t enough. Even worse, TDD instills false confidence in code and may make developers lazy when considering paranoid edge cases. I’ll show a good example of this later on.
It doesn’t matter if you write tests before the fact or after, using a technique like TDD or not. All that matters is that you have tests. Tests are the best line of defense for protecting your code against breaking in production.
Since we’re going to run our entire test suite very frequently — after every new line of code if possible — tests must be automated. No part of our confidence in our code can result from a manual QA process. Humans make mistakes. Human attention to detail deteriorates after doing the same mind-numbing task a hundred times in a row.
Tests must be fast. Blazingly fast.
If a test suite takes more than a few seconds to run, developers are likely going to become lazy, pushing code without running the tests. This is one of the great things about Go: it has one of the fastest toolchains out there. It compiles, rebuilds, and tests in seconds.
Tests are also important enablers for open-source projects. Blockchains, for example, are almost religiously open-source. The codebase must be open to establish trust — expose itself for audit and create a decentralized atmosphere where no single governing entity controls the project.
It is unreasonable to expect massive external contributions in an open-source project without a thorough test suite. External contributors need a quick way to check if their contribution breaks any existing behavior. The entire test suite, in fact, must run automatically on every pull request and fail automatically if the PR broke anything.
Full test coverage is a misleading metric, but it is important. It may feel excessive to reach 100% coverage, but when you think about it, it makes no sense to ship code to production that was never executed beforehand.
Full test coverage doesn’t necessarily mean that we have enough tests and it doesn’t mean that our tests are meaningful. What is certain is that if we don’t have 100% coverage, we don’t have enough to consider ourselves bulletproof, since parts of our code were never tested.
Nevertheless, there is such a thing as too many tests. Ideally, every bug we encounter should break a single test. If we have redundant tests — different tests that check the same thing — modifying existing code and breaking existing behavior in the process will incur too much overhead in fixing failed tests.
Go is statically typed. Types provide a contract between various pieces of code running together. Without automatic type checking during build, if we wanted to adhere to our strict coverage rules, we would have to implement these contract tests ourselves. This is the case with environments like Node.js and JavaScript. Writing comprehensive contract tests manually is a lot of extra work we prefer to avoid.
Go is simple and dogmatic. Go is known for being stripped of many traditional language features like classic OOP inheritance. Complexity is the worst enemy of bulletproof code. Problems tend to creep up in the seams. While the common case is easy to test, it’s the strange edge case you haven’t thought of that will eventually get you.
Dogma is also helpful in this sense. There’s often only one way to do something in Go. This may inhibit the free spirit of man, but when there’s one way to do something, it’s more difficult to get this one thing wrong.
Go is concise yet expressive. Readable code is easier to review and audit. If the code is too verbose, its core purpose may be drowned by the noise of boilerplate. If the code is too concise, it becomes hard to follow and understand.
Go strikes a nice balance between the two. There’s not a lot of language boilerplate like in Java or C++, but the language is still very explicit and verbose in areas like error handling — making it easy to verify that you’ve checked every possible route.
Go has clear paths of error and recovery. Dealing gracefully with errors in runtime is a cornerstone for bulletproof code. Go has a strict convention of how errors are returned and propagated. Environments like Node.js — where multiple flavors of control flow like callbacks, promises, and async are mixed together — often result in leakage like unhandled promise rejections. Recovering from these is almost impossible.
Go has an extensive standard library. Dependencies add risk, especially when coming from sources that aren’t necessarily well-maintained. When shipping your server, you ship all of your dependencies with it. You are responsible for their malfunctions as well. Environments overflowing with fragmented dependencies, like Node.js, are harder to keep bulletproof.
This is also risky from a security standpoint, as you are as vulnerable as your weakest dependency. Go’s extensive standard library is well-maintained and reduces reliance on external dependencies.
Development velocity is still rapid. The main appeal of environments like Node.js is an extremely rapid development cycle. Code just takes less time to write and you become more productive.
Go preserves these benefits quite well. The build toolchain is fast enough to make feedback immediate. Compilation time is negligible, and code seems to run like it’s interpreted. The language has enough abstractions like garbage collection to focus engineering efforts on core functionality.
Now, with the introductions over, it’s time to dive into some code. We need an example that is simple enough so we can focus on methodology, but complicated enough to have substance. I find it’s easiest to take something from my day to day, so let’s build a server that processes currency-like transactions. Users will be able to check the balance for an account. Users will also be able to transfer funds from one account to another.
We’ll keep things very simple. Our system will only have a single server. We’re also not going to deal with user authentication or cryptography. These are product features, whereas we want to focus on building the bulletproof software foundation.
Complexity is the worst enemy of bulletproof code. One of the best ways to deal with complexity is divide and conquer — split the problem into smaller problems and solve each one separately. How do we split? We’ll follow the principle of separation of concerns. Every part should deal with a single concern.
This goes hand in hand with the popular architecture of microservices. Our server will be composed of services. Each service will be assigned a clear responsibility and given a well-defined interface for communication with the other services.
Once we’ve structured our server this way, we’ll be free to decide how each service is running. We can run all services together in the same process, make each service its own separate server and communicate via RPC, or split services to run on different machines.
Since we’re just starting out, we’ll keep things simple — all services will share the same process and communicate directly as libraries. We’ll be able to change this decision easily in the future.
So which services should we have? Our server is a little too simple for splitting up, but to demonstrate this principle we’ll do so anyways. We need to respond to HTTP requests from clients for checking balances and making transactions. One service can deal with the client HTTP interface — we’ll call it PublicApi. Another service will own the state — the ledger of all balances —so we’ll call it StateStorage. The third service will connect the two and implement our business logic of the “contract” for changing balances. Since blockchains usually allow these contracts to be deployed by application developers, the third service will be charged with running them — we’ll call it VirtualMachine.
We’ll place the code for services in our project under /services/publicapi, /services/virtualmachine and /services/statestorage.
When implementing services, we’ll want to be able to work on each one separately. Possibly even assign different services to different developers. Since services are dependent on one another and we’re going to parallelize work on their implementation, we’ll have to start by defining clear interfaces between them. Using this interface, we’ll be able to test a service individually and mock everything else.
How can we define the interface? One option is to document it, but documentation tends to grow stale and out of sync with the code. We could use Go interface declarations. This makes sense, but it’s nicer to define the interface in a language agnostic way. Our server isn’t limited to Go only. We may decide down the road to reimplement one of the services in a different language more appropriate to its requirements.
One approach is to use protobuf — a simple language-agnostic syntax by Google to define messages and service endpoints.
Let’s start with StateStorage. We’ll structure state as a key-value store:
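The protobuf listing itself isn’t reproduced here. As a rough Go-side illustration of a key-value contract, here is a minimal sketch. The endpoint names ReadKey and WriteKey come from later in this article; the int32 value type and the memoryStorage implementation are assumptions made for the example.

```go
package main

import "fmt"

// StateStorage is a hypothetical Go rendering of the key-value service
// interface. The article defines the real contract in protobuf.
type StateStorage interface {
	ReadKey(key string) (int32, error)
	WriteKey(key string, value int32) error
}

// memoryStorage is a minimal in-memory implementation used only for this
// sketch. It is not safe for concurrent use.
type memoryStorage struct {
	data map[string]int32
}

func newMemoryStorage() *memoryStorage {
	return &memoryStorage{data: make(map[string]int32)}
}

// ReadKey returns the stored value, or 0 for a key that was never written.
func (s *memoryStorage) ReadKey(key string) (int32, error) {
	return s.data[key], nil
}

// WriteKey stores the value under the key, overwriting any previous value.
func (s *memoryStorage) WriteKey(key string, value int32) error {
	s.data[key] = value
	return nil
}

func main() {
	var store StateStorage = newMemoryStorage()
	if err := store.WriteKey("user1", 100); err != nil {
		panic(err)
	}
	balance, err := store.ReadKey("user1")
	if err != nil {
		panic(err)
	}
	fmt.Println(balance)
}
```

Both methods return an error even though the in-memory version never fails, so that a networked or disk-backed implementation can report failures through the same contract.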
Although PublicApi is accessed via client HTTP, it’s still a good practice to give it a clear interface in the same way:
This will require us to define Transaction and Address data structures:
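In Go terms, the interface and its data structures could look roughly like the sketch below. The field names (Username, From, To, Amount), the method names and the stubApi type are assumptions for illustration; the real definitions live in the project’s .proto files.

```go
package main

import "fmt"

// Address identifies an account. The single Username field is an assumption.
type Address struct {
	Username string
}

// Transaction moves Amount from one address to another.
type Transaction struct {
	From   Address
	To     Address
	Amount int32
}

// PublicApi is a hypothetical Go shape for the client-facing service.
type PublicApi interface {
	Transfer(tx Transaction) error
	GetBalance(addr Address) (int32, error)
}

// stubApi is a throwaway implementation that keeps balances in a map.
type stubApi struct {
	balances map[string]int32
}

func (s *stubApi) Transfer(tx Transaction) error {
	s.balances[tx.From.Username] -= tx.Amount
	s.balances[tx.To.Username] += tx.Amount
	return nil
}

func (s *stubApi) GetBalance(addr Address) (int32, error) {
	return s.balances[addr.Username], nil
}

// Compile-time contract check: this line stops compiling if stubApi ever
// drifts away from the PublicApi interface.
var _ PublicApi = (*stubApi)(nil)

func main() {
	api := &stubApi{balances: map[string]int32{}}
	tx := Transaction{From: Address{"user1"}, To: Address{"user2"}, Amount: 17}
	if err := api.Transfer(tx); err != nil {
		panic(err)
	}
	balance, _ := api.GetBalance(Address{"user2"})
	fmt.Println(balance)
}
```

The `var _ PublicApi = ...` line is the part worth noticing: it turns a contract violation into a build failure instead of something a test has to catch.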
We’ll place the .proto definitions for services in our project under /types/services and general data structures under /types/protocol. Once the definitions are ready, they can be compiled to Go code. The benefit of this approach is that code which doesn’t meet the contract will simply not compile. Alternate methods would require us to write contract tests explicitly.
The complete definitions, generated Go files, and compilation instructions are available here. Kudos to Square Engineering for making goprotowrap.
Note that we’re not integrating an RPC transport layer yet, and calls between services will currently be regular library calls. When we’re ready to split services to different servers, we can add a transport layer like gRPC.
Since tests are the key to bulletproof code, let’s discuss first which types of tests we’ll be writing:
Unit tests are the base of the testing pyramid. We’ll test every unit in isolation. What’s a unit? In Go, we can define a unit to be every file in a package. If we have /services/publicapi/handlers.go, we’ll place its unit test in the same package under /services/publicapi/handlers_test.go.
It’s preferable to place unit tests in the same package as the tested code so the tests have access to non-exported variables and functions.
The next type of tests has multiple names that all refer to the same thing: taking several units and testing them together. This is one level up the pyramid. In our case, we’ll focus on an entire service. These tests define the specifications for a service. For the StateStorage service for example, we’ll place them in /services/statestorage/spec.
It’s preferable to place these tests in a different package than the tested code to enforce access through exported interfaces only.
End-to-end tests are the top of the testing pyramid, where we test our entire system together with all services combined. These tests define the end-to-end specifications for the system, therefore we’ll place them in our project under /e2e/spec.
These tests should also be placed in a different package than the tested code to enforce access through exported interfaces only.
Which tests should we write first? Do we start at the base and work our way up? Or go top-down? Both approaches are valid. The benefit of the top-down approach is for building specifications. It’s usually easier to reason about the specifications for the entire system first. Even if we split our system to services the wrong way, the system spec would remain the same. This would also help us understand that.
The drawback of starting top-down is that our end-to-end tests will be the last ones to pass (only after the entire system has been implemented). This means they’ll remain failing for a long time.
Before writing tests, we need to consider whether we’re going to write everything bare-boned or use a framework. Relying on frameworks for dev dependencies is less dangerous than relying on frameworks for production code. In our case, since the Go standard library doesn’t have great support for BDD and this format is excellent for defining specs, we’ll opt for a framework.
There are many excellent candidates like GoConvey and Ginkgo. My personal preference is Ginkgo with Gomega (terrible names, but what can you do) which use syntax like Describe() and It().
So what does a test look like? Checking user balance:
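The original Ginkgo listing isn’t reproduced here. To give a feel for the shape of such a check, here is a standard-library-only sketch. The /api/balance route, the query parameter name and the stand-in handler are all assumptions; a real end-to-end test would start the actual server instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// balanceHandler is a stand-in for the real server: in this sketch every
// account reports a balance of 0.
func balanceHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "0")
}

// getBalance goes through the public HTTP interface, as a client would.
func getBalance(baseURL, user string) (string, error) {
	resp, err := http.Get(baseURL + "/api/balance?from=" + user)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// freshBalance starts a throwaway server and asks for a new user's balance.
func freshBalance() string {
	server := httptest.NewServer(http.HandlerFunc(balanceHandler))
	defer server.Close()
	balance, err := getBalance(server.URL, "user1")
	if err != nil {
		panic(err)
	}
	return balance
}

func main() {
	// The spec being checked: a user nobody has touched has a zero balance.
	fmt.Println(freshBalance() == "0")
}
```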
Since our server provides a public HTTP interface to the world, we access this web API using http.Get. What about making a transaction?
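A transfer scenario can be sketched the same way with the standard library only. The routes, query parameters and the toy ledger below are assumptions standing in for the real system. The spec itself is what the text describes: after a transfer the sender’s balance drops, and it is allowed to go negative.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// ledger is a toy stand-in for the whole system under test.
type ledger struct {
	mu       sync.Mutex
	balances map[string]int
}

// newMux wires two hypothetical routes: /api/balance and /api/transfer.
func newMux(l *ledger) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/balance", func(w http.ResponseWriter, r *http.Request) {
		l.mu.Lock()
		defer l.mu.Unlock()
		fmt.Fprint(w, l.balances[r.URL.Query().Get("from")])
	})
	mux.HandleFunc("/api/transfer", func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		amount, err := strconv.Atoi(q.Get("amount"))
		if err != nil {
			http.Error(w, "bad amount", http.StatusBadRequest)
			return
		}
		l.mu.Lock()
		defer l.mu.Unlock()
		l.balances[q.Get("from")] -= amount
		l.balances[q.Get("to")] += amount
	})
	return mux
}

// get returns the response body of a GET request as a string.
func get(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return string(body)
}

// post sends an empty POST request and returns the status code.
func post(url string) int {
	resp, err := http.Post(url, "text/plain", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

// transferScenario moves 17 from user1 to user2 and reads both balances back.
func transferScenario() (sender, recipient string) {
	server := httptest.NewServer(newMux(&ledger{balances: map[string]int{}}))
	defer server.Close()
	post(server.URL + "/api/transfer?from=user1&to=user2&amount=17")
	return get(server.URL + "/api/balance?from=user1"), get(server.URL + "/api/balance?from=user2")
}

func main() {
	sender, recipient := transferScenario()
	fmt.Println(sender, recipient) // -17 17
}
```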
The test is very descriptive and can even replace documentation. As you can see above, we’re allowing accounts to reach a negative balance. This is a product choice. If this weren’t allowed, the test would reflect that.
The complete test file is available here.
Now that we’re done with end-to-end tests, we go down the pyramid and implement service tests. This is done for every service separately. Let’s choose a service which has a dependency on another service, because this case is more interesting.
We’ll start with VirtualMachine. The protobuf interface for this service is available here. Because VirtualMachine relies on service StateStorage and makes calls to it, we’re going to have to mock StateStorage in order to test VirtualMachine in isolation. The mock object will allow us to control StateStorage’s responses during the test.
How can we implement mock objects in Go? We can simply create a bare-boned stub implementation, but using a mocking library will also provide us with useful assertions during the test. My preference is go-mock.
We’ll place the mock for StateStorage in /services/statestorage/mock.go. It’s preferable to place mocks in the same package as the mocked code to provide access to non-exported variables and functions. The mock is pretty much just boilerplate at this point, but as our services get more complicated, we may find ourselves adding some logic here. This is the mock:
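The go-mock based listing isn’t reproduced here. A hand-rolled equivalent, with no mocking library at all, might look like the sketch below. The struct fields and the Go shape of the StateStorage interface are assumptions; only the ReadKey and WriteKey names come from the article.

```go
package main

import (
	"errors"
	"fmt"
)

// StateStorage is the interface being mocked (hypothetical Go shape).
type StateStorage interface {
	ReadKey(key string) (int32, error)
	WriteKey(key string, value int32) error
}

// MockStateStorage is a hand-rolled mock: canned responses plus a record of
// every call, so a test can assert on both results and interactions.
type MockStateStorage struct {
	ReadValues map[string]int32 // value ReadKey returns for each key
	FailWrites bool             // when true, WriteKey returns an error
	ReadCalls  []string         // keys passed to ReadKey, in call order
	WriteCalls []string         // keys passed to WriteKey, in call order
}

func (m *MockStateStorage) ReadKey(key string) (int32, error) {
	m.ReadCalls = append(m.ReadCalls, key)
	return m.ReadValues[key], nil
}

func (m *MockStateStorage) WriteKey(key string, value int32) error {
	m.WriteCalls = append(m.WriteCalls, key)
	if m.FailWrites {
		return errors.New("mock: write failed")
	}
	return nil
}

// Compile-time check that the mock still satisfies the interface.
var _ StateStorage = (*MockStateStorage)(nil)

func main() {
	mock := &MockStateStorage{ReadValues: map[string]int32{"user1": 100}}
	value, _ := mock.ReadKey("user1")
	fmt.Println(value, len(mock.ReadCalls)) // 100 1
}
```

A mocking library adds nicer assertions on top of this, but the core idea is the same: the test controls what the dependency answers and can inspect how it was called.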
If you assign different services to different developers, it makes sense to implement the mocks first and share them between the team.
Let’s get back to writing our service test for VirtualMachine. Which scenario should we test here exactly? It’s best to follow the interface for the service and design tests for each endpoint. We’ll implement the test for the endpoint CallContract() with the method argument of "GetBalance" first:
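Here is a compact sketch of such a service test without Ginkgo or go-mock. Start(), CallContract(), ReadKey and "GetBalance" are names from the article; the exact signatures, the VirtualMachine struct and the counting mock are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// StateStorage is the dependency of the service under test (assumed shape).
type StateStorage interface {
	ReadKey(key string) (int32, error)
}

// mockStorage returns one canned value and counts how often it is called.
type mockStorage struct {
	value int32
	calls int
}

func (m *mockStorage) ReadKey(key string) (int32, error) {
	m.calls++
	return m.value, nil
}

// VirtualMachine is a minimal stand-in for the real service.
type VirtualMachine struct {
	storage StateStorage
}

// Start receives the dependency: simple dependency injection.
func (vm *VirtualMachine) Start(storage StateStorage) {
	vm.storage = storage
}

// CallContract dispatches on the contract method name.
func (vm *VirtualMachine) CallContract(method string, args ...string) (int32, error) {
	switch method {
	case "GetBalance":
		if len(args) != 1 {
			return 0, errors.New("GetBalance expects exactly one argument")
		}
		return vm.storage.ReadKey(args[0])
	default:
		return 0, fmt.Errorf("unknown method %q", method)
	}
}

// checkGetBalance plays the role of the service test: inject the mock, call
// the endpoint, then assert on the result and on the interaction.
func checkGetBalance() error {
	mock := &mockStorage{value: 100}
	vm := &VirtualMachine{}
	vm.Start(mock)

	balance, err := vm.CallContract("GetBalance", "user1")
	if err != nil {
		return err
	}
	if balance != 100 {
		return fmt.Errorf("expected balance 100, got %d", balance)
	}
	if mock.calls != 1 {
		return fmt.Errorf("expected exactly one ReadKey call, got %d", mock.calls)
	}
	return nil
}

func main() {
	fmt.Println(checkGetBalance()) // <nil>
}
```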
Notice that the service we’re testing, VirtualMachine, receives a pointer to its dependency StateStorage in its Start() method via simple dependency injection. That’s where we pass the mocked instance. Also notice that the test instructs the mock how to respond when accessed: when its ReadKey method is called, it should return the value 100. At the end, we verify that it was indeed called exactly once.
These tests become the specifications for the service. The full suite for service VirtualMachine is available here. The suites for the other services are available here and here.
Implementing the contract for method "GetBalance" is a bit too simple, so let’s move instead to the slightly more complicated implementation for method "Transfer". The transfer contract needs to read the balances of both the sender and recipient, calculate their new balances, and write them back to state. The service integration test for it is very similar to the one we just implemented:
We’ll finally get down to business and create a unit called processor.go that contains the actual implementation for the contract. This is how our initial implementation turns out:
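The original listing isn’t reproduced here. Based on the description (two reads, two writes, any of which may fail), a first-pass implementation could look like the sketch below. The signature and the mapStorage helper are assumptions. The sketch is deliberately naive: it keeps the weaknesses that the rest of the article goes on to uncover.

```go
package main

import "fmt"

// StateStorage is the dependency (assumed Go shape).
type StateStorage interface {
	ReadKey(key string) (int32, error)
	WriteKey(key string, value int32) error
}

// mapStorage is a trivial in-memory implementation for the demo.
type mapStorage map[string]int32

func (m mapStorage) ReadKey(key string) (int32, error) { return m[key], nil }

func (m mapStorage) WriteKey(key string, value int32) error {
	m[key] = value
	return nil
}

// processTransfer is a naive first pass: read both balances, compute the new
// ones, write both back. Every storage call can fail and is checked.
func processTransfer(s StateStorage, from, to string, amount int32) error {
	fromBalance, err := s.ReadKey(from)
	if err != nil {
		return err
	}
	toBalance, err := s.ReadKey(to)
	if err != nil {
		return err
	}
	if err := s.WriteKey(from, fromBalance-amount); err != nil {
		return err
	}
	if err := s.WriteKey(to, toBalance+amount); err != nil {
		return err
	}
	return nil
}

func main() {
	state := mapStorage{"user1": 100, "user2": 50}
	if err := processTransfer(state, "user1", "user2", 10); err != nil {
		panic(err)
	}
	fmt.Println(state["user1"], state["user2"]) // 90 60
}
```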
This satisfies the service integration test, but the integration test only contains a common case scenario. What about edge cases and potential failures? As you can see, any of the calls we make to StateStorage may fail. If we’re aiming for 100-percent coverage, we need to check all of these cases. A unit test would be a great place to do that.
Since we’re going to have to run the function multiple times with different inputs and mock settings to reach all flows, a table driven test would make this process a little more efficient. The convention in Go is to avoid fancy frameworks in unit tests. We can drop Ginkgo, but we should probably keep Gomega so our matchers look similar to our previous tests. This is the test:
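The Gomega-based listing isn’t reproduced here. The table-driven idea can be sketched with plain Go as below. The scriptedStorage helper, which fails on the Nth storage call, and the processTransfer signature are assumptions. In a real project the loop would live in a _test.go file and use t.Run for each case.

```go
package main

import (
	"errors"
	"fmt"
)

// scriptedStorage fails on the Nth storage call (1-based, counting reads and
// writes together). A failOnCall of 0 means it never fails.
type scriptedStorage struct {
	failOnCall int
	calls      int
	data       map[string]int32
}

func (s *scriptedStorage) step() error {
	s.calls++
	if s.calls == s.failOnCall {
		return errors.New("storage failure")
	}
	return nil
}

func (s *scriptedStorage) ReadKey(key string) (int32, error) {
	if err := s.step(); err != nil {
		return 0, err
	}
	return s.data[key], nil
}

func (s *scriptedStorage) WriteKey(key string, value int32) error {
	if err := s.step(); err != nil {
		return err
	}
	s.data[key] = value
	return nil
}

// processTransfer is the naive implementation under test.
func processTransfer(s *scriptedStorage, from, to string, amount int32) error {
	fromBalance, err := s.ReadKey(from)
	if err != nil {
		return err
	}
	toBalance, err := s.ReadKey(to)
	if err != nil {
		return err
	}
	if err := s.WriteKey(from, fromBalance-amount); err != nil {
		return err
	}
	return s.WriteKey(to, toBalance+amount)
}

// transferTable lists one case per flow through processTransfer.
var transferTable = []struct {
	name       string
	failOnCall int
	wantErr    bool
}{
	{"all calls succeed", 0, false},
	{"first read fails", 1, true},
	{"second read fails", 2, true},
	{"first write fails", 3, true},
	{"second write fails", 4, true},
}

// runTable executes every case and returns the names of the ones that failed.
func runTable() []string {
	var failures []string
	for _, tc := range transferTable {
		s := &scriptedStorage{
			failOnCall: tc.failOnCall,
			data:       map[string]int32{"user1": 100, "user2": 50},
		}
		err := processTransfer(s, "user1", "user2", 10)
		if (err != nil) != tc.wantErr {
			failures = append(failures, tc.name)
		}
	}
	return failures
}

func main() {
	fmt.Println(len(runTable())) // 0
}
```

Adding a new scenario is one more row in the table, which is what makes this style a good fit for chasing every error path.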
If you’re weirded out by the “Ω” symbol don’t worry, it’s just a regular variable name (holding a pointer to Gomega). You’re welcome to rename it to anything you like.
For the sake of time, we didn’t show the strict methodology of TDD where a new line of code would only be written to resolve a failing test. Using this methodology, the unit test and implementation for processTransfer() would be implemented over several iterations.
The full suite of unit tests in the VirtualMachine service is available here. The unit tests for the other services are available here and here.
We’ve reached 100% coverage, our end-to-end tests are passing, our service integration tests are passing and our unit tests are passing. The code fulfills its requirements to the letter and is thoroughly tested.
Does that mean that everything is working? Unfortunately not. We still have several nasty bugs hiding in plain sight in our simple implementation.
All of our tests so far tested a single request being handled at any given time. What about synchronization issues? Every HTTP request in Go is handled in its own goroutine. Since these goroutines run concurrently, potentially on different OS threads on different CPU cores, we face synchronization problems. These are very nasty bugs that aren’t easy to track down.
One of the approaches for finding synchronization issues is stressing the system with many requests in parallel and making sure everything still works. This should be an end-to-end test because we want to test synchronization issues across our entire system with all services. We’ll place stress tests in our project under /e2e/stress.
This is what a stress test looks like:
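The original HTTP-based listing isn’t reproduced here. The sketch below shows the same ideas in a simplified, in-process form: random transfers from a fixed seed, fired in concurrent batches, then compared against a sequentially computed expectation. The ledger type, the user names and the batch sizes are assumptions, and the ledger here is already guarded by a mutex so the sketch runs cleanly.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// ledger is a toy system under test, guarded by a mutex.
type ledger struct {
	mu       sync.Mutex
	balances map[string]int
}

func (l *ledger) transfer(from, to string, amount int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.balances[from] -= amount
	l.balances[to] += amount
}

// stressPasses fires `total` random transfers in batches of `batchSize`
// concurrent goroutines and reports whether the final balances match the
// expectation computed sequentially on the calling goroutine.
func stressPasses(total, batchSize int) bool {
	rng := rand.New(rand.NewSource(42)) // constant seed keeps the run deterministic
	users := []string{"user1", "user2", "user3", "user4"}
	l := &ledger{balances: map[string]int{}}
	expected := map[string]int{}

	for sent := 0; sent < total; sent += batchSize {
		var wg sync.WaitGroup
		for i := 0; i < batchSize && sent+i < total; i++ {
			from := users[rng.Intn(len(users))]
			to := users[rng.Intn(len(users))]
			amount := rng.Intn(100)
			expected[from] -= amount
			expected[to] += amount
			wg.Add(1)
			go func() {
				defer wg.Done()
				l.transfer(from, to, amount)
			}()
		}
		wg.Wait() // finish the batch before starting the next one
	}

	for _, user := range users {
		if l.balances[user] != expected[user] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(stressPasses(2000, 200)) // true
}
```

Since each transfer is a pair of additions, the final balances don’t depend on the order in which the goroutines happen to run, which is what makes a sequentially computed expectation valid here.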
Notice that the stress test includes random data. It’s recommended to use a constant seed to make the test deterministic. Running a different scenario every time we run our tests isn’t a good idea. Flaky tests that sometimes pass and sometimes fail reduce developer confidence in the suite.
The tricky part about stress tests over HTTP is that most machines have a hard time simulating thousands of concurrent users and opening thousands of concurrent TCP connections (you’ll see strange failures like “maximum file descriptors” or “connection reset by peer”). The code above tries to deal with this gracefully by limiting concurrent connections to batches of 200 and using IdleConnection Transport settings to recycle TCP sessions between batches. If this test is flaky on your machine, try reducing the batch size to 100.
Oh no…the test fails:
What happens here? StateStorage is implemented as a simple in-memory map. It seems we’re trying to write to this map in parallel from different threads. It may seem at first that we should just replace the regular map with the thread-safe sync.Map, but our problem runs a little deeper.
Take a look at the processTransfer() implementation. It reads twice from the state and then writes twice. The set of reads and writes isn’t an atomic transaction, so if another thread changes the state after one thread has read from it, we’re going to have data corruption. The fix is to make sure only one instance of processTransfer() can run at a time; you can see it here.
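One possible way to serialize the transfers is a mutex held for the whole read-modify-write sequence. This is a sketch and not necessarily how the linked fix does it. The struct layout is an assumption, and error handling is left out to keep the locking visible.

```go
package main

import (
	"fmt"
	"sync"
)

// storage guards its own map. That alone is not enough: the transfer's
// read-modify-write sequence also has to be atomic as a whole.
type storage struct {
	mu   sync.Mutex
	data map[string]int32
}

func (s *storage) ReadKey(key string) int32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

func (s *storage) WriteKey(key string, value int32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// VirtualMachine holds a second mutex that serializes whole transfers.
type VirtualMachine struct {
	transferMu sync.Mutex
	state      *storage
}

func (vm *VirtualMachine) processTransfer(from, to string, amount int32) {
	vm.transferMu.Lock() // only one transfer runs at a time
	defer vm.transferMu.Unlock()

	fromBalance := vm.state.ReadKey(from)
	toBalance := vm.state.ReadKey(to)
	vm.state.WriteKey(from, fromBalance-amount)
	vm.state.WriteKey(to, toBalance+amount)
}

// runConcurrentTransfers fires n concurrent one-unit transfers from user1 to
// user2 and returns both final balances.
func runConcurrentTransfers(n int) (int32, int32) {
	vm := &VirtualMachine{state: &storage{data: map[string]int32{}}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			vm.processTransfer("user1", "user2", 1)
		}()
	}
	wg.Wait()
	return vm.state.ReadKey("user1"), vm.state.ReadKey("user2")
}

func main() {
	fmt.Println(runConcurrentTransfers(1000)) // -1000 1000
}
```

A single lock around every transfer is the bluntest tool available and limits throughput, but it is easy to reason about, which matters more when the goal is correctness first.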
Let’s try to run the stress test again. Oh no, another failure!
This one requires a little more debugging to understand. It seems that it happens when a user tries to transfer an amount to themselves (the same user is both the sender and recipient). Looking at the implementation, it’s easy to see why this happens.
This one is a little disturbing. We’ve followed a TDD-like workflow and we still hit a hard business logic bug. How can that be? Isn’t our code tested against every scenario with 100% coverage?! Well…this bug is the result of a faulty product requirement, not a faulty implementation. The requirements for processTransfer() should have clearly stated that if a user transfers an amount to themselves, nothing happens.
When we discover a business logic bug, we should always reproduce it first in our unit tests. It’s very easy to add this case to our table driven test from before. The fix is also simple; you can see it here.
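To see why a self-transfer goes wrong in a read-read-write-write implementation, and what a guard might look like, consider this small sketch. The function names and the plain map standing in for state are assumptions, and the linked fix may differ in detail.

```go
package main

import "fmt"

// state is a plain map standing in for StateStorage.
type state map[string]int32

// transferNaive reproduces the bug: both balances are read up front, so when
// from == to the second write overwrites the first with a stale value.
func transferNaive(s state, from, to string, amount int32) {
	fromBalance := s[from]
	toBalance := s[to]
	s[from] = fromBalance - amount
	s[to] = toBalance + amount
}

// transferFixed treats a transfer to oneself as a no-op, matching the
// corrected requirement.
func transferFixed(s state, from, to string, amount int32) {
	if from == to {
		return
	}
	fromBalance := s[from]
	toBalance := s[to]
	s[from] = fromBalance - amount
	s[to] = toBalance + amount
}

func main() {
	buggy := state{"user1": 100}
	transferNaive(buggy, "user1", "user1", 10)
	fmt.Println(buggy["user1"]) // 110: money created out of thin air

	fixed := state{"user1": 100}
	transferFixed(fixed, "user1", "user1", 10)
	fmt.Println(fixed["user1"]) // 100
}
```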
After adding the stress tests and making sure everything passes, is our system finally working as intended? Is it finally bulletproof?
Unfortunately not.
We still have some nasty bugs that even the stress test did not uncover. Our “simple” function processTransfer() is still at risk. Consider what happens if we ever reach the error return that follows the second write. The first write to state succeeded but the second fails. We’re about to return an error, but we’ve already corrupted our state by writing half-baked data to it. If we’re going to return an error, we’ll have to undo the first write.
This is a little more complicated to fix. The best solution is probably to change our interface altogether. Instead of having an endpoint in StateStorage named WriteKey that we call twice, we should probably replace it with WriteKeys, an endpoint that we’ll call once to write both keys together in one transaction.
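A batch endpoint of this kind could be sketched as follows. Only the WriteKeys name comes from the text above. The KeyValue type, the signature and the empty-key validation rule, which exists here only to have a way to fail, are assumptions. The point is the all-or-nothing behavior: validate everything first, then apply everything under one lock.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// KeyValue is one entry in a batch write (hypothetical type).
type KeyValue struct {
	Key   string
	Value int32
}

type storage struct {
	mu   sync.Mutex
	data map[string]int32
}

// WriteKeys applies all entries or none of them. Validation happens before
// the first mutation, so a bad entry can't leave a half-written batch behind.
func (s *storage) WriteKeys(entries []KeyValue) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	for _, e := range entries {
		if e.Key == "" { // illustrative failure condition
			return errors.New("empty key in batch")
		}
	}
	for _, e := range entries {
		s.data[e.Key] = e.Value
	}
	return nil
}

func main() {
	s := &storage{data: map[string]int32{"user1": 100, "user2": 50}}

	// A valid batch updates both balances together.
	err := s.WriteKeys([]KeyValue{{"user1", 90}, {"user2", 60}})
	fmt.Println(err, s.data["user1"], s.data["user2"]) // <nil> 90 60

	// A batch with one bad entry changes nothing at all.
	err = s.WriteKeys([]KeyValue{{"user1", 0}, {"", 150}})
	fmt.Println(err != nil, s.data["user1"], s.data["user2"]) // true 90 60
}
```

With a storage backend where individual writes can fail midway, the same contract would need a real transaction or a rollback step behind it; the interface change is what lets the caller stop worrying about that.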
There’s a bigger lesson here: a methodical test suite is not enough. Dealing with complex bugs requires critical thinking and paranoid creativity by developers. It’s recommended to have someone else look at your code and perform code reviews in your team. Even better, open sourcing your code and encouraging the community to audit it is one of the best ways to make your code more bulletproof.
All the code in this article is available on Github as a single example repository. You’re welcome to use this project as a starter kit for your next server. You’re also welcome to review the repo and uncover more bugs and make it more bulletproof. Be creatively paranoid!
orbs-network/go-scaffold: Scaffold starter project in Go for a microservices-based server with thorough testing (github.com)
Tal is a founder at Orbs.com, a public blockchain infrastructure for large scale consumer applications with millions of users.
Note: if you’re interested in blockchain — come contribute! Orbs is a fully open source project where anyone can participate.
Translated from: https://www.freecodecamp.org/news/how-to-write-bulletproof-code-in-go-a-workflow-for-servers-that-cant-fail-10a14a765f22/