The Death of Microservice Madness in 2018
by Dave Kerr McKinsey
2018年微服务疯狂至死
Microservices became a very popular topic over the last couple of years[1]. 'Microservice madness' goes something like this:
在最近几年,微服务成为一个流行(热门)的话题。“疯狂的微服务”如下描述:
Netflix are great at devops. Netflix do microservices. Therefore: If I do microservices, I am great at devops.
Netflix公司擅长DevOps(开发运维一体化),Netflix公司实施微服务。因此(推论):如果要实施微服务,必须要擅长DevOps。
There are many cases where great efforts have been made to adopt microservice patterns without necessarily understanding how the costs and benefits will apply to the specifics of the problem at hand.
很多情况下,我们往往没有考虑到解决某个具体问题的成本和收益就付出巨大的努力去实施微服务。
I'm going to describe in detail what microservices are, why the pattern is so appealing, and also some of the key challenges that they present.
我将详细描述什么是微服务,为什么它那么有吸引力,它提出(出现)了什么关键挑战。
I'll finish with a set of simple questions that might be valuable to ask yourself when you are considering whether microservices are the right pattern for you. The questions are at the end of the article.
我准备了一些问题,当你在考虑是否实施微服务(微服务是否适合你)的时候你可以用来帮你做判断。这些问题在文章的最后。
What are microservices, and why are they so popular?
Let's start with the basics. Here is how a hypothetical video sharing platform might be implemented, first in the form of a monolith (single large unit) and then in the form of microservices:
什么是微服务,它为什么那么流行。
让我们从基础谈起。这里假设我们要实现一个视频分享平台(虚构的)。第一个图表示一个单体的系统,第二图表示一个基于微服务的系统。
The difference between the two systems is that the first is a single large unit; a monolith. The second is a set of small, specific services. Each service has a specific role.
两个系统不一样的地方:第一个是大型的单体系统,第二个是一组小的、具体的服务组成的系统,每个服务都有特定的角色。
When the diagram is drawn at this level of detail, it is easy to see the appeal. There are a whole host of potential benefits:
通过上图这些细节的描述,很容易发现(微服务)确实吸引人。并且有一堆潜在的好处:
Independent Development: Small, independent components can be built by small, independent teams. A group can work on a change to the 'Upload' service without interfering with the 'Transcode' service, or even knowing about it. The amount of time to learn about a component is greatly reduced, and it is easier to develop new features.
独立开发:小的、独立的组件可以被小的、独立的团队构建。一个团队修改“下载服务”不会干扰到“转码服务”,甚至都不需要知道“转码服务”的(任何信息)。学习一个组件的时间成本可以有效的降低,且很容易去开发一个新的功能。
Independent Deployment: Each individual component can be deployed independently. This allows new features to be released with greater velocity and less risk. Fixes or features for the 'Streaming' component can be deployed without requiring other components to be deployed.
独立部署:每个独立的组件能够被独立部署。这使得新功能可以低风险和快速的发布。我们修改和调整“流服务”组件,且在不依赖其他组件部署的情况下,能够让它独立部署。
Independent Scalability: Each component can be scaled independently of the others. During busy periods when new shows are released, the 'Download' component can be scaled up to handle the increased load, without having to scale up every component, which makes elastic scaling more feasible and reduces costs.
独立伸缩:每个组件能够独立于其他组件进行伸缩。在新的视频发布的繁忙期间,“下载服务”组件能够进行扩展来适应(视频)下载压力,而不需要让每个组件都进行扩展,从而降低成本,使得伸缩能力更加可行。
Reusability: Components fulfil a small, specific function. This means that they can more easily be adapted for use in other systems, services or products. The 'Transcode' component could be used by other business units, or even turned into a new business, perhaps offering transcoding services for other groups.
复用性:组件是小的、具体的功能。这意味着它能更方便被其他系统使用。“转码服务”组件能被其他业务单元使用,甚至可以转变成一个新的业务为其他团队提供转码的服务。
At this level of detail, the benefits of a microservice model over a monolithic model seem obvious. So if that's the case - why is this pattern only recently in vogue? Where has it been all my life?
从以上的细节中我们知道,微服务模型比单体模型更具优势(好处)是显而易见的事实。如果是这样的话——为什么这种模式最近才流行起来?我都干什么去了?!
If this is so great, why hasn't it been done before?
There are two answers to this question. One is that it has been done before, to the best of our technical capabilities, and the other is that more recent technical advances have allowed us to take it to a new level.
微服务那么好,为什么之前没有人这么干呢?(言外之意是,为什么现在就可以这样干了?)
有两个答案来回答这个问题。其一,它要求我们尽最大的技术能力去做;其二,最近的技术进展把我们带到一个全新的水平。
When I started writing the answer to this question, it turned into a long description, so I'm actually going to separate it into another article and publish it a little later[2]. At this stage, I will skip the journey from single program to many programs, ignore ESBs and Service Oriented Architecture, component design and bounded contexts, and so on.
要探讨的这个问题不是三言两语能讲清楚的,所以,我将它拆出来作为一个独立的文章并在后续发表。在这一阶段,我将跳过此过程,不谈论ESBs和SOA架构、组件设计及限界上下文等。
Those who are interested can read more about the journey separately. Instead I'll say that in many ways we've been doing this for a while, but with the recent explosion in popularity of container technology (Docker in particular) and in orchestration technology (such as Kubernetes, Mesos, Consul and so on) this pattern has become much more viable to implement from a technical standpoint.
对此过程感兴趣的人可以单独阅读该文章。相反,我想说,之前我们在很多方面已经做了一些工作,但从技术的角度来讲,最近比较热门(呈现爆炸式发展)的容器技术(基于Docker的隔离技术)和编排技术(比如Kubernetes、Mesos和Consul等)使得这种模式(微服务)更加切实可行。
So if we take it as a given that we can implement a microservice arrangement, we need to think carefully about whether we should. We've seen the high-level theoretical benefits, but what about the challenges?
如果我们理所当然的认为我们能实现微服务管理的话,那么我们必须谨慎思考(我们必须谨慎思考,我们是否可以依赖这些现有的技术来实施微服务)。我们看见了(微服务)理论上有一个高的收益,但是它面临的挑战呢?
What's the problem with microservices?
If microservices are so great, what's the big deal? Here are some of the biggest issues I've seen.
微服务有哪些问题?
我们都说微服务那么牛叉,难道它没有大问题吗?(反语,其实微服务有一些问题的)
微服务有什么好炫耀的,这里我就列举下我所知的微服务面临一些大的问题。
Increased complexity for developers
Things can get a lot harder for developers. When a developer wants to work on a journey or feature that spans many services, that developer has to run them all on their machine, or connect to them. This is often more complex than simply running a single program.
This challenge can be partially mitigated with tooling[3], but the more services make up a system, the more challenges developers will face when running the system as a whole.
给开发者带来更高的复杂度
微服务给开发者带来更多的困难。开发者想跨团队合作或者使用多个服务去实现一个功能的时候,他必须让这些服务都在其本地运行起来或者去远程连接它们。这比简单运行一个单体系统复杂多了。
这个挑战可以通过一些工具得以部分解决,但随着构成系统的服务数量的增加,在整个系统(多个服务)运行期,开发者将面临更大的挑战。
Increased complexity for operators
For teams who don't develop services, but maintain them, there is an explosion in potential complexity. Instead of perhaps managing a few running services, they are managing dozens, hundreds or thousands of running services. There are more services, more communication paths, and more areas of potential failure.
给运维人员带来更高的复杂度
对于那些不开发服务但运维服务的人来说,微服务有非常高的潜在复杂度。(运维人员)并不是管理几个运行中的服务,而是管理着几十、上百或数以千计的服务。更多的服务,带来更多的沟通,更多的潜在错误。
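To put a rough number on "more communication paths", here is a small sketch. It assumes the worst case in which every service may call every other service; real systems are usually far sparser, so treat the result as an upper bound rather than a measurement.

```python
def potential_paths(service_count: int) -> int:
    """Upper bound on directed caller -> callee pairs among n services.

    Assumes any service may call any other service (a worst case,
    not a description of a real system).
    """
    return service_count * (service_count - 1)


for n in (3, 10, 100):
    print(f"{n} services: up to {potential_paths(n)} call paths")
```

The count grows quadratically, which is one way to see why going from a few services to hundreds changes the nature of the operational problem.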
Increased complexity for devops
给DevOps带来更高的复杂度
Reading the two points above, it may grate that operations and development are treated separately, especially given the popularity of devops as a practice (which I am a big proponent of). Doesn't devops mitigate this?
阅读以上两点,可以知道运维与开发是独立的两个事情,把两者结合实践DevOps非常的流行(我非常赞同)。DevOps能减轻以上两点吗?
The challenge is that many organizations still run with separated development and operations teams - and an organization that does is much more likely to struggle with adoption of microservices.
问题是现在很多组织依然是把开发和运维分开——这样的组织对实施微服务更加抵触。
For organizations which have adopted devops, it's still hard. Being both a developer and an operator is already tough (but critical to build good software), but having to also understand the nuances of container orchestration systems, particularly systems which are evolving at a rapid pace, is very hard. Which brings me onto the next point.
对于已经实施了DevOps的组织来说,依然很困难。作为开发者又是运维者已经如此(困难)了(是构建优秀软件的关键),但又不得不去理解容器编排系统的细节差别,掌握这些快速演进的系统,非常的困难。让我想到了以下几点。
It requires serious expertise
它需要严谨的专业知识
When done by experts, the results can be wonderful. But imagine an organization where perhaps things are not running smoothly with a single monolithic system. What possible reason would there be that things would be any better by increasing the number of systems, which increases the operational complexity?
当由专家来实现时,结果可以很好。但是想像一下一个都无法保障单体系统运行顺畅的组织。有什么理由相信它能把服务不断增长、复杂度不断升高的系统管理好?
Yes, with effective automation, monitoring, orchestration and so on, this is all possible. But the challenge is rarely the technology - the challenge is finding people who can use it effectively. These skill sets are currently in very high demand, and may be difficult to find.
是的,通过有效的自动化、监控和编排等手段,这一切都是有可能的。但挑战很少是技术的——而是找到能熟悉使用这些技术的人才。当前,这些技术需求非常的高,但却很难找到。
Real world systems often have poorly defined boundaries
现实中的系统往往边界定义不清晰
In all of the examples we used to describe the benefits of microservices, we spoke about independent components. However, in many cases components are simply not independent. On paper, certain domains may look bounded, but as you get into the muddy details, you may find that they are more challenging to model than you anticipated.
在所有说明微服务好处的例子中,我们都是在讲它的独立组件(的特点)。然而在很多情况下,组件并非简单独立的。理论上,某些领域看起来有边界,但是当你了解到混乱的细节时,它们比你预想的更具挑战性。
This is where things can get extremely complex. If your boundaries are actually not well defined, then what happens is that even though theoretically services can be deployed in isolation, you find that due to the inter-dependencies between services, you have to deploy sets of services as a group.
这是(微服务)变得极其复杂的地方。如果实际上你并没有良好的定义系统的边界,那么即使理论上服务之间能进行隔离部署,也是因为这些服务存在内部依赖,你必须同时部署一组服务。
This then means that you need to manage coherent versions of services which are proven and tested when working together. You don't actually have an independently deployable system, because to deploy a new feature, you need to carefully orchestrate the simultaneous deployment of many services.
这(一组服务一起部署)意味着你需要管理一致的服务版本,这些服务在一起运行的时候才能被验证和测试。你实际上并不拥有一个可独立部署的系统,因为当你部署一个新功能时,你需要小心编排多个服务同时部署。
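One way to make this visible is to write the inter-dependencies down and compute which services can no longer ship alone. The sketch below is illustrative only: the `couplings` pairs are hypothetical, and in practice discovering them is the hard part.

```python
from collections import defaultdict


def deployment_groups(services, couplings):
    """Group services that must be released together.

    `couplings` is a list of (a, b) pairs meaning a and b cannot be
    deployed independently. Each returned group is a sorted list.
    """
    neighbours = defaultdict(set)
    for a, b in couplings:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for service in services:
        if service in seen:
            continue
        # Depth-first walk over everything reachable from this service.
        stack, group = [service], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


services = ["upload", "transcode", "subscriptions", "recommendations", "streaming"]
# Hypothetical couplings, for illustration only.
couplings = [("upload", "transcode"), ("subscriptions", "recommendations")]
print(deployment_groups(services, couplings))
```

Any group with more than one member is, for deployment purposes, a single unit, whatever the diagram says.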
The complexities of state are often ignored
状态的复杂性往往被忽略
In the previous example, I mentioned that a feature deployment may require the coordinated rollout of specific versions of many services. It is tempting to assume that sensible deployment techniques will mitigate this, for example blue/green deployments (which most service orchestration platforms handle with little effort), or multiple versions of a service being run in parallel, with consuming channels deciding which version to use.
在前面的例子中,我提到一个功能的部署需要同时部署多个版本的一系列服务。我们有理由相信合理的部署技术可以减轻这种情况。例如蓝绿部署方式(很多服务编排平台容易做到),或者一个服务多版本并行运行,通过消费渠道决定使用哪个版本。
These techniques mitigate a large number of the challenges if the services are stateless. But stateless services are, quite frankly, easy to deal with. In fact, if you have stateless services, then I'd be inclined to consider skipping microservices altogether and consider using a serverless model.
如果这些服务是无状态的,那么这些技术是能够减轻大部分挑战的。但是,无状态的服务确实简单,容易处理。实际上,如果你的服务都是无状态的,你会倾向于考虑完全跳过微服务,直接考虑使用无服务器模式。
In reality, many services require state. An example from our video sharing platform might be the subscription service. A new version of the subscription service may store data in the subscriptions database in a different shape. If you are running both versions in parallel, you are running the system with two schemas at once. If you do a blue/green deployment, and other services depend on data in the new shape, then they must be updated at the same time, and if the subscription service deployment fails and rolls back, they might need to roll back too, with cascading consequences.
事实如此,很多服务是需要状态的。比如说我们的视频分享平台的订阅服务。一个新版本的订阅服务可能使用不一样数据存储模式。如果你并行运行这两个服务,相当于同时运行了两种模式的系统。如果你采用蓝绿部署方式,另一个服务依赖新的数据存储模式,两个服务的数据必须被同时更新,其中有一个失败了,另一个必须进行级联回滚。
Again, it might be tempting to think that with NoSQL databases these issues of schema go away, but they don't. Databases which don't enforce schema do not lead to schemaless systems - they just mean that schema tends to be managed at the application level, rather than the database level. The fundamental challenge of understanding the shape of your data, and how it evolves, cannot be eliminated.
进一步,你可能会想到采用NoSQL数据库,这些问题会消失,但并非如此。不强制模式的数据库不会导致无模式的数据库——只是将模式的管理放到了应用程序层面去处理,而非数据库层面。理解数据结构及其是如何演变的根本性挑战是不能消除的。
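As a minimal sketch of what "schema managed at the application level" can look like, the function below upgrades a stored subscription document to the newest shape when it is read. The field names, the `schema_version` marker and both document shapes are assumptions made up for this example.

```python
def upgrade_subscription(doc: dict) -> dict:
    """Return a subscription document in the newest shape (version 2).

    Documents without a `schema_version` field are treated as version 1.
    Both shapes are hypothetical and exist only for illustration.
    """
    version = doc.get("schema_version", 1)
    if version == 1:
        # Assumed v1 shape: {"user": "ann", "channels": "cats,dogs"}
        channels = [name for name in doc["channels"].split(",") if name]
        return {"schema_version": 2, "user": doc["user"], "channels": channels}
    if version == 2:
        # Assumed v2 shape: channels are already stored as a list.
        return doc
    raise ValueError(f"unknown schema_version: {version}")


print(upgrade_subscription({"user": "ann", "channels": "cats,dogs"}))
```

The database never complains about either shape, but every reader still has to know both, which is the point: the schema has moved into the code, not gone away.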
The complexities of communication are often ignored
通讯的复杂度往往被忽略
As you build a large network of services which depend on each other, the likelihood is that there will be a lot of inter-service communication. This leads to a few challenges. Firstly, there are a lot more points at which things can fail. We must expect that network calls will fail, which means when one service calls another, it should expect to have to retry a number of times at the least. Now when a service has to potentially call many services, we end up in a complicated situation.
当你为一些彼此依赖的服务构建大型的通讯网络时,可能会有大量的服务通讯。这会带来一些挑战。首先,可能有很多会出现故障的结点。我们必须预料到网络通讯可能会出现故障,这意味着当一个服务调用另一个服务时,可能需要至少多尝试几次。现在,在一个服务不得不潜在的调用很多服务的情况下,我们陷入了复杂的局面。
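A minimal sketch of the retry behaviour described above, assuming a failed network call surfaces as Python's built-in `ConnectionError`. The attempt count and delays are arbitrary illustrative defaults, and `operation` stands in for whatever client call a real service would make.

```python
import random
import time


def call_with_retry(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff with jitter, so that many callers
            # do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

Even this small helper raises questions a monolith never had to answer: how many attempts, how long to wait, and whether the operation is safe to repeat at all. Multiply that by every call between every pair of services.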
Imagine a user uploads a video in the video sharing service. We might need to run the upload service, pass data to the transcode service, update subscriptions, update recommendations and so on. All of these calls require a degree of orchestration; if things fail, we need to retry.
想象一下,用户通过视频分享服务上传一个视频。我们需要运行上传服务,通过转码服务传递数据,更新订阅和推荐信息等。所有这些请求都需要一定程度的编排,如果出现故障必须重试。
This retry logic can get hard to manage. Trying to do things synchronously often ends up being untenable, because there are too many points of failure. In this case, a more reliable solution is to use asynchronous patterns to handle communication. The challenge here is that asynchronous patterns inherently make a system stateful. As mentioned in the previous point, stateful systems and systems with distributed state are very hard to handle.
这个重试的逻辑很难管理。想同步处理很多事情往往陷入失败,有太多的结点出现故障。在这种情况下,一个更可靠的解决方案是通过异步的通讯模式。挑战是异步模式天生具有状态。根据前面提到的观点,有状态的分布式系统是难以把控的。
When a microservice system uses message queues for inter-service communication, you essentially have a large database (the message queue or broker) gluing the services together. Again, although it might not seem like a challenge at first, schema will come back to bite you. A service at version X might write a message with a certain format; services which depend on this message will also need to be updated when the sending service changes the details of the message it sends.
当一个微服务系统使用消息队列作为内部服务通讯的时候,你基本上将一个大型的数据库(消息队列与代理)和服务耦合在一起。再者,即使一开始它不像是个挑战,模式也将回头咬你一口。一个X版本的服务通过确定的格式编写消息,该消息的细节发生改变后,依赖此消息的服务必须做出修改。
It is possible to have services which can handle messages in many different formats, but this is hard to manage. Now when deploying new versions of services, you will have times where two different versions of a service may be trying to process messages from the same queue, perhaps even messages sent by different versions of a sending service. This can lead to complicated edge cases. To avoid these edge cases, it may be easier to only allow certain versions of messages to exist, meaning that you need to deploy a set of versions of a set of services as a coherent whole, ensuring messages of older versions are drained appropriately first.
可能存在可以处理多种消息格式的服务,但是它很难管理。假设部署服务的一个新版本,将会出现两个不同版本的服务同时处理来自同一个消息队列的消息,甚至消息来源于多个不同版本的消息服务。这将导致边缘复杂的情况。为了避免这种边缘情况,容易解决的方式是只允许某些消息存在。这意味着你必须把一组版本的服务作为一个整体来部署,确保旧版本的消息能被适当的排除。
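To illustrate what handling several message formats involves, here is a sketch of a consumer that dispatches on a version field and sets aside anything it does not recognise. The message envelope, the field names and the dead-letter list are all hypothetical; no particular broker's API is implied.

```python
HANDLERS = {}


def handles(version):
    """Register a handler function for one message format version."""
    def register(func):
        HANDLERS[version] = func
        return func
    return register


@handles(1)
def video_uploaded_v1(body):
    # Assumed v1 body: {"id": ..., "user": ...}
    return {"video_id": body["id"], "owner": body["user"]}


@handles(2)
def video_uploaded_v2(body):
    # Assumed v2 body: {"video_id": ..., "owner": {"name": ...}}
    return {"video_id": body["video_id"], "owner": body["owner"]["name"]}


def consume(message, dead_letters):
    """Process one message, or park it if its version is unknown."""
    handler = HANDLERS.get(message.get("version"))
    if handler is None:
        dead_letters.append(message)
        return None
    return handler(message["body"])
```

Every format that may still be sitting in a queue needs a handler in every consumer, and each handler is code that has to be tested and eventually retired. That bookkeeping is what makes draining old messages and deploying as a coherent set look attractive.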
This highlights again that the idea of independent deployments may not hold as expected when you get into the details.
再次强调,当你考虑到细节的时候,独立部署可能不是你期待的那样。
Versioning can be hard
版本控制可能是困难的
To mitigate the challenges mentioned previously, versioning needs to be very carefully managed. Again, there can be a tendency to assume that following a standard such as semver[4] will solve the problem. It doesn't. Semver is a sensible convention to use, but you will still have to track the versions of services and APIs which can work together.
去缓解前面提到的挑战,版本控制必须得到谨慎的管理。再者,可能有一种趋势提到,通过类似于Semver(语义化版本控制规范)这样的标准可以解决问题。其实不然。Semver是一个明智的常规方案,但你依然要跟踪服务的版本以确保API能够与之一起使用。
This can get very challenging very quickly, and may get to the point where you don't know which versions of services will actually work properly together.
这(多个版本服务共存问题)很快变得非常具有挑战性,可能你都无法理清楚哪些版本的服务实际上可以一起协同工作。
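A sketch of the kind of tracking this implies: a record of which version pairs have actually been tested together, checked against a proposed deployment. The version numbers, the `DEPENDS_ON` map and the `TESTED_PAIRS` set are invented for illustration.

```python
# Hypothetical record of (consumer, provider) version pairs that have
# been tested together.
TESTED_PAIRS = {
    (("upload", "1.4.0"), ("transcode", "2.0.0")),
    (("upload", "1.4.0"), ("transcode", "2.1.0")),
    (("upload", "1.5.0"), ("transcode", "2.1.0")),
}

# Hypothetical map of which services call which.
DEPENDS_ON = {"upload": ["transcode"]}


def untested_pairs(deployment):
    """Return the dependency pairs in `deployment` with no test record.

    `deployment` maps each service name to the version about to run.
    """
    problems = []
    for consumer, providers in DEPENDS_ON.items():
        for provider in providers:
            pair = (
                (consumer, deployment[consumer]),
                (provider, deployment[provider]),
            )
            if pair not in TESTED_PAIRS:
                problems.append(pair)
    return problems


print(untested_pairs({"upload": "1.5.0", "transcode": "2.0.0"}))
```

Semver can tell you that 2.1.0 is meant to be backwards compatible with 2.0.0; it cannot tell you that anyone has run these two services together. With dozens of services the set of pairs to track grows quickly.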
Managing dependencies in software systems is notoriously hard, whether it is node modules, Java modules, C libraries or whatever. The challenges of conflicts between independent components when consumed by a single entity are very hard to deal with.
众所周知,在软件系统中管理依赖关系非常困难,无论是node和Java模块化还是诸如C这样的类库。当被一个实体消费的时,独立组件之间冲突带来的挑战性是非常难以解决的。
These challenges are hard enough to deal with when the dependencies are static and can be patched, updated, edited and so on. But if the dependencies are themselves live services, then you may not be able to just update them - you may have to run many versions (with the challenges already described) or bring down the system until it is fixed holistically.
当依赖被固化后解决它是很具挑战性的,但也许可以被修复,更新和编辑等。当依赖都是它们自己运行中的服务时,你可能无法直接去更新它们——你可能要运行多个版本的服务(挑战已经详细描述过了)或者关闭服务直到整体修复好。
Distributed Transactions
分布式事务
In situations where you need transaction integrity across an operation, microservices can be very painful. Distributed state is hard to deal with, and having many small units which can fail makes orchestrating transactions very hard.
在跨服务操作的情况下保持事务的完整性,微服务模式下可能会非常难受。分布式下的状态难以控制,一些小的单元可能难以进行事务的编排。
It may be tempting to attempt to avoid the problem by making operations idempotent, offering retry mechanisms and so on, and in many cases this might work. But you may have scenarios where you simply need a transaction to fail or succeed, and never be in an intermediate state. The effort involved in working around this or implementing it in a microservice model may be very high.
采用幂等操作的方式来规避问题可能是诱人的,比如提供重试的机制等,在很多情况下都可行。但可能有些场景下你只需要简单的事务成功或失败就好,永远不处于中间态。工作量和实现难度都非常高。
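For reference, making an operation idempotent often comes down to something like the sketch below: the caller sends a unique key with each logical request, and the service returns the stored result for any key it has already seen. The class, its in-memory storage and the billing scenario are hypothetical stand-ins.

```python
class SubscriptionBilling:
    """In-memory stand-in for a service that must not double-apply requests."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # idempotency key -> result already returned

    def charge(self, idempotency_key, amount):
        # A retried request carries the same key, so it gets the original
        # result back instead of being applied a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance += amount
        result = {"charged": amount, "balance": self.balance}
        self._results[idempotency_key] = result
        return result


billing = SubscriptionBilling()
first = billing.charge("req-1", 5)
retry = billing.charge("req-1", 5)  # safe to repeat after a timeout
print(first == retry, billing.balance)
```

This makes retries safe for one service. It does nothing to guarantee that a charge here and an update in another service either both happen or both do not, which is the all-or-nothing case described above.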
Microservices can be monoliths in disguise
微服务可能是个伪装(变相)的单体
Yes, individual services and components may be deployed in isolation. However, in most cases you are going to have to be running some kind of orchestration platform, such as Kubernetes. If you are using a managed service, such as Google's GKE[5] or Amazon's EKS[6], then a large amount of the complexity of managing the cluster is handled for you.
是的,个别服务和组件可能可以进行隔离部署,然而在大多数情况下你必须运行在诸如Kubernetes这样的编排平台中。当你使用一个诸如谷歌的GKE,亚马逊的EKS的管理服务的时候,你将面临巨大的集群管理的复杂性。
However, if you are managing the cluster yourself, you are managing a large, complicated, mission-critical system. Although the individual services may have all of the benefits described earlier, you need to very carefully manage your cluster. Deployments of this system can be hard, updates can be hard, failover can be hard and so on.
然而,当你自己管理集群服务的时候,你相当于管理一个大型的、复杂的关键任务系统。即使个别服务可能具有之前提到过的一些好处,你也必须非常谨慎的管理这些集群服务。在这样的系统部署和更新可能是困难的,故障转移可能也难以做到。
In many cases the overall benefits are still there, but it is important not to trivialise or underestimate the additional complexity of managing another big, complex system. Managed services may help, but in many cases these services are nascent (Amazon EKS was only announced at the end of 2017 for example).
在很多情况下,整体收益仍然存在,但是也不能淡化和低估管理一个庞大复杂系统所带来的额外复杂性。管理类的服务可能所有帮助,但大部分情况这些服务都近期才出现。(例如亚马逊的EKS系统2017年底才对外宣布)
The Death of Microservice Madness!
Avoid the madness by making careful and considered decisions. To help out on this I've noted a few questions you might want to ask yourself, and what the answers might indicate.
You can download a PDF copy here: microservice-questions.pdf
微服务疯狂至死
深思熟虑后才做出决定能避免疯狂的行为。为帮助大家谨慎的做决定我提出了一些问题,问题的答案可能给你启示:你可以从这下载PDF文件。
Final Thoughts: Don't Confuse Microservices with Architecture
I've deliberately avoided the 'a' word in this article. But my friend Zoltan made a very good point when proofing this article (which he has contributed to).
最后的思考:不要混淆的认为微服务是一种架构
我故意在这篇文章中避免“架构”。但我的好友Zoltan在验证这篇文章的时候,提过一个很好的观点。
There is no microservice architecture. Microservices are just another pattern or implementation of components, nothing more, nothing less. Whether they are present in a system or not does not mean that the architecture of the system is solved.
没有微服务架构。微服务只不过是组件的另一种模式或者实现,不多也不少。无论它们现在是否在系统中都不会意味着系统的架构问题得到解决。
Microservices relate in many ways more to the technical processes around packaging and operations than to the intrinsic design of the system. Defining appropriate boundaries for components continues to be one of the most important challenges in engineering systems.
微服务在很多方面都涉及到打包和操作的技术过程,而不是系统的内在设计。适当的组件边界依然是工程系统中最具挑战的问题之一。
Regardless of the size of your services, whether they are in Docker containers or not, you will always need to think carefully about how to put a system together. There are no right answers, and there are a lot of options.
不管你服务的数量是多少,无论它们是否采用Docker容器技术,你依然需要谨慎的思考如何让系统一起工作。没有固定的正确答案,却有很多选择。
I hope you found this article interesting! As always, please do comment below if you have any questions or thoughts.
希望你对此文章感兴趣!一如既往,你有什么问题或者想法可以在下方进行评论。
原文链接:The Death of Microservice Madness in 2018