Why Your Load Test Will Hide Problems and Lead to Crashes in Production

What do you do to make sure your website, backend, or API will scale to higher levels of traffic so that it doesn’t crash on the biggest day of the year for your business?

Why, of course you do a load test!

Pick any tool, paste the URL, punch in a good number of users, and wait for a pass or fail. If it fails, fix the problems and run it again. Once it passes you’re all set.

Unfortunately, this is a recipe for being unprepared and having the system crash on the biggest day of the year for your business.

Not all load tests are equal. Or even remotely valid.

It’s very easy to create a load test that is completely detached from the real traffic conditions you will face and gives you misleading results.

It would be like having someone drive a car around a race track, and then using their performance there to judge how well they would navigate an unfamiliar city, in a foreign language, with stop-and-go traffic and drivers honking at them from all sides.

Your live traffic can be chaotic and have a very different impact than what you see in a simplified artificial scenario.

Strangely, when it comes to testing backend performance, this is often accepted as a good approach!

I’ve seen these situations with several websites, typically running on AWS clusters and handling millions of dollars and/or millions of visitors.

On all of these it was important to know when and how the system would break down so that it could be prevented. However, the load tests I saw being run could only provide misleading information. There was no way to fix the problems with that approach.

With a few changes to the load testing I was able to produce more realistic results that helped optimize the right parts and identify the limits that couldn’t be changed in the short term. Those limits can be problematic, but knowing them at least gives the business valuable information to plan around.

Read on to see how I found effective solutions through targeted load testing!

Real World Load Tests

The controlled conditions of a load test can be very different from the real world results you will experience.

If you don’t do this right you may allocate way more servers than you need, you may spend months optimizing things that don’t matter and delay new development, or in the worst case you may miss the real bottlenecks and have crashes when you hit high traffic levels — the worst possible time.

All of those come with real costs. Having a few extra servers is not the worst outcome. The cost may be small relative to the engineering time being spent on keeping the system running smoothly. The others could be a real problem though!

How do you avoid that? It depends on the situation. I’ll break down where load testing goes wrong, using a couple of examples I’ve worked on.

Case #1: I worked with a large eCommerce site that had a complex interaction between several systems. It sometimes got overloaded, which led to millions of dollars in missed sales.

A full load test outside of production wasn’t possible due to all the components involved. So the tests had to be run in the middle of the night against the production system, to avoid downtime and lost sales.

An outside contractor was running the load tests and I noticed several issues that needed to be corrected so that the tests produced valid results. A proper load test helped to identify the limitations of the external systems and make sure they didn’t get overloaded. It also showed that there was no need to optimize other parts of the stack that had more than enough capacity, which helped to avoid wasted development effort.

Case #2: Another site involved exhaustive testing to prepare in advance for a worldwide event that brought a lot of traffic.

Since I didn’t have data from prior years and there was only one chance to get it right, it took a carefully designed load test to make sure all the minor issues were optimized, the caching was layered and scaled correctly, and architectural changes to remove bottlenecks were successful in increasing capacity.

Prior to my involvement, there were several unanticipated issues that had led to the website crashing in the middle of the previous event. That created a very stressful situation as the team attempted to stabilize the servers while keeping the website up to date with what was going on!

After doing the load tests with my guidance and resolving the issues identified, there was not a single issue with the system on the day of the event. In fact some of the backend servers were running so smoothly they barely seemed to show any activity even with high traffic!

The goal was to be over-prepared so nothing could go wrong and we were successful in doing that (although the data collected from that day could then be matched up to the load test to get an even better fine-tuning of the capacity).

Common Load Test Failures

Here are some load test issues I’ve seen. When these happen they hide the real bottlenecks and produce results that are either too optimistic (showing that it’s fine when it actually crashes in production, with no sign of where the problem is) or too pessimistic (crashing in the test with a load that’s a lot smaller than actual traffic, giving no information on what the actual limits are).

Ignoring caching, repeating the exact same action

Let’s say you have an eCommerce site where customers start the purchase process by searching for what they want. There is some work done to process the search and then the results are cached.

If you capture that user journey and repeat it 1,000,000 times, guess what — you’re just hitting the cache 1,000,000 times. Fail!

Your load test will tell you the site can handle unlimited traffic, while it keeps crashing in production. Why?

In real life you have customers searching for a lot of different things. Some will be more common and see a lot of cache hits. Some will be very rare and almost never touch the cache.

A proper load test will simulate the non-cached hits. The cache probably has very high performance (although we’ll address that later). The backend behind the cache is what you want to test first. And you want to see how many requests actually get that far.

To do this you need some idea of how many different varieties of searches there are. You can get this by analyzing traffic, and confirm it by running a load test that shows the backend activity is similar to the production site.

Once you’ve done this, a main goal for the load test will be to increase the variety of searches gradually and see where you hit system bottlenecks.

You can do this by creating a bunch of different searches in a load testing tool. In one case I wrote my own simple script that recursively looped through several variants to produce as many unique searches as I needed, and hit the website with those.

This script was sufficient since I was more concerned with how many non-cached hits could be handled instead of how many repeat requests could be handled, which is where traditional load testing tools are focused.
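
A minimal sketch of how such a script might enumerate unique searches. It is not the script described above: it uses `itertools.product` rather than recursion, and the facet names, values, and base URL are made up for illustration. It only builds the URLs; sending them (for example with curl) is left out.

```python
import itertools
from urllib.parse import urlencode

# Hypothetical search facets; replace with parameters your site actually accepts.
FACETS = {
    "q": ["shoes", "boots", "sandals"],
    "color": ["black", "brown", "red"],
    "size": ["7", "8", "9", "10"],
}


def unique_search_urls(base_url, facets, limit):
    """Yield up to `limit` distinct search URLs, one per facet combination."""
    names = list(facets)
    combos = itertools.product(*(facets[name] for name in names))
    for combo in itertools.islice(combos, limit):
        yield f"{base_url}?{urlencode(dict(zip(names, combo)))}"


if __name__ == "__main__":
    # The host below is a placeholder, not a real endpoint.
    for url in unique_search_urls("https://shop.example.com/search", FACETS, 5):
        print(url)
```

Adding a facet or a few more values multiplies the number of distinct searches, which makes it easy to dial the share of non-cached requests up or down.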

Why this is important: as traffic increases you will see more hits served out of the cache, but you will also see more unique long tail searches. You need to prepare for that. If there’s a major promotion for a specific item you may see a surge of traffic that is mostly focused on a few narrow searches that get cached efficiently. Either way, make sure your test is measuring the site activity patterns of your anticipated traffic!
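
The effect described here can be illustrated with a toy simulation. This is a sketch under strong assumptions: search popularity follows a 1/rank curve, the cache never evicts or expires entries, and the catalog size is arbitrary. Real traffic will differ, so treat the shape of the result as the point, not the numbers.

```python
import random


def simulate(num_requests, catalog_size=10_000, seed=1):
    """Replay long-tail search traffic against an unbounded cache.

    Returns (hit_rate, backend_requests). The 1/rank popularity curve and
    the cache without eviction or TTL are simplifying assumptions.
    """
    rng = random.Random(seed)
    queries = range(catalog_size)
    weights = [1.0 / (rank + 1) for rank in queries]
    cached = set()
    misses = 0
    for query in rng.choices(queries, weights=weights, k=num_requests):
        if query not in cached:
            misses += 1  # first time this search is seen: the backend does the work
            cached.add(query)
    return 1 - misses / num_requests, misses


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        hit_rate, backend = simulate(n)
        print(f"{n:>7} requests: hit rate {hit_rate:.2f}, backend requests {backend}")
```

With more requests the hit rate goes up, yet the absolute number of requests reaching the backend goes up too, which is the load the test has to reproduce.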

Bypassing caching completely

The opposite effect can be a problem too. If you just skip the cache, or somehow make every request unique so it can’t be cached, you will quickly exceed the actual backend activity generated by real traffic.

Your load test might crash with 1/10 of the users you see in production, causing confusion. Instead you want the right balance where you get close to the number of non-cached requests that the backend has to handle.
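
A quick worked example of why that happens. The request rates and the 90% hit rate are assumed numbers for illustration only, not measurements from any of the sites mentioned.

```python
def backend_rps(total_rps, cache_hit_rate):
    """Requests per second that miss the cache and reach the backend."""
    return total_rps * (1.0 - cache_hit_rate)


# Assumed numbers: production serves 1000 req/s with 90% of them from cache,
# while the test sends only 100 req/s but bypasses the cache entirely.
production_load = backend_rps(total_rps=1000, cache_hit_rate=0.90)
bypass_test_load = backend_rps(total_rps=100, cache_hit_rate=0.0)

print(f"production: {production_load:.0f} backend req/s")
print(f"cache-bypass test: {bypass_test_load:.0f} backend req/s")
```

Under these assumptions, a cache-bypassing test at one tenth of production traffic already puts the same load on the backend as a full production day.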

Why this is important: as above, it’s not realistic! You might get lucky and actually fix real problems. But most likely if your test environment does not come close to production you will end up with a lot of wasted effort.

Complete overwhelm

On a site that regularly saw 2,000–3,000 concurrent users, and where 5,000 customers at a time was a big day, the first load test I observed was configured to run 100,000 parallel processes.

Not surprisingly the results weren’t useful.

Once it was dialled back to a more realistic level, and adjusted to follow real life caching patterns, the results highlighted the real bottlenecks and led to new conversations about architectural changes and product design fixes. Those were very effective in making the site more stable, and didn’t take long to implement.

Why this is important: unless you’re running ads in the Super Bowl, you want realistic traffic. Can you handle 2–5x what you’re used to? That’s where you’ll get the most useful results. In rare cases you might need to prepare for a larger burst of traffic once you have done that optimization.

Testing the easy parts, not the hard ones

Imagine a system with a front end, a middle layer, and an external backend that your team doesn’t control.

Your team is responsible for the front end and middle part. So you test them exhaustively, do a little optimization, and find that they are ok with 10x your regular traffic. Great!

Only one problem: the external backend crashes when you have 10% more traffic than a normal busy day. You aren’t really prepared for a higher load.

But since you don’t control it, should you test it? Yes! It’s better to know the limitations, find ways to work around those if you can’t change them, and then test the solutions to see if they actually prevent a crash.

I’ll write another article showing how I was able to stabilize a very unreliable system that couldn’t be changed in the short term, using a very simple fix!

Why this is important: even if you can’t reconfigure or optimize a system that’s limiting you, knowing what breaks it means you can put in smarter limits and workarounds. Then you actually can handle a lot more traffic.

Over-reliance on one load testing tool

Most load testing tools give you an easy way to repeat a sequence of requests many times across multiple different sources. As we saw, that often results in a load test that is not valid for your real world traffic.

Sometimes you can configure a pattern of requests in the tool that is close enough to realistic. In one case we ended up with a suite of 5–8 different tests that we ran at the same time. That was exactly what we needed to validate architectural changes that allowed 3–5x as much traffic as before.

In another case that I mentioned earlier I just wrote a script that made curl requests. I could easily scale it up or down to do accurate load tests (or single-handedly crash the site).

Every tool has its limitations. Don’t let that prevent you from running the tests you need! You may need to run several tests at the same time in one tool, or use several tools to fully understand what you’re dealing with.

Why this is important: use the tool that allows you to do a good test. Don’t just do the test that the tool allows you to do.

Not testing the caching system

An easy way to avoid the cache interfering in your load tests (as mentioned above) is to either direct your requests to servers that are behind the cache so it’s not in play, or make requests that can’t be cached.

However, you still need to make sure the cache is working correctly. Using a separate test, or a modified version of the test that verifies the backend load, you can confirm that the cache hit rate is what you expect when you have repeated traffic.

And you can make sure that request parameters won’t break the cache. On one site I worked with, there was an issue before my involvement where a surge in traffic included a new URL parameter passed by Facebook.

The cache was not properly configured for this, so it treated all traffic as uncacheable and had a hit rate near 0. This had to be hotfixed in production, with a very high level of traffic on the site, to bring the site back up.
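
One common way to guard against this kind of problem is to build the cache key only from parameters that actually change the page. The sketch below is an illustration, not the fix that was applied on that site: the allow-list is hypothetical, and `fbclid` is used only as an example of a tracking parameter since the specific parameter involved is not named here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these parameters change the page content on this hypothetical site.
CONTENT_PARAMS = {"q", "page", "sort"}


def cache_key(url):
    """Drop parameters that do not affect content and sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in CONTENT_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(cache_key("https://example.com/search?q=shoes&fbclid=abc123"))
```

A load test can then include requests with unexpected extra parameters and check that the hit rate stays where it should.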

If you have actions that trigger a full or partial cache flush, those should be tested too. You want to make sure that suddenly refreshing a significant portion of the cache during a high traffic period won’t overload your servers.

Constant cache flushing may take away most of the benefits of the caching system. But you can test a high frequency of these actions and see what the effect is. A system that I tested was able to handle cache flushing about 100x more than we realistically expected so it was clear this wouldn’t be a problem and we didn’t have to spend time on this.

Sidebar: while you’re doing this you might also want to manually check that the cache flushing actually pushes through updates. If you have several layers of caching, they may continue serving old content even after you flush a lower level cache.
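
A toy model of that stale-content problem, assuming two in-memory layers with no TTL. The class and variable names are invented for the example; real CDNs and application caches have their own purge mechanisms.

```python
class CacheLayer:
    """A cache with no TTL that fetches from `upstream` on a miss."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.upstream(key)
        return self.store[key]

    def flush(self):
        self.store.clear()


origin = {"/home": "v1"}                    # stands in for the CMS or database
app_cache = CacheLayer(origin.__getitem__)  # lower-level cache
edge_cache = CacheLayer(app_cache.get)      # outer layer, e.g. a CDN

edge_cache.get("/home")   # warms both layers with "v1"
origin["/home"] = "v2"    # content is updated
app_cache.flush()         # only the lower layer is flushed

stale = edge_cache.get("/home")  # still "v1": the outer layer never re-fetched
edge_cache.flush()
fresh = edge_cache.get("/home")  # now "v2"
```

The same check can be done by hand against a real stack: update content, flush, and request the page through every layer that a visitor would pass through.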

Why this is important: if you have fairly consistent traffic, you can monitor and tune the cache in production to get good results. If you are preparing for a large surge in traffic then you’ll need additional testing to make sure the cache works as planned, has a good hit rate, and caches the right requests.

Not using the real distribution of traffic

A site that I worked with used a content management system where pages could be cached but it was critical for updates to go out quickly, so there was a short TTL on the caching.

Many of the pages had relatively low traffic and weren’t that complex to regenerate. However, one of the highest-traffic pages, which needed very quick updates, also created a very high backend load when it was refreshed.

There were also some infrequent user actions that couldn’t be cached at all without significant changes and there wasn’t enough time to do those.

If we ran a load test that simply did each of the requests 1,000 times, that would give us worthless results. It would be a mix of cached requests that don’t tell us the true backend capacity and non-cached requests hammered way more than we needed, which could crash the system before the test gave any useful results!

The solution required a set of tests that ran concurrently. Some of them loaded the high traffic, high workload pages and did regular cache resets to see how they handled that. Others did the infrequent actions that couldn’t be cached, and ran slower to simulate the actual amount of traffic those would get.
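
A sketch of how the per-test request rates for such a suite could be planned. The scenario names and traffic shares are hypothetical; in practice they would come from production logs.

```python
# Hypothetical traffic shares; derive real ones from production logs.
TRAFFIC_MIX = {
    "hot_page": 0.80,            # high traffic, expensive to regenerate
    "long_tail_page": 0.18,      # low traffic, cheap to regenerate
    "uncacheable_action": 0.02,  # infrequent, always hits the backend
}


def plan_rates(total_rps, mix):
    """Split a target request rate across scenarios by their traffic share."""
    total_share = sum(mix.values())
    return {name: total_rps * share / total_share for name, share in mix.items()}


if __name__ == "__main__":
    for multiplier in (1, 3, 5):
        rates = plan_rates(500 * multiplier, TRAFFIC_MIX)
        print(multiplier, {name: round(rps, 1) for name, rps in rates.items()})
```

Scaling the total while keeping the shares fixed raises the load without distorting the mix, so each concurrent test keeps running at its realistic relative pace.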

That suite of tests was what allowed us to get useful results, find out actual capacity limits, confirm that fixes worked, and check the stability of each part individually to make sure there were no weak points. None of that would have been possible if it simply ran through a pre-set list of URLs for a predetermined number of times.

Why this is important: most of your traffic is probably focused on a few parts of your site or system. Your caching rules are probably optimized around this already but there may be some exceptions. If you want to test how it handles higher traffic, the test needs to approximate the behaviour of those real users.

Getting Your Load Tests Right

These are a few common scenarios I’ve seen. They all come down to the same thing: your load test has to replicate real world conditions.

It’s just like debugging in development. If your development environment is nothing like production, you wouldn’t expect to find bugs quickly and produce a fix that works well. The first step is to make sure you’re seeing the same thing that you do in production.

And that’s the best way to think about load testing. First make sure it produces the same results you see in production. Then increase traffic using similar patterns to create a realistic workload. As we saw above that may involve different mixes of cached vs non-cached requests and unique vs repeating requests. Once you find a breaking point that you want to prepare for in live traffic, come up with fixes and run the same test against them.

This takes more time than just putting your URL into a load testing tool.

Can you just skip all of this and increase your server capacity so you have no problems? Maybe. If the cost to do that is less than a few weeks of engineering time, that could be the best solution.

But that essentially means you have no load test or scalability measurement.

You can’t explain to the business and marketing teams how much traffic you can handle (especially when you expect a huge surge in traffic that doesn’t happen regularly). You won’t know which parts of the system scale gracefully and which parts actually get less efficient as the load grows.

You won’t come up with the best options to shield the unreliable parts and still create a good customer experience. And the added servers may not even be at the bottleneck so you might not see any benefit.

If you’re seeing crashes during high traffic periods, adding more servers has not been effective, and it’s causing problems for the business, you need a real and accurate load test.

You might have a CMS that delivers a constant stream of content, or an API that handles user requests for a popular mobile app. Either way you need to think about similar concepts of caching and traffic patterns to design the right load test that will help you.

A load test tailored to your specific system is the only way to get accurate results.

Do this and you will learn the real limitations you need to address. Before long you’ll be impressing the rest of the business with the stability and scalability of the backend!

Interested in more insights about scalability and performance? Follow me on Twitter for future articles and quick tips.

Original article: https://medium.com/swlh/why-your-load-test-will-hide-problems-and-lead-to-crashes-in-production-585b6258e67d
