Build Your Own Botnet with EC2 and Capistrano to Load Test Your Server Cluster before Launch
Different types of web applications scale differently. When load testing is required, tools like JMeter may come in handy, but when special needs arise, a hand-rolled load testing harness using Amazon’s EC2 might be just what you need to stay out of trouble.
Launch Fail
Launch Fail is every web developer’s and entrepreneur’s nightmare. You finally expose your system to the public, and your servers fall over and die under load. What follows is hours of stressful emergency calls, huddled in a room with your fellow engineers while, if you’re lucky, your boss runs cover. Not fun. How can we avoid this?
Too much traffic is something of a welcome problem to have, but it is still a problem, and sometimes a very big one. You only get one chance to make a first impression. Cute as it is, we don’t want that impression to be the Fail Whale.
The truth is that the vast majority of web sites can deal with scaling issues slowly, over time, as they enjoy moderate growth. As a site gets busier, the developers discover database optimizations that need to be made. They may decide to move to bigger slices, or add slices and begin load balancing, or both. Perhaps the database can be sharded. Maybe media should move onto a content distribution network. Perhaps the site should cache frequently used data. Maybe an offline work queue would be helpful. The strategies appropriate to all but the biggest sites are fairly well established, and many can be implemented incrementally as needs dictate.
Sites with Exceptional Scaling Requirements
Some sites by their very nature require more attention to scalability up front. This is particularly true of highly interactive, event-oriented sites. Traffic focused around an event funnels what might otherwise be a manageable amount of traffic into a short time window. If the site facilitates real-time or near-real-time interaction between visitors, the effect of adding simultaneous users can be highly nonlinear.
In the example graph above, we observe two inflection points where the behavior of the system changes substantially under load. Up to around 300 concurrent users, the system scales well and response times remain low. Beyond that point, performance starts to degrade. The reduced performance may be tolerable for a short time (depending on requirements), but even this breaks down once another 300 concurrent users are added.
While this scenario may be illustrative of a typical scaling problem, the causes of the degraded performance at the different load levels will vary. Troubleshooting requires correlating data from different sources and running multiple tests. We will not go into all the techniques and sources for data here but instead focus on how to generate the load — for it is impossible to even produce this telling initial graph without being able to simulate an arbitrary number of concurrent users.
Generating Realistic Load
The method we use to simulate concurrent users varies with the character of the application. A conventional web application with a little Ajax can be adequately tested using a tool like Apache’s JMeter. JMeter runs inside a Java virtual machine and uses Java’s threading capabilities to generate concurrent requests. The requests are replayed from a script created by recording the conversation between a browser and the server while a tester uses the site. During a test, you instruct JMeter to start a specific number of threads. Each thread replays the test script either a set number of times or in an indefinite loop.
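JMeter is configured through its GUI rather than code, but the thread-group model just described is simple enough to sketch. The Python below is a minimal illustration of that model, not JMeter’s implementation; `run_thread_group` and the injected `send` callable are hypothetical names, and `send` stands in for whatever actually issues a recorded HTTP request.

```python
import threading
from typing import Callable, List, Optional


def run_thread_group(
    num_threads: int,
    script: List[str],
    send: Callable[[int, str], None],
    loops: Optional[int] = 1,
    stop: Optional[threading.Event] = None,
) -> None:
    """Start num_threads threads that each replay `script`.

    Each thread replays the script `loops` times, or indefinitely when
    `loops` is None, and exits early once `stop` is set by another thread.
    """
    stop = stop if stop is not None else threading.Event()

    def worker(thread_id: int) -> None:
        iteration = 0
        while not stop.is_set() and (loops is None or iteration < loops):
            for request in script:
                if stop.is_set():
                    return
                send(thread_id, request)  # hypothetical: issue one recorded request
            iteration += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # with loops=None this blocks until `stop` is set elsewhere
```

With `loops=None`, the threads keep looping until some other thread sets the `stop` event, which mirrors the indefinite looping mode described above.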
Threads vs. Users
Chances are you want to perform a load test because your boss has asked you, “How many users can we support with three slices?” or, alternatively, “We are contractually obligated to support 1,000 concurrent users. Can we be certain it will work?”
Thus the first trick in load testing is figuring out how many users you are simulating during a test.
Whether we use JMeter or something else, we can control the number of concurrent threads of execution, but how many users do X threads represent? Concurrent threads do not necessarily equal concurrent users. For ease of estimation, we want the ratio of threads to simulated users to be as close to one as possible.
To accomplish this, the test scripts must have appropriate delays added or be recorded in a way that reflects normal user behavior. If a normal user occasionally goes idle and generates no requests to the server, the test script should reflect that too.
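To see how far threads and users can drift apart, it helps to put numbers on it. The sketch below is a back-of-the-envelope estimate based on the interactive response time law (Little’s law for a closed system), assuming each script step is a single request and that response time stays roughly constant, which it will not exactly do under real load. `equivalent_users` is a hypothetical helper, and the think-time figures you feed it should come from observing real users.

```python
def equivalent_users(threads: int, response_time: float,
                     script_think_time: float, real_think_time: float) -> float:
    """Estimate how many real users a number of test threads represents.

    Interactive response time law: N = X * (R + Z), where X is throughput,
    R is response time and Z is think time. The threads produce
    X = threads / (R + Z_script) requests per second; real users producing
    that same X would number X * (R + Z_real).
    """
    cycle = response_time + script_think_time
    if cycle <= 0:
        raise ValueError("response_time + script_think_time must be positive")
    throughput = threads / cycle  # requests per second generated by the test
    return throughput * (response_time + real_think_time)
```

For example, 100 threads replaying with no delays against a 0.5 s response time generate the same request rate as about 2,000 users who pause 9.5 s between requests (`equivalent_users(100, 0.5, 0.0, 9.5)`). Giving the script that same 9.5 s of think time brings the ratio back to one.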
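One way to add those delays is to insert a randomized pause after each recorded request. JMeter provides timers such as the Uniform Random Timer and Gaussian Random Timer for this purpose; the Python below only sketches the idea, and its names (`replay_with_think_time`, the `(path, mean think time)` step format, the injected `fetch` callable) are hypothetical.

```python
import random
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical script format: (request path, mean think time in seconds afterwards)
Step = Tuple[str, float]


def replay_with_think_time(
    steps: List[Step],
    fetch: Callable[[str], None],
    jitter: float = 0.5,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Replay one pass of a recorded script, pausing after each request.

    Each pause is drawn uniformly from [mean * (1 - jitter), mean * (1 + jitter)]
    (clamped at zero) so simulated users drift apart instead of marching in
    lockstep. Returns the total think time for the pass.
    """
    rng = rng if rng is not None else random.Random()
    total = 0.0
    for path, mean_think in steps:
        fetch(path)
        pause = max(0.0, rng.uniform(mean_think * (1 - jitter), mean_think * (1 + jitter)))
        sleep(pause)
        total += pause
    return total
```

Injecting `sleep` and `rng` keeps the function testable without actually waiting; in a real run you would leave the defaults in place.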
Recording a Good Test Script
Start by recording a single click path through the application, then add more scenarios. Rotating between them or appending them into one long script may give more realistic and, on average, smoother results. Once the script is recorded, replay it using a single thread and check the logs to make sure it behaves as expected. You should be comfortable that the request profile resembles that of a single real user.
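Rotating between scenarios can be as simple as giving each thread a different starting point in a fixed list. A minimal Python sketch, where the scenario names are hypothetical placeholders for your recorded click paths:

```python
from itertools import islice
from typing import Iterator, Sequence


def scenario_order(scenarios: Sequence[str], thread_id: int) -> Iterator[str]:
    """Yield scenario names forever, starting at an offset based on thread_id.

    Offsetting the start spreads concurrent threads across different click
    paths instead of having every thread run the same scenario at once.
    """
    if not scenarios:
        raise ValueError("need at least one scenario")
    i = thread_id % len(scenarios)
    while True:
        yield scenarios[i]
        i = (i + 1) % len(scenarios)


# Thread 1 starts with "search", then continues round-robin through the list.
print(list(islice(scenario_order(["browse", "search", "checkout"], 1), 4)))
# ['search', 'checkout', 'browse', 'search']
```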
Limitations of a Workstation
A single, powerful developer workstation can generate tremendous load on a server with a tool like JMeter. However, a workstation is often at the end of a relatively slow consumer internet connection and may not have enough upstream bandwidth to load a heavy-duty server cluster or cloud configuration. Multiple workstations in the same office share the same network bottleneck, and coordinating tests with people at other locations is impractical.
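A quick calculation shows how fast a consumer connection runs out. The function below is a rough sketch; the traffic figures in the example are hypothetical, so substitute request and response sizes measured from your own logs. The same formula works for either direction: use average request size for upstream and average response size for downstream.

```python
def required_mbps(users: int, requests_per_user_per_min: float,
                  avg_bytes_per_request: float) -> float:
    """Rough bandwidth, in megabits per second, a load generator must sustain
    in one direction to simulate `users` concurrent users."""
    bytes_per_sec = users * requests_per_user_per_min / 60.0 * avg_bytes_per_request
    return bytes_per_sec * 8 / 1_000_000


# Hypothetical: 1,000 users, 6 requests a minute each, 100 KB per response.
print(required_mbps(1000, 6, 100_000))  # 80.0
```

Compare the result against the connection’s rated capacity in the relevant direction before trusting any numbers produced from a single office.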
If the application involves real-time interaction between concurrent users, such as online chat or document exchange, or if it involves Comet (push), persistent connections, or heavy Ajax interaction, simply replaying a single script of pre-recorded HTTP requests may not accurately reproduce actual load conditions. For example, if a user can see a list of other logged-in users and then choose to invite one or more of them to chat, how would you record the test script? There may be only a few users in the system when the test is recorded, and they may not be the same ones who are there when you run the load test later. What you want is to pick the Nth user in a list generated at test time. In cases like this, it may be necessary to script the behavior of an actual client browser or application and to launch a real web browser to replay the test script.
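Choosing “the Nth user” at test time might look like the following. This is a hypothetical helper, assuming the browser script can read the list of logged-in users from the page at run time; `pick_invitee` and its arguments are illustrative names, not part of any tool mentioned here.

```python
from typing import List, Optional


def pick_invitee(online_users: List[str], me: str, n: int) -> Optional[str]:
    """Pick the Nth *other* logged-in user from a list fetched at test time.

    Sorting gives every simulated user a stable ordering regardless of how
    the page lists names, and wrapping with modulo keeps the choice valid
    when fewer than N other users are online. Returns None if nobody else
    is logged in.
    """
    others = sorted(u for u in online_users if u != me)
    if not others:
        return None
    return others[n % len(others)]
```

Giving each simulated user a different `n` (for example, its browser index) spreads invitations across the population instead of piling them onto one account.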
Scripting The Browser
Launching a browser and scripting its interaction with our server exercises the full application stack. Any performance issues caused by client behavior, or by interaction between concurrent users through the UI, will show up in the test.
Inspiration for this technique can be found in testing tools like Selenium. In fact, when we encountered this very scenario on a client project, we looked to Selenium Grid for ideas.
The Cloud
Once you have a method of exercising your full application, including the browser, how do you generate sufficient load? You will need to run many browser instances. Even a powerful workstation will slow down with more than a few of them running, so you will need to command many machines to run what might be hundreds or thousands of browser instances. Enter the Cloud. Like the sorcerer’s apprentice in Der Zauberlehrling, you must summon a vast army of “spirits” to do your work for you. Basically, you build your own mini-botnet.
Using EC2 to Generate Load
Amazon’s EC2 (Elastic Compute Cloud) service lets you summon virtually unlimited computing power on demand, for as long as you need it, and pay only for what you use. In practice, though, you may run only up to 20 concurrent “instances” before you must ask Amazon to approve your account for more. If you are new to EC2, it will take some getting used to, so read up on it at Amazon’s site first. After creating your account, download the ElasticFox plugin for Firefox. You might as well get the S3 Organizer while you’re at it.
How Many Instances?
Launching an instance is not a cost-free operation. Your account will be billed for at least one hour of time for each instance you launch, anywhere from $0.10 to $1.00 depending on the instance type you request. Booting a test that spans many instances also takes a long time. To make efficient use of EC2 and of your own time, you will therefore want to run as many browsers on each test instance as possible without invalidating your test. Through experimentation, we found we could run 10 instances of Firefox, each running a fairly heavyweight Adobe Flex application, on a “c1.medium” instance running Ubuntu. Here is how Amazon describes this instance: “High-CPU Medium Instance: 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform.” Your mileage may vary.
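Putting these numbers together gives a rough plan before you launch anything. This sketch assumes 10 browsers per instance as above, billing by started hour as described, and the default 20-instance limit; `plan_instances` is a hypothetical helper, and the hourly rate is a placeholder you should replace with Amazon’s current pricing.

```python
import math


def plan_instances(simulated_users: int, browsers_per_instance: int,
                   hourly_rate: float, test_minutes: float,
                   instance_limit: int = 20) -> dict:
    """Estimate instance count and cost for a load test.

    Assumes every started hour is billed as a full hour per instance.
    """
    instances = math.ceil(simulated_users / browsers_per_instance)
    billed_hours = max(1, math.ceil(test_minutes / 60))
    return {
        "instances": instances,
        "needs_limit_increase": instances > instance_limit,
        "estimated_cost": instances * billed_hours * hourly_rate,
    }


# Hypothetical: 1,000 simulated users for a 90-minute session at $0.20/hour.
plan = plan_instances(1000, 10, hourly_rate=0.20, test_minutes=90)
```

For 1,000 simulated users that comes to 100 instances, well past the default limit of 20, so request the increase well before launch day.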
Building an Amazon Machine Image
An EC2 instance is booted from a machine image called an AMI (Amazon Machine Image). AMIs can be public or private. It may prove necessary or convenient to create your own private AMI from which to boot your load test instances. The alternative is to write a script that takes an instance of a public AMI, presumably containing a stock Linux distribution, and configures it for load testing. Which approach is better depends on how much software you need to install and configure to get your load tester running. If you choose to build your own AMI, pay careful attention to whether you will ultimately need, or be able to use, 64-bit instances. If you build a 32-bit AMI and later decide you want 64 bits, you will have to rebuild it.
Each Firefox instance running on the same host may need to log in as a different user, or at least have its own session. Firefox stores session data per profile under the user’s home directory, so you will need one browser profile for each browser instance. Set up your AMI with the profiles ready to go so you can launch them when starting a load test.
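Creating those profiles is easy to script when building the AMI. The sketch below only builds the command lines for Firefox’s `-CreateProfile` option, which takes the profile name and directory together as one “name dir” argument; the `loadtest` prefix and the base directory are hypothetical, and you should keep both free of spaces.

```python
from pathlib import PurePosixPath
from typing import List


def profile_commands(count: int, base_dir: str,
                     prefix: str = "loadtest") -> List[List[str]]:
    """Build one `firefox -CreateProfile "name dir"` command per browser instance."""
    commands = []
    for i in range(count):
        name = f"{prefix}{i}"
        directory = PurePosixPath(base_dir) / name  # one directory per profile
        commands.append(["firefox", "-CreateProfile", f"{name} {directory}"])
    return commands
```

On the image you are building, each command could then be run with `subprocess.run(cmd, check=True)`.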
To launch the browsers, you will need a windowing environment running on your instances. The vncserver command can start headless X sessions in which the browsers can run. If you set up your Security Groups properly, you can even connect to your instances with VNC and watch them work.
If you are going the private AMI route, upload your AMI to S3 and make sure you can boot it. If you are not using a custom AMI, boot your chosen public AMI and run your install script on it. Either way, you will need a method of launching your Firefox instances from your own computer via ssh. We suggest using Capistrano for this. Combined with the amazon-ec2 gem, it lets you define tasks that automatically boot your instances and run commands on all of them in parallel. Alternatively, you can use shell scripts with the ec2-api-tools, which provide a Java-based command-line interface to the cloud.
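Once an X session is running, each browser just needs the right `DISPLAY` and its own profile. This Python sketch assumes a session was started beforehand with `vncserver :1` and that profiles named `loadtest0`, `loadtest1`, and so on already exist; the function name, prefix, and URL are hypothetical. It returns the command lines rather than starting anything.

```python
import os
from typing import Dict, List, Tuple


def browser_launch_specs(count: int, display: int = 1, prefix: str = "loadtest",
                         url: str = "about:blank") -> List[Tuple[List[str], Dict[str, str]]]:
    """Return (argv, env) pairs for starting `count` Firefox instances.

    -no-remote keeps each process independent instead of handing the URL to
    an already-running Firefox; -P selects the named profile.
    """
    env = dict(os.environ, DISPLAY=f":{display}")  # point X clients at the VNC session
    return [(["firefox", "-no-remote", "-P", f"{prefix}{i}", url], env)
            for i in range(count)]
```

Each pair can be started with `subprocess.Popen(argv, env=env)`. A VNC server on display `:N` normally listens on TCP port 5900 + N, so display `:1` needs port 5901 open in your Security Group if you want to watch.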
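The key capability Capistrano provides here is running one command on many hosts at once. As a rough illustration of that fan-out (not Capistrano itself), here is a Python sketch using ssh and a thread pool. It assumes key-based ssh access to each instance; the host names and remote commands you pass in are your own, and `run_on_all` is a hypothetical helper.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def ssh_runner(host: str, command: str) -> int:
    """Run `command` on `host` over ssh and return its exit status.

    BatchMode=yes makes ssh fail instead of prompting for a password.
    """
    return subprocess.run(["ssh", "-o", "BatchMode=yes", host, command]).returncode


def run_on_all(hosts: List[str], command: str,
               runner: Callable[[str, str], int] = ssh_runner,
               max_workers: int = 32) -> Dict[str, int]:
    """Run the same command on every host in parallel; map host -> exit status."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))  # map() preserves input order
```

A non-zero status in the result tells you which instances need attention before you start the test.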
Running Tests
Decide how you want to launch your tests. It is a good idea to break the process into two stages, so that you can boot a batch of EC2 instances and leave them idle until you are ready to start your test. You will want to stop and start tests repeatedly as you try different server configurations or load levels and collect data, so you need a way to stop a test without shutting down all the instances. Set up the following tasks in Capistrano or build them as shell scripts:
- launch_loadtest_instances (takes the number of instances to start)
- start_loadtest (optionally takes the number of browsers to launch on each instance)
- stop_loadtest
- shutdown_loadtest
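The two-stage life cycle behind these four tasks can be sketched as a small controller. This is a hypothetical Python outline rather than Capistrano code: `boot`, `run`, and `terminate` are injected callables you would back with the EC2 tools and ssh, and `start-browsers` is a made-up name for a launcher script on your AMI.

```python
from typing import Callable, List


class LoadTest:
    """Hypothetical controller mirroring the four tasks listed above."""

    def __init__(self,
                 boot: Callable[[int], List[str]],
                 run: Callable[[List[str], str], None],
                 terminate: Callable[[List[str]], None]) -> None:
        self.boot, self.run, self.terminate = boot, run, terminate
        self.hosts: List[str] = []
        self.running = False

    def launch_loadtest_instances(self, count: int) -> None:
        self.hosts += self.boot(count)  # instances sit idle until a test starts

    def start_loadtest(self, browsers_per_instance: int = 10) -> None:
        if not self.hosts:
            raise RuntimeError("launch instances first")
        self.run(self.hosts, f"start-browsers {browsers_per_instance}")  # hypothetical script
        self.running = True

    def stop_loadtest(self) -> None:
        # Kill the browsers but keep the (already billed) instances for the next run.
        self.run(self.hosts, "pkill firefox")
        self.running = False

    def shutdown_loadtest(self) -> None:
        if self.running:
            self.stop_loadtest()
        self.terminate(self.hosts)
        self.hosts = []
```

Keeping `stop_loadtest` separate from `shutdown_loadtest` is what lets you rerun tests against new server configurations without paying to boot the fleet again.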
Once these tasks work, you should be able to conduct load tests. You may wish to add a delay to the script that launches browsers on each instance, so that several hundred simulated users do not all try to log in, hit the main page, or perform some heavy operation at the same time, unless such behavior can be expected of your real users.
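A simple way to add that delay is to spread browser start times evenly over a ramp-up period. In this sketch, `ramp_schedule` is a hypothetical helper; each instance’s launcher would pick out its own entries and sleep for the given delay before starting each browser.

```python
from typing import List, Tuple


def ramp_schedule(instances: int, browsers_per_instance: int,
                  ramp_seconds: float) -> List[Tuple[int, int, float]]:
    """Spread browser start times evenly over `ramp_seconds`.

    Returns (instance index, browser index, start delay in seconds). Starts
    are interleaved across instances so every instance ramps up at the same
    rate instead of one instance launching all its browsers first.
    """
    total = instances * browsers_per_instance
    step = ramp_seconds / total if total else 0.0
    schedule = []
    for b in range(browsers_per_instance):
        for i in range(instances):
            k = b * instances + i  # global start order
            schedule.append((i, b, k * step))
    return schedule


# Two instances with two browsers each, ramped over 40 seconds.
print(ramp_schedule(2, 2, 40))
# [(0, 0, 0.0), (1, 0, 10.0), (0, 1, 20.0), (1, 1, 30.0)]
```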
Now What?
With this fancy cloud-computing test rig, you should be able to generate the load needed to collect valuable information about the performance of your production server environment. The ability to perform this kind of testing will help you isolate and resolve scaling problems… and summoning 100 servers in the cloud to do your bidding is kind of AWESOME!