Synthetic Testing in AWS: Challenges and Solutions

May 21, 2021

11 min read

A bare-metal server versus a cloud server: not a dilemma anymore. Specialists know that both have their pros and cons, and they make the choice depending on each client’s needs. Clients, however, are not always aware of the risks of cloud migration.

Also, they often believe that by paying the provider of cloud services the same amount of money as other clients do, they will get the same hardware behind the cloud. One of the greatest risks, however, is that you don’t actually know the system’s limitations. By testing your system in the cloud, you can avoid most migration risks and save money in the long run. Wild Apricot is a case in point.

Load Profile for Synthetic Testing in AWS

Back in 2016, they decided to move their systems entirely to the cloud. With 6 million users and 15 k communities (i.e., websites, each with a unique design and its own events and fundraisers, blogs, and forums), the migration risks were high. PFLB helped them get it done.

The Task: Synthetic Testing in AWS

Wild Apricot wanted to seamlessly move everything to cloud hosting and chose AWS as the platform. The objective was to cut maintenance costs, increase availability up to 100%, and ensure the scalability of services in cloud computing.

To confirm that cloud migration made sense, we had to:

It seemed ordinary, but it was not.

Load Profile for Synthetic Testing in AWS

As soon as we started, we faced a couple of challenges. First, we got 50 GB of IIS server logs PER MONTH to process to build the load profile. Obviously, we should have told the client which logs we needed, but we hadn’t. We had neither that much server power nor the time or money to buy additional servers. So, we improvised. We used PowerShell to get rid of useless data from the logs and reduced the overall volume by 50%. We processed the remaining half with Python, grouped the data, and further filtered out any unnecessary data.

The next step was to group the actions by the appropriate time frame. We needed it with 1-minute — in some cases even 1-hour — granularity, depending on the particular task. However, the source data were logged every second. Once done, we processed the data again with our own library and Pandas to get the final result. The load profile was beautiful, and it only took us 2 days to process the logs.

AWS Monitoring Systems: Challenges and Results

We used InfluxDB to combine the monitoring data, Grafana to visualize it, and Telegraph to monitor hardware resource utilization and app performance. We did our best to keep everything in-cloud and online, and the combination of these monitoring solutions seemed fine for the cloud, too.

One of the challenges of monitoring was that we were changing the data sources all the time—both instances and load generators. We were constantly stopping the running ones and introducing new ones to simulate new circumstances and increase the load. Obviously, we had to embed Telegraph into the images of all the stations.

As a new instance was spun up, it started the listener and posted the data to the database. In the end, we had 7 Telegraph configurations in 29 constantly working instances—plus three load stations and one more for monitoring. This added up to 11 dashboards in Grafana, each with its own graphs.

Plus, we were monitoring JMeter separately. We gathered TPS, errors, response times, and other custom metrics. Look at the graphs for MongoDB and MSSQL. Aren’t they beautiful? And informative, too: you can take a closer look at every single phase.

Also, Grafana shows the graphs in your browser, and it gets too slow with growing amounts of data. We had to optimize it to make it load all the graphs simultaneously. First, we optimized the queries sent from Grafana to InfluxDB. Then, we hid the graphs that weren’t used very often to make only what we really needed visible. We also configured alerts and put them on the alert list. We configured Grafana to send screenshots along with an alert when it was triggered to Telegram, but instead you could use e-mail or any messenger of your choice. This way, you won’t need to constantly follow each and every graph, as we did at first.

Synthetic Testing: Our Approach

Eventually, we had to choose synthetic testing software for AWS. The key factor was that the solutions had to be cross-platform, as some servers ran on Linux, while others used Windows. Of course, you’d also like the software of your choice to be free and as fast as possible yet yield explicit results. The final list was as follows:

The next step was to figure out the metrics, since without them the test lasted 8 hours. Unacceptable, right? But the client wanted to run synthetic tests before starting the process of adding each machine to the pool of working ones to ensure that it was as functional as any. We chose matrix size for the CPU and block size for the disk, as well as interaction types, such as reading/writing and randomized access. With the metrics, the test took only half an hour.

Synthetic Testing: Troubleshooting

But there was another obstacle. Suddenly, it just looked too good—too quick, with results unrealistic for synthetic testing in AWS. The thing is, AWS has this undocumented, top-secret feature that would ruin any test. It’s called Burst. With Burst, your results look better than perfect. Each of your hard disks gets a credit of 5 million IOPS for about 30 minutes. This credit allows you to use many more IOPS than you paid for (as many as 5 million!), and all applications start smoothly, quickly, and simultaneously. However, after the credit has been used, the IOPS are set back to the previous level. Then, when you go lower than the baseline, which is around 1500 IOPS, the credit starts growing again. The exact numbers, of course, depend on the hard disk type, volume, etc. To overcome this performance challenge, which had to do particularly with cloud computing, we had to exhaust Burst every time to prepare the system before running the load tests. This point is crucially important; your tests will otherwise be useless.

The challenges we encountered later on also influenced the cloud-based load testing process. First, we needed to randomize the sequence in which Apache JMeter picked lines from the file with site addresses—specifically, 5800 of them. Unfortunately, by default, it can only read them sequentially. To randomize the order, we used the HTTP Simple Table Server JMeter plugin. If you are experiencing similar problems with your performance testing in a cloud environment, we highly recommend that you do the same. If we had used it from the start, it would have saved us a lot of time.

Still, with 5800 separate sites, we kept running out of available connections. Buying more was not an option, since we were in the cloud. You can’t just add an extra memory module to your machine. You’d have to change the instance and pay for it not once but every time. There was no other option but to tune the load station. We changed the following parameters on one machine to increase the number of connections:

This helped for a while, but then the test crashed again. A final move was to switch off keep-alive in scripts because most of the time, the tests did not return to the same site. Once again, problem solved.

The last issue was memory leaks. JMeter leaked around 6 GB per hour—and in AWS, you have to fight for every gigabyte. It is advertised all over the internet that scaling in AWS is easy, but this is just not true. We configured monitoring for the Java Virtual Machine by connecting the Java agent to JMeter and then retrieved the memory dump and analyzed it. We found that the HTTP request classes were taking up most of the memory, as they collect all the request information but are not discarded by the garbage collector. That’s why there were memory leaks.
The Java agent data were sent directly to the database and monitored in real time in Grafana. This allowed us to immediately evaluate the changes we had made in the Java Virtual Machine and in the script instead of just waiting for another crash.

Conclusions: Bare Metal or Cloud After All?

The parameters and configurations in a cloud are very different from those in the same hardware accessed outside a cloud. There are many details to consider. The system didn’t work the first or second time we started it, but with the 8th iteration, we succeeded.

Table of contents

Related insights in case studies

Explore what we’ve learned from these experiences

More Case Studies

3 min read

Benchmark Testing Case Study: How PFLB Validated Chainguard Docker Container Performance

Jul 2, 2025

Chainguard specializes in providing highly secure Docker container images — regularly updated, tightly managed, and designed to minimize vulnerabilities. While their reputation for security is well-established, Chainguard wanted to ensure their containers also delivered competitive performance. After all, strong security is crucial, but it shouldn’t slow you down. Client Goal Chainguard needed clear, objective proof […]

4 min read

From Weeks to Hours: Accelerating Data Masking and Enabling Easy B2B Data Sharing for a Leading Bank

accelerating data masking for a leading bank preview

Nov 20, 2024

A leading bank, ranked among the top 20 in its market, provides services to millions of customers daily. Staying at the forefront of this competitive market requires not only stable and updated infrastructure but also rapid feature delivery to maintain the highest service quality. Challenge The bank faced a critical challenge in enabling safe sharing […]

10 min read

Leading Oil & Gas Innovator, NOV Excels with Real-Time Drilling Data After Load Testing

leading oil gas excels with real time drilling data after load testing preview

Jul 29, 2024

NOV, a renowned global provider of equipment, components, and IT products for the oil and gas industry, which is located in Texas, USA, empowers extraction companies worldwide with innovative technological solutions that enable the safe and efficient production of abundant energy while minimizing environmental impact. Under its CTES brand, NOV offers a range of IT […]

7 min read

From Hundreds to Thousands: Scaling MEFA Pathway Software for Mass Student Registration

how load testing helped e learning services provider preview

Jul 12, 2024

FolderWave, Inc. is a leading digital services provider in the Massachusetts e-learning sector. It aids millions of students in researching and planning a job-oriented education. The company delivers IT solutions, platforms, and services in partnership with notable non-profit organizations like MEFA Pathway and College Board, which connect a vast network of colleges, schools, and universities […]

Be the first one to know

We’ll send you a monthly e-mail with all the useful insights that we will have found and analyzed

People love to read

Explore the most popular articles we’ve written so far