We recently finished a truly interesting project for a large national bank: load testing of Robotic Process Automation (RPA), an automated system intended to fully replace any operational bank employee working on a PC.
The system does not care which task it performs or which software it works with: it can, for example, calculate salaries or order notebooks and pens from an online shop. The idea was not for the robot to send hard-coded requests somewhere, but to use input signals such as mouse and keyboard events, just as an actual user does.
The customer planned to increase the number of virtual automated workspaces (VAWs) with robots, while also increasing the number of executed processes. The PFLB team was asked to perform load testing and to solve the following issues:
The production environment consists of 1 database server, 14 application servers, 1 load balancer and 700 VAWs.
The customer wanted to create a robot emulator that would emulate the work of all 700 VAWs on a single server.
The pseudo-robots were supposed to load the system by sending requests announcing that they were starting work. After some thought, we decided to reject this approach, because JMeter can replace the robots: one virtual user stands for one robot. In fact, a robot just sends requests to the application server, so for load testing it does not matter what the robot is busy with.
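The idea that one virtual user stands for one robot can be sketched in Python with `asyncio`. Each coroutine plays one robot and does nothing except send its sequence of requests. This illustrates the concept only and is not how JMeter works internally. `fake_send` is a hypothetical stand-in for a real request to the application server.

```python
import asyncio

ROBOTS = 700  # one virtual user per VAW robot, matching the production setup


async def fake_send(robot_id: int, step: int) -> str:
    """Hypothetical stand-in for one request to the application server."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"robot-{robot_id}:step-{step}:ok"


async def virtual_robot(robot_id: int, steps: int) -> list[str]:
    """One virtual user replays a robot's request sequence and nothing more."""
    responses = []
    for step in range(steps):
        responses.append(await fake_send(robot_id, step))
    return responses


async def run_load(robots: int = ROBOTS, steps: int = 3) -> list[list[str]]:
    # All virtual users run concurrently, like threads in a JMeter Thread Group.
    return await asyncio.gather(*(virtual_robot(r, steps) for r in range(robots)))


if __name__ == "__main__":
    results = asyncio.run(run_load())
    print(len(results), sum(len(r) for r in results))  # 700 2100
```

What a robot "does" on screen never appears in the load model. Only the requests it causes are reproduced.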
The application server, however, turned out to be a black box. The solution was evident: traffic logging. As production uses the secure variant of the .NET Remoting protocol, we switched the load testing environment to the insecure one for simplicity. We captured the traffic with Wireshark, since .NET Remoting sends raw TCP requests. Each TCP request has the following structure: [address, number of bytes to transfer, body as a byte string, end-of-query symbol]. The sniffer returned the transferred bytes as a hex string. At first we used a self-written converter for parametrization; later we simply got used to reading the hex strings directly. In the end, the load script consisted of a set of TCP Samplers whose bodies are hex strings, plus preprocessors for parametrization and post-processors for correlation.
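The parametrization step can be illustrated with a minimal Python sketch. The sketch makes the following assumptions:

* It uses a hypothetical frame layout: a 4-byte little-endian body length, the body, and a one-byte end marker.
* This layout is not the real .NET Remoting wire format.
* The address part of the request is left out.
* `build_frame`, `parametrize_hex` and `END_MARKER` are hypothetical names.

The idea is to decode the captured hex string, substitute a value, recompute the length field and re-encode the result as hex. Hex is the form in which JMeter's TCP Sampler accepts request bodies when configured with `BinaryTCPClientImpl`.

```python
import struct

# HYPOTHETICAL framing for illustration only, not the real .NET Remoting format.
END_MARKER = b"\x00"


def build_frame(body: bytes) -> bytes:
    """Frame = 4-byte little-endian body length + body + end marker."""
    return struct.pack("<I", len(body)) + body + END_MARKER


def parametrize_hex(hex_frame: str, old: str, new: str) -> str:
    """Replace a UTF-8 value inside a captured hex frame and fix the length field."""
    frame = bytes.fromhex(hex_frame)
    (length,) = struct.unpack_from("<I", frame, 0)
    body = frame[4 : 4 + length]
    if frame[4 + length :] != END_MARKER:
        raise ValueError("unexpected frame layout")
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    if old_b not in body:
        raise ValueError(f"value {old!r} not found in body")
    # bytes.replace substitutes every occurrence of the old value.
    return build_frame(body.replace(old_b, new_b)).hex()


if __name__ == "__main__":
    captured = build_frame(b"robot=VAW001;op=GetNext").hex()
    print(parametrize_hex(captured, "VAW001", "VAW0042"))
```

`bytes.replace` substitutes every occurrence of the old value. If the value also appears elsewhere in the body, an offset-based replacement would be safer.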
An important discovery was that every well-formed TCP request gets a 200 response, even if the response body says “Your query is useless”. To verify the response body, we used JMeter assertions. Another nuance was the logic of the load script. We decided that a single script with internal branches is better than 20 separate scripts for the different types, volumes and intensities defined by the load testing profile. In this way we created a tool that reproduces the load that real robots generate on the Blue Prism (BP) servers.
We had only one BP server in the test environment, whereas production has 14. That was not a problem: every SQL query is sent over the TDS protocol, so we can capture and replay it. A JMeter user thus substitutes for part of a BP server, i.e. it sends the requests that a robot's activity would cause.
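Both points from the paragraph above can be sketched in Python. The first is that a 200 status proves nothing, so the body must be checked. The second is that one script chooses its branch according to the load profile.

The following are hypothetical and for illustration only:

* the profile contents in `PROFILE`
* the operation names
* the helper functions

In JMeter itself this is done with assertions and logic controllers rather than Python.

```python
import bisect
import itertools

# Error text observed inside replies that still arrive as "200".
ERROR_MARKERS = ("Your query is useless",)


def response_ok(body: str) -> bool:
    """A reply passes only if its body contains no known error text."""
    return not any(marker in body for marker in ERROR_MARKERS)


# HYPOTHETICAL load profile: operation type -> requests per hour at 100%.
PROFILE = {"create_case": 600, "update_case": 300, "close_case": 100}


def branch_table(profile: dict[str, int]) -> tuple[list[str], list[int]]:
    """Cumulative weights, so one script can pick a branch per iteration."""
    names = list(profile)
    cumulative = list(itertools.accumulate(profile[n] for n in names))
    return names, cumulative


def pick_branch(u: float, names: list[str], cumulative: list[int]) -> str:
    """Map a uniform random number u in [0, 1) to a branch name."""
    point = u * cumulative[-1]
    return names[bisect.bisect_right(cumulative, point)]


def scaled_rates(profile: dict[str, int], percent: float) -> dict[str, float]:
    """Per-branch intensity at a given share of the load testing profile."""
    return {name: rate * percent / 100 for name, rate in profile.items()}


if __name__ == "__main__":
    names, cumulative = branch_table(PROFILE)
    print(pick_branch(0.7, names, cumulative))  # update_case
    print(scaled_rates(PROFILE, 109))
```

Because the mix and intensity live in data rather than in separate scripts, a test at a different share of the profile only changes `percent`.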
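Capturing SQL from TDS traffic can be illustrated with a minimal Python sketch that decodes a single SQL Batch packet. The code relies on the MS-TDS layout as follows:

* an 8-byte header with a big-endian length field
* packet type `0x01` for SQL Batch
* in TDS 7.2 and later, an `ALL_HEADERS` block whose little-endian `TotalLength` precedes the UTF-16LE query text

The sketch assumes a query that fits into one packet and does not open a session or handle login. `sql_from_tds_packet` and `build_sql_batch` are hypothetical helper names.

```python
import struct

TDS_SQL_BATCH = 0x01  # MS-TDS packet type for a SQL batch
HEADER_LEN = 8        # type, status, length (big-endian), SPID, packet ID, window


def sql_from_tds_packet(packet: bytes) -> str:
    """Extract the SQL text from one captured TDS 7.2+ SQL Batch packet."""
    ptype, _status, length = struct.unpack_from(">BBH", packet, 0)
    if ptype != TDS_SQL_BATCH:
        raise ValueError(f"not a SQL batch packet (type 0x{ptype:02x})")
    payload = packet[HEADER_LEN:length]
    # ALL_HEADERS.TotalLength is little-endian and includes its own 4 bytes.
    (all_headers_len,) = struct.unpack_from("<I", payload, 0)
    return payload[all_headers_len:].decode("utf-16-le")


def build_sql_batch(sql: str, spid: int = 0) -> bytes:
    """Build a minimal single-packet SQL Batch, e.g. for a replay test."""
    # Transaction descriptor header: length 18, type 0x0002, descriptor 0, 1 request.
    txn_header = struct.pack("<IHQI", 18, 0x0002, 0, 1)
    all_headers = struct.pack("<I", 4 + len(txn_header)) + txn_header
    payload = all_headers + sql.encode("utf-16-le")
    # Status 0x01 marks the end of the message (single-packet batch).
    header = struct.pack(">BBHHBB", TDS_SQL_BATCH, 0x01,
                         HEADER_LEN + len(payload), spid, 1, 0)
    return header + payload


if __name__ == "__main__":
    print(sql_from_tds_packet(build_sql_batch("SELECT 1")))  # SELECT 1
```

Multi-packet batches would need reassembly by packet ID before decoding. Wireshark's TDS dissector shows the same fields interactively.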
During the first maximum performance test, we saw that even at 100% of the load testing profile the hardware resources were not loaded as heavily as in production. It turned out that:
The current 100% production load uses almost all of the network bandwidth (9 of 10 Gbps), as well as a significant share of other hardware resources. As the number of robots grows, hardware resource utilization grows linearly; the same holds when the number of processes or their load grows. Multiplying these two linear functions gives a quadratic function, from which the maximum performance can be estimated in 5 minutes in Excel.
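The quadratic estimate can be written out in a few lines of Python. The sketch assumes the following:

* robots and per-robot load both grow by the same factor `s`, where `s = 1.0` is 100% of the profile
* each dependence is proportional, with no fixed baseline

Under these assumptions utilization becomes `U(s) = U(1) * s**2`. The function names are hypothetical, and this is one possible reading of the calculation rather than the exact spreadsheet model. With the network figures from the text, 9 of 10 Gbps at 100%, the model predicts saturation at about 105% of the profile. A real test can land elsewhere: the test described below reached 109%.

```python
import math


def utilization(scale: float, u_at_100: float) -> float:
    """Resource use when robots AND per-robot load both grow by `scale`.

    ASSUMPTION: each factor is linear and proportional (no fixed baseline),
    so their product is quadratic: U(s) = U(1) * s**2.
    """
    return u_at_100 * scale**2


def max_scale(u_at_100: float, capacity: float) -> float:
    """Largest profile scale before the resource reaches `capacity`."""
    return math.sqrt(capacity / u_at_100)


if __name__ == "__main__":
    s = max_scale(9.0, 10.0)  # network: 9 of 10 Gbps used at 100% of the profile
    print(f"{s:.1%}")  # 105.4%
```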
But the maths turned out not to be very helpful, so we had to run a test in which we increased both the number of robots and the number of processes. The maximum performance was 109% of the load testing profile with the network as the limiting resource, and 140% with CPU as the limiting resource; to reach the latter, we had to slightly reduce the size of the XML files. All this happened even though 100% of the load testing profile led to only 40% CPU utilization on the database server. What conclusions can we draw?
widening the network channel, i.e. getting the “golden” cables, is not going to boost the performance;
recommendations on how to optimize the script logic for robots have been compiled;
a sizing table for TTC servers has been developed for different load levels;
it has been recommended to boost system performance by creating additional Blue Prism instances.