Not all business and system-level transactions are created equal. Some system transactions are lightweight and consume few system resources; others may be an order of magnitude heavier. Typically, the smaller, lightweight transactions make up the majority of the transaction mix. For example, a key business transaction might be “order status”, which is supported by dozens of system transactions. This is a relatively simple transaction: a logged-in web user selects the check order option, and because the system already has the customer information, it can quickly retrieve the order. However, it becomes more complex if the user can make modifications. Other, more complex transactions may involve rules processing, complex history queries, or an inadvertently poorly written database query.
These system transactions are executed on the web servers, the application servers, and the database servers. Systems are designed and constructed to support tens of thousands of users executing hundreds of concurrent system transactions. These transactions manipulate customer orders, order history, billing and invoicing, and search and recommendations based on past purchases. Eventually they are translated into a generic transactions per second (TPS) rating.
Transactions per second
For example, after the workload analysis for a new system, the model might predict that 50 TPS can be expected. This may start as 50 URL requests per second and then fan out to web service requests, method invocations, queue writes, and SQL statements, each a multiple of the original 50 TPS.
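As a rough illustration of this fan-out, the sketch below multiplies the front-door rate by a per-tier factor. The factors and tier names are hypothetical, chosen only to show the arithmetic; real values would come from tracing or instrumenting your own system.

```python
# Hypothetical fan-out factors: how many downstream operations
# a single URL request triggers on each tier (assumed values).
FAN_OUT = {
    "web_service_requests": 3,
    "method_invocations": 40,
    "queue_writes": 1,
    "sql_statements": 8,
}


def downstream_rates(url_tps, fan_out):
    """Multiply the front-door request rate by each tier's fan-out factor."""
    return {tier: url_tps * factor for tier, factor in fan_out.items()}


rates = downstream_rates(50, FAN_OUT)
for tier, rate in rates.items():
    print(f"{tier}: {rate} per second")
```

With these assumed factors, 50 URL requests per second turn into 400 SQL statements per second, which is why the database tier often needs its own capacity estimate.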
The generic TPS rating is a guide; however, we also need the detailed transaction mix and the workload distribution. The TPS is an aggregate across transactions and an average across some time interval. For instance, if 1,000,000 URL requests are processed over an eight-hour day, this becomes 125,000 per hour, 2,083 per minute, and about 35 per second. This even distribution is misleading. You must know the distribution over the eight hours.
What if 50% of the transactions arrive in the first two hours, or even the first hour? At two hours, this becomes 250,000 transactions per hour, double the original estimate, and the TPS becomes about 70. (At one hour, it is 500,000 transactions per hour, or about 139 TPS.) Do you design a system that can handle 35 TPS or 70 TPS? By the way, what is your transaction mix?
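The arithmetic above can be checked with a few lines of code. This is a minimal sketch; the function names are made up for illustration, and it assumes arrivals are spread evenly inside each window.

```python
SECONDS_PER_HOUR = 3600


def average_tps(total_requests, hours):
    """Average TPS if the requests were spread evenly over the period."""
    return total_requests / (hours * SECONDS_PER_HOUR)


def window_tps(total_requests, share, window_hours):
    """TPS when `share` of the daily requests arrives within `window_hours`."""
    return total_requests * share / (window_hours * SECONDS_PER_HOUR)


daily = 1_000_000
flat = average_tps(daily, 8)           # even spread over eight hours
two_hour = window_tps(daily, 0.5, 2)   # half the day's load in two hours
one_hour = window_tps(daily, 0.5, 1)   # half the day's load in one hour

print(f"even distribution: {flat:.1f} TPS")      # 34.7
print(f"50% in two hours:  {two_hour:.1f} TPS")  # 69.4
print(f"50% in one hour:   {one_hour:.1f} TPS")  # 138.9
```

The same daily volume yields very different design targets depending on how it is distributed, so the daily average alone is not enough to size the system.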
The overall TPS is composed of a transaction mix. You must be aware of the heavy transactions, as they consume significantly more resources. If you predicted 5 TPS for these heavy hitters out of your overall 35 TPS or 70 TPS, what happens to your system when the heavy hitters increase to 10 TPS or more? Your system will run out of gas sooner than your capacity model predicted. Using the proper monitoring tools will enable you to watch the workload profile on the production system. You can review the daily workload and compare it to the planned or normal workload.
For improved capacity planning, your TPS must be categorized by transaction.
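To see why the mix matters more than the aggregate, here is a small sketch that weights each transaction's TPS by its cost. The transaction names and CPU costs per transaction are assumptions for illustration; in practice they would come from your own measurements.

```python
# Hypothetical mix: transaction -> (TPS, CPU milliseconds per transaction).
planned_mix = {
    "login": (10, 5),
    "product_search": (10, 8),
    "order_status": (10, 4),
    "heavy_history_query": (5, 120),
}


def total_tps(mix):
    """Aggregate TPS across all transaction types."""
    return sum(tps for tps, _ in mix.values())


def busy_cores(mix):
    """CPU-seconds consumed per wall-clock second, i.e. fully busy cores."""
    return sum(tps * cpu_ms for tps, cpu_ms in mix.values()) / 1000.0


# Same mix, but the heavy hitter doubles from 5 TPS to 10 TPS.
surge_mix = dict(planned_mix)
surge_mix["heavy_history_query"] = (10, 120)

planned_cores = busy_cores(planned_mix)
surge_cores = busy_cores(surge_mix)

print(f"planned: {total_tps(planned_mix)} TPS, {planned_cores:.2f} cores")
print(f"surge:   {total_tps(surge_mix)} TPS, {surge_cores:.2f} cores")
```

Under these assumed costs, the aggregate rises from 35 to 40 TPS (about 14%), while CPU demand rises from 0.77 to 1.37 cores (about 78%). An aggregate TPS figure alone would hide that jump.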
Peak hour and peak minute
What is your peak hour? What is the peak minute within the peak hour? Does your business require you to design and build the system for the peak minute? Is the peak a sudden, sharp spike, or is it more gradual, building over minutes, sustaining, and then subsiding? Your system might be able to handle a sudden, short spike: the transactions slow down for a brief period (seconds), and then the system works through the queues and returns to normal after a few minutes.
However, if the peak load is sustained, your system will slow down and the queues will fill up. Because of the duration of the spike, the system might not be able to work through the queues, so the requests continue to slow down, connections time out, a queue forms for the connection pools, and so on. In this case, you must design and build the system for the sustained peak.
Under load, your complex, heavy transactions will slow down exponentially.
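The difference between a short spike and a sustained peak can be illustrated with a deliberately crude queue model. The capacity, baseline, and peak rates below are assumed numbers, and the model ignores timeouts, retries, and connection pool limits, which usually make the real situation worse.

```python
def simulate_backlog(arrivals_per_second, capacity_tps):
    """Return the queue depth at the end of each second for a server
    that completes at most `capacity_tps` requests per second."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_second:
        backlog = max(0, backlog + arrivals - capacity_tps)
        history.append(backlog)
    return history


CAPACITY = 50   # assumed sustainable TPS
BASELINE = 40   # assumed normal arrival rate
PEAK = 80       # assumed arrival rate during the peak

# One minute of normal load, then the peak, then two minutes of normal load.
short_spike = [BASELINE] * 60 + [PEAK] * 10 + [BASELINE] * 120
sustained = [BASELINE] * 60 + [PEAK] * 600 + [BASELINE] * 120

short_history = simulate_backlog(short_spike, CAPACITY)
sustained_history = simulate_backlog(sustained, CAPACITY)

print("short spike: max backlog", max(short_history),
      "final backlog", short_history[-1])
print("sustained:   max backlog", max(sustained_history),
      "final backlog", sustained_history[-1])
```

In this sketch the 10-second spike builds a backlog of 300 requests that drains within 30 seconds, while the 10-minute peak leaves 16,800 requests still queued two minutes after the peak has ended.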
1) Transaction mix: Identify the transaction mix and the heavy hitters
2) TPS: Each transaction needs a TPS rating, in addition to the aggregate TPS. For example, an aggregate of 50 TPS might include 10 TPS Login, 10 TPS Product search by name, 10 TPS Enter order, etc.
3) System resources: Measure system resources consumed for each transaction (CPU, DB)
4) Performance test: Define and execute performance tests to validate the transaction weight on the system. Run each transaction independently. Measure the TPS that the JVM can safely support.
5) Monitor production: Compare the actual workload with the workload models, and trend the response time of transactions
6) Identify the peak hour and peak minute
7) Understand transaction workload distribution
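For steps 6 and 7, the peak hour and peak minute can be derived from request timestamps, for example from an access log. The sketch below uses synthetic timestamps and a hypothetical helper named `peak_buckets`; parsing a real log format is left out.

```python
from collections import Counter
from datetime import datetime, timedelta


def peak_buckets(timestamps):
    """Return ((peak_hour, count), (peak_minute, count)) for the given timestamps."""
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return per_hour.most_common(1)[0], per_minute.most_common(1)[0]


# Synthetic workload (assumed): one request every two seconds from 09:00
# to 11:00, plus a one-minute burst of five requests per second at 10:15.
start = datetime(2024, 1, 1, 9, 0, 0)
steady = [start + timedelta(seconds=2 * i) for i in range(3600)]
burst_start = datetime(2024, 1, 1, 10, 15, 0)
burst = [burst_start + timedelta(seconds=s) for s in range(60) for _ in range(5)]

(hour, hour_count), (minute, minute_count) = peak_buckets(steady + burst)

print(f"peak hour:   {hour:%H:%M} with {hour_count} requests "
      f"({hour_count / 3600:.2f} TPS average)")
print(f"peak minute: {minute:%H:%M} with {minute_count} requests "
      f"({minute_count / 60:.2f} TPS average)")
```

With this synthetic data the peak hour averages about 0.58 TPS while the peak minute averages 5.5 TPS, roughly nine times higher, which shows how much an hourly figure can understate the load the system must absorb.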