Archive

Transaction Performance

Frustrated User

Bring Your Own Response Time.

The consumers’ expectations have greatly influenced the demands and expectations on Enterprise IT departments. The consumer and IT customer brought their own devices, expected more self service and at a much faster pace. One of the key tasks that a Performance Engineer must do is to help the business and IT set expectations on the response times of corporate systems. The history of performance requirements for corporate facing systems and even call centers has been problematic. Often times ignored and certainly deferred. The typical approach is to see just how slow the system can be before the users completely revolt. This tends to be the case because its not a revenue generating system, however, many of the Corporate IT systems directly touch the customer or business partner after the sale or contract is signed.

Response time or performance goals for Internet retailers is well defined and measured, there are many Industry specific benchmarks that compare the response times of web pages against competitors in the industry. The Internet business models demand faster and faster response times for transactions. Benchmarks can be found at Compuware (www.compuware.com), Keynote (www.keynote.com), among others. However, there is not a benchmark for Corporate systems. The users of Corporate systems are starting to voice their concerns and displeasure more loudly.  They are expecting speeds comparable to Internet Retailer speeds. Their expectations are for less than five seconds and often two seconds for simple transactions.

Our studies have shown and are in alignment with the research done by Jakob Nielson (www.nngroup.com) on usability, A guide to setting user expectations must consider three barriers;

1) 0.100 Seconds: The user perceives the system to respond in real time with out any noticeable delay

2) 1.0 Seconds: the User starts to perceive a slight delay with the system, but us very happy with response time

3) 10.0 Seconds: the user will greatly notice the delay and start to be distracted and attempt to do other things while waiting

So, just as the consumer has brought their own devices, they are bringing their own Response times to Corporate systems.

 

Advertisements

Stresstest

Purpose of a stress test

The purpose of the stress test is to design and execute extreme testing scenarios that will cause the application to violate its Service Level Objectives. This is accomplished by increasing the workload on the application for both online and background transactions. The workload for the stress test is well above the normal workload day. A key outcome of the stress test; it allows you to find out how much headroom there is in the system before the user experience is severely impacted or the background processes slow down.

  • For online transactions, the response time will start to increase under the workload as more users are added and as the transaction arrival rate increases.
  • For background transactions, the component throughput will decrease as the workload increases. For example, starting at 10 Orders per second, it will slow down to three Orders per second.

Ultimately the workload is increased until the application breaks.

Stresstestbatch

Stress test entry criteria

A large application stress test requires a large amount of preparation in order to get ready to run the tests and maximize the value from it. To successfully execute a Stress test, the starting performance of the application is critical to know. Is the application performing well under a normal workload? Otherwise, what is the value derived from the test if the application has not achieved the service level objectives before you start the stress test? Here is a sample checklist you can use to help determine if your application is ready for a stress test.

Item No. Criteria Comments/details
1 Passed performance test scenarios and achieved the performance goals
2 Critical business transactions have met the service level objectives   (list critical transactions)
3 Meeting current online transaction service level objectives
4 Meeting the background or batch service level objectives
5 Meeting the real-time messaging requirements, at or better than the   desired throughput
6 Application is at the expected Release Level or version for   production
7 At normal load, it is fitting within the Batch window
8 The Stress testing system and application configuration is like   production configuration
9 Data in the database is at the expected production size.
10

SuperBus

A Farrari 458 Italia can move two people at speeds in excess of 200 MPH, the Superbus (pictured above) moves 23 people at speeds of 155 MPH, the newest Maglev train in Japan tops out at 312 MPH, with 14 cars each carrying 68 passengers (952 people). These are examples of bulk or batch movement of people.

Processing data, moving data, from one application to another. This occurs in a number of ways either as a singleton or a large set (batch), instantly, near-real time, delayed by minutes, in batches throughout the business day or in a large batch cycle at the close of the day. Each case has specific performance and throughput goals. Software performance engineering includes methods for batch processing.

Lets talk about batches…

Our complex business and IT systems of today still do a large amount of batch processing, moving data out of one system and into many more systems, with transformation and enhancements along the way. These can be internal systems in the Enterprise or external third party providers or even customers. I have noticed over the past few years that batch performance design has become (by accident) an iterative approach.
The original design goals were simply specified for the entire batch process, it must be completed by 06:00 EST. Often times, the batch critical path is not clear. There has been little thought to the throughput of each batch job (transaction per second). The iteration starts during the QA testing process when it is discovered that the current batch process will take 24 hours to process the days’ work. Someone finally did the math. The team did not know if it was designing a Farrari, a Superbus or a Maglev.
For the critical path batch processes you must define the required throughput. What are you moving? Invoices, orders, customers, transactions. How many of them do you have and what is the peak volume? What is your scalability approach and how do you achieve increases in throughput? Do you need to scale-up or scale-out?
You need to design the batch process to fit comfortably in the batch window and with room to grow. This year 500,000 invoices, next year is 750,000 invoices. How do you process the increase and stay within the window?

Software performance engineering Sample Questions

1 What are you processing?
2 How many steps (programs) are in the batch process?
3 What is the overall throughput?
4 What is the throughput of the individual programs?
5 What is the growth rate?
6 What is the peaking factor?
7 How will the programs scale with the load?
8 Have you identified the critical path throughput goals?
9 Are all the developers aware of the throughput goals?
10 How will you test and validate the throughput of each program?

Know your design goals

The barchettas were made in the 1940’s and 1950’s. An Italian style topless 2-seater sports car, designed and build for one purpose, to go fast. It was built for racing, weight and wind resistance were kept to a minimum, any unnecessary equipment was removed, no decoration. Doors were optional. Ferrari created one of the earlier models and other followed with the same style. The design team was focused on speed, they knew there were performance and response time goals for the car.

Market Data

Calculating system momentum – A market basket of transactions and an index

Can we use momentum as a derived value or index to alert us to impending problems with the application or system? Well, the transaction response time is really a byproduct of the workload on the system resources. So, may be a better way to look at it is; does the workload have momentum? Is the workload increasing or decreasing? Borrowing from Physics, momentum is equal to mass times velocity.

We could use transaction complexity to represent mass, we all know that some transactions are heavier than others. However, using response time as velocity really does not work. Instead I could use the transaction arrival rate to represent velocity. Then I could say that the transaction or system momentum is increasing as the arrival rate increases, taking into account the weight of the transactions.

What I am looking for is a communication vehicle to let non-technical people know how the health of the system is.
Momentum is equal to the transaction weight times the arrival rate of the transactions.

I need to pick a rating or scale for my transactions; 1,5,10. Then there is an overall transaction arrival rate and an individual transaction arrival rate. I need the individual transactions in order for the momentum index to have a chance of being relevant.

M = (T1 * T1 TPS) + (T2 * T2 TPS) + (….) or index?

This would be a very custom index for each application. It represents a market basket of transactions. Much list an EFT represents a basket of stocks.
Also, what I want to determine is how quickly the momentum is changing up or down. If I can get the real-time transaction arrival rates, then I can use the momentum to get an early warning of trouble in the system. Another term, might be a volatility index for the application. Can I get the alert in the front-end of the application early and the correlate with all the system resource monitors.

For this I need to borrow from the Financial markets High Frequency Traders. They have tools and techniques that track large amounts of market date in real-time and try to jump in-front of the market momentum. In need to jump in-front of my system momentum.
The faster I can determine that the arrival rate of the heavy transactions is increasing, then I might be able to jump in-front of that and prevent an application or system outage. I need to calculate the rate of change in real time of the arrival rates. I need a to see that at a clock tick at time zero, the arrival rate is 10 TPS and the transaction response time is 300 ms. Then I need another sample at the next clock tick to calculate the TPS is now 11, and the response time is 305 ms. Perfect for using HFT techniques.

Odometer

Dateline Monday April 1st.

Big Data and Software performance engineering combine to help eliminate the Government debt, with the ability to collect real-time speed information from every car on the high-way and instanet performance optimized analytics.

Speed Limits

In an effort to continue to reduce the national and local government debt, the Car makers and the Secret lab from the government have created the ability to enable every car to instantly broadcast its current speed and plate number, encrypted of course, only readable by those who need it.  This allows the state and Federal government to instantly collect new revenue.  The wireless broadcast of the automobile speed and plate number will be good within a ½ mile range of the car, where the remote collectors can assess the fines as they happen.

There are a few options for the program; before each driver is charged, they will be given the option of speeding for the day or just this instance.  This allows the driver to purchase in bulk and at a reduced rate for the day.  If you have urgent meetings during the day, this will allow you to continue to speed and will not be pulled over the rest of the day.  This is a new and revolutionary take on the Speed Pass.  They still have to work through refunds when the driver cannot exceed the speed limit.  Perhaps, roll it forward.  People can enable payment through iTunes, Paypal or credit cards.

Upcoming Releases

It is rumored that the next Release of the program will introduce a step program, increments of 5 MPH over the posted limit to create a new premium tier option.  Also, the government is looking at maximizing its revenue by not allowing the Insurance companies to hit the driver with a surcharge for speeding. One anonymous insurance source is quoted as saying “What the hell?, we’ll show them whose boss.”

The implications for Big Data and performance engineering are tremendous.  This type of service offering by the Federal and state governments was only possible by the breakthrough advances by the Big Data and Software performance industries. A long time data and performance engineer, Chris says “This is an outstanding combintation of the two domains and wants to know if its Shovel Ready”,

Airlines get in the act

In a related story on remote monitoring, payment and data collection; a Major Airline is looking at adding remote sensors in every seat to record the weight of each passenger.  Then after deep analytics, the system will charge people more that are over the average weight for an adult.  They are still in Beta, working through how to identify males and females based on name, when to take the reading and what if people change seats.

SWSlowPerf

Some of the top reasons that your application/web site/mobile app is slow;

10) I thought you turned off the diagnostic logging

9) Do you really have to index a table with five years of history and one Billion rows

8) You doubled the number of calls from the application tier to the database tier for the same workload and were surprised by the increase, which no one noticed until production.

7) They moved the application server to another continent

6) They virtualized it (you weren’t using all the real CPU anyway)

5) You wrote your own caching component and didn’t really understand the impact of flushing the cache

4) Even Amazon Web Services stops allocating Servers (thought you could buy your way out)

3) The Marketing group ran a hugely successful ad on a major TV program and you under estimated the new workload. The good news is you and the CIO are on a first name basis.

2) They upgraded to a new version of the Application server/database server/etc. and no one thought a performance test might be needed.

1) The business critical Applications don’t have performance goals, how do you know its slow?

Also, I like “Is there really a difference between polling and event driven programming?”

 

Workflow

Business workflow, business process

The System response time must not impact the workflow. The transition from transaction to transaction must be seamless and the user must not notice the system.  One might even describe the interaction between the person and the system as graceful and flowing, where the system responds before they can even finish a sip of their coffee, do your users cozy up to the system (too far??).

Understanding each workflow in the application is crucial to setting the proper response time goals of the application. This is required to set up the software performance requirements for the system and for each transition that supports the workflow. The systems today are highly distributed with web servers, application servers and web services, and message hubs and multiple database, etc.  In the software requirements phase, once the workflows are defined with performance goals, it is critical to make everyone who makes a component in the workflow aware of those goals.

There are call center workflows, document management workflows, order placement workflow, business intelligence and analytical workflow, and of course Big Data workflow.

When in the software requirements phase you might consider this checklist for the workflow;

1)      Identify the workflows: Have the key workflows been defined that have performance requirements and are the response time goals defined?

2)      Duration: have you defined the overall duration of the workflow? How long should the call center interaction by?

3)      Downstream processing: Have you defined when the data from the workflow must be available for the downstream workflows?  For instance, after collecting a customers demographic information and vehicle information, when is it available for rating a policy? 30 seconds, 24 hours?

4)      Business transactions: These support the workflow. Have performance critical business transaction been defined, with response time goals?

5)      System Transactions: These support the business transactions. Have you defined the response time goals for critical systems transactions supporting the critical business transactions?  This is where share system transaction can be found, have your requirements captured enough performance information to tell the developer how fast this system transaction must be and how many transactions per second it will support?

6)      Performance budget: Now that you have a business transaction response time goal, have you allocated the response among all the technical components supporting the business transaction? You should create a system interaction diagram to help with this, defining the time allocated across the tiers; client, web, application, message hub, database.

7)      Database query requests: Have you categorized your database queries? Simple transactions to the complex? Is there a response time goal for each? Is there difference between the first request and subsequent request?

8)      Report requests: Have you categorized the report request types?  Simple reports are 2 seconds, complex multi table grouping, ordering reports take longer that cross fiscal quarters?

9)      Discussion and negotiation with the end-user or business sponsor.  All along you must be in discussion with business people who own the system. The role of the architect is to work with the business to tell them what is possible and how much it will cost. The business priorities are critical.  The business might want to spend the extra money to have near real time reporting to gain advantage or they might be satisfied with a four hour reporting window.

 How to handle the response time discussion

Categorize: You should look to categorize the response time into; satisfied, tolerating, frustrated and abandonment. Two seconds could keep the people satisfied, while eight seconds will make them frustrated for an online transaction.  Another transaction at five seconds could keep people satisfied and 15 seconds make them frustrated.

Percentiles: You need to establish a goal for what percentage of the user population would be satisfied, 50th, 80th, 90th percentile? 90 percent of the people should have a satisfied experience.

Under what load: You need to discuss with the business people that there is a normal workload, a peak workload, an above peak workload and define target for each.  This business might ok with a relaxed target where the people are tolerating or frustrated for a period of time during a peak load for a short duration.