Archive

Software performance requirements

 The SPE Body of Knowledge

 

How can we benefit from using a Body of Knowledge approach for Software Performance Engineering? The goals are to address the career path, the organization, and industry practices, and to enable you to build a BoK within your own company for the performance engineer. There are five knowledge areas for performance engineering.

This is an overview presentation I gave at the Greater Boston Computer Measurement Group and at the CMG national meeting. There are two documents here: the PowerPoint and the detailed document. It is a work in progress.

Presentation: SPEBoK CMG National V4

Paper with the details of the SPEBoK: The Guide to the Software Performance Engineering Body of Knowledge V4

Please send me your comments.

 

Walter

 

 

Frustrated User

Bring Your Own Response Time.

Consumer expectations have greatly influenced the demands placed on enterprise IT departments. Consumers and IT customers brought their own devices and expected more self-service, at a much faster pace. One of the key tasks a performance engineer must do is help the business and IT set expectations for the response times of corporate systems. The history of performance requirements for corporate-facing systems, and even call centers, has been problematic: often ignored and certainly deferred. The typical approach is to see just how slow the system can be before the users completely revolt. This tends to be the case because it is not a revenue-generating system; however, many corporate IT systems directly touch the customer or business partner after the sale is made or the contract is signed.

Response time and performance goals for Internet retailers are well defined and measured; there are many industry-specific benchmarks that compare the response times of web pages against competitors in the industry. The Internet business models demand faster and faster response times for transactions. Benchmarks can be found at Compuware (www.compuware.com) and Keynote (www.keynote.com), among others. However, there is no such benchmark for corporate systems. The users of corporate systems are starting to voice their concerns and displeasure more loudly. They expect speeds comparable to Internet retailer speeds: less than five seconds, and often two seconds, for simple transactions.

Our studies align with the usability research done by Jakob Nielsen (www.nngroup.com). A guide to setting user expectations must consider three barriers:

1) 0.100 seconds: the user perceives the system to respond in real time, without any noticeable delay

2) 1.0 second: the user starts to perceive a slight delay with the system, but is still very happy with the response time

3) 10.0 seconds: the user greatly notices the delay, becomes distracted, and starts to attempt other things while waiting
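These thresholds are easy to apply when reviewing measured response times. A minimal sketch (a hypothetical helper, assuming one measurement per request, in seconds):

```python
# Hypothetical sketch: classify measured response times against
# the three perception thresholds (0.1 s, 1.0 s, 10.0 s).

def perception_band(response_time_s: float) -> str:
    """Return the user-perception band for a response time in seconds."""
    if response_time_s <= 0.1:
        return "instantaneous"   # no noticeable delay
    if response_time_s <= 1.0:
        return "slight delay"    # noticed, but the user is still happy
    if response_time_s <= 10.0:
        return "noticeable"      # user is distracted but still waiting
    return "abandoned"           # user has moved on to other things

samples = [0.08, 0.9, 2.5, 12.0]
for t in samples:
    print(f"{t:6.2f} s -> {perception_band(t)}")
```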

So, just as consumers have brought their own devices, they are bringing their own response times to corporate systems.

 

GhostMachine

Please allow me to introduce myself, I am a glitch of wealth and taste. I’ve been around for a long, long year, stole many a software architect’s soul and faith. I was around when the Architect had his moment of doubt and pain. Pleased to meet you, hope you guess my name. But what’s puzzling you is the nature of my game.

Perhaps there is something more sinister at work here. With all the headlines on software glitches and problems, maybe there really are ghosts in the machines.

I rode a tank, held a general’s rank

The software glitch is large and in charge; it creeps in and disrupts your plans and intentions. Most people really want to design and build software systems and web sites that provide an outstanding user experience. However, there are too many demands and temptations to keep to the schedule, even when there is ample evidence that it is not going to work.

I laid traps for troubadours, who get killed before they reached Bombay

There are so many components involved in large-scale enterprise systems that the glitch is lying in wait at every corner, at every hand-off. Oftentimes it is hard to see while you are building the system. However, you can feel it coming; you can feel its cold breath on your neck, but when you turn around, no one is there.

You check in your code, hoping the build cycle will keep your code safe and keep the glitch out. As the system gets larger and more complex, you feel that something isn’t quite right. As the release date grows nearer, you stay later and later, working into the night. The code is finally sent to the QA team for testing.

Just as every cop is a criminal, and all the sinners saints

The QA team moves through their test plans; they find minor things that are not working. These issues go back for fixes and are then retested. Nearing the end of the testing cycle, several team members report errors that can’t be reproduced. The function failed, then succeeded. This is the first sign that a ghost is in the machine. The team dismisses the inconsistency, and they continue. They pass the new release. They kept to the schedule, but they didn’t do all the testing they had planned. They didn’t have time for a performance test.

So, if you meet me, have some courtesy, have some sympathy and some taste.

Performance testing

Hulk Smash puny web site.

The Healthcare.gov web site was crushed, or smashed, as soon as it opened. The top IT people are quoted as saying it was due to the volume and the unprecedented number of users accessing the site. They had more than five times the estimated number of people: they predicted 50,000 to 60,000 concurrent users, and there were actually 250,000 to 300,000. The total number of people who could use the web site is estimated at 23 million; if one percent are on the system, that is 230,000 people at one time. To arrive at the 50,000 to 60,000 user estimate, that would be one quarter of one percent (0.25%). That seems a little low; isn't it more reasonable that 1% or more would access a system that was so widely publicized?
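The arithmetic behind these estimates is simple enough to sanity-check up front. A minimal sketch, using the population figure quoted above:

```python
# Reproduce the concurrency arithmetic from the Healthcare.gov figures.
total_population = 23_000_000     # estimated eligible users

for concurrency_pct in (0.0025, 0.01, 0.02, 0.03):  # 0.25%, 1%, 2x, 3x average
    concurrent_users = total_population * concurrency_pct
    print(f"{concurrency_pct:6.2%} concurrent -> {concurrent_users:>9,.0f} users")

# 0.25% -> 57,500  (roughly the 50,000-60,000 planning estimate)
# 1.00% -> 230,000 (roughly the 250,000-300,000 actually observed)
```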

Demand forecasting and scenarios

It is always a challenge to define the workload profile for a new web site. There are user behavior questions: when are they most likely to access the web site, what is the hourly traffic pattern, how many simultaneous users will there be, what is the peak, and what is normal? What are the user profiles or user categories? What is the possible number of users (23 million)? What is the average number of concurrent users (1%, or 230,000)? What if you got 2x or 3x the average?

What will the workload profile be like as the deadline to register gets closer? Will the pattern be like the Batman mask: a spike at the beginning, a flat middle, and a spike at the end of enrollment?

The next topic to understand is the transaction mix: what are the users doing on the web site? For a new web site, registration is a one-time transaction for each user; they will not register again. But for a new site, this is the most critical transaction. So the question becomes: how many simultaneous registrations do you expect?

The Healthcare.gov web site starts off as a router: of the 23 million people who will visit it, some percentage will be routed to their state healthcare web sites, so perhaps 12 million will eventually be the user population.

Design goals

Answering these questions sets up the design goals and the performance testing goals for your web site. Most web sites, and the Healthcare.gov site is no different, must interact with many third parties. You can communicate your volumetrics to them so they can plan as well.

The design goals are translated in the development phase. The developers must know what the critical business transactions are, their response time goals, and their throughput goals (for example, 10 TPS).
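One lightweight way to keep those goals in front of the developers is sketched below, with illustrative transaction names and numbers rather than any prescribed format:

```python
# Hypothetical artifact: per-transaction performance goals, kept
# alongside the code so every developer can see them. Names and
# numbers are illustrative assumptions.
PERFORMANCE_GOALS = {
    # transaction:         (response-time goal in seconds, throughput goal in TPS)
    "register_account":    (2.0, 10),
    "browse_plans":        (2.0, 50),
    "submit_application":  (5.0, 10),
}

for txn, (rt_goal, tps_goal) in PERFORMANCE_GOALS.items():
    print(f"{txn:20s} response <= {rt_goal} s at {tps_goal} TPS")
```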

Performance testing

Preparing to test a system that might have 10 million, 15 million, or more customers is no easy task. The data setup alone is a huge job: the team will need to create synthetic data to support the performance test. The team will then have to simulate 50,000 or 100,000 users accessing the system. This is where cloud-based performance testing tools can really help.

Production – APM

For any new web site with a new user profile, you have to watch system utilization and the transactions. Application performance management (APM) is a critical part of being prepared for such a large system. The tools can provide real-time transaction performance information, measuring the user experience, so you can spot slow transactions before they become a real issue. System resource consumption can be closely monitored as the workload increases, again allowing you to respond quickly with more resources.
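As a minimal sketch of the idea, independent of any particular APM product's API, a monitor can flag transactions whose measured response time exceeds the goal (names and thresholds are illustrative):

```python
# Minimal APM-style alerting sketch: flag transactions whose measured
# response time exceeds the goal. Names and goals are illustrative.
GOALS_S = {"register_account": 2.0, "browse_plans": 2.0}

def check_transaction(name: str, measured_s: float) -> None:
    goal = GOALS_S.get(name)
    if goal is not None and measured_s > goal:
        print(f"ALERT: {name} took {measured_s:.2f} s (goal {goal:.2f} s)")

check_transaction("register_account", 4.8)   # -> ALERT
check_transaction("browse_plans", 1.1)       # within goal, silent
```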

Leverage the Software Performance Engineering Body of Knowledge

What can you do to prepare for a very large web site? You can take advantage of the Software Performance Engineering Body of Knowledge and apply each knowledge area to prepare:

  • Software architecture and design
  • Performance testing and validation
  • Capacity planning
  • Application performance management
  • Problem detection and resolution

These five areas work hand-in-hand to prepare your web site.

SuperBus

A Ferrari 458 Italia can move two people at speeds in excess of 200 MPH; the Superbus moves 23 people at speeds of 155 MPH; and the newest maglev train in Japan tops out at 312 MPH, with 14 cars each carrying 68 passengers (952 people). These are examples of bulk, or batch, movement of people.

Processing data, moving it from one application to another, occurs in a number of ways: as a singleton or as a large set (batch); instantly, in near-real time, delayed by minutes, in batches throughout the business day, or in a large batch cycle at the close of the day. Each case has specific performance and throughput goals. Software performance engineering includes methods for batch processing.

Let’s talk about batches…

Our complex business and IT systems still do a large amount of batch processing, moving data out of one system and into many others, with transformations and enhancements along the way. These can be internal enterprise systems, external third-party providers, or even customers. I have noticed over the past few years that batch performance design has become (by accident) an iterative approach.
The original design goals were specified simply for the entire batch process: it must complete by 06:00 EST. Oftentimes the batch critical path is not clear, and little thought has been given to the throughput of each batch job (transactions per second). The iteration starts during the QA testing process, when it is discovered that the current batch process will take 24 hours to process a day’s work. Someone finally did the math. The team did not know if it was designing a Ferrari, a Superbus, or a maglev.
For the critical-path batch processes you must define the required throughput. What are you moving: invoices, orders, customers, transactions? How many of them do you have, and what is the peak volume? What is your scalability approach, and how do you achieve increases in throughput? Do you need to scale up or scale out?
You need to design the batch process to fit comfortably in the batch window, with room to grow. This year it is 500,000 invoices; next year it is 750,000 invoices. How do you process the increase and stay within the window?
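A quick worked example shows why someone must do the math early. Assuming, for illustration, an eight-hour overnight window:

```python
# Worked example: required batch throughput for an assumed
# eight-hour (28,800-second) overnight window.
window_s = 8 * 3600

for invoices in (500_000, 750_000):
    required_rate = invoices / window_s
    print(f"{invoices:,} invoices -> {required_rate:.1f} invoices/sec minimum")

# 500,000 invoices -> ~17.4/sec; 750,000 -> ~26.0/sec.
# If a single-threaded step peaks at 20/sec, next year's volume forces
# a scale-up or scale-out decision now, not during QA.
```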

Software performance engineering sample questions

1) What are you processing?
2) How many steps (programs) are in the batch process?
3) What is the overall throughput?
4) What is the throughput of the individual programs?
5) What is the growth rate?
6) What is the peaking factor?
7) How will the programs scale with the load?
8) Have you identified the critical path throughput goals?
9) Are all the developers aware of the throughput goals?
10) How will you test and validate the throughput of each program?

Know your design goals

The barchettas were made in the 1940s and 1950s: Italian-style topless two-seater sports cars, designed and built for one purpose, to go fast. The barchetta was built for racing; weight and wind resistance were kept to a minimum, any unnecessary equipment was removed, and there was no decoration. Doors were optional. Ferrari created one of the earlier models, and others followed with the same style. The design team was focused on speed; they knew there were performance and response time goals for the car.

Model test case execution

So, what is the difference between a software benchmark and a software performance test?
The key difference is the size of the bet the business has placed on the outcome of the project.

There are hardware benchmarks, database benchmarks, and of course the Transaction Processing Performance Council (TPC), with its long list of standard benchmark tests. The software benchmark I am referring to in this article is the custom application benchmark, designed for your particular business or industry. The best way for a business to make a critical, bet-the-company decision is to define, design, and execute a benchmark for its unique workload and transaction volumes.

Who uses a benchmark

The software benchmark is needed to determine whether the business will use the new application or technology for business advantage. The workload must be well defined, the database must be at production size, and the system resource consumption must be clearly monitored. When the benchmark is executed and the question is answered, all the details (the facts, the database demographics, the workload) must be clearly understood by the business decision makers. The benchmark team must get the test right within the allocated timeline; if the benchmark takes too long, then you have your answer. The results of a benchmark typically undergo tremendous scrutiny, and a third party is often required to provide the needed visibility and help the results pass that scrutiny.
There are two categories of companies that undertake a benchmark:
1) Software vendors – the companies that make the software application
2) Consumers of the software – the companies that will use the application for business value

The software performance test is usually for a project or program already underway; the application and technology have already been decided. Performance testing is used to make sure the releases will meet the service level expectations. The workload for each test may focus on specific parts of the overall workload and skip others for a given release cycle. The performance test plan may include:
1) Component testing
2) Duration testing
3) Stability testing
4) Failover and failback testing
5) Optionally, a round of tuning

Benchmarks for software vendors

The business has decided to move the product into the large-client segment of the market. As such, it needs to demonstrate that the application can scale to what that market considers large, for instance 10 million accounts. The application must perform well at this level: the key business transactions must still respond in less than two seconds, and the overnight processing must still complete in the defined window, say four hours for the billing process.
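The window implies a concrete rate. A quick sketch, assuming every account is billed in the nightly run:

```python
# Implied billing throughput, assuming all 10 million accounts are
# processed in the four-hour overnight window.
accounts = 10_000_000
window_s = 4 * 3600

print(f"Required rate: {accounts / window_s:,.0f} accounts/sec")
# -> ~694 accounts/sec, sustained, before any growth headroom.
```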

The bet: Business and revenue growth in the large segment of the market. Repositioning of the company in the marketplace in relation to competitors.

The consumers

There are a couple of scenarios in this category. In the first, the business is already considered large in the marketplace; it already has 10 million accounts or more. However, the systems in place are older, with restricted functionality that is not easily changed, and the business needs to add new features to stay ahead of the competition. The second case is a growth plan, where the business believes that to increase its market share it must grow significantly. The business may currently serve 1 million accounts but now has a three-year business plan calling for growth to 50 million accounts. It needs a technology platform that can scale with it.
The bet for an already-large business: maintain the current business and add new features quickly on a new platform. Stay a market leader; if the new platform fails, you are no longer the market leader.

The bet for a growth business: Easily acquire new accounts and gain market share, or stumble.

Implementation approach

The software benchmark is an event for the business. It is highly visible to senior management, and if you are the software vendor, it might be highly visible to your sales pipeline. There may be significant deals waiting on the outcome.
There are typically three to four phases required. Even before that, the organization must review the resources required: people, time, equipment, and budget. The focus a benchmark demands can distract already-busy people. The developers of the code may not have the time or the skills to design and execute a formal benchmark; the same goes for the QA team. A benchmark may require the use of an external testing lab in order to get the proper configuration. The benchmark must be treated as a distinct project.

Critical areas

Business goals and objectives

Clearly state the purpose: to demonstrate that the system can safely support the workload of 10 million accounts, and to demonstrate predictable scalability of the application as the workload increases.

Workload profile

In order to gain value from the benchmark, you must have an accurate workload. There will be several user profiles for the online component, with the detailed steps each takes through the application: for instance, casual users, new users, and power users. There must be a representative batch component as well, as the online purchase transactions will drive the invoice-creation process in the batch schedule. There could also be a month-end process and a quarter-end process.
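As an illustration, with profile names, shares, and think times that are assumptions rather than figures from any real benchmark, a workload mix might be captured like this:

```python
# Hypothetical online workload mix for a benchmark. All profiles,
# shares, and think times are illustrative assumptions.
workload_mix = {
    # profile    (share of users, think time in s, key transaction)
    "casual":    (0.60, 30, "browse_catalog"),
    "new_user":  (0.25, 45, "register_account"),
    "power":     (0.15, 10, "purchase"),
}

total_users = 50_000
for profile, (share, think_s, txn) in workload_mix.items():
    users = int(total_users * share)
    print(f"{profile:9s}: {users:6,d} users, think time {think_s}s, drives {txn}")
```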

Database size and demographics

The data distribution must be well understood and clearly defined. If you have 10 million accounts, some are active and some are closed. There may be a residential profile and a business profile, with different numbers of detail records under each. For instance, an insurance policy can have one driver or four drivers, plus the cars; a web shopping-cart application can have four years of historical purchases. Your database needs to simulate this.
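A minimal sketch of generating such demographics, with proportions that are illustrative assumptions:

```python
# Hypothetical synthetic-data demographics for a 10M-account database.
# All proportions are illustrative, not from any real benchmark.
import random

random.seed(42)  # reproducible data set between benchmark runs

def make_account(account_id: int) -> dict:
    return {
        "id": account_id,
        "status": "active" if random.random() < 0.7 else "closed",
        "profile": "business" if random.random() < 0.2 else "residential",
        "drivers": random.choice([1, 2, 3, 4]),   # e.g., an insurance policy
        "history_years": random.randint(0, 4),    # purchase-history depth
    }

accounts = [make_account(i) for i in range(1000)]  # scale toward 10M in practice
print(accounts[0])
```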

Performance testing process

The performance testing process must be completely visible and flawless; generally, you do not have time to rerun a benchmark. The testing scenarios, test execution, metrics, and monitoring must be complete. Many tests may be executed in order to get ready for the official set of benchmark runs. You need a very good test-results archive system, because you may not have time to completely evaluate the test results until the end.

Results analysis and executive summary

How do you know you ran a successful test? All the detailed results must be compiled and summarized into an executive view. Both the virtual web transactions and the batch processing must have detailed results. The virtual users will record the response time of each request; from those, you must provide the response-time distributions: the 50th, 75th, and 95th percentiles. You must also include the transactions-per-second load: for example, under a 10 TPS load, the 95th percentile was 2.5 seconds.
The batch results must include the rate of processing for the entire process, for example, 5 million invoices processed in three hours. The critical-path programs must also be clearly identified.
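A minimal sketch of producing that percentile view from the raw response times (nearest-rank percentiles, one sample per request, in seconds):

```python
# Minimal sketch: summarize raw response times into the percentile
# view the executive summary needs.
def percentile(sorted_samples: list[float], p: float) -> float:
    """Nearest-rank percentile of pre-sorted samples."""
    idx = max(0, int(round(p / 100 * len(sorted_samples))) - 1)
    return sorted_samples[idx]

samples = sorted([0.8, 1.2, 1.5, 1.9, 2.1, 2.4, 2.5, 2.8, 3.0, 4.1])
for p in (50, 75, 95):
    print(f"{p}th percentile: {percentile(samples, p):.2f} s")
```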

External Vendor lab

Oftentimes, in order to hit the 10 million account target, the benchmark requires the use of an external lab. The benchmark may require a large server or a large number of servers; the database server may need to be large, and the data storage may need the newest vendor equipment. The marketing department may have signed a deal with the vendor to use its equipment at a reduced price and be part of a press release.

Archeology

Performance artifacts in development

Where are your requirements and development performance artifacts? Over my years as a performance engineer, I have been involved in a number of projects related to performance and scalability readiness assessments. These involve evaluating the software, whether from a vendor or developed in-house, to determine if it has been designed and developed with performance and scalability goals. During a readiness assessment, the team I work with and I look for non-functional requirements for the key business and system transactions, and for development guidelines and artifacts that track or measure service time during the development and unit-testing phases. Finding performance early.

Non-Functional requirements

To start, there are non-functional requirements that should have been defined for the development team; the team develops the code that makes the business functions real. The next question is: where does your software development lifecycle and methodology (that’s right, I said methodology) have activities and artifacts specific to performance, scalability, and stability? For example, the application needs a change to the pricing calculation or the order-history functions; how fast should it be? Where is it specified that it still needs to be 300 milliseconds after the functional change? Initially, the non-functional requirements specified that the pricing calculation must complete in 300 milliseconds for average complexity and 600 milliseconds for complex calculations. Can you point to the artifact(s) where that is defined in your methodology? Before the developer begins coding, is he or she aware of it?
Then we look for guidelines for developers and services provided by a framework. Has the performance or architecture team defined a set of guidelines for the developer to use when building this type of service? Has the use of caching been defined? Who verifies that the database access and SQL statements are optimal? Where is that captured; what artifact captures it? Does each developer understand the proper use of logging and code instrumentation, or is it part of the development framework? In the case of the pricing service, each method must measure its service time (internally), and each exposed public service must have a service-time measurement; a sketch of such instrumentation follows.
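As a minimal sketch of that kind of instrumentation, illustrative rather than any specific framework's API, a decorator can record the service time of every call (the pricing function is a hypothetical stand-in):

```python
# Minimal instrumentation sketch: measure and log service time per call.
import functools
import time

def measure_service_time(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{func.__name__}: {elapsed_ms:.1f} ms")  # or send to your APM
    return wrapper

@measure_service_time
def calculate_price(order_lines: int) -> float:
    time.sleep(0.01 * order_lines)       # placeholder for the real pricing logic
    return 9.99 * order_lines

calculate_price(3)   # logs the service time, to compare against the 300 ms goal
```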

Continuous Integration

A key artifact to look for is the results from the weekly or daily build process. Are there test results for the internal method calls and the external service calls? JUnit can support the internal verification, and JMeter can support the external verification. In order to get value from this, the testing database must be robust (not simply single rows with no history). But how can you use the response results during development to indicate eventual production performance? The value comes from comparing build to build: for instance, did the service time change radically? This can be an early indicator. However, oftentimes the development environment changes or the database changes. The performance engineer must show the business there is value in maintaining consistency in the development environment. With a consistent development environment, you can show that the service time of the pricing service has significantly changed, well before production; a sketch of such a build-to-build check follows.
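A sketch of that build-to-build check, with illustrative service times and a threshold chosen purely for the example:

```python
# Hypothetical build-to-build regression check on recorded service times.
# Times are in milliseconds; all numbers are illustrative.
build_history = {
    "pricing_service": {"build_41": 105, "build_42": 110, "build_43": 310},
}

THRESHOLD = 1.5  # flag if a build is 1.5x slower than the previous one

for service, builds in build_history.items():
    times = list(builds.items())
    for (prev_b, prev_t), (curr_b, curr_t) in zip(times, times[1:]):
        if curr_t > prev_t * THRESHOLD:
            print(f"REGRESSION: {service} {prev_b} -> {curr_b}: "
                  f"{prev_t} ms -> {curr_t} ms")
```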

Key Performance artifact

For the JMeter test case: for build 1, the pricing service is measured at 1.000 second; the goal is 300 milliseconds. Or what if the service time is 100 milliseconds? Then you need to track the service time from build to build to monitor for consistency. If the 100 milliseconds grows to 1.00 second, how did that happen? Did the environment change? Did the developer add new code to the function? You must evaluate this, because you found it early.