Market Data

Calculating system momentum – A market basket of transactions and an index

Can we use momentum as a derived value or index to alert us to impending problems with the application or system? Well, transaction response time is really a byproduct of the workload on the system resources. So maybe a better way to look at it is: does the workload have momentum? Is the workload increasing or decreasing? Borrowing from physics, momentum is equal to mass times velocity.

We could use transaction complexity to represent mass; we all know that some transactions are heavier than others. However, using response time as velocity really does not work. Instead, I could use the transaction arrival rate to represent velocity. Then I could say that the transaction or system momentum increases as the arrival rate increases, taking into account the weight of the transactions.

What I am looking for is a communication vehicle to let non-technical people know how healthy the system is.
Momentum is equal to the transaction weight times the arrival rate of the transactions.

I need to pick a rating or scale for my transactions, for example 1, 5, and 10. Then there is an overall transaction arrival rate and an individual arrival rate for each transaction. I need the individual transaction rates for the momentum index to have a chance of being relevant.

M = (T1 weight × T1 TPS) + (T2 weight × T2 TPS) + …, or should it be expressed as an index?
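The formula above can be sketched as a weighted sum of per-transaction arrival rates, with an optional index that compares current momentum against a baseline period. This is a minimal illustration under stated assumptions, not a finished index: the transaction names, the 1/5/10 weights, and the "baseline = 100" convention are all hypothetical.

```python
from typing import Mapping


def momentum(weights: Mapping[str, float], tps: Mapping[str, float]) -> float:
    """Weighted sum of per-transaction arrival rates: M = sum(weight_i * tps_i).

    Transactions missing from `tps` are treated as having zero arrivals.
    """
    return sum(weight * tps.get(name, 0.0) for name, weight in weights.items())


def momentum_index(current: float, baseline: float) -> float:
    """Express momentum relative to a baseline period (baseline = 100)."""
    if baseline <= 0:
        raise ValueError("baseline momentum must be positive")
    return 100.0 * current / baseline


# Hypothetical basket on a 1/5/10 weight scale.
WEIGHTS = {"account_lookup": 1, "place_order": 5, "price_quote": 10}

if __name__ == "__main__":
    normal = {"account_lookup": 40.0, "place_order": 6.0, "price_quote": 2.0}
    busy = {"account_lookup": 40.0, "place_order": 6.0, "price_quote": 5.0}
    m_normal = momentum(WEIGHTS, normal)  # 40 + 30 + 20 = 90.0
    m_busy = momentum(WEIGHTS, busy)      # 40 + 30 + 50 = 120.0
    print(m_normal, m_busy, round(momentum_index(m_busy, m_normal), 1))
```

In this made-up example, only three extra heavy price quotes per second raise M by a third, which is the kind of shift a plain total-TPS chart would understate.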

This would be a very custom index for each application. It represents a market basket of transactions, much like an ETF represents a basket of stocks.
Also, what I want to determine is how quickly the momentum is changing, up or down. If I can get the real-time transaction arrival rates, then I can use the momentum to get an early warning of trouble in the system. Another term for it might be a volatility index for the application. Can I get the alert early, in the front end of the application, and then correlate it with all the system resource monitors?

For this I need to borrow from the high-frequency traders (HFT) of the financial markets. They have tools and techniques that track large amounts of market data in real time and try to jump in front of the market momentum. I need to jump in front of my system momentum.
The faster I can determine that the arrival rate of the heavy transactions is increasing, the better my chance of jumping in front of it and preventing an application or system outage. I need to calculate the rate of change of the arrival rates in real time. At the clock tick at time zero, I need to see that the arrival rate is 10 TPS and the transaction response time is 300 ms. Then, at the next clock tick, I need another sample showing that the arrival rate is now 11 TPS and the response time is 305 ms. This is perfect for using HFT techniques.
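The two-sample calculation described above can be sketched as follows. It assumes samples arrive at a fixed clock tick as (time, TPS, response time). The `Tick` type and the 5 percent alert threshold are hypothetical choices for illustration, and a real feed would probably smooth the samples before alerting.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tick:
    t: float      # sample time in seconds
    tps: float    # arrival rate observed at this tick
    rt_ms: float  # response time observed at this tick, in milliseconds


def rate_of_change(prev: Tick, curr: Tick) -> Tuple[float, float]:
    """Return (change in TPS per second, change in response time ms per second)."""
    dt = curr.t - prev.t
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.tps - prev.tps) / dt, (curr.rt_ms - prev.rt_ms) / dt


def arrival_rate_alert(prev: Tick, curr: Tick, pct_threshold: float = 5.0) -> bool:
    """Flag a tick-to-tick increase in arrival rate larger than pct_threshold percent."""
    if prev.tps <= 0:
        return curr.tps > 0
    growth_pct = 100.0 * (curr.tps - prev.tps) / prev.tps
    return growth_pct > pct_threshold


if __name__ == "__main__":
    t0 = Tick(t=0.0, tps=10.0, rt_ms=300.0)
    t1 = Tick(t=1.0, tps=11.0, rt_ms=305.0)
    print(rate_of_change(t0, t1))      # (1.0, 5.0)
    print(arrival_rate_alert(t0, t1))  # True: 10% growth in one tick
```

The same two functions could be fed a per-tick momentum value instead of raw TPS, so that growth in heavy transactions trips the alert sooner than growth in light ones.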

Model test case execution

So, what is the difference between a Software Benchmark and a software performance test?
The key difference is the size of the bet the business has placed on the outcome of the project.

There are hardware benchmarks, database benchmarks, and of course the Transaction Processing Performance Council (TPC), with a long list of standard benchmark tests for companies. The Software Benchmark I am referring to in this article is the custom application benchmark, designed for your particular business or industry. The best way for a business to make a critical, bet-the-company decision is to define, design, and execute a Benchmark for its unique workload and transaction volumes.

Who uses a benchmark

The Software Benchmark is needed to determine whether the business will use the new application or technology for business advantage. The workload must be well defined, the database must be at production size, and the system resource consumption must be clearly monitored. When the benchmark is executed and the question is answered, all the details (the facts, the database demographics, the workload) must be clearly understood by the business decision makers. The Benchmark team must get the test right within the allocated timeline; if the benchmark takes too long, then you have your answer. The results of the Benchmark typically undergo tremendous scrutiny, and a third party is often required to provide the needed visibility and help the results pass that scrutiny.
There are two categories of companies that undertake a benchmark:
1) Software vendors – the companies that make the software application
2) Consumers of the software – the companies that will use the application for business value

The software performance test is usually for a project or program already underway, where the application and technology have already been decided. Performance testing is used to make sure the Releases will meet the Service Level Expectations. The tests for a given Release cycle may focus on specific parts of the workload and skip others. The performance test plan may include:
1) Component testing
2) Duration testing
3) Stability testing
4) Failover and failback testing
5) An optional round of tuning

Benchmarks for software vendors

The business has decided to move the product into the large-client segment of the market. As such, it needs to demonstrate that its application can scale to what that market considers large, 10 Million accounts for instance. The application must perform well at this level: the key business transactions must still respond in less than two seconds, and the overnight processing must still complete within the defined window, say 4 hours for the billing process.
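The overnight window translates directly into a minimum sustained throughput, which is a useful target to carry into the benchmark plan. A small sketch, assuming the 10 Million accounts and 4-hour billing window used as examples above, and assuming every account is billed in the run:

```python
def required_rate(items: int, window_hours: float) -> float:
    """Minimum sustained rate (items per second) needed to finish within the window."""
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return items / (window_hours * 3600.0)


if __name__ == "__main__":
    rate = required_rate(10_000_000, 4)
    print(f"{rate:.1f} accounts/second")  # 694.4 accounts/second
```

It is usually prudent to plan for headroom above this figure, because ramp-up time, restarts, and dependencies between batch jobs all eat into the window.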

The bet: Business and revenue growth in the large segment of the market. Repositioning of the company in the marketplace in relation to competitors.

The consumers

There are a couple of scenarios for this category. In the first, the business is already considered large in the marketplace; it already has 10 Million accounts or more. However, the systems in place are older, with restricted functionality that is not easily changed, and the business needs to add new features to stay ahead of the competition. The second case is a growth plan, where the business believes that, to increase its market share, it must grow significantly. The business may currently serve 1 Million accounts but now has a three-year business plan calling for growth to 50 Million accounts. It needs a technology platform that can scale with it.
The bet for an already large business: Maintain the current business and add new features quickly on a new platform. Stay a market leader; if the new platform fails, you are no longer the market leader.

The bet for a growth business: Easily acquire new accounts and gain market share, or stumble.

Implementation approach

The software benchmark is an event for the business. It is highly visible to Sr. Management, and if you are the software vendor, it might be highly visible to your sales pipeline. There may be significant deals waiting on the outcome.
There are typically three to four phases required. Even before those, the organization must review the resources required: people, time, equipment, and budget. The focus on a benchmark can distract already busy people. The developers of the code may not have the time or the skills to design and execute a formal Benchmark, and the same is true for the QA team. A Benchmark may require the use of an external testing lab in order to get the proper configuration. The Benchmark project must be treated as a distinct project.

Critical areas

Business goals and objectives

Clearly state the purpose: to demonstrate that the system can safely support the workload of 10 Million accounts, and to demonstrate predictable scalability of the application as the workload increases.

Workload profile

In order to gain value from the benchmark, you must have an accurate workload. There will be several user profiles for the online component, each with the detailed steps it takes through the application; for instance, casual users, new users, and power users. There must be a representative batch component as well, since the online purchase transactions will drive the invoice creation process in the batch schedule. There could also be a month-end process and a quarter-end process.
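One way to turn these user profiles into a load-test configuration is to assign the virtual users by percentage. The profile names come from the text; the 60/15/25 split is a hypothetical mix, and real percentages should come from production logs or business forecasts.

```python
from typing import Dict

# Hypothetical share of the online virtual-user population, in whole percent.
PROFILE_MIX = {"casual": 60, "new": 15, "power": 25}


def assign_profiles(n_users: int, mix: Dict[str, int]) -> Dict[str, int]:
    """Split n_users across profiles by percentage, keeping the total exact."""
    if sum(mix.values()) != 100:
        raise ValueError("profile percentages must sum to 100")
    counts = {name: n_users * pct // 100 for name, pct in mix.items()}
    # Integer division can drop a few users; give them to the largest profiles first.
    shortfall = n_users - sum(counts.values())
    for name in sorted(mix, key=mix.get, reverse=True)[:shortfall]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    print(assign_profiles(1000, PROFILE_MIX))  # {'casual': 600, 'new': 150, 'power': 250}
    print(assign_profiles(7, PROFILE_MIX))     # {'casual': 5, 'new': 1, 'power': 1}
```

Integer percentages are used on purpose, so that the counts add up exactly without floating-point rounding surprises.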

Database size and demographics

The data distribution must be well understood and clearly defined. If you have 10 Million accounts, some are active and some are closed. There may be a residential profile and a business profile, with different numbers of details under each. For instance, an insurance policy can have one driver or four drivers, plus the cars. For a web shopping cart application, there can be four years of historical purchases. Your database needs to simulate this.

Performance testing process

The performance testing process must be completely visible and flawless. Generally, you do not have time to rerun a Benchmark. The testing scenarios, test execution, metrics, and monitoring must be complete. Many tests may be executed to get you ready for the official set of benchmark tests. You need a very good test results archive system, because you may not be able to fully evaluate the test results until the end.

Results analysis and executive summary

How do you know you ran a successful test? All the detailed results must be compiled and summarized into an executive view. The virtual web transactions and the batch processing must both have detailed results. The virtual users will record the response time of each request; from those, you must provide the response time distributions: the 50th, 75th, and 95th percentiles. You must also include the transactions-per-second load, for example: under 10 TPS, the 95th percentile was 2.5 seconds.
The batch processes must include the rate of processing for the entire process, for example: 5 million invoices were processed in three hours. The critical path programs must also be clearly identified.
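The summary figures above are straightforward to compute from the raw virtual-user timings. This sketch uses the nearest-rank percentile definition; load-test tools differ in how they interpolate percentiles, so it is worth stating which definition a report uses. The 20-request timing sample is made up for illustration.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(response_ms: Sequence[float], duration_s: float) -> Dict[str, float]:
    """Throughput plus the 50th, 75th, and 95th percentiles for one test run."""
    return {
        "tps": len(response_ms) / duration_s,
        "p50_ms": percentile(response_ms, 50),
        "p75_ms": percentile(response_ms, 75),
        "p95_ms": percentile(response_ms, 95),
    }


def batch_rate(items: int, elapsed_hours: float) -> float:
    """Batch processing rate in items per second."""
    return items / (elapsed_hours * 3600.0)


if __name__ == "__main__":
    # Hypothetical run: 20 requests over 2 seconds, from 100 ms to 2,000 ms.
    timings = list(range(100, 2100, 100))
    print(summarize(timings, duration_s=2.0))
    print(f"{batch_rate(5_000_000, 3):.1f} invoices/second")  # 463.0 invoices/second
```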

External Vendor lab

Often, in order to hit the 10 Million account target, the Benchmark requires the use of an external lab. The Benchmark may require a large server or a large number of servers, the database server may need to be large, and the data storage may need the newest vendor equipment. The marketing department may have signed a deal with the vendor to use its equipment at a reduced price in exchange for being part of a press release.

Archeology

Performance artifacts in development

Where are your requirements and development performance artifacts? Over my years as a performance engineer, I have been involved in a number of performance and scalability readiness assessments. These involve evaluating the software, either from a vendor or developed in-house, to determine whether it has been designed and developed with performance and scalability goals. During a readiness assessment, the team I work with and I look for non-functional requirements for the key business and system transactions, and for development guidelines and artifacts that track or measure service time during the development and unit testing phases. The goal is finding performance problems early.

Non-Functional requirements

To start, there are non-functional requirements that should have been defined for the development team. The team develops the code to make the business functions real. The next question is: where does your software development lifecycle and methodology (that’s right, I said methodology) have activities and artifacts specific to performance, scalability, and stability? For example, if the application needs a change to the pricing calculation or the order history functions, how fast should it be? Where is it specified that it still needs to be 300 milliseconds after the functional change? Suppose the non-functional requirements initially specified that the pricing calculation must complete in 300 milliseconds for average complexity and 600 milliseconds for complex calculations. Can you point to the artifact(s) where that is defined in your methodology? Before the developer begins coding, is he or she aware of it?
Then we look for guidelines for developers and services provided by a framework. Has the Performance or Architecture team defined a set of guidelines for developers to use when building this type of service? Has the use of caching been defined? Who verifies that the database access and SQL statements are optimal? Where is that captured, and what artifact captures it? Does each developer understand the proper use of logging and code instrumentation, or is it part of the development framework? In the case of the Pricing service, each method must measure its (internal) service time, and each exposed public service must have a service time measurement.
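As an illustration of method-level service-time measurement, here is a minimal Python decorator. In a Java code base the same idea would more likely live in the development framework (for example, an interceptor). The `timed` decorator, the `SERVICE_TIMES` store, and the `price_order` function are hypothetical names used only for this sketch.

```python
import functools
import logging
import time
from collections import defaultdict
from typing import DefaultDict, List

log = logging.getLogger("service_time")

# Hypothetical in-process store: service name -> list of service times in ms.
SERVICE_TIMES: DefaultDict[str, List[float]] = defaultdict(list)


def timed(name: str):
    """Record the service time of every call to the decorated function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on exception, so failed calls are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                SERVICE_TIMES[name].append(elapsed_ms)
                log.debug("%s service_time_ms=%.3f", name, elapsed_ms)
        return wrapper
    return decorator


@timed("pricing.price_order")
def price_order(quantity: int, unit_price: float) -> float:
    """Stand-in for a pricing calculation."""
    return quantity * unit_price


if __name__ == "__main__":
    price_order(3, 9.99)
    print(SERVICE_TIMES["pricing.price_order"])
```

Putting the timing in one decorator (or framework hook) rather than in each method keeps the measurement consistent across developers, which matters when these numbers are compared from build to build.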

Continuous Integration

A key artifact to look for is the results from the weekly or daily build process. Are there test results for the internal method calls and external service calls? JUnit can support the internal verification and JMeter can support the external verification. To get value from this, the testing database must be robust (not simply single rows with no history). But how can you use the response times measured during development to indicate eventual production performance? The value comes from comparing build to build: for instance, did the service time change radically? This can be an early indicator. However, the development environment or the database often changes. The Performance Engineer must show the business the value of maintaining consistency in the development environment. With a consistent development environment, you can show that the service time of the pricing service has changed significantly, well before production.

Key Performance artifact

For the JMeter test case: for build 1, the Pricing service is measured at 1 second, while the goal is 300 milliseconds. Or, what if the service time is 100 milliseconds? Then you need to track the service time from build to build to monitor for consistency. If the 100 milliseconds goes to 1 second, how did that happen? Did the environment change, or did the developer add new code to the function? You must evaluate it now, because you found it early.
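A build-to-build check like the one described can be automated at the end of each CI run. A minimal sketch, assuming service times have already been extracted from the JMeter results in milliseconds. The 300 ms goal comes from the example requirement; the 1.5x jump threshold and the `check_build` name are hypothetical.

```python
from typing import List, Tuple


def check_build(history: List[Tuple[str, float]], build: str, service_ms: float,
                goal_ms: float = 300.0, max_ratio: float = 1.5) -> List[str]:
    """Compare a build's service time to the goal and to the previous build.

    Appends (build, service_ms) to history and returns any findings.
    """
    findings = []
    if service_ms > goal_ms:
        findings.append(f"build {build}: {service_ms:.0f} ms exceeds the {goal_ms:.0f} ms goal")
    if history:
        prev_build, prev_ms = history[-1]
        if prev_ms > 0 and service_ms / prev_ms > max_ratio:
            findings.append(
                f"build {build}: {service_ms:.0f} ms is {service_ms / prev_ms:.1f}x build {prev_build}"
            )
    history.append((build, service_ms))
    return findings


if __name__ == "__main__":
    pricing_history: List[Tuple[str, float]] = []
    print(check_build(pricing_history, "1", 100.0))   # []
    print(check_build(pricing_history, "2", 1000.0))  # goal exceeded and a 10.0x jump
```

The ratio check catches the 100 ms to 1 second case even when a build stays under the goal, which is where consistency of the development environment pays off.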

Odometer

Dateline Monday April 1st.

Big Data and software performance engineering combine to help eliminate the Government debt, with the ability to collect real-time speed information from every car on the highway and apply instant, performance-optimized analytics.

Speed Limits

In an effort to continue reducing the national and local government debt, car makers and a secret government lab have created the ability for every car to instantly broadcast its current speed and plate number, encrypted of course, readable only by those who need it. This allows the state and Federal governments to instantly collect new revenue. The wireless broadcast of the automobile’s speed and plate number will be readable within a ½ mile range of the car, where remote collectors can assess fines as they happen.

There are a few options for the program. Before each driver is charged, they will be given the option of paying for speeding for the whole day or for just this instance. This allows the driver to purchase in bulk at a reduced rate for the day: if you have urgent meetings during the day, you can continue to speed and will not be pulled over for the rest of the day. This is a new and revolutionary take on the Speed Pass. They are still working through refunds for days when the driver cannot exceed the speed limit; perhaps the credit will roll forward. People can pay through iTunes, PayPal, or credit cards.

Upcoming Releases

It is rumored that the next Release of the program will introduce a step program, in increments of 5 MPH over the posted limit, to create a new premium tier option. Also, the government is looking at maximizing its revenue by not allowing the insurance companies to hit the driver with a surcharge for speeding. One anonymous insurance source is quoted as saying, “What the hell? We’ll show them who’s boss.”

The implications for Big Data and performance engineering are tremendous. This type of service offering by the Federal and state governments was only possible because of breakthrough advances in the Big Data and software performance industries. Chris, a long-time data and performance engineer, says, “This is an outstanding combination of the two domains,” and wants to know if it’s shovel ready.

Airlines get in the act

In a related story on remote monitoring, payment, and data collection, a major airline is looking at adding remote sensors in every seat to record the weight of each passenger. Then, after deep analytics, the system will charge more to people who are over the average weight for an adult. The program is still in beta; they are working through how to identify males and females based on name, when to take the reading, and what happens if people change seats.

SWSlowPerf

Some of the top reasons that your application/web site/mobile app is slow:

10) I thought you turned off the diagnostic logging

9) Do you really have to index a table with five years of history and one Billion rows?

8) You doubled the number of calls from the application tier to the database tier for the same workload and were surprised by the increase, which no one noticed until production.

7) They moved the application server to another continent

6) They virtualized it (you weren’t using all the real CPU anyway)

5) You wrote your own caching component and didn’t really understand the impact of flushing the cache

4) Even Amazon Web Services stops allocating servers (you thought you could buy your way out)

3) The Marketing group ran a hugely successful ad on a major TV program and you underestimated the new workload. The good news is you and the CIO are on a first-name basis.

2) They upgraded to a new version of the Application server/database server/etc. and no one thought a performance test might be needed.

1) The business-critical applications don’t have performance goals, so how do you know it’s slow?

Also, I like “Is there really a difference between polling and event driven programming?”

 

Workflow

Business workflow, business process

The system response time must not impact the workflow. The transition from transaction to transaction must be seamless, and the user must not notice the system. One might even describe the interaction between the person and the system as graceful and flowing, where the system responds before users can even finish a sip of their coffee. Do your users cozy up to the system? (Too far?)

Understanding each workflow in the application is crucial to setting the proper response time goals for the application. This is required to set up the software performance requirements for the system and for each transaction that supports the workflow. Today’s systems are highly distributed, with web servers, application servers, web services, message hubs, multiple databases, and so on. In the software requirements phase, once the workflows are defined with performance goals, it is critical to make everyone who builds a component in the workflow aware of those goals.

There are call center workflows, document management workflows, order placement workflows, business intelligence and analytical workflows, and of course Big Data workflows.

In the software requirements phase, you might consider this checklist for each workflow:

1) Identify the workflows: Have the key workflows with performance requirements been identified, and are their response time goals defined?

2) Duration: Have you defined the overall duration of the workflow? How long should the call center interaction be?

3) Downstream processing: Have you defined when the data from the workflow must be available to the downstream workflows? For instance, after collecting a customer’s demographic information and vehicle information, when is it available for rating a policy? 30 seconds? 24 hours?

4) Business transactions: These support the workflow. Have the performance-critical business transactions been defined, with response time goals?

5) System transactions: These support the business transactions. Have you defined the response time goals for the critical system transactions supporting the critical business transactions? This is where shared system transactions can be found. Have your requirements captured enough performance information to tell the developer how fast each system transaction must be and how many transactions per second it must support?

6) Performance budget: Now that you have a business transaction response time goal, have you allocated that response time among all the technical components supporting the business transaction? You should create a system interaction diagram to help with this, defining the time allocated across the tiers: client, web, application, message hub, database.

7) Database query requests: Have you categorized your database queries, from simple to complex? Is there a response time goal for each? Is there a difference between the first request and subsequent requests?

8) Report requests: Have you categorized the report request types? For example, simple reports might take 2 seconds, while complex reports with multi-table grouping and ordering that cross fiscal quarters take longer.

9) Discussion and negotiation with the end user or business sponsor: All along, you must be in discussion with the business people who own the system. The role of the architect is to work with the business to tell them what is possible and how much it will cost. The business priorities are critical. The business might want to spend extra money to have near real-time reporting to gain an advantage, or it might be satisfied with a four-hour reporting window.
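The performance budget in item 6 can be written down as shares of the end-to-end goal and checked against measured tier times. The tier names follow the checklist; the 2-second goal and the percentage split are hypothetical and should come from your own system interaction diagram.

```python
from typing import Dict, List

# Hypothetical split of a business transaction's response time goal, in percent.
TIER_SHARES = {"client": 10, "web": 10, "application": 40, "message hub": 10, "database": 30}


def allocate_budget(total_ms: float, shares: Dict[str, int]) -> Dict[str, float]:
    """Divide the end-to-end response time goal across tiers."""
    if sum(shares.values()) != 100:
        raise ValueError("tier shares must sum to 100 percent")
    return {tier: total_ms * pct / 100 for tier, pct in shares.items()}


def over_budget(budget: Dict[str, float], measured_ms: Dict[str, float]) -> List[str]:
    """Tiers whose measured time exceeds their allocation."""
    return [tier for tier, ms in measured_ms.items() if ms > budget.get(tier, 0.0)]


if __name__ == "__main__":
    budget = allocate_budget(2000.0, TIER_SHARES)
    print(budget)  # {'client': 200.0, 'web': 200.0, 'application': 800.0, 'message hub': 200.0, 'database': 600.0}
    print(over_budget(budget, {"application": 650.0, "database": 750.0}))  # ['database']
```

A tier that is missing from the budget is treated as having zero allowance, so an unplanned component shows up as over budget rather than slipping through.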

How to handle the response time discussion

Categorize: You should look to categorize response times into satisfied, tolerating, frustrated, and abandonment. For an online transaction, two seconds could keep people satisfied, while eight seconds will make them frustrated. For another transaction, five seconds could keep people satisfied and 15 seconds could make them frustrated.

Percentiles: You need to establish a goal for what percentage of the user population should be satisfied: the 50th, 80th, or 90th percentile? For example, 90 percent of the people should have a satisfied experience.

Under what load: You need to discuss with the business people that there is a normal workload, a peak workload, and an above-peak workload, and define a target for each. The business might be OK with a relaxed target, where people are tolerating or frustrated, for a short period during a peak load.
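The categories and percentile goal can be checked mechanically against test results, once per load level. The 2-second and 8-second thresholds come from the online example above and the 90 percent goal from the percentile discussion. The 20-second abandonment threshold is a hypothetical placeholder, since real abandonment is better measured from user behavior than inferred from response time.

```python
from collections import Counter
from typing import Sequence, Tuple


def categorize(rt_s: float, satisfied_max_s: float = 2.0,
               frustrated_from_s: float = 8.0, abandon_from_s: float = 20.0) -> str:
    """Map one response time (seconds) to a user-experience category."""
    if rt_s <= satisfied_max_s:
        return "satisfied"
    if rt_s < frustrated_from_s:
        return "tolerating"
    if rt_s < abandon_from_s:
        return "frustrated"
    return "abandonment"


def meets_goal(samples_s: Sequence[float], goal_pct: float = 90.0,
               **thresholds: float) -> Tuple[bool, Counter]:
    """True when at least goal_pct percent of samples fall in 'satisfied'."""
    if not samples_s:
        raise ValueError("no samples")
    counts = Counter(categorize(rt, **thresholds) for rt in samples_s)
    satisfied_pct = 100.0 * counts["satisfied"] / len(samples_s)
    return satisfied_pct >= goal_pct, counts


if __name__ == "__main__":
    run = [1.2] * 9 + [9.5]  # hypothetical: nine fast requests, one slow one
    ok, counts = meets_goal(run)
    print(ok, dict(counts))  # True {'satisfied': 9, 'frustrated': 1}
```

Slower transactions can pass their own thresholds, for example `meets_goal(samples, satisfied_max_s=5.0, frustrated_from_s=15.0)` for the five-second case, and a relaxed `goal_pct` could be used for the peak-load run if the business agrees to it.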

How fast, how many users, how many transactions. Zoom, Zoom.

Software performance requirements are about setting the performance and scalability context for getting the design and development right for your web application, your web service, your messaging hub, your reporting system, or your mobile device. I am creating a few checklists:

  1. Project risk profile: Is performance important for this project and what works against performance in the project?
  2. Business workflows: What is the duration of a workflow, how many types of workflow and what is the peak workflow?
  3. Application business volumes and growth: The application automates the workflow with transactions; how will the volumes grow?
  4. Non-user interaction processes (batch, messages): This is about component throughput, how many orders per second?
  5. Communication to down-stream SDLC processes and phases: Setting the stage for design and development.

Here is a checklist to help set the context for your team:

Project Risk profile: These are the overarching performance and capacity considerations that the entire team must be aware of. This would communicate to the team that the business just acquired 2,000 new stores and this new application must now process twice as many users and orders in the same time. Or a new government regulation will be in effect that requires every retail brokerage order to have additional review that must not slow down the order processing system.

  1. Performance risk: Has performance or scalability been defined as a risk? Have you asked the business this question, or the production operations team?
  2. Extreme response times: Are there key business transactions with extreme response time requirements? For example, a user response under one second, or a component that must process 500,000 transactions per second. If you have one second to respond to a user request, you had better make sure the developers know this.
  3. Batch windows: Is there a very strict batch processing window, and is the current system near the end of the window? Yes, there are still large batch systems that process tremendous volumes of data, for instance mutual fund calculations.
  4. Third party: Does the system depend on third party software to complete the workflow? This could be software you buy and install as part of your application, or you could be using a Web Service.
  5. Third party SLA: Do the third parties provide enforceable Service Level Agreements? If you are using a SaaS vendor, do you specify the response time?
  6. Peak Workload: Have the key business transactions been evaluated to determine average and peak workloads? And is there a process in place to review these?
  7. Calculation: What are the key calculations, and how is their response time influenced by the type of calculation (some do more work than others)? Are there key pricing calculations, rate quoting engines, preference calculations, inventory allocation, etc.? I have seen a few pricing calculations get caught up in the volume discount calculation and go from 100 milliseconds to 1 second.
  8. System Peak: What are the attributes that drive the peak load of the application? Is it seasonal, advertising driven, back to school, weather related (insurance claims), and do you model the peak? How many developers are not aware of the peak?
  9. Regulatory requirements: Is there an auditing component, reporting timeline, are there large volumes of data required to capture and provide to an agency?

Your projects must have a performance risk profile defined. There may be no risk, or there may be significant risk.

The next post will be about the business workflows.
