
SWSlowPerf

Some of the top reasons that your application/web site/mobile app is slow:

10) I thought you turned off the diagnostic logging

9) Do you really have to index a table with five years of history and one billion rows?

8) You doubled the number of calls from the application tier to the database tier for the same workload and were surprised by the increase, which no one noticed until production.

7) They moved the application server to another continent

6) They virtualized it (you weren’t using all the real CPU anyway)

5) You wrote your own caching component and didn’t really understand the impact of flushing the cache

4) Even Amazon Web Services stops allocating servers (you thought you could buy your way out)

3) The Marketing group ran a hugely successful ad on a major TV program and you underestimated the new workload. The good news is you and the CIO are now on a first-name basis.

2) They upgraded to a new version of the Application server/database server/etc. and no one thought a performance test might be needed.

1) The business-critical applications don't have performance goals; how do you know it's slow?

Also, I like "Is there really a difference between polling and event-driven programming?"
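For what it's worth, here is a minimal sketch of the difference in Python; the queue, the 0.5-second poll interval, and the handlers are purely illustrative.

```python
import queue
import time

events = queue.Queue()

def poll_loop():
    # Polling: wake up on a timer and check for work. Even when idle
    # this burns wake-ups, and a new event can sit unnoticed for up
    # to a full poll interval.
    while True:
        try:
            item = events.get_nowait()
        except queue.Empty:
            time.sleep(0.5)  # nothing yet; sleep and check again
            continue
        print("polled:", item)

def event_loop():
    # Event-driven: block until the queue delivers an event. No
    # busy-checking, and the handler runs as soon as the item lands.
    while True:
        item = events.get()  # blocks until an event arrives
        print("handled:", item)
```

So yes, there is a difference: polling pays an idle cost and adds up to one poll interval of latency, while event-driven code pays neither but needs a notification mechanism to block on.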


Workflow

Business workflow, business process

The system response time must not impact the workflow. The transition from transaction to transaction must be seamless, and the user must not notice the system. One might even describe the interaction between the person and the system as graceful and flowing, where the system responds before they can finish a sip of their coffee. Do your users cozy up to the system? (Too far?)

Understanding each workflow in the application is crucial to setting the proper response time goals for the application. This is required to establish the software performance requirements for the system and for each transaction that supports the workflow. Today's systems are highly distributed, with web servers, application servers, web services, message hubs, multiple databases, and so on. In the software requirements phase, once the workflows are defined with performance goals, it is critical to make everyone who builds a component in the workflow aware of those goals.

There are call center workflows, document management workflows, order placement workflows, business intelligence and analytical workflows, and of course Big Data workflows.

While in the software requirements phase, you might consider this checklist for the workflow:

1) Identify the workflows: Have the key workflows with performance requirements been identified, and are their response time goals defined?

2) Duration: Have you defined the overall duration of the workflow? How long should the call center interaction be?

3) Downstream processing: Have you defined when the data from the workflow must be available for the downstream workflows? For instance, after collecting a customer's demographic information and vehicle information, when is it available for rating a policy? 30 seconds? 24 hours?

4) Business transactions: These support the workflow. Have the performance-critical business transactions been defined, with response time goals?

5) System transactions: These support the business transactions. Have you defined the response time goals for the critical system transactions supporting the critical business transactions? This is where shared system transactions can be found. Have your requirements captured enough performance information to tell the developer how fast a system transaction must be and how many transactions per second it must support?

6) Performance budget: Now that you have a business transaction response time goal, have you allocated that time among all the technical components supporting the business transaction? You should create a system interaction diagram to help with this, defining the time allocated across the tiers: client, web, application, message hub, database. (See the sketch after this list.)

7) Database query requests: Have you categorized your database queries, from simple transactions to complex ones? Is there a response time goal for each? Is there a difference between the first request and subsequent requests?

8) Report requests: Have you categorized the report request types? Simple reports return in 2 seconds, while complex multi-table grouping and ordering reports that cross fiscal quarters take longer?

9) Discussion and negotiation with the end-user or business sponsor: All along you must be in discussion with the business people who own the system. The role of the architect is to work with the business to tell them what is possible and how much it will cost. The business priorities are critical. The business might want to spend the extra money to have near real-time reporting to gain an advantage, or they might be satisfied with a four-hour reporting window.
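To make the performance budget concrete, here is a minimal sketch of item 6 in Python. The tier names and millisecond allocations are hypothetical; the point is that the tier budgets must sum to no more than the business transaction's response time goal.

```python
# Hypothetical response time budget for one business transaction,
# splitting a 2,000 ms end-to-end goal across the tiers from the
# system interaction diagram. All numbers are illustrative.
GOAL_MS = 2000

budget_ms = {
    "client (render)": 300,
    "web tier": 200,
    "application tier": 500,
    "message hub": 200,
    "database": 500,
    "network (all hops)": 200,
}

allocated = sum(budget_ms.values())
assert allocated <= GOAL_MS, f"over-allocated by {allocated - GOAL_MS} ms"
print(f"allocated {allocated} ms of {GOAL_MS} ms; "
      f"{GOAL_MS - allocated} ms of headroom")
```

When a tier cannot meet its allocation, the negotiation in item 9 begins: either another tier gives up some budget or the goal itself changes.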

How to handle the response time discussion

Categorize: You should look to categorize the response times into satisfied, tolerating, frustrated, and abandonment. Two seconds could keep people satisfied, while eight seconds will make them frustrated for one online transaction. For another transaction, five seconds could keep people satisfied and 15 seconds make them frustrated.

Percentiles: You need to establish a goal for what percentage of the user population will be satisfied: the 50th, 80th, or 90th percentile? For example, 90 percent of the people should have a satisfied experience.
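Here is a minimal sketch of how the categories and percentiles combine, assuming the 2-second and 8-second thresholds from the example above; the sample response times are made up.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times (seconds)."""
    ranked = sorted(samples)
    k = max(0, math.ceil(pct / 100.0 * len(ranked)) - 1)
    return ranked[k]

def categorize(seconds, satisfied=2.0, frustrated=8.0):
    # Thresholds from the online-transaction example above: up to 2 s
    # keeps people satisfied, 8 s or more makes them frustrated.
    if seconds <= satisfied:
        return "satisfied"
    if seconds < frustrated:
        return "tolerating"
    return "frustrated"

# Made-up response time samples for one transaction, in seconds.
samples = [0.8, 1.2, 1.9, 2.4, 2.8, 3.5, 4.1, 6.0, 9.2, 12.5]

p90 = percentile(samples, 90)
satisfied_count = sum(categorize(s) == "satisfied" for s in samples)
print(f"90th percentile: {p90:.1f} s; "
      f"{100 * satisfied_count / len(samples):.0f}% satisfied")
```

Against a goal of 90 percent satisfied, this made-up sample fails badly, which is exactly the conversation to have with the business sponsor.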

Under what load: You need to discuss with the business people that there is a normal workload, a peak workload, and an above-peak workload, and define a target for each. The business might be OK with a relaxed target, where people are tolerating or frustrated for a short duration during a peak load.

How fast, how many users, how many transactions. Zoom, Zoom.

Software performance requirements are about setting the performance and scalability context for getting the design and development right for your web application, your web service, your messaging hub, your reporting system, your mobile device. I am creating a few checklists:

  1. Project risk profile: Is performance important for this project and what works against performance in the project?
  2. Business workflows: What is the duration of a workflow, how many types of workflow and what is the peak workflow?
  3. Application business volumes and growth: The application automates the workflow with transactions; how will the volumes grow?
  4. Non-user interaction processes (batch, messages): This is about component throughput; how many orders per second? (A measurement sketch follows this list.)
  5. Communication to downstream SDLC processes and phases: Setting the stage for design and development.
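For the throughput checklist item, here is a minimal measurement sketch; `process_order` is a hypothetical stand-in for whatever batch or message-handling component you are sizing.

```python
import time

def process_order(order):
    # Hypothetical stand-in for the component under measurement.
    time.sleep(0.002)  # pretend each order takes about 2 ms

orders = range(500)

start = time.perf_counter()
for order in orders:
    process_order(order)
elapsed = time.perf_counter() - start

print(f"processed {len(orders)} orders in {elapsed:.2f} s "
      f"= {len(orders) / elapsed:.0f} orders/second")
```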

Here is a checklist to help set the context for your team:

Project risk profile: These are the overarching performance and capacity considerations that the entire team must be aware of. This would communicate to the team that the business just acquired 2,000 new stores and the new application must now process twice as many users and orders in the same time. Or that a new government regulation will take effect requiring every retail brokerage order to have an additional review that must not slow down the order processing system.

  1. Performance risk: Has performance or scalability been defined as a risk? Have you asked the business this question, or the production operations team?
  2. Extreme response times: Are there key business transactions with extreme response time requirements? A user response under one second, or a component that must process 500,000 transactions per second? If you have one second to respond to a user request, you had better make sure the developers know it.
  3. Batch windows: Is there a very strict batch processing window, and is the current system near the end of the window? Yes, there are still large batch systems that process tremendous volumes of data, for instance mutual fund calculations. (See the arithmetic sketch after this list.)
  4. Third party: Does the system depend on third party software to complete the workflow? This could be software you buy and install as part of your application, or you could be using a Web Service.
  5. Third party SLA: Do the third parties provide enforceable Service Level Agreements? Are you using a SaaS vendor, and do you specify the response time?
  6. Peak workload: Have the key business transactions been evaluated to determine average and peak workloads? And is there a process in place to review these?
  7. Calculation: What are the key calculations, and how is their response time influenced by the type of calculation (some do more work than others)? Are there key pricing calculations, rate quoting engines, preference calculations, inventory allocation, etc.? I have seen a few pricing calculations get caught up in the volume discount calculation and go from 100 milliseconds to 1 second.
  8. System peak: What are the attributes that drive the peak load of the application? Is it seasonal, advertising-driven, back-to-school, or weather-related (insurance claims), and do you model the peak? How many developers are not aware of the peak?
  9. Regulatory requirements: Is there an auditing component, reporting timeline, are there large volumes of data required to capture and provide to an agency?
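For the batch-window item, here is a minimal sketch of the feasibility arithmetic; the window, volume, and measured rate are all hypothetical.

```python
# Hypothetical batch-window check: will tonight's volume fit the window?
window_hours = 6.0       # the strict batch processing window
records = 40_000_000     # records to process (illustrative volume)
measured_rate = 2_100    # records/second from the last baseline run

required_rate = records / (window_hours * 3600)
print(f"required {required_rate:,.0f} records/s, "
      f"measured {measured_rate:,} records/s")
if measured_rate < required_rate:
    print("at risk: the batch will overrun the window at this throughput")
else:
    headroom = (measured_rate - required_rate) / required_rate
    print(f"fits with {headroom:.0%} headroom")
```

A system "near the end of the window" is one where that headroom has shrunk toward zero.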

Your projects must have a performance risk profile defined. There may be no risk, or there may be significant risk.

The next post will be about the business workflows.

Where is the SPE team?

Where does a Software Performance Engineering team fit into the Enterprise?

Shared services and business units

Many performance engineering teams end up somewhere rather than being placed somewhere. There tends not to be much in the way of preplanning or discussion with the senior IT management team. In my experience, teams have been in shared services organizations such as:

  1. Production/Operations,
  2. Quality Assurance,
  3. Development,
  4. Architecture,
  5. Testing Center,

where they have to support many different applications and provide different services across the Enterprise. Oftentimes, the business units do not know how to engage with a shared services team, and the PE team does not know how to engage the business.

When placed in a business unit, this typically means the business is aware of the value that a PE team brings. The team has a more senior leader who is technical, part project manager, and can interact with the business to ensure the team is aligned. The PE team will also be involved in more technical and product evaluations. Each organization influences the goals and purpose of the PE team. There must be an Enterprise performance engineering career path.

Care and feeding of the core PE team

The high-value PE team usually has multi-disciplined people, people with a wider technical skillset than most other teams within IT. A low-value performance testing team typically has a narrow skillset, as their mission is to execute performance tests, and they do not understand the system under test very well. A critical part of managing a PE and a PT team is to make sure there is a well-defined career path that helps shape the more junior people and helps to extend the technical and non-technical skillsets of the more senior people. They should also grow their understanding of the business.

Enterprise architecture

The EA team is involved in many of the Enterprise's large-scale projects, and they usually have a handle on the risks of the large projects. The EA team is involved early in the SDLC; they are a technical team that understands the business. When a PE team is placed here, it can have Enterprise-wide visibility into the key projects, which allows the team manager to insert the team into the high-risk projects and to see the planned projects.

The Director of PE must be a direct report to the leader of the EA team. The goals of the typical EA team are very much in alignment with the goals of the PE team: the EA mission is to translate business requirements into a technical solution for business value. The PE mission is to help manage risk: the risk that the applications will not meet the user experience goals, and the risk that the applications will not achieve the business goals of scalability and reliability.

The risk here for the PE organization is that the team does not stay involved through the development and testing process. The PE career path here can be an issue, as the way to promotion is to become an EA. The EAs are involved early, and once the design is approved, they are not involved in the detailed development. The PE team must be involved in the full SDLC.

Quality assurance

Most QA organizations are focused on the functional requirements of the application. They are involved later in the SDLC; however, there is a trend to involve them in the requirements phase to define the test cases at that time. When a PE team is placed here, it tends to be very testing focused. The project development teams involve the PE team when getting ready to run performance tests, not during the design and development phases.

The QA team is very focused on the business needs; however, they tend not to be technical. There are very few software developers in the QA team. A high-value performance engineering team is technical and business aware. This lack of technical focus within the group will impact the PE team and the PE career path. The QA career path is different from the PE path. QA can be a manual process in many organizations and lack the drive to automate testing. The PE team needs automation to be successful as technology changes.

The risks here: not enough focus on the technical skillset, too much focus on testing and not enough on design and development, and where does the career path lie?

Operations

The goal of Operations is to keep the production applications running smoothly and performing well, where all is well and the customers are happy. The Ops team is the end of the line for the application, and they often feel the pain of poorly performing applications, where they have to make them perform. Oftentimes the development teams do not make the application perform well; unfortunately, response time, scalability, and utilization of system resources become the problem of Ops. They were often not involved at all during design and development. This is a technical team, often with a more limited understanding of the business needs.

The operations team puts in place monitoring and measurement processes and tools for the applications. The performance team placed in this group usually has a capacity planning and troubleshooting focus. This is because when an application moved into production, historically it may not have performed well, and the Ops team had to fix it. They monitor the resource consumption of the production system, they understand the workload of the applications, and they can determine when it changes.

The performance team in this group will be involved in predicting the expected capacity of new or modified applications; they evaluate the workload, review performance test results, and determine if additional computing resources are required. They will have access to the production workload and gain a good understanding of how the application is used. This information is critical to designing a high-value performance test plan.

The risk for the PE team in this group is a reactive focus and late involvement in the SDLC, where they are not aware early enough of the changes to the applications. They may not be involved in the performance testing of the application and may have limited access to the results of the testing. The career path is at risk here as well.

Business unit

Large business units set their own priorities and can have large, critical development programs that span years. The IT and business leadership teams have bought into the value that PE brings and how it mitigates risks for the business. These large programs will also have many different releases, each with added business functionality and complexity. The development team is also part of the business unit. The business unit may set up the PE team as a shared resource within the BU, to be leveraged across the critical business applications.

When a PE team is placed within the business unit, it is viewed as a critical success factor in each major release. The development process in this situation often has PE activities embedded in the SDLC: non-functional requirements are defined, along with design reviews, code reviews and unit testing for performance, performance testing, and production monitoring and measuring. This is a more integrated team. The PE team is very in tune with the business goals and priorities. The PE team leads the PT team as well; there are well-defined performance test scenarios, and the expected outcome is well known.

The risk in this case is minimal; however, the career path will be maintained outside the program. There will still be the need for an Enterprise Performance Engineering leader who can manage training and career paths.

Scattered Performance team

In this case, a performance engineering team does not exist. There are key people scattered across the organization who can take on the role of PE when needed. They can be in any of the organizations mentioned previously. Oftentimes they are brought together (summoned) when there is a critical performance problem. This structure may be good enough for the business; production issues related to performance and scalability may be few and far between.

A performance testing center

The business has pushed for a low-cost performance testing team that is typically in a different geography or off-shore. The development team or the QA team will define the test scenarios and test cases to be executed. In this case, you have people who are not performance engineers defining performance test cases for a testing team that does not have a performance engineer. The remote performance testing team often has very little understanding of the application under test. The PT team produces basic performance reports from the testing tool.

In some cases, the PT team cannot execute a test due to a technical issue that they cannot solve. They end up waiting until the next day for the development team to solve it. The value in a PT team comes from the ability to understand the application, overcome many technical issues, and provide insightful test results. The testing center model requires more involvement from the development team; in some cases so much involvement that the development team ends up running the test.