Archive

Software Requirements

Architecture and technology decision making.

Making decisions

Let's hide behind the chain saws

In the spirit of Halloween, chain saws. What a great commercial from Geico to highlight the subject of poor decision making: when you're in a horror movie, you make poor decisions. There is a running car ready to go, but they decide not to take it. How many of us realize, after the architecture has been defined and development starts, that maybe, in hindsight, we chose to hide behind the chain saws? Another familiar scenario: the business selected a software product that met the functional requirements, but it wasn't even close to meeting the performance and scalability requirements. The product was selected, and later in the SDLC it became very apparent that the product was not going to meet the performance goals without a significant amount of rework from the vendor and three times the original capacity plan.

What is the decision making process? How do we make the right or wrong decision? The more difficult a decision is to undo, the more information we need. To help make the choice, we have our personal experience, best practices, mentors and peers we consult, situational information, and often not enough information. How much information do you need to be comfortable enough to commit? One quote comes to mind: "Perfect is the enemy of the good." General George Patton's approach was that the most important thing you can do is make the decision quickly and move on, then adjust when new information becomes available. Some decisions, though, are tough to undo, like committing to a new software product without the proper information.

When selecting a product that will be part of a key business function, you should make sure these items are part of your decision making process:

  1. Understand the risks from the user population.
    1. Who is using the solution?
    2. Review the user population, the business growth plans, and the volume peaking factors.
  2. Understand the risks associated with the type of application.
    1. What type of application is supporting the business function?
    2. For instance: messaging, ERP module(s), reporting and analysis, a business or consumer portal.
  3. Understand the risks associated with the technology the application or solution depends upon.
    1. Is this a new technology platform or solution for the Enterprise?
    2. Has the technology been demonstrated to work at the expected load?
    3. Is there a critical technology required from a third party?
    4. Will part or all of the solution be hosted externally?
  4. Understand the risks associated with the application release strategy.
    1. Expected frequency of releases.
    2. Degree of change per release.
    3. Business units added at each release, increasing volume.
    4. Will the application be on a brand new release of the product?
  5. Understand the risks associated with the entire solution architecture (logical and physical layers).
    1. Is there a critical new component required for the application?
    2. What is the configuration of the testing environment?
    3. What references can the vendor introduce you to, and are they using the version and features you will be using?
    4. What is the depth of the vendor's team?
    5. Who will be making the changes for you, the consulting organization or the development organization?
  6. Understand the risks associated with the team organization and structure.
    1. Has the solution architect done this before?
    2. How distributed is the team?
    3. Does the team understand the development methodology?
    4. Is the technology new, requiring a scarce skillset?

Hopefully this will help you get to the running car and not the chain saws.

 


Frustrated User

Bring Your Own Response Time.

The consumers' expectations have greatly influenced the demands and expectations placed on Enterprise IT departments. The consumer and IT customer brought their own devices and expected more self-service, at a much faster pace. One of the key tasks a Performance Engineer must do is help the business and IT set expectations for the response times of corporate systems. The history of performance requirements for corporate-facing systems, and even call centers, has been problematic: often ignored and certainly deferred. The typical approach is to see just how slow the system can be before the users completely revolt. This tends to be the case because it's not a revenue-generating system; however, many corporate IT systems directly touch the customer or business partner after the sale is made or the contract is signed.

Response time and performance goals for Internet retailers are well defined and measured; there are many industry-specific benchmarks that compare the response times of web pages against competitors in the industry. The Internet business models demand faster and faster response times for transactions. Benchmarks can be found at Compuware (www.compuware.com) and Keynote (www.keynote.com), among others. However, there is no such benchmark for corporate systems. The users of corporate systems are starting to voice their concerns and displeasure more loudly. They expect speeds comparable to Internet retailer speeds: less than five seconds, and often two seconds, for simple transactions.

Our studies have shown, and are in alignment with, the research done by Jakob Nielsen (www.nngroup.com) on usability. A guide to setting user expectations must consider three barriers:

1) 0.1 seconds: the user perceives the system to respond in real time, without any noticeable delay.

2) 1.0 second: the user starts to perceive a slight delay in the system, but is still very happy with the response time.

3) 10.0 seconds: the user greatly notices the delay, becomes distracted, and attempts to do other things while waiting.

So, just as consumers have brought their own devices, they are bringing their own response times to corporate systems.

 

Archeology

Performance artifacts in development

Where are your requirements and development performance artifacts? Over the years as a performance engineer, I have been involved in a number of projects related to performance and scalability readiness assessments. This involves evaluating the software, either from a vendor or developed in-house, to determine whether it has been designed and developed with performance and scalability goals in mind. During a readiness assessment, the team I work with and I look for non-functional requirements for the key business and system transactions, and for development guidelines and artifacts that track or measure service time during the development and unit testing phases. Finding performance early.

Non-Functional requirements

To start, there are non-functional requirements that should have been defined for the development team. The team develops the code to make the business functions real. The next question is: where does your software development lifecycle and methodology (that's right, I said methodology) have activities and artifacts specific to performance, scalability, and stability? For example, the application needs a change to the pricing calculation or the order history functions; how fast should it be? Where is it specified that it still needs to be 300 milliseconds after the functional change? Initially, the non-functional requirements specified that the pricing calculation must complete in 300 milliseconds for average complexity and 600 milliseconds for complex calculations. Can you point to the artifact(s) where that is defined in your methodology? Before the developer begins coding, is he or she aware of that?
Then we look for guidelines for developers and services provided by a framework. Has the performance or architecture team defined a set of guidelines for the developer to use when building this type of service? Has the use of caching been defined? Who verifies that the database access and SQL statements are optimal? Where is that captured, and which artifacts capture it? Does each developer understand the proper use of logging and code instrumentation, or is it part of the development framework? In the case of the pricing service, each method must measure its internal service time, and each exposed public service must have a service time measurement.
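
As a minimal sketch of what that instrumentation can look like (the PricingService class, method names, and numbers below are hypothetical, not from any particular project), each exposed method measures its own service time so the measurement artifact exists from the very first build:

```java
// Minimal sketch (hypothetical PricingService): every exposed method measures
// its own service time so the artifact exists from the first build onward.
public class PricingService {

    public double calculatePrice(String sku, int quantity) {
        long start = System.nanoTime();
        try {
            return doCalculatePrice(sku, quantity);      // the actual business logic
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Swap in your logging or metrics framework of choice here.
            System.out.println("PricingService.calculatePrice serviceTimeMs=" + elapsedMs);
        }
    }

    private double doCalculatePrice(String sku, int quantity) {
        // ... pricing logic stub ...
        return 9.99 * quantity;
    }

    public static void main(String[] args) {
        new PricingService().calculatePrice("SKU-123", 3);
    }
}
```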

Continuous Integration

A key artifact to look for is the results from the weekly or daily build process. Are there test results for the internal method calls and the external service calls? JUnit can support the internal verification and JMeter can support the external verification. To get value from this, the testing database must be robust (not simply single rows with no history). But how can you use the response results during development to indicate eventual production performance? The value comes from comparing build to build: for instance, did the service time change radically? This can be an early indicator. However, often the development environment changes or the database changes. The performance engineer must show the business there is value in maintaining consistency in the development environment. With a consistent development environment you can show that the service time of the pricing service has changed significantly, well before production.
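
As an illustration, the internal verification in the daily build might look something like the JUnit 4 sketch below; the PricingService and the 300 millisecond goal are carried over from the hypothetical example above, not taken from any specific project:

```java
import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Minimal sketch of a build-time service time check (JUnit 4), assuming the
// hypothetical PricingService above and the 300 ms non-functional requirement.
public class PricingServiceTimeTest {

    private static final long GOAL_MS = 300;

    @Test
    public void averageComplexityPricingMeetsServiceTimeGoal() {
        PricingService service = new PricingService();
        long start = System.nanoTime();
        service.calculatePrice("SKU-123", 3);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Fails the daily build if the internal service time regresses past the goal.
        assertTrue("Pricing took " + elapsedMs + " ms, goal is " + GOAL_MS + " ms",
                   elapsedMs <= GOAL_MS);
    }
}
```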

Key Performance artifact

For the JMeter test case: in build 1, the pricing service is measured at 1.000 second. The goal is 300 milliseconds. Or what if the service time is 100 milliseconds? Then you need to track the service time from build to build to monitor for consistency. If the 100 milliseconds becomes 1.00 second, how did that happen? Did the environment change? Did the developer add new code to the function? You must evaluate this, because you found it early.
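
A simple way to make that build-to-build comparison visible is to keep the measured service times per build and flag any large jump. The sketch below is only illustrative; the build names, the service times, and the "doubled since last build" rule are all made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the build-to-build comparison, assuming service times (ms)
// have been collected per build from the JUnit/JMeter runs.
public class ServiceTimeTrend {

    public static void main(String[] args) {
        Map<String, Long> pricingServiceMs = new LinkedHashMap<>();
        pricingServiceMs.put("build-41", 95L);
        pricingServiceMs.put("build-42", 100L);
        pricingServiceMs.put("build-43", 1000L);   // the regression we want to catch early

        long previous = -1;
        for (Map.Entry<String, Long> e : pricingServiceMs.entrySet()) {
            if (previous > 0 && e.getValue() > previous * 2) {
                System.out.println(e.getKey() + ": service time jumped from " + previous
                        + " ms to " + e.getValue()
                        + " ms - investigate the code or environment change.");
            }
            previous = e.getValue();
        }
    }
}
```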

Workflow

Business workflow, business process

The system response time must not impact the workflow. The transition from transaction to transaction must be seamless, and the user must not notice the system. One might even describe the interaction between the person and the system as graceful and flowing, where the system responds before they can even finish a sip of their coffee. Do your users cozy up to the system (too far?)?

Understanding each workflow in the application is crucial to setting the proper response time goals for the application. This is required to set the software performance requirements for the system and for each transaction that supports the workflow. The systems today are highly distributed, with web servers, application servers, web services, message hubs, multiple databases, etc. In the software requirements phase, once the workflows are defined with performance goals, it is critical to make everyone who builds a component in the workflow aware of those goals.

There are call center workflows, document management workflows, order placement workflows, business intelligence and analytical workflows, and of course Big Data workflows.

When in the software requirements phase, you might consider this checklist for the workflow:

1) Identify the workflows: Have the key workflows with performance requirements been identified, and are the response time goals defined?

2) Duration: Have you defined the overall duration of the workflow? How long should the call center interaction be?

3) Downstream processing: Have you defined when the data from the workflow must be available for the downstream workflows? For instance, after collecting a customer's demographic information and vehicle information, when is it available for rating a policy? 30 seconds? 24 hours?

4) Business transactions: These support the workflow. Have the performance-critical business transactions been defined, with response time goals?

5) System transactions: These support the business transactions. Have you defined the response time goals for the critical system transactions supporting the critical business transactions? This is where shared system transactions can be found. Have your requirements captured enough performance information to tell the developer how fast this system transaction must be and how many transactions per second it must support?

6) Performance budget: Now that you have a business transaction response time goal, have you allocated the response time among all the technical components supporting the business transaction? You should create a system interaction diagram to help with this, defining the time allocated across the tiers: client, web, application, message hub, database (see the sketch after this list).

7) Database query requests: Have you categorized your database queries, from simple transactions to complex? Is there a response time goal for each? Is there a difference between the first request and subsequent requests?

8) Report requests: Have you categorized the report request types? Simple reports return in 2 seconds, while complex reports with multi-table grouping and ordering that cross fiscal quarters take longer?

9) Discussion and negotiation with the end user or business sponsor: All along, you must be in discussion with the business people who own the system. The role of the architect is to work with the business to tell them what is possible and how much it will cost. The business priorities are critical. The business might want to spend the extra money to have near real-time reporting to gain an advantage, or they might be satisfied with a four-hour reporting window.
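
As an example of item 6, here is a small sketch of a performance budget. The 2-second business transaction goal and the per-tier allocations are hypothetical; your own numbers would come from the negotiated response time goal and your system interaction diagram:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a performance budget (hypothetical numbers): the business
// transaction goal is allocated across the tiers that touch the transaction,
// and the allocation is checked against the overall goal.
public class PerformanceBudget {

    public static void main(String[] args) {
        long goalMs = 2000;                       // business transaction response time goal

        Map<String, Long> budgetMs = new LinkedHashMap<>();
        budgetMs.put("client render", 300L);
        budgetMs.put("web tier", 200L);
        budgetMs.put("application tier", 600L);
        budgetMs.put("message hub", 300L);
        budgetMs.put("database", 600L);

        long total = budgetMs.values().stream().mapToLong(Long::longValue).sum();
        System.out.println("Allocated " + total + " ms of the " + goalMs + " ms goal");
        if (total > goalMs) {
            System.out.println("Budget exceeds the goal - renegotiate the allocation.");
        }
    }
}
```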

 How to handle the response time discussion

Categorize: You should categorize response times into satisfied, tolerating, frustrated, and abandonment. Two seconds could keep people satisfied, while eight seconds will make them frustrated for one online transaction. For another transaction, five seconds could keep people satisfied and 15 seconds make them frustrated.

Percentiles: You need to establish a goal for what percentage of the user population should be satisfied: the 50th, 80th, or 90th percentile? For example, 90 percent of the people should have a satisfied experience.

Under what load: You need to discuss with the business people that there is a normal workload, a peak workload, and an above-peak workload, and define a target for each. The business might be OK with a relaxed target where people are tolerating or frustrated for a short duration during a peak load.
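
Putting these three ideas together, the sketch below categorizes a handful of measured response times and checks the 90th percentile against the satisfied threshold. The 2-second and 8-second thresholds and the sample values are hypothetical; the real numbers come out of the discussion with the business:

```java
import java.util.Arrays;

// Minimal sketch: categorize measured response times into satisfied /
// tolerating / frustrated and check the 90th-percentile goal.
public class ResponseTimeCategories {

    public static void main(String[] args) {
        double satisfiedSec = 2.0, frustratedSec = 8.0;
        double[] samplesSec = {0.8, 1.2, 1.9, 2.5, 3.0, 4.2, 6.5, 9.1, 1.1, 1.6};

        long satisfied = Arrays.stream(samplesSec).filter(t -> t <= satisfiedSec).count();
        long tolerating = Arrays.stream(samplesSec)
                .filter(t -> t > satisfiedSec && t <= frustratedSec).count();
        long frustrated = Arrays.stream(samplesSec).filter(t -> t > frustratedSec).count();

        System.out.printf("satisfied=%d%% tolerating=%d%% frustrated=%d%%%n",
                satisfied * 100 / samplesSec.length,
                tolerating * 100 / samplesSec.length,
                frustrated * 100 / samplesSec.length);

        // 90th-percentile check against the satisfied goal.
        double[] sorted = samplesSec.clone();
        Arrays.sort(sorted);
        double p90 = sorted[(int) Math.ceil(0.9 * sorted.length) - 1];
        System.out.printf("90th percentile = %.1f s (goal %.1f s)%n", p90, satisfiedSec);
    }
}
```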

How fast, how many users, how many transactions. Zoom, Zoom.

Software performance requirements are about setting the performance and scalability context for getting the design and development right for your web application, your web service, your messaging hub, your reporting system, your mobile device. I am creating a few checklists:

  1. Project risk profile: Is performance important for this project, and what works against performance in the project?
  2. Business workflows: What is the duration of a workflow, how many types of workflow are there, and what is the peak workflow volume?
  3. Application business volumes and growth: The application automates the workflow with transactions; how will the volumes grow?
  4. Non-user interaction processes (batch, messages): This is about component throughput; how many orders per second?
  5. Communication to downstream SDLC processes and phases: Setting the stage for design and development.

Here is a checklist to help set the context for your team:

Project risk profile: These are the overarching performance and capacity considerations that the entire team must be aware of. This would communicate to the team that the business just acquired 2,000 new stores and this new application must now process twice as many users and orders in the same time. Or that a new government regulation taking effect requires every retail brokerage order to have an additional review that must not slow down the order processing system.

  1. Performance risk: Has performance or scalability been defined as a risk? Have you asked the business this question, or the production operations team?
  2. Extreme response times: Are there key business transactions with extreme response time requirements? A user response under one second, or a component that must process 500,000 transactions per second? If you have one second to respond to a user request, you had better make sure the developers know this.
  3. Batch windows: Is there a very strict batch processing window, and is the current system near the end of the window? Yes, there are still large batch systems that process tremendous volumes of data, for instance mutual fund calculations.
  4. Third party: Does the system depend on third-party software to complete the workflow? This could be software you buy and install as part of your application, or you could be using a web service.
  5. Third party SLA: Do the third parties provide enforceable Service Level Agreements? Are you using a SaaS vendor, and do you specify the response time?
  6. Peak workload: Have the key business transactions been evaluated to determine average and peak workloads? And is there a process in place to review these? (See the sketch after this list.)
  7. Calculation: What are the key calculations, and how is their response time influenced by the type of calculation (some do more work than others)? Are there key pricing calculations, rate quoting engines, preference calculations, inventory allocation, etc.? I have seen a few pricing calculations get caught up in the volume discount calculation and go from 100 milliseconds to 1 second.
  8. System peak: What attributes drive the peak load of the application? Is it seasonal, advertising driven, back to school, or weather related (insurance claims), and do you model the peak? How many developers are not aware of the peak?
  9. Regulatory requirements: Is there an auditing component or a reporting timeline? Are there large volumes of data that must be captured and provided to an agency?
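
For item 6, the peak workload, the arithmetic is simple but worth writing down for the team. The sketch below uses made-up numbers for active users, transaction rates, and the peaking factor; the point is to turn business volumes into the transactions-per-second figure the developers need to hear:

```java
// Minimal sketch of an average vs. peak workload estimate (all numbers are
// hypothetical): busy-hour users and a peaking factor turn business volumes
// into a transactions-per-second target.
public class PeakWorkloadEstimate {

    public static void main(String[] args) {
        int activeUsers = 5000;                  // users during the busy hour
        double txPerUserPerHour = 12.0;          // key business transactions per user
        double peakingFactor = 3.0;              // peak hour vs. average hour

        double avgTps = activeUsers * txPerUserPerHour / 3600.0;
        double peakTps = avgTps * peakingFactor;

        System.out.printf("average = %.1f tps, peak = %.1f tps%n", avgTps, peakTps);
    }
}
```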

Your projects must have a performance risk profile defined. There may be no risk, or there may be significant risk.

The next post will be about the business workflows.

Where is the SPE team?

Where does a Software Performance Engineering team fit into the Enterprise?

Shared services and business units

Many performance engineering teams end up somewhere, rather than being placed somewhere. There tends not to be much in the way of preplanning or discussion with the senior IT management team. In my experience, teams have been placed in shared services organizations such as:

  1. Production/Operations,
  2. Quality Assurance,
  3. Development,
  4. Architecture,
  5. Testing Center,

where they have to support many different applications and provide different services across the Enterprise. Often, the business units do not know how to engage with a shared services team, and the PE team does not know how to engage the business.

When placed in a business unit, this typically means the business is aware of the value that a PE team brings. The team has a more senior leader who is technical, part project manager, and able to interact with the business to ensure the team is aligned. The PE team will also be involved in more technical and product evaluations. Each organization influences the goals and purpose of the PE team. There must be an Enterprise performance engineering career path.

Care and feeding of the core PE team

The high-value PE team usually has multi-disciplined people, people with a wider technical skillset than most other teams within IT. A low-value performance testing team typically has a narrow set of skills, as their mission is to execute performance tests, and they do not understand the system under test very well. A critical part of managing a PE and a PT team is to make sure there is a well-defined career path that helps shape the more junior people and helps extend the technical and non-technical skillsets of the more senior people. They should also grow their understanding of the business.

Enterprise architecture

The EA team is involved in many of the Enterprise's large-scale projects, and they usually have a handle on the risks of the large projects. The EA team is involved early in the SDLC; they are a technical team that understands the business. When a PE team is placed here, it can have Enterprise-wide visibility into the key projects, which allows the team manager to insert the team into the high-risk projects and to see the planned projects.

The Director of PE must be a direct report to the leader of the EA team. The goals of the typical EA team are very much in alignment with the goals of the PE team: the EA mission is to translate business requirements into a technical solution for business value. The PE mission is to help manage risk: the risk that the applications will not meet the user experience, and the risk that the applications will not achieve the business goals of scalability and reliability.

The risk here for the PE organization is that the team does not stay involved through the development and testing process. The PE career path here can be an issue, as the way to promotion is to become an EA. The EAs are involved early, and once the design is approved, they are not involved in the detailed development. The PE team must be involved in the full SDLC.

Quality assurance

Most QA organizations are focused on the functional requirements of the application. They are involved later in the SDLC; however, there is a trend to involve them in the requirements phase to define the test cases at that time. When a PE team is placed here, it tends to be very testing focused. The project development teams involve the PE team when getting ready to run performance tests, not during the design and development phases.

The QA team is very focused on the business needs; however, they tend not to be technical. There are very few software developers on the QA team. A high-value performance engineering team is technical and business aware. This lack of technical focus within the group will impact the PE team and the PE career path. The QA career path is different from the PE path. QA can be a manual process in many organizations and lack the drive to automate testing. The PE team needs automation to be successful as technology changes.

The risks here are not enough focus on the technical skillset, too much focus on testing and not enough on design and development, and an unclear career path.

Operations

The goal of Operations is to keep the production applications running smoothly and performing well, where all is well and the customers are happy. The Ops team is the end of the line for the application, and they often feel the pain of poorly performing applications, where they have to make them perform. Often the development teams do not make the application perform well; unfortunately, response time, scalability, and utilization of system resources become Ops' problem. They were often not involved at all during design and development. This is a technical team, often with a more limited understanding of the business needs.

The operations team puts in place monitoring and measurement processes and tools for the applications. The performance team placed in this group usually has a capacity planning and troubleshooting focus. This is because when an application moved into production, historically it may not have performed well, and the Ops team had to fix it. They monitor the resource consumption of the production systems, they understand the workload of the applications, and they can determine when it changes.

The performance team in this group will be involved in predicting the expected capacity of new or modified applications; they evaluate the workload, review performance test results, and determine if additional computing resources are required. They will have access to the production workload and gain a good understanding of how the application is used. This information is critical to designing a high-value performance test plan.

The risk for the PE team in this group is a reactive focus and late involvement in the SDLC, where they are not aware early enough of the changes to the applications. They may not be involved in the performance testing of the application and may have limited access to the results of the testing. The career path is at risk here as well.

Business unit

Large business units set their own priorities and can have large, critical development programs that span years. The IT and business leadership teams have bought into the value that PE brings and how it mitigates risks for the business. These large programs will also have many different releases, each with added business functionality and complexity. The development team is also part of the business unit. The business unit may set up the PE team as a shared resource within the BU, to be leveraged across the critical business applications.

When a PE team is placed within the business unit, it is viewed as a critical success factor in each major release. The development process in this situation often has PE activities embedded in the SDLC: non-functional requirements are defined; design reviews, code reviews, and unit testing for performance are done; performance testing is executed; and production monitoring and measuring are in place. This is a more integrated team. The PE team is very in tune with the business goals and priorities. The PE team leads the PT team as well; there are well-defined performance test scenarios, and the expected outcome is well known.

The risk in this case is minimal; however, the career path will need to be maintained outside the program. There will still be the need for an Enterprise performance engineering leader who can manage training and career paths.

Scattered Performance team

In this case, a performance engineering team does not exist. There are key people scattered across the organization who can take on the role of PE when needed. They can be in any of the organizations mentioned previously. Often they are brought together (summoned) when there is a critical performance problem. This structure may be good enough for the business; production issues related to performance and scalability may be few and far between.

A performance testing center.

The business has pushed for a low-cost performance testing team that is typically in a different geography or offshore. The development team or the QA team will define the test scenarios and test cases to be executed. In this case, you have people who are not performance engineers defining performance test cases for a testing team that does not have a performance engineer. The remote performance testing team often has very little understanding of the application under test. The PT team produces basic performance reports from the testing tool.

In some cases, the PT team cannot execute a test due to a technical issue that they cannot solve. They end up waiting until the next day for the development team to solve it. The value in a PT team comes from the ability to understand the application, overcome many technical issues, and provide insightful test results. The testing center model requires more involvement from the development team; in some cases so much involvement that the development team ends up running the test.

Agile and PE

You are not Agile

Integrating software performance engineering with Agile software development methods. This should be easy, right? These two methods are not naturally suited for each other:

Performance engineering has rigorous, defined methods for capturing non-functional requirements in the development cycle, and performance testing requires production-like test systems with a proper transaction workload mix and a large database. Performance testing can introduce more time into a release schedule when the application was designed and developed without well-defined performance and scalability requirements. The value in performance engineering methods is to manage risk and to design and build highly scalable systems that support the business growth.

Agile methods start with stories and themes, a defined system architecture, and a partial list of required features. The team leader defines a series of releases comprised of a series of sprints. Each release will have partial features and functions. Each sprint may not finish all the features it lists, and the unfinished features need to be added to the feature backlog. The schedule dominates the decisions, and the value of Agile methods is to get new features into production quickly.

The Challenge

Businesses are adopting Agile methods for larger and larger software development projects. The projects are moving from a department focus to an Enterprise focus. The teams are getting larger and more distributed. The non-functional requirements of systems are becoming more stringent. The end-user experience is critical.

How do we implement performance engineering and performance testing tasks and activities in Agile methods? The introduction of PE/PT cannot compromise the original intent of Agile: faster, partial delivery of software. The goal of Agile methods is to produce frequent working copies of the system, with incomplete features. Features are introduced in each release. Traditional performance testing occurs after all the features have been developed and the application is nearing production.

This is not about fitting Agile methods into performance engineering; it is about adapting performance engineering to Agile methods. Businesses are using Agile to accomplish business goals, and that goal is speed to market.

The performance team must approach the project the same way the development team does: people over process, multi-disciplined, learning and adapting. The performance engineer and the performance tester/scripter must understand the performance test process extremely well. During an Agile performance sprint, the performance team will have to adjust and adapt to meet the timeline. They will have to communicate performance issues to the development team so the fixes can get into the next sprint backlog.

Amending performance terms to Agile terms

Define the performance theme for the project. This is a project-wide message to let everyone know the project has a performance and scalability focus, and that the risks associated with the project require PE tasks, activities, and people. It must be conveyed to the architect and the lead technical person that best practices, tips, and techniques for performance and scalability must be used for this project.

Performance stories – The performance tests for the project, the workload models, and the test types. How do you add performance needs to the story?

Performance backlog – The performance tests that need to be executed and the performance issues uncovered. When a performance issue is found, it must be put in the sprint/release backlog.

Spike – How will the performance tests be executed in the context of the release schedule and sprints? Will there be a focused, out-of-band set of performance tests?

Definition of Done – Clearly defined exit criteria for the performance tests.

New role on the Agile team – A performance engineer for architecture reviews, code reviews, database reviews, and test planning.

The development team

Since the project has been given a performance theme, the development team must use best practices for performance when developing code and services. The skillsets and techniques required to design and build highly performing systems must be communicated to the team. A critical success factor is that the development team builds performance into the system from the beginning. This will help eliminate rework or technical debt due to poorly performing code. Performance is designed into applications; it is not tested into applications. If the development team builds for performance, the performance testing process will be much easier.

Performance testing

How many environments are available for the releases and sprints? Is there a dedicated performance testing environment ready to go, so performance testing can start immediately? The environment must be ready to begin testing in order to keep the value of Agile going.

Will there be multiple rounds (sprints) of PT, where one or more tests are executed with various degrees of partial feature completion? There could be three or more PT sprints, plus a final PT once all the features have been completed.

A round of performance testing will be done at the end of either a release or a sprint.  The performance team must be fully aware of the features available for testing. Each feature will require the development of one or more test scripts, with the supporting data.

Performance test database: tracking the data readiness. During the course of the sprints and releases, the database structure may change. The performance team must be fully aware of the changes and understand the impact on the test schedule. If the database changes are significant, you may have to reload the performance database. This would need to be accounted for in the project schedule. The performance and development teams should automate the data creation and reload processes as much as possible to adhere to the values of Agile methods.

The typical performance test project has a startup phase where the test suites are planned, the test cases are identified, and the test scripts are created. The scripts and test cases are vetted and prepped for formal test execution. In the Agile approach, the performance test cases and the performance test scripts are developed along the way. The performance team will use an Agile approach as well.

Performance testing in Sprints.

The performance plan will follow the same Agile methodology: the upfront planning, the testing lab build-out (similar to the architecture phase), then a series of sprints leading up to a final release. The workload will evolve at each sprint. The sprint should be three weeks, with the rule being "make no changes"; this will be a measurement-only exercise. The first week is for recording new scripts, updating the test user profiles, and modifying the database as required. Then there will be two weeks of test execution, with a report listing the issues.

more to come..