Operational Intelligence


Dateline: Monday, April 1st.

Big Data and software performance engineering combine to help eliminate the government debt, with the ability to collect real-time speed information from every car on the highway and instant, performance-optimized analytics.

Speed Limits

In an effort to continue reducing national and local government debt, the car makers and a secret government lab have created the ability for every car to instantly broadcast its current speed and plate number, encrypted of course, readable only by those who need it.  This allows the state and federal governments to instantly collect new revenue.  The wireless broadcast of the automobile's speed and plate number will be good within a half-mile range of the car, where remote collectors can assess the fines as they happen.

There are a few options for the program: before each driver is charged, they will be given the option of paying to speed for the whole day or for just this instance.  This lets the driver purchase in bulk, at a reduced rate, for the day.  If you have urgent meetings, you can continue to speed without being pulled over for the rest of the day.  This is a new and revolutionary take on the Speed Pass.  They still have to work through refunds for when the driver cannot manage to exceed the speed limit; perhaps they will roll it forward.  Drivers can pay through iTunes, PayPal, or credit cards.

Upcoming Releases

It is rumored that the next release of the program will introduce a step plan, in increments of 5 MPH over the posted limit, to create a new premium tier option.  The government is also looking at maximizing its revenue by not allowing the insurance companies to hit the driver with a surcharge for speeding. One anonymous insurance source is quoted as saying, “What the hell? We’ll show them who’s boss.”

The implications for Big Data and performance engineering are tremendous.  This type of service offering by the federal and state governments was only made possible by breakthrough advances in the Big Data and software performance industries. Chris, a long-time data and performance engineer, says, “This is an outstanding combination of the two domains,” and wants to know if it is shovel ready.

Airlines get in the act

In a related story on remote monitoring, payment, and data collection, a major airline is looking at adding remote sensors in every seat to record the weight of each passenger.  Then, after deep analytics, the system will charge extra to passengers who are over the average adult weight.  The program is still in beta; the airline is working through how to identify males and females based on name, when to take the reading, and what to do if people change seats.


System log data is Big Data.

Server log file

Over the years I have been involved in a number of production rescue projects, where the system is either unavailable or in a degraded state. These are complex systems with many technical components. These projects often start with a group of diverse people and teams, each saying their component is working fine. The devil is in the details. In order to find the problem, we have to look at the logs and configurations and correlate the events. Many companies have partial solutions, where you have to get the logs from many diverse systems and do the analysis by hand (tailing files, grepping, Perl, Excel). Time and resources are consumed to find the issues. Usually there is no central dashboard or reporting; most of the time is spent finding the right logs and making the data useful, all the while the system is still not working.  There are a number of new products out there to help with this, by deriving Operational Intelligence from machine data.
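To make the manual version of this concrete, here is a minimal sketch of the kind of throwaway correlation script these rescue projects lean on: it merges timestamped lines from several component logs into one chronological timeline. The file names and timestamp format here are assumptions for illustration; every real system has its own.

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical component logs; real names and formats differ per system.
LOG_FILES = ["web.log", "app.log", "db.log"]
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def merged_timeline(paths):
    """Merge timestamped lines from several logs into one sorted timeline."""
    events = []
    for path in paths:
        for line in Path(path).read_text().splitlines():
            m = TS_PATTERN.match(line)
            if m:
                ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
                events.append((ts, path, line))
    # Chronological order across all components in one view.
    return sorted(events)
```

With the events interleaved by time, the error in the application log lines up visibly with the slow query in the database log and the 500 in the web log, which is exactly the correlation step that otherwise eats the team's time.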

Machine Data is Big Data

Complex business applications and their supporting systems are a rich source of data. Every technical component in the system produces logging information.  There are detailed logs on every transaction that is processed successfully and there are error or exception logs for failed transactions.
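As a toy illustration of that success/error split, the sketch below tallies transaction log lines by level keyword. The keywords are an assumption for the example; each real component has its own log format.

```python
def tally_outcomes(lines):
    """Count successful vs. failed transaction log lines by level keyword."""
    ok = failed = 0
    for line in lines:
        upper = line.upper()
        # Treat ERROR/EXCEPTION lines as failed transactions, everything
        # else (INFO, WARN, audit entries) as processed successfully.
        if "ERROR" in upper or "EXCEPTION" in upper:
            failed += 1
        else:
            ok += 1
    return ok, failed
```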

The current definition of Big Data includes:

  • Variety: a large number of data sources and formats; system logs have that covered.
  • Velocity: rapidly changing and fast moving; log records are created by the hundreds every second.
  • Volume: data is recorded for events and for time periods.

Enterprise-class business applications can have hundreds of technical components.  One business transaction will fan out and create hundreds of log and audit table entries: the Apache web server records each request, the WebSphere or WebLogic server records requests, the application within the container creates log entries, and the database server, the message servers, and the storage platform all record events of their own.
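That fan-out is why a correlation key matters. Assuming the entries have already been parsed down to (component, transaction ID, message) tuples, which is itself a format-specific step, fanning them back in per business transaction might be sketched as:

```python
from collections import defaultdict

def group_by_transaction(entries):
    """Fan-in: gather every component's log entries for each transaction.

    entries: iterable of (component, transaction_id, message) tuples,
    assumed to be pre-parsed from the raw component log formats.
    """
    by_txn = defaultdict(list)
    for component, txn_id, message in entries:
        by_txn[txn_id].append((component, message))
    return dict(by_txn)
```

Given a transaction ID from a customer complaint, this gives you the web, application, and database entries for that one transaction in a single place.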

Do you have a big picture of your operations environment? This rich and complex technical environment presents a challenge for the operations and development teams, and for the software performance engineer. When response time slows down, or a component is not quite working as expected, or a production issue arises in general, root-cause analysis is problematic.  There are only partial solutions for providing the entire picture of the problem, and locating its causes can be time consuming.  The operations team is managing a larger and larger number of real and virtual environments. Plus, the business is asking for more frequent releases into production.

New tools for machine data

These tools not only help with trending the performance of key business transactions and with root-cause analysis that can track configuration changes (often the root of problems); there is also a big compliance and security benefit from them. Correlating system access logs can help with information forensics.

You need to collect all the machine data that is available to you:

  • Network packet flow information from Cisco routers and switches
  • Storage subsystem metrics from EMC or NetApp
  • Database metrics from MySQL, Oracle, and MS SQL Server
  • Application server logs from your Log4j messages
  • Microsoft WMI data
  • Web server information from Apache
  • Operating system information from AIX, Linux, and VMware
  • End-user experience data

Massive amounts of data are collected and indexed, then made available for real-time analysis. Alerts can be configured for key metrics. If you use these tools to monitor all the logs and configuration files, have dashboards created, and are alerting on the key metrics, you are way ahead on the problem-solving curve.
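The commercial tools configure alerting through their own interfaces, but the underlying idea is a simple threshold check over the indexed metrics. A minimal sketch, with made-up metric names and thresholds:

```python
def check_alerts(metrics, thresholds):
    """Return the subset of metrics that breached their configured threshold.

    metrics and thresholds are plain dicts; metric names here are
    illustrative, not from any particular product.
    """
    return {name: value
            for name, value in metrics.items()
            if name in thresholds and value > thresholds[name]}
```

In a real deployment the breach would trigger a notification or a dashboard indicator rather than just returning a dict, but the key-metric-versus-threshold comparison is the same.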

What is the ROI?

You have to spend money on these tools, so how do you position them with the keeper of the budgets?  Will this tool replace some other tool or set of tools?  Does your Application Performance Management road map already call for achieving Operational Intelligence?  Your ROI calculation demonstrates either a direct or an indirect benefit to the business. You can start with a free version of some of these products; I suggest you download them and use them in your testing environments.  Then, show the value derived when you have to solve a problem.

If you can make the claim that this tool will keep your retail web site from losing customers, then you have a direct benefit. Or you can make the claim that with this tool, fewer people are required for root-cause analysis. If any of your production outages were caused by product configuration problems, then these tools can help. You must know the cost of an outage: $5,000/minute, $15,000/minute, etc. Perhaps fewer people are needed to grow with the business; you can change your hiring profile and curve when one outstanding operations person can support twice the number of systems.
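Using the outage-cost figures above, the arithmetic is straightforward. All the numbers below are illustrative assumptions, not vendor pricing:

```python
def annual_outage_savings(cost_per_minute, minutes_avoided_per_year):
    """Direct benefit: outage minutes avoided times the cost per minute."""
    return cost_per_minute * minutes_avoided_per_year

def simple_roi(annual_savings, annual_tool_cost):
    """(benefit - cost) / cost, as a ratio; above zero, the tool pays for itself."""
    return (annual_savings - annual_tool_cost) / annual_tool_cost
```

At $5,000/minute, avoiding just one hour of outage per year is $300,000 of benefit; against a hypothetical $100,000 annual tool cost, that is an ROI of 2.0, before counting any staffing savings.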

These are the things the forward-looking software performance engineer must be doing to continue to add value to the business.

Can you make the case that these help with business agility?

Check them out.

Sumo Logic, Splunk, Alert Logic, Loggly, LogRhythm, and more; Google “machine data.”