Why do we need IT Operations Analytics?

Recent computer system outages at Southwest and Delta Airlines forced the cancellation of several flights and delayed several others, causing millions of dollars in lost revenue. Beyond the direct losses, such outages can have a longer-term impact on brand image and customer loyalty.

Enterprise applications like SAP, Oracle and Salesforce.com are at the heart of many corporate operations. Reliable operation of these applications is mission-critical, and any event that impacts application availability can be disastrous to the business.

In “The real cost of downtime”, Alan Shimel refers to an IDC report’s findings that, for the Fortune 1000 (taken together):

  • Average total cost of unplanned application downtime is up to $2.5 billion/year
  • Average hourly cost of an infrastructure failure is $100,000
  • Average cost of a critical application failure per hour is $500,000 to $1 million

It’s no wonder that IT Operations is getting closer attention from executive management.

You need four key elements to ensure uninterrupted system operations:

  1. Reliable infrastructure with built-in redundancy and no single point of failure.
  2. A periodic audit process to ensure infrastructure elements meet the above requirements.
  3. Adequate system resources to handle fluctuating processing requirements.
  4. IT Operations Analytics (ITOA) to detect any infrastructure or resource issues before they impact system operation.

ITOA applies data analysis tools and techniques to IT operational data, deriving actionable intelligence that helps predict and prevent system outages.

Data Sources for ITOA

ITOA tools ingest data from multiple sources, including:

  • Logs generated by each layer of the application infrastructure: application, database, operating system, storage and network.
  • System monitoring agents installed on various application infrastructure components.
  • Wire data, which is transmitted over the network between the application infrastructure elements. For example, the application layer communicates with the database over the network, and this traffic can be captured for analysis. Several vendors, such as AppDynamics, ExtraHop, Riverbed and NetScout, offer Application Performance Monitoring (APM) tools that use wire data for analysis.
  • Synthetic or probe data, which is collected by end-user experience monitoring tools. Probes deployed outside the application infrastructure interact with the systems using synthetic (dummy) transactions to measure availability, response time and other aspects of the system.
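
To make the idea of a synthetic probe concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular monitoring product works. The names ProbeResult, run_probe and summarize are hypothetical, and the "transaction" is a plain callable standing in for whatever dummy request (a login, a search, an order lookup) a real probe would send to the system.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    available: bool              # did the synthetic transaction succeed?
    response_time_s: float       # wall-clock duration of the attempt
    error: Optional[str] = None  # failure reason, if any


def run_probe(transaction: Callable[[], None]) -> ProbeResult:
    """Run one synthetic transaction and record availability and response time."""
    start = time.perf_counter()
    try:
        transaction()  # hypothetical stand-in for a real dummy request
    except Exception as exc:
        return ProbeResult(False, time.perf_counter() - start, str(exc))
    return ProbeResult(True, time.perf_counter() - start)


def summarize(results: List[ProbeResult]) -> dict:
    """Aggregate probe results into availability and average response time."""
    successes = [r for r in results if r.available]
    return {
        "availability_pct": 100.0 * len(successes) / len(results),
        # Average only over successful probes; None if every probe failed.
        "avg_response_time_s": (
            sum(r.response_time_s for r in successes) / len(successes)
            if successes else None
        ),
    }


# Illustrative stand-ins for real transactions (assumptions, not real endpoints).
def healthy_transaction() -> None:
    time.sleep(0.01)


def failing_transaction() -> None:
    raise RuntimeError("db down")


if __name__ == "__main__":
    results = [run_probe(healthy_transaction) for _ in range(3)]
    results.append(run_probe(failing_transaction))
    print(summarize(results))
```

In practice the probe would run on a schedule from outside the application infrastructure, and the results would be fed into the same analytics pipeline as the logs, agent data and wire data.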

Though no single source can provide all the data needed to predict and prevent system outages, the combined intelligence derived from all these sources can help with early detection of system issues. However, the total volume of data from these sources is enormous, so in practice ITOA is a big data problem that requires data science and machine learning techniques. Proactively anticipating failures will go a long way towards improving system availability. With the advanced tools and technologies available today, it no longer makes sense to continue relying on inefficient and cumbersome manual troubleshooting.
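
As a rough illustration of what "early detection" can mean, the sketch below flags readings that deviate sharply from the recent past using a rolling z-score. This is a deliberately simple statistical baseline, not the machine learning approach a production ITOA tool would use. The function name detect_anomalies, the window size, the threshold and the sample response times are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev
from typing import List, Sequence


def detect_anomalies(values: Sequence[float], window: int = 5,
                     threshold: float = 3.0) -> List[int]:
    """Return indices of values that lie more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    history = deque(maxlen=window)  # sliding window of recent readings
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat window has sigma == 0; skip it rather than
            # divide by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, with one obvious spike.
    response_times_ms = [100, 102, 98, 101, 99, 100, 250, 101]
    print(detect_anomalies(response_times_ms))  # prints [6]
```

Note that the spike itself enters the window afterwards and inflates the standard deviation, which makes the next few readings harder to flag. Real tools handle this, along with seasonality and correlation across data sources, with far more sophisticated models.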

While several vendors offer ITOA tools, managing SAP operations poses a special challenge, one that requires well-trained administration teams and regular audits to ensure the stability of complex IT environments. To provide a stable operating environment for applications, it is important to look at the factors that contribute to application failure.

Stay tuned for my next post, where I will discuss the reasons for application failure in SAP and other IT environments.
