The Monitoring Pyramid

At DevOpsGroup, our monitoring philosophy is simple - if your customers know about an issue before you do, you’ve failed.

The 5 levels of the monitoring pyramid

The Monitoring Pyramid underpins our approach to monitoring and ensures we deliver a first-class service to our Operate customers. There are ten areas of monitoring that make up the pyramid, split across five levels:

  • Insight
  • Customer
  • Code
  • Platform
  • SIEM (Security Information and Event Management)

At the top of the pyramid are the dashboards that give our customers and our operations engineers insight into the current state of the system. Beneath them, the remaining seven areas are grouped into the Customer, Code and Platform levels, and the whole solution is built on a base of security monitoring that provides threat detection, compliance, governance and auditing.

Starting from the base of the pyramid, we work upward through each level and the types of tooling used in each area. Let’s look at each stage in more detail:

SIEM (Security Information and Event Management)

Security

Security Information and Event Management (SIEM) involves collecting and analysing information to detect suspicious behaviour or unauthorised system changes on your network, defining which types of behaviour should trigger alerts, and taking action on those alerts as needed.
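To make "defining which types of behaviour should trigger alerts" concrete, here is a minimal sketch of one classic correlation rule: flag any source address that produces too many failed logins within a short sliding window. The `AuthEvent` record, the `detect_brute_force` function and the default thresholds (more than 5 failures in 60 seconds) are hypothetical illustrations; real SIEM products express rules like this in their own query or rule languages.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class AuthEvent:
    timestamp: float  # seconds since epoch
    source_ip: str
    success: bool


def detect_brute_force(events, max_failures=5, window_seconds=60.0):
    """Return the source IPs with more than `max_failures` failed logins
    inside any sliding window of `window_seconds` (hypothetical rule)."""
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    flagged = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.success:
            continue
        window = recent[event.source_ip]
        window.append(event.timestamp)
        # Discard failures that have fallen out of the sliding window.
        while event.timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_failures:
            flagged.add(event.source_ip)
    return flagged
```

In a real deployment the flagged addresses would raise an alert or open a ticket rather than simply being returned.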

Platform

Service

Monitoring the core service layer that sits between the platform and the application: for instance, checking that IIS is running, that SQL Server is running, and so on.
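A simple form of service-layer check is to confirm that each service accepts connections on its port. The sketch below assumes a TCP connection test is an acceptable proxy for "the service is running"; it does not query the Windows service manager or verify that the service returns correct responses. The host names in `CHECKS` are placeholders, and `service_is_listening` and `run_checks` are hypothetical helpers; ports 80 and 1433 are the defaults for IIS over HTTP and SQL Server.

```python
import socket
from typing import Dict, Tuple


def service_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and name-resolution failures land here.
        return False


def run_checks(checks: Dict[str, Tuple[str, int]]) -> Dict[str, str]:
    """Map each named check to "UP" or "DOWN"."""
    return {
        name: "UP" if service_is_listening(host, port) else "DOWN"
        for name, (host, port) in checks.items()
    }


# Hypothetical inventory: host names are placeholders, ports are the defaults.
CHECKS = {
    "IIS (HTTP)": ("web01.example.internal", 80),
    "SQL Server": ("db01.example.internal", 1433),
}
```

Calling `run_checks(CHECKS)` from a scheduler every minute or so, and alerting on any "DOWN", gives a basic service-layer monitor.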

Infrastructure

This is what IT departments have traditionally monitored (e.g. CPU, networks, hard drives). It ensures that the underlying platform (infrastructure) a service or application runs on is monitored, with data collected systematically to provide alerts on unexpected downtime, network intrusion and resource saturation.
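Alerting on resource saturation usually means comparing collected samples against a threshold, and requiring the breach to persist so that a single short spike does not page anyone. The sketch below illustrates that idea; the `Threshold` class, `is_saturated` and `disk_used_percent` are hypothetical helpers, and a limit of 90% sustained over three samples is an illustrative assumption rather than a recommended value.

```python
import shutil
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Threshold:
    metric: str     # hypothetical metric name, e.g. "cpu_percent"
    limit: float    # alert when samples exceed this value...
    sustained: int  # ...for this many consecutive samples

    def __post_init__(self):
        if self.sustained < 1:
            raise ValueError("sustained must be at least 1")


def is_saturated(samples: Sequence[float], threshold: Threshold) -> bool:
    """True if the most recent `sustained` samples all exceed the limit."""
    if len(samples) < threshold.sustained:
        return False
    recent = samples[-threshold.sustained:]
    return all(value > threshold.limit for value in recent)


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

A monitoring agent would append a new reading (for example from `disk_used_percent()`) every collection interval and evaluate `is_saturated` against the recent history.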

Code

Synthetic Transaction Monitoring (STM)

Synthetic monitoring (also known as active or proactive monitoring) is website monitoring performed using web browser emulation or scripted recordings of web transactions. Behavioural scripts (or paths) are created to simulate the actions a customer or end user would take on a site. Those paths are then run continuously at specified intervals to measure functionality, availability and response time.
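The core of a synthetic check is a scripted path: an ordered list of steps, each timed and checked for success. The sketch below assumes each step is a Python callable that returns `True` on success; `run_path`, `StepResult` and the `http_ok` step are hypothetical names, and a browser-emulation tool would replace `http_ok` with real page interactions. The runner stops at the first failure, on the assumption that later steps in a user journey depend on earlier ones.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    error: str = ""


def run_path(steps: List[Tuple[str, Callable[[], bool]]]) -> List[StepResult]:
    """Run each scripted step in order, timing it; stop at the first failure."""
    results = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            ok, error = bool(step()), ""
        except Exception as exc:  # a step that raises counts as a failed check
            ok, error = False, repr(exc)
        results.append(StepResult(name, ok, time.perf_counter() - start, error))
        if not ok:
            break
    return results


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical step: the page loads with a 2xx status.

    urlopen raises HTTPError for 4xx/5xx, which run_path records as a failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return 200 <= response.status < 300
```

A path such as `[("home page", lambda: http_ok("https://shop.example.com/"))]` (placeholder URL) would then be run on a schedule, with each result's `ok` and `seconds` feeding availability and response-time alerts.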

APM

Application performance management (APM) is the monitoring and management of the performance and availability of software applications. APM strives to detect and diagnose complex application performance problems in order to maintain an expected level of service. APM is “the translation of IT metrics into business meaning (i.e. value).”

APM tools typically instrument your application automatically, capturing detailed performance information that helps you debug it. Some tools require an agent inside your application process, which may mean including an APM agent library in your code.
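APM agents hook into frameworks and libraries automatically and ship their data to a backend, but the underlying idea of instrumentation can be shown by hand. This sketch wraps a function so that every call's duration and any exception are recorded in memory; the `instrument` decorator, the `TIMINGS` and `ERRORS` stores and the `checkout` function are all hypothetical stand-ins for what an agent would do.

```python
import functools
import time
from collections import defaultdict

# In-process stores: function name -> call durations (s) / error count.
TIMINGS = defaultdict(list)
ERRORS = defaultdict(int)


def instrument(func):
    """Record the duration of every call to func, and count exceptions."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS[func.__qualname__] += 1
            raise
        finally:
            TIMINGS[func.__qualname__].append(time.perf_counter() - start)
    return wrapper


@instrument
def checkout(basket_total):
    # Hypothetical business function standing in for real application code.
    if basket_total < 0:
        raise ValueError("negative basket total")
    return round(basket_total * 1.2, 2)
```

Every call to `checkout`, successful or not, leaves a timing behind, which is the raw material APM tools aggregate into latency and error-rate views.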

Log analytics

Log analysis (or system and network log analysis) is an art and science seeking to make sense out of computer-generated records (also called log or audit trail records). The process of creating such records is called data logging.

Typical reasons why people perform log analysis are:

  • Compliance with security policies
  • Compliance with audit or regulation
  • System troubleshooting
  • Forensics (during investigations or in response to a subpoena)
  • Security incident response
  • Understanding online user behaviour
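For system troubleshooting, a typical first pass over web server logs is to count responses by status class and find which paths are returning server errors. The sketch below assumes access-log lines in the common/combined log format used by Apache and Nginx by default; the `LOG_LINE` pattern and `summarise` function are illustrative, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed format: host ident user [time] "METHOD path protocol" status size ...
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarise(lines):
    """Count responses by status class and the paths returning 5xx errors."""
    by_class = Counter()
    server_errors = Counter()
    unparsed = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            unparsed += 1  # keep a count rather than silently dropping lines
            continue
        status = int(match["status"])
        by_class[f"{status // 100}xx"] += 1
        if status >= 500:
            server_errors[match["path"]] += 1
    return by_class, server_errors, unparsed
```

Tracking `unparsed` matters: a sudden rise in lines that do not match usually means the log format changed, which would otherwise hide errors from the analysis.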

Customer

Analytics

Web analytics is the measurement, collection, analysis and reporting of web data to understand and optimise web usage. It is not just a process for measuring web traffic: it can also be used as a tool for business and market research, and to assess and improve the effectiveness of a website. Analytics is about understanding how your customers use your software, where they drop out of a process and where they get frustrated.
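"Where they drop out of a process" is usually answered with a funnel report. This sketch assumes each session is available as an ordered list of event names and that the funnel steps must be completed in order (other events in between are ignored); `steps_completed`, `funnel` and the event names are hypothetical.

```python
from typing import Dict, List, Tuple


def steps_completed(events: List[str], steps: List[str]) -> int:
    """How many funnel steps this session completed, in order."""
    done = 0
    for event in events:
        if done < len(steps) and event == steps[done]:
            done += 1
    return done


def funnel(steps: List[str],
           sessions: Dict[str, List[str]]) -> List[Tuple[str, int, float]]:
    """For each step: sessions reaching it, and % lost since the previous step."""
    completed = [steps_completed(events, steps) for events in sessions.values()]
    report = []
    previous = None
    for i, step in enumerate(steps):
        reached = sum(1 for c in completed if c > i)
        lost = 0.0 if not previous else 100.0 * (previous - reached) / previous
        report.append((step, reached, round(lost, 1)))
        previous = reached
    return report
```

For example, with steps `["view_product", "add_to_basket", "checkout", "payment"]`, a report showing a large percentage lost at `payment` points directly at the part of the journey worth investigating.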

Real User Monitoring (RUM)

Real user monitoring (RUM) is a passive monitoring technology that records every user interaction with a website, or every interaction between a client and a server or cloud-based application. Monitoring actual user interaction helps operators determine whether users are being served quickly and without errors and, if not, which part of a business process is failing. Software as a service (SaaS) and application service providers (ASPs) use RUM to monitor and manage the quality of service delivered to their clients. RUM data shows the actual service-level quality delivered to end users and is used to detect errors or slowdowns on websites.
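Because RUM data comes from real users, it is normally summarised with percentiles rather than averages, so that a slow minority is not hidden. This sketch assumes the browser-side script has already sent "beacons" as dictionaries with `page`, `load_ms` and `error` fields; that record shape and the `summarise_rum` function are hypothetical. It uses `statistics.quantiles` (Python 3.8+), which needs at least two samples, so single-sample pages fall back to that one value.

```python
import statistics
from collections import defaultdict


def summarise_rum(beacons):
    """Per page: sample count, median, p75/p95 load time (ms) and error rate.

    Each beacon is assumed to look like
    {"page": "/checkout", "load_ms": 1840, "error": False}."""
    by_page = defaultdict(list)
    errors = defaultdict(int)
    for beacon in beacons:
        by_page[beacon["page"]].append(beacon["load_ms"])
        errors[beacon["page"]] += bool(beacon["error"])
    summary = {}
    for page, times in by_page.items():
        if len(times) >= 2:
            # 99 cut points; index 74 is the 75th percentile, 94 the 95th.
            cuts = statistics.quantiles(times, n=100, method="inclusive")
            p75, p95 = cuts[74], cuts[94]
        else:
            p75 = p95 = times[0]
        summary[page] = {
            "samples": len(times),
            "median_ms": statistics.median(times),
            "p75_ms": p75,
            "p95_ms": p95,
            "error_rate": errors[page] / len(times),
        }
    return summary
```

Alerting on the 75th or 95th percentile per page, alongside the error rate, reflects what most real users actually experienced.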

Insight

Visualisation

Dashboards, alerts, tickets and reports, and using data to drive decisions about the direction of the service.

AI/ML

In a traditional monitoring model, it was down to users to decide which parameters to monitor and at what thresholds alarms should trigger. As IT systems grow in scale and complexity, this becomes an increasingly difficult task. AI and machine learning can be used to provide proactive monitoring that identifies anomalies in systems and highlights issues before they affect customers.
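As a point of comparison for learned anomaly detection, the sketch below shows a simple statistical baseline rather than machine learning: it flags a value that sits more than a few standard deviations from the mean of the preceding window (a rolling z-score), so no fixed threshold has to be chosen per metric. The `detect_anomalies` function, the window of 20 and the threshold of 3 are illustrative assumptions; ML-based tools go further, for example by learning seasonality.

```python
import statistics
from collections import deque


def detect_anomalies(values, window=20, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            # A perfectly flat history (stdev 0) is skipped to avoid dividing by zero.
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies
```

One design trade-off: an anomalous value enters the history window and inflates the standard deviation for the next few samples, which makes back-to-back anomalies harder to detect.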
