So, you’ve invested in the cloud. Most systems and applications are now built there or have moved there. Yet you’re still not moving as fast, far or cost-effectively as you’d like. If this sounds familiar, you probably need to start modernising operations.
Today, many operations engineers find themselves in a challenging position. They must contend with developers evolving products at a heightened pace, while pressures on system security and stability are greater than ever. Unless the organisation has invested in DevOps, they simply don’t have adequate processes, tools or capabilities to meet the demands of the digital age.
This can cause friction between operators and developers. The business can’t adapt to customers’ evolving needs fast enough because operations becomes a bottleneck on every new initiative. At the same time, the business is at risk if it can’t adopt the latest measures to maximise security and resilience.
Cloud-based systems and applications deserve – in fact require – sophisticated operations support. Trying to replicate traditional operations approaches in modern environments is a recipe for inefficiency and burnout.
It’s time to break the deadlock, which means challenging the conventional wisdom that prevents operations from moving with the times.
Cloud-based operations requires different approaches and skills to traditional operations. Acknowledging this, then taking steps to modernise and improve the way you operate in the cloud, will drive competitive advantage.
We believe there are five core aspects of modern operations. However, it’s not practical to embrace them all at once across the entire business. Instead, think about where they could unlock value soonest in specific areas, and build your capabilities from there.
The five pillars of modern operations
1. Operability by design
It’s important to consider operability at an early stage in any system or product development.
But what do we mean by ‘operability’?
It refers to features that make it easier, safer and more cost-effective to manage a product or system in production. For example, a system design might be ‘horizontally scalable’, making it easier and quicker to handle changing customer demand patterns. The goal is to ensure the entire system, end-to-end, is simple to operate and maintain over its entire lifecycle.
Taking a little extra time upfront to ensure digital systems are easy to run improves overall speed and efficiency for the longer term. Making major fixes after something goes live is always more expensive and disruptive than identifying and rectifying issues during design.
Be disciplined about this. It reduces the likelihood of problems emerging later and lowers total cost of ownership. Better still, it improves resilience and cuts future toil, which will result in faster progress.
2. Agile ways of working
Applying an agile mindset to operations can be revelatory. With traditional operations, there’s an implicit understanding that rigidity equates to stability. Processes and tasks are fully planned and there’s little scope to adapt to emerging or rapidly changing requirements.
Yet this doesn’t prevent unplanned work from landing with operations engineers – and it’s often hard to ascertain what truly needs immediate attention and what could wait. So, operations gets bogged down in a continual cycle of firefighting. Engineers spend all their time addressing symptomatic issues and don’t have the opportunity to think about, let alone work on, the underlying cause.
Adopting an agile approach can help turn this around. Create space for planned and unplanned work; ensure toil (mundane work that could probably be automated) and engineering work (which drives tangible improvements) are properly differentiated and accommodated.
This might seem impossible when operations engineers are already at breaking point handling urgent tasks. But combined with other measures such as automation and self-service, it will reduce stress, improve job fulfilment and unlock better performance.
3. Automate everything
‘Automate all the things!’ is a well-known DevOps meme. Approaches such as ‘infrastructure as code’ with Terraform and ‘configuration as code’ with Puppet are critical to modern operations. We are also seeing the rise of ‘event-driven automation’ with tools like Relay, which seeks to simplify the task of creating automated responses to common operational challenges. (For a useful article on event-driven automation, check out this blogpost by Kenaz Kwa.)
The benefits of automation are far reaching. It eliminates toil, improves capacity planning and enables issues to be fixed without manual intervention. Reliability and consistency are much improved too. You can be certain that the correct and most recently updated approach will be used every time.
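To make the idea of event-driven automation more concrete, here is a minimal sketch in Python. It is not how Relay or any other product works internally; the event names, the `on` decorator and the `dispatch` function are all hypothetical, chosen only to show the pattern of mapping a known operational event to an automated response and escalating anything unrecognised to a person.

```python
from typing import Callable, Dict, List

# Hypothetical event shape: a plain dict with a "type" key plus details.
Event = Dict[str, str]
Handler = Callable[[Event], str]

# Registry of automated responses, keyed by event type.
_handlers: Dict[str, List[Handler]] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register the decorated function as a response to one event type."""
    def register(handler: Handler) -> Handler:
        _handlers.setdefault(event_type, []).append(handler)
        return handler
    return register


@on("disk_almost_full")
def clean_temp_files(event: Event) -> str:
    # A real handler would act on the host; this one only describes the action.
    return f"cleaned temp files on {event['host']}"


@on("certificate_expiring")
def renew_certificate(event: Event) -> str:
    return f"requested renewal for {event['domain']}"


def dispatch(event: Event) -> List[str]:
    """Run every handler registered for the event; escalate unknown events."""
    handlers = _handlers.get(event["type"])
    if not handlers:
        return [f"no automation for {event['type']}; escalate to an engineer"]
    return [handler(event) for handler in handlers]
```

Calling `dispatch({"type": "disk_almost_full", "host": "web-1"})` runs the clean-up handler without anyone being paged, while an event type with no registered handler falls through to a human.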
All of this means operators have more time for engineering work. They can take a more strategic approach, focusing on factors that generate wider IT and business performance improvements.
For a scaling business, this is the single most important factor allowing your team’s capacity to grow in a non-linear fashion. As automation is extended, the number of services that a constant number of engineers can support increases, which means you can scale your revenues much faster than your costs.
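The arithmetic behind this can be sketched in a few lines of Python. The figures are purely illustrative assumptions rather than benchmarks, and `engineers_needed` is a hypothetical helper: it simply divides the total weekly toil across all services by the hours one engineer has available.

```python
import math


def engineers_needed(services: int, toil_hours_per_service: float,
                     hours_per_engineer: float = 40.0) -> int:
    """Engineers required to absorb the weekly toil across all services."""
    total_toil = services * toil_hours_per_service
    return math.ceil(total_toil / hours_per_engineer)


# Illustrative numbers only: 8 hours of weekly toil per service before
# automation, 2 hours afterwards.
before = engineers_needed(services=20, toil_hours_per_service=8.0)
grown_without_automation = engineers_needed(services=80, toil_hours_per_service=8.0)
grown_with_automation = engineers_needed(services=80, toil_hours_per_service=2.0)
```

Under these assumptions, quadrupling the number of services quadruples the headcount if toil per service stays the same, whereas cutting toil per service to a quarter lets the original team support the larger estate.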
4. Frictionless self-service
It’s nearly ten years since the National Institute of Standards and Technology (NIST) published its five essential characteristics of cloud computing. Yet one of the most fundamental of these – self-service – is still lacking in many cloud-based organisations. This is despite the fact that aligning with the NIST characteristics has been shown to enhance business performance (check out DORA’s 2018 Accelerate: State of DevOps report, which shows organisations adopting NIST’s five essential characteristics are 23 times more likely to be elite performers).
A self-service approach to operations essentially converts business needs into platforms used on demand by product teams, without operations intervention. Building self-service platforms alongside reusable architecture patterns as infrastructure-as-code via an ‘InnerSource’ model enables operability requirements like security to be built-in. This reduces the cost and effort needed to ensure great operability, whilst simultaneously enabling the organisation to move faster than ever before.
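As a rough illustration of operability requirements being built in, the Python sketch below shows a self-service request where the product team chooses only what it needs and the platform fixes the security settings. Everything here is hypothetical, including the `DatabaseRequest` type, the `provision_database` function and the size catalogue; a real platform would typically express this as infrastructure-as-code modules rather than a single function.

```python
from dataclasses import dataclass

# Hypothetical catalogue of sizes the platform team is willing to support.
ALLOWED_SIZES = {"small", "medium", "large"}


@dataclass(frozen=True)
class DatabaseRequest:
    team: str
    name: str
    size: str = "small"


def provision_database(request: DatabaseRequest) -> dict:
    """Turn a team's request into a full specification, with no ticket needed."""
    if request.size not in ALLOWED_SIZES:
        raise ValueError(
            f"unknown size {request.size!r}; choose one of {sorted(ALLOWED_SIZES)}"
        )
    return {
        "name": f"{request.team}-{request.name}",
        "size": request.size,
        # Guardrails are set by the platform, not chosen by the requesting team.
        "encrypted_at_rest": True,
        "publicly_accessible": False,
        "backups_enabled": True,
    }
```

The design choice worth noting is what the request type leaves out: a team cannot ask for an unencrypted or publicly accessible database, so operations does not need to review each request to be confident about security.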
Self-service operations demands a significant shift in mindset as well as an investment in the capabilities that enable it. The operations mentality needs to focus on how to ‘get out of the way’ of fast, secure releases to production. It’s about improving the overall management of applications, rather than imposing centralised control. Failure to embrace this self-service approach is probably the biggest blocker preventing cloud-based organisations achieving their full potential.
For an excellent and really detailed overview of self-service operations, take a look at this guide published by Rundeck.
5. Make tomorrow better than today
The central goal – and fundamental challenge – of modern operations is driving continual improvement of systems as well as keeping them running. Whilst ITIL has an entire key process dedicated to ‘Continual Service Improvement (CSI)’, we like the simplicity of the phrase coined by Google’s Stephen Thorne: ‘making tomorrow better than today’.
As a pioneer of Site Reliability Engineering, Thorne considers operations through that lens. Whether or not SRE per se is right for your business is another matter, but the underlying message holds regardless. Operators need to balance toil with engineering work so they can focus on making things better, not just completing reactive tasks.
According to Thorne, toil should be capped at a maximum of 50 percent of engineers’ time, leaving at least 50 percent for project work.
“SREs must have time to make tomorrow better than today, because if you’re not capping that toil and allowing them to actually go and implement that monitoring work, then all they’re doing is getting overloaded with toil and then they won’t be able to do any project work. The next time they need to do some things to improve the reliability of the system, they’re too overloaded.”
Stephen Thorne (Getting Started with SRE, DevOps Enterprise Summit 2018)
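A cap on toil is only useful if you measure against it. The Python sketch below is one simple way to do that, assuming a 50 percent cap and assuming a team already records how many hours go to toil and how many to project work. The function names and the hours used in the example are hypothetical.

```python
# Assumed cap: toil should take no more than half of an engineer's time.
TOIL_CAP = 0.5


def toil_share(toil_hours: float, project_hours: float) -> float:
    """Fraction of recorded time that was spent on toil."""
    total = toil_hours + project_hours
    if total <= 0:
        raise ValueError("no hours recorded")
    return toil_hours / total


def over_toil_cap(toil_hours: float, project_hours: float,
                  cap: float = TOIL_CAP) -> bool:
    """True when toil has crowded out project work beyond the cap."""
    return toil_share(toil_hours, project_hours) > cap
```

For example, an engineer who logged 30 hours of toil and 10 hours of project work in a week is at a 75 percent toil share, which is over the cap and a signal to redirect or automate some of that work.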
An iterative approach to modern operations
While it helps to consider the five pillars of modern operations individually, they are deeply interconnected. And that’s a good thing, because when you start to make improvements in one area, you’ll quickly see benefits in others.
As with any technical matter, we wouldn’t advocate a big bang approach to modern operations. Instead, pinpoint a couple of procedures that could make a positive impact in one area of the business, and expand from there. Build new operations capabilities with care and in consultation with the users who are affected.
When you empower engineers to solve the very problems that have been holding them back, business benefits will snowball.