New features, new functionality, new value-adds. Tech scale-ups are under constant pressure to evolve their customer offering and accelerate growth. But the underlying infrastructure and operations need attention too. In fact, the operational maturity of IT plays a vital role in the achievement of business goals.
What is operational maturity?
Operational maturity is about the consistency, reliability and resilience of the IT infrastructure, as well as the way it’s managed and maintained. It’s a measure of IT sophistication relative to an organisation’s size and stage of development.
For instance, we’ve previously written about the inevitable accrual of technical debt during the start-up phase, and the need to understand and manage it as you grow. This is part and parcel of operational maturity.
For a tech scale-up to achieve its full potential, it needs to strike an effective balance between the time and energy invested in operational maturity and the time and energy invested in product innovation.
Why operational maturity matters
A lack of operational maturity often reveals itself in the cracks in performance as a business starts to grow. It might be apparent in frequent outages, lengthy release cycles or laborious, error-prone processes. When you dig deeper, you may find that database performance isn’t scalable enough, disaster recovery isn’t fast enough, application security isn’t strong enough. All these things point to the fact that your operational maturity isn’t scaling in line with business needs.
When tell-tale performance issues arise, it’s definitely time to reprioritise operability. But deciding the best way to go about it isn’t easy.
Clearly, if you put all your time and energy into innovation with little emphasis on operability, problems are going to escalate. But putting the brakes on innovation for six months while you get operations in hand is risky too. There’s a good chance that competitors will get ahead of you in the innovation stakes, potentially luring away your customers. On the other hand, if you fail to build operational maturity into your plan, there may come a time when a major incident forces you to put everything else on hold while it’s sorted out. And remember, getting your customers back and regaining their trust after a major incident is far from certain.
The answer is to prioritise those aspects of operability that are most pertinent to your organisation or your industry and act fast. When you take small steps in the right places, you can tread a careful path to operational maturity and avoid bringing the business to its knees.
How to improve operational maturity
So, what does good operations look like? And how do you nurture it? There are two key resources that can help determine this: Google’s Site Reliability Engineering e-book and the AWS Well-Architected Framework.
The Site Reliability Engineering e-book is a collection of honest accounts from Google’s engineers about scenarios they encountered and challenges they faced as the business scaled. In the foreword, Mark Burgess says:
“Nothing here tells us how to solve problems universally, but that is the point. Stories like these are far more valuable than the code or designs they resulted in.”
We tend to agree with this stance. The book is packed with valuable insights and anecdotes, and its section on Google’s Service Reliability Hierarchy is especially useful.
Ultimately, product performance depends on each of the layers in this hierarchy, which Google depicts as a pyramid. First, you’ve got to have monitoring, or how will you know if the system is even working? Then you need the ability to respond to incidents when they occur. In a mature system, incident response goes beyond fighting fires to establishing what went wrong and why. This feeds into the post-mortem and root cause analysis, an area that is undergoing a lot of change in DevOps right now. People are waking up to the fact that if you conduct post-mortems in a blame-free way, you enhance psychological safety. When there’s no fear of reprisal, people are far more inclined to interrogate issues.
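To make the monitoring layer concrete, here is a minimal sketch in Python of the most basic question it answers: is the system working often enough? The `CheckResult` structure, the function names and the 99% target are all assumptions made for illustration. They are not taken from Google’s book or from any particular monitoring tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CheckResult:
    """One health-check probe (hypothetical structure, for illustration only)."""
    timestamp: int   # seconds since the start of the observation window
    healthy: bool    # True if the probe succeeded


def availability(results: list[CheckResult]) -> float:
    """Return the fraction of probes that reported healthy."""
    if not results:
        # No data at all is itself a monitoring failure, so refuse to guess.
        raise ValueError("no health-check results to evaluate")
    return sum(r.healthy for r in results) / len(results)


def needs_attention(results: list[CheckResult], target: float = 0.99) -> bool:
    """Return True when measured availability is below the assumed target."""
    return availability(results) < target


# Ten probes, two of which (at t=3 and t=7) failed.
probes = [CheckResult(timestamp=t, healthy=(t not in (3, 7))) for t in range(10)]
print(f"availability: {availability(probes):.0%}")
print("needs attention:", needs_attention(probes))
```

In practice the probes would come from a monitoring service rather than a hard-coded list, but the principle is the same: without this signal, there is nothing for incident response or post-mortems to build on.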
The mid-tier of the hierarchy focuses on testing and release procedures. This is where ‘everything as code’ and automation techniques come to the fore. When your business is growing quickly, handling these aspects of software development manually soon becomes unsustainable. Capacity planning also benefits from automation, with load balancers ensuring the available capacity is used to good effect.
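As a rough illustration of what automating capacity planning can mean, the sketch below estimates how many instances a service needs for a given peak load. The function name, the 25% headroom default and the figures in the example are assumptions chosen for illustration, not recommendations.

```python
import math


def instances_needed(peak_rps: float, per_instance_rps: float,
                     headroom: float = 0.25) -> int:
    """Estimate instance count for a peak load, with spare headroom.

    peak_rps:         expected peak requests per second (assumed input)
    per_instance_rps: requests per second one instance can serve (assumed input)
    headroom:         extra capacity to keep in reserve, as a fraction of peak
    """
    if peak_rps < 0 or per_instance_rps <= 0 or headroom < 0:
        raise ValueError("load and headroom must be non-negative, capacity positive")
    # Scale the peak up by the headroom, then round up to whole instances.
    return math.ceil(peak_rps * (1 + headroom) / per_instance_rps)


# A peak of 1,200 requests/s, 250 requests/s per instance, 25% headroom.
print(instances_needed(peak_rps=1200, per_instance_rps=250))
```

A calculation like this can run on a schedule against real traffic data, so that capacity decisions stop depending on someone remembering to revisit a spreadsheet.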
Towards the top of the pyramid sits development and finally the product itself. Both of these become more sophisticated and effective when properly supported by the lower tiers.
We find that an application-centric approach provides a good grounding for operational maturity. And the AWS Well-Architected Framework is an excellent tool to help identify and prioritise operational needs on an application-by-application basis.
We’ve written about Well-Architected before, and it’s familiar to many. But just to recap, it focuses on five key areas for each individual application:
- Operational excellence
- Security
- Reliability
- Performance efficiency
- Cost optimisation
Conducting a Well-Architected Review is probably the single most important step you can take on the road towards operational maturity. It may simply validate what you think you need to do. But it will almost always reveal something you hadn’t considered. And it’s these ‘unknown unknowns’ that pose the biggest threat to a scale-up if they aren’t brought to light.
Rearchitecting for growth
While every tech scale-up is different, in our experience many of the infrastructure and operations challenges are the same. Rather than putting a hold on innovation while they’re ironed out, we advocate making a little space to address some of them. So, you might rearchitect and modernise part of a given application with CI/CD code pipelines. This can deliver quick wins that reduce some of the day-to-day operational toil. And each hour of toil that’s eliminated can be put to better, more strategic use.
Over time, the cumulative impact of these changes turns the tide. The business steadily becomes more operationally mature as engineers have time for proactive work that oils the wheels of the IT infrastructure. Developers also have more freedom to innovate because they’re not being pulled into remediation work. This creates a more cohesive, collaborative and progressive environment. New ideas proceed faster, and they are less likely to trigger performance issues when they’re deployed. In this way, operational maturity helps the business achieve that sweet spot between innovation and reliability which drives the optimum customer experience.