DevOpsGroup engineers Mark Nash, Mike Lazenby, Matthew Dibble and Jamie Ward are experts in cloud cost control. As part of our Operate team they spend their days (and some of their nights) ensuring clients’ cloud-based operations run smoothly. An important aspect of this is finding ways to reduce cloud operating costs without compromising platform performance.
In this blog, they share five proven approaches that help keep cloud operating costs down. Ideally, they should be used together as part of a wider strategy which optimises costs and fosters operational maturity.
1. Develop a tagging procedure
Tagging is the backbone of a successful and cost-efficient cloud strategy. It enhances visibility, which is essential in the virtual cloud environment. Without tagging, it’s very easy to get lost and confused due to the sheer number of resources used.
As Mark explains: “Imagine a cloud estate where there are 20 VMs, some of which are Dev/Test, some are hosting production data, and others are not being used any more. If they are named ‘VM1’, ‘VM2’ and so on, it’s hard to identify what each of them is doing.”
An effective naming strategy reduces this risk, but renaming typically means recreating existing resources. Tagging is non-destructive, so you can keep the original name (such as VM1) and simply apply tags for ‘environment’, ‘use status’ and ‘owner’. This helps you filter through the estate, identifying which VMs can be turned off or removed, and who to talk to about them.
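As an illustration, filtering an estate by tags can be sketched in a few lines of Python. The resource inventory and tag keys below are hypothetical, not a real cloud API:

```python
# Minimal sketch: filtering a hypothetical resource inventory by tags.
# The VM data and tag keys ("environment", "use_status", "owner") are
# illustrative assumptions, not a real provider's API.

vms = [
    {"name": "VM1", "tags": {"environment": "dev", "use_status": "unused", "owner": "alice"}},
    {"name": "VM2", "tags": {"environment": "production", "use_status": "active", "owner": "bob"}},
    {"name": "VM3", "tags": {"environment": "test", "use_status": "unused", "owner": "alice"}},
]

def filter_by_tag(resources, key, value):
    """Return resources whose tag `key` equals `value`."""
    return [r for r in resources if r["tags"].get(key) == value]

# Candidates for shutdown or removal, plus who to talk to about them.
unused = filter_by_tag(vms, "use_status", "unused")
owners = sorted({r["tags"]["owner"] for r in unused})
```

In a real estate the inventory would come from the provider's API, but the filtering principle is the same: consistent tag keys make this kind of query trivial.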
Mike says the non-destructive nature of tagging also enhances flexibility: “You can adapt tags to reflect changes made in the cloud environment post-creation. It’s really useful being able to tweak tags so they better reflect the various resources and their roles without impacting service availability.”
Introduce a tagging protocol, and keep it up
We have our own protocol for tags at DevOpsGroup, but also work with clients’ own naming conventions if they have them.
A standardised approach improves the visibility and traceability of resources and workloads. For instance, using logical group names, such as ‘production systems’, means anyone accessing an environment can quickly see what’s what. This enables better management of resource groups too; if costs start to increase it’s easier to figure out the underlying cause.
It’s also good practice for tags to include a reference to the associated cost centre, so that when additional cloud operating costs are incurred they can be allocated to the relevant department.
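The cost-centre allocation described above amounts to a simple aggregation over tagged billing records. The records and tag name in this sketch are illustrative assumptions:

```python
from collections import defaultdict

# Minimal sketch: allocating spend to departments via a "cost_centre"
# tag. The billing records are hypothetical.
billing_records = [
    {"resource": "VM1", "cost": 120.0, "tags": {"cost_centre": "marketing"}},
    {"resource": "VM2", "cost": 340.0, "tags": {"cost_centre": "engineering"}},
    {"resource": "db1", "cost": 90.0,  "tags": {"cost_centre": "engineering"}},
    {"resource": "VM9", "cost": 25.0,  "tags": {}},  # untagged: flag for follow-up
]

def allocate_costs(records):
    """Sum costs per cost centre; untagged spend is surfaced, not hidden."""
    totals = defaultdict(float)
    for rec in records:
        centre = rec["tags"].get("cost_centre", "UNALLOCATED")
        totals[centre] += rec["cost"]
    return dict(totals)

totals = allocate_costs(billing_records)
```

Note the deliberate `UNALLOCATED` bucket: surfacing untagged spend is often the first step in tightening a tagging protocol.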
Sometimes, even with the best intentions, names don’t reflect a resource’s eventual use. It’s important to avoid this; tagging needs to be treated as an ongoing process and kept up to date to ensure it delivers the full benefits.
Finally, tags can facilitate integration with third party applications such as CloudCheckr which analyses tagged resources and suggests changes that could deliver immediate cost savings.
“Organisations that are new to the cloud sometimes underestimate the value of tagging and don’t do enough of it. Eventually, unseen costs creep in and start to escalate. A methodical approach to tagging can turn this around, and the more granular you get the clearer the picture becomes.”
Matthew Dibble, Cloud Engineer, DevOpsGroup
2. Implement rightsizing and autoscaling
Avoidance of overprovisioning is one of the foremost benefits of cloud computing from a cost management perspective. But it doesn’t happen automatically. The best way to optimise provisioning – and therefore costs – is through the rightsizing of instances. However, many organisations neglect this when they first move to the cloud.
According to AWS, rightsizing should be an ongoing process of “matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost. It’s also the process of looking at deployed instances and identifying opportunities to eliminate or downsize without compromising capacity or other requirements, which results in lower costs.”
Combining rightsizing with autoscaling is the secret of success. Cloud-based environments should be carefully configured to meet day-to-day needs in the most cost-effective way, but have the ability to adjust fast when traffic or load increases.
Mike explains: “Autoscaling works in multiple ways – horizontal and vertical – and it can be triggered by various event types such as a sudden increase/decrease in demand or failures (to ensure available capacity is maintained). It can also be scheduled to ‘prepare’ for known peaks or troughs in demand.”
So, rather than having devices provisioned for the expected maximum load, you provision for minimum load, then scale up when required to support higher load levels.
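This provision-for-minimum, scale-on-demand approach can be sketched as a simple horizontal scaling decision. The thresholds and instance bounds below are illustrative assumptions, not any provider's defaults:

```python
# Minimal sketch of a horizontal autoscaling decision: provision for
# minimum load, then add or remove instances as utilisation changes.
# All thresholds and bounds here are illustrative assumptions.

MIN_INSTANCES = 2      # baseline sized for minimum load
MAX_INSTANCES = 10     # ceiling to cap spend
SCALE_UP_AT = 0.75     # average utilisation above this adds capacity
SCALE_DOWN_AT = 0.30   # average utilisation below this removes capacity

def desired_instances(current, avg_utilisation):
    """Return the instance count for the next evaluation period."""
    if avg_utilisation > SCALE_UP_AT and current < MAX_INSTANCES:
        return current + 1
    if avg_utilisation < SCALE_DOWN_AT and current > MIN_INSTANCES:
        return current - 1
    return current
```

Real autoscaling policies (AWS Auto Scaling groups, Azure VM Scale Sets) layer cooldowns, health checks and scheduled rules on top of this, but the core logic is the same: never fall below the rightsized minimum, never pay for more than demand justifies.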
Using reserved instances to meet minimum load requirements can play a major part in driving down cloud operating costs. You commit for a year or more rather than accessing on-demand, and receive discounted rates in return.
If you’re asked how many resources are deployed, the answer should always be ‘enough’. No more, no less.
“We often see businesses following behaviours adopted in the on-prem environment, such as overprovisioning to cope with future demand. This is not necessary in the cloud. The lag time between ordering and receiving devices is eradicated in the cloud environment, and you can scale on-demand to suit usage patterns of the moment.”
Jamie Ward, Graduate Engineer, DevOpsGroup
3. Don’t forget Dev/Test costs!
For many organisations, Dev/Test environments are one of the first workloads trialled in the cloud ahead of large-scale adoption. Being able to spin up these standalone environments as needed fosters innovation, and they feature heavily in cloud-based strategies.
In theory, Dev/Test environments are destroyed once they’ve served their purpose. However, remnants can be left behind, quietly consuming resources. This results in unseen costs which mount up over time.
If your cloud costs are normally stable, setting up alerts for when certain limits are breached is really helpful. It can highlight charges for resources you didn’t realise were still running.
Mike says: “Having the right cost monitoring and alerting in place, as well as lifecycle management processes such as shutting down non-production servers when they’re not required, can save a fortune.”
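The kind of cost alert described above boils down to checking month-to-date spend against budget thresholds. The figures and thresholds in this sketch are illustrative assumptions:

```python
# Minimal sketch: flag which alert thresholds month-to-date spend has
# crossed against a monthly budget. All figures are hypothetical.

def breached_thresholds(month_to_date, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that spend has reached or exceeded."""
    ratio = month_to_date / budget
    return [t for t in thresholds if ratio >= t]

# Spend of 850 against a 1000 budget trips the 50% and 80% alerts.
alerts = breached_thresholds(month_to_date=850.0, budget=1000.0)
```

Managed services such as AWS Budgets and Azure Cost Management provide this natively; the point of the sketch is that a surprise breach of an early threshold is often the first sign of a forgotten Dev/Test remnant.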
4. Consolidate billing and find the best cloud licensing agreement for your needs
There are several different payment options associated with the cloud. Finding the right one can make a big difference to overall spend.
Most organisations start out with a pay-as-you-go subscription. This is great if you’re just playing around with the cloud, but not great for running a business there. As cloud adoption takes off, alternative approaches will probably offer better value and be easier to manage:
- Partnering with a Cloud Solution Provider allows you to lean on their support rather than conversing directly with the cloud vendor (this approach works well for less technical companies).
- Enterprise Agreements provide high levels of discount for customers planning to spend a large amount on cloud infrastructure.
- Dev/Test subscriptions can be a cheaper alternative for development teams, offering reduced costs at the expense of availability and redundancy.
- Consolidating subscriptions into a centralised bill helps with cost visibility and management; it’s applied at the organisational level with subscriptions feeding into the total.
- Hybrid licensing can also be used to reduce cloud license costs across VMs and databases if applicable.
- Free subscriptions may be available for testing and learning. These are sometimes associated with software development licenses such as Visual Studio.
“There are many ways to make cost savings in the cloud. Sometimes there are opportunities to make a big cost reduction in one hit, for instance by losing an entire VM. But it’s important not to lose sight of the smaller improvements – their cumulative effect can make a significant difference.”
Mark Nash, Senior CloudOps Engineer, DevOpsGroup
5. Take advantage of AWS Spot Instances or Azure Spot VMs
Major cloud providers including AWS and Azure offer heavily discounted rates for the use of spare capacity or ‘spots’ of unused resource. Typically, spot users agree a maximum hourly rate with the provider, and certain activities only run when the price is below that point.
Engineers that know the cloud platform well are best placed to take advantage of opportunities to reduce overall spend.
Our engineers are big advocates of using spots for tasks that can be interrupted without causing any detrimental impact. Activity is programmed to stop when the cloud provider issues an alert to say the price is increasing, then resumes when the price falls again. This can offer significant savings for tasks such as data analysis, batch jobs or background processing.
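The pause-and-resume behaviour described above can be sketched as a price check gating an interruptible task. The prices and bid ceiling here are illustrative assumptions, not a real provider API:

```python
# Minimal sketch: run an interruptible batch job only while the spot
# price stays below our maximum bid. Prices are hypothetical; a real
# implementation would poll the provider's pricing feed and handle
# interruption notices (e.g. AWS's two-minute warning).

MAX_HOURLY_BID = 0.05  # our agreed price ceiling per hour

def should_run(current_spot_price):
    """Run the interruptible workload only below our price ceiling."""
    return current_spot_price < MAX_HOURLY_BID

# Simulated hourly prices: work proceeds, pauses when the price
# spikes, then resumes once it falls back below the ceiling.
prices = [0.03, 0.04, 0.06, 0.07, 0.04]
schedule = [should_run(p) for p in prices]
```

The design constraint is checkpointing: the workload must be able to stop and restart without losing progress, which is why batch jobs, data analysis and background processing are the natural fits.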
We hope these tips from our Operate engineers were useful. Contact us if you’d like to find out how they can support your team.