

Datadog hackathon boosts cloud monitoring best practice

Our managed services team runs an ongoing Datadog hackathon to keep on top of the platform’s emerging and evolving features. Cloud engineers hone their skills while contributing to a repository of insights and ‘as code’ templates for rapid deployment.

My colleague Mike Lazenby recently wrote about how we use Terraform to manage Datadog resources on an ‘as code’ basis. An advantage of this approach is the rapid uplift of managed services customers’ cloud monitoring capabilities. They benefit from sophisticated approaches on par with cloud-native solutions even if they’re just finding their feet in the environment. It can be a great way for organisations to improve performance en route to cloud maturity, especially after lift-and-shift migrations.  

As a leading cloud-scale monitoring service, Datadog is continually releasing new features and updates. Some of these can be game-changing, adding immense value to cloud management and operations. However, keeping up with the latest developments is no mean feat. We need to identify which ones offer the greatest potential benefits, then figure out how to use them to maximum effect.

Enter the Datadog hackathon.

How the hackathon works

Like any hackathon, the idea is to create a safe space for engineers to explore, experiment and innovate. While the initiative is fluid and engineer-led, the sheer scale of the Datadog platform meant we had to be methodical about which features to include. We initially narrowed the list down to 40 features, using a tech radar to prioritise those most relevant to customers’ tech stacks and pipeline projects. We also considered technology gaps where there might be opportunities to add value to existing cloud management and operations activity.
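To make the prioritisation step more concrete, here is a minimal Python sketch of how a feature shortlist could be scored and trimmed. It is purely illustrative: the `Feature` fields, the 0 to 5 scales, the gap bonus and the function names are all assumptions made for this example, not a description of our actual tech radar.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A candidate Datadog feature (all fields are hypothetical)."""
    name: str
    stack_relevance: int     # 0-5: fit with customers' tech stacks
    pipeline_relevance: int  # 0-5: fit with upcoming pipeline projects
    fills_gap: bool          # True if it covers a known technology gap


def score(feature: Feature, gap_bonus: int = 3) -> int:
    """Combine the relevance ratings, with a bonus for closing a gap."""
    bonus = gap_bonus if feature.fills_gap else 0
    return feature.stack_relevance + feature.pipeline_relevance + bonus


def shortlist(features: list[Feature], limit: int = 40) -> list[Feature]:
    """Return the highest-scoring features, best first, capped at `limit`."""
    return sorted(features, key=score, reverse=True)[:limit]
```

Each shortlisted feature would then become a ticket in the backlog. The weighting here is arbitrary, and in practice the judgement of the engineers matters more than any formula.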

Each feature was allocated a ticket and added to the backlog of our improvement epic on Jira. Engineers were then given free rein to explore features that interested them when they had capacity between customer jobs.

After setting up the prerequisites to evaluate a feature’s use for AWS or Azure, the engineers run through Datadog’s documentation. Throughout the process they assess ease of use and highlight areas of ambiguity or gotchas. Sometimes they discover that while the metrics are useful, the alerts are a bit noisy. Other times they find that a given feature works better in conjunction with another solution for certain deployments.

Once these early insights are established, they’re saved to the Inner Source repository. Then, if we think the potential outcomes justify the time, we may go a step further and set up a templated ‘as code’ integration using Terraform. We often tailor these integrations, introducing variables or combining multiple features to optimise how alerts are triggered and escalated. 
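As a rough illustration of what a templated, variable-driven integration might look like, the Python sketch below builds a monitor definition from a handful of parameters. Our real templates are written in Terraform; this is a stand-in that only shows the idea of parameterising thresholds and escalation. The function name `build_monitor`, the tag values and the notification handle are hypothetical, and the dictionary follows the general shape of a Datadog monitor definition as we understand it, so check it against the current Datadog documentation before relying on it.

```python
def build_monitor(service, metric, critical, warning, notify,
                  window="last_5m", env="prod"):
    """Build a monitor definition dict from template variables.

    Illustrative only: nothing is sent to Datadog, and the field
    layout is an assumption to verify against the official API docs.
    """
    if warning >= critical:
        raise ValueError("warning threshold must be below critical")

    # Doubled braces produce the literal { } around the tag scope.
    query = (
        f"avg({window}):avg:{metric}"
        f"{{env:{env},service:{service}}} > {critical}"
    )
    return {
        "name": f"[{env}] {service}: {metric} is high",
        "type": "metric alert",
        "query": query,
        # The notify handle controls who the alert is escalated to.
        "message": f"{metric} on {service} breached its threshold. {notify}",
        "tags": [f"env:{env}", f"service:{service}"],
        "options": {
            "thresholds": {"critical": critical, "warning": warning},
        },
    }


monitor = build_monitor(
    service="checkout",          # hypothetical service name
    metric="system.cpu.user",
    critical=90,
    warning=80,
    notify="@ops-oncall",        # hypothetical notification handle
)
```

Keeping the thresholds and the notification target as variables is what lets one template serve several customers: the query stays the same while the tuning and escalation path change per environment.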

Datadog hackathon activity is discussed during our fortnightly Operate Skills Development meetings. So, if an engineer encounters difficulties, the rest of the team can offer guidance or act as a sounding board. In this way, individuals build their own skills and confidence while the wider team and customers benefit from their outputs. 

A mainstay of service improvement

The initiative has been so successful that it’s become a permanent feature on the managed services team agenda. We periodically add new Datadog features to the backlog, and all our cloud engineers take part. Some of the features we’ve looked at have gone on to play an important role in our managed services offering. For instance, a process monitoring hack quickly revealed the root cause of a long-running issue that had been evading a customer’s existing monitoring solution. We’ve now deployed it across several customers’ cloud environments with great success.

We’ve also developed some interesting hybrid solutions. One noteworthy example enhances SQL database visibility for organisations that aren’t yet using cloud-native functionality. It allows them to take a standardised approach to SQL management just as they would in a cloud-native scenario. So, following a lift-and-shift migration they can quickly benefit from modern, sophisticated ways of working.

The cloud modernisation journey

Cloud modernisation is a journey, not a destination. And Datadog is a powerful ally to have onside along the way. If you know which features to use, and use them well, it can help drive strong and steady progress.

With new releases coming thick and fast from cloud vendors and third-party providers alike, nobody can keep up with everything. A pragmatic approach is needed to identify, then evaluate, the releases most likely to add value. From here, it’s important to experiment to get the best out of features and adapt them for specific requirements. In our experience, off-the-shelf solutions often do the job, but doing it well requires a certain amount of hacking. That takes energy, effort and expertise, so making it interesting and enjoyable is a good thing.

The Datadog hackathon is popular with our engineers because they have the freedom to follow their interests. We all share a natural curiosity about anything cloud related and enjoy solving problems. The initiative has resulted in direct benefits for our customers too, making it an all-round winner. We’d encourage any organisations thinking about running a hackathon to give it a go.  

Andrew is Lead Agile Delivery Manager for cloud managed services. Get in touch to talk about how our progressive cloud-based support can help your organisation.

