In this live webinar, a DevOpsGroup panel discusses the five essential characteristics of cloud computing that make you 23 times more likely to be an ‘elite’ performer
Date: 18th October 2018 | Duration: 59:38
Hello again. I think let’s take the opportunity to get started. So welcome again to our second DevOps Discussed webinar. Today we are going to discuss the really powerful findings on Cloud from the latest 2018 State of DevOps Report. We’re going to look at the exceptionally strong link between how teams implement Cloud services and what the report calls ‘elite’ performance status.
So my name’s Rael Winters, I’m the product manager for CloudOps here at DevOpsGroup, and I’m joined today by two of our distinguished consultants, Colin Barker and James Reed, both of whom have tons of experience and lots of opinions on the topics we’re going to be covering today, so hopefully we’ll get some good discussion going. Say good afternoon to you both.
So my job is to kick us off with a quick overview of the latest State of DevOps Report and lead into the conversation on Cloud, so we can then cover our four main topics for today. We’re going to start by looking at the essential characteristics of Cloud and their link to high-performance IT, and then we’ll dive into Cloud Native and have a conversation about what Cloud Native really means and indeed whether it’s worth it. We’ll have a discussion on our thoughts around migration options, and then we’ll look at how you can build these new infrastructure operations capabilities into your business.
We want to keep this pretty informal, it’s a DevOps Discussed webinar, so please feel free to jump in with any questions you’ve got at any time and we’ll try to answer them as we go along. You should, in theory, be able to put your questions into a Q&A box, which should be on your screen just underneath the central agenda, although I’m probably in a small corner of it. I think Jo just said that she’s disabled chat for all participants, so the Q&A box looks like your only option anyway, and questions are a little easier for us to manage through the Q&A function than through chat.
So for anyone who’s not familiar with the State of DevOps Report, it’s a research project that is now five years old, headed up by Dr. Nicole Forsgren along with Jez Humble, Gene Kim and the team at DORA (DevOps Research and Assessment). It attempts to understand DevOps patterns and practices and how they impact the ability of teams to perform. It’s a really detailed study; there’s a lot of science that goes into it, and they’ve now got data from over 30,000 technical professionals across the world, in pretty much every industry imaginable.
And so these are the headlines of the report. Some of the findings are pretty consistent year on year, but we do have changes in 2018 which are quite interesting. The first thing that always strikes me, always stands out in the State of DevOps Report and makes it so real, is that DevOps really does make a difference. When we look at the highest tier of performers, they are 1.53 times more likely, so more than one and a half times as likely, to exceed their own goals for organisational performance. That’s goals for profitability, productivity, market share and a host of other organisational performance metrics. This is true across all organisations, in all industries, for-profit and not-for-profit, large and small. It’s not the preserve of the Silicon Valley elite. It doesn’t matter whether you’re a 100-year-old insurance company or a scaling tech start-up, DevOps makes a difference, and not just to IT performance but actually to the success of your organisation.
We’ve seen every year that throughput and stability are not trade-offs. And I think traditionally they’ve often been viewed as trade-offs. If you want to get faster you do so at the sacrifice of stability but you can have both. You can release frequently, you can release quickly, release with fewer incidents, and you can have a faster recovery time and better availability. And the people in the High Performance Group and the elite performance group are doing just that.
We also see that DevOps adoption is growing. There are now more organisations in the high performance group than ever before, which is probably part of what encouraged DORA to introduce their new classification this year, in 2018, which they’ve called their elite performers. Elites are a subset of high performers, and they let us get a little bit more granular on exactly what’s going on in that top layer. But not everyone’s getting better, sadly. The gap between the high performers and the low performers is widening. The high performers are accelerating away from the low performers, which is creating a class of companies that are likely to be left behind.
Now if you haven’t seen a copy of the report I would really recommend downloading it. It’s a good read and very insightful, and as well as these key headlines there are also some deep dives this year into open source technologies and culture, and a look at Capital One as a case study, particularly its technology practices around continuous integration and the supporting technology practices that go with it.
There’s a lot on outsourcing this year, which is really interesting; outsourcing was the topic of our first DevOps Discussed webinar. There is a strong link, which people should be aware of, between what they call functional outsourcing and the low performance group, and the report’s discussion of the challenges outsourcing introduces to organisations is worth a read.
And then, of course, this year for the first time they’ve done a bit of a deep dive into Cloud which is really exciting. That’s what we’re here to talk about today. But before we do that and Colin and James, just more generally, on the report what stood out to you guys particularly from this year’s report?
Yeah, thanks Rael. I think for me it was this introduction of the elite performers, this new subset of high performers that are really pushing the boundaries of speed and stability. But it’s not just the fact that that group exists, it’s that it’s 7 percent. That’s a not-insignificant number of organisations that are really striving to be as good as they can be.
We use the phrase ‘unicorns’ a lot in the DevOps world for either high performing people or high performing organisations, but actually if it’s 7 percent they’re not really mythical beasts anymore, they’re real organisations, including big ones. Not just organisations that were born in the Cloud or have been using DevOps practices for a long time.
You mentioned Capital One, they’ve gone on a huge transformational journey over the last few years. So yeah I think that high performing and elite performing status is something that’s really interesting to me.
For me, it’s going to be the medium performers actually. The cautious people within the medium performers, the ‘misguided performers’ they call them, have a low speed, a low deploy frequency, but they’re comfortable with this. And I think that’s the thing: they’re comfortable with the idea that things are incredibly slow and take a long time to work. It does mean they have a low change failure rate, because they’re not actually making changes at any speed, but their mean time to recovery is very high. The report talks about one to six months to recover from a system failure, which is absolutely awful from my point of view. For me, they need to shift that mindset to really improve and get into the high performers, and definitely into the elite performers.
Yeah. So the cautious performers were a new introduction this year as well, weren’t they? Again, a subset of medium performers: these guys release very infrequently but spend a long, long time testing all of their very infrequent changes, and generally they’re fine. But when they go wrong they go really, really badly wrong, don’t they? So that’s quite interesting. Certainly, that’s almost a bit of a cul-de-sac on the way to the high performance mindset. So thank you both.
Let’s have a look then at the headline findings on Cloud. The thing we really want to get into is this: what really matters is how teams use Cloud services, not just that they use them.
So Cloud is growing. I think we all know that the stats here in the first bullet point, from Forrester and Forbes, probably don’t come as a surprise to anyone. 67 percent of respondents said they were using Cloud of some sort, and you can see a breakdown of what they mean by Cloud on the right-hand side of the screen, which is interesting.
Now the State of DevOps Report is a self-selecting sample, so it’s not necessarily representative, but DORA has done quite a bit of work to cross-reference these numbers against other reports, including the RightScale State of the Cloud Report, and they do seem broadly representative of the industry as a whole, which is good in this case.
And the finding we’re most interested in is that teams embracing all five of the characteristics in the NIST Definition of Cloud Computing are twenty-three times more likely to be in the elite performance group. That’s a staggeringly high number when you look at some of the other predictive multipliers in the report.
So let’s look at that in more detail. NIST is the National Institute of Standards and Technology, and they first introduced this definition of Cloud computing in September 2011. The Cloud market has changed a lot since then, but the enduring relevance of these characteristics is really quite staggering. On the face of it, in today’s market dominated by AWS and Azure, they look really obvious.
Let’s open the discussion then with a question for James and Colin. Given these seemingly fairly obvious characteristics of the hyperscale Cloud providers, what’s stopping people adopting these characteristics into their applications?
So if I start at the top, looking at on-demand self-service. You’re right, that is a feature of the Cloud; I don’t think you’d even consider someone a Cloud provider if they didn’t have that capability. But what we see a lot is that when organisations adopt the Cloud, Ops teams tend to hide that self-service away from the delivery teams.
Now there are probably some really good reasons why they do that, budgets in particular I think. With a traditional project model, projects have a finite lifetime and then your Ops team is stuck with a run budget. So if your development teams, your delivery teams, are spinning up lots of resources, that can be quite nerve-racking for Ops teams; they worry they could walk in tomorrow and find they’ve spent a lot of money. And I think the other thing that does is create a lack of trust between the two groups.
On-demand self-service needs to be in the hands of the people actually using those services, and for the most part that is the delivery teams. But you need to be able to manage that: an operations team needs to provide safety nets, examples of best practice for provisioning infrastructure, and ultimately monitoring to make sure budgets aren’t being spent without any oversight. For me, that’s really what on-demand self-service means.
I think actually for me that segues very nicely into the very last point on the list, the measured service element. Because the service is measured, you are going to need real-time, granular insight into what is happening with your product and your infrastructure. That additional visibility gives you the segue into setting up policies, for example, to create those safety nets. You are then opening up the on-demand element to dev: you can actually spin up what you need when you need it, rather than having to worry about running out of budget.
That goes back to allowing teams to be more self-managing; they can look after themselves. You don’t have to go through different systems to bring up a particular resource, or log a ticket that means you have to wait forever to do something. You can just say, right, you have this budget available, use it in a way you’re happy with. That gives you the ability to do what you need in the time that you want, which gives you a much better way of dealing with new features as they come along, or support and availability issues, or anything like that.
But at the same time, the speed of that insight is incredibly important as well. For example, what about a security flaw that exists in your system? What happens if someone accidentally leaves a secret access key in code in a public repo, and the next thing someone out there is spinning up loads of GPU-based instances to do some bitcoin mining on your money, as it were? The speed of being able to react to that, those safety nets, the ability to respond, are all very key to ensuring you don’t end up going out of business over something as silly as a leaked access key.
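The kind of safety net described here can be sketched as a couple of simple anomaly checks. This is a minimal illustration in plain Python, not a real cloud API: the function names, thresholds and data shapes are all illustrative assumptions, and in practice you would wire this to billing and inventory data from your provider.

```python
# Hypothetical safety-net checks: flag sudden spikes in spend or instance
# count so that, say, a leaked access key spinning up miners is caught fast.
# Thresholds and inputs are illustrative assumptions, not a provider API.

def spend_anomaly(hourly_spend, baseline, spike_factor=3.0):
    """Return True if the latest hourly spend is a multiple of the baseline."""
    return hourly_spend > baseline * spike_factor

def instance_anomaly(current_count, expected_max):
    """Return True if more instances are running than the team ever provisions."""
    return current_count > expected_max

alerts = []
if spend_anomaly(hourly_spend=42.0, baseline=5.0):
    alerts.append("spend spike: investigate for compromised credentials")
if instance_anomaly(current_count=120, expected_max=20):
    alerts.append("instance count spike: possible unauthorised provisioning")
```

Run on a schedule against real billing data, checks like these turn the measured-service characteristic into the fast feedback loop the panel is describing.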
Yeah, that’s a really good point. I think the role of dev in all of this is to make it easy for Ops to get out of the way. If we look at things like rapid elasticity, for example, your applications need to be designed to take advantage of these capabilities. You can’t just build a traditional on-premises application; you need to be thinking about the Cloud right at the beginning of your journey. To keep your budget holders happy, be that your product owner or your Ops team, that means being able to rapidly scale up and scale down again, and making sure your applications are stateless so that it’s very easy to turn stuff off. You’re not sat watching a load balancer, waiting for users to drain down, before you can take resources away to decommission or patch them. That elasticity has to be a feature of your application software as well as of the Cloud that you’re running it in.
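The stateless design described here can be sketched in a few lines: session state lives in an external store, so any instance can serve any request and instances can be terminated without draining users. The dict below is just a stand-in for a shared cache such as Redis; everything here is illustrative, not a specific framework's API.

```python
# Sketch of stateless application design for rapid elasticity: session
# state is held in an external store (a dict stands in for Redis here),
# so instances hold nothing and can be killed or scaled at any moment.

external_store = {}  # stand-in for a shared cache such as Redis

def handle_request(session_id, message, store=external_store):
    """Append to a session held outside the instance and return its history."""
    history = store.setdefault(session_id, [])
    history.append(message)
    return list(history)

# Any 'instance' sharing the store behaves identically; terminating one
# instance loses no user state, so there is nothing to drain.
handle_request("user-1", "hello")
handle_request("user-1", "again")
```

Because no request depends on which instance served the previous one, scale-down becomes a matter of simply removing capacity.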
Great points guys. I’m going to pause really briefly just to do a quick bit of housekeeping, because it seems that our Q&A function may have been switched off when we switched off the chat function. If you’ve got questions and you haven’t been able to ask them, I’m really sorry. If you could raise your hand, I think there’s a raise hand button that’s working, that would be great.
So let me just replay those first three points after that little diversion. I think what we’re saying is that delivery teams need self-service access to be successful, but they’ve also got to effectively build that self-service into their applications, so the applications can scale themselves rapidly and elastically, with observability built in at design time. But to do that they’ve got to consider the fact that Cloud is a measured service, and that means we need safety nets in place to catch things quickly if and when something unexpected happens.
I think that’s probably the key point about all five of these, actually: it’s not enough that they exist in the Cloud platform, you have to actually use them in your applications to get benefit from them. What about the other two, broad network access and resource pooling? Because to me these are probably slightly less insightful. Is that fair to say?
Yes and no. If I take broad network access, for example: that was written by NIST back in 2011, back when smartphones and tablets were still novel and people didn’t broadly have them to hand. I think we all know the answer to this one: it’s just not broadly implemented. Everybody has it in mind, you pick up your phone because you need to do something really quickly, but it’s not seamless at the moment. And I think that’s probably a mix of the two: people know it needs to happen, they just aren’t doing it. That’s probably why it seems less insightful. It’s there, it’s just not really capitalised on.
How about you James? What do you think about the final one, resource pooling?
So that is an interesting one. As traditional Ops teams, when we work in operations we’re very used to caring about which server is in which rack and which VM is running on which host; those are things that are really important to us. I think the big mindset shift is that in the Cloud I don’t really care, or at least I shouldn’t need to. I don’t need to know which VM is running on which host; I don’t necessarily even need to know which data centre this stuff is running in. If you do have to care about those things, then it’s not really the Cloud.
This might be a slightly controversial opinion, but that’s why I’m slightly sceptical that there could ever be such a thing as a private Cloud, because you still have a team of people who really need to care about those fundamental things. I guess it’s fair enough if you’re abstracting it away from the development teams: if you can provide the other features, like the elasticity and the on-demand self-service, then you could call that a private Cloud. But if you’re really worried about hosting stuff, then that’s something you really want to try and move away from if possible.
Yeah, some really good points. I don’t think it’s too dangerous to say that a lot of private Clouds aren’t really private Clouds; a lot of them don’t meet those essential characteristics. I’ve been a product manager for more than one service provider Cloud platform that hasn’t technically met all of the definitions of what a Cloud platform should be, and there are plenty more out there besides the companies I’ve worked with in the past. Those characteristics very definitely are part of the public Cloud platforms, and I think that ties in with Gartner slimming down their Magic Quadrant for infrastructure as a service so dramatically this year as well.
We’ve talked about some of the characteristics of Cloud. Let’s move on now to have a look at some of the technologies that exist and some of the technology accelerators discussed in the report that are associated with Cloud computing.
The first thing that strikes me about these statistics is that none of them is anything like as predictive of high performance as the five essential characteristics that NIST defined. So let’s explore our thoughts on these technologies and how they play into high performance. Colin, do you want to kick us off with some thoughts on infrastructure as code?
Yes, why not. And the first thing I’m going to do is quite literally lift a quote from Wikipedia, the Wikipedia definition of infrastructure as code. I’ve slightly changed it to make it work here, but it should be close.
Wikipedia describes infrastructure as code as the process of managing and provisioning computing resources through machine-readable definition files. I think the key point is that you’re moving your infrastructure into an essentially text-based format that can be placed in repos and version controlled, and you’re not having to deal with wizards or anything like that to spin up your infrastructure.
For me, this is a foundational enabler for doing everything else. Before you get into containers, PaaS, Cloud Native, all that stuff, start at the infrastructure as code level. You have to define it from somewhere, and that is the best place to start. I think that’s why it makes you 1.8 times more likely to be in the elite performance group: it’s the thing you start with. It is the core of getting yourself into that Cloud Native mindset as well. Those are my thoughts behind it.
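The idea of a machine-readable definition file driving provisioning can be illustrated in miniature. This is a toy sketch, not a real tool: in practice the role of `provision` below is played by something like Terraform or CloudFormation, and the definition format here is invented purely for illustration.

```python
# Toy illustration of infrastructure as code: a machine-readable,
# version-controllable definition drives provisioning instead of manual
# wizards. 'provision' is a stand-in for a real tool such as Terraform.

import json

# The definition lives in a repo as text, so it can be diffed and reviewed.
definition = json.loads("""
{
  "resources": [
    {"type": "vm", "name": "web-1", "size": "small"},
    {"type": "load_balancer", "name": "web-lb", "targets": ["web-1"]}
  ]
}
""")

def provision(defn):
    """Pretend to apply a definition; return the resource names it would create."""
    return [r["name"] for r in defn["resources"]]

created = provision(definition)
```

Because the definition is just text, every infrastructure change becomes a commit: reviewable, revertible, and reproducible across environments.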
Brilliant. So infrastructure as code is an absolutely foundational investment really before you do anything else on your journey towards high performance in the Cloud. What about containers then? I think we were all a little bit surprised.
I agree, Rael, I think we all were when we saw 1.3 times, the lowest of these indicators, given the amount of hype there’s been over containers over the last few years. Containers have been the buzzword, from a technology sense; everybody’s been talking about them for years.
I needed to dig a little deeper into this to try and understand what was actually going on. When you look at the report, 1.3 is for people using containers in production; if you look at people who are using containers in general, it’s 1.5.
That confused me slightly, because I have this dogma that you should be using the same deployment mechanisms in your prod and non-prod environments. I’ve been bitten enough times in the past by having a different deployment process between environments: everything goes smoothly until you try to go live, and then everything breaks because you haven’t tested your deployment process properly.
So that confused me. But then I looked a little deeper, and of the people using containers in production, 50 percent are only using containers in production, while the other 50 percent are using containers across their whole environment estate.
And then it started to make a little more sense to me: the people getting the most out of containers are the people using them across the board. People who are only using containers in production are perhaps performing slightly less well, and maybe that supports my personal dogma that you should be using the same tools across your environment chain. There’s a little bit of speculation there, but from the numbers that seems to be what’s happening.
I guess the only other thing I’d say about containers is that one of their real big advantages is that you can be Cloud independent. You can deploy them on-prem, you can deploy them into the Cloud; most modern Cloud providers have some sort of Kubernetes as a service, or you can just spin up a VM and run containers on it. So they are agnostic to the Cloud, and I think that’s a real advantage of containers.
Great thank you for that. Now onto PaaS, platform as a service.
I see PaaS as a step towards being Cloud Native. It is another level of abstraction that you can put your application on. The bit that surprised me was that only 24 percent of respondents said they were actually using PaaS. I think that’s something I’ll come back to a bit later, but for me that seemed very, very low.
A lot of apps can easily be re-platformed onto PaaS without any major rewrites. Azure SQL and RDS are platform-as-a-service offerings for databases, and in the majority of cases you could actually lift your database and move it into RDS or Azure SQL. Elastic Beanstalk as well. They’re all quick ways of getting applications started on PaaS with minimal rewrite.
That then allows your DBAs and app admins to use the knowledge they currently have and add real value with their expertise. They’re not having to worry about the underlying hardware and how it’s hooked together; they can literally connect to their endpoints and go, right, here we go, this is us making changes, this is us making significant updates to applications, without having to worry too much.
But for me, and I think this goes back to why only a very low percentage of respondents say they’re using it, the definition of PaaS evolves every single day, and that could explain why the number is quite low. I think a lot of people don’t always realise they are using PaaS. Some people might turn round and say, oh, I’m using RDS, I’m using a database system; they may not realise that it is actually a PaaS system at the back end.
And again, new services keep coming out every single day. For example Azure Cosmos DB, a globally available system that touches on the boundary between PaaS and Cloud Native, or the Aurora serverless DB from AWS. So we’re moving from PaaS into Cloud Native in a very, very cloak-and-dagger way.
A cloak-and-dagger move to Cloud Native.
We’ve got some great discussion going on in the chat, which I’m keeping half an eye on, about CNDBs and modernisation, which is good. I think we’ll let that carry on in chat, as Steve seems to be doing a great job of leading that conversation. But if we scroll up slightly in the chat window there’s a great question here from Raj. I think it relates more to our previous slide than this one, but let’s take a little diversion before we get on to Cloud Native.
So Raj says that on-demand and rapid elasticity can provide great benefits for engineers and systems, but how do organisations stop costs spiralling out of control? Many organisations move to the Cloud for cost benefits, particularly mode one adopters. Maybe less so with mode two, I’d say, but it’s a very good point; certainly a lot of people are seeking cost benefits, so these two characteristics could scupper the business case. What’s our advice? So, guys, any advice for Raj?
I think a couple of points, and I’ll let Colin come in on this as well because I think he’s going to have some really good insight. It’s about that monitoring, and about providing the best practice we’ve talked about with infrastructure as code: if you’re providing templates and setting policies, you can make sure things can’t go out of control.
Taking an example from Azure, you could give your development team an Azure subscription but set some limits on the spend on it. And if you have some monitoring on that as well, to understand their current spend versus what they’re actually allowed to spend, then you give them the benefit of that rapid elasticity and on-demand provisioning, but you’re still putting controls in place to make sure things don’t go completely crazy and, like Colin said, you come in the next day and suddenly find you’ve got a thousand bitcoin mining machines running in the background.
I don’t know if you want to add anything to that Colin?
For me it extends on that. It’s about getting to know the systems already in place that monitor costs. A lot of the Cloud providers provide such granular access to your billing records and bits and pieces like that, that you can analyse it, and then there are third-party tools out there that can take it to the next level, so you can have visualisations and alarms based on specific spend.
You can also use that to shape the way you build things. I know of a particular customer that used tagging in a very particular way: not only does it give people access to the resources they need, and allow them to start up what they actually need in a restricted way, but the tags also mean they can monitor the spend they’re incurring. So I would definitely look not only at what the Cloud providers can provide, but at the way you do cost management within your company as well.
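The tagging discipline described here boils down to two pieces of logic: require cost-attribution tags before a resource is allowed, and aggregate spend by tag. A minimal sketch, assuming illustrative tag keys and a made-up resource shape; the real policy enforcement would live in your provider's tooling.

```python
# Sketch of tag-based cost control: enforce cost-attribution tags, then
# slice spend by tag. Required keys and data shapes are assumptions.

REQUIRED_TAGS = {"team", "project", "environment"}

def missing_tags(resource_tags):
    """Return the required tag keys a resource is missing, sorted for stability."""
    return sorted(REQUIRED_TAGS - set(resource_tags))

def cost_by_tag(resources, key="team"):
    """Aggregate cost by a tag value; untagged resources are surfaced, not hidden."""
    totals = {}
    for r in resources:
        owner = r["tags"].get(key, "untagged")
        totals[owner] = totals.get(owner, 0.0) + r["cost"]
    return totals

resources = [
    {"tags": {"team": "payments", "project": "api", "environment": "prod"}, "cost": 1.2},
    {"tags": {"team": "payments"}, "cost": 0.4},
    {"tags": {}, "cost": 0.9},
]
```

Surfacing an explicit "untagged" bucket is deliberate: it makes the cost of poor tagging hygiene visible rather than silently dropping it from reports.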
I mean, a good example: how is the Cloud any different from using an expenses system? A lot of people have company cards to hand. How do you stop the expenditure on those company cards going out of control? You set limits on them, and you use the tooling that already exists from that particular card provider, which doesn’t just restrict but gives a level of freedom at the same time. I think you have to look at it in that way.
My final piece of advice is to let the engineers actually have a bit of freedom. If you restrict too much, you’ll find the engineers get complacent about the way they’re doing things; they’ll probably just spin things up based on whatever spend they already have. If you give them a little freedom to spend money in the way they want, they’ll be the ones being more in control and more thoughtful about how they actually spend money on your infrastructure, on your company’s behalf as it were.
Thanks, Colin, that’s some great insight. So a couple of key points there. One: tagging. With anything on cost management, tagging is the first step to getting cost under control, isn’t it? Making sure things are correctly tagged from the start, so you can understand what you’re spending money on and then start controlling it.
Then the second thing is tooling. I mentioned the RightScale State of the Cloud Report earlier; of course, RightScale are one of the leaders in the Forrester Wave for, I’m going to use the Gartner term here, Cloud service expense management. Forrester call it something slightly different. There are a bunch of other tools out there as well: Cloudyn, bought by Microsoft not so long ago, does something similar, and CloudCheckr, which I know you’ve done a lot of work with in the past, Colin. There’s a lot out there that will augment the Cloud providers’ native capability and work multi-Cloud as well, so it’s worth looking at some tooling there to help you out. It’s somewhere software can definitely add value to the human element.
So the last thing on this slide, and we’re going to use it as a segue into our next topic, is Cloud Native design principles. The State of DevOps Report finds that teams whose application was originally designed and architected to run on the Cloud are 1.8 times more likely to be in the elite performance group. For me, that raises all sorts of questions.
The first is: why is it so low, when adopting the five essential characteristics of Cloud computing gives you such a high multiplier by comparison? You might say, arguably, if you were being confrontational, that if something was designed and architected to run in the Cloud but doesn’t embrace all five of the essential characteristics, then it wasn’t very well designed. You might have a fair point, but you’d also be neglecting all the good thinking we have around building MVPs (minimum viable products) and around iterative design, so I think that would be slightly unfair. I guess the next question it raises for me is: is that really Cloud Native? I suppose it is in the most literal sense; if an application was designed to run on the Cloud, it was quite literally born in the Cloud. But for me, Cloud Native has always meant a lot more than that.
So we went digging on the Internet, with the help of Google rather than Wikipedia in this case. Maybe we should have looked at Wikipedia; that might have given us some authority. But we went digging for definitions, and it turns out there are rather a lot of definitions of what constitutes Cloud Native.
Here are some of the recurrent themes in the various definitions of Cloud Native. Colin do you want to kick us off with some thoughts on what Cloud native really means?
No problem, I’ll have a go at this. These are my opinions on what Cloud Native really means. For me, the first thing has to be those five essential characteristics of Cloud. You’ve got to have them as part of your application, as part of that way of thinking.
The next thing for me, and I think this is probably a bit controversial as well, is agnostic portability. A lot of the time it’s a future step, and I know we’ll talk about this later on, but for me being agnostic means you are truly Cloud Native: it doesn’t matter what the underlying Cloud you’re running on is, your applications can run anywhere.
And then the joining factor for me is the 12-factor app. I think that’s very key when designing your Cloud Native application or service. The 12-factor methodology can be applied to many other Cloud systems and Cloud ideas as well; the Internet of Things is an example of something the 12-factor app can be applied to.
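One of the twelve factors, config in the environment, can be sketched briefly: configuration comes from environment variables rather than code, so the same build runs unchanged in any cloud or stage. The variable names and defaults below are illustrative assumptions, not part of the methodology itself.

```python
# Sketch of 12-factor config (factor III): settings come from the
# environment, not from code, so one build runs in any cloud or stage.
# Variable names and defaults here are illustrative.

import os

def load_config(env=os.environ):
    """Read settings from environment variables with safe local defaults."""
    return {
        "database_url": env.get("DATABASE_URL", "sqlite:///local.db"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "port": int(env.get("PORT", "8080")),
    }

# A production-like environment injected for illustration:
config = load_config({"DATABASE_URL": "postgres://db/prod", "PORT": "80"})
```

Keeping config out of the codebase is also what makes Colin's agnostic portability practical: swapping Cloud providers becomes an environment change, not a code change.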
So, for me, those are the three things that I feel really count towards what Cloud Native really means. I think James can probably take on from here what his thoughts are.
Yeah, thanks Colin. I don’t know if this is some inherent bias from the person who designed this diagram, Rael? But the first thing that stands out to me is that it looks like there’s a Dev/Ops split in here. We’ve got some operability stuff on the left-hand side around policy-driven resource allocation, essential characteristics and operability by design. Then there’s a big cluster of stuff that feels more Dev-centric: the 12-factor app, containers, microservices and so on. So that was just an interesting observation on this diagram.
Picking up on what Colin was saying around agnostic portability, we’ve also got this theme of leveraging the native features of the Cloud. That feels like a bit of a contradiction to me: you can’t necessarily have both. You’re either portable or you’re leveraging native capabilities.
I think that’s an interesting jumping-off point really. For me, it’s about making sure that we’re using the right tools for the job. If you’re thinking about time to market, and if leveraging the native CSP feature set is going to help you get your products out there quicker, that’s probably a sensible choice. If you’ve got a mature product that’s making you money and doing good things, then portability, being able to switch between Cloud providers, may be a higher priority for you. But there’s some pragmatism required in looking at this diagram.
It’s a really good point to call out. I know you were going to come on to that point anyway. It was quite interesting to see it coming up in the chat, just around that conflict between leveraging the native CSP features. Particularly around serverless and being agnostic.
I said we were going to come back to containers when we were on the last slide. Containers feature in lots of definitions. Even some of the really simple ones that have got three factors of what it means to be Cloud native and containers are there as one of those three. Now I know there are a lot of container fans out there, particularly in the DevOps community, so I’m going to tread carefully here. We’ve often viewed containers as almost a little bit of a holding pattern until people properly get their head around serverless. Is it right that containers are part of the definition of Cloud Native or part of any definition of Cloud Native? Do you think they have a place there?
Yeah, I think so. At the end of the day we have many tools in our kit bag now, and that’s really what we should be looking at: using the right tool for the job. There is a place for containers, and there is a place for things like Lambda or Azure Functions. Being Cloud Native really means you’re using all of the tools available to you in your Cloud provider and making pragmatic decisions about which features you really want to take advantage of and which are less important to you.
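As a sense of how small the serverless end of that toolkit can be: the AWS Lambda Python runtime invokes a function with the `handler(event, context)` signature. Here is a minimal sketch; the API-Gateway-style event shape with a JSON `body` is an assumption for illustration, not a prescription:

```python
import json

# A minimal AWS-Lambda-style handler. The runtime calls
# handler(event, context); this one parses a JSON body from an
# API-Gateway-like event and returns an HTTP-style response.
def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Because the handler is just a function, it can be unit-tested locally by calling it with a sample event, which is part of the appeal relative to managing container images for very small services.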
Yeah, I can expand on that to answer a question on the agnostic portability side of things as well. Take the serverless elements, a Lambda function for example: behind the scenes it is, somewhat controversially, actually running in a container.
On being Cloud Native and agnostic portability with serverless on top: I remember seeing a presentation from Netflix a while ago where they did stick with one Cloud and leveraged the CSP feature set within that particular provider. Then they realised that, for them to be truly global and truly agnostic, they needed to build something on top of that. So they built an application layer that sat on top of the individual Cloud providers, so that their app became agnostic. I think it’s about layering in that way.
So for me, being Cloud Native with agnostic portability: there is a way of doing it. You need to think about both the development and the Ops side, think together, and work out how to actually make that happen. I think that’s how you can then leverage serverless on multiple different levels going forward.
Great. That’s an interesting insight, I think. We could talk about these all day. We were saying in the build-up to the webinar that the 12-factor app, right there in the middle, is almost a series of twelve webinars in and of itself. Let’s take the opportunity to leave our attempted definition of Cloud Native there. I don’t think we’re going to get one totally nailed today, are we? But that was an interesting exploration.
Let’s move on to, based on our loose definition of what Cloud Native means, the question: is it worth it? Just to set out the circumstances, if we’re building something new, going Cloud Native is almost taken as read. If we’re building a new application or service we should always be aiming to build in a Cloud Native fashion; it’s difficult to see a scenario where we wouldn’t be aspiring towards that.
But the more interesting thing is, what about if you’re migrating an existing app? What if you’ve got an existing application or an existing service either in the Cloud or not yet in the Cloud, is going native really worth it? I mean it’s certainly not for the faint-hearted. James, what do you think about this?
So I think it’s about thinking through your business case. To start with, we’re making an assumption that going Cloud Native means a rewrite of your application, because it probably wasn’t built with Cloud Native in mind and therefore not a lot of it can be reused easily in the Cloud.
So if you’ve got an app that’s quite happily sitting there, making your business money, doing all the things it needs to do, and doesn’t need to change that often, I think you’re going to struggle to make a business case to rewrite that piece of software. There are other routes you can take rather than trying to make things Cloud Native.
The other thing you need to take into account, if you’re starting this journey and the Cloud is a relatively new idea to you, is that there’s a lot to learn. You don’t want to put too much pressure on your developers or your Ops teams by saying “learn all of this new technology and all of this new terminology”. You want to focus your efforts where you’re going to get the most bang for your buck.
Following on from that, it’s not just about the tech, it’s about the people. AWS, for example, is massive: it has a huge feature set and releases hundreds of new features a year. Building that understanding is going to take your teams time. You want to make sure you’re focusing your efforts, building the right business case, and migrating the right apps or services to Cloud Native, because it can take a long time and it can be quite daunting.
Great. Colin, have you got any further insights on this? What are your thoughts?
Okay. Well, for me, one of the biggest things we need to learn is to deal with the unpredictability and unreliability inherent in the Cloud. In your own data centre you knew what was going on; you were the one standing in front of the rack, and you could do whatever was needed to bring it back online. When it comes to the Cloud you’re handing all that responsibility over to a third party, and you lose that visibility. You just need to accept that there is a level of unreliability and unpredictability that comes with that.
And that, I think, is further exacerbated when you run microservices. Because you’re now running individual elements dotted around the Cloud, you have to keep in mind that any one of them may fail at any time. So you need to design for failure.
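One common "design for failure" pattern is to wrap calls to downstream services in a retry with exponential backoff and jitter, so a transient blip in one small component doesn’t cascade into a user-facing outage. A minimal sketch in Python, under the assumption that the wrapped call raises an exception on failure; production code would also limit which exception types are retried:

```python
import random
import time

# Design for failure: retry a flaky downstream call with exponential
# backoff plus jitter, rather than letting one transient error propagate.
def call_with_retries(fn, attempts=3, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the failure
            # exponential backoff (0.1s, 0.2s, 0.4s, ...) with jitter
            # to avoid synchronised "thundering herd" retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Patterns like this (and the related circuit breaker) are what let a microservices estate tolerate the inherent unreliability Colin describes.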
But because of all this, some people resist that level of change, because it is a big change. It’s a big mindset shift to suddenly say “someone else is dealing with my stuff; I deal with my small piece and I can’t see everything that’s going on”. That’s a very natural thing to feel, but I do think there is a very valuable role for people from an infrastructure and operations background in making the Cloud effective for development teams.
They can bring on board the ideas from what they’ve seen in the past, sitting with the actual physical and virtual hosts and running things on top of them. They can use that knowledge to build out a structure that makes moving into the Cloud a lot easier than before.
It is interesting talking about people’s backgrounds and the traditional infrastructure operations background and its ongoing value in a new operating role for I.T. We’ll get onto that in, I think, two slides time if I can remember how this flows. Let’s first take an opportunity to look at the Migration journey from traditional data centres to Cloud Native.
I’m going to switch over the slide, but before we start talking through it, there’s a question, well a comment really, that’s come up in the chat about app modernisation patterns. It’s a fascinating, wide-open topic at the moment. Do we want to cover off any views on app modernisation patterns before we dive in, or do you think it’s going to come out naturally?
I think something will definitely come out as we go through this diagram.
The questions are one step ahead of us. Great. Let me tell you what’s going on here. This diagram is essentially our take on the five or six Rs of Cloud migration. Gartner originally set out their five Rs of Cloud migration in 2011, which, as it turns out, was quite a busy year for Cloud predictions: NIST set out their five essential characteristics and Gartner set out their five Rs with a slightly different slant. Then in 2016, a bit more recently, Stephen Orban, who was then enterprise strategy director at AWS, I think, revised them slightly and came up with AWS’ six Rs. I’m assuming people have probably come across these Rs in one form or another at some point. Both sets of Rs are well worth a read if you’re not familiar with them, but we really like to keep things simple.
For me, the problem with all of these Rs is that a lot of them aren’t very intuitive. Some are more intuitive than others, but it feels like the rest got shoehorned in to keep the alliteration going. Then AWS redefined “refactor” so it means something slightly different to what Gartner originally set it out to mean, and refactor was already a word with an established meaning in the software development industry anyway. It all gets a bit confusing. So we thought we’d keep it simple. This is our version of the five Rs, or six Rs, depending on which set you’re looking at.
One thing the five Rs do quite well, to their credit, is that they include a bunch of options we don’t really cover here, which essentially centre around the concept of not moving. I don’t necessarily think that not moving should feature in a set of options for moving, but if you’re looking at an application estate you probably do want to consider not moving some things. So the three worth touching on, which we intentionally leave out, are: retire, i.e. just turn something off and don’t move it, and a big percentage of the savings in any Cloud adoption come from doing that; retain, i.e. just keep something on-premise; and replace, as AWS call it, which is essentially the move to SaaS, software as a service. They’re missing from ours intentionally, because we want to focus on the options for actually moving an application itself into the Cloud. Colin, do you want to talk us through how this plays out in real life, in real enterprise scenarios?
I’m working with a large client at the moment. They have multiple global data centres. They decided to move everything to AWS; they desperately wanted to go native, and they tried.
They did a great job and moved a lot of their applications across natively, but then data centre closure deadlines came into play, and hardware refresh cycles came into play. Suddenly it was “we need to get all of this out of our data centres”. So they started using the Evolve route, getting things into containers and PaaS elements, and towards the end they started lifting and shifting.
One of the key things they have in their mindset is that the migration does not finish once they’re in AWS, or in the Cloud. They realise that’s just one step and they need to continue from there. That’s where the Refactor and Rewrite roads come into play. This is what they’re doing at the moment, and we’ve helped them move elements into the Cloud.
They’re now looking at ways of moving from IaaS to PaaS to Cloud Native. That’s something I really want to stress here: a migration is never finished. Once you’re in the Cloud, if you’re trying to go Cloud Native, you do have to refactor, rewrite and modernise at a later stage.
Fantastic thank you, Colin. So while you were talking through that, I’ve just been doing a little bit of Googling because that’s what we all do right?
So in the chat we had a comment saying it would be interesting to see what percentage of savings you get from turning stuff off, from just retiring stuff rather than moving it. Which is a really good point actually.
If you look at AWS’ six Rs of Cloud migration, they’ve got a comment under the retire option. I think this is taken from Stephen Orban’s original blog, where he says: ‘We’ve found that as much as 10 percent (I’ve seen 20 percent) of an enterprise I.T. portfolio is no longer useful and can simply be turned off. These savings can boost the business case, direct your team’s scarce attention to the things that people use, and lessen the surface area you have to secure.’ So a really good point: 10 or 20 percent. I’ve been involved in plenty of data centre migrations and plenty of Cloud migrations, and I don’t think that’s unrealistic.
Great insight and yes we are going to get a T-shirt printed with this diagram on I think. That’s a great suggestion thank you, Rob, for that one.
I’m struggling to keep up with everything that’s going on in the chat, so let’s move on to our final big topic for this discussion and talk about operating models: how a high-performance infrastructure and operations practice tends to look in many businesses, certainly many of the more mature Cloud-adopting businesses we’re working with. James, do you want to talk us through what’s going on here?
Yeah sure. So first of all a bit of a caveat. This is obviously a very high level diagram. Don’t rush off and implement this in your organisation. That’s probably not a great idea.
Don’t try this at home kids!
This is here to prompt a discussion; it’s not a target operating model in itself. The first thing I want to come onto is these product teams, the ones in green in the middle. That’s a modern product team: it has developers, it has people from operations on it, and it’s very product-centric rather than project-centric.
So you bring the work to the teams, and the teams are long-lived, so you get the benefit of Tuckman’s stages of team development: you go through the stages and build a high-performance team. Those teams are empowered and autonomous; they can do their own thing.
Around that you’ve got modern operations practices: you’re protecting people from unplanned work, you’re sharing the responsibility for uptime with your product teams, and you’re balancing speed and stability and getting rid of toil, as Google terms it. So we’ve got that triage or site reliability engineering function there.
Now, the key point is that things like site reliability engineering aren’t necessarily for everyone. Even if you do adopt it, not all of your products will have site reliability engineering, because your product teams should be able to build and run the code they’re developing. If you do have that triage or site reliability function, it needs to be a shared responsibility. We want to get rid of the old model where devs just throw things over to Ops, who catch them and see what explodes. That’s really core to this operating model.
Underpinning that you have your platform services; I like to think of these as enablement teams. These are teams that help your product teams and your SRE or triage teams deliver their work. They provide self-service capabilities for things like continuous integration, monitoring and security: all the things you need to deliver to the Cloud in a repeatable, reliable way and to gain insight into what’s happening there.
You need those enablement teams to make sure you’ve got the right tools and infrastructure in place to allow that to happen. But these teams should be enablement teams not doing teams. You don’t want them to become a bottleneck because the three people on your continuous integration team are busy building pipelines for someone else so they can’t build your pipelines. They should be experts in these things but not necessarily doing the work.
When you bring all that together you’ve got those modern operational practices, you’ve got that product-centric cross-functional delivery team, and then you’ve got those enablement teams helping everything happen.
I think a big thing to call out here is that this will be different depending on the size of your organisation. If you’ve got a relatively small number of delivery teams you may not need the enablement teams because you’ve got that expertise in the delivery team itself. Or you’ve got that you build it you run it thing going within the product team. But as you scale out, as you grow larger, that’s when you need to start thinking about do we want to have those enablement teams. Do we need something like site reliability engineering to be able to really allow our product teams to focus on delivering value to the customer?
That’s I think what we’re looking at for a modern operating model. But as I say this is going to be very dependent on your organisation and it will change over time. It’s not an end state. You will grow into something like this as you build in those modern practices, as you understand the modern tooling, whether it’s the Cloud or whether it’s things like continuous integration and continuous delivery. As you learn and grow and as your teams become more capable you will grow into a new operating model. I don’t think it’s something that you can necessarily or would want to do in a big bang.
So I wasn’t meaning to encourage you to stop there. I think you were probably drawing to a close before I cut you off and I was, I have to say sadly, getting a little conscious of time because we have got four minutes to go before, I think, Zoom forcibly removes all of us from this webinar.
Thank you for that overview. I think we’ve had some really great insights on this webinar, some great discussion, some great questions from the audience. I’m going to see if I can conclude and do justice to all of those discussions with a little bit of a summary. So here goes.
Thing one: Cloud is good, do more of it. It’s not enough that your Cloud platform is very capable. You’ve got to expose those capabilities to your product teams and use them in your applications, which means you’ve got to consider these capabilities at design time and build them into the functionality of the application. Infrastructure as code is foundational to that.
In fact, infrastructure as code is pretty much foundational to everything that we’ve discussed today. It is the gateway to containers, the gateway to PaaS, to Cloud Native, to rapid elasticity, to self-service, and developer enablement on the last slide.
So thing two: start with infrastructure as code. If you’re not already doing infrastructure as code, start. If you are, do more. Make sure you’re using source control and managing that code properly; don’t just write things as code, actually learn how to manage code properly.
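The core idea behind infrastructure as code is that the desired state of your infrastructure lives in version control as data, and a tool reconciles reality against it, rather than engineers scripting imperative steps. The idea can be sketched in miniature in Python; the resource names below are hypothetical, and real tools such as Terraform or CloudFormation implement this at far greater depth:

```python
# Infrastructure as code in miniature: desired state is declared as
# data, and a "plan" step computes the create/update/delete actions
# needed to reconcile the current state against it. Idempotent: if
# desired == current, the plan is empty.
def plan(desired, current):
    """Return the actions needed to move `current` to `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions
```

Because the desired state is plain data under source control, every infrastructure change gets the same review, history and rollback discipline as application code, which is exactly the "manage your code properly" point above.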
We also need to see a bit of a mindset shift. Operations needs to become about enablement as much as possible; essentially, Ops needs to adopt a mindset of getting out of the way. But Ops still needs to own the concept of protection, protecting the business as it were. Guardians, but in a very different way.
So there needs to be a mindset shift. It needs to be a mindset shift to safety nets rather than control mechanisms or control gates. So letting teams do what they want but catching things before they go wrong, or catching things really quickly as they start to go wrong. The good thing about this is that there is a really rapidly increasing amount of tooling that can help you do that.
That mindset shift becomes easier when we move from projects to products. Teams that have long-lasting responsibility for what they develop and the services they provide have greater encouragement, and greater support from the business, for acting responsibly in every regard. That’s really key. Then think about the mechanisms you need to protect and support those product teams. You still need Ops and infrastructure, absolutely, likely both within those product teams and outside them as well.
Traditional Ops skills are not invalid at all. As an I.T. pro, you are far more than just the tools you understand; your mindset and your experience of running apps in production at scale are more valuable than tools alone. Sure, you’re going to have to learn some new tech, but I think the DevOps movement is really swinging back to recognising the value that operational practices bring to the equation when they’re done right. We see that in the State of DevOps Report, and we see it increasingly on the agenda for DevOps Enterprise Summit. I’m going to use that as a segue onto this next slide.
That’s pretty much us done for today. I’m going to say thank you to James and to Colin for all of your insight. Thank you to our audience for staying with us through my broadband issues and such like and thanks for all the great questions.
Our next DevOps Discussed webinar, this is my little segue into this slide, is actually less than three weeks away. We’re going to be broadcasting live from DevOps Enterprise Summit in Las Vegas at 1:00 p.m. I don’t know what time that is in Las Vegas; probably over breakfast I would have thought, maybe even before breakfast. We’ll send everyone an email after this webinar with the details of how to register for the next one. We’ll also share some of the reading with everyone.
So thank you, everyone. Thanks for that. Thank you, James. Thank you, Colin, and have a great afternoon.