Service Level Objectives (SLOs) are a great tool for driving customer value. They help companies find a balance between releasing new features and enhancements and maintaining site reliability, which are two of the things that have the greatest impact on user experience (and customer happiness). And the real kicker is, site reliability doesn't always have to be 100%. But how do you communicate the necessity of SLOs to managers or colleagues who may be less than familiar with them?
You may be thinking, "I can't tell my boss we're striving for less than 100% reliability. Their boss and our stakeholders will think we're not concerned about building a reliable system!" However, once they understand how SLOs and error budgets work together, the way your entire organization views reliability will shift.
When your company adopts Site Reliability Engineering (SRE) practices, SLOs become the yardstick for deciding whether your engineering team should keep building and deploying new features and enhancements, or press pause while they focus on reliability. In return, you should be able to deploy product features more often while maintaining or improving reliability, all with less toil. You'll be allowed a certain level of unreliability, which is tracked via your error budget, as long as it doesn't interfere with the customer experience.
Here's the deal: no service is perfect, and customers will generally tolerate a small degree of error. They might not even notice it.
An error budget is the amount of error an aspect of a service, like latency or availability, can experience before the customer becomes unhappy. So, while SLOs are internal targets and error budgets are ratios derived from them (a 99.9% SLO leaves a 0.1% error budget), your customers actually have a pretty big influence on where you set them, because they're the ones who indicate how much unreliability they'll accept before it becomes a problem. You'll need to observe their signals in order to figure out what that is.
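To make the arithmetic concrete, here is a minimal sketch of how an availability SLO turns into an error budget. The function name `error_budget`, the 99.9% target, and the 30-day window are illustrative assumptions, not figures from any particular service.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime allowed within `window` for an availability SLO.

    `slo_target` is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    # The error budget is simply whatever the SLO leaves over.
    return window * (1.0 - slo_target)


# Hypothetical example: a 99.9% SLO measured over a 30-day window.
budget = error_budget(0.999, timedelta(days=30))
print(f"Allowed downtime: {budget.total_seconds() / 60:.1f} minutes")
```

Under these assumptions the budget works out to roughly 43 minutes of downtime per 30 days, which is a much easier number to discuss with stakeholders than "three nines."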
Once you've established SLOs, you'll need to track the indicators that tell you when you're getting close to using up your error budget. As long as you have a healthy share of your error budget left, you can keep building and releasing new features and enhancements. But if you're burning through your error budget, you need to press pause on deployments and work on site reliability so that your customers don't get cranky.
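The ship-or-pause decision described above can be sketched in a few lines. This is a simplified illustration rather than a production policy: the function names, the request counts, and the 10% `pause_below` threshold are all assumptions chosen for the example, and your own team would pick its own threshold.

```python
def budget_remaining(total_requests: int, failed_requests: int, slo_target: float) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means nothing has been spent; 0.0 means the budget is gone;
    a negative value means the SLO has already been missed.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = total_requests * (1.0 - slo_target)
    return 1.0 - failed_requests / allowed_failures


def release_decision(remaining: float, pause_below: float = 0.1) -> str:
    """Map remaining budget to a release decision (threshold is hypothetical)."""
    if remaining > pause_below:
        return "keep shipping"
    return "pause and fix reliability"


# Hypothetical month: 1,000,000 requests, 400 failures, 99.9% SLO.
remaining = budget_remaining(1_000_000, 400, 0.999)
print(f"{remaining:.0%} of budget left: {release_decision(remaining)}")
```

With these made-up numbers, 400 failures out of an allowed 1,000 leaves about 60% of the budget, so the team keeps shipping. If failures climbed past 900, the same check would tell the team to pause.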
SLOs help everyone get on the same page about what to prioritize work-wise so that new stuff gets released, while the system is as reliable as possible, and most importantly, customers are happy!
Want to learn more about SRE, SLOs, and error budgets? Check out a few of our favorite resources on these subjects:
- Isos presented a webinar on SRE and SLOs on August 25, 2021. Check out the recording here.
- Isos is working in conjunction with our partner, Nobl9, to offer SLO Bootcamps. Find more information on our bootcamps here.
- Check out Google’s blog post on SLI/SLO/SLA fundamentals.
- For a more in-depth read, pick up Alex Hidalgo’s Implementing Service Level Objectives: A Practical Guide to SLIs, SLOs, and Error Budgets.
And as always, feel free to contact us if you want to learn more about SLOs.