What Does AIOps Mean for SREs? It’s Complicated

If you’re an SRE, you might view AIOps with great excitement. By automating complex workflows and troubleshooting processes, AIOps could make your life as an SRE much easier.

Alternatively, SREs may choose to view AIOps with disdain. They might think of AIOps as just a fancy buzzword that doesn’t live up to its promises, and that can become a distraction from the SRE tools that really matter.

Which perspective is right? Should SREs embrace AIOps with open arms, or should they resist marketers’ efforts to position AIOps as the latest, greatest tooling innovation in the IT industry?

Those are subjective questions that we can’t answer definitively, but let’s at least gain some perspective by examining what AIOps means for SREs.

What Is AIOps?

As you’ve probably heard by now if you keep up with IT buzzwords, AIOps (short for artificial intelligence for IT operations) is the use of AI and machine learning to help automate IT Ops workflows.

The big idea behind AIOps is that, by using AI and ML to perform advanced analysis of large volumes of data from IT systems, IT and SRE teams can solve complex problems more efficiently than they could when using a manual approach.

AIOps can, for example, help to surface the root cause of a performance issue in a complex, multi-layered environment like Kubernetes. Or, it could make recommendations about how best to resolve an incident.

AIOps entered the IT lexicon in 2016 when Gartner coined the term. At this point, it’s a relatively well-established tool domain.

How SREs View AIOps

Although AIOps has been around for some time, few SREs appear to have bought into the AIOps revolution. In a 2021 survey, Catchpoint found that just 7.5 percent of SREs reported that AIOps tools delivered “high value” to their organizations.

It’s unclear exactly why SREs report low rates of excitement about AIOps. But we’d speculate that there are a few key factors at play:

  • AIOps is a new term for an old idea. Many monitoring and observability tools have included at least basic AI and ML analytics features for a long time, starting before tool vendors slapped the AIOps label on their products. SREs probably realize this and view AIOps to some extent as an effort by marketers to rebrand functionality that is not actually fundamentally new.
  • AIOps is hard to implement. Setting up an AIOps tool requires integrating it with diverse data sources and customizing it to fit your workflows and environment. It’s possible some SREs view this setup work as more effort than it’s worth.
  • AIOps can’t replace human insight. While AIOps tools may be useful to a point, it would be unwise to place blind trust in AIOps-based analyses or recommendations. For this reason, some SREs may believe that AIOps encourages organizations to rely too heavily on automated tools, at the cost of the expert analysis and perspective that only SREs can provide. (This is similar to how it sometimes makes sense to prioritize human intuition and expertise over playbooks.)

From an SRE’s perspective, then, AIOps may appear over-hyped, overly complicated, and underperforming compared to traditional approaches to SRE.

What SREs Can Gain From AIOps

SREs’ wariness of AIOps is valid, but only up to a point. It’s important not to let suspicions about the limitations of AIOps turn into excuses not to use AIOps at all. AIOps has some value to offer to SREs, even if it’s not perfect.

For example, AIOps can play a role in reducing toil. To the extent that AIOps tools can recognize complex patterns or interrelate data sets more quickly than human engineers, AIOps reduces the time SREs have to spend manually troubleshooting problems or poring over complicated information.

AIOps also helps to enable a more proactive approach to monitoring and incident management. If AIOps tools can alert SREs to emerging issues before SREs would otherwise recognize them, AIOps can help the SREs get in front of the problems before they turn into true incidents. That’s better for SREs and end-users alike.
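To make the idea of proactive alerting concrete, here is a minimal sketch of one simple technique: flagging a metric sample that deviates sharply from its recent baseline, using a rolling z-score. This is an illustration only, not how any particular AIOps product works. The function name `detect_anomalies`, the window size, the threshold, and the sample latency values are all assumptions made up for this example.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from the recent baseline.

    Hypothetical helper for illustration: a rolling z-score over the
    previous `window` samples.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        # Only score a sample once a full window of history exists.
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # Skip flat windows to avoid dividing by zero.
            if spread > 0 and abs(value - baseline) / spread > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Made-up request latencies in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(detect_anomalies(latencies_ms))  # prints [7]
```

Note that the spike itself enters the baseline for the following samples, which is one reason a check this simple would need tuning, and human judgment, before anyone relied on it in production.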

There is also an argument to be made that AIOps can help SREs do more with fewer engineering resources. If you can use AI to automate some aspects of monitoring and incident response, you can maintain the same levels of availability and performance with fewer human engineers on hand.

Conclusion: AI Won’t Replace SREs, But It Can Help

None of the above is to say that AIOps can replace SREs, or that it magically solves every problem SREs face. Anyone who believes AIOps is a silver bullet has bought into the marketing hype to an unhealthy degree.

Nonetheless, AIOps tools do offer value to SREs. They make SREs’ jobs easier in some respects, and they can improve reliability outcomes.

So, while it’s wise to maintain a healthy perspective about the limitations of AIOps, SREs shouldn’t rule out AIOps tools as one way to improve reliability engineering.
