Many technology professionals are now tasked with applying artificial intelligence and machine learning to business problems, with great hopes of improving product and service delivery. An emerging approach -- AIOps -- promises to apply machine intelligence to vexing IT problems.
That's the gist of a recent report out of Constellation Research, which makes the case for AIOps to improve the state of IT operations and help untangle the spaghetti architectures that have arisen over the years. "IT leaders face massive challenges to be efficient because they have added too many tools and have become siloed," says Andy Thurai, the report's author. "In addition to the fragmented data, many tools produce critical alerts for the same event, thus creating 'alert fatigue.'"
"AIOps is about applying AI to enhance IT operations," Thurai explains. "Contrary to some beliefs, it is not about improving AI with IT operations, but rather is the other way around."
AIOps, then, is a tool that could potentially increase the productivity of IT teams. Thurai provides seven good reasons to consider an AIOps approach to managing IT complexity:
Reduce IT noise and alert fatigue. "Today's IT teams are truly overwhelmed by the noise created by false alarms as well as by too many alerts for a single incident," Thurai writes. "The overwhelming amount of noise can create an alert fatigue." AIOps can help reduce such noise by 80% to 90%, he estimates.
Speedier root cause analysis. In today's multi-cloud or hybrid environments, "it is extremely hard to pinpoint the underlying event that caused the incident," Thurai says. "The main issue with root cause analysis is piecing together logs, metrics, and traces happening in the same time frame across the full stack." AIOps helps shed light on the origins of anomalies. An AIOps solution will also "show the incident timeline from the time the incident occurred."
Enhanced capacity planning and resource utilization. "With data-driven, AI-assisted mapping, you can deploy workloads on the right combination of servers, instances, and machines," says Thurai. "If a specific combination didn't work, you can adjust it in real-time and continue to make changes in real-time as well until it works as expected, without manual intervention."
Ability to correlate events. AIOps can "group associated telemetry information together -- logs, metrics, and traces." Thurai says the ability to "look at associated telemetry information from various tools all together, on the same dashboard and at the same time, will give you a clear view of what is happening in the system and help identify the root cause fairly quickly."
Context/alert/incident enrichment. "Once an incident occurs, the first step the ITOps team needs to take is to figure out the context of the incident (what, when, and why) as soon as possible," says Thurai. "A properly implemented AIOps solution will add context to the incident or alert, instead of notifying to death the support personnel involved."
Anomaly detection. "AIOps should be able to analyze all data and figure out patterns."
Self-healing and automation capabilities. "A good AIOps solution should either have automation in place or integrate with automation vendors via APIs to initiate remediation measures. For example, if there is a CPU or memory overuse, either rebooting or killing some processes might fix the issues without the need to create an alert, spark an incident, and waste IT resources investigating and remediating that incident."
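To make the noise-reduction and event-correlation points above more concrete, here is a minimal Python sketch of one simple way duplicate alerts could be folded into a single incident. It is an illustration only, not the method described in the Constellation Research report or used by any particular AIOps product. The Alert and Incident classes, the group_alerts function, the five-minute window, and the choice to match alerts on resource and symptom are all assumptions made for this example; real AIOps tools typically rely on far richer correlation logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    """One alert as raised by a single monitoring tool (hypothetical shape)."""
    source: str        # tool that raised the alert
    resource: str      # affected host or service
    symptom: str       # normalized symptom, e.g. "high_cpu"
    timestamp: datetime


@dataclass
class Incident:
    """A group of alerts assumed to describe the same underlying event."""
    resource: str
    symptom: str
    first_seen: datetime
    last_seen: datetime
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Fold alerts with the same resource and symptom into one incident,
    as long as each arrives within `window` of the previous matching alert."""
    incidents = []
    latest_by_key = {}  # (resource, symptom) -> most recent incident
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.resource, alert.symptom)
        incident = latest_by_key.get(key)
        # Start a new incident if none exists or the last one has gone quiet.
        if incident is None or alert.timestamp - incident.last_seen > window:
            incident = Incident(alert.resource, alert.symptom,
                                alert.timestamp, alert.timestamp)
            incidents.append(incident)
            latest_by_key[key] = incident
        incident.alerts.append(alert)
        incident.last_seen = alert.timestamp
    return incidents
```

Fed three "high_cpu" alerts for the same database server from three different tools within a couple of minutes, group_alerts would return one incident carrying all three alerts rather than three separate pages, and the incident's first_seen and last_seen fields give a rough timeline of the event.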
Staying on top of all the requirements of today's array of systems can be overwhelming for IT teams limited in size, time, and budget. AIOps brings in intelligent digital assistance to help manage the day-to-day issues -- so IT professionals can keep their eyes on the business.