Amazon Web Services (AWS) has announced a new fully managed operations service that uses machine learning to make it easier for developers to improve application availability.
It does this by automatically detecting operational issues and recommending specific actions for remediation, AWS CEO Andy Jassy said while delivering the re:Invent keynote on Tuesday.
The service, Amazon DevOps Guru, identifies anomalous application behaviour, such as increased latency, error rates, or resource constraints that could lead to outages or service disruptions, and then alerts developers with details of the issue, including the resources involved, the issue timeline, and related events. These alerts are delivered via Amazon Simple Notification Service (SNS) and partner integrations such as Atlassian Opsgenie and PagerDuty, Jassy explained.
The aim is to help organisations quickly understand the potential impact and likely causes of an issue, and to give them specific recommendations for remediation.
"Developers can use remediation suggestions from Amazon DevOps Guru to reduce time to resolution when issues arise and improve application availability and reliability with no manual setup or machine learning expertise required," AWS says in a blog post.
"Application downtime events caused by faulty code or config changes, unbalanced container clusters, or resource exhaustion (e.g. CPU, memory, disk, etc.) inevitably lead to bad customer experiences and lost revenue."
Like many of AWS's customer-facing services, DevOps Guru has been used internally, with Jassy touting it as the culmination of 20 years of operational expertise in building, scaling, and maintaining highly available applications for Amazon.com.
Speaking with ZDNet about the new service, Simon Elisha, director of public sector technology and transformation for Australia and New Zealand, said that ultimately DevOps is all about doing a lot more with a lot less and moving more quickly. He said DevOps Guru is the right tooling to make that happen.
"If you think about systems today, they generate more information than ever, more telemetry than ever, warnings, notifications, messages, etc., and that's a good thing because you're getting lots more information, but it can be very difficult to figure out when is something changing, when is something different, and a lot of what DevOps is about is understanding the relationship between the changes you make in code and what happens in production, and that loop," he said.
"The ability for anyone, whether you're in a large development shop or a very small operational shop, is to see what's going on within your environment without any manual configuration, without training any models, without doing anything except a few clicks."
Elisha said that, as a result of the pandemic, teams are realising they have to work more closely together, though that doesn't necessarily mean being physically together.
He pointed to AWS Proton, which was also announced on Tuesday, as a great example of bringing "infrastructure folks" and developers together, but in a way that allows both to get the outcomes they need.
Elisha said customers want their people to work as well as they can within their own disciplines while still communicating with others, and that they need the tools to do that effectively and easily.
"So the ability to set really robust guardrails around security or the versions that you're running or the way you choose to build, or the ability to innovate really fast, it's that combination which is why people look at things like DevOps, because they want that, 'let's go really fast, but let's do things right at the same time'," he said.