Finding the anomalies in big data with machine learning

Q&A with David Drai, CEO of business intelligence company Anodot.
Written by Colin Barker, Contributor

Anodot's approach to business intelligence (BI) is to use automated anomaly detection systems to discover the important signals in vast amounts of data noise, and to find the anomalies and correlate them across different data. It then turns that information into business insights.

The three-year-old company says it has secured 'multimillion-dollar' bookings since it launched and has doubled the data analyzed by its proprietary machine learning algorithms in less than six months, to 5.2 billion data points per day.

ZDNet recently spoke to Anodot's founder and CEO, David Drai, to find out more about the company's technology.

ZDNet: Did you work at several companies before founding Anodot?

Drai: I did, but wherever I was, they were using BI and it struck me that BI really needed a shakeup. All of the companies in the field were doing almost the same thing as each other.

They were all focusing on visualization using dashboards or something similar. But when looking at, say, big data, these methods are not enough. This was the motivation of Anodot.

The aim was to identify those insights using big data, but without using dozens of dashboards and expecting the human being to sift through them to find those insights.

For example, if I'm in e-commerce and I'm tracking my revenues in each shopping store, for each mobile operating system and for each country, the number of combinations is so huge that it's impossible to track everything -- you cannot expect the BI team to do that.
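
To make the scale of that problem concrete, here is a back-of-the-envelope sketch in Python. The dimension sizes are hypothetical and chosen only for illustration; they are not figures from Anodot or any of its customers.

```python
from math import prod

# Hypothetical dimension sizes, chosen only for illustration.
dimensions = {
    "store": 500,
    "mobile_os": 3,
    "country": 50,
}

def count_series(dims):
    """Number of distinct revenue series if every combination is tracked."""
    return prod(dims.values())

total = count_series(dimensions)
print(total)  # 75000
```

Even with these modest assumed sizes, one revenue KPI fans out into tens of thousands of separate series, and each additional dimension multiplies the total again.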

This is where Anodot offers a solution in the BI world. We call what we do 'autonomous analytics', which works exactly like an autonomous car that will take you from point A to point B to point C and so on. We changed the way that you look at the data.

Instead of you asking tons of questions and receiving insights via dashboards, we will give you all the answers automatically.

We developed the machinery that will gather all the relevant data and relevant insights from you -- and that can be millions of KPIs, because we scale well in the cloud.

Whenever we identify significant changes and insights within your data that are relevant to you, we will automatically show them to you.


[Image caption: Anodot uses machine learning to find correlations in data. Image: Anodot]

For example, let's assume that there's bad weather and people do not go shopping as usual. We can correlate the data around those facts and give you the insight we found. The insight is that in a specific state, in a specific city, in a specific store, a certain product is selling less than usual. And we help you identify why that is -- it could be because of the weather, or because a competitor is doing a promotion.

This is the product that we came out with in 2016, and the traction is amazing. We have customers like Microsoft, VF Corporation [brands include Wrangler, Timberland, Lee, and The North Face], and e-commerce companies. Gett Taxi offers a service that will automatically get you a taxi -- they are a customer.

Our solution is agnostic. It doesn't matter if you're an IoT company, an e-commerce company, or any kind of company: we will learn your data. It could be the temperature in your machine, the output from your engine, the revenues of your shopping website or your e-commerce. We will track the data in the same way for all those verticals and then we will provide an insight according to your vertical.

In e-commerce applications, it's one thing to have some way of tracking the data, but at some point you will want to use AI, won't you?

Where does that technology come from?

We developed the learning algorithms ourselves. We have a team of data scientists who developed all of the algorithms and insights. We have 15 data algorithms that are learning what the data looks like and what should be the normal behaviour of the data.

Then whenever the data is abnormal, they will learn to identify how and why it is beyond the normal. This is what we do.
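
The interview does not describe Anodot's algorithms, so the following is only a generic sketch of the idea of learning "normal" and flagging departures from it. It assumes a simple trailing-window baseline and a z-score test; the function name `find_anomalies`, the window size, the threshold and the sample data are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Made-up daily revenue figures with one obvious drop.
daily_revenue = [100, 102, 98, 101, 99, 103, 100, 101, 40, 100]
print(find_anomalies(daily_revenue))  # [8]
```

A production system would also need to handle seasonality, trends and differently shaped data, which a single fixed-window rule like this one does not attempt.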

Is it identifying where the exceptions are?

It's not only exception-driven. The exception is a combination of many exceptions -- you call it exception, we call it anomalies.

We find the correlation between them and then we give you a story about what's happening in your tons of data, and why it's happening.

How long did it take you to develop this?

It took 18 months to develop the first version and we are innovating from there. We have a huge vision -- a road map -- in this area. For example, what we are bringing to the market in the next quarter is the ability to predict stuff based on those anomalies -- predicting when you will see another anomaly.

That is part of it. The other part is to then connect into an NLP [natural language processing] engine so that we can give you the story in a human language.

That's important because if you are a marketing professional, it can be very difficult for you to understand the raw data. So our idea is to bring you the data in an intuitive way.

That is part of the road map, but we have many capabilities and features that we want to put in, because the more customers we have involved, the more that will help us grow.

BI/analytics/machine learning is one of the fastest-growing areas of IT, isn't it?

It is a fast-growing area, but you'll find that all of those machine-learning and BI solutions are very vertical to a specific need. We look for an agnostic approach to attack all the different verticals together with a machine-learning approach.

I don't see anyone who approaches it in the way that we do. The fact that we are agnostic gives us a huge advantage. Although it's more complicated to do that, we can take use cases from one vertical to another vertical without the need to develop something special. We have found a common denominator for all of them.

And here, when we do the analytics functions, we do machine learning on top of them. This is why I call it 'autonomous analytics', because you don't do the work for the system -- the system does the work for you.

It is totally different from the BI solutions of today. With those, you need to provide queries to show in the dashboard, but then in order to follow those dashboards you need to understand what's happening in your views, what is happening in a specific region. You need to select a region that you're interested in. You need to track back your revenues for every region, for every operating system and so on.

Instead of that, we took the opposite approach. We opted for push mode, so we will tell you. We will tell you that we found a drop in your revenues in a specific region, without the need to look at a dashboard.

But the other thing we have done recently is to give you the anomalies directly onto your dashboard. We all get used to our specific dashboards, and so you can use your own dashboard. We can work with it so that when there is an anomaly, we can notify you on your mobile and tell you that we have seen an anomaly in a specific region that affects you.

How specific can it get?

First of all, if you compare us to regular BI, the regular BI will not tell you anything. It won't tell you that there is an anomaly or where it comes from. But here we are tracking all of the anomalies individually, so we can tell you.

First we track if there is an anomaly at point A, point B, point C and so on; then we will look at all of them and see if there is a correlation; and then we will look to see if there are anomalies in the correlated data.

And then we will drill down to find out more.
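
The steps described above (detect per-metric anomalies, then look for relationships between them) can be loosely illustrated with a sketch that groups anomalies occurring close together in time. This is an assumed, simplified stand-in for correlation, not Anodot's method; `group_anomalies`, the `max_gap` parameter and the metric names are hypothetical.

```python
def group_anomalies(events, max_gap=5):
    """Group (timestamp_minutes, metric_name) anomaly events so that each
    event in a group is within `max_gap` minutes of the previous one."""
    groups = []
    for ts, metric in sorted(events):
        if groups and ts - groups[-1][-1][0] <= max_gap:
            groups[-1].append((ts, metric))
        else:
            groups.append([(ts, metric)])
    return groups

# Made-up anomaly events: minutes since midnight and a metric name.
events = [
    (600, "revenue.us.android"),
    (602, "checkout_errors.us"),
    (604, "revenue.us.ios"),
    (900, "page_load_time.eu"),
]
incidents = group_anomalies(events)
for incident in incidents:
    print([metric for _, metric in incident])
```

Here the three US anomalies within a few minutes of each other end up in one group, while the unrelated European one stands alone; that grouped view is closer to the single "story" the interview describes than four separate alerts would be.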

So, this is the revolution in the BI world -- to make it autonomous.

To give you some idea of the scale, today we are tracking 80 million metrics. When we find anomalies, there are many anomalies from all the different companies that we are tracking.

Every day there are 200,000 anomalies. To track the significant anomalies we reduce that to 8,000. We then further get it down to 2,000.
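
The figures quoted here and in the preceding answer can be put in proportion with a few lines of arithmetic. The variable names are illustrative, and the interview does not say what criteria drive each reduction step.

```python
# Figures as quoted in the interview.
metrics_tracked = 80_000_000
raw_anomalies = 200_000
significant = 8_000
final = 2_000

def pct(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100 * part / whole

print(pct(raw_anomalies, metrics_tracked))  # 0.25
print(pct(significant, raw_anomalies))      # 4.0
print(pct(final, raw_anomalies))            # 1.0
```

In other words, the daily anomaly count is about a quarter of one percent of the number of metrics tracked, and only about one raw anomaly in a hundred survives both reduction steps.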

And this, of course, is the trick. You cannot track 80 million dashboards -- it's impossible. So we drill into the data and it shows many, many use cases of what the data is looking like. With each use case we have an algorithm that knows what is normal.

There are many different data types, so if you are tracking, say, inventory, the data is usually very sparse. If you are a website, it is very smooth. It all depends on which vertical you are in, and we handle many different use cases.

So we collect all the data in from the many different use cases and we normalise the data. We can then visualise it and show you all the different KPIs and how they should look, and where the anomalies are.

Interestingly, we are a small startup, but 50 percent of our customers are public companies.

If you look at what we are offering, it is a huge change from the way analytics is done today. Now, what you get is a lot of different charts which you have to go through and decipher to find out what is significant. With us, we show you what is significant and how and why we think it is.
