
This startup thinks it knows how to speed up real-time analytics on tons of data

Making sense of the vast amounts of data gathered by businesses is a problem Iguazio says it has cracked.
Written by Colin Barker, Contributor

AI applications are sprouting everywhere as companies look to build complex and advanced systems quickly. But there's one issue that can hold back their plans: performance.

Iguazio's Continuous Data Platform aims to accelerate the performance of real-time and analytics processing. ZDNet talked to co-founder and CEO Asaf Somekh to find out more.

ZDNet: Can you give me some background on the company?

Somekh: We're trying to solve the problem of running real-time analytics across a lot of data. In today's world people want to take data from machinery, from news feeds, chatbots and so on, contextualise it with many other forms of data to reach a decision, and then translate that into actions, recommendations, services and the like.

Traditionally, around data you have things like data warehouses, but those are batch-oriented rather than real-time or interactive.


Generally, once you get into more real-time workloads, the technology is in-memory, but memory is limited, and that limits the amount of data you can store, analyze or act on.

That's the challenge we're addressing, and you can split it into two layers that our platform addresses. One is a real-time database layer with a design that has a couple of unique attributes.

One is that it's not designed for memory, it's designed for flash, so it can essentially perform like an in-memory database but with something like 30 times higher density and at a lower cost.

Do you see that as a key point of distinction for you, that you're doing it in flash and not memory?

Yes. It works at the performance of memory yet is still significantly faster than a traditional NoSQL database. We're acting just like an in-memory database -- a MemSQL, a Redis or similar.

Somekh: "You can essentially write and read concurrently from many standard APIs to the same database." (Photo: Iguazio)

But they're usually limited to something like half a terabyte of data per server, whereas with flash today, with NVMe and adapters, you can get to something like 100TB of usable storage on a single node. That's 100 times denser than what you get in memory.

And obviously flash is about 20, 30, 40 times cheaper than memory. That means you can process a lot more data in real-time applications.

The second unique distinction of our platform is that it does not create its own API. What we have done is create multiple database models on top of a single organisation of the data.

We can emulate columns and rows, streams and objects on the same data layer with different indexing strategies.

Instead of coming with our own APIs, what we have created is a set of lightweight microservices that implement traditional APIs. You can essentially write and read concurrently from many standard APIs to the same database.

And we emulate the Amazon data services like DynamoDB, Kinesis and S3. We have native integration with Spark, Python data-science frameworks and many different tools.
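The interview doesn't spell out how clients reach those emulated services, but as a rough sketch, an S3-compatible API can typically be exercised with the standard AWS SDK by pointing it at a different endpoint. The endpoint URL, credentials and bucket name below are hypothetical placeholders, not Iguazio-specific values.

```python
# Minimal sketch: using the standard AWS SDK for Python (boto3) against an
# S3-compatible endpoint. Endpoint, credentials and bucket are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://data-platform.example.com:9000",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Write and read an object through the familiar S3 calls.
s3.put_object(Bucket="sensor-data", Key="readings/device-42.json",
              Body=b'{"temp": 21.4}')
obj = s3.get_object(Bucket="sensor-data", Key="readings/device-42.json")
print(obj["Body"].read())
```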

We provide SQL support, and you can concurrently write data as it streams in from sensors, run queries against it through SQL and simultaneously run AI tools through Spark or Python interfaces. That all gives you a very fast pipeline for taking data, adding more intelligence to it through AI and serving it through the same end-to-end solution.
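To make that pipeline concrete, here is a minimal sketch in plain PySpark, using Spark's generic socket streaming source and Spark SQL rather than any Iguazio-specific connector; the host, port and view name are illustrative only.

```python
# Sketch of the write-a-stream-and-query-it-with-SQL pattern using generic PySpark.
# The socket source stands in for whatever stream feeds the platform.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sensor-pipeline").getOrCreate()

# Ingest a stream of raw sensor lines (illustrative source: a local socket).
readings = (spark.readStream.format("socket")
            .option("host", "localhost")
            .option("port", 9999)
            .load())

# Register the stream so SQL queries can run against the same data
# while it is still being written.
readings.createOrReplaceTempView("sensor_readings")
counts = spark.sql("SELECT COUNT(*) AS events FROM sensor_readings")

# Continuously emit the running count; an ML job could read the same view.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```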

Beyond speed and efficiency, it also simplifies the work of the developer and, overall, the building of a solution that can present itself in many standard ways.

That's interesting because I have read a number of articles about you and not one of them mentioned the flash connection. Why do you think that is?

We have a lot of expertise in the company around these technologies. The interesting thing is that databases usually access flash through storage-layer file systems and the like. In our case the database accesses flash directly, which gives it the attributes of memory rather than the attributes of storage.

Essentially, it provides a very high-performance data engine for ingestion, cross-correlation, enrichment, AI and analysis, and then serving, all in a single platform.

We essentially have microservices that provide all those APIs, and you can be flexible in adding new APIs to the platform. That is essentially a layer of serverless functions that imitate those APIs.

Now, our customers tell us that they don't just want to run standard APIs like SQL and so on. They want to run their own chatbots or image-recognition logic or whatever. They want us to open-source this layer of embedded functions and let them add functions that they can all use.

And this is where we developed a product called nuclio, which is a real-time serverless framework. So now you have a real-time, serverless framework that can sit on the data and allow you to do, essentially, whatever you want.
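nuclio is an open-source project, and its Python functions follow a simple handler convention; the sketch below shows that shape, with a made-up enrichment rule standing in for the chatbot or image-recognition logic a customer might deploy.

```python
# Minimal nuclio-style Python function: the runtime calls handler(context, event)
# for each incoming event. The enrichment rule here is a hypothetical placeholder.
import json

def handler(context, event):
    record = json.loads(event.body)                 # event payload (HTTP body, stream record, ...)
    record["alert"] = record.get("temp", 0) > 30    # made-up enrichment: flag hot readings
    context.logger.info("processed one record")
    return context.Response(body=json.dumps(record),
                            content_type="application/json",
                            status_code=200)
```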


You want to run logic, or image recognition, or speech recognition, or translate all that into other data, or expose the APIs that you need for your customers, and do all of that in a single platform.

Now if you look at our platform, it contains those two layers: a very fast and flexible data layer and, on top of it, a layer of functions and application services, all tied together on the same high-speed bus, which essentially allows you to rapidly develop solutions.

What else is unique about the platform is that it is really a platform as a service. It's a fully managed offering, and it can be consumed either as a cloud-managed service, as an appliance, or as a software stack that can be deployed on-premises.

Can you give an example of users and how they're implementing it?

In the telco space, for example, there is a very large company in Asia that is, essentially, planting our solution in its network so it can listen to all the network telemetry information from switches, routers, gateways and other activity on the network.

They then feed all of that into our solution, cross-correlate all the information in real time and use AI logic on the cross-correlated, enriched data. As a result, they can make predictions of network failure.

Because these are networks, the data keeps flowing at very high concurrency, and our solution is capable of processing all of it. Using predictions and algorithms they can reroute traffic, identify malicious attacks and predict outages.

Another example is a company called PickMe [Software], which essentially has systems running with very high concurrency -- many users accessing the data simultaneously -- and uses that to do real-time fraud detection and make real-time pricing decisions based on all that data.

You're an Israeli company and it's early days for you, but what's your financial stage?

As you say, we are still an early-stage company and relatively new. We've just had our Series B round of funding; last year we raised around $30m. Our investors include Verizon Ventures, Bosch, Dell Technologies and the CME Group.

We're still ramping up, although we released a new product about six months ago, and we're creating a lot of strategic partnerships with companies such as Google.

A lot of the performance benefits must come from your use of flash?

It's not just flash -- everyone can use flash. The problem is using it the right way. There's also the computation element: within the computing architecture, we're using Intel's vector instruction sets on the CPU because we use high concurrency in our software.

We're coming from 20 years' experience in real-time systems, so we know how to tailor the software to leverage the flash correctly, to leverage the CPU and so on. We're using 100 gigabit Ethernet adapters in our solutions because we can actually saturate that huge bandwidth with our software.

You might say, yes, you get that performance simply from that huge capacity, but no, it comes from being able to leverage it correctly by writing efficient software.

I think what differentiates us is not just the low-level expertise we have but being able to tailor that all the way up to abstract programming semantics.

