Like most buzzwords, the Digital Twin sounds both catchy and perplexing. Although it has only recently gained popularity, having been featured by Gartner and used by Oracle and GE, the term has been around since 2002.
It was introduced by Dr. Michael Grieves at the University of Michigan, and it refers to a virtual representation of a physical entity. Originally introduced in the context of Product Lifecycle Management, the Digital Twin has gotten a second wind with the advent of the Internet of Things (IoT).
Digital Twins reloaded
Indeed, it sounds like a fitting metaphor. IoT is about, well, things, and about having these things collect and send telemetry data as well as (potentially) receive and execute commands through sensors and controllers.
A "thing," such as a drone or a car, typically incorporates an array of sensors and controllers that make this two-way interaction possible. The combination of data from this array of sensors that refer to the same physical object creates a virtual representation of the object: its Digital Twin.
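As a rough illustration of that idea, the sketch below folds readings from several sensors into one state object per physical thing. It is a minimal example under stated assumptions: the `Reading`, `DigitalTwin`, and `build_twins` names, their fields, and the rule of ignoring out-of-order readings are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One telemetry sample; all field names are illustrative."""
    thing_id: str    # which physical object the sensor belongs to
    sensor: str      # e.g. "altitude" or "battery"
    value: float
    timestamp: float


@dataclass
class DigitalTwin:
    """Latest known state of one physical object, built from its sensors."""
    thing_id: str
    state: dict = field(default_factory=dict)      # sensor -> latest value
    last_seen: dict = field(default_factory=dict)  # sensor -> latest timestamp

    def apply(self, reading: Reading) -> None:
        # Assumption: a reading older than the one already held is stale.
        newest = self.last_seen.get(reading.sensor, float("-inf"))
        if reading.timestamp < newest:
            return
        self.state[reading.sensor] = reading.value
        self.last_seen[reading.sensor] = reading.timestamp


def build_twins(readings):
    """Group readings by thing_id and fold them into one twin per object."""
    twins = {}
    for r in readings:
        twin = twins.get(r.thing_id)
        if twin is None:
            twin = twins[r.thing_id] = DigitalTwin(r.thing_id)
        twin.apply(r)
    return twins
```

Feeding a mixed list of drone and car readings to `build_twins` yields one twin per `thing_id`, each holding the most recent value for every sensor attached to that object.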
If you're thinking there's nothing particularly new about this, you're right. But today it's the scale that is on a different level altogether.
Take cars, for example: in F1 racing, telemetry data collection has been a given for years. There are only a handful of F1 racing cars in the world, however, so this was something that concerned a select few. Today, everyday cars are being fitted with an array of sensors that would have been unheard of even in F1 racing a few years back.
This means that challenges and opportunities related to data collection and modeling are now a widespread concern. Hence the need for metaphors to popularize concepts, and the revival of the Digital Twin.
A data architecture for Digital Twins
So, what kind of data architecture could one employ to model and process Digital Twins? To answer this, let's first expand the definition somewhat. Why limit this to physical entities? Could an eShop landing page for example have its Digital Twin?
Well, why not? Although there are no sensors in landing pages, there is an array of data associated with them, such as the items they contain and user interaction data (clickstreams). Ideally, merchants would like to collect clickstreams in realtime and use them to tailor their pages accordingly.
These clickstreams have something in common with sensor data: they are streaming in nature, which means they flow in constantly in realtime and in large amounts. This marks a shift in the orientation of data architectures towards streaming, which we have been covering.
Interestingly, streaming platforms such as Spark Streaming, Flink, or Apex are also oriented towards in-memory processing. In processing streaming data the assumption is that there is value in getting and acting upon the data as soon as possible. The choice to go for in-memory processing makes sense in that light, as it can lead to orders of magnitude faster processing.
As we recently noted, the cost of memory has been dropping, and there are new memory technologies in the works that promise to unleash even more capabilities. In-memory storage and processing architectures have been evolving for a while now, and there are vendors that have been in this space for more than a decade.
It was only natural that they would take note of the trend towards streaming data and aim to position themselves in this space. ScaleOut is one of those vendors, having recently announced ScaleOut StreamServer, an addition to its product line with this exact goal.
ScaleOut does Digital Twins
ScaleOut was founded in 2005 by industry veteran William Bain. Bain, an expert in parallel computing with stints at Bell Labs, Intel, and Microsoft, says ScaleOut set out to deal with the problem of enabling web farms to scale to very high workloads.
ScaleOut initially focused on distributed caching, gradually evolving to in-memory storage and compute solutions on commodity hardware clusters. Bain says they wanted to allow applications to handle rapidly changing data, and adding compute to storage was just a logical step away:
"If you have in-memory data storage in your cluster, it's very natural to add compute to that. You can analyze the data you are storing at very low latency and high scalability because the data does not have to move."
Bain says in-memory data grids are particularly well suited for streaming data processing, because of their ability to not only ingest data fast, but also analyze it on the fly. But then again, isn't that what streaming platforms do as well?
"People in stream processing don't really talk about Digital Twins. Digital Twins is core in what we do," says Bain. He argues that while platforms like Spark Streaming and Storm have their strengths and weaknesses, they were not really designed to deal with live data.
By contrast, he continues, ScaleOut is a data grid designed from the ground up for live data. For Bain, the difference is in the modeling and API:
"Take Spark Streaming: it is centered around micro-batching. If you want to do something like process clickstreams, you'd have to batch data from many sources together. You can do it, but it's cumbersome. Our approach is a more natural fit."
ScaleOut features an object-oriented API that enables modeling multiple data streams associated with the same entity and encapsulating properties and behavior related to that entity in a way that mirrors it.
Ring a bell? Yes, that does sound like a Digital Twin, which is exactly why ScaleOut is using this as its key message. Bain says they have been modeling their API in this way since 2009, but it wasn't until recently that someone pointed them in this direction:
"We were discussing our approach with a client and he said, that's a Digital Twin. We said, that's a great name. We were looking for a name to describe this capability, so we co-opted it."
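To make the object-oriented modeling style described above more concrete, here is a minimal sketch that uses the earlier landing page example. It is emphatically not ScaleOut's API: `LandingPageTwin`, `TwinRegistry`, the event dictionary layout, and the `promote_after` threshold are all hypothetical names invented for illustration. The point is only that each event is routed to the object that mirrors its entity, which holds both the state and the behavior, rather than being batched together with events from many sources.

```python
from collections import Counter


class LandingPageTwin:
    """Hypothetical twin of one eShop landing page (not a vendor API)."""

    def __init__(self, page_id, promote_after=3):
        self.page_id = page_id
        self.promote_after = promote_after  # assumed click threshold
        self.views = 0
        self.clicks = Counter()  # item id -> number of clicks seen

    def on_event(self, event):
        """Update state from one event; return a command dict or None."""
        if event["type"] == "view":
            self.views += 1
            return None
        if event["type"] == "click":
            item = event["item"]
            self.clicks[item] += 1
            # Emit a command exactly once, when the threshold is reached.
            if self.clicks[item] == self.promote_after:
                return {"page": self.page_id, "action": "promote", "item": item}
        return None


class TwinRegistry:
    """Routes each incoming event to the twin that owns it, one at a time."""

    def __init__(self):
        self.twins = {}

    def dispatch(self, event):
        page_id = event["page"]
        twin = self.twins.get(page_id)
        if twin is None:
            twin = self.twins[page_id] = LandingPageTwin(page_id)
        return twin.on_event(event)
```

In this sketch a caller would pass events such as `{"page": "home", "type": "click", "item": "shoes"}` to `dispatch` as they arrive and act on any command that comes back. A real data grid would also distribute the twins across a cluster, which is left out here.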
A streaming platform killer?
Gartner apparently approves as well, if attention is any indication. A key analyst from the team behind the recently published Hype Cycle for In-Memory Computing Technology for 2017 was at ScaleOut's presentation at the In-Memory Computing Summit EMEA and seemed to appreciate the time spent with the ScaleOut team.
Bain also emphasizes that ScaleOut does ingestion and orchestration at the same time, but clearly separates the two thanks to the encapsulation that object orientation offers. Plus, it has a fully distributed, peer-to-peer design without a single point of failure. The result, he says, is super fast processing and clean design.
For example, when it comes to inference over ingested data, Bain cites use cases where rule-based and machine learning approaches have been used in conjunction or interchangeably while being transparent to developers, again due to object-oriented encapsulation.
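One way such interchangeability could look in code is sketched below: the twin exposes a single `check` method, and whether a rule or a model sits behind it is invisible to the caller. Everything here is an assumption made for illustration, including the class names, the thresholds, and the `ModelDetector` weights, which stand in for a trained model rather than representing one.

```python
import math
from typing import Protocol


class Detector(Protocol):
    def is_anomalous(self, temperature: float, vibration: float) -> bool: ...


class RuleDetector:
    """Hand-written thresholds (values are made up for illustration)."""

    def is_anomalous(self, temperature, vibration):
        return temperature > 90.0 or vibration > 5.0


class ModelDetector:
    """Stand-in for a trained model: a logistic score with assumed weights."""

    def __init__(self, w_temp=0.1, w_vib=0.8, bias=-12.0, threshold=0.5):
        self.w_temp, self.w_vib, self.bias = w_temp, w_vib, bias
        self.threshold = threshold

    def is_anomalous(self, temperature, vibration):
        z = self.w_temp * temperature + self.w_vib * vibration + self.bias
        return 1.0 / (1.0 + math.exp(-z)) > self.threshold


class MachineTwin:
    """Callers see only check(); the detector behind it can be swapped."""

    def __init__(self, machine_id, detector: Detector):
        self.machine_id = machine_id
        self.detector = detector
        self.alerts = 0  # number of readings flagged so far

    def check(self, temperature, vibration):
        flagged = self.detector.is_anomalous(temperature, vibration)
        if flagged:
            self.alerts += 1
        return flagged
```

Constructing `MachineTwin("m1", RuleDetector())` or `MachineTwin("m1", ModelDetector())` changes how readings are judged without changing any code that calls `check`.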
All that is well and good, of course, but the key questions to ask are whether you should care about Digital Twins and terminology more broadly, and whether that makes ScaleOut stand out.
Buzzwords do have their place, but it's what they signify that is important. Whether you call it Digital Twins or by any other name, the ability to ingest and process data in realtime and act upon the results will be increasingly important going forward.
APIs and architecture matter beyond the appreciation of elegant design. Although modeling Digital Twins should indeed be possible using any streaming platform, an object-oriented API out of the box will save time and effort.
Recently ScaleOut released a new version of its platform, in which streaming is a first-class citizen. But does that make ScaleOut a Spark / Flink / Storm killer? Not necessarily. Why? Two words: open source.
Over time, open source has come to be considered table stakes for middleware. Besides being able to innovate at a faster pace, the community approach mitigates the perceived risk for organizations forced to make hard strategic decisions on their software infrastructure.
We have heard this over and over from decision makers at organizations everywhere. Take for example the latest story about Basho, the provider of the widely acclaimed and used Riak database, going out of business. Organizations that have been using Riak are stepping up to maintain it (at least until they find an alternative), as Riak is open source.
Bain acknowledges this as well. ScaleOut offers its software in two flavors, running in both Microsoft's .NET and Java environments. Although there is parity between the flavors, and they can interoperate in a mix-and-match cluster as well, Bain notes that:
"The majority of our users are in the .NET world. In the Java world, people focus on Apache projects and expect software to be open source. We are one of the few in-memory vendors that are not open source, and that inhibits our ability to penetrate that market."
ScaleOut is obviously aware of the implications, so we have to assume they are content with claiming a piece of the action in the Microsoft world. So the Sparks and Storms of the world can rest somewhat assured for the time being.