What to do with the data? The evolution of data platforms in a post big data world

Thought leader Esteban Kolsky takes on the big question: What will data platforms look like now that big data's hype is over and big data "solutions" are at hand?
Written by Paul Greenberg, Contributor

Note: I've had the eminent thought leader Esteban Kolsky, founder and managing principal of ThinkJar, doing guest posts before on this blog. Time and again, the guy simply nails what the core of contemporary thinking is and how to approach it.

This time, he goes to the heart of how the business world is evolving and what it takes to have a transformative success - and that means ecosystems and platforms.

This post is the first of two that he will have here. (Part two comes next week.) The idea for these posts grew out of research that Esteban just finished for Radius, a company that characterizes itself as providing Customer Data Platforms (CDP) for B2B revenue teams. This research inspired more than simply a post with market data; this is significant thinking on where data platforms are going in a world that has solved (more or less) big data.

So, Esteban, start the ball rolling...

Thanks, Paul, for letting me use your blog to spout on data and data platforms. I want to split the research I did into two posts (for easier consumption): the first (this one) on the evolution of data, and the second (next week) on the evolution of data platforms.

There has been a lot of discussion recently on the "thought leadership interwebz" about the best way to aggregate data. We talk about data lakes, swamps, BI, MDM, CDP, and much, much more -- but none of these provides a simple solution to the problem of how to optimize data use in a digitally transformed organization.


The problem has recently risen to the executive level, where I am having conversations about the differences between all of these approaches. Where did this problem start? Glad you asked.

Evolution of data: Where it all started

Mind-blowing volumes of data started the problem.

By 2025, the volume of all data created will top 163ZB (zettabytes), and enterprises will experience a 50-fold increase in the data they must manage. This is what we have spent the last five to six years calling big data. As with all technology-only solutions, big data quickly became a "solution" looking for problems to solve -- not the solution to existing problems.

What is available today is focused on the sheer amount of data available (big data), and how to store it, rather than on deriving value from it. If we only wanted to process data, the big data movement would've been fine, but since we want more (actionable insights became the holy grail of data processing shortly after big data started, and the origin of digital transformation), we need to find different value propositions for that tidal wave of data.


In the last 10 years, we have seen slow progress from simple demographic data-in-storage to multi-dimensional data-in-use: We moved away from creating huge electronic storage areas for data and began to use it in real time. Unfortunately, most enterprise data today still sits in disparate systems waiting to be processed. Value comes from aggregating the right data from myriad sources and using it efficiently and effectively to solve business quandaries and optimize processes -- and to do that, we need to understand what the data shows, not just the data itself.

We don't have a problem finding data; we can find more than we need. The problem comes down to using it appropriately.


Enterprises are beginning to understand the concept of data-driven, outcome-focused, customer-centric operations, and the need for digital transformation (ensuring that data flows easily and fluidly across the enterprise). Most of them have early strategies and operations in place.

The biggest problem remains understanding how data affects transactions and processes (what data means and how to use it to achieve intended business outcomes), coupled with the inability to learn from past results. This is where the "gold in them hills" is -- in using the lessons learned to engender continuous optimization, not just one-time improvements. The correlation between digital strategies and existing data is what necessitates data platforms, but first, we need to fix serious operational problems.

We found four problems organizations face when using data:

Poor Operationalization. All businesses have analytics tools, just not the right ones. How to aggregate all these tools into a common data model and then use that to run the business is the operationalization strategy that most companies miss.

Bad Data, It Happens. The era of "big data" brought the average share of bad data in enterprises to 40 percent. There is an inherent risk in aggregating data poorly -- and the tools we are using are not focused on solving that problem but rather on increasing the size of the data stored.

Depleted Resources. All organizations I talk to have the same problem: Not enough qualified resources (people, money, technology, time) to do the work they need with data.

Understanding. Everyone knows what data means; there are plenty of definitions, and we have been using data for close to forever to run businesses. But few understand how it works. The lack of data governance and investment as the company grew is the culprit.
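The "poor operationalization" and "bad data" problems above can be sketched in a few lines of Python. Every field name, record, and threshold here is hypothetical, purely to illustrate the idea of normalizing disparate silos into a common data model and flagging the bad rows before they pollute downstream analytics:

```python
# Hypothetical sketch: two silos with different schemas are mapped onto one
# common model, then obviously bad rows (missing keys, malformed emails)
# are flagged and counted.
import re

def normalize(record, field_map):
    """Map a silo-specific record onto the common model."""
    return {common: record.get(source) for common, source in field_map.items()}

def is_bad(record):
    """Flag rows that would pollute downstream analytics."""
    if not record.get("customer_id"):
        return True
    email = record.get("email") or ""
    return not re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", email)

# Two hypothetical silos with different field names.
crm_rows = [{"cust": "C1", "mail": "a@example.com", "rev": 100},
            {"cust": None, "mail": "b@example.com", "rev": 50}]
billing_rows = [{"id": "C3", "contact": "not-an-email", "amount": 75}]

unified = (
    [normalize(r, {"customer_id": "cust", "email": "mail", "revenue": "rev"})
     for r in crm_rows] +
    [normalize(r, {"customer_id": "id", "email": "contact", "revenue": "amount"})
     for r in billing_rows]
)

clean = [r for r in unified if not is_bad(r)]
bad_ratio = 1 - len(clean) / len(unified)
print(f"usable rows: {len(clean)}, bad data share: {bad_ratio:.0%}")
```

Trivial as it is, the sketch makes the point: without an agreed common model and a quality gate, aggregation just concentrates the noise.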

Management techniques, storage, manipulation, analytics -- they have all evolved dramatically over the past few years and created the world of big data. They just left a bigger mess: Too much data, not enough insights derived from it.


And the wrong tools to do it well going forward.

Post-big data world and what to do with the data

We are in a post-big data world.

The original promise of big data -- collect enough information and you will find a way to use it that will improve operations and results -- failed. To paraphrase The Notorious B.I.G., the more data we come across, the more problems we see.

What we did end up with is lots and lots of data in silos, isolated from each other, overwhelming the users who are trying to figure out how to use it, and the IT people who are trying to figure out how to manage it.

This is the beginning of the post-big data world quandary.

Most large collections of data are inaccurate, overwhelming, complex, and tend to be stored in silos. Organizations believe they have data -- but what they have is one of two things: noise that will never be converted to signal and only aids the bottom lines of storage providers, or unconnected, uncorrelated, disparate, and bad data that is useless because they don't know how, or why, to use it.


Data collected, processed, stored, and used must always be aligned with a purpose. It used to be that the company determined the purpose, but now the customer is demanding specific outcomes that change how much data is processed, stored, and used.

It's not the amount of data collected that matters, but the actual usefulness of the data. As we move further and further into data-based decision making, both automated and assisting humans who make those decisions, processes require better, cleaner data, and lessons from the past (actionable insights) to continuously improve how we work with data -- both historical and new. Learning what happened last time helps us make better decisions next time, just as in real life (and becomes the basis for machine learning and artificial intelligence).

Storing data in case we need it someday only pollutes processes with useless noise. Organizations need to understand what the data represents, where it came from, where it's going, and how it's used -- but more importantly, what the value proposition of the data is, as used and as stored.


Those parameters will then yield a complete real-time, aggregated data repository that can be used to understand customer expectations, optimize processes, achieve outcomes, generate insights, and more.

But, to do that, you will need a good data platform. And that is the next post...

NOTICE: The CRM Watchlist 2019 registration and the registration for The Emergence Maturity Index Award for 2019 are closing on Sept. 30 with no extensions possible. So, if you want to make sure that you are part of these (see the links in the names for the details), then you have roughly three weeks to go. I'd do it, if I were you, though I'm not. You, that is.

To request the registration form for either of them, please email me at paul-greenberg3@the56group.com.


