Big on data, always and forever

Since I began covering the data and analytics space over a decade ago, it seems like everything has changed. Really, though, it's just a tech cycle repeating itself. Data is, and will be, a part of every such cycle.

Written by Andrew Brust, Contributor March 30, 2022 at 1:17 p.m. PT

Some ten years ago, I started writing about what we then called "big data" for ZDNet; in fact, I was the first person at ZDNet to focus on it exclusively. Coming from a consulting background in enterprise business intelligence and application development, I thought it would be fun to cover this burgeoning new area of the analytics game, which I had been part of since the late 1990s. The editors wanted to name this blog simply "Big Data." My concern was that the term wouldn't age well: whatever was "big" then would seem "regular" in ten years' time. I suggested a slight variation: "Big on Data" (because I was, and I am). And that's how the blog and its name came about.

I was a bit amused that so many people saw big data as shiny and new. It wasn't. Instead, it was a logical progression of the enterprise BI technology that had existed for about 20 years prior. There were some important differences, though. Instead of being based on expensive commercial software, the tech of the day -- Apache Hadoop -- was open source. Instead of relying on proprietary data warehouse hardware appliances with (limited) enterprise storage, Hadoop used commodity servers and their inexpensive direct-attached storage (DAS) disk drives. And rather than struggling at terabyte scale, Hadoop bragged it could work at petabyte scale, handling data volumes three orders of magnitude bigger.

Lots of warts

There were downsides, too. And lots of them. Hadoop didn't work with SQL, but rather required engineers working with it to write imperative MapReduce code -- in Java -- to get their work done. It worked in batch mode, and not interactively, so it was...slow. And beyond analytical queries, every workload required its own engine. Data transformation, streaming data processing, machine learning and job flow required other open source components, with names like Pig, Storm, Avro, Mahout, and Zookeeper, each of which featured its own arcane command line interface.

Everything was based on simple data files; security was file-based, too. In fact, the granularity of security was so unwieldy that many orgs using Hadoop simply gave all their users full access to everything, but limited that group of users to a small cohort. Corporate standards be damned; they only served to stymie innovation. Beyond individual technologies, there were so many vendors that it prompted me to deliver a talk in 2016, at the now-defunct Hadoop Summit, called "The Ecosystem is Too Damn Big."

It struck me at the time that all this new technology, meant to democratize data and analytics, was doing just the opposite. Worse, many of the new startups in the space were founded and led by practitioners who, though brilliant, ignored the technological gains, sensible standards and broad appeal of the BI and data warehousing technology that preceded them and their companies' platforms. In the name of casting off the old technologies' hegemony, the new technology in many ways represented a regression rather than an advance.

Fast forward

Today, Hadoop and, more importantly, its complexity have largely been rejected, and the data warehouse is back with a cloud vengeance, with Snowflake as its poster child. NoSQL databases now speak SQL. Open source platforms sport real security and detailed data governance. File-based analytics technology has taken on the "data lake" moniker, and Apache Spark has superseded Hadoop as the tech standard. Even in that world, Databricks, founded by Spark's creators, has embraced a hybrid data warehouse/lake concept it calls the data "lakehouse."

When it comes to established technology and concepts that work well, what's old often becomes new again. Startups get new CEOs who are more business-focused and less tech- or academia-oriented. Snowflake and Databricks are cases in point. Areas with too many vendors see a wave of consolidation, often resulting in rivals joining forces, which is exactly what happened with two pioneering Hadoop companies, Cloudera and Hortonworks, now unified under the former's name.

Other vendors simply get swallowed up, sometimes in asset purchases; the third Hadoop company, MapR, now part of HPE, is a perfect exemplar. Hero pure-play companies get acquired by larger, entrenched players, as was the case with Salesforce acquiring Tableau and Google Cloud grabbing Looker. Enterprise old dogs learn new tricks, as with Microsoft and Power BI.

Exuberance gets rational

All of this is a sign of a new, innovative sector stabilizing, maturing and moving from cutting-edge curiosity to mainstream mission-critical technology. The tech gets easier to use, its market gets more sustainable, and it gains adoption even from conservative customers. The category gets more elbow grease, though it may lose some of its sheen.

That doesn't mean it gets less important. Data isn't going anywhere. As I often say in talks I give at conferences, data is simply a set of point-in-time recordings of events that have taken place, and of the actions and facets of the people, devices, organizations and processes that participated. Data can't go away any more than business itself can, and likewise for analytics. Business runs on data. Even if data isn't "a thing," it's the thing that enables and powers all the other things, including AI and data science -- in name, and in substance. Just as I was amused over 10 years ago that people found big data to be shiny and new, I'm amused now that they may find it old and crusty.

I must be going

For over a decade, it's been a thrill to write for ZDNet, the enterprise tech news site that draws its heritage, and the first two letters of its name, from the company that still publishes PC Magazine, which I once eagerly read, physical cover to physical cover, when I was literally still a kid. It's hard to imagine a more venerable site.

And yet, I'm moving on, to not just one but two other excellent outlets, where mature but innovative technology retains its spotlight. Join me at either or both if technology that's cool both behind the scenes and at center stage still fascinates you.