To the cloud, big data sisters and brothers, to the cloud

While reports of big data's death have been greatly exaggerated, the skepticism is not unwarranted. The cloud may have some of the answers, but it won't solve all of big data's problems.
Written by George Anadiotis, Contributor


Gartner says the big data party is over, and Tony Baer wonders whether we can even make data science (and engineering) work. And results of a survey recently published by Dimensional Research (DR) and sponsored by Snowflake are clear: although 100% of participants acknowledge that data initiatives are important, the vast majority (88%) have had "failed" projects.

Respondents have reported some things that would help them get more out of their current data environments, including the ability to implement and deploy faster, reduce time to make data available, simplify tool sets, or reduce the overhead of managing infrastructure. DR states that cloud analytics has the potential to deliver these benefits.

But expecting cloud infrastructure alone to deliver big data initiatives from their maladies verges on the metaphysical, much in the way Chekhov's heroines expected fleeing to Moscow to deliver them. There's more to getting big data right, and we've had Jon Bock, Snowflake's VP of products, weigh in on the subject.


What would benefit your current data environment? (Image: Dimensional Research)

Data, in itself

People cite data infrastructure "inflexibility" as a major cause of issues, and according to Snowflake that often boils down to delays caused by complexity and resource constraints. It takes time and a huge amount of work to get the right capacity deployed and to start using it, and each new project can put new strains on the existing infrastructure.

An increasing share of data today comes from outside the corporate data center, from applications that process data natively in the cloud. As a result, it's often easier to bring together that data in the cloud than to suck it all into the corporate data center. Data management tools have been rapidly evolving to support cloud and hybrid environments, optimizing data transfer.

Cloud solutions can deploy resources in minutes, alleviating the challenges of capacity planning, and they can do that with minimal to no work for organizations because the complexity is handled by the cloud solution and cloud infrastructure vendors. Getting data to the cloud is one of the first things organizations start thinking about, and it's getting easier and easier to do so.


Cloud adoption for analytics efforts is going mainstream (Image: Dimensional Research)

Cloud to the rescue

It's clear then that data ingestion is a major part of cloud-based analytics, as network latency is added to the inherent computational and I/O cost associated with ETL and/or data mapping and integration. Recently IBM claimed to be the fastest around in data ingestion, but revealed very little to substantiate this. So is this an anything-goes, mine-is-better-than-yours game?

Architectural blueprints, standards, and benchmarks might give clients a better picture of the oversubscribed data infrastructure and analytics landscape and contribute towards fair comparisons. So one might wonder why we don't see more vendors publishing benchmark results, for example.

Snowflake's take is that this is not really due to marketing taking precedence over architecture, but more due to benchmarks not being able to keep up with the explosion in use case diversity and cloud flexibility. Traditional benchmarks assume that they will be run on a fixed hardware configuration; in the cloud, however, resources can be changed on the fly. Benchmarks have not evolved to capture that.

Snowflake believes the scalability of the cloud can provide more horsepower to support data ingestion. More important, though, is moving away from a data architecture that requires repeated batch transfers of the entire data set: designing for incremental data ingestion on a continuous basis dramatically simplifies the process.
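To make the difference between the two ingestion styles concrete, here is a minimal sketch in Python. It is not Snowflake's implementation or API: the `Target` class, the `updated_at` field, and both load functions are hypothetical names invented for illustration, and the "warehouse" is just an in-memory dictionary. It assumes every source record carries a monotonically increasing `updated_at` value that can serve as a watermark.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical in-memory stand-in for a cloud warehouse table."""
    rows: dict = field(default_factory=dict)  # records keyed by id
    watermark: int = 0                        # highest updated_at loaded so far
    rows_transferred: int = 0                 # running count of rows moved


def full_batch_load(source, target):
    """Re-transfer the entire data set on every run."""
    target.rows = {r["id"]: r for r in source}
    target.rows_transferred += len(source)


def incremental_load(source, target):
    """Transfer only the rows changed since the last recorded watermark."""
    changed = [r for r in source if r["updated_at"] > target.watermark]
    for r in changed:
        target.rows[r["id"]] = r  # upsert: insert new ids, overwrite existing
    target.rows_transferred += len(changed)
    if changed:
        target.watermark = max(r["updated_at"] for r in changed)
    return len(changed)


# Assumed sample data: 1,000 records, each stamped with an update counter.
source = [{"id": i, "value": i * 10, "updated_at": i} for i in range(1, 1001)]

batch, incr = Target(), Target()
full_batch_load(source, batch)
incremental_load(source, incr)

# One new record arrives and one existing record is modified.
source.append({"id": 1001, "value": 0, "updated_at": 1001})
source[0] = {"id": 1, "value": 99, "updated_at": 1002}

full_batch_load(source, batch)
moved = incremental_load(source, incr)

print("rows moved by second incremental run:", moved)
print("total rows transferred (batch):", batch.rows_transferred)
print("total rows transferred (incremental):", incr.rows_transferred)
print("targets identical:", batch.rows == incr.rows)
```

Both targets end up with the same contents, but the incremental path moves only the changed rows on the second run. A watermark like this one does not detect deleted source rows, so a real pipeline would need a separate mechanism for those.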


Agility in analytics is about more than infrastructure (Image: Linked Data Orchestration / Gigaom)

Agility is everything

Snowflake has designed and built a data warehouse for the cloud from the ground up on top of AWS, because it reckons AWS provides the most established platform and the largest installed base, and continues to grow rapidly. But organizations that Snowflake talks to often already use cloud services from multiple vendors.

At this stage in cloud maturity, most organizations look for ways to reduce vendor lock-in when it comes to cloud, but are not ready to handle a multi-cloud environment themselves. So the quest for agility is on, even though the goal may not be in sight yet. And this does not only apply to cloud infrastructure.

Findings of the DR survey seem to confirm that cloud support, one of the key features of agile BI (together with advanced visualization, domain-specific knowledge, data-source agility, and distribution-channel agility), is becoming the norm. Snowflake thinks agile BI is an approach that has gained interest because organizations are looking for ways to be more nimble and flexible in their data analytics.

The biggest challenge is developing the culture needed to make it successful. Organizations often discover their methodical waterfall approach to developing reports and analytics is so ingrained that agile BI is resisted and distrusted. That resistance results in a lack of buy-in that often makes it impossible to succeed.

Organizational change and digital transformation must go hand-in-hand in an interactive process. Big data and the cloud may be great enablers of flexible approaches and measurable value creation, but without appropriate organizational culture and structure in place, they will fail.
