Hortonworks introduces Data Warehouse Optimization jumpstart solution

Hortonworks is taking the next step with the OEM tools that it is now reselling by introducing a prescriptive, jumpstart engagement to implement the full bundle. With the market calling for simpler onramps to Hadoop, we believe that Hortonworks should take this new jumpstart program to the cloud.

Roughly six months ago, we wrote of Hortonworks' evolution from a dogmatic pure open source strategy to one that is more pragmatic. Its core platform remains 100% open source, but as we noted, the company is looking a lot more like its rivals in adding vendor-specific content through resale partnerships with Syncsort for data transformation, and AtScale for providing virtual OLAP data mart views.


Now Hortonworks is taking the logical next step, introducing a services-led prescriptive offering to implement the Hortonworks and OEM tool bundle. The new Hortonworks EDW Optimization Solution delivers a jumpstart, 7-8 week services-led engagement to customers who are new to Hadoop and looking to extend their data warehouses. Bundling the Hortonworks Data Platform (HDP) with Syncsort DMX-h and AtScale, the new offering is designed to get customers to the prototype stage.


This offering addresses two core issues: First, Hadoop can be complex and intimidating for new customers to implement; and second, there remains the perennial need to tame the more mundane systems integration issues germane to any data warehouse or data mart project.

Target customers are those looking to use Hadoop for its cheaper compute cycles and storage to shift some workloads, such as ETL; extend analytics in an "active archiving" use case to include older historical data; or build new data marts. For instance, while the Syncsort DMX-h tool can source and target a wider range of data types (e.g., JSON), the Hortonworks offering focuses strictly on transforming conventional structured data coming from data warehouses.

The operative notion is that Hadoop will be a more economical platform for many of these workloads, especially data transformation and exploratory analytics. But it also assumes that the data warehouse remains the place best suited for routine operational analytics. That's why Hortonworks labels this engagement as data warehouse optimization, not replacement.

The engagement includes Hortonworks field consultants installing HDP, Syncsort, and AtScale; configuring sources and targets, processes (e.g., Hive LLAP for interactive SQL), and ODBC/JDBC interfaces; data transformation routines; creating up to three virtual OLAP cubes in AtScale; and demo'ing the results.

While there are many use cases for Hadoop, EDW optimization is often the first, because objectives such as moving ETL can generate tangible ROI. And that explains why Hortonworks has narrowly targeted this jumpstart package to this scenario.

The Hortonworks EDW Optimization solution comes at an auspicious time. To date, we estimate that the Hadoop installed base is just north of 3,000. That's obviously just the tip of the iceberg in the overall data warehousing and analytics market.

Hadoop has been jokingly referred to as a collection of zoo animals, in part because many of the projects are named after creatures in the wild, but also because those projects behave like creatures in the wild. And not surprisingly, the early adopters have been IT organizations with the resources and skills to take on complex projects, such as taming those creatures. For the next 3,000 adopters, things are going to have to get a lot easier.

And that's why you're seeing Hortonworks unveil this package, and you're also seeing many more targeted onramp options as well. The obvious ones are the managed big data analytics services that behave like SaaS cloud services by taking care of the underlying plumbing, maintenance, and updating. Among the most established are Amazon's Elastic MapReduce (EMR) and Microsoft Azure HDInsight (which is built around HDP). And in the data warehousing space, you have Amazon Redshift, Azure SQL Data Warehouse, and Snowflake.

But you're also seeing more narrowly targeted services, such as machine learning from all of the major cloud providers, and dedicated Spark computing services. They are delivered on the premise of running targeted operations rather than a full-blown Hadoop or data warehousing platform. Hortonworks has also fed this narrative with Hortonworks Data Cloud, itself a simplified offering built around Hadoop's two most popular workloads: Hive and Spark.

Our take is that managed services in the cloud are the logical path for the next wave of big data adopters. We believe that by year-end 2018, over half of new Hadoop implementations will be cloud-based, and managed services will be essential for making Hadoop - or other forms of analytics - attractive to new adopters. As of the announcement today, the Hortonworks EDW Optimization solution is targeted at on-premises customers. But the logical path will be for Hortonworks to extend this to the cloud, with the Hortonworks Data Cloud being a target that's all too obvious.
