An integrated set of data management, analytics, and insight application development and management components, offered as a platform the enterprise does not own or control, may sound scary, or cryptic.
Scary or not, that's Forrester's definition of Insight platforms as a service (IPaaS). The scary part has to do with the lack of ownership and control. Many enterprises would be put off, as the need to exercise precisely that is engraved in their DNA.
The move to the cloud, however, much debated in its early years, is pretty much a given now. Ownership and control have been central issues there, and yet somehow the pro-cloud arguments have prevailed and the majority of enterprises are now in that camp.
Pace of innovation, economies of scale, elasticity, and flexibility seem to outweigh ownership and control. Initially applied to infrastructure, the trend soon expanded to applications and platforms. The end result is that by now a big part of enterprise applications and data live in the cloud.
If those are valid reasons for moving to the cloud, then would it not make sense to have the tools needed to get insights from that data in the cloud too? Why move your data back and forth?
The notion of the cloud as an integral part of data-driven analysis has been around for years. What is new, however, is that we are now talking not just about cloud-based tools, but about entire platforms that offer everything in a bundle: from the mechanics of server provisioning and data ingestion to analytics, automation, and collaboration facilities.
Automation (powered by machine learning mostly) has gone from stand-alone libraries to integrated environments to automating automation. Automation is no longer offered solely as a capability or even a service to end users, but also internally in big data tools and platforms to boost their own capabilities and efficiency.
Solutions for empowering intra- and inter-team collaboration are flourishing. From collaborative exploration among data scientists, to productizing solutions and managing infrastructure with data engineers and ops, to serving insights to business users, data-driven analysis is a team sport and needs software that supports this.
Managed cloud big data services are becoming established as a realistic path by which big data analytics and machine learning will make it out to the mainstream. So, to quote fellow contributor Tony Baer, this will be the next great platform decision.
Qubole does IPaaS
Qubole, the company founded by Facebook alumni Thusoo and Sarma, seems to be among the ones who get this. That was evident when discussing the concept of DataOps with Thusoo, or how infrastructure, process, and culture can work together to empower decision making in organizations.
Qubole has now launched a new incarnation of what it calls Qubole Data Service, which falls under the definition of IPaaS and was included in Forrester Wave™: Insight Platforms-As-A-Service, Q3 2017.
In fact, Qubole's originally planned release date almost coincided with Forrester's report. Being included in the first place is an achievement in and of itself, considering Qubole is up against the likes of Google, Amazon, and IBM. Still, the report pictured QDS as lagging compared to other options.
As the release was rescheduled, when connecting with David Hsieh, Qubole VP of marketing, one of the first things we discussed was whether the two events were related, whether Qubole used the extra time to add to its offering, and what its view of and response to Forrester's critique is.
Hsieh said that moving from a single product to three was a significant business development for Qubole, and in order to ensure that everything functioned as perfectly as possible on both the technical back end and the business front, they took the extra time necessary to polish the fine details.
Hsieh also pointed out some specifics in Forrester's Wave, having to do with timing and methodology, as the evaluation is based on the products in the market at the time when Forrester starts the evaluation process, and descriptions submitted by vendors, rather than demos of the products themselves.
Presumably Qubole believes it would get a better evaluation based on its current offering. In the end, however, that's not all that important. What is important is the definition of this space, and how Qubole's offering addresses it.
Hsieh concurs: "We think the concept of an Insight-platform-as-a-service is a good one - that's essentially what the Qubole Data Service (QDS) delivers. Forrester's trends are typically spot on, and it's what Qubole is focused on delivering to our customers."
Co-opetition in the cloud
More and more vendors are getting on the IPaaS bandwagon these days, and this creates interesting situations. Since the big cloud providers are also in the game of offering big data services themselves, in many cases we have co-opetition situations.
Take QDS: it is offered on AWS, Azure, and GCP, all of which are also vendors Qubole competes with. QDS also supports Oracle Bare Metal Cloud, and to some extent Qubole is or will be competing with Oracle too, but it does not support IBM.
So what were Qubole's criteria for choosing vendors to work with, how do they see this space playing out, and what is the edge of QDS over cloud provider offerings?
Hsieh acknowledges the "frenemy" relationship with the leading public cloud providers, but says the market is very large and there are plenty of opportunities for multiple companies to succeed.
He says that Qubole has substantial production customers on each of the supported clouds, so the choice was market driven. Some customers want to avoid vendor lock-in, some use different clouds for different things, and in some cases large companies can't dictate homogeneity.
Hsieh cites three ways in which Qubole differentiates itself from the cloud vendors: no lock-in, TCO advantage and automation:
"We're cloud agnostic so the same queries and applications can run across whichever cloud customers want without any coding changes. Cloud vendors are not motivated to invent features which reduce consumption, but we are. Automation improves agility, scale and TCO; I expect other companies will have to do this, but they're already at least a year behind."
But cloud vendors are not the only ones in the IPaaS space, so why go for Qubole over, say, Databricks or Confluent? Hsieh says that while the Databricks and Confluents of the world might know their technologies (Spark and Kafka respectively) better, they don't know how to take advantage of the cloud as well.
That's a highly contentious claim, but Hsieh goes on to add that Qubole has proven that in TCO benchmarks a number of times. Whether you should take those benchmarks at face value is up to you, but where Qubole does have a point is in saying theirs is a more comprehensive platform.
QDS supports Spark, Hadoop, Presto, and others, all under a single management and control plane. Hsieh says that QDS is designed for multiple workload types (ETL, data science, ad hoc, streaming, etc.), but of course everyone's going platform these days.
Hsieh also says the competition is "really behind" when it comes to automation. How so? This is a reference to QDS Cloud Agents, one of the three new Qubole products: an optional add-on to QDS Enterprise Edition which autonomously executes a range of data management tasks.
The initial release of QDS Cloud Agents includes the Workload-Aware Auto-Scaling Agent, the Data Caching Agent, and the Spot Shopper Agent (AWS only). Respectively, these agents optimize cluster sizes for workload requirements, optimize the locality of data, and shop across the AWS cloud to assemble compute instances.
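To make the idea more concrete, here is a minimal sketch of what workload-aware sizing combined with spot shopping could look like. It is purely illustrative: the names (`Offer`, `slots_needed`, `shop`), the prices, and the one-slot-per-task sizing rule are all assumptions made up for this example, not Qubole's actual algorithm or API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical spot-market offer for one instance type."""
    instance_type: str
    slots: int             # concurrent tasks one node can run (assumed)
    price_per_hour: float  # made-up price, for illustration only
    available: int         # nodes currently on offer


def slots_needed(pending_tasks: int, running_tasks: int) -> int:
    """Workload-aware sizing (assumed rule): one slot per pending or running task."""
    return pending_tasks + running_tasks


def shop(offers: list[Offer], slots_required: int) -> dict[str, int]:
    """Greedily buy the cheapest capacity per slot until demand is covered."""
    plan: dict[str, int] = {}
    remaining = slots_required
    # Cheapest price per slot first.
    for offer in sorted(offers, key=lambda o: o.price_per_hour / o.slots):
        if remaining <= 0:
            break
        nodes = min(offer.available, math.ceil(remaining / offer.slots))
        if nodes > 0:
            plan[offer.instance_type] = nodes
            remaining -= nodes * offer.slots
    if remaining > 0:
        raise RuntimeError("not enough capacity on offer")
    return plan


# Made-up offers; instance names and prices are placeholders.
offers = [
    Offer("small", slots=4, price_per_hour=0.10, available=3),
    Offer("large", slots=16, price_per_hour=0.32, available=2),
]
plan = shop(offers, slots_needed(pending_tasks=30, running_tasks=10))
```

A real agent would presumably also weigh spot interruption risk, data locality, and scale-down behavior; this greedy version only shows the basic trade-off between workload demand and price.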
Hsieh argues that "the only way out of the siege data teams are under is to automate. To have machines take over the mundane tasks humans are doing (only faster, cheaper and more reliably) so that humans can focus more on problem solving, innovation and delivering business value."
Qubole is set on continuing to develop new Cloud Agents to make this happen, and Hsieh says they have a sizable engineering team that focuses solely on automation and agents. The most interesting aspect, however, is what Hsieh calls Qubole-on-Qubole:
"Half of our team works on the infrastructure end -- engineers using QDS to manage the data lake that collects meta data. The other half is ex-practitioner data engineer/dataops experts and data scientists who work on feature development, creating useful insights and alerts, building algorithms that create recommendations and developing new Cloud Agents.
We use notebook features in QDS to develop our Machine Learning/Deep Learning models and then engineering puts the results of those models back into QDS. It's basically Qubole-on-Qubole, which can be a little mind-bending at times."
Collaboration and takeaways
So how about some collaboration after all these head-to-head comparisons? Collaboration, and more specifically support for notebooks, is also an important part of IPaaS and QDS. Until recently, however, QDS only supported Zeppelin.
"Notebooks have essentially become the "IDE" for data scientists and increasingly data analysts, too. We think it will be important to support multiple Notebooks in the same ways we support multiple open source processing technologies," says Hsieh.
Hsieh says that the QDS notebook capability was built several years ago, when Zeppelin was a bit more mature. Based on customer feedback, support for Jupyter and RStudio has been added -- although they still don't look like first-class QDS citizens.
So, what to make of all this? IPaaS is a very interesting class of offerings, which will see increasing adoption as more organizations move more applications and data to the cloud. Which offerings belong in that category, and what the criteria for evaluating them are, is somewhat subjective, however.
Should you go for the one-stop shop and get all your data needs covered by your cloud provider? Or would you rather have someone run your Kafka/Spark/whatever platform of choice in a multi-cloud environment as a managed service for you?
As usual, it depends on your needs and budget. Qubole is trying to get the best of both worlds, by offering a platform that is multi-cloud, multi-platform and managed. If that is what your organization needs and the math works, it sounds like a tempting option.