Business

Birst-ing into mainstream: Machine Learning meets Semantics in a networked world

Birst is one of the poster children of self-service analytics. Convergence and democratization are the key themes underlying Birst's new release out today, as Birst is trying to balance self-service with enterprise requirements, and making a case for some of the industry's defining trends while at it.

Written by George Anadiotis, Contributor Nov. 29, 2016 at 5:32 a.m. PT

Today, Birst is announcing version 6 of its analytics platform. Birst, an analyst darling ranked as a leader by both Gartner and Forrester, has not had a new version in a while. We spoke with Pedro Arellano VP, Product Strategy for Birst, who offered some insights on the new features in this release: Machine Learning (ML) for the masses, data preparation and the Networked Analytics mantra.

While in and by themselves those features are not revolutionary, they are introduced in an effort to address real issues, and they mark the evolution of Birst into a platform with the ambition to chip away pieces of traditional enterprise IT analytics. Birst is trying to address 2 issues with this release: empowering business users and connecting data silos.

Birst, ranked as a leader by Forrester and Gartner, now expanding its offering with Birst 6. Image: Forrester / Gartner

Machine Learning, meet Semantics

special feature

IoT: The Security Challenge

The Internet of Things is creating serious new security risks. We examine the possibilities and the dangers.

Read now

For Birst, ML is not really new: Birst was founded as an analytics platform for financial institutions before pivoting to a general purpose analytics platform, and ML was there in its initial incarnation to help address the need for predictions. As Arellano put it, the algorithms have been there all the time. These algorithms have been used under the hood already, but now they gain prominence in 2 use cases - data preparation and predictive analytics.

As any data scientist will tell you, preparing data for analytics is a real pain and takes up a lot of time. And if that is an issue for data scientists, it much more of an issue for business users using self-service tools like Birst to analyze their data. While platforms like Birst offer a wide array of connectors to ingest data from various sources, in the end the bottleneck is not so much in the pipelines for the ingestion process.

The real issue according to Birst is that people inside organizations who actually make business decisions lack understanding of fundamental analytics concepts such as tables, columns and joins. Such familiarity is necessary to be able to handle situations in the real world where analytics require different datasets. So, ML + Semantics to the rescue.

Birst architecture. Machine Learning was already used internally, now it is also leveraged to offer predictive analytics features. Image: Birst

Birst already had what it calls a semantic layer in place. Although its foundations were not discussed, what it effectively does is it utilizes metadata to handle schema management for ingested data. Now Birst is utilizing ML-powered tools to make the connections between newly ingested datasets and this semantic layer, promising to handle everything transparently for the end user.

ML is also used under the hood to be able to provide predictions. Again, the idea is to enable business users to have one-click access to predictive analytics, without the need to involve data scientists or to know ML basics. Birst says it uses automated "best fit" selection and scoring of predictive models to predict business outcomes and deliver smart pattern detection, automatic dashboards and automated recommendations of analytic content.

Data governance in a networked world

One of the issues with self-service approach to analytics is that it contributes to data silo proliferation. Empowering people to perform their own analysis may lift some burden off the shoulders of IT and shorten the analysis end-to-end cycle, but at the same time it means that if left unchecked schema drift will creep in, data and metadata will not be shared across the organization and the resulting insights will be unevenly distributed and not reach their full potential.

Networked Analytics is all about collaboration and governance. Image: Birst

Birst acknowledges this issue and is trying to address it by introducing a data governance layer. Users have their own spaces, or sandboxes, that they can use to work in, import their own data and work on projects independently. At the same time though, they also have access to organization-wide managed datasets and schemas. This access is streamlined by Birst and controlled by roles with privileges to authorize and push upstream new data and metadata introduced in separate spaces.

At the same time, users are free to work as they see fit in their local spaces, choosing whether they want to keep their own datasets and metadata definitions, or import and merge with organization-wide ones. If they choose to do so, semi-automated approaches powered by ML tools are there to assist them. Same goes for upstream changes, and this 2-way reconciliation approach is what Birst refers to as Networked Analytics.

Where is this relationship going?

So what to make of the new features? On its own, none of this is really new or revolutionary. Automating ML is in a way the holy grail of predictive analytics, but it's not something that has not been done before - see for example Beyond Core, now acquired by Salesforce. And for people and organizations that want to have a more involved approach to ML, there are all sorts of libraries and frameworks available, ranging from the proverbial Python and R scripts to fully-blown ML frameworks with a wide array of algorithms and capabilities.

Automating data wrangling is, again, something that the competition has been onto for a while. Solutions like Trifacta do exactly this, in fact utilizing the same ML-powered approach to assist non-expert users. And tools like Qlik or Tableau have had access to these capabilities via integrations (and partnerships) with Trifacta for a while now. Same goes for collaboration features - Birst is not the first to support those.

That's not to say there is no value in the new version: these features, although not really innovative, are highly useful in the real world. Birst is reinforcing its leading position and trying to bring a pragmatic, best of both worlds approach to the table for organizations torn between the ease of use and low entry barrier of self-service analytics and the reliability and process streamlining that centrally managed solutions offer.

Birst sees this as the starting point for incremental changes that will be released throughout 2017, adding more capabilities and control. Birst looks like it's set out to be an all-inclusive platform, adding features that in some cases compete with existing partners or integrations like Tableau or Hadoop distributions. But then again, that's not new either: coopetition and feature convergence is the new normal, so in the end it's all about execution.