Looker flouts conventional wisdom in making data lakes accessible
Looker aims to make self-service manageable by taking a unique approach to modeling the data assets underneath. Sounds far-fetched? Tell that to the hundreds of customers who've doubled the company's business over the past year alone.
Ever since business intelligence (BI) was invented, people have been trying to fix it.
It was supposed to put visual, easy-to-understand dashboards on the desk of every businessperson, but that promise remained unfulfilled because of the limitations of those dashboards and the complexity of classic BI platforms. It wasn't until the emergence of Tableau that the promise of ubiquitous dashboards (we now call them visualizations) finally materialized.
The next challenge starts with data lakes, which are supposed to put all the data in one place, and self-service, which threatens to undermine that goal by proliferating islands of visualizations, each with its own cache or extract of the data. The question is, has self-service come at the cost of providing a single source of truth from the data lake? It's the latest chapter in the age-old saga of empowerment vs. control.
Looker claims it has an answer to that: it aims to offer the best of both worlds. How it accomplishes that is the interesting part, because Looker's approach has consistently tilted against the conventional wisdom.
First, it straddles the self-service vs. centralization debate. Looker is not against self-service, but it is against proliferating separate caches of the data. It offers a visual client that lets business users dynamically explore data on their own without being limited to what's cached or in memory. The modeling tier makes that possible, keeping the views and the data populating them consistent, so everybody is, in effect, on the same page.
Secondly, Looker isn't pigeonholed; it's both an end-user visualization client and a developer tool.
Then there's Looker's middleware-style architecture, which may seem a bit retro (especially if you survived the SOA era) because it abstracts the view of the data from the underlying physical representation. And finally, in an era of open source languages, Looker has the audacity to introduce its own proprietary, SQL-based modeling language, LookML.
By conventional wisdom, Looker's done everything wrong, but in the five years since its founding, the company has raised nearly $100 million in venture funding, and over the past year has doubled the client base to roughly 750.
Looker works by relying on the LookML modeling language to, in effect, wrap SQL with far richer metadata on which tables to use, how to join them, and how to calculate derived data. Much of this metadata is autogenerated by the tool itself as it crawls target databases. By modeling not only how the data is structured, but also how it is consumed, Looker can reduce or eliminate ETL jobs. Furthermore, LookML is not limited to introspecting SQL relational databases; it can handle JSON data as well. That extensibility is crucial as enterprises build data lakes that are likely to contain data of widely different types.
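To illustrate the kind of metadata LookML layers on top of SQL, here is a minimal, hypothetical sketch: the view, field, and table names are invented for this example, and real LookML syntax may differ in detail.

```lookml
# Hypothetical sketch: a view wraps a physical table with typed
# dimensions and measures; an explore declares how tables join.
view: orders {
  sql_table_name: analytics.orders ;;

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: customer_id {
    type: number
    sql: ${TABLE}.customer_id ;;
  }

  # A derived measure: Looker generates the aggregate SQL at query time.
  measure: total_revenue {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}

explore: orders {
  # The join is modeled once here, so every end-user query reuses it.
  join: customers {
    type: left_outer
    sql_on: ${orders.customer_id} = ${customers.id} ;;
    relationship: many_to_one
  }
}
```

Because the join logic and derived measures live in the model rather than in each dashboard's extract, every visualization built on this explore runs against the same definitions in the source database.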
It currently lacks a data dictionary that would add business context to the data, but that's on the roadmap. It has added a search capability, but that raises the question of whether Looker will build its own data catalog or partner to deliver one. For instance, Looker appears to be complementary to providers like Alation that harness a combination of machine learning and crowdsourcing to not just find the right data tables, but also form queries against them.
This week, Looker 4 was announced, which the vendor claims is its biggest revamp to date. It brings changes for both developers and end users, and it may ultimately shift the company's focus behind the scenes through a potential OEM strategy.
For developers, Looker has modernized the language itself, moving it beyond the limits of its original YAML base to make it more flexible and extensible. More importantly for a developer tool, it finally offers a decent IDE with the code completion, error checking, contextual documentation, and so on that coders expect. For business users, as noted before, a search capability has been added. For both constituencies, a marketplace called Looker Blocks has just been introduced as the linchpin of a new third-party ecosystem.
But what's under the hood could be the most profound change. Looker is now exposing, and making freely available, APIs to its data modeling engine. In a crowded market where end-user visualization tools have multiplied like rabbits, the last thing an organization needs is yet another client app to manage, or, for end users, another pane of glass to learn that takes them away from their workday applications.
Looker has often been accused of trying to boil the ocean. Its modeling approach aims to preempt not only classic ETL tools, but also the new generation of data preparation providers.
The question is how far this can go: the flexibility of Looker's approach suits it to data lakes, but their sheer scale and variety pose a challenge. We expect data lakes to have tiers of data, all of it arriving in raw form, with different portions advancing to varying stages of refinement; Looker need not boil the whole lake. The key will be avoiding the familiar trap of pursuing that elusive enterprise data model.
Either way, there is little question that Looker has changed the conversation when it comes to addressing the issue of keeping data lakes from becoming victims of their own success.