A couple of decades ago, I was an application developer, in a new job, working with a client/server database for the first time. When I joined the project, the more senior developers explained to me a cardinal rule that had to be followed for all databases on this new platform.
First, all external access to tables in the database had to be disabled. Next, a set of stored procedures (snippets of code that were stored and executed in the database itself) that queried the data had to be written for each table. These stored procedures would automatically filter the data based on the user calling them, and the groups that user belonged to. Finally, users would be granted execute permissions on those stored procedures, providing a secure method by which data could be read.
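The pattern above can be sketched in a few lines of Python. This is only an illustration of the idea, not the bank's actual code: the `Account` type, the `_ACCOUNTS` and `_USER_GROUPS` data, and the `get_accounts` function are all hypothetical names invented for the example. The "table" is kept private, and the only way in is a function that filters rows by the caller's group memberships, much as the stored procedures did.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: int
    owner_group: str
    balance: float


# Hypothetical table contents. The leading underscore stands in for
# "external access to the table is disabled".
_ACCOUNTS = [
    Account(1, "retail", 120.0),
    Account(2, "corporate", 9800.0),
    Account(3, "retail", 45.5),
]

# Hypothetical mapping of users to the groups they belong to.
_USER_GROUPS = {
    "alice": {"retail"},
    "bob": {"retail", "corporate"},
}


def get_accounts(calling_user: str) -> list[Account]:
    """Stand-in for a stored procedure: return only the rows whose
    owning group is one of the caller's groups."""
    groups = _USER_GROUPS.get(calling_user, set())
    return [row for row in _ACCOUNTS if row.owner_group in groups]
```

In this sketch, a user who belongs to no groups, or who is unknown, simply gets an empty result rather than an error. A real system might prefer to reject the call outright.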
Limiting access to the database in this way meant that a whole set of data access coding techniques I had previously used didn't work anymore, and that certain reporting packages didn't work either.
You might ask why we went through all this trouble. The reason was that the company I was working for was a major bank and it had to ensure that users could only see the data for which they were authorized. It wasn't enough to implement this security in the application; it had to go in the database, so that no matter how a user connected to it -- through the application or directly -- unauthorized data remained inaccessible.
Eventually I got used to the new programming patterns, and subsequent releases of major reporting tools became stored procedure-friendly. In effect, stored procedure access to tables had become an Enterprise standard throughout the industry.
Today, a number of relational databases have added row-level security (RLS) as an explicit feature, thus once again allowing for direct table access by users yet still ensuring they see only the data they have been granted access to. For example, Microsoft has added RLS as a preview feature to its Azure SQL Database service (essentially the cloud version of SQL Server). In a blog post, SQL Server MVP Brent Ozar explains how the feature works -- and he also discusses how Oracle and PostgreSQL already have RLS features implemented, and released for general availability.
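To make the contrast with the stored procedure approach concrete, here is a rough Python sketch of the RLS idea: the filter is attached to the table itself, so users can query the table directly and still see only their own rows. The `SecuredTable` class, the `orders` data, and the per-sales-rep policy are assumptions made up for illustration; they do not reflect how any particular database implements the feature.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class SecuredTable:
    """A table with a security policy applied to every direct query."""

    def __init__(self, rows: List[Row], policy: Callable[[str, Row], bool]) -> None:
        self._rows = rows
        self._policy = policy

    def select(self, user: str, where: Optional[Predicate] = None) -> List[Row]:
        # The policy always runs first, so the user's own filter can only
        # narrow the set of rows the user is already allowed to see.
        visible = [row for row in self._rows if self._policy(user, row)]
        if where is None:
            return visible
        return [row for row in visible if where(row)]


# Hypothetical data and policy: each sales rep sees only their own orders.
orders = SecuredTable(
    rows=[
        {"id": 1, "sales_rep": "alice", "amount": 250},
        {"id": 2, "sales_rep": "bob", "amount": 900},
        {"id": 3, "sales_rep": "alice", "amount": 40},
    ],
    policy=lambda user, row: row["sales_rep"] == user,
)
```

Because the policy lives with the table rather than in a separate access function, ad hoc queries and reporting tools can go through the same `select` path without each needing its own filtering logic.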
So, clearly, RLS is a critical Enterprise feature that has been implemented through somewhat manual means for decades and is now gaining explicit support in major commercial and open source relational databases.
But what about Big Data platforms and their legacy of all-or-nothing access to the data they manage? Will the day ever come when they implement RLS as well? Surely, they should; after all, users shouldn't be able to do analysis on data to which they don't ordinarily have operational access. If RLS is important in an operational database, it should be important in an analytics database as well.
As it turns out, in the Hadoop universe, HBase has cell-level security. And last week, Google announced that it was adding a "row-level permissions" feature to BigQuery, its cloud service for analytics on large data sets. While neither the documentation nor the UI makes it clear yet how to use the feature, its mere announcement is significant. It means that Enterprise-hardened RLS standards are starting to be met in the Big Data analytics world, and not just in the relational/OLTP world.
That, in turn, starts to pave the way for Big Data and analytics technology to go mainstream. Row-level security, master data management, data lineage and other "governance" features may not be super-exciting, but they are super-important. Without them, technologies like Hadoop can't be compliant with certain regulatory regimes, and (understandably) conservative IT organizations won't trust them.
That such features are beginning to pop up in a number of Big Data products is excellent. In the case of Hadoop, these features have to become consistent across all components and be compatible across different distributions. Once that happens, and everything works well, it will be difficult for enterprises not to adopt Big Data technologies in-house.