Encryption's holy grail is getting closer, one way or another

Working with encrypted data without decrypting it first sounds too good to be true, but it's becoming possible.
Written by Stilgherrian, Contributor

Whether it's a reaction to the Snowden revelations, a reaction to the continual news of massive data breaches, or just the obvious need to secure data in the cloud -- or all of the above -- new technologies for working directly on encrypted data are getting plenty of attention.

"Yes, it is a very, very hot area," Raluca Ada Popa, one of the creators of CryptDB, told ZDNet last Friday. Her new startup, Prevail, building on CryptDB's successor Mylar, is just part of the buzz.

When Popa and her Google- and Citigroup-funded team at MIT first published their paper "CryptDB: Protecting Confidentiality with Encrypted Query Processing" in 2011, it was a breakthrough: the first practical system for manipulating encrypted data without decrypting it first.

"It works by executing SQL queries over encrypted data using a collection of efficient SQL-aware encryption schemes," the paper said.

"CryptDB can also chain encryption keys to user passwords, so that a data item can be decrypted only by using the password of one of the users with access to that data. As a result, a database administrator never gets access to decrypted data, and even if all servers are compromised, an adversary cannot decrypt the data of any user who is not logged in."
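One of the "SQL-aware encryption schemes" the paper refers to is deterministic encryption, which lets the server test equality without seeing the values. The sketch below is a simplified stand-in, not CryptDB's code: it uses an HMAC-SHA256 tag, which unlike a real deterministic encryption layer cannot be decrypted, and the key, table layout and function names are hypothetical.

```python
import hashlib
import hmac


def det_tag(key: bytes, value: str) -> bytes:
    """Deterministic keyed tag: equal plaintexts always produce equal tags."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


# Hypothetical client secret; it never leaves the client.
client_key = b"hypothetical-client-secret-key-32"

# What the untrusted server stores: tags only, never the plaintext names.
stored_rows = [
    {"id": 1, "name_tag": det_tag(client_key, "alice")},
    {"id": 2, "name_tag": det_tag(client_key, "bob")},
    {"id": 3, "name_tag": det_tag(client_key, "alice")},
]


def server_select_equal(rows, query_tag):
    """Server side: answers WHERE name = ? by comparing opaque tags."""
    return [row["id"] for row in rows if hmac.compare_digest(row["name_tag"], query_tag)]


# The client rewrites WHERE name = 'alice' into a tag before sending the query.
matches = server_select_equal(stored_rows, det_tag(client_key, "alice"))
print(matches)  # [1, 3]
```

The trade-off is visible even here: the server cannot read the names, but it can tell which rows share a value.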

We've had encrypted databases before, of course, but they only take care of encrypting the data at rest. That data still had to be decrypted before it could be used, and the decrypted data could potentially be read from a server's memory. Not so with CryptDB.

"The main issue it's addressing is if you have a database in the cloud, for example, and you want to protect against attackers who break into the cloud, or you want to protect against employees of the cloud," Popa told ZDNet. It's also a defence against a server being stolen or otherwise physically compromised.

The ultimate goal, which some have called encryption's holy grail, is fully homomorphic encryption, in which the entire system works on encrypted data and returns an encrypted result. The only point in the process where data would be decrypted would be when the user wanted to see the result, and that would presumably happen in the application or client software, not in the database server in the cloud.

The first fully homomorphic encryption system was developed in 2009 by cryptographer Craig Gentry at IBM, and he put together a working implementation with Shai Halevi the following year. However, this and other early fully homomorphic encryption systems have a problem.

"Fully homomorphic encryption handles any function you can imagine, so you could run any function on the encrypted data, but that would be nine orders of magnitude slower than the regular computation -- that's really not something practical," Popa said.

For those who prefer their database queries to return a result this century, CryptDB made two key compromises in functionality and security.

"We support six basic functions: Addition, multiplication, greater than, equality, search, and nesting these functions, and we show that with these functions, you can actually implement a lot of interesting database applications, web applications, and so forth," Popa said.
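For addition, the CryptDB paper uses the Paillier cryptosystem, in which multiplying two ciphertexts adds the numbers hidden inside them. The following is a minimal textbook sketch, not CryptDB's implementation; the tiny primes are hypothetical and far too small to be secure, and they only keep the arithmetic readable.

```python
import math
import secrets

P, Q = 1009, 1013  # toy primes; real deployments use primes of 1024 bits or more


def keygen(p: int, q: int):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # this shortcut is valid because we fix g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # fresh randomness, so ciphertexts differ
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, priv = keygen(P, Q)
c_total = add_encrypted(n, encrypt(n, 120), encrypt(n, 80))
print(decrypt(n, priv, c_total))  # 200
```

This is the kind of step that lets a server compute something like a SUM over encrypted values and return a single ciphertext that only the key holder can open.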

Popa's team showed that, across a variety of applications, CryptDB had a performance overhead of just 14.5 to 26 percent compared with an unmodified MySQL database.

"It was really low ... and that was in a situation in which you encrypt absolutely all the data. But if you look at a lot of realistic applications, not everything is sensitive," Popa said.

"If you just take the things that are considered to be more sensitive and encrypt them, then you'll have even less performance overhead. You can get 3 or 4 percent, almost not noticeable."

The security compromise is that performing some operations reveals a "little bit" about the encrypted data.

"The order operation, for example, leaks, because you want the server to be able to order encrypted data. And when the server orders encrypted data, and knows what's the first item, which is the smallest, it doesn't know the value, but it knows, OK, this is the smallest value, because it's the first one in the ordered relation," Popa said.
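The leak Popa describes can be seen with a toy order-preserving mapping. CryptDB uses a proper order-preserving encryption scheme; this sketch instead uses a hypothetical keyed, strictly increasing random lookup table over a small domain, which is not secure and only illustrates the property. The server never sees a value, yet sorting the ciphertexts tells it which row holds the smallest one.

```python
import random


def make_ope_table(key: int, domain_size: int) -> list[int]:
    """Toy order-preserving mapping: a keyed, strictly increasing random table."""
    rng = random.Random(key)  # NOT cryptographically secure; illustration only
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)  # positive random gap keeps the order strict
        table.append(current)
    return table


table = make_ope_table(key=1234, domain_size=200)
salaries = [87, 15, 142, 60]
ciphertexts = [table[s] for s in salaries]

# The server sorts ciphertexts without knowing any plaintext...
order = sorted(range(len(ciphertexts)), key=lambda i: ciphertexts[i])
# ...yet the first position in that order is the row with the smallest hidden value.
print(order[0])  # 1 (the row holding 15)
```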

Despite these compromises, CryptDB represented a significant advance, and it has inspired plenty of work elsewhere.

Google's Encrypted BigQuery client, for example, is based on CryptDB. SAP has implemented SEEED, a system for searching over encrypted data, in its HANA database management system. Several startups are applying CryptDB techniques to Oracle databases. And MIT's Lincoln Laboratory has implemented the CryptDB design on top of its D4M Accumulo NoSQL system.

Popa followed up CryptDB in 2014 with Mylar, applying a similar vision to web applications rather than databases.

Mylar allowed the server to perform keyword search over encrypted documents, even when the documents were encrypted with different keys. It also ensured that client-side application code was authentic, even if the server was malicious.

A Mylar prototype was deployed as part of a medical application at Newton-Wellesley Hospital in the Boston area. The results were promising. Porting six applications required changing just 35 lines of code on average, and the performance overheads were a 17 percent loss in throughput and, for example, a 50-millisecond increase in latency for sending a chat message.

"It got an incredible amount of press, because even though that was not the main target of our paper, it does protect against government attacks too, because even if the government subpoenas the servers in the cloud, the cloud just doesn't have data, it just has encrypted data," Popa said.

But while the Mylar code is available online and is of "decent" quality, Popa said it isn't being maintained.

"We are researchers, and it takes years to produce a product with all the details worked out, and all the customer stuff worked out, and we didn't do that for Mylar for sure. Neither did we do that for CryptDB."

That's about to change.

In July, Popa will return to the US after spending time as a postdoctoral researcher at ETH Zurich. She'll be joining the University of California at Berkeley as an assistant professor, and she'll soon be launching a startup called Prevail.

"Just now, [as] part of the startup, we're building a version of Mylar for certain kinds of applications, so that's one thing where we're building a real product," Popa told ZDNet.

While Prevail and others pursue the homomorphic dream, a second camp is taking the zero-knowledge route, in which the database server knows nothing about the data being stored, not even the ordering information that CryptDB leaks.

One such startup is ZeroDB, a Silicon Valley company founded just weeks ago in March 2015, which is currently in closed beta.

"We're pushing all of the query logic, encryption and decryption, and compression to the client. And so we're basically turning the database server into just a simple data store. The idea is you never give keys to the server, so it has no understanding or insight into the data that it's storing," ZeroDB co-founder MacLane Wilkison told ZDNet on Thursday.

"So now you can run your database server up on AWS or Azure or some other cloud provider, and you don't have to worry about that cloud provider knowing what your data is, or, more likely, an attacker of that cloud provider getting access to your underlying data."

The key difference from CryptDB's approach is that ZeroDB does in fact decrypt the data and run computations, but only on the client, so the data held in the cloud stays encrypted both at rest and in transit.

"So this way, we are able to not share information with the server, but we have a little bit more of network communications," ZeroDB co-founder Mikhail Egorov told ZDNet.

"That said, when we do that, we offload these computations from the server, so I can imagine certain situations when our [system] has actually better performance [than CryptDB]."
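A deliberately naive sketch of the general idea Wilkison and Egorov describe: the server is a plain key-value store of opaque blobs, and all encryption, decryption and query logic runs on the client. This is not ZeroDB's design. It downloads every record, which exaggerates the extra network traffic Egorov mentions; a practical system would need some index structure to avoid that. The HMAC-based stream cipher, record layout and names are all hypothetical stand-ins, and the cipher provides no integrity protection, so a real deployment would use vetted authenticated encryption instead.

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode as a toy stream cipher (stand-in for a vetted AEAD)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, record: dict) -> bytes:
    """Client side: serialize and encrypt a record; the nonce is prepended to the blob."""
    plaintext = json.dumps(record).encode("utf-8")
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def open_sealed(key: bytes, blob: bytes) -> dict:
    """Client side: split off the nonce, decrypt and parse the record."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return json.loads(bytes(a ^ b for a, b in zip(body, stream)))


# Untrusted server: stores opaque blobs and never holds a key.
server_store: dict[str, bytes] = {}

client_key = secrets.token_bytes(32)  # stays on the client
records = [{"name": "alice", "balance": 120}, {"name": "bob", "balance": 45}]
for i, record in enumerate(records):
    server_store[f"acct:{i}"] = seal(client_key, record)

# Query "balance > 100": the client downloads every blob, decrypts, and filters locally.
decrypted = [open_sealed(client_key, blob) for blob in server_store.values()]
big_balances = [r["name"] for r in decrypted if r["balance"] > 100]
print(big_balances)  # ['alice']
```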

Wilkison said ZeroDB is currently running pilots with several companies.

"We don't have a public implementation freely available for anyone to go and just pull down the code and download, but we're working with financial, technology startups, a couple of healthcare startups," he said. They're looking at an official launch around August.

While there's no indication yet of what ZeroDB's business model might be, one proposed feature gives a hint: direct sharing of encrypted data in the cloud with software-as-a-service (SaaS) vendors, using a technique called proxy re-encryption.

"If you have a database in the cloud, encrypted under your private key, and you want to share that with Salesforce, you can take Salesforce's public key and then your private key on the client, you take those two and create what's called a transformation key, and you send that transformation key to the cloud, and then cloud can actually apply that transformation function onto your encrypted data, and then it will be subsequently encrypted under Salesforce's key," Wilkison said.

"You can do all of that in the cloud without exposing any of the private keys."
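To make the proxy's role concrete, here is a sketch of the classic, simplest proxy re-encryption scheme, from Blaze, Bleumer and Strauss (1998), over a toy group. It is a swapped-in illustration, not ZeroDB's scheme, and it differs from the flow Wilkison describes in one important way: its re-encryption key is built from both parties' private keys, whereas schemes matching his description (your private key plus the recipient's public key) are more involved. The tiny group parameters and names are hypothetical and insecure; what the sketch shows is that the cloud transforms the ciphertext without ever seeing the message or either private key.

```python
import secrets

# Toy group: P = 2Q + 1 is a safe prime and G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def keygen():
    sk = secrets.randbelow(Q - 1) + 1  # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)           # (private key, public key)


def encrypt(pk: int, m: int):
    """ElGamal-style ciphertext (m * g^k, pk^k), i.e. (m * g^k, g^(a*k))."""
    k = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, k, P)) % P, pow(pk, k, P)


def decrypt(sk: int, ct) -> int:
    c1, c2 = ct
    g_k = pow(c2, pow(sk, -1, Q), P)   # (g^(a*k))^(1/a) = g^k
    return (c1 * pow(g_k, -1, P)) % P


def rekey(sk_from: int, sk_to: int) -> int:
    """BBS98 re-encryption key b/a mod Q; note it needs BOTH private keys."""
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(rk: int, ct):
    """Proxy step: (g^(a*k))^(b/a) = g^(b*k); the message is never exposed."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


owner_sk, owner_pk = keygen()
vendor_sk, _vendor_pk = keygen()  # BBS98 re-keying uses the vendor's secret, not this
message = pow(G, 321, P)          # encode the message as a group element
ct = encrypt(owner_pk, message)
ct_vendor = reencrypt(rekey(owner_sk, vendor_sk), ct)
print(decrypt(vendor_sk, ct_vendor) == message)  # True
```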

Egorov admitted that the technology isn't "super new". "It exists. Our IP here is probably how to apply it properly to pieces of the database. But other than that, it kind of existed on the level of algorithms," he said.

Just like Prevail and Mylar -- and, for that matter, just like IBM -- ZeroDB is participating in a race to turn algorithms and prototypes into robust products and services that people will pay for. And according to Wilkison, regulation may well be a driver.

"We actually even have an EU government that's using [ZeroDB]," he said.

"I think the new EU data privacy Act is going to be -- or is already -- a big driver for us, because that imposes a lot of regulations around what happens if you're storing customer data and that gets compromised."
