How researchers can get genetic data and maintain privacy

Summary: Data is not transformed. This is not an encryption code. Instead, data that might identify a patient is generalized inside a computer to make identification impossible.

Clinical data can be made anonymous simply by changing ICD-10 codes into sets of related codes, so researchers can use it in population studies.

Vanderbilt post-doc Grigorios Loukides developed the technique alongside assistant professor Brad Malin and research fellow Aris Gkoulalas-Divanis.

ICD-10 codes are standardized codes for medical conditions used both in billing for services and in clinical research.

The Vanderbilt technique appears to solve an important political problem. Scientists want to use clinical data in population studies to find the specific causes and best cures for disease, cross-referenced to genetic information. But patients rightly fear that their privacy could be compromised.

The Vanderbilt technique solves the problem for both sides. A minimum number of records, called k, is set, below which privacy might be at risk. Until that number is reached, records are given multiple, related codes before they're reported -- a patient with Type I diabetes is listed as having both Type I and Type II, and vice versa.

Data is not transformed. This is not an encryption code. Instead, data that might identify a patient is generalized inside a computer to make identification impossible.

Researchers, of course, would know this is being done, and in the example would not draw conclusions about differences between Type I and Type II diabetes, only conclusions relating to diabetes generally. Once k is exceeded, the data could be reported with more specificity.
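For readers who think in code, here is a minimal sketch of the idea. It is not the Vanderbilt team's actual algorithm. The group table, the diagnosis labels (which are not real ICD codes) and the function name are all assumptions made for illustration. If any diagnosis in a group of related diagnoses shows up in fewer than k records, every record in that group is reported with the whole group of codes.

```python
from collections import Counter

# Hypothetical groups of related diagnoses (illustrative labels, not real ICD codes).
GROUPS = [
    frozenset({"diabetes_type_1", "diabetes_type_2"}),
    frozenset({"cataract_senile", "cataract_traumatic"}),
]


def generalize(diagnoses, k):
    """Return one released set of codes per input record.

    A record keeps its specific code only if every related code that
    appears in the data is held by at least k records. Otherwise the
    record is released with all the codes in its group.
    """
    counts = Counter(diagnoses)
    group_of = {code: group for group in GROUPS for code in group}
    released = []
    for diagnosis in diagnoses:
        # Codes outside every group form a group of one. This sketch cannot
        # generalize them further; a fuller method would have to.
        group = group_of.get(diagnosis, frozenset({diagnosis}))
        too_rare = any(0 < counts[code] < k for code in group)
        released.append(group if too_rare else frozenset({diagnosis}))
    return released


records = ["diabetes_type_1"] * 2 + ["diabetes_type_2"] * 6
print(generalize(records, k=5)[0])  # both diabetes codes
print(generalize(records, k=2)[0])  # only the Type I code
```

With k=5 the two Type I records are too few to report on their own, so all eight records go out carrying both codes. With k=2 each record keeps its specific code.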

Vanderbilt is part of the Electronic Medical Records and Genomics (eMERGE) Network, a nationwide alliance of research institutions looking to combine genetic and clinical data. (The illustration is from the home page of the group, which is based at Vanderbilt.)

One of their big efforts is the GWAS project, which aims to ethically combine genetic and clinical databases for common disorders that may have a genetic basis, like cataracts and dementia.

That project, and many others like it, could now get the go-ahead. Go Commodores.

UPDATE: You may ask, as I did, well, what's the value of k? That's in the eye of the beholder, says Prof. Malin.

Statistical agencies, such as the U.S. Census Bureau, have tended to lean towards parameterizations that would suggest we use k=5.

The value of k, in other words, is whatever value you think you need in order to guarantee privacy.
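To see what that choice means in practice, here is a small sketch that checks a released data set against a chosen k. It assumes k works the way it does in k-anonymity style schemes generally: every reported value must be shared by at least k records. The function names and sample labels are hypothetical.

```python
from collections import Counter


def smallest_group(released):
    """Number of records sharing the rarest released value (0 if empty)."""
    return min(Counter(released).values(), default=0)


def meets_k(released, k=5):
    """True if every released value is shared by at least k records."""
    return not released or smallest_group(released) >= k


released = ["diabetes"] * 7 + ["cataract"] * 3
print(meets_k(released, k=5))  # False: only 3 records say "cataract"
print(meets_k(released, k=3))  # True
```

The same release passes at k=3 and fails at k=5, which is why the value picked matters.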

Topics: Security, Legal

About

Dana Blankenhorn has been a business journalist since 1978, and has covered technology since 1982. He launched the Interactive Age Daily, the first daily coverage of the Internet to launch with a magazine, in September 1994.
