How researchers can get genetic data and maintain privacy

TOPICS: Security, Legal

By simply changing ICD-10 codes into a series of related numbers, clinical data can be made anonymous so researchers can use it in population studies.

Vanderbilt post-doc Grigorios Loukides developed the technique alongside assistant professor Brad Malin and research fellow Aris Gkoulalas-Divanis.

ICD-10 codes are standardized codes for medical conditions used both in billing for services and in clinical research.

The Vanderbilt technique appears to solve an important political problem. Scientists want to use clinical data in population studies to find the specific causes and best cures for disease, cross-referenced to genetic information. But patients rightly fear that their privacy could be compromised.

The Vanderbilt technique addresses the problem for both sides. A minimum number, called k, is set: the smallest number of patients who must share a reported diagnosis before privacy is considered safe. Until that number is reached, records are given multiple, related codes before they're reported -- a patient with Type I diabetes is listed as having both Type I and Type II, and vice versa.

The data is not scrambled, and this is not an encryption scheme. Instead, data that might lead to the identification of anyone is generalized inside a computer so that the individual can no longer be singled out.

Researchers, of course, would know this is being done, and in the example would not draw conclusions about differences in Type I and Type II diabetes, only conclusions relating to diabetes generally. Once k is exceeded, then data could be reported with more specificity.
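The thresholding idea can be sketched in a few lines of Python. This is a simplified illustration, not the Vanderbilt team's actual algorithm: it assumes one diagnosis code per record, a hand-made table of related codes (the hypothetical RELATED mapping), and outright suppression of any record that still falls short of k after grouping. The function name generalize and the sample records are likewise assumptions made for the example.

```python
from collections import Counter

# Hypothetical table of related codes. E10 and E11 are the ICD-10 codes for
# Type 1 and Type 2 diabetes; a real system would use a clinical hierarchy.
RELATED = {
    "E10": ("E10", "E11"),
    "E11": ("E10", "E11"),
}


def generalize(codes, k):
    """Return what would be reported for each record.

    A record keeps its specific code only if every code in its related
    group that appears in the data is shared by at least k records.
    Otherwise the whole group is reported for every member, provided the
    group as a whole covers at least k records. Anything rarer is
    suppressed (reported as None).
    """
    counts = Counter(codes)
    reported = []
    for code in codes:
        group = RELATED.get(code, (code,))
        present = [c for c in group if counts[c] > 0]
        group_total = sum(counts[c] for c in present)
        if all(counts[c] >= k for c in present):
            reported.append((code,))   # common enough to report as-is
        elif group_total >= k:
            reported.append(group)     # generalized to the related codes
        else:
            reported.append(None)      # too rare even after grouping
    return reported


if __name__ == "__main__":
    # Sample data (invented): 2 Type 1, 6 Type 2, 5 asthma (J45),
    # and 1 acute appendicitis (K35) record.
    sample = ["E10"] * 2 + ["E11"] * 6 + ["J45"] * 5 + ["K35"]
    for code, out in zip(sample, generalize(sample, k=5)):
        print(code, "->", out)
```

With k=5, the two Type I and six Type II records in the sample are all reported as the pair of diabetes codes, the five asthma records keep their specific code, and the lone appendicitis record is withheld. Lower k to 2 and the diabetes records are reported specifically again.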

Vanderbilt is part of the Electronic Medical Records and Genomics (eMERGE) Network, a nationwide alliance of research institutions looking to combine genetic and clinical data. (The illustration is from the group's home page. The group is based at Vanderbilt.)

One of their big efforts is the GWAS project, which aims to ethically combine genetic and clinical databases for common disorders that may have a genetic basis, like cataracts and dementia.

That project, and many others like it, could now get the go-ahead. Go Commodores.

UPDATE: You may ask, as I did, well, what's the value of k? That's in the eye of the beholder, says Prof. Malin.

Statistical agencies, such as the U.S. Census Bureau, have tended to lean towards parameterizations that would suggest we use k=5.

The value of k, in other words, is whatever value you think you need in order to guarantee privacy.

  • Easier way

    Report age, city of residence, and provide ailments. You could also include family history in the same generic way but have a linking system that links it.

    You could also provide genetic information if approved by the patient.
    • With this you don't need permission

      If the data is anonymous, if it has been made
      anonymous and guaranteed anonymous, what is this
      need for permission?

      And if you're going to require permission from
      everyone whose data you are going to analyze
      anonymously, you won't get proper data.
  • Of course permission is needed.

    The data is mine, anonymous or not.

    But of course you don't get that.

    You want to tax and impose burdens on Real Americans (y'know, the ones who are born here, and pay taxes).
    • Uh, no

      If the data is anonymous -- if it can't be identified as your data -- then where is your interest in keeping it from researchers who want to save your life, and the lives of your fellow Americans?
  • What part of my medical history do you need?

    I'm really not that concerned about medical privacy. It's over
    protected in a lot of ways.

    To start - health insurance companies have you sign away privacy
    rights in order to get a policy. That's the top need.

    Then the laws make it difficult for doctors to get information in an
    emergency when I might not be able to give it. Which is why we wear
    med alerts around our neck or wrist.

    And, to be blunt, I haven't had any medical condition that was a shock
    to my doctors. Gall bladder? Check. Prostate cancer? Check. I'll
    even admit that I had a T&A when I was in the 5th grade.

    Now if someone wants to match the T&A in the 5th grade to the
    gallstone or prostate CA - let them have at it. The T&A might not
    have a relationship that is relevant (especially since it isn't the T&A
    from Chorus Line) but researchers can find relationships in data and
    those finds might help one of us one day.

    Oh, I forgot the ruptured cervical disc that was replaced with some
    bone around 1990.
    • There can be a lot of data

      As records go electronic, the medical histories of millions can become available for use in population studies of all sorts.

      Many of the most important insights of the last 10 years came from population studies, but these were small populations -- usually defined by studies on other subjects.

      The purpose of the Vanderbilt research is to render all the data on many millions of us, through EMRs, available for medical research by making it anonymous. It's just a string of numbers. It can't be traced to you.

      Thus we can draw more conclusions, because we have larger populations to work with.