On the dangers of DNA data: Genealogy tests, Elizabeth Warren, and the end of privacy

Many of us simply want to know more about our family heritage. Yet, genomics and big data may be making it possible to weaponize DNA. Is this another nail in the coffin of personal privacy?

Last year, I found out I had a very small amount of Native American DNA. Now, I've learned that I actually don't. Why? The short answer is analytics and big data are getting more accurate. The long answer is this article. Read on.

When I was a young boy, my parents and grandparents told stories about their heritage, their families, even bits about the old country where their parents and grandparents grew up. I don't remember many of their stories, because while they were sharing their heritage with me, I was thinking about moon landings, baseball games, and the science fiction books I was reading.

I mean, when you're nine or ten years old, who understands the significance of that kind of family stuff, right?

Over the years, bits of those stories came out in conversation again. I was always more interested in technology, and after all, I'd already heard them tell those old stories. Worse, they weren't even consistent in the telling. Sometimes, we were from Russia. Sometimes, one grandparent or the other was from Hungary. Sometimes some relatives were from Austria.

On the rare occasion when I asked for clarification, one elder or another would hold up a hand, shake a head, make a dismissive shooshing sound, and finally admit after being pressed further, "Well, I don't know."

And that was it. That was pretty much my full understanding of my family's background until just a few years ago.

When I reached middle age and had a family, I started becoming more curious. I asked my parents, who were by then quite elderly, for details about some of the stories I vaguely recalled. Unfortunately, by then, they'd become forgetful about some of the details and couldn't give any clear answers.

About a year after both parents passed away, I decided I wanted to learn more. My wife was also curious about her heritage. We bought two Ancestry DNA kits and an Ancestry.com subscription, and I started to do some research.

Our searches on Newspapers.com (which I subscribed to along with my Ancestry subscription) helped answer some long-time mysteries about my wife's family. What little I found out about my family wasn't unexpected. My grandparents were from Poland, Russia, Hungary, the Ukraine, and Austria.

Of course, since the late 19th and early 20th centuries, the map has changed. A lot. For example, when my Austrian ancestor left Europe, she didn't live in either Hungary or Austria. She lived in the Austro-Hungarian empire.

You can begin to see how both geopolitics and my lack of attention could cause the stories to blur.

When I was first told about my family heritage, it was during the sixties and seventies. At that time, the Ukraine was a Soviet state, part of what most people would have described as Russia. It's just as likely, therefore, that my family elders would have said they came from Russia as that they would have said they came from the Ukraine.

Verbal history is inaccurate in that way. I know I barely listened when my parents talked about their ancestry. It's very likely that neither my dad nor my mom paid all that much attention during their childhood either. So their stories, the passed-down family history they tried to teach me, could very well be inaccurate, poorly remembered, or even confused because of changing national boundaries.

Here's another example. As far as I can tell, one of my great grandparents (the father of my grandfather) was from Galicia. This Eastern European land has a rich history, but the most interesting detail is that it sits right between Poland and the Ukraine.

Today, the region where Galicia was (and, just to be clear, we're not talking about Spanish Galicia) is mostly in the Ukraine. Back in the day, Poland reached further east, so much of Galicia was in Poland. So was my great grandfather Polish or Ukrainian? Back in the 1960s, Polish jokes were all the rage, so it's entirely possible my family might have described that great grandfather as Russian to avoid anti-Polish prejudice. I don't know.

What I do know is that my entire family tree appears to have originated in Eastern Europe.

DNA says Native American

This is why I was rather surprised, 18 months ago, when my DNA results came back from Ancestry. According to the results, I was two percent Scandinavian, had some DNA from Great Britain, Ireland, Scotland, and Wales, and even had a very small amount (less than one percent) of Native American DNA.

[Image: This is the first DNA result I got back from AncestryDNA.]

I found these results baffling. After all, I'd been told that my family had descended from Eastern European peasants, most of whom arrived in the new world from about 1890 to 1920. Besides, I personally knew everyone in my family tree from that time period forward.

I thought that perhaps a Native American somewhere way back in history had traveled to Europe, had children with a native Eastern European, who then became my uber-great-great-grandparents. It was possible, but it was also unlikely.

I probably would have remembered any tales about Native American, Scandinavian, British, Scottish, or Irish ancestry, but I'm pretty sure no one ever mentioned anything like that.

DNA says not Native American

Then, about a month ago, I got an email from Ancestry informing me that my DNA ethnicity estimate had changed. As Ancestry says, the science used to analyze ethnicity changes over time.

[Image: Ancestry says their results change as science improves.]

For the first DNA ethnicity estimate from 18 months ago, Ancestry used 3,000 reference samples across 363 possible regions. For the second DNA ethnicity estimate, Ancestry used 16,000 reference samples across 380 possible regions.

[Image: There's a lot more data here to work from.]

That improved sample pool enabled them to clarify my DNA ethnicity estimate. The new results dropped any reference to British, Irish, Scottish, Scandinavian, and Native American DNA. Instead, 99 percent of my DNA comes from what they call Western & Central Europe.

[Image: Far more clarified and in line with the family folklore.]

Based on what I previously knew, and the family research I did, this estimate makes a lot more sense.

The only confusing factor is that they call the region Western & Central Europe when the countries they mention (Ukraine, Moldova, Romania, Poland, Slovakia, Hungary, and Moravia) are generally considered to be in Eastern Europe. We'll talk more about the issues these regional labels bring up later in this article.

Elizabeth Warren and DNA

I got Ancestry's notice about the change in my DNA ethnicity estimate right about when Senator Elizabeth Warren released her DNA results to the public a few weeks ago.

Just about the time the minimal Native American ethnicity estimate was removed from my analysis, Warren produced what she said was strong evidence in support of her claims. She has previously said she was told by her grandparents that she was part Native American.

Now, before I go deeper into this topic, let me caution you. I'm here to talk about science and sociology, not politics. I am neither trying to justify Senator Warren's claims, nor dispute them. I'm also not going to get into a discussion about whether her claims or actions were right or appropriate or wrong or an example of cultural appropriation. Instead, I want to show how heritage, family folklore, and science can inform the stories we tell our children.

I was curious, though, about the report Warren released. In light of all the fuss, was there any scientific basis to her claim of ancestry? To learn more, I reached out to another DNA firm, Living DNA. They were one of the companies I profiled in CNET in my article on DNA ancestry testing kits.

According to Living DNA co-founder David Nicholson:

Living DNA, which includes four Native American regions in their ethnicity database and is expanding further, has reviewed Prof. Carlos D. Bustamante's analysis of Senator Elizabeth Warren's ancestry and agrees with the method and conclusion that the results would suggest that Senator Warren has Native American Ancestry.
The conclusion of the scientist is accurate based on the science presented. If more Native American DNA was available, the precision would increase further but it would not change the conclusion that Senator Warren has Native American ancestry.

There has also been much discussion about whether or not Warren has a valid claim to tribal membership. It's important to make a key point here. A DNA ethnicity estimate is very different from membership in a tribe or community. According to Cherokee Nation Secretary of State Chuck Hoskin Jr.:

A DNA test is useless to determine tribal citizenship. Current DNA tests do not even distinguish whether a person's ancestors were indigenous to North or South America.

The granularity of region is particularly interesting. If you recall, my DNA test says "Western & Central Europe" when the countries mentioned are mostly in what we would generally consider Eastern Europe.

The statement that Hoskin made, that DNA testing today can't distinguish between native North American and native South American ancestry, has been reflected in some conservative coverage of Warren's DNA release. These outlets claim that Warren matches "natives of Latin America", rather than people who are Native American.

Regional attribution is a component of genomics science, and over time it will probably get more accurate. As we saw with my ethnicity estimate, Ancestry's sample size increased from 3,000 to 16,000. That's more than a five-fold increase, in just 18 months.

What makes a DNA ethnicity estimate change?

I reached out to Ancestry to discuss this. While they didn't feel comfortable discussing the Elizabeth Warren story, an Ancestry spokesperson did explain why my ethnicity estimate changed over the space of 18 months:

We are always looking for new ways to enhance our customers' experiences and support them on their journeys of self-discovery. Genomics is advancing rapidly and, as a leader in this field, we remain committed to investing in 'what's next'. By leveraging improvements in Genomics and an increase of more than 13,000 samples in our ethnicity reference panel, we developed a new algorithm that determines customers' ethnic breakdown with an even higher degree of precision.
With the new algorithm, customers may see notable changes, such as increases or decreases in percentages from their ethnic regions. Additionally, increased precision allows us to have more confidence in a customer's results which means that low confidence regions from previous results may disappear entirely.

The granularity for some regional labels is changing as well. Ancestry explained how the labels for regions get more precise over time:

We've used the expanded reference panel and updated algorithm to add more specific regions in Asia and Europe. For example, we have expanded our regions in Asia to include Western and Central India, the Philippines, Japan, Korea and Northern China. Also, Scandinavia can now be reported more specifically as Sweden or Norway.

Ancestry also pointed to a blog post and a video that explain the update.

So while Professor Bustamante described Senator Warren's DNA ancestry as Native American in his analysis, it's entirely possible that right now, there's not really enough information to determine where in the Americas that indigenous ancestry originates. Over time, however, the precision with which geographic origin is determined (or guesstimated) is likely to increase with larger and larger datasets.

411-1: Too much information

What you're seeing here is science used to either justify or marginalize political advantage. Is it possible Warren's grandparents told her the story? Sure. But should anyone's heritage be fodder for political debate in a country that ostensibly judges based on merit? Is that even a question we should be asking? What assumptions about race and power does the question reveal?

As I described at the beginning of this article, my family tried to pass on bits and pieces of their stories to me during my youth. Even if I had been a better listener it may not have helped, because the stories themselves were based on details from a changing world, a world where nations and their borders have changed repeatedly over history.

In trying to understand heritage based on geography, it's important to understand where in time the geographical claim is being made. My case is a good example. Is a story about being Russian different if told in 1978 than if told in 2018? Was that 1978 story referring to the Russia of 1978 or of 1908, 1918, or 1938?

To link national identity with geography, it's critically important for the story to ground itself in terms of reference dates. Most of our family folklore doesn't have that level of detail to share.

The stories we tell our families are based on hearsay, lore, and, now, sometimes science. But what are the implications of weaponizing our family heritage?

Two-time Jeopardy champion and screenwriter James Erwin makes a strong point writing in Slate. He contends that some politicians might be bullied into releasing their genetic information, while others might selectively release DNA results to curry favor with certain ethnic groups.

Where do we draw the line on disclosure? I shared with you some of my genetic history because my job is writing about these topics, and using a personal illustration makes it more meaningful. But it was an uncomfortable decision, and it took a discussion with my family before I decided to share even the limited information in this article.

Will we be required at some point in the future to disclose our DNA results to get a job? Will insurance companies routinely take a cheek swab before determining eligibility or rate levels?

What about privacy? Columbia University computer scientists recently completed a study that estimates that nearly 60 percent of Americans of European descent can be identified solely by DNA results and a small amount of biographical information. As more and more consumers subscribe to DNA testing kits, more data will become available, and more people could therefore be forensically identified.

Many DNA test purchasers simply want to know a little bit more about their family history and heritage. But this may be a problem that will haunt us and future generations. Throughout the years, we have seen that technology has no innate morality; it just "is". It's up to us, and how we use these marvels will determine whether we're amazed or horrified.

DNA research and consumer availability are reaching the point where we're soon going to need to make some big decisions. Will DNA be used as yet more ammunition in our increasingly hostile political battles?

To be honest, I'm quite concerned. After all, if a progressive like Warren, who actually sponsored the 2016 Genetic Research Privacy Protection Act, and therefore should understand the risks of DNA disclosure, is willing to set a dangerous precedent with DNA, how safe should we feel when policy-makers and candidates decide to wield the DNA hammer in pursuit of their agendas?

Warren's 2016 DNA privacy legislation ultimately didn't pass. Perhaps that's our answer. It seems like we're probably not going to see much privacy protection when it comes to genetic research.

Hey, if broadband providers are going to be allowed to monitor our traffic, and if we insist on publishing every move we make on social media, who cares if another area of our privacy is mined by corporations, politicians, and enemy actors?

That's gallows humor. I care. And I'm betting you do, too.

DNA research is fascinating, but where it's going is pretty troubling. What do you think the scariest scenarios will be? What can we do to make sure DNA data doesn't become a destructive force? Do you have any plans to have a DNA test done? Share your thoughts in the comments below.

One request, though: please keep the topic to DNA, privacy, technology, and policy. Let's not let this devolve into a political rantfest, okay?

You can follow my day-to-day project updates on social media. Be sure to follow me on Twitter at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.