Protecting data, protecting truth

A manifesto on data protection, data governance, and internationalism. Adapted from a speech delivered at the United Nations International School's UNIS-UN conference, in the UN General Assembly Hall in New York.
Written by Andrew Brust, Contributor

On February 28th, 2018, I delivered a speech at the United Nations International School's UNIS-UN Conference, held in the United Nations General Assembly Hall in New York. This year's conference was titled "Under CTRL: Technology, Innovation and the Future of Work."

Though produced by students at UNIS in New York City, the conference was attended by approximately 750 students from international schools around the world. I myself graduated from UNIS in 1984; it's where I first learned to program computers in BASIC, on machines including a DEC PDP-8 and a Radio Shack TRS-80 Model I, dating back to 1978.

What follows is an adaptation of the speech, presented in a format more suitable to publication as a post here on ZDNet. If you wish to view the speech in its entirety, you can do so here.


Data is nothing new, nor are databases, or even analytics systems, which themselves date back the better part of 50 years.

What is new, though, is how much data we collect, how much we keep, and what we can now do with it. That has changed a lot. We used to collect data at the level of a single transaction: a purchase, for example, or a single playground inspection.

Now we track every click leading up to the purchase, and the upsell ads that were served. And maybe the NYC Parks Department, where I created database systems in the mid-1980s, starting in my freshman year of college, is tracking entrances and exits through the park gates, or the number of baskets through a given hoop, in a given court.

With today's Internet of Things - or IoT - sensors, tracking all of that, in real time, is now quite feasible. And maybe there's even a college freshman at the Parks Department building the database that handles it.

Moreover, we can keep so much of this data now. The economics allow it, whereas it was cost-prohibitive previously. The cloud provides for cheap storage...sometimes really cheap, if you're willing to wait a few hours before it's served up. And even in the on-premises world, new distributed file systems make massive and fault-tolerant data storage possible without needing to buy expensive, proprietary storage appliances.

As I said, what we can now do with the data is even more interesting. If we're tracking clicks leading up to a purchase, we can start to predict whether someone is going to buy something, how much they're going to spend and what they're going to buy. In the case of tracking parks usage, we can predict when peak usage times will be, and thus when to deploy more maintenance workers, trash collectors and Urban Park Rangers. That can help with the budgeting process, too. Although I don't think we've yet discovered a data technology that can create much efficiency in the New York City Council.

Data points
What I like to say, when I'm feeling corny, is that "data is life." Every piece of data is a point-in-time recording of something that occurred, involving a person, an organization, a machine or groups of these things. The frequency at which we record these events now is much greater than it was. And so data has become more - shall we say - intimate.

Recording these point-in-time events means that data collection documents objective facts. And in an age when facts are disregarded, disparaged or - worst of all - falsified, this is a key facet of data and analytics that I think we must seize upon. In data lies truth. In analytics lies likelihood. It's the ultimate weapon against distortion and disinformation. It's a resource for doing good.

But data can also be used for more sinister, cynical purposes.

For example, it can be used to determine social media ad placement, targeting specific people with specific political tendencies, with content that isn't data-driven or factual at all, but rather manipulative hyperbole, at best. It can be used for get-out-the-vote efforts and election day ground game management. But it can also provide tactical advice on voter suppression. Data isn't just a resource for objective truth and good. Its predictive power can be a tool for spreading fear, uncertainty and doubt. So data can be, and has been, a tool for malfeasance.

If we look ahead at where predictive analytics may take us, it could be used to forecast opposition behavior. It could become not just a tool for small parts of a political campaign, but for planning every town-hall meeting, diner meet-and-greet, and full-on rally. It could even be used to determine messaging and policy, tailored for a specific locale. This would be policy optimized not for outcomes helpful to society, but simply to manipulate thinking and garner more votes. We might even imagine a time when predictive analytics could be used to automate and run a war. Data would become - almost literally - weaponized. And that's extremely troubling.

Keeping AI honest
Even when used for purported good, though, we have to keep our eye on things. I've been talking about predictive analytics. That's one name for it. Another, older, one is data mining. The newest name is machine learning and that, in turn, gets used interchangeably with Artificial Intelligence, or AI (even if it's not the same thing).

I actually studied AI in college, from 1986 through 1988. Here, again, the technology is not new. But AI never really caught on then...it couldn't. Computers weren't cheap enough and weren't fast enough. So most predictive models had to be built on a sampling of the data, and even then it took forever to train the models.

Those limitations are mostly gone now. As I already said, we're in a position with storage technology now to keep tons of data, and we're in a place in computing power to build models using all of it.

We have much more powerful central processing units, or CPUs. And, more important, we now have incredibly powerful GPUs - or graphics processing units. Graphics may not sound relevant to AI but, as it turns out, technology that can do numerous complex calculations simultaneously (in parallel), which is what GPUs do, can turbocharge both graphics and AI.

In fact, NVIDIA, which started out as a graphics and gaming company, is now one of the most important companies in AI. Its GPUs are used on all the major cloud platforms, and its technology is becoming the de facto standard. As AI becomes more pervasive, the companies in leadership positions in the tech industry may well change. Keep an eye on it.

Let's demystify this, though. AI and machine learning work on a fairly simple premise: by looking at how some data impacts the values of other data, statistical models can be built that predict the latter from the former. That's pretty straightforward - it's not magic. Fitting numbers to a curve allows the generation of a mathematical model that takes a bunch of inputs and returns the predicted value as an output.
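To make the curve-fitting idea concrete, here is a minimal sketch in Python. It fits the simplest possible curve, a straight line, using ordinary least squares. The numbers are made up purely for illustration, and the names `fit_line` and `predict` are hypothetical; real machine learning models take many more inputs and use more elaborate curves, but the premise is the same.

```python
# Made-up illustrative data: clicks before a purchase vs. dollars spent.
clicks = [1, 2, 3, 4, 5]
spend = [12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the inputs and outputs vary together, and how the inputs vary alone.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(clicks, spend)
print(predict(model, 6))  # prints 22.0 for this made-up data
```

The "model" here is nothing more than two numbers, a slope and an intercept, learned from the data and then applied to an input the model has not seen before.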

A model for disclosure
But how was that model actually built? Was the data that went into it completely valid? Was it based on IoT data from sensors that were placed according to some bias? In the Parks Department IoT example, were the sensors in parks in well-to-do neighborhoods deployed more carefully than those in poorer neighborhoods? Will resources be distributed unfairly because of it?

The reality is that we just don't know. The process of building machine learning models is pretty closed. The models themselves are black boxes.

15 years ago, data mining systems were able to visualize their models, and disclose their content and structure. Today's models are more complex and the need for visualizing them is more acute. Unfortunately, doing these kinds of visualizations seems to be a de-emphasized priority in the industry.

On the one hand, we might say "who cares?" These models are used by data scientists in corporate or scientific settings, so public accountability seems beside the point. And even if the details of the model were shared, how on earth would a lay person be able to interpret it? On the other hand, we have to be vigilant. We have to notice that machine learning models are being trusted more, relied upon with fewer restrictions and are becoming more ubiquitous.

Disclosure of the model's contents, even if not interpretable by the vast majority of people, is something specialists working in the public interest could take advantage of and interpret. Transparency is a deterrent against abuse. If we're lax about it now, then by the time machine learning is pervasive in our lives, we will have ceded our rights and our responsibilities in its management. Neither of these is good.

Data ethics
Not only do we want to know how the models were built, but we need to know what data was used to build them, and we need to know that none of it was illegitimately collected. Data ethics is a real - and urgent - concern. It may be corny to say "data is life," but it does demonstrate how sensitive data can be, and how access to it needs to be restricted and protected.

You guys are probably sick of hearing people say how, by having a smartphone, you are walking around with a powerful computer in your pocket. But it's true. And your phone is also a homing device, tracking where you've been the whole time you've had it on your person.

There are about a dozen different sensors in an iPhone, tracking things like your speed, rotation, face proximity to the device, and more. As long as you have your phone on you, you are essentially an IoT device.

There's a lot of good that can be done with that data, and there's a lot of unsavory stuff too. It's good to be able to prove you weren't somewhere that you shouldn't have been. But you're still entitled to your privacy. Do you want everyone to know what section of the library you were in, and when? When you voted? When you were in a drug store and whether you were at the pharmacy counter? All of that is innocent activity, but the intimacy of it likely isn't something you'd want to share. This shows how simplistic the line of argument is that says if you have nothing to hide then unfettered data collection won't impact you.

Where do we draw the line, and why? Who has access, and under what circumstances? And what if you want to share your data? Maybe you want to give a commercial entity access to it, for their own market research work and you want them to pay you for the privilege. Shouldn't you have that right? Data ethics and access controls assure you not only of your right to privacy but of your own right to share and be compensated.

Lazy industry
There are a lot of questions here, and the industry has done a horrible job of formulating answers to them. Granted, the questions aren't easy. But they're not really that hard either. If the industry made this a priority, it would get done. And if it got done, most consumers would be a lot more confident and comfortable. Call me naïve, but I believe that, in turn, would enhance trust between companies and customers, and it would be a net positive for commerce and the economy, both online and off.

This isn't rocket science. In fact, it's not very advanced computer science. It's mostly common sense. Policies can be codified, and those codified policies can be enforced through software. The irony here is that even if the industry encounters challenges implementing this as rule-driven programming, AI and machine learning could likely be applied to this problem space. Predictive models could be used to detect activity that transgresses policy, right as that activity occurs.
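As a rough illustration of what codifying a policy and enforcing it through software might look like, here is a minimal rule-driven sketch in Python. Everything in it is an assumption made for the example: the roles, the data categories, the consent flag, and the names `POLICY`, `AccessRequest` and `check_access` are hypothetical, not drawn from any real product or regulation.

```python
from dataclasses import dataclass

# Hypothetical policy table: which roles may read which categories of data,
# and whether the data subject's consent is required first.
POLICY = {
    ("analyst", "purchase_history"): {"needs_consent": False},
    ("analyst", "location_history"): {"needs_consent": True},
    ("marketing_partner", "location_history"): {"needs_consent": True},
}


@dataclass(frozen=True)
class AccessRequest:
    role: str
    data_category: str
    subject_consented: bool


def check_access(request):
    """Return (allowed, reason) for a single data access request."""
    rule = POLICY.get((request.role, request.data_category))
    if rule is None:
        # Anything the policy does not explicitly allow is denied.
        return False, "no policy permits this role to read this data"
    if rule["needs_consent"] and not request.subject_consented:
        return False, "the data subject has not consented"
    return True, "permitted by policy"


request = AccessRequest("analyst", "location_history", subject_consented=False)
allowed, reason = check_access(request)
print(allowed, reason)
```

Returning a reason along with the decision is a deliberate choice in this sketch: it lets the software explain a denial to someone who made an honest mistake, rather than simply blocking them.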

This would help everyone. Because, oftentimes, the transgressor is someone who made an innocent mistake, and someone who would appreciate software-driven guidance on what data access is OK and what access is not. It would also put many companies at ease, as they'd be less worried that they might commit unintended transgressions.

Worrying less about unwitting rule-breaking means that companies could more purposefully pursue analytics based on truly reasonable data access, ultimately helping those companies to be more data-driven. If the industry can focus on this, there will be advantages for everyone. The only thing in the way is the predisposed notion that this problem is intractable. The way through that is to be steadfast in your determination that it is not.

Picking up the privacy ball
My generation is the one that started using all this technology. Yet, despite our pioneer status, and the fact that we care about privacy, we lost our vigilance in protecting it. The next generation - folks who are perhaps in their 30s now - didn't really have that vigilance in the first place. They didn't have a sense of a time in history where privacy protections weren't there and where abuses occurred, and so they saw our concerns about it as unfounded and arbitrary. And we acquiesced. Shame on us.

Those of you who are in high school now, who will go out into the working world in 6 years or so, and become professionally influential perhaps 10 years after that - you are the ones who can, and hopefully will, strike a balance here. You won't have knee-jerk opposition to data collection and analytics like we did at first. But you also won't be naïve to the point that you think data ethics regulations and corporate policy are unnecessary, as we effectively did subsequently. You'll see that these protections better enable the application of data and analytics technology where it's legitimate to do so. It's not just that bad stuff will be curtailed. Good stuff will be better facilitated - I might even say emancipated.

This all can happen. It's totally within reach. But the prerequisite is that all of you make it a priority.

Global reach
And I'm not saying that generically either. We have approximately 650 students here - that's not really such a big group. But all of you are students at international schools, and your global reach, as you get older and more accomplished in your careers, will be unmatched.

You can take this message everywhere. Not just to numerous countries, but to varied communities within each country. You can take it to urban and rural communities. You can take it to conservative and progressive communities. To secular and religious ones. I'll avoid further enumerations here; I think the point is clear.

Everyone here will spread far and wide, and you can carry this message, hopefully expressing it in practical rather than political terms. At this time in the world and its politics, as so many communities, in so many places, are countering each other tribally, nothing could be more important than to carry with you a message that is apolitical and which, perhaps subliminally, demonstrates a common need and bond.

Today's politics seem to shun factual discussion, reject the notion of objective facts, and deprecate the primacy of truth. That we're having this conversation, in this building, is rather poignant, because the UN, or at least the ideals behind its founding, were fully based on the notion of the common bonds of humanity. And yet today's politics exploit the fears and insecurities around our differences. As a UNIS alumnus, this makes me sad, frustrated, and really, really angry.

International schools
When you go to an international school, you learn a few things. Certainly, you learn about the things people have in common. But you also learn to appreciate and celebrate people's differences - they're what makes our lives and our friendships rich and interesting.

The notion and the nomenclature of "tolerance" has always really baffled me. I don't "tolerate" people who are different. I seek them out, and I enjoy learning what's unique about them and their cultures. And that's not because I'm this benevolent, evolved man. It's because I'm selfish. It's because going to UNIS made me understand how much richer life is when you have people in it who provide your social circle with variety, whether it be physical features, food, philosophy, music or outlook. And since I want the good life, I want that. The hardest thing about life post-UNIS for me was how much less of that I had.

I am fond of saying that the most special thing about UNIS is how ordinary it is. I guess I'm being a little coy when I say that. But the point stands. Throw a bunch of people from over 100 countries within the same four walls, and every day of school is just that - a day at school. That's not a political statement. It's just...empirical.

Tribalism and racism and nationalism are arbitrary constructs, concocted by people who are already in segregated circumstances and who fear the unknown of being together with a bunch of people who are different. These fears stem from natural human inclination by the way, so looking down on people who form them probably isn't fair. They didn't go to international schools. You did. So you have a job to do, in simply telling them what you experienced and what you saw. Not with anger. Not with disdain. Not with condescension. But definitely with patience, and matter-of-factness.

Facts and data
This is a fact-based messaging exercise. Which means the absolute best way to carry your message across the world is data. Because data is life. Because it's a set of point-in-time recordings of things that happened, or didn't. They go on the record, and they make a point. Data is not opinion-based; it's just...empirical.

That's why data protection is important. Protecting data, governing access to it, analyzing it in legitimate ways, and seeking and obtaining consent keeps data clean, and effective and powerful. Protecting data protects truth. It protects people. It protects you. It protects your loved ones. It will protect your children. It protects the ideals behind the United Nations and international schools. Data protection preserves civility and - if we handle it right - will enhance that civility, and peace and good governance too. Data and machine learning are our best allies, as long as we prevent them from being manipulated by people who wish us ill.

That's a fine line, it's a delicate balance, and a sensitive equilibrium. You guys are the ones to fight for it and maintain it. I thank you for listening, and I thank you in advance for doing this important work.
