Sensored to death: How machine learning in the cloud will destroy all privacy

The combination of IoT sensors and data lakes becomes a powerful tool for pattern analysis, but it has serious privacy implications for the consumer and for the employee in the workplace.
Written by Jason Perlow, Senior Contributing Writer

In late October, I had the privilege of moderating a panel on the subject of artificial intelligence and machine learning at Unbound Miami, a boutique (but excellent) trade show that specializes in showcasing disruptive technologies such as cryptocurrency and blockchain. 

The show and the panel were highly engaging, with some really great speakers, including Dr. Alex Liu of IBM, Anthony DeLima of NEORIS, Prof. Sara Rushinek of the University of Miami Business School, and Ylan Kazi of UnitedHealth, and I'm glad I am finally able to show it to you.

Since the show, I have been thinking a lot about the subject of machine learning in the cloud, and how it is likely to impact our lives in the near future. There are many aspects of this technology that have potentially very positive benefits, but as with any disruptive technology, there are pitfalls.

For example, social media -- particularly services like Facebook, Instagram, and Twitter -- was hailed as a powerful tool for enriching our lives and helping us to connect with other people as never before, and it heralded an age of news transparency, instant reporting, and citizen journalism.

And these are legitimate benefits. But these benefits also come with pitfalls: a population of people endlessly distracted by their "lifestreams," with technology-augmented autism; the dangers of technology-assisted gaslighting; the ability of hate groups and conspiracists to broadcast their poison everywhere; the subversion of our democracy by hostile foreign actors; and of course, the inevitable mass compromises of personal information when these sites suffer any number of API security snafus.

The same could be said for machine learning and artificial intelligence. Yes, the proliferation of sensors in health devices can be, and will be, a powerful tool for early diagnosis of serious health conditions. It will change the relationship between patient and clinician, allowing for much more dynamic, real-time monitoring of health conditions before they become potentially life-threatening.

But as with social media, cloud-based machine learning has its own potential Black Mirror pitfalls.

Machine learning is an essential part of the digital transformation trend in the modern enterprise. The ability to gain insight into business processes through what is measurable using different types of sensors, and to correlate that data using pattern analysis, is an increasingly important capability that is quickly becoming an essential part of the overall IT toolbox. 

For example, companies like SAP, through its Leonardo Intelligent Enterprise products, have brought together IoT and finished application platforms deployed as cloud-based SaaS. These can be easily customized so that enterprises can create complex data visualizations and gain insight when solving complex business problems.

Understanding patterns and trends through big data is nothing new: The National Security Agency has been doing complex signals intelligence (SIGINT) for many years in order to defend the country from terrorist and foreign threats. The PRISM program revealed as much. Likewise, the research firm NPD uses very large amounts of point-of-sale data supplied by large retailers in order to create reports about important trends in purchases of consumer electronics.

What is new is that intelligent data collection and analysis is not just for the biggest firms and institutions in the world anymore -- any company can now take advantage of it. With platforms like SAP Leonardo or semi-finished cloud services like Azure Machine Learning, AWS SageMaker, Google Cloud AI, and IBM Watson Machine Learning, building a complex machine learning system or application is faster, and reaches the market sooner, than ever before.

Businesses are naturally very keen to improve and streamline business processes. But the very same tools that help businesses become more agile and save money can also be used in an oppressive fashion on their employees. 

For example, Wi-Fi access points can collect data about the devices that are in proximity to them, and thus a common application in retail is to use that information in order to better understand foot traffic coming in and out of a store and where and how long customers linger. 
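To give a sense of how little it takes, here is a minimal Python sketch of that kind of linger-time calculation. Everything in it is hypothetical and invented for illustration: the sighting records, the zone names, and the `dwell_times` function. A real Wi-Fi analytics product would work from its own log format and would have to contend with complications such as MAC address randomization.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sightings: (hashed device ID, access point zone, timestamp).
SIGHTINGS = [
    ("dev-a", "entrance", "2018-11-01T10:00:00"),
    ("dev-a", "electronics", "2018-11-01T10:02:00"),
    ("dev-a", "electronics", "2018-11-01T10:09:00"),
    ("dev-b", "entrance", "2018-11-01T10:01:00"),
    ("dev-b", "entrance", "2018-11-01T10:03:00"),
    ("dev-a", "checkout", "2018-11-01T10:15:00"),
]


def dwell_times(sightings, max_gap=timedelta(minutes=10)):
    """Estimate how long each device lingered in each zone.

    Two consecutive sightings of the same device in the same zone,
    no more than max_gap apart, count as continuous presence there.
    """
    by_device = defaultdict(list)
    for device, zone, stamp in sightings:
        by_device[device].append((datetime.fromisoformat(stamp), zone))

    totals = defaultdict(timedelta)
    for device, events in by_device.items():
        events.sort()  # chronological order per device
        for (t0, z0), (t1, z1) in zip(events, events[1:]):
            if z0 == z1 and t1 - t0 <= max_gap:
                totals[(device, z0)] += t1 - t0
    return dict(totals)


for (device, zone), spent in dwell_times(SIGHTINGS).items():
    print(device, zone, spent)
```

The point of the sketch is that once the sightings exist, the analysis itself is a few lines of code; the hard part, and the part that matters for privacy, is who gets to collect and keep the sightings.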

But, in an enterprise, this could also be used in combination with mobile device management, keylogging, and activity/presence detection for tracking the whereabouts and activities of employees, using data correlation and pattern analysis, in order to better understand employee productivity on an aggregate or even targeted basis. 

The same marketing information, should it be hosted in a "data lake" at a major cloud hyperscaler like AWS or Azure, would also not necessarily be constrained to a single tenant. That marketing data, collected by one retailer in a shopping mall, could be shared with other retailers as part of a consortium or partnership in order to develop far more complex pattern analysis applications about what and when we buy. 

Information gathered in multiple data lakes in multiple clouds could theoretically be combined to produce extremely sophisticated reports about any number of groups of users, especially if you combine this with what is known from their social media profiles, such as their likes and what they share.
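As a rough illustration of how such a combination might work, the following Python sketch merges records from two sources into a single profile per person. It assumes the sources share a common identifier, which is the hard part in practice; the record layouts, the `customer` key, and the `combine_profiles` function are all hypothetical.

```python
# Hypothetical records from two separate data stores that happen to
# share a common (for example, hashed) customer identifier.
STORE_VISITS = [
    {"customer": "c1", "store": "shoe shop", "minutes": 12},
    {"customer": "c2", "store": "shoe shop", "minutes": 3},
]
SOCIAL_LIKES = [
    {"customer": "c1", "likes": ["running", "hiking"]},
    {"customer": "c3", "likes": ["cooking"]},
]


def combine_profiles(*sources):
    """Merge records from several sources into one profile per customer.

    If two sources supply the same field for a customer, the later
    source overwrites the earlier one.
    """
    profiles = {}
    for source in sources:
        for record in source:
            profile = profiles.setdefault(record["customer"], {})
            for key, value in record.items():
                if key != "customer":
                    profile[key] = value
    return profiles


profiles = combine_profiles(STORE_VISITS, SOCIAL_LIKES)
print(profiles["c1"])
```

Here customer `c1` ends up with a single record covering both shopping behavior and stated interests, even though neither source held both on its own.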

Essentially, what we should be concerned with is what kind of sensor data is collected, how access to that data is given, and how it will be mashed up and used in different ways using machine-assisted analytics.

For example, wearables such as the Apple Watch can provide telemetry to health practitioners about the overall health of their patients, along with alerting and reporting, so that doctors can take a more proactive role rather than reacting to an acute event, such as an emergency room visit.

But the same technology is already being used by life insurance companies, such as John Hancock through its Vitality program, in order to issue policies with rates that are influenced by the wearer's overall activity and lifestyle. 

It would not take much to extend the trend analysis that the company is almost certainly engaging in to incorporate data from the GPS receiver on that device or even on the insured's smartphone in order to understand, for example, what kinds of restaurants that person visits and, potentially, what the impact is on that person's overall health. 

So, in the future, you might want to think twice about doing that Yelp check-in at In-N-Out.

There are also products on the market, such as Sprint Drive, which use cloud-based services to store GPS trip and other vehicle performance data -- data that could be accessed by an insurance company, if access to that data were sold by the originating data provider.

Just as you might opt in to John Hancock Vitality, you might find yourself being offered (or forced into) a dynamic policy for your vehicle by GEICO, USAA, Prudential, or Hartford Insurance if you use an automatic tracking device like Sprint Drive.

Been visiting McDonald's every day for breakfast? Have you been driving too fast down that main road? I think your premiums just went up $3,000 per year. Collectively. Remember, these are not just targeted data collections: your neighbors are also establishing trends in your town and municipality, which could potentially affect you as well.

Yay, machine learning!

All these examples, of course, are just the sort of things that corporations will do in their own self-interest with the kind of data they are collecting on their own for internal applications.

We have not even begun to delve into what interested third parties will do with this data if access is sold or granted to them, nor have we explored the possibility of data lake security breaches by bad actors, or even unintentional breaches of the kind we have recently seen resulting from bad API security at Facebook and Google.

Machine learning and data lakes are powerful tools that, like social media, can improve our quality of life, and help us to gain important business as well as personal insights that will allow us to become more agile and responsive. But we need to be extremely careful about how the data is collected, how it is shared, what it is used for, and how it is secured. I cannot think of any other disruptive technology on the market right now that has as much potential for good as it does for evil.

Will machine learning in the cloud herald an age of business process visibility and agility, or will it become our next personal nightmare, one that destroys our lives and privacy as we know them? Talk Back and Let Me Know.
