Privacy in data mining is paramount: University of Newcastle

There are various methods data gatherers can use to protect confidential details of people they are collecting information from, according to University of Newcastle associate professor Dr Ljiljana Brankovic.
Written by Spandas Lui, Contributor

There is a mining boom happening right now, and it has nothing to do with resources. Data mining has become an exceptionally profitable business as information about individuals is collected, traded, and analysed. Protecting the privacy of that data is, in turn, paramount, and data gatherers and data miners need to employ new techniques to ensure that individuals' privacy is not compromised, according to Dr Ljiljana Brankovic of the University of Newcastle, New South Wales.

There is a database for almost everything these days, from personal information collected by hospitals to what is being said on social media. These databases are often packaged up and sold, or handed over for research purposes to extract new insights about the public.

Removing unique identifiers, such as names, is commonly thought to prevent those who have access to a database from discerning confidential information about individuals, but Dr Brankovic said this is incorrect. Patterns gleaned from a database can often lead to the identification of an individual, she said.
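One way to see why stripping names is not enough is to count how many records share each combination of the attributes that remain. The sketch below uses invented records and assumes that postcode, birth year and sex are attributes an intruder could also learn elsewhere (often called quasi-identifiers). The k-anonymity measure it reports is a standard textbook notion used here for illustration, not a technique Dr Brankovic described.

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, but quasi-identifiers
# (postcode, birth year, sex) remain alongside a sensitive attribute.
records = [
    {"postcode": "2300", "birth_year": 1971, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2300", "birth_year": 1971, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "2308", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
    {"postcode": "2289", "birth_year": 1962, "sex": "F", "diagnosis": "hypertension"},
]

# Assumed set of attributes an intruder could match against outside sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def unique_records(rows):
    """Return rows whose quasi-identifier combination appears exactly once.

    Anyone who knows these attribute values about a person can link that
    person to the row, and therefore to its diagnosis.
    """
    counts = Counter(quasi_key(r) for r in rows)
    return [r for r in rows if counts[quasi_key(r)] == 1]


def smallest_group_size(rows):
    """The k in k-anonymity: size of the smallest quasi-identifier group."""
    return min(Counter(quasi_key(r) for r in rows).values())


exposed = unique_records(records)
for r in exposed:
    print(quasi_key(r), "->", r["diagnosis"])
print("k =", smallest_group_size(records))
```

A value of k = 1 means at least one person is uniquely singled out even though no name appears in the data.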

"Data gatherers have a big responsibility to make sure data is protected," Dr Brankovic said at the Open Group Enterprise Transformation conference in Sydney on Monday. "Now we have laws coming in, and it's very hard to keep up."

The new law Dr Brankovic is referring to is the 2012 Privacy Amendment Bill, which comes into effect in 2014 and will make the private sector more accountable for data privacy. Companies will also need to consider the possibility that the data they collect could be stolen by intruders; in that case, individuals' information will still need to be protected in some way.

How can data gatherers sell databases to other companies while preserving the privacy of the individuals whose information they have collected? Dr Brankovic said there are several techniques for doing so.

"Everybody should only have access to data that is of interest to them. That's clear because not only is data confidential, patterns can also be confidential," she said. "One set of protection measures is called restriction, where you do not allow full access to the data.

"Some queries will be answered and some will not — there are different ways to do it. A potential problem with this is you may not be answering something that you could, and sometimes it's still not safe."
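One simple form of restriction, shown here as an illustration rather than as Dr Brankovic's own method, is query-set-size control: a statistical query is refused when it matches too few records (or so many that its complement is small). The salary table, the threshold `MIN_QUERY_SET` and the function `restricted_sum` are all hypothetical. The last lines show why restriction is "sometimes still not safe": two permitted queries can be subtracted to isolate one person, a weakness usually called a tracker.

```python
# Hypothetical salary table. Names appear only so the example can show what
# an intruder learns; the querier never sees individual rows directly.
SALARIES = {
    "alice": {"dept": "sales", "senior": False, "salary": 72000},
    "bob": {"dept": "sales", "senior": False, "salary": 68000},
    "carol": {"dept": "engineering", "senior": False, "salary": 95000},
    "dave": {"dept": "engineering", "senior": True, "salary": 130000},
    "erin": {"dept": "engineering", "senior": False, "salary": 88000},
    "frank": {"dept": "sales", "senior": False, "salary": 70000},
}

MIN_QUERY_SET = 2  # assumed threshold; a real system would tune this


def restricted_sum(predicate, table=SALARIES, min_size=MIN_QUERY_SET):
    """Answer SUM(salary) over rows matching predicate, or refuse with None.

    The query is refused if it matches fewer than min_size rows, or more
    than len(table) - min_size rows (its complement would then be small).
    """
    matches = [row for row in table.values() if predicate(row)]
    n = len(matches)
    if n < min_size or n > len(table) - min_size:
        return None  # query refused
    return sum(row["salary"] for row in matches)


# Asking directly about the one senior engineer is refused (set size 1)...
direct = restricted_sum(lambda r: r["dept"] == "engineering" and r["senior"])

# ...but two allowed queries (set sizes 3 and 2) can be subtracted.
all_eng = restricted_sum(lambda r: r["dept"] == "engineering")
junior_eng = restricted_sum(lambda r: r["dept"] == "engineering" and not r["senior"])
leaked = all_eng - junior_eng

print("direct query:", direct)
print("inferred salary:", leaked)
```

Defending against this kind of inference generally needs more than a size threshold, which is why restriction alone may leave some questions unanswered and still not guarantee safety.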

Another technique is to add noise to the data. This has its own limitations, but it can be an effective way to ensure that, even if intruders get their hands on the data, it would be extremely difficult for them to extract confidential information about individuals from it.

The noise is added randomly, so patterns in the data are nearly impossible to discern, according to Dr Brankovic.

"You can look at the data, then add the level of noise you require," she said. "You can add noise to the data, then sell it, but your own copy will remain clean.

"You will not add noise and lose your copy — it's your data."
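The workflow in the quote, perturbing a copy for release while keeping the original clean, can be sketched as below. This assumes numeric values and additive zero-mean Gaussian noise, which is only one of several perturbation schemes; the function name `perturb`, the `noise_sd` parameter and the sample figures are illustrative and not taken from the source.

```python
import random


def perturb(values, noise_sd, seed=None):
    """Return a NEW list with zero-mean Gaussian noise added to each value.

    The input list is left untouched, so the data owner's clean copy
    survives; only the returned copy would be sold or shared.
    """
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    return [v + rng.gauss(0.0, noise_sd) for v in values]


clean = [72000, 68000, 95000, 130000, 88000, 70000]  # hypothetical values
released = perturb(clean, noise_sd=5000, seed=42)

print("clean copy:   ", clean)
print("released copy:", [round(v) for v in released])
print("clean mean:   ", sum(clean) / len(clean))
print("released mean:", round(sum(released) / len(released)))
```

The choice of `noise_sd` is the trade-off behind the limitations mentioned above: larger noise makes individual values harder to recover but also distorts the statistics a buyer wants from the data.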

Regardless of what techniques data gatherers and miners employ, Dr Brankovic stressed that while privacy laws give individuals some rights to the information that is collected about them, they do not own the data itself.

The onus to protect the data still lies with those who collect it, she said.
