Clear-cut definition of de-identified data critical in legislation: Pilgrim

Australia's Privacy Commissioner has said the de-identification of data is an area requiring regulation, and that agreed industry standards could be useful to fill the public with confidence.
Written by Asha Barbaschow, Contributor

A successful data-driven economy needs a strong foundation in privacy, and accordingly, good privacy management and great innovation go hand in hand, Australian Information and Privacy Commissioner Timothy Pilgrim has said.

Speaking at a data sharing and interoperability workshop during the GovInnovate summit in Canberra on Wednesday, Pilgrim said that by and large, people do want their personal information to work for them, provided that they know about it. He also noted that when there is transparency in how personal information is used, citizens should feel a sense of clarity, choice, and confidence that their privacy rights are being respected.

For Pilgrim, building trust with the public is key to meeting the challenges big data presents for organisations, including government. He highlighted that trust is further challenged by the nature of secondary uses of data.

"Part of the solution, potentially a significant part I suggest, lies in getting de-identification right," he said.

"This includes ensuring that government agencies, regulators, businesses, and technology professionals have a common understanding as to what 'getting it right' means.

"At the moment, that common clarity is not evident."

Pilgrim said that de-identification, which aims to separate the "personal" from the "information" within data sets, can be a smart and contemporary response to the privacy challenges of big data. However, the commissioner highlighted that there is no clear-cut definition of how far removed personal identifiers need to be before a data set is considered de-identified.

"I stress as privacy commissioner that de-identification is not the only approach available to manage the privacy dimensions of big data, but we are keen to explore its potential when done fully and correctly," he said.

"That potential could include the ability to facilitate data sharing between agencies, and unlock policy and service gains of big data innovation, whilst protecting the fundamental human right to privacy.

"That is a great prospect, and one worth pursuing."

The Pilgrim-hosted discussion comes after Australian Attorney-General George Brandis introduced legislation into the Senate last month that criminalises the re-identification of de-identified datasets that are collected and published by the Commonwealth.

"De-identification may prove to be an effective way of protecting the personal information of individuals in large data sets," Pilgrim said. "In doing so, de-identification could support large data-gathering projects by building community confidence that personal information will be protected."

Pilgrim said a common understanding of de-identification standards is yet to be reached, a view shared by all seven of his panel colleagues. However, to Gemma Van Halderen, this is part and parcel of her day-to-day duties at the Australian Bureau of Statistics (ABS) as the GM of Strategy and Partnerships.

Van Halderen is working in an area she calls "official statistics", where de-identification means removing personal identifiers like names or addresses. She said, however, that removing names or addresses is not enough for her business.

"In the statistical land, we actually call that secrecy or confidentiality. In other sectors it's called anonymisation," she said. "In the case of the ABS, we actually not only uphold and respect the Privacy Act, but we also have our own legislation. We also have to protect secrecy ... we actually have this whole gamut of things that we have to do."

She said that, for the information the ABS publishes on a daily basis, the organisation is required to ensure that an individual or a business cannot reasonably be ascertained from that data.

"What we have to do is make sure that our methods and technologies, our capabilities and our skills, constantly keep up with the increasing risks of re-identification," Van Halderen said.

"Our methods today are not the same as we used 10 years ago, 20 years ago, so even though we may now be doing data integration as part of our statistical suite of tools, our re-identification requirements are the same."

Additionally, Van Halderen said not making datasets available to the public would be a step back for Australia.

The Privacy Amendment (Re-identification Offence) Bill 2016 will be retrospectively applied from September 29, criminalising the re-identification of de-identified personal information and the disclosure of re-identified personal information.

The laws, however, allow the attorney-general to declare a particular entity is exempt for the purposes of public interest. Specifically mentioned were cases of research involving cryptology, information security, and data analysis, or "any other purpose that the minister considers appropriate", but Wednesday's panel brushed off such exceptions when mentioned by the chair, saying there were in fact not any.

Brandis flagged in September that the government would be introducing such legislation to amend the Privacy Act for the purposes of protecting anonymised datasets, and said at the time that "privacy of citizens is of paramount importance" to the government.

In September, the Department of Health said it had pulled a public dataset from data.gov.au after it was revealed that certain information regarding the Medicare Benefits Schedule and Pharmaceutical Benefits Scheme was not encrypted properly.

Health said in a statement that the decision to remove the dataset containing de-identified medical data it released in August came after the department was alerted by a team of researchers at Melbourne University, who said it was possible to decrypt some service provider identification numbers from the data openly available to them.

At the time, Minister for Health Sussan Ley apologised for the breach, reaffirming that no patient information had been compromised in the process. She also pointed to Brandis' legislation amendment, saying the government had worked swiftly to tighten privacy laws, moving a day prior to make it illegal to re-identify de-identified government data.

The weakness in the de-identified data was uncovered by panellist Dr Vanessa Teague, who was working at Melbourne University in her capacity as a cryptographer. Teague was also one half of the team that discovered early last year that the electronic voting system developed by the New South Wales Electoral Commission was vulnerable to the FREAK attack.

Speaking on Wednesday, Teague drew on her cryptography experience to explain that it is not necessarily particular identifiers -- such as name or address -- that allow a malicious attacker to identify someone; rather, it is the information itself.

"The dichotomy between personally identifiable data and information about the person is really a false dichotomy, because if you know a few points of information about a person like where they were at three or four different locations and times, then you can use that information about that person, even if their name isn't on that record, to figure out which record is theirs," she said.

"I think we should be very careful with the difference between something that satisfies the legislation or something that allows that particular dataset to be put outside the scope of the Privacy Act for legal reasons, versus the actual mathematical fact of whether that dataset really does protect the privacy of the individuals within it from a sufficiently well-informed attempt of re-identification."
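Teague's point about a handful of locations and times can be illustrated with a small sketch. The records, place names, and the `matching_records` helper below are all invented for illustration and do not come from any dataset discussed at the panel. The example assumes an attacker who already knows a few (location, time) observations about one person and simply filters a name-free dataset for records containing all of them.

```python
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[str, str]  # (location, time slot); both values are made up

# Hypothetical "de-identified" records: no names, only a random ID and a trace.
RECORDS: Dict[str, Set[Point]] = {
    "r1": {("cafe", "Mon 08"), ("gym", "Mon 18"), ("office", "Tue 09"), ("park", "Sat 10")},
    "r2": {("cafe", "Mon 08"), ("office", "Tue 09"), ("cinema", "Fri 20"), ("park", "Sat 10")},
    "r3": {("cafe", "Mon 08"), ("gym", "Mon 18"), ("library", "Wed 14"), ("pool", "Sun 11")},
    "r4": {("station", "Mon 07"), ("office", "Tue 09"), ("cinema", "Fri 20"), ("pool", "Sun 11")},
}


def matching_records(records: Dict[str, Set[Point]], known_points: Iterable[Point]) -> List[str]:
    """Return the IDs of records that contain every point the attacker knows."""
    known = set(known_points)
    return sorted(rid for rid, trace in records.items() if known <= trace)


# Assumed outside knowledge about one person, e.g. from social media posts.
known = [("cafe", "Mon 08"), ("gym", "Mon 18"), ("office", "Tue 09")]

# Each extra known point narrows the set of candidate records.
for n in range(1, len(known) + 1):
    print(n, matching_records(RECORDS, known[:n]))
```

In this toy data, one known point leaves three candidate records, two points leave two, and three points leave only `r1`, even though no record carries a name or address. Real datasets are far larger, but the narrowing effect is the same kind of reasoning Teague describes.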

As a researcher, Teague said part of what she does not like about the new legislation and changes to the Privacy Act is that it does not make the distinction between re-identifying data that does not actually harm anybody, and computation that does harm somebody but does not explicitly mention their name.

To Pilgrim, the issue of determining clear-cut definitions and rules is complex. It includes the question of how organisations get the social licence to use citizens' personal data for purposes that may be for their own profit, and how governments can use data in a way that benefits communities.

"They are big questions we have to debate, and we have to debate the use of that information with our rights as individuals to protect our own personal information," he explained.

"I hope you can understand that complexity and have some sympathy for my office ... in order to develop guidance that I hope can help government agencies, the private sector, and the community to understand what we can do to protect personal information in the era of big data, and recognise that it is important to make sure that this data does get out there because there are great social benefits in doing that."

Pilgrim said he will be taking the discussion public to involve those that want to have their voice heard in developing guidance on the de-identification of data.
