NSW Data Analytics Centre sets goal to create de-identification data standards

The standards would clarify how best to measure the amount of personal information in linked de-identified datasets, NSW chief data scientist says.

NSW Data Analytics Centre CEO and NSW chief data scientist Ian Oppermann has announced that the NSW government wants to standardise the way it measures the amount of personal information used in linked de-identified datasets.

"This measure of personal information will hopefully start to be standardised," he said on Tuesday.

"We will ask the folks who developed the cybersecurity standards, the ISO 27000 series cybersecurity standards, to start to work on taking what we've done so far and build it into a standard.

"Two years from now, we will have a standard in this space."

Oppermann said the standard-setting effort would build on work the NSW government agency has undertaken over the past three years alongside the Australian Computer Society (ACS), Standards Australia, Data61, the Australian Bureau of Statistics, and every state and territory government on the mainland to determine whether it was possible to measure the amount of personal information in linked de-identified datasets.

"If we take this dataset and link it with this dataset that are both de-identified, what does that do in terms of the measure of personal information and can we put a measure on it," he said.

"And if we link more and more datasets together, how far that payload of information goes, and what becomes the risk of re-identification, it turns out to be a very, very subtle and complex activity."
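One rough way to see the effect Oppermann describes is k-anonymity, a widely used proxy for re-identification risk: the size of the smallest group of records that share the same combination of attributes. The sketch below is illustrative only and is not the measure the centre is developing; the records, column names, and the helper functions `smallest_group` and `link` are all hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share the
    same values on the given quasi-identifier columns."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def link(left, right, key):
    """Join two lists of dict records on a shared pseudonymous key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical toy data: "pid" is a pseudonymous linkage key, not a real ID.
health = [
    {"pid": "a1", "age_band": "30-39", "postcode": "2000"},
    {"pid": "b2", "age_band": "30-39", "postcode": "2000"},
    {"pid": "c3", "age_band": "40-49", "postcode": "2010"},
    {"pid": "d4", "age_band": "40-49", "postcode": "2010"},
]
education = [
    {"pid": "a1", "qualification": "degree"},
    {"pid": "b2", "qualification": "diploma"},
    {"pid": "c3", "qualification": "degree"},
    {"pid": "d4", "qualification": "degree"},
]

# Before linking, every record shares its attributes with one other record.
k_before = smallest_group(health, ["age_band", "postcode"])

# After linking, the extra attribute splits one group into unique records.
linked = link(health, education, "pid")
k_after = smallest_group(linked, ["age_band", "postcode", "qualification"])

print(k_before, k_after)
```

In this toy case the smallest group shrinks from two records to one once the second dataset is linked, meaning at least one person is now unique on the combined attributes even though neither dataset contains a name.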

Speaking at CeBiT Australia, Oppermann suggested that introducing a standard would encourage citizens to be more trusting of government, which would ultimately enable agencies such as the NSW Data Analytics Centre to continue to use datasets to generate insights while protecting individual privacy.

"Without getting privacy rights, linking datasets and generating insights without the protection of protecting individual privacy is something that creates a whole world of pain when we're not doing the right thing with the datasets we're linking together," he said.

Ultimately, Oppermann said the end goal would be to put the centre in a position to create "evidence-driven policy, and real transparency about what [the NSW government] is trying to achieve and how we're measuring what we're trying to achieve. A focus on outcomes, not a focus on activity."

Linking data to improve out of home care

Using de-identified datasets linked across Health, Education, Justice, Family and Community Services, Transport, and the Department of Industry, the NSW Data Analytics Centre has been able to assist the state government with making data-driven reforms to its Out of Home Care (OOHC) program, which is designed to help young people at risk of harm.

"Data is a record of what's happened in the past. The outcomes we are looking at are a way of describing what we're trying to achieve and understand those factors of risk, which leads to either the outcome we're trying to achieve or adverse versions of those things," he said.

According to Oppermann, once completed, the reforms would become the basis for helping the federal government to build similar datasets which could be modified for the National Disability Insurance Scheme, as part of an agreement that was signed between the NSW Data Analytics Centre and the federal government in September.

"Initially this project will see NSW, Queensland, Victoria, and South Australia linking with the Commonwealth … linking data from NGOs, linking data sets from providers -- all de-identified, all using the gold standard of de-identification, which in NSW is the Centre for Health linkage, with the Commonwealth it's the Australian Institute of Health and Welfare," he said.
