Public Transport Victoria in breach of Privacy Act after re-identifiable data on over 15m myki cards released

Public Transport Victoria found in breach of Privacy and Data Protection Act after a dataset containing a record of 1.8 billion myki events was provided without sufficient de-identification.

Public Transport Victoria (PTV) has been found in breach of the Privacy and Data Protection Act 2014 (PDP Act) by the Office of the Victorian Information Commissioner (OVIC) for releasing data that exposed the travel history of 15,184,336 myki cards.

The myki dataset contained a record of "touch on" and "touch off" events recorded by the myki system between 1 July 2015 and 30 June 2018, amounting to approximately 1.8 billion events across the 15 million distinct myki cards.

Each event record comprises multiple data points, including date and time, location information, card identifier -- a unique number assigned to each myki card -- and the card type, of which there are 70, including student, police, and asylum seeker categories.

The data allowed individuals to be re-identified and their travel activity over the three years to be exposed.

OVIC on Thursday detailed the activities that led to the data being easily re-identified, publishing a report [PDF] on the disclosure of myki travel information.

In releasing the report, Victorian Information Commissioner Sven Bluemmel said OVIC's investigation into the release of myki data demonstrates that deficiencies in governance and risk management in relation to data can undermine the protection of privacy, even where the project is well-intentioned.

PTV released the dataset to Data Science Melbourne in mid-2018 for use in its Datathon, a competition in which participants are encouraged to find innovative uses for a dataset.

The data was provided by the Department of Premier and Cabinet (DPC), which administers the state government's open data platform, DataVic.

While OVIC said some steps were taken by PTV to de-identify the dataset before it was released, a Datathon participant successfully re-identified individuals. The participant raised their concern with a Victorian public sector representative.

Similarly, University of Melbourne academics Dr Chris Culnane, A/Prof. Benjamin I. P. Rubinstein, and A/Prof. Vanessa Teague had also located the dataset online and were able to identify themselves and people known to them. The same research team re-identified the Medicare Benefits Schedule and Pharmaceutical Benefits Scheme data in September 2016, later reported that the medical billing records of approximately 2.9 million Australians were potentially re-identifiable in that dataset, and had previously found flaws in the NSW voting system.

Both instances were reported appropriately, OVIC said.

The University of Melbourne researchers similarly published [PDF] their findings on Thursday, demonstrating the ease with which they were able to re-identify individuals.

Offering further information on the availability of the dataset, the researchers said access to it was unrestricted, with a URL provided on the Datathon's website to download the complete dataset from an Amazon S3 Bucket.

They said over 190 teams continued to analyse the data through the two-month competition period.

In detailing how they were able to identify individuals, two of the authors said re-identifying themselves was a straightforward exercise, as both have registered myki cards. Knowing for certain just one trip taken by a friend, the researchers were also able to find previous trips made by that individual.

"This type of re-identification is particularly concerning, since it allows an individual to leverage the ease of re-identifying themselves to re-identify others, and from potentially only a single co-travel event," they wrote.

"This presents a risk for anyone who has co-travelled with someone in the past, for example, an ex-partner, a co-worker, or even just someone they went on a single date with. Due to the large amount of data provided, ie, all touch on and off events, it could allow a malicious party to determine where someone lived, worked, or socialised -- and when they visit these places and for how long."
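The co-travel risk described above can be illustrated with a minimal sketch. It is not the researchers' actual method or the real myki schema: the records, card identifiers, stop names, and the one-minute matching window below are all invented for illustration. The sketch assumes only what the report describes, namely that each event carries a card identifier, a timestamp, and a location.

```python
from datetime import datetime, timedelta

# Synthetic events: (card_id, timestamp, stop, event_type).
# Every value here is made up; none comes from the released dataset.
EVENTS = [
    ("card_A", datetime(2018, 3, 5, 8, 2), "Stop 12", "touch_on"),
    ("card_A", datetime(2018, 3, 5, 8, 31), "Stop 40", "touch_off"),
    ("card_B", datetime(2018, 3, 5, 8, 2), "Stop 12", "touch_on"),
    ("card_B", datetime(2018, 3, 5, 8, 45), "Stop 51", "touch_off"),
    ("card_B", datetime(2018, 3, 9, 19, 15), "Stop 77", "touch_on"),
    ("card_C", datetime(2018, 3, 5, 9, 30), "Stop 12", "touch_on"),
]


def cards_matching(events, stop, when, tolerance=timedelta(minutes=1)):
    """Return the set of card ids with an event at `stop` within `tolerance` of `when`."""
    return {
        card
        for card, ts, event_stop, _ in events
        if event_stop == stop and abs(ts - when) <= tolerance
    }


def history(events, card):
    """Return every event recorded for `card`, sorted by time."""
    return sorted((ts, stop, kind) for c, ts, stop, kind in events if c == card)


# Step 1: the attacker already knows their own card (here, card_A).
own_card = "card_A"

# Step 2: a single trip taken together narrows the companion's card
# to whichever other cards touched on at the same stop and time.
shared_touch_on = datetime(2018, 3, 5, 8, 2)
candidates = cards_matching(EVENTS, "Stop 12", shared_touch_on) - {own_card}

# Step 3: each candidate's complete travel record is then readable.
for card in sorted(candidates):
    for ts, stop, kind in history(EVENTS, card):
        print(card, ts.isoformat(), stop, kind)
```

In this toy data, one shared touch-on is enough to isolate a single other card and expose its unrelated later trip. In a real dataset several cards may match a busy stop, and each additional known trip would narrow the candidates further.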

The researchers also identified a stranger in the dataset using only his Twitter account.

OVIC said there were flaws in the process followed by PTV in de-identifying the dataset, assessing the risk of re-identification, and deciding to provide the dataset for use in the Datathon.

As the information contained within the dataset was personal information, it must be handled in accordance with the Information Privacy Principles (IPP) in the PDP Act.

"As PTV is required under the PDP Act to protect personal information in the dataset, it is the Deputy Commissioner's view that PTV breached IPP 2.1 by disclosing personal information for a purpose other than that for which it was collected," OVIC wrote.

"In disclosing the dataset to Data Science Melbourne in or around July 2018, the Deputy Commissioner found PTV contravened IPP 2.1 and therefore interfered with the privacy of the individuals whose personal information was in the dataset. The Deputy Commissioner is also of the view that PTV breached IPP 4.1 in failing to take reasonable steps to protect the personal information contained in the dataset from disclosure.

"The steps taken by PTV in both considering Data Science Melbourne's request for the provision of myki data, and in preparing the dataset for release and use in the Datathon, were inadequate and not reasonable to protect the information contained in the dataset."

OVIC's report also said a request by Datathon in 2015 for the same information was declined because of concerns about ownership of the data.

PTV handed the data over in 2018, however, as it believed a thorough privacy impact assessment had already been conducted.

"PTV's decision-making processes were not clear or well documented and appeared to lack both the support of an effective enterprise risk management framework and suitable rigour in the application of a risk management process," OVIC continued.

In conducting its investigation, OVIC engaged CSIRO's Data61, which determined that "the detailed nature of the information in the dataset created a high risk that some individuals may be re-identified by linking the dataset with other information source".

In justifying allowing access to the dataset, PTV said: "PTV does not consider the data extract is personal information as defined in the [PDP Act]. PTV's view is that there has been no breach or contravention of the Information Privacy Principles (IPPs) as result disclosing the data extract to the Datathon".

Further justifications from the state entity included the argument that a myki card may be shared by multiple people, so its records could show the movements of several people collectively.

"It is significant the dataset was released to Data Science Melbourne without any restrictions on its use or further dissemination," OVIC held firm.

"The Deputy Commissioner is of the opinion that the identity of a substantial proportion of the individuals whose travel movements are recorded in the dataset can reasonably be ascertained.

"The Deputy Commissioner found neither IPP 2.1(a), 2.1(c), nor any other exception to IPP 2 permitted the disclosure of the personal information contained in the dataset. In disclosing the dataset to Data Science Melbourne on or around 12 July 2018, PTV contravened IPP 2.1 and therefore interfered with the privacy of the individuals whose personal information was contained in the dataset."
