Study shows credit card metadata is not as anonymous as thought

Just four vague pieces of information, not including a name, address, or credit card number, are enough to identify individuals from credit card metadata, an MIT study has found.
Written by Aimee Chanthadavong, Contributor

Research by the Massachusetts Institute of Technology (MIT) has revealed that individuals can be identified through credit card metadata.

MIT researchers have reported that they do not need an individual's name, address, or credit card number, which are typically thought of as personal information, to identify people within a data set.

Instead, they say that four vague pieces of information, such as the dates and locations of four purchases, are enough to identify 90 percent of the people in a data set recording three months of credit card transactions by 1.1 million users.

"We are showing that the privacy we are told that we have isn't real," study co-author Alex "Sandy" Pentland of MIT said in an email.

Going further, the researchers tested whether adding price information would identify an even larger share of the people in the data set. Using three data points, at least one of which revealed the price of a purchase, they found a 94 percent chance of picking out a person's credit card records from those of a million other people.

The MIT researchers also examined whether they could preserve anonymity in large data sets by deliberately making the data less precise, while still leaving it useful for analysis. They found that even when the data was coarsened so that each purchase was recorded only as having taken place sometime within a week, at one of 150 stores in the same general area, four purchases were still enough to identify more than 70 percent of users.
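For readers curious how a figure like "four purchases identify 90 percent of users" is measured, the sketch below estimates what privacy researchers call unicity: for each user, draw a few random purchases from their own history and count how many people in the data set share all of them. If only one person does, that user has been singled out. This is not the MIT team's code. The data layout (each purchase as a `(day, shop_id)` pair), the function names `unicity` and `coarsen`, and the choice to form "areas" by grouping shop IDs in blocks of 150 are all hypothetical, chosen only to mirror the week-and-150-stores coarsening described above.

```python
import random
from typing import Callable, Dict, Hashable, Set, Tuple

# Hypothetical representation: one purchase is (day number, shop ID).
Point = Tuple[int, int]
Traces = Dict[Hashable, Set[Point]]


def coarsen(point: Point, days: int = 7, shops_per_area: int = 150) -> Point:
    """Blur a purchase to (week, shop area).

    Grouping consecutive shop IDs into areas of 150 is an assumption for
    illustration; real data would group shops by geography.
    """
    day, shop = point
    return (day // days, shop // shops_per_area)


def unicity(traces: Traces, k: int = 4, trials: int = 1,
            transform: Callable[[Point], Point] = lambda p: p,
            seed: int = 0) -> float:
    """Estimate the fraction of users singled out by k random points of their own trace."""
    rng = random.Random(seed)
    # Apply the (optional) coarsening to every user's trace once, up front.
    blurred = {user: {transform(p) for p in pts} for user, pts in traces.items()}
    unique = tested = 0
    for pts in blurred.values():
        if len(pts) < k:
            continue  # too few distinct points to draw k of them
        for _ in range(trials):
            sample = set(rng.sample(sorted(pts), k))
            # Count users whose trace contains every sampled point (always >= 1).
            matches = sum(1 for other in blurred.values() if sample <= other)
            unique += matches == 1
            tested += 1
    return unique / tested if tested else 0.0


# Toy data, invented for illustration only.
traces = {
    "alice": {(1, 10), (3, 42), (5, 7), (9, 300)},
    "bob":   {(1, 10), (3, 42), (6, 8), (12, 301)},
    "carol": {(2, 11), (4, 43), (5, 7), (9, 300)},
}
print(unicity(traces, k=4))                      # 1.0: every full trace is unique
print(unicity(traces, k=2, transform=coarsen))   # 0.0: blurred traces all look alike
```

The two parameters correspond to the two experiments the article describes: raising `k` gives an attacker more points to match on, while a coarser `transform` makes points less distinctive. On three toy users the coarsening hides everyone; the study's point is that on a real data set of 1.1 million people, blurring this much still left more than 70 percent identifiable.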

The study shows that the privacy we think we have when our data is collected is really just an "illusion", said Eugene Spafford, director of Purdue University's Center for Education and Research in Information Assurance and Security. Spafford, who wasn't part of the study, said it makes "one wonder what our expectation of privacy should be anymore".

"It is not surprising to those of us who spend our time doing privacy research," said outside expert Lorrie Faith Cranor, director of the CyLab Usable Privacy and Security Laboratory at Carnegie Mellon University.

"But I expect it would be surprising to most people, including companies who may be routinely releasing de-identified transaction data, thinking it is safe to do so."

The MIT research feeds into the ongoing debate over whether the Australian government's plan to introduce mandatory data-retention legislation is justified, and whether it would breach individual privacy rights.

Currently before Parliament, the legislation would require telecommunications companies to retain an as-yet-undefined set of customer data for a minimum of two years, which law-enforcement agencies would be able to access without a warrant. The customer data that telcos may be required to retain includes call records, IP addresses, and billing information.

The Parliamentary Joint Committee on Human Rights raised its concerns in a recent report, saying that the proposed Bill is "very intrusive of privacy".

"A requirement to collect and retain data on every customer just in case that data is needed for law-enforcement purposes is very intrusive of privacy, and raises an issue of proportionality," the committee said in its report.

Similarly, the Australian Interactive Media Industry Association (AIMIA) and the Australian Information Industry Association (AIIA), which both represent tech companies including Apple and Google, have questioned whether the legislation would breach the privacy of all Australians, and whether it is the most effective means of protecting public safety and security.

The telcos have also warned the government that such a scheme would not only be costly, but would also leave their employees responsible for responding to requests from law-enforcement agencies and deciding which requests to approve or reject.

Additionally, Telstra has cautioned the government that the centralised system storing the metadata would be a potential goldmine for hackers. Australian Privacy Commissioner Timothy Pilgrim issued a similar warning last year, when he said that retaining large amounts of data would inevitably increase the chances of privacy breaches occurring.

In the federal government's defence, the Australian Attorney-General's Department stated that whistleblower Edward Snowden's leaks about the US National Security Agency's (NSA) surveillance operations had created the need for data-retention legislation.

"Telecommunications data is becoming increasingly important to Australia's law-enforcement and national security agencies as they lose reliable access to the content of communications. This threat has increased significantly since the Snowden disclosures. As such, even where agencies cannot obtain the content of the communications, they have historically often been able to use metadata to determine how and with whom a person has been communicating," said Anna Harmer, the department's acting first assistant.

"The ability of agencies to map networks through metadata is an important investigative tool."

Further, Attorney-General George Brandis has previously said that the legislation is needed because there are no existing metadata laws, adding that the regime would apply only to the "most serious crime. Only to crime, and only to the highest levels of crimes".

The regime has been backed by Australian law-enforcement agencies, which claim that warrantless access to the data will be valuable for criminal investigations.

However, when the Australian Federal Police (AFP) and the Australian Security Intelligence Organisation (ASIO) were questioned during a Senate Estimates hearing in December, both agencies were reluctant to reveal exactly what data set they want. Brandis came to their defence, saying during the hearing that the data set would be defined after negotiations with telcos.

With AAP
