In case you missed it, the highlights of a Northwestern University research study, published in Harvard Business Review, revealed that Dropbox had given the researchers "access to project-folder-related data" over a two-year period from about 400,000 users across 1,000 universities.
The researchers initially claimed Dropbox gave them raw data, which they anonymized, but their report was updated after ZDNet reported Monday that Dropbox said it anonymized the data before handing it over.
Dropbox said in a statement that its anonymization process prevented the researchers from seeing any personal information, but it allowed them to analyze the anonymized data for patterns and insights.
It's a confusing situation -- and one that has academics rightfully angry that any of their data, even anonymized, was shared in the first place. Given that academics often work on highly sensitive projects, keeping data in the cloud can be risky.
Prior to publication, we contacted Adam Pah and Brian Uzzi, who authored the article. They later responded through a public relations firm, saying: "There was no data privacy issue."
"Before providing researchers the data and to protect users' privacy, Dropbox anonymized the data by rendering any identifying user information permanently indecipherable. The article now clarifies this issue," said the researchers' joint statement.
Given there were few concrete answers and concerns about misinformation from all sides, we reached out to Dropbox earlier today to relay some of the issues that aggrieved the academics.
We asked Dropbox to respond "on the record" to about a dozen questions. Dropbox offered a background briefing -- which would effectively prevent ZDNet from directly quoting the company or its representatives. ZDNet declined that briefing, and requested again that our questions be answered on the record -- by either email or phone.
Dropbox asked that we print its answers in full -- which we have in quotes below.
In explaining how data was shared in the first place, Dropbox said:
Granted, Dropbox isn't the first company to share its vast stores of anonymized data with academics. But even anonymized data can be flawed.
We also asked how the apparently incorrect report was published in the first place.
The paper was co-bylined by Rebecca Hinds, Dropbox's own enterprise insights manager, and a shorter version was also posted on Dropbox's own blog, which to our knowledge hasn't been changed since it was first published on Friday.
"While we approved the initial study, there were breakdowns in the editing process pertaining to the [Harvard Business Review] article. It has been partially corrected, and we are working to make further updates."
Efforts to contact Hinds were unsuccessful; she had deleted her Twitter and LinkedIn accounts earlier this week.
We also contacted Julia Poncela-Casasnovas, a Northwestern postdoctoral fellow, who said in a tweet she was the original author of the paper, but did not say why her name wasn't included as a byline on the report.
Although she could not answer many of the questions we put to Dropbox, Poncela-Casasnovas confirmed a formal correction to the report was being drafted.
"I can however confirm that none of us at Northwestern ever saw the non-anonymized Dropbox data," she said. "Also, I was the one who did most of the analysis for this research while working at Brian Uzzi's lab."
The report said the researchers were able to see information about "every Dropbox folder" tied to any given researcher and how often the folder was accessed by anyone associated with it -- but Poncela-Casasnovas disputed that in an email to ZDNet.
"As I said, the data was anonymized and aggregated before they gave it to us. This means there was no way of reverting it to know who the subjects were," she said.
"The study began in 2016 and used anonymized data from May 2015 to May 2017. Dropbox used a combination of aggregated sharing activity as well as publicly available information provided by NICO to generate a dataset of 16,000 researchers for the study."
"No researcher, at Dropbox or [Northwestern Institute on Complex Systems], had access to any user content at any time. For this project, the Dropbox researcher had restricted access to a limited subset of metadata that was relevant to the study. This access was audited and reviewed by our security team."
Dropbox did not answer some of our questions: Were free, paid, or business accounts affected? How did Dropbox determine if an anonymized account belonged to a researcher? And, are any other research projects currently underway by Dropbox or its partners?
It's yet another reminder that companies not only collect and store but also generate tons of data on their customers -- simply by offering the service -- and that their privacy policies often allow them to do almost anything they want with it.