Apple stores your voice data for two years

Summary: The iPhone and iPad maker holds on to the data from Siri and Dictation for two years, so long as it abides by its own privacy policy — which, as you might expect, is fairly vague.

Apple disclosed today that it stores the data created when people use Siri and Dictation, two voice-driven services found on its mobile devices, for two years.

The disclosure comes after one civil liberties group warned that Apple isn't doing enough to inform its customers of their privacy rights.

Siri, Dictation: Same thing, different platform (Image: Josh Lowensohn/CNET)

The Cupertino, Calif.-based company's privacy policies on its Siri and Dictation services do not disclose exactly how their technology works or for how long the company stores customer data. Both services were first made available to the public as standard features in October 2011.

Some companies, such as IBM, banned use of the services in their workplaces because they could not guarantee the security of their data.

This morning, Wired's Robert McMillan lifted the lid on how long Apple stores the data: up to two years, according to Apple spokeswoman Trudy Muller.

Siri (found on iOS devices) and Dictation (found on both iOS and OS X devices) take voice-input data and send it, over the air, to Apple servers. A random number is generated to anonymize the user, and the data excludes a user's email address, phone number and Apple ID. 

After six months, the data is "disassociated" from that random number. Apple then uses it to "generally improve [Siri/Dictation] and other Apple products and services." The company says the data may include "related diagnostic data, such as hardware and operating system specifications and performance statistics."

These files are held for up to 18 months.
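The timeline described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (six months linked to a random identifier, then up to 18 more months unlinked, for a maximum of two years). The function and constant names are hypothetical, the figures are treated as hard upper bounds, and nothing here reflects how Apple actually implements retention.

```python
from datetime import date

# Assumed figures, taken from the article: six months linked to a random
# identifier, then up to 18 further months with that link removed.
LINKED_MONTHS = 6
UNLINKED_MONTHS = 18


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not yet complete
    return months


def retention_state(recorded: date, today: date) -> str:
    """Classify a voice clip as 'linked', 'disassociated', or 'deleted'.

    Hypothetical helper: assumes data is kept for the full maximum period.
    """
    age = months_between(recorded, today)
    if age < LINKED_MONTHS:
        return "linked"
    if age < LINKED_MONTHS + UNLINKED_MONTHS:
        return "disassociated"
    return "deleted"
```

Under these assumptions, a clip recorded on a given day stays tied to the random identifier for the first six months, is held without it through month 24, and is gone after that.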

Speaking to Wired, Apple said that turning off Siri immediately deletes the random number identifier and "any associated data."

"Dictation is turned on as part of Siri," according to Apple support documentation. Both Siri and Dictation fall under the same service-level agreement and thus the same privacy policy.

However, if Siri and Dictation remain enabled, any voice-input data and corresponding personal information will be retained for up to two years from the time it was first entered into your compatible iOS or OS X device.

Google takes a similar approach. The search giant offers similar voice services, and the company anonymizes data after two years to use it to improve its speech recognition service. Google says it has "no way" of telling who spoke a particular query.

But the devil is in the details. 

Apple, its partners and its peers may not be able to tell who spoke a particular query after a set amount of time — for Google it is never; Apple has a six-month limit — but the contents of that voice data still remain on their servers.

Such content could range from an innocuous Siri request such as "What was the score of that game last night?" to a sensitive, legally regulated dictation that reveals the precise time a company plans to file for an initial public offering. It is the latter scenario that is of greatest concern.

For this reason, IBM last year banned Siri on its corporate network. Chief information officer Jeanette Horan said the computing giant is "extraordinarily conservative" about computer security, and suggested that "spoken queries might be stored somewhere." 

Nicole Ozer, the technology and civil liberties policy director at the ACLU of Northern California, told ZDNet in a phone conversation that Apple "may be storing confidential business information on its servers."

"Apple can be collecting personal information about who you are, who you know, where you go and what you do," she added. 

Siri's privacy policy: Clear as mud

Apple's Siri and Dictation privacy policy [PDF] explains that customer voice data is sent to the company for server-side conversion, and says that personal data may be recorded:

When you use Siri or Dictation, the things you say will be recorded and sent to Apple in order to convert what you say into text and to process your requests. 

Your device will also send Apple other information, such as your first name and nickname; the names, nicknames, and relationship with you (e.g., "my dad") of your address book contacts; and song names in your collection (collectively, your "User Data").

The term "User Data" is key: this is the content stored on your phone that can be physically read, including text-based details, contacts, notes, and calendar entries.

Though it is publicly available, it is not easy to find this privacy policy. It can only be found on devices that support the services, and the online version does not mention anything about data deletion, storage duration or any other elements about which the ACLU is concerned.

"A broader issue is right now there's no link to Siri's privacy policy from Apple's website, and it's really important for Apple to make its policies clear for those who not only own a compatible device but those who are purchasing a device with Siri preinstalled," Ozer said.

The only way to access Siri's privacy policy is by opening it directly on the iPhone 4S, iPhone 5, and certain iPad models.

Here's a snippet from what it says:

If you turn off [Siri/Dictation], Apple will delete your User Data, as well as your recent voice input data. Older voice input data that has been disassociated from you may be retained for a period of time to generally improve [Siri/Dictation] and other Apple products and services. This voice input data may include audio files and transcripts of what you said and related diagnostic data, such as hardware and operating system specifications and performance statistics.

(Because the two policies are otherwise identical, [Siri/Dictation] are interchangeable.)

Information collected by Apple will be "treated in accordance with Apple's Privacy Policy," the company says.

That privacy policy states:

Apple makes it easy for you to keep your personal information accurate, complete, and up to date. We will retain your personal information for the period necessary to fulfill the purposes outlined in this Privacy Policy unless a longer retention period is required or permitted by law.

There is also the following nugget to consider, which comes at a time when California is mulling over a "right to know" data law that would go above and beyond what EU citizens have. It has yet to be implemented, but other states may follow suit.

More:

For other personal information, we make good faith efforts to provide you with access so you can request that we correct the data if it is inaccurate or delete the data if Apple is not required to retain it by law or for legitimate business purposes. We may decline to process requests [...] for which access is not otherwise required by local law.

According to Ozer: "In the U.S. there isn't a comprehensive privacy law that controls what companies can do with the data that they collect. Apple is only required to do what it says in its privacy policy."

"Its privacy policy requires that if somebody uses Siri then they agree that their voice data and user data will be collected — even after they turn it off, their older voice input data can be used to 'improve' Apple and Siri services."

Can Apple hold onto this data for as long as it wants?

In most major jurisdictions, including the U.S. and the EU, Apple can. It just chooses not to.

While the U.S. doesn't have data retention laws, the EU does — albeit controversially. But EU law doesn't apply in this instance. 

According to a European Commission spokesperson, the EU Data Retention Directive does not apply to Apple. Vague as it may be, Apple, Google and their peer companies are not classified by EU authorities as "telecommunication service providers or operators."

ISPs and telecom firms must hold onto communications data for a period of six months to two years under the EU Data Retention Directive. This allows governments from around the world to request access to that data, including IP addresses, time and date of emails, phone calls, and text messages, so long as a court requests it.

The EU forces companies to hold onto your data for a set amount of time, but it also dictates that it must be destroyed after a certain amount of time or when it is no longer needed. This policy has been controversial, and there are ongoing disputes over the legality of it, and whether it has even been fully implemented in EU member state law.

Recently, EU regulators warned Google that it must clarify how long it stores user data for under its new merged privacy policy. The search giant was told to modify its privacy policy after regulators found that it may not be in compliance with EU law. Further, Microsoft, Yahoo, and Google were told by the European Commission's privacy group, the Article 29 Working Party, to limit how long they store identifiable information.

Apple may also be subject to this requirement. To date, the company has not been targeted because it has not been the subject of any complaints.

Bottom line

A lack of clarity and spotty availability of otherwise public information make it difficult for businesses and consumers to make informed choices about their purchasing decisions. There is little upside to this inconsistency. Though for the enterprise, the truth may be a bitter pill to swallow.


Talkback

17 comments
  • Makes me want to go out and get an iPhone 5s...

    You can imagine* what people will not say into a phone just to poke fun at Apple and who they lobby to get special entitlements from...

    Oh, customers not having all the details upfront is to be expected when making purchases. If people knew it and actually gave a darn about it, this economy would COLLAPSE in no time.


    * great song, at least at superficial value - maybe its listeners will realize that one's actions are what make dreams and imaginations come alive... but then, a self-obsessed wife-beater that beat both of his wives at one point or another wrote that song in 1971 so whatever... humanity is strange. But like I said, it's what people know and do, and anyone can be hornswaggled at any time. This is why we have laws and regulations to adhere to, because not everyone can be omniscient, omnipotent, and omnipresent, all the time... but who says we have a society?
    HypnoToad72
    • oops, "true" not "alive"

      More about that sad little songwriter here:

      http://listverse.com/2012/05/12/top-10-unpleasant-facts-about-john-lennon/
      HypnoToad72
  • *Shrug*

    They anonymize and use the data, presumably to improve service. How's this different than Google using your search data to improve searches?
    edelbrp
    • The difference is that you are not Apple's product

      while you ARE Google's product. In other words, I'm more comfortable letting Apple cache or store my data than I am Google.
      baggins_z
  • Siri

    The wire tap that talks back...
    eMJayy
    • Don't worry, Verizon, AT&T and the likes store ALL of your SMSs and talk,

      ... location data for as long as SEVEN YEARS.
      DDERSSS
    • It's only news because it's apple.

      It's anonymous from the moment you send it; it's not linked to your apple if, phone, email... It's not being collected to sell. But it is being collected.

      Do you think the big fruit don't use safari data to analise what users do? Or draw metrics based on iCloud usage? Of course they do.

      If you use iCloud or Dropbox or webmail a company is holding your data.

      I don't use Siri because it's slower than just using the phone without it, and dictation's pretty poor compared to dragon, but I'd not, not use the service because of this.

      If I use google services who exist to sell your data, I'd use apple's.

      If you use hotmail,gmail or yahoo and don't delete your account they have your emails and contacts forever... Who cares?

      This isn't about selling of data, just retaining it. If you don't trust a company with your data there's a lot you should stop using before voice controls. How interesting do you think it is to listen to you saying in slower and slower tones "call Tim" while Siri doesn't get it?

      Called a call center? Heard the message all calls are recorded?
      MarknWill
      • Did you read the Wired article?

        It is NOT anonymous. Apple keeps your info associated to the voice data for SIX MONTHS.
        I guess you don't care about privacy issues.
        Gisabun
        • Huge assumption

          Yes, Apple does keep your data for six months and it is obscured for privacy. So to claim that the parent doesn't care about privacy is a huge leap.
          JScottA44
  • The intentional omission here...

    ...is the word anonymous.
    Englishmole
  • Some advice

    Next time, put Google in the subject line, as well. I am quite tired of bloggers seemingly targeting Apple first over this kind of thing.

    Oh, and the corporate world places all kinds of limits on all kinds of data all the time. The snippet about Siri being kept off the IBM network is not, at all, news.
    chrisanderson1973
    • I don't believe this is a 'targeted' attack on Apple

      The article seems to me to be right on target. Apple is storing voice recordings for two years...as they themselves stated. However, since Apple does not sell your information, I do not see a huge problem with it. And after six months it is further obscured.

      Contrast Apple's policies with Google or the carriers and it is pretty obvious that Apple comes out much better. However, this article is not about comparisons. As best as I can tell, it is just relating discovered information.
      JScottA44
  • Once again, Apple tries to up the antee on Google...

    .... To see who doesn't give a sh?t more about users' privacies.
    Did we give our approval for them to store our voice for 2 years? Probably not.
    Why does it take Apple 6 months to disassociate your contact info from the voice data.
    If Apple needs it for "sampling", why so much?
    Gisabun
    • Apple Holds My Voice Data, Although

      Why should I care? Apple most likely uses this data to either; help improve their services, or to keep back up of user data for someone wanting to use it in places such as the court of law. It is basically the same thing phone services do. At least I feel more safe with Apple holding my data than some other companies, and before you assume Google, I do, in fact, have quite some faith in Google holding my data, just not all of it.
      Mega Pony
  • Wow 2 years worth of

    "Siri tell me a poop joke"
    fer.paredesb@...
  • Why does this matter

    With Congress about to pass CRISPA this whole discussion will be moot. Be very scared...
    eye4bear
  • Voice Samples

    Traditional Text to Speech implementations involve the constant sampling of speech patterns. Over time, the collection of samples may be reduced to individual words spoken in a way that makes them versatile in structuring new phrases from those previously recorded samples.

    Collecting and storing voice data over that period of time should produce enough data to form a new digital voice. Your random number need not identify you, so long as it keeps your samples associated to each other.

    Legally, just how much can they do with our voice data without consent? Can they take our voice and freely make a new computer voice from it, without compensation, without consent?

    What if they did identify the sample source? For example, Obama, who seems to be a fellow consumerist? It's not that we don't have tons of speeches with his voice. It's that through Siri, etc. that we get higher fidelity recordings which provide a wider dynamic range, thus a more realistic voice once digitized. (Another reason why 32/64 bit Text to Speech sounds more realistic than old 16 bit TTS did.) Regardless of their intent, we may very well be looking at a risk to national security from this alone.
    ct2193@...