Between the Lines

Larry Dignan, Andrew Nusca and Rachel King

How Apple's Siri really works

By Andrew Nusca | November 3, 2011, 4:00am PDT

Summary: How does Apple’s Siri really work? A SmartPlanet article lays out how voice recognition on a smartphone really works, step by step.

Apple’s Siri is sassy, clever and occasionally useful.

But how the hell does it really work?

“Voice recognition” is what Siri does, but those words alone don’t reveal how the system actually gets your words right when you say, “Send message to Jason Perlow: Go get a shave, Linux Lover.”

But a lengthy feature article over at our sister site SmartPlanet has the dirt, step by step:

The sounds of your speech were immediately encoded into a compact digital form that preserves its information.

The signal from your connected phone was relayed wirelessly through a nearby cell tower and through a series of land lines back to your Internet Service Provider where it then communicated with a server in the cloud, loaded with a series of models honed to comprehend language.

Simultaneously, your speech was evaluated locally, on your device. A recognizer installed on your phone communicates with that server in the cloud to gauge whether the command can be best handled locally — such as if you had asked it to play a song on your phone — or if it must connect to the network for further assistance. (If the local recognizer deems its model sufficient to process your speech, it tells the server in the cloud that it is no longer needed: “Thanks very much, we’re OK here.”)
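To make that local-versus-cloud handoff concrete, here is a minimal sketch in Python. It assumes the recognizer already has a rough transcript and a toy confidence score; the names `LOCAL_COMMANDS`, `local_confidence` and `route`, and the 0.8 threshold, are invented for illustration and do not come from Apple or the SmartPlanet article.

```python
# Hypothetical sketch: decide whether a command can stay on the device.
# A real recognizer works on audio, not text; this is a simplification.
LOCAL_COMMANDS = {"play", "pause", "call"}  # assumed on-device vocabulary


def local_confidence(transcript: str) -> float:
    """Toy score: 1.0 if the first word is a known local command, else 0.0."""
    words = transcript.lower().split()
    return 1.0 if words and words[0] in LOCAL_COMMANDS else 0.0


def route(transcript: str, threshold: float = 0.8) -> str:
    """Return "local" when the on-device model is confident enough, else "cloud"."""
    return "local" if local_confidence(transcript) >= threshold else "cloud"
```

With these assumptions, `route("play some jazz")` stays on the device, while `route("what is the weather")` falls through to the server.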

The server compares your speech against a statistical model to estimate, based on the sounds you spoke and the order in which you spoke them, what letters might constitute it. (At the same time, the local recognizer compares your speech to an abridged version of that statistical model.) For both, the highest-probability estimates get the go-ahead.
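The "highest-probability estimate" idea can be sketched in a few lines of Python. This toy version scores each slice of audio independently and ignores the ordering information the article mentions, so treat it as a simplification; the function name `best_units` and the probabilities are made up for the example.

```python
def best_units(frame_scores):
    """Pick the highest-probability sound unit for each audio frame.

    frame_scores is a list of dicts mapping a candidate sound to its
    (assumed) probability for that slice of audio.
    """
    return [max(scores, key=scores.get) for scores in frame_scores]


# Invented scores for four slices of audio from the word "send".
frames = [
    {"s": 0.7, "z": 0.3},
    {"eh": 0.6, "ih": 0.4},
    {"n": 0.8, "m": 0.2},
    {"d": 0.55, "t": 0.45},
]

sounds = best_units(frames)  # one winning sound per frame
```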

Based on these opinions, your speech — now understood as a series of vowels and consonants — is then run through a language model, which estimates the words that your speech is comprised of. Given a sufficient level of confidence, the computer then creates a candidate list of interpretations for what the sequence of words in your speech might mean.
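One common way to build such a language model is to score word sequences by how often each word follows the previous one (a bigram model). The sketch below assumes that approach, which the article does not confirm; the `BIGRAMS` table, the `UNSEEN` fallback and the probabilities are all invented for illustration.

```python
import math

# Hypothetical bigram probabilities P(next word | previous word).
# "<s>" marks the start of the utterance.
BIGRAMS = {
    ("<s>", "send"): 0.6,
    ("<s>", "sand"): 0.1,
    ("send", "message"): 0.5,
    ("send", "massage"): 0.01,
    ("sand", "message"): 0.01,
    ("sand", "massage"): 0.02,
}
UNSEEN = 1e-6  # assumed fallback probability for word pairs not in the table


def score(words):
    """Sum of log-probabilities for a candidate word sequence."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(BIGRAMS.get((prev, word), UNSEEN))
        prev = word
    return total


def rank(candidates):
    """Order candidate interpretations from most to least likely."""
    return sorted(candidates, key=score, reverse=True)
```

Given similar-sounding candidates such as `["sand", "massage"]` and `["send", "message"]`, `rank` puts the more plausible phrase first.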

If there is enough confidence in this result, and there is — the computer determines that your intent is to send an SMS, Erica Olssen is your addressee (and therefore her contact information should be pulled from your phone’s contact list) and the rest is your actual note to her — your text message magically appears on screen, no hands necessary. If your speech is too ambiguous at any point during the process, the computers will defer to you, the user: did you mean Erica Olssen, or Erica Schmidt?
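The last step, turning recognized words into an action and asking the user when the addressee is ambiguous, might look roughly like this in Python. It is a sketch only: a single regular expression stands in for real intent detection, and the `CONTACTS` list, the `parse_sms` name and the result dictionaries are assumptions made for the example.

```python
import re

# Assumed contact list, using the names from the example above.
CONTACTS = ["Erica Olssen", "Erica Schmidt", "Jason Perlow"]

# Toy pattern for one phrasing only: "Send message to <name>: <body>".
PATTERN = re.compile(
    r"^send message to (?P<name>[^:]+):\s*(?P<body>.+)$", re.IGNORECASE
)


def parse_sms(utterance, contacts=CONTACTS):
    """Return an SMS intent, a clarification request, or "unknown"."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return {"intent": "unknown"}
    spoken = match.group("name").strip().lower()
    hits = [c for c in contacts if spoken in c.lower()]
    if len(hits) == 1:
        return {"intent": "send_sms", "to": hits[0], "body": match.group("body")}
    # Zero or several matches: defer to the user, as the article describes.
    return {"intent": "clarify", "options": hits}
```

Saying the full name resolves straight to one contact, while "Send message to Erica: ..." returns both Ericas as options so the user can choose.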

There’s a whole lot more to learn in the article, including a history of the research behind the technology and a look at what Google, Microsoft and others want to do with it. (What are you waiting for? Go read it.)

Voice recognition has been around in some form for years, but it’s pretty neat to see exactly what happens when you press that button.

Andrew J. Nusca is associate editor of ZDNet and editor of SmartPlanet.

Disclosure

Andrew J. Nusca does not hold any investments in the technology companies he covers.

Biography

Andrew J. Nusca is an associate editor at ZDNet and editor of SmartPlanet. As a journalist based in New York City, he has written for Popular Mechanics and Men's Vogue and his byline has appeared in New York magazine, The Huffington Post, New York Daily News, Editor & Publisher, New York Press and many others. He also writes The Editorialiste, a media criticism blog.

He is a New York University graduate and former news editor and columnist of the Washington Square News. He is a graduate of the Columbia University Graduate School of Journalism. He has been named "Howard Kurtz, Jr." by film critic John Lichman despite having no relation to him. He lives in his native Philadelphia with his wife, cat and Boston Terrier.

30 Comments

Imitating Kinect
Tim Acheson 3rd Nov
So, it's true!

Apple's "Siri" is in fact just copying one of the features of Kinect, as launched way back in 2010!
Google's Rubin was especially clear about that.

However, the same Rubin said that the original, genuine Android UI was the best thing, only to "change his mind" about it after Apple announced the iPhone.

Then Rubin (Schmidt, Page, etc.) decided to commit IP theft, throw away the "perfect" genuine Android UI and use Apple's finger-based UI as the core principle.

Given this history, we can conclude that Rubin obviously lied and that Google will offer an Android version with a built-in AI assistant (not just "Voice Actions", as Android has now) sooner rather than later.
And it is certainly not an AI assistant.
0 Votes
@DeRSSS
Then Siri does not count. Apple is licensing voice recognition from Nuance so Apple is out of the picture here.

Care to reconsider your stance on licensed technologies?
@Tim Acheson
As you've pointed out, the Kinect blows the doors off of anything that apple has ever created, but truly gets no credit from the masses. They simply use it and take it for granted. My Android phone blew me away when I started using its voice command feature, and yet apple is in the process of convincing the world that it started the revolution. This is what apple has always done... No revolution here.
0 Votes
@Steve@... Google voice is crap. That comes from many people who use Android devices. Now you're just making yourself out to be a fanboi.
0 Votes
@Steve@...

Sorry but the Android voice capability is on par with what Windows Mobile 6.x series offered four years ago.

I like Android, but let's get real.
0 Votes
@Steve@... I am a big fan of WebOS, which outdoes Android in many respects, but the voice feature on my Sprint phone is outstanding, and will do all of the things that Apple is bragging about. PERIOD... So don't contradict what I've said, unless you have a Sprint Android phone. If you have Apple's latest piece of copyware, you haven't got a clue what mine will do.
0 Votes
RE: How Apple's Siri really works
non-biased 10th Nov
@Steve@... So to turn it around on you unless you have the 4S you don't know crap about what it can do. See how that works.
0 Votes
RE: How Apple's Siri really works
samzbest@... 3rd Nov
Interesting, and based on the funny but smart AI responses it gives, I think Apple and AI agents have a future.
http://thetechnologycafe.com/siri-and-its-jokesmore-****-and-funny-stuff-siri-says/
0 Votes
RE: How Apple's Siri really works
Gabriel Hernandez 3rd Nov
@samzbest@... I believe SRI's natural language processing system (from Siri) is as good as IBM's Watson. Too bad both Apple and IBM depend on a third-party technology vendor called Nuance to convert speech to text. If this could be developed internally by Apple or IBM, you would reduce network bandwidth, since you would only do one transmission and all processing would be done in one place. This could improve the speed of the response.
0 Votes
RE: How Apple's Siri really works
cowboys2000 4th Nov
@Gabriel Hernandez

Perhaps it is done this way to save the device battery as well?
0 Votes
RE: How Apple's Siri really works
tonymcs@... 3rd Nov
@samzbest@...

Wow. Apple finally managed to copy ELIZA ;)
0 Votes
RE: How Apple's Siri really works
Peter Perry 3rd Nov
This should be titled how Siri doesn't work!

I have set appointments, found directions and asked humorous questions successfully with the App but I have only had very limited success sending text messages or calling friends with it!

Here's an idea you bozos developing the product should really try (I believe V Lingo works this way)...

Limit your dictionary to the address book when somebody says any of the following...

Text
Send message
Call

The immediate word / words should at least weigh the address book more heavily than the normal dictionary!

When I say "text Ann", it should not search the damn dictionary and return "an" or "and"!

V Lingo gets my wife's name first time, every time but siri struggles with the most basic of names!
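The weighting idea in this comment can be sketched in Python as below. It is purely illustrative and not how Siri or Vlingo is known to work; the `pick_name` function, the `boost` factor and the scores are assumptions.

```python
def pick_name(hypotheses, contacts, boost=5.0):
    """Choose the best word after a trigger such as "text" or "call".

    hypotheses maps each candidate word to a recognizer score; candidates
    that match a contact's first name have their score multiplied by boost.
    Contact entries are assumed to be non-empty strings.
    """
    first_names = {c.split()[0].lower() for c in contacts}

    def weighted(item):
        word, value = item
        return value * boost if word.lower() in first_names else value

    return max(hypotheses.items(), key=weighted)[0]


# Invented recognizer scores for the word spoken after "text".
guesses = {"an": 0.5, "and": 0.3, "Ann": 0.2}
choice = pick_name(guesses, ["Ann Perry"])  # the address book tips the balance
```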
0 Votes
@Peter Perry It learns names as you go, maybe you are not correcting it but just abandoning a task if it was incorrect? If you have a particularly difficult name, you can also speak it into a phonetic pronunciation field on the contact.
@teetee1970

I send texts, call people, make appointments, change appointments, make notes, create reminders, set alarms with no issue.

The best part is I had to learn very little to interact with it. There are dozens of ways you can interact with Siri to get similar responses.
0 Votes
@teetee1970

He tried it on his wife's phone but Siri knew exactly who he was and toyed with him just for kicks.
0 Votes
@Peter Perry No one believes you own an iPhone. Let me tell you this: my wife and I have the 4S, and my Puerto Rican accent is deep and my wife's Filipino accent is strong as well. We have no problems sending text messages at all.

Your post sounds like your regular post history. Spreading FUD about Apple because you don't like them. I have gotten a little too used to having Siri do my appointments, send texts and do web searches for me.
0 Votes
@Peter Perry

Then you've accomplished something. Sending SMS is one of its easiest functions, very difficult to screw up. Good work.
@Peter Perry.. to be frank.. i think you're fibbing and have never used Siri..
@Peter Perry
I will admit to not owning an iPhone 4S. I only have an iPhone 4. I will not be getting an iPhone 4S because here is my experience with Siri:
I know 3 people who bought an iPhone 4S and when I ask them how they like Siri, their response is that they love it. They then proceed to show me how Siri can tell them what the meaning of life is, what Siri thinks about opening the pod bay doors, and that it can call their wife without actually having to say their wife's name.

I think Siri has promise, don't get me wrong, but it is not currently living up to that promise. It might get there by version 3. If you look at most Apple products, their first few versions are very crippled and very bad. Apple typically does eventually get there though. I have confidence that Siri will be useful one day. That day just isn't today.
0 Votes
@toddybottom

You only gave examples of Siri working well and no examples of what was wrong. I think you forgot the paragraph on why it failed when those people tried to do useful things.
0 Votes
RE: How Apple's Siri really works
non-biased 10th Nov
@toddybottom Name first versions of Apple products that were very bad. Don't think you can honestly name more than maybe a couple but most certainly not most.
0 Votes
RE: How Apple's Siri really works
non-biased 10th Nov
@Peter Perry With you being the ultimate Fandroid, Siri was so good it knew to mess up on anything you tried to do so you could post about it. I am impressed Siri will go out of its way even to make itself look bad just to make the user happy.
0 Votes
That's how voice recognition works..
doctorSpoc Updated - 3rd Nov
Which is only one aspect of how Siri and every other voice recognition system works.. Even the ones that predated Siri. This is just how input gets into Siri.. Siri also uses AI (none of which is explained here) to try to understand meaning rather than simply taking the presence of predefined words in a predefined order and assigning a command to it.. Natural language is WAY more complex than the simple voice recognition explained here.. It would take many thick textbooks to explain the techniques used for this sort of AI.. Siri doesn't require you to use predefined words or order, understands slang, understands context from one request to another.. This is the sort of stuff that distinguishes Siri and none of this is explained here..

This article is the equivalent of writing an article named "This is how your car really works" and writing.. You put gas in it and it goes.. You haven't explained anything here...
0 Votes
"what letters might constitute it."

Phonemes, not letters.
0 Votes
It's speech recognition...
GrizzledGeezer 3rd Nov
...not voice recognition.

There is a difference.
0 Votes
RE: How Apple's Siri really works
clokverkorange 3rd Nov
The fact that this all happens in seconds is possibly the most fascinating part of the entire procedure.

At what point can we say that the AI (which is what Siri could arguably be called) is actually thinking?
0 Votes
RE: How Apple's Siri really works
cowboys2000 4th Nov
What about Wildfire that Pac Bell Wireless had back in 1999? It was a voice response digital assistant? This isn't "new". It may be better, but it isn't new.
0 Votes
Background
Bookworm2000 7th Nov
We do ignore the things that may be listened to (in this case literally) or picked up by third or fourth parties or many others here - we are not talking secure lines and servers here, are we?
I am an IT addict, as in competent user - but that seems a bit OTT - call me paranoid or a conspiracy-theorist but this is getting as close as it can get bar some guy actually sitting on your lap.
