Beyond Cortana: What artificial intelligence means for the future of Microsoft

Summary: Microsoft Research’s hyperscale computing artificial intelligences are about to change the way we think about computing.

Cortana in action. Image: Microsoft

While Satya Nadella might claim not to know how Microsoft's new Skype translator technology works, the AI research teams at Microsoft Research have a pretty good idea: the neural nets they use have come close to showing that the deep grammar theories of linguists like Noam Chomsky underpin how we communicate. But that's only part of the story of how MSR's AI research is powering the new Microsoft.

At the recent Future in Review conference, the head of Microsoft Research (MSR), Peter Lee, discussed the future of personal assistant technologies and the current state of practical artificial intelligence. His thoughts provide some interesting insight into the importance of AI to Microsoft, and why Bing is such a core technology for its future products.

The AI work is the basis of the contextual ambient intelligence at the heart of Nadella's mobile and cloud vision for Microsoft, and it's through tools like the Skype translator and Windows Phone's Cortana that we're seeing just how MSR's research AIs are becoming products.

Lee's vision is one of AI helping humans, so it's perhaps not surprising that the face of Microsoft's AI is Cortana, a virtual personal assistant. It's not the endpoint; as Lee says, it's "[a] part of the evolution of AI, showing what it can be."

That's why MSR is deeply involved in the development of what's at heart a consumer product. While much of MSR's work is blue sky, looking at the future of computing, it's also part of many of Microsoft's cloud scale projects, such as Bing and Cortana.

Cortana is, at heart, a user experience for an artificial intelligence. "What the user sees is a UI that's intended to be like a personal assistant, built using the basic building blocks for natural interaction," Lee says.

It's the AI behind it that's handling much of what it does; or rather, a series of different neural networks and rules engines that are the closest thing to AI we have today. They're designed to make inferences from your personal data, to be prescient (not creepy). It's here that Nadella's ambient intelligence comes into play. As Lee asks: "Can we mine all the data we have access to, and then extract intelligence?"

Cortana is just one of many user experiences for this type of AI research. Lee's researchers are working on "knocking down 20 to 30-year-old problems in AI, understanding intent," he says. The aim is to produce systems that can see, hear and understand, by focusing on correlations in large amounts of data.

Scale is important here, as the more data the better, but it's essential to be careful that the resulting rules and neural nets don't lead to incorrect correlations. For one thing, it's easy for neural nets to fall into the fallacy of correlation implying causation. We know that people on the streets with umbrellas don't cause rain, but it's hard for a neural net to make that distinction.
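The umbrella example can be made concrete with a toy simulation (this is purely illustrative; the probabilities are invented, not from MSR's work). Rain causes umbrellas, so the two are strongly correlated in observed data, but intervening on umbrellas cannot change the weather:

```python
import random

random.seed(1)

# Toy world: rain falls on 30% of days, and rain causes umbrellas
# (95% of people carry one when it rains, 5% otherwise).
N = 20_000
rain = [random.random() < 0.30 for _ in range(N)]
umbrella = [random.random() < (0.95 if r else 0.05) for r in rain]

# Observed association: an umbrella is strong evidence of rain...
rainy_with_umbrella = [r for r, u in zip(rain, umbrella) if u]
p_rain_given_umbrella = sum(rainy_with_umbrella) / len(rainy_with_umbrella)

# ...but intervening - handing everyone an umbrella - leaves the rain
# rate at its base level, because the causal arrow only runs rain -> umbrella.
p_rain_if_forced_umbrella = sum(rain) / N

print(round(p_rain_given_umbrella, 2))      # high: umbrellas predict rain
print(round(p_rain_if_forced_umbrella, 2))  # near the 0.3 base rate
```

A system trained only on correlations sees the first number; causal inference is about knowing when only the second one answers the question being asked.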

A big part of MSR's AI research is around understanding what causes what, the process that AI scientists call causal inference. It's a complex problem, and it's why, despite its conversational Chit-Chat rules engine, understanding things about situations (situational inference) is still beyond Cortana. While commercialised AI may not be there yet, research projects are getting close. In MSR's Redmond offices, Building 99, there's a robot receptionist that aims to understand when two people are talking to each other.

Lee told his audience about another live experimental AI in Building 99. As you walk to the elevators, the doors open before you get there. The lifts are controlled by yet another neural net, this one trained on the behaviour of people in the hall near the lifts. Here cameras watched people in the building's atrium for months.

The neural net correlated the behaviours of people going to the elevator, a problem complicated by the fact that the elevators were directly on the route to the building's restaurant and that the open space of the atrium was used for impromptu meetings. Over the months of watching, the neural net learnt to understand the intent of people in the atrium, building a model of their actions and comparing the model to their actual behaviour. Once the system had enough confidence in its intent model, it automatically switched over to controlling the lift.
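The pattern Lee describes, watching in "shadow mode" and only taking control once the model's predictions reliably match reality, can be sketched in a few lines. This is my own toy illustration, not MSR's system: the real thing used a neural net on camera footage, while here a hand-written score on simulated 1-D paths stands in for the learned intent model.

```python
import random

random.seed(0)

# A person's path is a list of distances to the elevator bank over time.
def intent_score(path):
    """Toy intent model: the fraction of steps that move the person
    closer to the elevator."""
    steps = list(zip(path, path[1:]))
    return sum(1 for a, b in steps if b < a) / max(len(steps), 1)

def predicts_elevator(path):
    return intent_score(path) > 0.5

# Shadow mode: before the model controls anything, its predictions are
# compared against what people actually did.
def shadow_accuracy(observations):
    hits = sum(predicts_elevator(p) == used for p, used in observations)
    return hits / len(observations)

# Synthetic footage: elevator-bound walkers mostly step toward it;
# people milling about the atrium mostly don't.
def walk(toward_elevator):
    pos, path = 10.0, [10.0]
    p_toward = 0.9 if toward_elevator else 0.3
    for _ in range(8):
        pos += -1.0 if random.random() < p_toward else 1.0
        path.append(pos)
    return path

observations = ([(walk(True), True) for _ in range(50)] +
                [(walk(False), False) for _ in range(50)])

# Only hand over control once the intent model has earned enough trust.
CONFIDENCE_THRESHOLD = 0.85
if shadow_accuracy(observations) >= CONFIDENCE_THRESHOLD:
    print("intent model trusted: switching to automatic door control")
```

The key design choice is that the confidence gate sits outside the model: the system earns authority by out-predicting a passive baseline, rather than being trusted on day one.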

What had evolved in the AI's neural net was something new, something that couldn't be developed manually: there were too many variables, too many paths individuals could take through the atrium. The system had needed to develop positional awareness, and to filter out undesired stimuli.

That's part of what's interesting about the neural nets that power Cortana's speech recognition, and the Bing translation tools — and now the Skype real-time translator. They're things we don't really understand, but that do exhibit behaviours that tell us something new about the way the world works.

Speech recognition and real-time transcription have been lab-grade technologies for some time. They work well in limited circumstances, but need engineering before they're released in the outside world — especially considering the wide range of contexts and environments they work in.

When Satya Nadella talks about not understanding just how the translation neural nets work, he's being accurate, but only in a limited way. MSR has been investigating what it calls "transfer learning" for some time now. With hyperscale systems and lots of English sources it's possible to get past what was known as the "over-fitting problem", where neural nets learned their training data too closely and then performed poorly on anything new.

With MSR's current generation of neural nets you can just keep training with more data, and the results get better. Where things get interesting is when the same neural net is trained on Chinese as well: not only does it learn Chinese, but its English performance improves. Training the same net on French means it learns French faster, and both English and Chinese recognition get better.

This is where transfer learning comes in: training the neural net for one language makes it easier to generate the net for another. The effect isn't just being found in MSR's AIs; other researchers are seeing the same thing. Lee notes that this consistent effect is a matter of the net's lower layers "discovering the structures of human language". It's a fascinating set of discoveries, and as Lee says: "This could have big implications for understanding human discourse. I can't overstate the excitement in the field."
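The "lower layers discovering structure" idea can be sketched with a minimal numpy model (my own illustration, not MSR's architecture): a shared trunk of weights feeds one small head per language, so a gradient step on any language moves the trunk that every other language relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedTrunkModel:
    """Toy transfer-learning sketch: shared lower layers ('trunk') plus
    a small output layer ('head') per language."""

    def __init__(self, n_in=16, n_hidden=8):
        self.trunk = rng.normal(0.0, 0.1, (n_in, n_hidden))  # shared weights
        self.n_hidden = n_hidden
        self.heads = {}  # one output layer per language

    def add_language(self, lang, n_out=4):
        self.heads[lang] = rng.normal(0.0, 0.1, (self.n_hidden, n_out))

    def forward(self, lang, x):
        h = np.tanh(x @ self.trunk)   # shared representation
        return h @ self.heads[lang]   # language-specific output

    def train_step(self, lang, x, y, lr=0.01):
        """One SGD step on a squared-error loss. The trunk gradient
        flows no matter which language we train on."""
        h = np.tanh(x @ self.trunk)
        err = h @ self.heads[lang] - y
        grad_head = h.T @ err
        grad_h = (err @ self.heads[lang].T) * (1.0 - h ** 2)
        self.heads[lang] -= lr * grad_head
        self.trunk -= lr * (x.T @ grad_h)

model = SharedTrunkModel()
model.add_language("en")
model.add_language("zh")

x, y = np.ones((1, 16)), np.zeros((1, 4))
trunk_before = model.trunk.copy()
model.train_step("zh", x, y)  # train only on Chinese...
# ...yet the shared trunk, which English also uses, has moved.
assert not np.allclose(trunk_before, model.trunk)
```

If the shared trunk converges toward language-general structure, adding a new language only requires fitting a small head, which is one plausible reading of why each extra language trains faster and helps the others.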

Has AI unravelled one of the longest-running debates in modern linguistics? In the 1950s, when Noam Chomsky suggested that all languages were based on common deep structures, it sparked a series of arguments that have run for much of the last half-century. But now, with neural nets like those that power Skype's translator, we're starting to see deep statistical linkages between related terms, terms that might only be linked by gender relationships.

The more data we have, the better those translations get. We're seeing that with Bing's automated translations of Twitter. As Lee says, some things, like humour, are hard. But the result is a treasure trove of data that's making our phones better assistants, and that's starting to break down the barriers of language.

This then is the value of MSR's blue-sky, large-scale research, and of the massive collection of data at the heart of Bing. It might take years to get results, but when it does it can do some rather world-changing things with the devices on our desks, and in our pockets. It's the engine that drives Nadella's ambient intelligence, and the future of Microsoft.

Topics: Enterprise Software, Microsoft

Simon Bisson

About Simon Bisson

Simon Bisson is a freelance technology journalist. He specialises in architecture and enterprise IT. He ran one of the UK's first national ISPs and moved to writing around the time of the collapse of the first dotcom boom. He still writes code.


  • First they have to catch up to IBM.

    And that will take quite a bit more work.
    • "What artificial intelligence means for the future of Microsoft"


      Sorry, I couldn't resist with a headline like that......

    • IB Who?

      Are they even still in business?

      I can't remember the last time I saw anything coming out of IBM R&D - not since some stuff about holographic storage.

      MS is a leader in the AI stuff, and the article even mentions that Cortana is just the start. With the size of the R&D budget (over 9 billion a year) it's actually no surprise MS has so much cool stuff in the pipeline.
      • Re: MS is a leader in the AI stuff

        Citation please!

        What has Microsoft led with in AI?

        I know IBM, Google and others have been very active in supporting events and competitions to further AI, but what has Microsoft done besides tag along?

        You have to admit that a driver-less car is quite a leap over where Microsoft is right now.
        • look under machine learning

          just off the top of my head, a random assortment:
          Xbox 360 and One speech recognition and 3d gesture recognition
          the Bing Satori entity platform
          Bing translation
          the forthcoming voice translation
          ClearFlow traffic prediction
          WindFlow wind speed prediction
          Power BI natural language Q&A
          etc etc etc
          • Since all of that is under a restricted license...

            It isn't very easy to verify the reports...

            Science requires verification... otherwise it could just be faked.
          • Yes it does.

            That's why people publish papers on their research. No one has ever been required to essentially give you the final product to verify the methods you are using. I regularly publish work in optimization and HPC stuff, but I don't give away my source code for you to evaluate. If you want to reproduce it, then put in the work. I'm happy to help answer questions and such, but no, you can't have the actual code.
          • Callous Fool you are

            The evidence is in the usage, of course. Given that you are a self-confessed loather of MS tech, you have only shown your own wilful blindness to this issue.
          • Calling any of that AI is a bit of a stretch

            those are largely media parsing systems, except for the semantic natural language technologies.

            The virtual assistant technologies (Siri, Cortana, Now, etc.) are basically giant if-else decision trees combined with audio vectorization. Don't get me wrong, they work really well and are getting better - but these are commercial products, and they don't need AI, a technology which is still in its experimental phase. None will pass a Turing test, nor do they attempt it.

            AI - at least interesting AI - is the stuff the robot competitions are doing, the stuff IBM is doing with agents. You won't find it in commercial vendors' products because it just isn't reliable enough to be viable there.
          • Hahaha

            If that's what you think that is, then you have no idea how these systems work. The core of Cortana is neural networks.
      • IBM built Watson


        IBM built Watson, the computer that played on the game show "Jeopardy" in the US, and did fairly well against the brainiacs. This is their definition:

        "Watson is a cognitive technology that processes information more like a human than a computer—by understanding natural language, generating hypotheses based on evidence and learning as it goes."

        IBM is still the go-to company for supercomputing.

        Ask Cortana or Siri to look it up for you.
  • I love my privacy

    Users' data means great business opportunities. So digital assistants mean business, but are they more good than bad for users?

    The lure of social networks was a first step. They know our lives, opinions, photographs, likes, relationships, etc.

    The lure of the cloud came next, so they also have our personal files.

    The next step is about personal assistants and their infinite advantages (like Siri, Google Now, Cortana and others). We have to give away more personal information and they have to know where we are and what we do 24 hours a day.

    The lure of wearables is also here. They measure and store our health state, biometric parameters, sport habits and achievements, sleeping hours and so on.

    There is also another step forward: the internet of things. It is not only we who are connected to the internet through computers and mobile devices, but our possessions too: door locks, fridges, thermostats, etc.

    Most of the technologies I have mentioned are quite useful, but have we asked ourselves about their implications for our lives? Do we ask ourselves what we are paying free social networks, clouds and digital assistants with?
    • They also have our personal files

      ...they had your files the day you installed an operating system on your computer and connected the computer to the Internet, even if only from time to time. At least, they had a gateway they could exploit anytime they chose to. Cloud makes it more convenient at best.

      You can try to limit the information you provide (for instance, not post certain things on Facebook), but it's a lot more complicated than the big categories.
    • Cortana personal info stays on the phone

      Interests and results sync to your other devices, but the personal information - home, quiet hours, inner circle, flights etc - is kept on the device, for privacy reasons. You can have the aggregate information without the personal information if you don't need to target ads back to the user...
      • No it doesn't.

        If it did, then it would be unable to understand the questions...
        • Obviously bits and pieces have to be either sent

          to be processed or somehow processed onboard. But Microsoft does seem to have come up with a good balance between what they store on you and what you allow to be stored locally. Nothing is perfect, but at least they allow the user some control.
  • AI translation

    The first company to be able to produce a voice controlled computer that actually does everything the PC does will rule the next generation of electronics.
    • I thought Kurzweil already did that...

      and some years ago at that.
  • The virtual assistant is already a major feature in two major platforms

    Cortana is a late entry to the virtual assistant market on the third-place phone ecosystem, and unlike the cool Halo-based character it comes from, the virtual assistant market is a poor place to look for Artificial Intelligence: this industry uses rather primitive AI techniques... semantic search and statefulness is about as sophisticated as it gets.

    Anyone interested in real AI should follow the real events in the industry such as the various challenge prizes, and the work being done in the graphics trades, such as image recognition and seam cutting.
    • I doubt the research is as limited as your vision

      for it seems to be. Cortana is/will be an early manifestation. But all this work is not being done for a phone assistant app alone.

      But, that said, even this "late entry" on the "third place phone ecosystem" appears to have taught Apple a thing or two, given the new talents Siri is supposed to acquire this fall that my Windows phone could do even before Cortana arrived.