Exclusive: Why Microsoft is betting its future on AI
Inside Satya Nadella’s plan to outsmart Google
Satya Nadella bounded into the conference room, eager to talk about intelligence. I was at Microsoft’s headquarters in Redmond, WA, and its CEO was touting the company's progress in building more intelligent apps and services. Each morning, he told me, he puts on a HoloLens, which enables him to look at a virtual, interactive calendar projected on a wall of his house. Nadella appeared giddy as he described it. The system was intelligent, productive, and futuristic: everything he hopes Microsoft will be under his leadership.
No matter where we work in the future, Nadella says, Microsoft will have a place in it. The company’s "conversation as a platform" offering, which it unveiled in March, represents a bet that chat-based interfaces will overtake apps as our primary way of using the internet: for finding information, for shopping, and for accessing a range of services. And apps will become smarter thanks to "cognitive APIs," made available by Microsoft, that let them understand faces, emotions, and other information contained in photos and videos.
Microsoft argues that it has the best "brain," built on nearly two decades of advancements in machine learning and natural language processing, for delivering a future powered by artificial intelligence. It has a head start in building bots that resonate with users emotionally, thanks to an early experiment in China. And among the giants, Microsoft was first to release a true platform for text-based chat interfaces — a point of pride at a company that was mostly sidelined during the rise of smartphones.
In January, The Verge described the tech industry's search for the killer bot. In the months since, companies big and small have accelerated their development efforts. Facebook opened up a bot development platform of its own, running on its popular Messenger chat app. Google announced a new intelligent assistant running inside Allo, a forthcoming messenger app, and Home, its Amazon Echo competitor. Meanwhile the Echo, whose voice-based inputs have captivated developers, is reportedly in 3 million homes, and has added 1,200 "skills" through its API.
Microsoft is proud of its work on AI, and eager to convey the sense that this time around, it's poised to win. In June, it invited me to its campus to interview some of Nadella's top lieutenants, who are building AI into every corner of the company's business. Over the next two days, Microsoft showed me a wide range of applications for its advancements in natural language processing and machine learning.
The company, as ever, talks a big game. Microsoft's historical instincts about where technology is going have been spot-on. But the company has a record of dropping the ball when it comes to acting on that instinct. It saw the promise in smartphones and tablets, for example, long before its peers. But Apple and Google beat Microsoft anyway. The question looming over the company's efforts around AI is simple:
Why should it be different this time?
Microsoft has already had more success building bots than perhaps any other US company. But you probably aren’t aware of it, because its success started in China.
In January 2016, one of Microsoft's artificial intelligence creations appeared on the Chinese morning news show Dragon TV when the newscaster cut away to its weather forecaster, Xiaoice. Pronounced "SHAO-ICE," it’s a bot whose name is Chinese for "little Bing." That's Bing as in Microsoft's perennial also-ran search engine. But this version of Bing is way more talkative.
The camera cut to an animated circle hovering in front of a virtual podium. The face transformed into an image of a microphone, and in a soft female voice, Xiaoice shared her forecast, even answering a question from the anchor.
If you want to know why Microsoft has become so bullish on bots, Xiaoice is a big part of the answer. "I’m not going to go so far as to say we’ve found the killer bot — but we’ve found a bot that works in a new way that fulfills many of the promises of conversation," says Derrick Connell, head of search engineering at Bing.
Xiaoice, which Microsoft introduced on the Chinese messaging app WeChat in 2014, can answer simple questions, just like Microsoft's virtual assistant Cortana. Where Xiaoice excels, though, is in conversation. The bot is programmed to be sensitive to emotions, and to remember your previous chats. Going through a breakup? Xiaoice may check in to ask you how you're doing.
After it was available for three days, Xiaoice had been added to 1.5 million conversations on the Chinese mega-messenger app WeChat. It was later made available on the Chinese micro-blogging service Weibo, where it became one of the most popular celebrity accounts to follow. Today the bot has been used by more than 40 million people, and the average conversation takes an impressive 26 turns between speaker and bot.
For Connell, Xiaoice points the way toward the next generation of search. Web queries traditionally returned a page with 10 blue hyperlinked results; the perfect conversational bot will simply return the correct answer.
Of course, success in China may not translate to the United States. (Microsoft’s first English-language bot experiment, Tay, was a fiasco.) Two years after Xiaoice's debut, there's still no English-language equivalent, and none is imminent. But Microsoft executives say the infrastructure behind Xiaoice represents a significant opportunity for the company.
"It's the modern era — you don't have to be an expert in speech and language understanding," Connell says. "Just use our tools. Go build your branded bot with our tools and put it on whatever canvas — it might be Slack, it might be Facebook Messenger. We hope it might be Skype or Windows. But you choose."
And with fears mounting among developers that a war could emerge over bot standards, Microsoft has been uncharacteristically diplomatic. It organized a conference in San Francisco in June to promote cooperation among bot-makers. "We're really interested in it being interoperable — we want it to be an ecosystem," says Lili Cheng, a senior engineer at Microsoft who helped organize the two-day event. (It was called Botness.) "It's more like, what are the problems and challenges that we are finding that we can work on together?"
But by taking the lead with events like Botness, Microsoft hopes to position itself at the center of the shift to bots. If the company succeeds, it will have a fresh start in the mobile era. Bots powered by the company's technology could show up inside each of the world's most popular messaging apps, giving Microsoft a lucrative foothold in the new world.
Of course, Microsoft isn't alone in trying to build the defining platform for the next generation of computing — if conversation even turns out to be that platform. Every major tech company and a host of startups are building AI divisions, often with impressive results. But here it's worth saying that comparing AI across companies is difficult to the point of being impossible. Much of what companies like Google, Facebook, and Amazon are working on remains unreleased. And executives are often opaque when asked what distinguishes their AI — Google CEO Sundar Pichai, for example, has taken to simply saying that the company has been working on it "for a very long time."
Benedict Evans, the resident futurist at venture capital firm Andreessen Horowitz, said in a recent blog post that the future of AI remains opaque. "This field is moving so fast that it's not easy to say where the strongest leads necessarily are, nor to work out which things will be commodities and which will be strong points of difference," he wrote. "Though most of the primary computer science around these techniques is being published and open-sourced, the implementation is not trivial — these techniques are not necessarily commodities, yet."
Qi Lu is happy to make the case for Microsoft’s competitive advantage. Lu is one of the dozen people on Nadella’s senior leadership team, overseeing the company’s applications and service groups. He’s also a computer science PhD with 20 patents to his name, and is revered among the colleagues of his I speak to. After a few minutes, I start to understand why — he’s ferociously intelligent, tapping his feet impatiently as he talks, as if frustrated he can’t speak as quickly as he thinks. When we meet he is wearing socks with sandals, cargo shorts, and a T-shirt emblazoned with three words: "Make epic shit."
Lu begins by running down the disadvantages presented by the first wave of the mobile internet. The percentage of web traffic from mobile devices has never exceeded desktop traffic, he says, reflecting users’ frustration with the experience. "We know web doesn’t really work on the phone," Lu says. And outside a handful of major categories, users are resistant to downloading apps. Seattle residents might be asked to download an app just to check the fare of a ferry they take a couple times a year — surely there’s a better model. "Our industry hasn’t found an experience platform that can unleash the entire value of mobile and the cloud," Lu says. "Apps, fundamentally, are not the right model."
Apps arose as an interface in lieu of the HTML-based web because they were the best we could do at the time. You couldn’t just yell what you wanted from the internet into your phone, so developers built sophisticated hidden plumbing and let you interact with it via big graphical buttons. And buttons remain the most efficient path for getting lots of things done. But thanks to advancements in natural language processing, now you actually can just yell what you want from the internet into your phone. Lu says the next-generation "experience platform" is going to start there, with conversation. It fits more naturally with how humans behave anyway. And if you get it right, you can always start layering those big shiny buttons back on later. "We see a full spectrum of using language as the baseline, but using graphical interactions in a thoughtful, meaningful way, to elevate the experience," he says.
But to win, Lu says, a company needs five "key assets." The first is a "conversation canvas" — a place where people are doing lots of talking and texting. Microsoft has Office, Outlook, Skype, and Cortana. The second is that AI "brain" — a sophisticated mental model of the world. Microsoft says its own AI efforts date back nearly 20 years. The third is access to a social graph — people’s activity on the internet often involves their friends and coworkers. Not coincidentally, a few days after I met Lu, Microsoft announced it would spend $26.2 billion to acquire LinkedIn, and its 433 million registered users.
The fourth piece is a platform for the artificial intelligence to operate on. Microsoft has Windows and a family of devices, notably the Xbox. The final piece is a network of developers eager to build on your platform, and to pay you for the privilege. Stoking that interest had been the primary goal of the Microsoft Build developer conference in March.
Individually, Microsoft’s assets have strong rivals. Facebook arguably has a stronger conversational canvas with its family of messaging apps, for example; certainly it has the largest social graph. Google’s "brain" might be smarter, and it has broad access to hundreds of millions of Android devices. But piece it all together and you can see why Microsoft is feeling so optimistic. "Adding all those assets," Lu says, "I believe we have what it takes to lead the future."
Microsoft’s total embrace of AI became apparent two years ago at the inaugural Code Conference. (The conference was acquired the next year by Vox Media, which owns The Verge.) Nadella, who had become CEO just three months before, appeared on stage to discuss Microsoft's future. At the end of his talk, he demonstrated a new feature inside Skype. Two Microsoft employees spoke on stage — one in English, the other in German — and Skype translated their speech in real time, allowing them to communicate despite the language barrier. It was an impressive demo — and Nadella announced that by the end of the year it would be a working product.
To the Skype team back in Redmond, Nadella's timeline landed like a bombshell. "It was a complete surprise to me," says Peter Lee, a corporate vice president at Microsoft Research. "Satya really put us in jail with this Skype Translator thing." Initially, the team had two major concerns. One was that Microsoft Research historically has not been tasked with bringing products to market, and researchers worried they would suddenly have less freedom to pursue scientific breakthroughs.
The other concern was that at the time of the demo, Skype Translator wasn't very good. The company's language models had been built using a large body of formal speeches — testimony from the United Nations, for example. But two-way communication of the sort that Skype needs to translate is much different. There are more "disfluencies" — moments when the speaker trips over a word, or backs up to start a sentence over again. There's "code mixing" — when speakers use multiple languages in a single sentence, which is very common outside of English. Then there was the singing — apparently people are constantly singing to each other, and it turns out that computers have a very difficult time parsing it.
"Basically, nothing worked," Lee says. "What we had to do is re-train all our models." But Lee's team rallied, cheered on by Nadella, and released a preview that December. The product became widely (and freely) available the next year. Lee, who approvingly calls Nadella "an activist," says the project was exhilarating — eventually. "Imagine the dips in morale and fear when you realize none of this stuff is gonna work — you have to somehow get people past that," Lee says. "And when you do, you see amazing new things appearing."
This doesn't feel like hyperbole. Microsoft can now translate conversations between eight different languages — 56 different combinations. And the underlying technology has implications that go beyond translation. You want to hear about a bot that's incredibly, even magically useful? Microsoft is beta-testing software that records business meetings and produces transcripts in real time. The same software can also, say, take an audio recording of an interview between two people and produce a transcript that distinguishes between the speakers — perhaps the single most desired piece of technology for any journalist who ever lived.
"I can’t tell you how dismaying it was when we found out stuff wasn’t working well for Skype Translator when we first embarked on that," Lee says. "But now that we’re climbing that mountain, we’re in possession of these speech and translation models, especially the speech models — they’re shockingly good."
In the meantime, Microsoft is pouring AI resources into some of its biggest franchises: Windows and Office. One of the promises of AI is that it can anticipate your needs — it’s the foundational idea of Google Now, which presents you with traffic, weather, and sports scores the moment you unlock your phone.
Microsoft is working to build this kind of AI into the desktop. Marcus Ash, who oversees the development of Cortana, showed me a mocked-up version of Windows that draws heavily on cloud-based inferences about what I might want to know. When Ash accesses the Start menu, Cortana appears with a series of suggested actions: names that are meaningful to you, documents you’ve used recently, and suggested translations for common French words. (The user here has an upcoming trip.) With your permission, Cortana incorporates data about your contacts, web search history, and app usage into its recommendations.
And it changes based on the time of day — app developers can signal that they’re useful in the morning, or around dinner time, for example. "This idea of using conversation, using contextual information about you, with your permission, to make you speedier and make you feel like you’re in control, that’s the stuff we get really excited about," Ash says. "A lot of our user experience work is around simplification, removing friction, and really showing the power of intelligence."
One of Ash’s favorite examples is called "commitments." With your permission, Outlook can take note of the fact that your boss asked you to send her something by the end of the week — and automatically remind you if you fail to respond. "My life is pretty complicated, and I tend to forget things — especially in emails," Ash says. Recently he forgot to respond to a request from his own boss, he says, but Cortana notified him in time to address it.
I see more of this kind of thing when I meet with Kirk Koenigsbauer, corporate vice president of marketing for Office. He shows me a range of ways in which intelligence is making Office easier to use. In September 2014 Microsoft introduced Delve, a kind of Fitbit for productivity that is included with Office 365. The app analyzes how much time you spend in email and in meetings, and highlights stretches on your calendar when you have extended time to do more complicated, meaningful work. It tells you what percentage of the people you sent an email to actually read it, and how quickly. It will suggest reaching out to colleagues you haven’t emailed in a while. It even shows you response times for your colleagues, and for yourself.
If your organization lives in Google Apps, as do many big Silicon Valley companies, browsing Delve feels like a revelation. You don’t have to be a numbers nerd to find this kind of information useful. If you’re a manager, Delve can tell you at a glance how much time you’ve spent with each of your employees over the past week. This kind of intelligence isn’t as sexy as a general AI that anticipates your every need — but it’s here today, it works, and it makes Google Apps look like a neglected backwater by comparison.
After six months of searching for a killer bot, I'm still bullish on the concept generally. The interactions they enable are vastly richer than the 1-800 numbers and forgotten small-business websites they will eventually replace. But I've been disappointed by much of what we’ve seen on platforms like Facebook Messenger and Telegram: at times they have felt like the slowest way to use the internet. Most seem barely more functional than SmarterChild-era bots on AOL Instant Messenger, and all the typing they require sends me screaming back to button-based graphical interfaces. For now the discussion around bots and AI remains driven by the industry's desire for a profitable new platform, rather than consumer demand for the services they provide.
Companies' response to that problem so far has been, essentially, that they're working on it. "Like many of these advanced technologies, people assume it’s all here today," said Mike Schroepfer, chief technology officer at Facebook, when I asked him about it in May. "And there's a lot more technology and work to be developed. I think this will improve month over month, year over year."
And yet visiting Microsoft made me wonder if I hadn’t been thinking about the subject in the wrong way. Chat-based interfaces are generally tedious. But the machine learning that powers them, applied to tools you’re already using, is really quite powerful. If Microsoft can infuse Delve-like intelligence into a wider range of services, it can reasonably say that it offers the most powerful productivity suite in the world. There will be tremendous value in that even if its vision of a massive platform for powering chatbots never materializes. And it may not — at this early stage, bots in the foreground too often feel frustratingly dumb. But when they do their work in the background, they can feel a little bit like magic.
Illustrations by Pete Ryan. Edited by Dieter Bohn.