For decades, visions of the future have played with the magical possibilities of computers: they'll know where you are, what you want, and can access all the world's information with a simple voice prompt. That vision hasn't come to pass yet, but features like Apple's Siri and Google Now offer a keyhole peek into a near-future reality where your phone is more "personal assistant" than "bar bet settler." The difference is that the former actually understands what you need, while the latter is a blunt search instrument.
Google Now is one more baby step in that direction. Introduced this past June with Android 4.1 "Jelly Bean," it's designed to ambiently give you information you might need before you ask for it. To pull off that ambitious goal, Google takes advantage of multiple parts of the company: comprehensive search results, robust speech recognition, and most of all Google's surprisingly deep understanding of who you are and what you want to know.
With Android 4.2, launching alongside the Nexus 4 and Nexus 10 on November 13th, Google has updated the feature with new information cards in new categories. And yet, the amount of engineering effort that makes Google Now possible is out of proportion to what it does — it's a massive, cross-company effort for what seems like a relatively small product. That difference is a clue. Google Now isn't important for what it does, well, "now," but the building blocks are there for a radically different kind of platform in the future.
We sat down with the teams responsible for some of the technology that went into Google Now to find out what makes it tick today and discover some hints about what it could be in the future.
A deeper understanding
You may not be familiar with Google Now, primarily because it's only available on the sliver of Android devices running Jelly Bean (and up) — a situation that sadly won't change with the latest version. It's essentially an app that combines two important functions: voice search and "cards" that bubble up relevant information on a contextual basis.
Technically, "Google Now" refers only to the ambient-information half of that equation, a branding kerfuffle that distinguishes it from Apple's Siri yet still causes confusion. Those cards might contain local restaurants, the traffic on your commute home, or when your flight is about to take off. They appear automatically as Google tries to guess the information you'll need at any given moment.
While it seems like a relatively simple service, it's only really possible because of the massive amount of computational power Google can leverage alongside the massive amount of data Google knows about you thanks to your searches. It's "precisely what Google is best at," Android's director of product management, Hugo Barra, tells us. "It really feels like we’ve been working on Google Now for the past ten years. Because Google Now touches every back-end of Google, every different web service that’s been developed over the last ten years or so is part of this service."
The breadth of that backend and the simple cards it enables is what makes Google Now so intriguing as a product. One of Barra’s favorite examples is a voice search for something that pulls from all those multiple sources and turns it into a comprehensible and useful result. Searching for “Directions to the museum with the William Paley exhibition” causes Google to 1) find that exhibition, 2) understand you care about the museum where it is being shown, 3) know your location, and finally 4) present you with a simple map card to the museum itself along with a button to immediately get directions.
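The four steps above can be sketched as a simple pipeline. This is a toy illustration, not Google's architecture: each dictionary below is an invented stand-in for an entire Google backend (the search index, a places database, location services), and the data in it is made up for the example.

```python
# Invented stand-ins for Google backends; each dict is a placeholder
# for a real service (search index, places database, location).
EXHIBITIONS = {"william paley": "Museum of Modern Art"}
ADDRESSES = {"Museum of Modern Art": "11 W 53rd St, New York"}

def directions_query(spoken_query, user_location):
    """Turn a spoken request into the data needed for a map card."""
    # 1) Find the exhibition mentioned in the query.
    exhibition = next((name for name in EXHIBITIONS
                       if name in spoken_query.lower()), None)
    if exhibition is None:
        return None  # fall back to an ordinary web search
    # 2) Resolve the exhibition to the museum showing it.
    museum = EXHIBITIONS[exhibition]
    # 3) Combine with the user's current location, and
    # 4) return what a simple map card with a directions button needs.
    return {
        "museum": museum,
        "address": ADDRESSES[museum],
        "from": user_location,
    }
```

The point of the sketch is that no single service answers the question; the answer only exists once several independent lookups are chained together.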
Taking all of that complex data and turning it into a relatively simple and useful interface is a gargantuan undertaking, but Google has started with a somewhat small set of categories for the types of cards it shows. With Jelly Bean, you'd see calendar alerts, weather, flight times, sports scores, transit directions, local restaurants, and a few more categories of information.
Even within that limited set of data, Google has to make choices about which cards to show you and when. It uses a few different signals — location, time, and all of your recent searches figuring prominently among them — to decide what to show you in any given moment. "It’s essentially a ranking problem, and it’s a very complicated one," according to Barra, but Google has perhaps more experience at solving ranking problems than any other company after years of delivering search results.
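To make the "ranking problem" concrete, here is a minimal sketch of what scoring cards against contextual signals might look like. Everything here is assumed: the signal names, the weights, and the one-dimensional notion of "location" are invented for illustration, and Google's actual signals and model are not public.

```python
# Hypothetical card-ranking sketch. Signal names, weights, and the
# 1-D "location" are invented; Google's real ranking is not public.

def score_card(card, context):
    """Combine a few contextual signals into one relevance score."""
    score = 0.0
    # Proximity: cards tied to a nearby place score higher.
    if card.get("location") is not None:
        distance_km = abs(card["location"] - context["location"])
        score += max(0.0, 1.0 - distance_km / 10.0)
    # Timeliness: events starting within the next few hours score higher.
    if card.get("event_time") is not None:
        hours_away = card["event_time"] - context["time"]
        if 0 <= hours_away <= 3:
            score += 1.0 - hours_away / 3.0
    # Interest: boost topics that showed up in recent searches.
    if card.get("topic") in context["recent_searches"]:
        score += 0.5
    return score

def rank_cards(cards, context, limit=3):
    """Return the most relevant cards for this moment, best first."""
    return sorted(cards, key=lambda c: score_card(c, context),
                  reverse=True)[:limit]
```

Even this toy version shows why the problem is hard: the same card can be the right answer at 5PM on a weekday and noise on a Sunday morning, so the signals matter far more than the card inventory.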
In my experience, Google is able to get you the "right" information you want a relatively small percentage of the time, but that low hit rate doesn't actually hurt the experience all that much. That's mainly thanks to the fairly small number of categories cards fall into, but also to the fact that when Google Now gets it right, it really feels magical. The sort of thing you might manually search for — like your commute time home — is simply waiting for you.
With the latest update, Google is expanding Now into new categories, increasing the different kinds of information it's able to provide. The new additions aren't radically ambitious, but that's in keeping with the overall feel of Google Now. What it shows you is more about serendipitous information than structured data.
The first category involves Gmail integration. With your permission, Google will keep an eye on your inbox and recognize flight confirmations, hotel reservations, restaurant bookings, event tickets, and package tracking emails. It will take that knowledge and give you a relevant card when appropriate — say, giving you your hotel information when you land in the right city or letting you know when it's time to leave for a concert.
The new features are part of Google’s growing efforts to provide relevant results based on the knowledge it’s accumulated about you. As search gets better, so do people’s expectations for what it provides. “Of course Google’s going to access more than just the public information on the web,” Scott Huffman, Engineering Director for Search Quality at Google tells us, “Google’s going to know when my flight is, whether my package has gotten here yet and where my wife is and how long it’s going to take her to get home this afternoon. [...] Of course, Google knows that stuff.” If you’re willing to opt in to letting Google know so much about you — and increasingly, opting in is the default — then Google wants to return the favor by using that information to your benefit. It requires you to trust Google quite a bit, but the company hopes that your trust will be rewarded.
"Google’s going to know when my flight is, whether my package has gotten here yet and where my wife is."
These new cards are actually similar to a feature that Google added to its web search results this past August, both in content and in style. That's probably not an accident — if you assume Google has already won the battle for search, the next battle is giving you information before you even search for it. When it comes to deciding which data to give you, Barra tells us that Google has "a pipeline [...], possibly in the hundreds of cards” from its many engineering teams. Rather than flood users with all of those new cards, Google is taking a slow and steady approach to adding those new features — if only because right now it can only add those cards with a software update.
Some of the other new categories of cards are relatively minor additions: stocks, news, local concerts, movies, and local attractions. It also has a basic exercise tracking card that utilizes the phone’s accelerometer and location data: every month it will let you know how far you've walked or biked and how that compares to the previous month. Another new card lets you know that you're near a "photo opportunity," as Product Management Director Baris Gultekin told us. It uses data from Google's Panoramio service, noting when you're close to a place that has a "high density of pictures taken at a spot." You can see photos that were taken at the landmark and, Google hopes, take one yourself.
Just as Google Now's ambient information is backed by a massive and unseen engineering effort, Google's voice search is a simple feature that belies the effort behind it. Huffman points out that getting voice search right actually involves more than just turning spoken words into textual queries: "speech recognition, natural language understanding, and understanding entities and knowledge in the world [all] really have to come together."
Voice search is the sort of feature that we take for granted on smartphones — Apple’s Siri and even Windows Phone both use the feature to offer up search results that go beyond basic web searches. What used to be a "hey neat" kind of feature is increasingly becoming an expected one, and Google is well aware of that. "As you make search better, people’s expectations go up," Huffman says. To meet those expectations, Google is attacking all three of the areas Huffman delineated in equal measure.
Speech recognition is a very difficult problem to solve, as anybody who has dealt with voice search knows all too well. Recently, Google has changed its approach to making it work in a fundamental way, replacing a system that was the result of years of effort with a new framework for understanding the spoken word. Google has shifted to using a neural network that's much more effective at understanding speech.
A neural network is a computer system that behaves a bit like the actual neurons in your brain do. Essentially, the computer is designed with layers of software-based "neurons" that do the same thing actual neurons do: take input in and "fire" off to other neurons based on the data they receive. Over the summer, the results of research on neural networks led by Google Fellow Jeff Dean made some waves: Google had taught a computer to recognize cats in videos. The interesting part is that the neural network essentially created the concept of "cat" on its own without direct human intervention.
Here's how it works: The first layer of neurons looks for very simple things, like angled lines or colors. If it sees something that matches, it fires off a signal. There's then a second layer of neurons, which simply pays attention to sets of neurons firing from the first layer. As you add in more and more layers with the same behavior, you essentially add in layers of conceptual abstraction until, at the very top layer, there's a neuron that has trained itself to recognize cats 15.8 percent of the time.
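The layered firing described above can be sketched in a few lines of code. This is a toy network with hand-picked weights rather than learned ones (real networks learn their weights from data, and Google's have vastly more layers and neurons), but it shows the mechanics: each layer fires based only on the outputs of the layer below it.

```python
import math

def sigmoid(x):
    """Squash a neuron's input into a 0-to-1 'firing strength'."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each neuron fires on a weighted sum of the layer below."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(pixels):
    """Toy 4-pixel input; all weights here are hand-picked, not learned."""
    # Layer 1: two low-level "feature detector" neurons on the raw input.
    h1 = layer(pixels, [[6, -6, 0, 0], [0, 0, -6, 6]], [-3, -3])
    # Layer 2: a more abstract neuron that fires only when both
    # lower-level detectors fire together.
    out = layer(h1, [[8, 8]], [-12])
    return out[0]
```

Stacking more layers of the same operation is what produces the increasing abstraction the article describes; the "cat neuron" sits at the top of a much deeper stack of exactly this kind of step.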
Of course, that doesn't mean that the computer "understands" cats in a conscious way, but the effect of it being able to recognize something like a cat without direct human training is what's important. "With a lot of other machine learning techniques," Dean explains, "you often have to do a lot of work to hand-engineer exactly the right features [...] Whereas with a neural network you can feed in much rawer forms of data."
Google's research scientists took this method and applied it directly to speech recognition, researcher Vincent Vanhoucke told us. "We picked up the kind of work that Jeff’s team was doing and just changed the input of the system." Google used the neural network at a very basic level of speech recognition: understanding and interpreting the basic sounds of speech, known as phonemes.
The approach "led to about between 20 to 25 percent reduction in the error rate in our system," according to Vanhoucke. The neural network turned out to be exceptionally good at solving what used to be very thorny problems in speech recognition. Accounting for "different environments, [...] different accents, different tones of voice, different pitches, different background noise, different microphones, [...], people talking in the background, different audio conditions" became much easier because the network was able to automatically learn how to account for each situation.
Just understanding the words you've spoken isn't enough, obviously. Just as a neural network trades in increasing layers of abstraction, Google itself needs to move beyond basic web queries. In a very real way, Google is trying to get its computers to actually understand what it is you're asking them. Part of that comes from a relatively new initiative called the "Knowledge Graph," the company's effort to compile a database of "entities" in the world.
Today, Google's servers are aware of 500 million such entities, and "knowing" those things means that the company is able to act on them in interesting ways. For example, if you search for “Tom Cruise,” Google knows you’re referring to a person instead of a vacation and can then tell you specific facts about him instead of simply crawling the web for related words. In truth, Google only knows those details because it is so adept at crawling the web — but the additional layer of abstraction created by putting that information into the structured Knowledge Graph means that Google can do more with search results. It "allows voice search, in some sense, [to] give me something to talk about," says Huffman. In the Tom Cruise example, returning an “entity” instead of just a search result means you can contextually ask for more information, like “what movies has he been in?” or “how tall is he, really?”
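The difference between a page of links and an entity can be sketched with a miniature, invented knowledge store. Real Knowledge Graph entities have stable IDs and thousands of structured attributes; this toy keeps just enough to show why holding on to the last entity returned makes follow-up questions like "what movies has he been in?" answerable.

```python
# A miniature, invented "knowledge graph": the structure and data here
# are illustrative only, not the Knowledge Graph's actual schema.
ENTITIES = {
    "tom cruise": {
        "type": "person",
        "height_m": 1.70,
        "films": ["Top Gun", "Jerry Maguire", "Mission: Impossible"],
    },
}

class Conversation:
    """Track the last entity returned so follow-ups can refer to 'he'."""
    def __init__(self):
        self.focus = None

    def search(self, query):
        entity = ENTITIES.get(query.lower())
        if entity is not None:
            self.focus = entity  # remember it for follow-up questions
            return entity
        return None              # fall back to a plain web search

    def follow_up(self, question):
        """Crude keyword matching stands in for real language parsing."""
        if self.focus is None:
            return None
        if "movies" in question or "films" in question:
            return self.focus["films"]
        if "tall" in question:
            return self.focus["height_m"]
        return None
```

A bag of web links has nothing to remember between queries; an entity does, and that memory is what turns search into something closer to conversation.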
Having something to talk about and talking to somebody are two different things, and with regard to the latter Google is again taking a Google-esque approach. As opposed to Apple's Siri, which you could say has a distinct personality, Huffman says that Google has "shied away from the idea of kind of a human persona for search or for the entity that you’re interacting with and instead tried to go for, in some sense, ‘hey, you’re interacting with all of Google.’"
All the different parts of Google are finally working together
The Google that you're interacting with in Google Now is very different from the Google you used even a year ago. The company's products have often felt fragmented, serving small niches and launched without feeling fully thought-through — and then in too many cases simply killed off. That may have been a function of the fact that Google is so large and does so much — but Google Now is a sign that all the different parts of Google are finally working together in a cohesive way.
"Google Now actually started as a twenty percent project," Barra told us. Google famously encourages its employees to work on "side projects" for some portion of their time, and what's interesting about Google Now is that although it started two years ago as one of these side projects, it's become a catalyst for integrating so many different parts of Google. Barra tells us that “we literally have dozens of teams working with us right now,” and the achievement with Google Now is that it feels like those teams are integrated, not fragmented.
In a single app, the company has combined its latest technologies: voice search that understands speech like a human brain, knowledge of real-world entities, a (somewhat creepy) understanding of who and where you are, and most of all its expertise at ranking information. Google has taken all of that and turned it into an interesting and sometimes useful feature, but if you look closely you can see that it's more than just a feature: it's a beta test for the future.