"When Kirk asks the computer some complex question and it answers him intelligently, drawing from a bunch of different sources, that's the vision," Google spokesperson Jason Freidenfelds told me last week. He then introduced me to John Giannandrea, Director of Engineering at Google, and the man tasked with making the Star Trek computer come to life. Giannandrea and his team recently launched Knowledge Graph, an informational meta-sidebar you may've noticed in Google Search. Depending on whether you search for a person, place, or thing, Google now delivers contextual results that often provide the answers you're looking for, no clicking involved. Knowledge Graph is a database of "common sense" factual data tied together by 500 million entities like Leonard Nimoy, Rockefeller Center, and even our own Joshua Topolsky.
An astounding 3.5 billion attributes tie all these things together, so Google can know that Mount Kilimanjaro is a mountain and the Siberian Husky is a breed of dog. In a way, by taxonomizing the internet, Google hopes to give you answers without asking you to click through to other pages. Giannandrea and company don't think it's far off to imagine information boxes popping up over landmarks while you're wearing Google Glass. But that's just a dream, for now.
Cataloging the world
"JG has been chewing on the problem of cataloging the world for 15 years," Freidenfelds said. Giannandrea's search curiosity ignited while he was CTO of Netscape, where he worked on "related browsing," a novel idea that purported to surface relevant webpages for users. "We thought it was weird that you couldn't go sideways on the web; you couldn't find all of the related things," he said.
Interest piqued, Giannandrea moved to Metaweb, where he helped build Freebase.com, "the largest database of e-knowledge in the world," he boasts. Freebase is a collaborative database built to represent relationships between things – something the database tool MySQL wasn't capable of at the time. When Google acquired Metaweb in mid-2010 to kickstart its own ideas about cataloging the world, Freebase contained 12 million "entities," everything from people to places to movies and books. At that time, Wikipedia only contained three to four million pieces of information, Giannandrea said. Freebase is still live, and now contains over 24 million entities, but it has taken on a new role as the backbone of Google's Knowledge Graph initiative, which launched in mid-June.
Knowledge Graph attempts to answer search queries like "Leonard Nimoy" with an image, life blurb, birthday, height, notable works, and related people. The data comes from dozens of places like Wikipedia, Freebase, animal taxonomy websites, the FDA, and other open sources of information. "We want to have every highly regarded data source in our database. Unlike Wolfram Alpha, our system is a database of common knowledge and not computational knowledge," Giannandrea said.
While Knowledge Graph might on the surface seem to merely generate a Wikipedia widget inside Google, the real depth is the way each entity is connected. Tons of user data helps Giannandrea and company figure out what you're trying to find, even if you don't know it. "You might be interested in Albert Einstein because of his work in physics, or because of his peace activism – we sometimes have to put Einstein in the same bucket as Gandhi," he said. "We're not trying to tell you what's important about Einstein – we're trying to tell you about what humanity is looking for when they search," he added. For example, Google knows that when you search for an actor, it's likely that you're either searching for a recent film of theirs or a classic film of theirs. Hence, searching for Leonard Nimoy surfaces Star Trek II: The Wrath of Khan (1982) as well as the TV show Fringe (2008-current) in the Knowledge Graph sidebar box.
Google also uses user data to determine how people are related in interesting ways. For example, searching for Robert De Niro surfaces movies he's been in, but also "Related People" like Al Pacino, Joe Pesci, Martin Scorsese, and Jack Nicholson. Giannandrea hopes these "related" modules will help users with tip-of-the-tongue searches where they can't remember somebody's name. Perhaps somebody is searching for Sylvia Plath, but can't remember anything besides the fact that Plath reminds them of Virginia Woolf. Searching for Virginia Woolf surfaces Plath as a "related person," even though the two have few actual facts in common. Google updates its database every day using feedback cycles from users.
The problem with names
Building a 500-million-entity catalog filled with an entire planet of related entities isn't easy. "No entity is duplicated in our catalog," Giannandrea said. "Barack Obama is the same entity – book author, senator, president – that turns into a very hard reconciliation problem." But what about all the notable John Smiths that have ever lived? "It's mostly algorithmic for trying to detect who's who," he said. Two John Smiths share far less statistical data than you'd think: often all they have in common is their name, and to a computer, that's just one data point. "It turns out that names are fairly unambiguating," Giannandrea said. "People name people after names and name companies after nebulas, so names themselves are not enough."
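For the technically curious, here is a minimal Python sketch of the reconciliation idea described above: a shared name counts as only one data point, so two records are treated as the same entity only when other attributes also agree. This is not Google's algorithm. The `Record` shape, the `same_entity` function, the `min_shared` threshold, and the attribute names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One mention of a person from a single source (hypothetical shape)."""
    name: str
    attributes: dict = field(default_factory=dict)


def shared_evidence(a: Record, b: Record) -> int:
    """Count the attributes both records have with the same value (name excluded)."""
    return sum(
        1 for key, value in a.attributes.items()
        if b.attributes.get(key) == value
    )


def same_entity(a: Record, b: Record, min_shared: int = 2) -> bool:
    """Treat two records as one entity only if the name AND other facts match.

    min_shared is an arbitrary illustrative threshold, not a known value.
    """
    if a.name != b.name:
        return False
    return shared_evidence(a, b) >= min_shared


# Same person in different roles: birth year and birthplace agree.
author = Record("Barack Obama", {"born": 1961, "birthplace": "Honolulu", "role": "author"})
president = Record("Barack Obama", {"born": 1961, "birthplace": "Honolulu", "role": "president"})

# Two different John Smiths: only the name matches.
explorer = Record("John Smith", {"born": 1580, "role": "explorer"})
politician = Record("John Smith", {"born": 1938, "role": "politician"})

print(same_entity(author, president))     # the two Obama records merge
print(same_entity(explorer, politician))  # the two John Smiths stay separate
```

A real system would presumably weigh attributes differently and tolerate conflicting or missing data; the point here is only that the name alone never triggers a merge.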
There are two million notable people in Knowledge Graph's catalog today, which seems like a lot to handle, but what's even tougher to handle is indexing every notable book, movie, and album ever produced. "If a book was sent to the Library of Congress, that's a reason for inclusion," Giannandrea said. "The goal is to include anything that's notable," he said. So how does the Knowledge Graph database stay up to date? "Freshness is something we care a lot about," Giannandrea said. "We update our database every day, so when something happens, we make sure we're on top of it."
"We are becoming symbiotic with our computer tools, growing into interconnected systems that remember less by knowing information than by knowing where the information can be found," a 2011 study by Harvard University concluded. I cited the study and asked Giannandrea if Google is considering the ramifications, good or bad, of making so much information so easily accessible. It turns out that he doesn't buy into the study at all. "I don't think our human memories are atrophying," he said. "Search in general makes us all a few IQ points smarter. We all end up knowing more stuff. You might know a few things about Leonard Nimoy, but now you might know a few more things." You'll learn these things about Nimoy from Wikipedia, but also from IMDb and dozens of other sources all working together to agree on how tall Leonard Nimoy is. So Captain Kirk is a smart guy, even if he has to ask the computer for things every five minutes. If the information is that easy to access, then really, what's the difference?