Why Google is learning about goats

The Knowledge Graph is beating back the chaos of the web, one random fact at a time

A few weeks ago, I discovered that Google knows the lifespan of a goat. Search for "how long does a goat live" and you'll see it displayed in a special card above the search results. 15 to 18 years! It's not an important fact, and I can't imagine people ask it very often — but there it is. I couldn't tell you where they got the answer (it's surprisingly hard to nail down, as I'll get into later) but I'm pretty sure it's right. It's the kind of accidental discovery that Google loves to serve up. I went looking for a fact, and there it was. You come away feeling as if the engine knows the answer to any question you could ask.

The official name for this feature is the Knowledge Graph, Google's project for converting information on the web into easily managed cards. The sudden appearance of the goat data says a lot about the piecemeal way Google has been building it. How long had they known about goats? I made a few calls and Google got back with an answer: the card was added a year ago, as part of a broader animal expansion that also included a goat's mass (45 to 300 kilograms) and height (40 to 58 centimeters), with similar specs for other beasts. Unless you'd thought to Google "how long does a goat live", you would never have known.

The Knowledge Graph has been running for two and a half years, expanding gradually the whole time. It was baked into Android from the beginning, where it also tied into the steadily improving Voice Search. That means you can ask your phone a question about your goat out loud, and the voice processor will route you through Search and back to the Knowledge Graph. This particular card was the result of many minor updates working together: one establishing "Animals" as an entity, others establishing specific genera and specific characteristics.

There's a lot of money riding on that structure, bolstering Google’s search revenue and the Android ecosystem at large. Apple and Microsoft are also in the game, with Siri and Cortana respectively, although Google is still generally accepted as the industry leader. In a recent test, the Knowledge Graph was able to answer 88 percent of an array of questions, with Siri and Cortana clocking in at 53 and 40 percent, respectively. But the test only hints at the larger question: When you ask your phone or computer a question, how sure are you that it will know the answer?

Google’s incremental approach is part of the reason it’s gotten so far ahead. Siri gets new features with each iOS upgrade, but they tend to come in big chunks, like this summer's song-identification update. That gives Apple's engineers less chance to iterate, and puts more pressure on each update to add something big and meaningful — not just small stuff like goat lifespans. Cortana has opted for the Google approach, using a web-based architecture to add new features and data every two weeks, but given Google's two-year head start, it may be a while before Microsoft catches up.

The Knowledge Graph's biggest head start is the simple Google Search. If you search for "goat+lifespan," an answer is already pretty close. The Knowledge Graph just gives it a little more structure. "We understand that there are people, and people have parents and children and spouses," says Emily Moxley, the lead product manager for the team building out the Graph. "They have a height, they have a birth date, they have a death date. That's the kind of semantics that the Knowledge Graph contains, and you can combine that with knowledge from the web and from user queries to actually answer questions."

Sometimes that can mean really tricky epistemological work, like last month when Moxley and her colleagues taught the Knowledge Graph about fictional worlds. Lots of people were Googling "Who plays Wolverine," but building in the answer meant establishing a robust definition for what it means to play a character. Certain Persons were already classified as Actors (Hugh Jackman, et al.), but now those actors were associated with other entities known as Fictional Characters (Wolverine, Jean Valjean), which in turn could be associated with other Actors, and so on and so on. There are still some problems with compound questions ("how tall is the actor who plays Sherlock?"), but Moxley says she thinks the Graph will be able to tackle them eventually. "It seems like we should be able to answer it, but we can't today because it's multiple hops," Moxley says. "It compounds the chance of getting it wrong."
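Google hasn't published its internal data model, but the entity-and-relationship idea Moxley describes can be sketched as a tiny graph in Python. Everything here is an assumption for illustration: the type labels, the relation names `played_by` and `height_cm`, and the per-edge confidence scores are made up. Multiplying confidences along a path is one simple way to model why "multiple hops" compounds the chance of error; it is not necessarily how Google scores its answers.

```python
# Hypothetical toy knowledge graph. Entity types, relation names, and
# confidence scores are illustrative assumptions, not Google's schema.
TYPES = {
    "Hugh Jackman": {"Person", "Actor"},
    "Wolverine": {"FictionalCharacter"},
    "Jean Valjean": {"FictionalCharacter"},
}

# (subject, relation, object) -> made-up confidence that the edge is correct.
EDGES = {
    ("Wolverine", "played_by", "Hugh Jackman"): 0.95,
    ("Jean Valjean", "played_by", "Hugh Jackman"): 0.90,
    ("Hugh Jackman", "height_cm", 188): 0.90,  # commonly cited figure
}


def is_a(entity, entity_type):
    """Check whether an entity carries a given type label."""
    return entity_type in TYPES.get(entity, set())


def hop(subject, relation):
    """One traversal: every (object, confidence) reachable via `relation`."""
    return [(o, c) for (s, r, o), c in EDGES.items() if s == subject and r == relation]


def follow(start, relations):
    """Follow a chain of relations from `start`.

    Each hop multiplies in that edge's confidence, so a longer chain ends up
    less certain than any single link in it.
    """
    paths = [(start, 1.0)]
    for relation in relations:
        paths = [
            (obj, conf * edge_conf)
            for node, conf in paths
            for obj, edge_conf in hop(node, relation)
        ]
    return paths


# "Who plays Wolverine?" is one hop.
print(follow("Wolverine", ["played_by"]))
# "How tall is the actor who plays Wolverine?" is two hops.
print(follow("Wolverine", ["played_by", "height_cm"]))
```

With these invented scores, the one-hop answer carries 0.95 confidence while the two-hop answer drops to about 0.855 (0.95 × 0.90), even though each individual link looks reliable on its own.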

There's also no specific place that Google trusts as a source for actor information, or animal information. It's just that when you look for lengths of time that are associated with "goat + lifespan," they tend to fall between 15 and 18 years. It's ambient data, coming from everywhere and nowhere. If you get enough of a consensus, it turns into something that looks an awful lot like a fact.
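Google hasn't said exactly how it weighs sources, but the consensus idea described here can be sketched as majority voting over answers extracted from many pages. The function below is a hypothetical illustration: `min_sources` and `min_share` are made-up thresholds standing in for "enough of a consensus", returning `None` stands in for skipping the card, and the sample answers are invented rather than real search data.

```python
from collections import Counter


def consensus(answers, min_sources=3, min_share=0.6):
    """Return the most common answer if enough sources agree on it, else None.

    `min_sources` and `min_share` are hypothetical thresholds; returning None
    stands in for "don't show a card at all".
    """
    if len(answers) < min_sources:
        return None  # too little evidence to call anything a fact
    value, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= min_share:
        return value
    return None  # sources disagree too much


# Invented sample of answers pulled from ten hypothetical pages.
pages = ["15-18 years"] * 7 + ["8-12 years"] * 2 + ["10-12 years"]
print(consensus(pages))  # 15-18 years (7 of 10 agree, above the 0.6 share)

# A split with no clear winner yields no card.
print(consensus(["15-18 years", "8-12 years", "10-12 years"]))  # None

# Agreement is not accuracy: a widely repeated error still wins.
print(consensus(["200 years"] * 5))  # 200 years
```

The last case is the flip side of this design: voting measures agreement, not accuracy, so a widely repeated wrong answer wins just as easily as a right one.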

That can cause problems, like the controversy earlier this month when television host Stephen Colbert insisted his height was 5’11", not the diminutive 5’10" listed on his Google Card. (Eventually they settled on 5’10½".) It was mostly a joke, but there was a real anxiety behind it. What if Google churns up the wrong fact? Regarding goats, even "15-18 years" isn’t as perfect an answer as you might want. The American Dairy Goat Association lists eight to twelve years as its standard, although that figure may be specific to dairy goats. Pygmy goats have even fewer years to live, if you want to get into distinct breeds. Goats, like people, are complicated.

You can seek out more definitive answers, but that comes with other problems. If you ask Siri how far away the moon is, the data will come from Wolfram Alpha, a more manicured database that deals in verified mathematical facts, performing intricate calculations from a relatively small knowledge base — but it only works for answers you can calculate. There’s also Wikipedia, a slightly less reliable source with vastly more information to draw on. If Google isn’t sure about a fact, it will skip the card entirely, in part because of the Colbert factor. But verification is still mostly a matter of finding the same answer in lots of different places. If everyone thinks goats live for 200 years, then so will Google.

Google usually talks about the Graph as a kind of Star Trek computer (a database that has the perfect true answer to any question you might ask), but it’s a much squishier kind of truth than the company lets on. We've spent decades figuring out how to represent knowledge in a computer, and we still don't have an answer that comes close to the way human beings actually think about the world. Our minds work more on impressions than facts (the sky is bluish, except sometimes when it’s gray, and so on), which becomes a problem when you’re trying to code out a map of all the world’s knowledge. But Google seems to have found a way around it, pulling the web’s ambient knowledge into something harder and more fixed. It’s not perfect, mathematical truth in the style of Wolfram Alpha, but it’s close enough that you can trust it and broad enough to usually have an answer. Maybe that’s as close to perfect truth as we need to get.