Nearly a decade ago, SoundHound founder Keyvan Mohajer took an idea to a group of investors. He wanted to make a system that let people talk to computers casually, as if speaking to another human. That was not a new idea, of course; 1968's 2001: A Space Odyssey had a talkative computer as one of its main characters. But Mohajer believed such a thing was no longer science fiction and could become commonplace. The only problem? It might take 10 years to build.
This is what SoundHound was originally supposed to do
Investors were enamored with the idea, but not Mohajer's timeline. They said, "Ten years is a long time, can you show me something that will happen in three years?" he recalls. With that, Midomi was born, a service that would let you hum the tune of a song to identify it. Two years later, in 2009, he launched SoundHound, which did the same thing for music overheard on the radio or in the background of a TV show.
Now, nearly a decade after that pitch to investors, Mohajer's original vision is here in the form of Hound, a voice search app that can handle incredibly complex questions and spit out answers with uncanny speed. Right now, you have to ask those questions inside the Hound app, but the company hopes to get the technology everywhere — even your toaster. That may never happen, but the company's demonstration of Hound — which was fairly scripted in our case — is astonishing enough to make me believe it's a possibility.
Mohajer started with a zinger. "What is the population of capital of the country in which Space Needle is located?" he asked briskly. It's an oddly worded question, but intentionally so, meant to show how well Hound can extract and process what's being said. Ask it on any other service (even Wolfram Alpha), and you'll get the digital equivalent of a head scratch. But here, a robotic voice instantly replied, "The population of Washington, DC is 601,723." The question could point to two Washingtons, the state where the Space Needle stands and the nation's capital, and Hound picked the right one. In another test, he asked, "How many days are there between the day after tomorrow and three days before the second Thursday of November in 2022?" The app nailed it again.
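To get a sense of what that second question involves, here is a minimal Python sketch of the date arithmetic. It is not Hound's code; the function names are made up for illustration, and because the answer depends on the day the question is asked, the date is passed in as a parameter.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() counts Monday as 0


def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday of a month (hypothetical helper)."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first matching weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def days_in_question(today):
    """Days between the day after tomorrow and three days before
    the second Thursday of November 2022, relative to `today`."""
    start = today + timedelta(days=2)
    end = nth_weekday(2022, 11, THURSDAY, 2) - timedelta(days=3)
    return (end - start).days
```

The second Thursday of November 2022 falls on November 10, so three days earlier is November 7. Asked on November 1, 2022, for instance, the day after tomorrow is November 3 and the function returns 4. The arithmetic is the easy part; the hard part, and the one Hound is showing off, is getting from the spoken sentence to these steps.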
Hound feels a lot like Google's Voice Search
Hound the app functions and feels almost exactly like Google's Voice Search, but seems much faster at identifying words and delivering answers. In our demo, which contained several dozen scripted questions but also some impromptu ones, the words coming out of Mohajer's mouth popped up on screen nearly as fast as he was saying them, and Hound would pipe back with an answer faster than seemed possible.
Mohajer says the speed comes from SoundHound combining two technologies that are typically separated on competing services. Hound does both voice recognition and natural language understanding in a single engine, whereas rival services break them up into separate steps, first transcribing your question, then working out what you were asking about. That said, our test took place over Wi-Fi in a perfectly quiet room, making it impossible to tell whether Hound maintains these speeds in the real world.
This is a personal assistant without a personality
Unlike Siri or Cortana, Hound doesn't have a personality. Instead, it's a sass-free robotic voice. One other area where it's different is the number of sources it's pulling from. From the outset, Hound will have about 50 domains, or services it's tying into through APIs; things like currency converters, news sites, flight status information, and navigation. Mohajer says the plan is to ramp that up into the millions. "Siri launched with 10 domains, and three years later it's at about 22 new domains, so it takes a long time," he says.
For example, with Hound's deal with Expedia, you can ask Hound to find you a hotel in Seattle that costs less than $200 a night, that has free Wi-Fi, parking, and a continental breakfast. It's the same information you could get on Expedia's site, of course, but here, there's no need to click on a bunch of filters. There are other simple tools it's linked up to as well, things like a mortgage calculator (from a real estate site Mohajer would not disclose) and a speech-based game of Blackjack where you can place bets with your voice.
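One way to picture what replaces those filter clicks: the spoken request has to end up as a structured query that can be run against hotel listings. The Python sketch below is an assumption about what such a query might look like, not Hound's or Expedia's actual data model; the class names, field names, and sample hotels are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hotel:
    name: str
    city: str
    nightly_rate: float
    amenities: frozenset


@dataclass(frozen=True)
class HotelQuery:
    city: str
    max_rate: float
    required_amenities: frozenset

    def matches(self, hotel):
        # "Less than $200 a night" is a strict upper bound on the rate;
        # every requested amenity must be present on the listing.
        return (
            hotel.city == self.city
            and hotel.nightly_rate < self.max_rate
            and self.required_amenities <= hotel.amenities
        )


def search(hotels, query):
    """Return the hotels that satisfy every constraint in the query."""
    return [h for h in hotels if query.matches(h)]


# The request from the article, expressed as a structured query.
SEATTLE_REQUEST = HotelQuery(
    city="Seattle",
    max_rate=200.0,
    required_amenities=frozenset({"wifi", "parking", "breakfast"}),
)

# Made-up listings, purely for demonstration.
SAMPLE_HOTELS = [
    Hotel("Harbor Inn", "Seattle", 179.0,
          frozenset({"wifi", "parking", "breakfast"})),
    Hotel("Pike Suites", "Seattle", 240.0,
          frozenset({"wifi", "parking", "breakfast"})),
    Hotel("Rain City Lodge", "Seattle", 150.0, frozenset({"wifi"})),
]
```

Running `search(SAMPLE_HOTELS, SEATTLE_REQUEST)` keeps only the first listing: the second is too expensive and the third lacks parking and breakfast.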
For any question that can't be matched to one of Hound's sources, the app defaults to Microsoft's Bing. That means web results, including videos and images, are all shown in an integrated browser. Sometimes that's just fine, but in similar tools like Siri and Cortana, web results are a sign the system couldn't keep up with what you're asking of it. Mohajer contends that by kicking people to web results, nobody ends up feeling disappointed, though I'd argue that if it happens often enough you'll just stop using the app entirely and forget about it. I wasn't quite able to push the boundaries of Hound beyond our demo, something users will get a chance to do once the service launches today.
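The fallback behavior described here amounts to a simple routing rule: try the known domains first, and hand anything left over to web search. The Python sketch below illustrates that rule only. SoundHound has not disclosed how Hound actually picks a domain, so the keyword matching is a crude stand-in, and the domain names and keywords are assumptions.

```python
# Hypothetical domains and trigger keywords, for illustration only.
DOMAIN_KEYWORDS = {
    "currency": ("convert", "exchange rate"),
    "flight_status": ("flight",),
    "navigation": ("directions", "navigate"),
}

WEB_FALLBACK = "web_search"  # stands in for the hand-off to Bing


def route(query):
    """Return the name of the domain that should handle the query,
    or the web-search fallback when no domain claims it."""
    text = query.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return WEB_FALLBACK
```

With these made-up rules, `route("Is flight 212 on time?")` goes to `flight_status`, while `route("Who painted the Mona Lisa?")` falls through to `web_search`. The more domains a service supports, the less often that last line is reached, which is why the number of domains matters so much.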
This has been designed to replace Google, but it can't just yet
That brings up one of the weaknesses of Hound in its current form: it's not available as a replacement for other voice assistants. Developers will be able to integrate it into their own apps and hardware creations through a development platform called Houndify, something Mohajer believes will be widely adopted.
"Our vision is that everything can be enabled to have this interface, from millions of phones to billions of other types of devices like consumer electronics and cars," Mohajer says. "We can't be the company to build this for every company — we need to enable them to do this for themselves."
But until that happens, most people will know Hound for its app, which will launch as an invitation-only beta on Android, followed by iOS, where it will exist as a stand-alone app. That's a lot like how Siri was a third-party app before Apple bought it, and how Google still is on iOS. It also means that you need to have a very specific reason to use Hound over the built-in options on both platforms.
You still have to go out of your way to use this
It's worth noting that Hound is arriving at a time when Google and Apple are stepping up efforts to add context to the things people are looking at on their phones, often using voice interfaces, which could almost entirely remove the usefulness of Hound for simple searches. Last week, Google unveiled Now on Tap as part of its upcoming Android M release, a feature that brings its Now service inside of every app and gives the company an incredible amount of context for why you're looking for something. It hopes it will be good enough that you never even need to leave an app to pull up something you might search for. Apple is also rumored to be working on a feature called Proactive that attempts to put relevant apps and information in front of users without them having to search for it in the first place.
That hurdle of having to find and launch Hound could change if app developers build the voice search into their apps, or if SoundHound and its technology get snapped up by one of these larger players. In the meantime, Mohajer believes that Hound's performance and experience will be enough for people to take the extra step of launching it before they ask, just as they've been doing with the company's audio recognition apps for years.
"Just because it's easier to get to something is not enough for me to choose it. I don't use Siri for food, I use Yelp, even though Siri uses Yelp data, because they have a better experience. I use Google Maps on iOS instead of Apple Maps, even though Apple Maps is more integrated," he says. "I think if you deliver something that is substantially better, people will use it."