On Thursday morning, Amazon announced its acquisition of Ivona Software, a text-to-speech company. Ivona already powers the "Text-to-Speech," "Voice Guide" and "Explore by Touch" accessibility features of Amazon's Kindle Fire, so bringing those in-house creates some obvious synergies there — but its primary business is Speech Cloud, a software-as-a-service infrastructure that companies use to automate call centers or add e-mail-to-speech functionality. Ivona is eleven years old, and was founded in Poland by engineers Łukasz Osowski and Michał Kaszczuk. The company is very big in Poland and Eastern Europe, but has global reach, which is important in obvious ways both for the thorny problem of text-to-speech in multiple languages, and for a global company like Amazon. In particular, one of the problems for text-to-speech on the Kindle Fire is that it's currently only available in a female voice, speaking English, in the United States. Surely Ivona can do something about that.
I mention all this because one of the best and worst things to happen to speech and voice technology has been Apple's Siri. Siri's voice-navigated artificial intelligence functions give such good demo and have so much potential, both as an alternative UI and for accessibility, that speech and voice have much more visibility now than they did two years ago. The tradeoff is that everyone tries to map anything touching speech and voice to Siri. We did this a little over a year ago when Amazon acquired speech transcription company Yap. Yap wasn't much like Siri, and Ivona, which isn't even on the input side, is even less like Siri. But we're seeing the same connecting of unconnected dots in the tech media that we saw then. (There are far too many of these stories, and most of them are far too foolish, to list.)
So let's review:
- Text-to-speech: reads already-written text aloud in something approaching a human-sounding voice (Ivona, AT&T, Microsoft, many others);
- Speech-to-text: transcribes what you say word-for-word into text (Dragon, Yap);
- Voice recognition: a biometric that identifies who you are based on your voice (like in Sneakers);
- Natural-language AI: transcribes speech and/or parses text, looking for keywords and structure to turn ordinary sentences into computer queries (Siri's core technology).
To their credit, Amazon and Ivona haven't dangled the Siri carrot at all. (An Amazon spokesperson was also not able to comment on the purchase price or future plans for Ivona's technology.) But a visible company like Amazon investing heavily in voice naturally gets people to wonder. Most of the companies Amazon's acquired have been retail competitors like Zappos, media/data companies like IMDb, or infrastructure companies for its cloud or warehouse businesses, like Kiva Systems. So what is the Ivona acquisition about?
Ivona boosts Amazon's Kindle and cloud products
Good text-to-speech has obvious value to a company making consumer electronics devices, which is why Amazon partnered with Ivona in the first place on the Kindle Fire. Your devices become usable for the blind and visually impaired. Besides the inherent good in being accessible, it's also a necessity if you want enterprise and government implementation. In fact, as Laura Hazard Owen points out, accessibility lawsuits from the National Federation of the Blind and others have kept Kindles out of schools and universities, and may have helped sink a giant deal with the US State Department last year.
Now clearly Amazon saw something inherently valuable in Ivona's technology and something more valuable if it could bring text-to-speech in-house, giving Ivona access to future prototypes both to speed up development and smooth the implementation.
Maybe there's a way to bring back audio and text-to-speech support for new E Ink Kindles
In particular, if you're a company that sells a lot of e-books, good text-to-speech is even more powerful. Not only can blind and visually impaired readers enjoy your books, you also potentially have instant, automated audiobooks. This was a key feature of the early Kindles. But with this generation, Amazon cut audio support for its E Ink Kindles. No speakers, no headphone jack, no text-to-speech. Text-to-speech is for the Kindle Fire only. This saves some hardware and software costs, but limits the use of an E Ink Kindle, especially for older readers. It also perturbed many of the Kindle's longest-standing supporters, who fell in love with audiobook and MP3 support as well as text-to-speech. Maybe there's a way to bring it back, and Ivona is a part of that. Amazon isn't likely to do anything to cannibalize its audiobook wing, Audible (except apparently drop audio support on inexpensive Kindles), but there's a long tail of books in multiple languages that haven't been given the professional reader treatment.
Finally, in case you've forgotten, Amazon isn't just in the consumer electronics business. It's a giant cloud and infrastructure company, and so is Ivona. In particular, Amazon sells plug-and-play infrastructure to many, many businesses, and so does Ivona. It gathers a lot of data, and so does Ivona. You don't think Amazon wants to use Ivona to offer automated voice interfaces, and Yap to offer automated transcription, on the side? Seems like a win to me.
Look, Siri's great. That's partly because its AI is pretty good and pretty distinctive, which makes it very different from these other companies' services. But Siri is also great because voice and speech are pretty great. They may seem natural and simple, because most of us just talk and listen and don't have to think about it.
Siri is great because voice and speech are great — but they're also very hard
At a computing and interface level, voice and speech are still great, but they're also very hard. There are many different problems that need to be solved in order to make them as reliable as text, mouse, and touch input have become. It's unlikely that any one company is going to be able to put all the pieces together with equal fluency. Microsoft is trying, and so are a few others. But even with Siri, Apple focused on what it could do (build a great iOS app), acquired what it couldn't (AI software), and outsourced the rest (Siri's search backend). Amazon appears to be doing the same thing, but to a very different end.