First Click: Siri and Alexa aren’t speaking my language
March 24th, 2016
You can’t swing a cat in a San Francisco co-working space these days without hitting someone halfway through a passionate article expressing their love for the Amazon Echo, the voice-enabled speaker released by the retail giant in 2014. Alexa, for that is the name of the Echo’s voice assistant, has only grown in power along with the speaker’s popularity, making it that rare example of a tech product where the hype builds gradually after an understated launch.
I sure would love to check the Echo out, but I’m not sure I’ll ever get the chance to. And that’s not just because it hasn’t been released in Japan, where I live; of course I could always import one. The reason I don’t think I can ever use the Echo properly is that I speak a second language.
The appeal of the Echo, and smart voice assistants in general like Google Now and Apple’s Siri, is that the primary interface is the most natural and effortless one available to humans: our voice. At their best, you should be able to use them without thinking.
Turn on the lights.
What time is the next Southampton match?
Add maraschino cherries to my shopping list.
I use the Amazon website nearly every day in Japanese; the company probably has a pretty good idea by now of who I am and what I’m into. Last month I even used an Amazon service that offers expert wine advice over the phone, which is quite understandably not available in English. But if the Echo came out here, I would likely have two choices at best: use it in English and miss out on the deepest integration with Amazon’s fast-growing ecosystem — I wouldn't be able to ask about most Japanese products or services in English, for example — or use it in my second language and miss out on the ability to interact with Alexa as naturally as possible.
Granted, the Echo sounds cool enough that I’d still probably buy one even if I could only use it in Japanese. But its functionality is one-way and focused on the home; it’s not a mobile communications device. Smartphone assistants like Siri, on the other hand, suffer from similar issues to the point where I just never use them. I could set my phone to English, but then I can’t ask for directions to places with Japanese names because the map data isn’t there. I could set my phone to Japanese, but then I can’t reply to a message sent to me in English. Both options have enough compromises to undermine the point of using voice in the first place.
The main problem here is that the assistant can only listen for one language at a time, and anything that falls outside the patterns it recognizes is considered unintelligible. Even adding a new language as a feature for people who speak only that language can be supremely tricky, as Apple services SVP Eddy Cue explained on a recent episode of John Gruber’s The Talk Show podcast.
"Apple TV presents an interesting problem compared to just Siri itself in that a lot of the things that you search for are not in the native language that you’re speaking," said Cue. "So let’s say you’re speaking in Spanish but you’re searching for an English title. Siri has to be aware that it’s actually able to speak multiple languages and understand when it is that you’re asking for a title versus when it is that you’re giving a verb or a noun to it."
Solving this for something like the Apple TV, with a relatively straightforward library of content, is difficult enough. Solving it for something like Apple Maps, which has millions of irregular entries with idiosyncratic names, seems nearly insurmountable. But even our current versions of Siri and Google Now would have seemed like impossible science fiction a decade ago, considering the groundbreaking natural language processing and machine learning required to get to this point.
And the great thing about machine learning is that new services that rely on it are only ever going to get better. Still, I would like to see a little more consideration of end users in the way these systems are designed from the ground up. It doesn’t seem like it’d be unfeasible to let me select both Japanese and English as voice options for certain situations, so that when I receive a message in either language, for instance, I can reply by talking into my watch without having to reboot my entire phone.
You could call all of this a first-world problem, and on some level I’d agree: it’s a problem indicative of a first-world mentality to product development. Not everyone in the world speaks native American English; not everyone in the world speaks a single language at home; not everyone in the world speaks the dominant language of the country they live in. And even if you do, you're still likely to run into some of these issues when you travel. The world is getting more multicultural, not less, and this is something that technology should reflect.
Of course, it’s easy to see why these services are developed first for English and English only; that’s the simplest route to the largest and most affluent audience. There’s a reason why Amazon hasn’t even released the Echo outside the US. But you can’t solve a problem if you don’t acknowledge it exists, and here’s a problem: perhaps the most important interface of the future currently doesn’t work well for multilingual people or countries.
Five stories to start your day
Apple's iPhone SE and 9.7-inch iPad Pro are now available for preorder
It’s March 24th around the globe which means Apple’s 4-inch iPhone SE and 9.7-inch iPad Pro are now available to preorder ahead of their March 31st availability. The iPhone SE shares its looks...
Sony forms new company to make PlayStation mobile games
Sony Computer Entertainment, to be known as Sony Interactive Entertainment from next week, has announced the formation of a new company called ForwardWorks that will focus on smartphone games. In a...
New Zealand's flag isn't changing after all
New Zealand has voted to keep its current flag, rejecting an alternative design that was selected following a ten-month process. In a two-part referendum concluded on Thursday, 57 percent of voters...
Google may be working on a Periscope competitor called YouTube Connect
Google may be developing a way to make live streaming on YouTube much easier. VentureBeat reports that the company is working on a new mobile app called YouTube Connect that would offer similar...
Apple is selling Microsoft Office 365 as an accessory for the iPad Pro
Apple wants the iPad Pro to replace Windows, and to convince customers it's bringing in a familiar face or two: Microsoft's Office suite. As part of the ordering process for the new iPad Pro,...
Voice technology user of the day
When her mic came back on, @HillaryClinton was pretty excited about it. pic.twitter.com/ybaw3mwQeY— Sarah Parnass (@WordsOfSarah) March 23, 2016