A day ahead of its annual fall hardware event, Amazon is making a big partnership announcement: it has created the Voice Interoperability Initiative, which is sort of a statement of intent from over 30 different companies that they will strive to ensure devices will work with multiple digital assistants at the same time. For example, you could talk to either Alexa or Cortana on the same smart speaker simply by saying the appropriate wake word.
“As much as people would like the headline that there’s going to be one voice assistant that rules them all, we don’t agree,” says Amazon’s SVP of devices and services Dave Limp. “This isn’t a sporting event. There’s not going to be one winner.” Limp argues that if there will always be multiple voice assistants, they should work better together.
A wide array of companies that build both software and hardware for voice assistants have signed on to the initiative. I’m just going to quote Amazon’s press release directly to give you some of the companies on the list because it’s clear Amazon is going for some shock and awe here, especially since the list includes a few major players:
“More than 30 companies are supporting the effort, including global brands like Amazon, Baidu, BMW, Bose, Cerence, ecobee, Harman, Logitech, Microsoft, Salesforce, Sonos, Sound United, Sony Audio Group, Spotify and Tencent; telecommunications operators like Free, Orange, SFR and Verizon; hardware solutions providers like Amlogic, InnoMedia, Intel, MediaTek, NXP Semiconductors, Qualcomm Technologies, Inc., SGW Global and Tonly; and systems integrators like CommScope, DiscVision, Libre, Linkplay, MyBox, Sagemcom, StreamUnlimited and Sugr.”
It’s a very long list, and three very prominent companies are missing from it: Google, Apple, and Samsung.
The companies that are on board seem pretty chuffed, if the quotes they provided for Amazon’s press release are any indication. Intel said its 10th Gen chips will work with “multiple assistants this year,” and Qualcomm said its chipsets can already handle multiple wake words.
If you read between the lines of this statement from Andrew Shuman, CVP Cortana at Microsoft, you’ll find the gentlest possible nod to how Google and Apple have made their platforms unfriendly to third-party assistants: “We expect the initiative to help us expand this vision to even more companies and foster a balanced ecosystem that empowers companies to create and make their assistants available, on all platforms.”
More intriguingly, other companies seem eager to get their voice assistants on Echo devices. Salesforce CEO Marc Benioff writes that “We look forward to working with Amazon and other industry leaders to make Einstein Voice, the world’s leading CRM assistant, accessible on any device.” Meanwhile, Spotify’s R&D officer is quoted as saying, “We are excited to join the Voice Interoperability Initiative, which will give our listeners a more seamless experience across whichever voice assistant they choose, including the ability to ask for Spotify directly.”
Baidu’s participation is also notable. The Chinese company’s DuerOS voice assistant has over 400 million users, which is more than Alexa but fewer than Google Assistant. Baidu trails only Amazon as the second-largest maker of smart speakers, according to research firm Canalys, having overtaken Google recently, despite only serving the Chinese market.
The hope among these companies is that there will be two kinds of assistants. One type will be broad in its knowledge and capabilities (think Alexa, Siri, and Google Assistant), while the other will be narrow and deep, specific to its own domain of knowledge. The goal is to make it possible to directly talk to any of them on a smart speaker without the need for an intermediate skill.
It’s a strategy already playing out on PCs. Amazon’s voice assistant is being more tightly integrated into Windows 10, allowing locked PCs to respond to general queries when someone shouts “Alexa” from across the room. Microsoft’s Cortana is being refocused on interactions with the company’s software and services.
Limp likens his vision for voice assistants to browsers: you can use whatever browser you want to go to whatever website you want, so why can’t you use whatever speaker you want to talk to any assistant you want? “We are a web 1.0 company,” Limp says, “and the reason that this building exists that I’m sitting in right now is a function of the interoperability of the web.”
It’s a very high-minded ideal, but it also may be strategically savvy. Amazon already has a strong position in the home with Alexa, so allowing other assistants to work on its Echo speakers doesn’t seem like a big problem. To be clear, Amazon is committed to allowing that to happen. The company has previously announced that Orange customers in France will be able to buy Echo speakers that support both Alexa and Orange’s assistant Djingo.
However, Alexa has not had as much success on phones, despite several attempts at partnerships with Android manufacturers and headphone makers. An industry-wide initiative where everybody is involved except the three most influential companies in smartphones seems custom-designed to put pressure on those companies. (It also may help Amazon make the case that it’s not monopolistic since it’s so willing to play well with others and open up its voice platform to competitors.)
Whether you see it as altruism or strategic 4D chess, the initiative may put some pressure on Google at least. Google has been more reluctant to allow Google Assistant to work with other software, though perhaps for reasons related to privacy rather than market dynamics.
When asked specifically about Google, Apple, and Samsung, Limp says that “those three companies, we would love to have part of this initiative.” That makes it sound very much like they have declined to join, but Limp would not elaborate.
He says that though he’s been talking with other companies about this idea for some time, it was only in the past “six weeks” that it coalesced into something more formal. Knowing how quickly (or slowly, as the case may be) companies like Google and Samsung move, six weeks doesn’t seem like much time. Harman is technically a Samsung subsidiary, and Samsung phones already run Bixby and Google Assistant concurrently, so it’s not clear why Samsung itself hasn’t signed on. As for Apple, well, it is not known for being a joiner.
Google gave a statement to us, noting that it only heard about this initiative over the weekend:
“We just heard about this initiative and would need to review the details, but in general we’re always interested in participating in efforts that have the broad support of the ecosystem and uphold strong privacy and security practices.”
We are reaching out to Samsung and Apple for comment.
To be clear, Limp won’t cop to believing that this initiative will put pressure on those companies: “If they don’t want to do it, this is not going to change their mind.”
From a technical perspective, there are a thousand questions about implementation, software, privacy, and more that we don’t have answers to yet. The Voice Interoperability Initiative isn’t meant to be a standards body, nor does it seem to be prescriptive about how its members should approach the complicated issues surrounding making a single speaker support multiple assistants at once.
Amazon is giving away its “wakeword engine” for free so that other companies that want to build their own assistants can use Amazon’s research to get started. But companies in the consortium are free to use whatever technology they like.
To date, there haven’t been many devices that can “support multiple simultaneous wake words.” Facebook’s Portal, some cars, and a few Android phones come to mind. More prominent devices, like the Sonos One, make users choose between either Alexa or Google Assistant on a per-speaker basis.
But there’s not really a technical limitation there. Antoine Leblond, vice president of software at Sonos, demoed a Sonos One speaker working with both the “Alexa” and “Hey Google” wake words active for me over a video conference call yesterday. It worked perfectly fine, including Sonos’ “continuity” feature that lets you start music with one assistant then control it with the other.
I tried to pin down Leblond on the reason why this isn’t the way the Sonos One works, as I have several times over the past couple of years. Specifically, since Amazon has repeatedly said it is happy to have Alexa coexist with any other assistant, is Google disallowing it? Leblond demurred, but he did bring up the fact that there are lots of things that could go wrong with two active assistants on a single speaker. For example: if you set an alarm with one assistant and aren’t around when it goes off, how will your family know which assistant to tell to shut up?
Figuring out how to implement multiple assistants from a technical perspective isn’t even the hardest problem. If there’s anything the past year has taught us, it’s that few people realized the full extent to which voice assistants were collecting our data. Rolling scandals have hit Amazon, Google, and Apple over their practices of having human reviewers check the quality of transcriptions. All three have changed course significantly, increasing transparency and making it easier to opt out, delete your data, or both.
A consortium of more than 30 companies wanting to make it easy for multiple assistants to coexist doesn’t sound like a great recipe for privacy, either. But Limp emphasizes that he wants to be deliberate with how these systems are structured.
For example, he believes there ought to be strict rules where one assistant would never be allowed to “listen in” on a conversation with another assistant. That seems simple, but there are tougher problems. Should the majority of the work involved in listening to different wake words be handled by hardware or software? When Limp says that he envisions “voice assistants [could someday] collaborate in the cloud in a private way on behalf of customers in a way that preserves context and continuity,” how exactly will that privacy be ensured?
And it gets even thornier: a common issue over the past year has been the realization that these assistants are accidentally recording without hearing their wake word. So in a world where a speaker could have two or a dozen different assistants ready and waiting, what happens to those accidental recordings?
Six weeks after discussions about forming the initiative got serious, there are no clear answers to these questions yet, only a commitment to figuring them out. I asked Sonos if there are meetings or contracts or even dues, and the answers were nope, nope, and nope. It’s all very early.
Amazon, especially with Alexa, has a reputation for moving quickly to broaden its ecosystem, sometimes at the expense of clarity or software quality. Just think about the early days (and some more recent ones) of using skills with Alexa, which often require stilted, specific commands. This time around, at least, Amazon doesn’t seem to be rushing.
“We’re five years into this,” Limp says. When he looks at the technical and privacy issues here, he believes that “it’s a tractable problem, but not a trivial problem. It is going to take many, many years to solve.”