In a restaurant in Mountain View, California yesterday, Google gave several small groups of journalists a chance to demo Duplex. If you don’t recall, Duplex is the AI system designed to make human-sounding voice calls on your behalf so as to automate things like booking restaurant tables and hair appointments. In the demo, we saw what it would be like for a restaurant to receive a phone call — and in fact, each of us in turn took a call from Duplex as it tried to book a reservation.
The briefings were in service of the news that Google is about to begin limited testing “in the coming weeks.” If you’re hoping that means you’ll be able to try it yourself, sorry: Google is starting with “a set of trusted tester users,” according to Nick Fox, VP of product and design for the Google Assistant. It will also be limited to businesses that Google has partnered with rather than any old restaurant.
The rollout will be phased, in other words. First up will be calls about holiday hours, then restaurant reservations later this summer, and finally haircut appointments. Those are the only three domains that Google has trained Duplex on.
The demos we saw had many of the same elements that made the original demonstration at Google I/O so impressive: the voice sounded much more human than normal, complete with “umms” and “ahhs.” It also featured something we didn’t hear in May: each call started with an explicit statement that the call was being recorded.
There were a few variations on the disclosure, but they all included some indication that you were talking to a machine and the call was being recorded. For example, one call began with “Hi, I’m calling to make a reservation. I’m Google’s automated booking service, so I’ll record the call. Uh, can I book a table for Sunday the first?”
A few things to note about that call. The voice sounded just as natural as in the original I/O demonstration, not at all like a robot. There were several variations on the robot disclosure; Google seems to be testing to see which is most effective at making people feel comfortable sticking with the call. The other thing to know is that every variation I heard definitely said that it was recording, usually followed by a quick “umm” before jumping into the request for the reservation.
The more natural, human-sounding voice wasn’t there in the very first prototypes that Google built (amusingly, they worked by setting a literal handset on the speaker on a laptop). According to VP of engineering for the Google Assistant Scott Huffman, “It didn’t work ... we got a lot of hangups, we got a lot of incompletion of the task. People didn’t deal well with how unnatural it sounded.”
Part of making it sound natural enough to not trigger an aural sense of the uncanny valley was adding those ums and ahs, which Huffman identified as “speech disfluencies.” He emphasized that they weren’t there to trick anybody, but because those vocal tics “play a key part in progressing a conversation between humans.” He says it came from a well-known branch of linguistics called “pragmatics,” which encompasses all the non-word communications that happen in human speech: the ums, the ahs, the hand gestures, etc.
“Google has invented a lot of things,” Huffman said, “but we did not invent ums and aahs.”
If you take a Duplex call and want to take that initial “um” as an opportunity to say “yeah no, I don’t want to be recorded,” Duplex can recognize that and end the call with something like “‘Okay, I’ll call back on an unrecorded line’ and then we have an operator just call back,” Fox says.
There are a few states where Duplex won’t work — Fox says Google doesn’t yet have the permitting for Texas, for example — but it should start making calls in the vast majority of the US soon. Duplex only works in English for now, but Google has worked to ensure that it is able to understand lots of dialects and accents.
Fox emphasized that the behavior of Duplex arises out of Google’s recently published “core AI principles.” “We’re going to be very slow, very careful, and very thoughtful as we go here,” said Fox. That’s part of the reason why the initial testing will only be with businesses that Google has partnered with. Google will also allow businesses to opt out of being called by Duplex, likely through the “Google My Business” portal. Of course, if you’re the sort of business that doesn’t have online bookings, you’re probably the sort of business that has never identified yourself to Google using its tools.
“We want to be very respectful of the businesses we’re working with,” Fox stressed. Google will ensure that businesses won’t receive too many calls from Duplex, for example from people who might use it to prank restaurants with fake reservations.
When you set up Duplex on your Google Assistant, you’ll give it a few pieces of information and some permissions. In one call, Duplex told the human on the phone it wasn’t authorized to share an email address but could share a phone number, for example.
As for haircuts, Fox noted that Google hadn’t worked out all the details of what Duplex would need to know, but he imagined a case where at the least it might be able to ask for something like your “usual” haircut.
Duplex conveyed politeness in the demos we saw. It paused with a little “mmhmm” when the called human asked it to wait, a pragmatic tactic Huffman called “conversational acknowledgement.” It showed that Duplex was still on the line and listening, but would wait for the human to continue speaking.
It handled a bunch of interruptions, out-of-order questions, and even weird discursive statements pretty well. When a human sounded confused or flustered, Duplex took a tone that was almost apologetic. It really seems to be designed to be a super considerate and non-confrontational customer on the phone.
But Duplex can’t handle everything, and so it will be paired with a bank of human operators who can take over a call if it goes sideways. Valerie Nygaard, product manager for Duplex, emphasized that “this is a system with a human fallback.” Those operators serve two purposes: they handle calls that Duplex can’t complete and they also mark up the call transcripts for Google’s AI algorithms to learn from.
None of the phone calls we listened to required human fallback, however. Huffman says that right now four out of five calls that Duplex makes can be handled without the human operator. That’s either a very low or very high percentage, depending on your attitude toward the technology.
If you’re skeptical of all this, I don’t blame you. Google got quite a bit of blowback after its Google I/O demo, both from people who were wondering about disclosure and from those who thought it might not have been a real call. Fox insists that it was, though it was edited to take out “personal information.”
Another reason to believe the demos we saw in that Mountain View restaurant were real? My own demo totally flopped. I played the role of a busy, crabby bartender who kept interrupting Duplex. The system handled the interruptions fine, but it got flummoxed when I told it that I could go ahead and make a reservation for seven, but the full kitchen closed at six that day so it would have to settle for bar food.
My call should have been handed off to a human operator at that point, but instead Duplex misunderstood my meaning about the kitchen closing. When I said, in a harried and snippy tone, that there would only be bar food, it replied “Oh I see. Bye, thank you,” and hung up.
It was a very human thing to do.