Google is updating its Gboard keyboard on Pixel phones with AI-powered offline dictation. The update means users will be able to dictate emails and texts faster and more reliably, says Google, without worrying about whether they’re connected to the internet.
“Imagine you’re walking out of your building and you want to send a message to someone saying ‘I’m running late,’” says Françoise Beaufays, a research scientist and team lead at Google’s speech recognition and mobile input group. “This is exactly the moment where you don’t have connectivity because you’re moving off Wi-Fi towards a cellular plan.” With the upgrade to Gboard, Beaufays tells The Verge, “that problem is not there anymore.”
This might sound like a trivial use case, but Beaufays argues that improvements to speech recognition will slowly revolutionize how we interact with our mobile devices. She notes that although speech recognition has improved in recent years, it’s still an immature technology. It’s computationally intensive, meaning most speech recognition systems have to send data over the internet, and the result is dictation that’s slow and unreliable.
Taking dictation offline makes it more reliable and maybe more popular
“Imagine if you had a keyboard where you couldn’t click on the keys whenever the connectivity is lousy,” says Beaufays. “You just wouldn’t use that keyboard.” But by taking the system offline, she says, dictation will become a more natural choice.
To achieve this transition, Google’s team spent five years investigating the problem and simplifying the AI systems the app uses for voice recognition. For example, while older versions of Gboard’s dictation software used three separate components to model audio waveforms, match sounds with phonemes, and then combine those phonemes into written output, the updated version integrates all of this work into a single step.
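The structural difference between the two designs can be shown with a toy sketch. This is not Google's code: the lookup tables, frame labels, and function names below are all hypothetical stand-ins for learned models, chosen only to contrast three chained stages with a single step.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data standing in for learned models (not Gboard's real tables).
FRAME_TO_SOUND: Dict[str, str] = {"f1": "h", "f2": "ay"}
PHONEMES_TO_TEXT: Dict[Tuple[str, ...], str] = {("h", "ay"): "hi"}
FRAMES_TO_TEXT: Dict[Tuple[str, ...], str] = {("f1", "f2"): "hi"}


def acoustic_stage(frames: List[str]) -> List[str]:
    """Stage 1: model the audio, mapping each frame to a sound label."""
    return [FRAME_TO_SOUND[frame] for frame in frames]


def phoneme_stage(sounds: List[str]) -> Tuple[str, ...]:
    """Stage 2: match sounds with phonemes (a one-to-one mapping in this toy)."""
    return tuple(sounds)


def text_stage(phonemes: Tuple[str, ...]) -> str:
    """Stage 3: combine phonemes into written output."""
    return PHONEMES_TO_TEXT[phonemes]


def three_stage_dictation(frames: List[str]) -> str:
    """Older-style design: three separate components chained together."""
    return text_stage(phoneme_stage(acoustic_stage(frames)))


def single_step_dictation(frames: List[str]) -> str:
    """Newer-style design: one component maps audio straight to text."""
    return FRAMES_TO_TEXT[tuple(frames)]


if __name__ == "__main__":
    audio = ["f1", "f2"]
    print(three_stage_dictation(audio))
    print(single_step_dictation(audio))
```

Both functions return the same text for the same input. The point of the sketch is only that the second design has one component to store and run instead of three.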
The new model also slims down a part of the system known as the “decoder graph,” a component that functions like an index in a book, matching audio waveforms with written words. In the old version of Gboard’s dictation model, this decoder graph was 2 gigabytes in size, far too big for on-device processing. The new version, by comparison, is just 80 megabytes, about 25 times smaller.
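The size comparison can be checked with a quick calculation. The two sizes come from the figures above; treating a gigabyte as 1,000 megabytes is an assumption, and the function name is illustrative.

```python
# Sizes as reported in the article. Assumption: decimal units (1 GB = 1000 MB).
OLD_GRAPH_MB = 2 * 1000
NEW_GRAPH_MB = 80


def shrink_factor(old_mb: float, new_mb: float) -> float:
    """Return how many times smaller the new size is than the old one."""
    return old_mb / new_mb


if __name__ == "__main__":
    print(shrink_factor(OLD_GRAPH_MB, NEW_GRAPH_MB))  # 25.0
```

With binary units (1 GB = 1,024 MB) the ratio would be 25.6, so the reduction is roughly 25-fold either way.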
The rollout of this upgrade is limited to American English dictation and Pixel phones for now, but Beaufays suggests it’ll become more widely available in the future — spreading the reach of AI voice recognition. “From a technology viewpoint, I would say we can afford to do this on more phones than just Pixel,” says Beaufays. “I think what will happen is that we will probably be able to launch it on more devices [and] in more languages.”