Today, you can ask Alexa to turn on the lights or have Siri tell you the temperature in your bedroom, and sometimes they’ll get it right. Or you may hear, “You have 15 devices named lights; which one would you like to control?” or “The current temperature in Kathmandu is 53 degrees.” But what if your voice assistant were not only always accurate but could also respond to nebulous comments like “I’ve had a rough day; what’s a good way to unwind?” with “intelligent” responses, such as lowering the shades, dimming the lights, adjusting the thermostat, and queuing up some goodies on Netflix?
That’s the potential of voice assistants powered by new AI language models, according to Alex Capecelatro, co-founder of the Josh.ai home automation system. Josh.ai has already started working on a prototype integration using OpenAI’s ChatGPT. This proof-of-concept video shows Capecelatro asking the Josh assistant to open the shades, turn off the music, and tell him the weather (controlling three things at once is a capability Josh already has). He then goes on to use more natural voice commands for the smart home, like “I’m filming a video; it’s kind of dark in here,” to which the voice assistant responds — slightly clumsily — by turning up the lights in the room.
The possibility of improving smart home control by using AI language models to parse natural language is tantalizing. Capecelatro thinks it’s the future. “We’re trying to figure out how good we can get it in controlling your environment in a more natural and intuitive way,” he says.
Today, voice assistants usually require precise language and often confuse basic smart home commands with requests for information, which results in frustrating and sometimes useless responses. This was the problem Josh.ai set out to solve when Capecelatro and Tim Gill (founder of Quark) started the company in 2015. Its eponymous voice assistant aims to be excellent at controlling your connected gadgets, no matter how you phrase the request.
Using extensive knowledge graph models, Josh can parse when it hears “satellites” instead of “turn on the lights” and do the appropriate thing. “Open the drapes” may sound like “Get some grapes,” but Josh is smart enough to know you don’t live in a vineyard. “We spend a lot of time working under the hood to fix mishearing, work with different accents, understand imperfect sentences and the like, so even when you say ‘turn on the goddam lights,’ we know what you mean,” says Capecelatro.
Currently, Josh is only available as a voice control layer in custom smart home installations powered by the likes of Crestron, Control4, or Josh’s own standalone smart home control system. In that more protected environment, where the system is set up and largely controlled by a professional installer and using Josh.ai’s proprietary hardware, Josh has built a reputation for being a more reliable, more private voice assistant — albeit with a higher cost of entry. (While there is a cloud component to Josh, most requests are processed locally on the Josh Core or the Josh Micro, and identifiable information is stripped out when using cloud-based APIs, says Capecelatro.)
The company, which recently announced a partnership with Amazon, is now betting big on the new generation of large language models (LLMs) used by ChatGPT and other chatbots. Capecelatro believes that these systems will transform today’s voice assistants into something much more useful. “A year from now, no one’s going to be willing to tolerate the old way that Alexa, Google, Siri, and even Josh, operated. It’s just not going to be enough,” says Capecelatro. “If we don’t adopt ChatGPT-type technology, businesses like mine won’t exist in a year. It is critical to the future of anyone doing voice control in the home.”
For Josh.ai, which doesn’t have the depth of general knowledge that its competitors do, the knowledge base a ChatGPT integration adds to the voice assistant is a huge leap forward. “We’ve always wanted to make Josh as smart as possible, but we’re a small team,” says Capecelatro.
But for the smart home in general, the promise lies in combining the conversational abilities of AI language models with the context a home automation system can provide. For example, by knowing what smart devices you have in your home and details about how you use them, Josh could parse natural language commands into actions in your home. Say, “Hey Josh, it’s almost time for the kids to get home, and it’s getting dark. Can you make sure everything’s ready?” and the voice assistant could switch on the porch lights, start preheating the oven, lower the shades, and turn the lights on in the kitchen, for example.
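To make that idea concrete, here is a minimal sketch of how a spoken request might be paired with a home's device list and how a language model's reply could be checked before anything is switched on. It is an illustration only, not Josh.ai's implementation: the device inventory, the JSON reply format, and the function names `build_prompt` and `parse_actions` are all assumptions, and the call to the language model itself is left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str       # e.g. "porch lights"
    actions: tuple  # actions this device accepts


# Hypothetical inventory; a real system would load this from the home controller.
DEVICES = [
    Device("porch lights", ("on", "off")),
    Device("kitchen lights", ("on", "off")),
    Device("oven", ("preheat", "off")),
    Device("shades", ("raise", "lower")),
]


def build_prompt(request: str, devices) -> str:
    """Combine the spoken request with the home's device list."""
    lines = [f"- {d.name}: {', '.join(d.actions)}" for d in devices]
    return (
        "Devices and allowed actions:\n"
        + "\n".join(lines)
        + '\nReply only with a JSON list of {"device", "action"} objects.\n'
        + f"Request: {request}"
    )


def parse_actions(reply: str, devices) -> list:
    """Keep only entries that name a known device and an allowed action."""
    allowed = {d.name: set(d.actions) for d in devices}
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON; do nothing
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        device, action = item.get("device"), item.get("action")
        if not isinstance(device, str) or not isinstance(action, str):
            continue
        if action in allowed.get(device, set()):
            valid.append((device, action))
    return valid
```

The point of the second function is the guard rail: anything the model suggests for a device the home does not have, or an action that device does not support, is simply dropped rather than acted on.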
Josh has also worked to use ChatGPT for media discovery in the smart home, something that has been a missing link to date. “Voice control is not ideal if you don’t know what you want,” Capecelatro says. “We built out an integration with the Ava remote that you can use to browse the content you want to watch. By adding ChatGPT into the mix, you can say, ‘What are some really good shows on Netflix that are maybe romcoms and feature (this) actor.’ ChatGPT can compile a list and present it to you on the screen of the remote.” That’s family movie night sorted, then.
Josh’s AI upgrade isn’t live yet, and Capecelatro says the company is keeping a close watch on other companies’ burgeoning tech in this space in case they can offer a better model. Besides ChatGPT currently being very slow (the video was edited to speed it up), there is the very real issue of AI generating, well, bullshit. (And the fact that the dataset that ChatGPT was trained on ends in mid-2021. It’s worth noting that in the demo video when Josh is asked, “What are some shows to watch on Netflix,” the newest show it listed debuted in 2019.) But Capecelatro says some form of generative AI voice assistant is coming to the smart home.
Caution is definitely warranted. No company wants a racist, homophobic, homicidal voice assistant spewing its “opinions” into people’s homes through their hardware, a distinct possibility given examples of generative AI essentially regurgitating content with no filter. “We’re being extremely cautious. We could have gone live with the ChatGPT integration immediately,” says Capecelatro. “We’re not doing that. Because we don’t want to give people really bad data. We don’t want to lie.”
The company plans to take its time to figure out how to put the right guard rails in place, which will be essential for this technology to translate into the smart home. “I think Microsoft and Google jumped the gun a bit [with their search chatbot models], and they’re now seeing the consequences,” says Capecelatro of recent high-profile launches that went sideways quickly.
It’s not a leap to assume that Google, Apple, and Amazon are all looking at how to incorporate new AI language models into their voice assistants (heck, maybe Microsoft will bring back Cortana), and smart home enthusiasts have already figured out ways to use Siri Shortcuts to get ChatGPT into their smart home. It’s a lot easier to talk to a smart speaker than type into a web browser.
But do we really want this type of artificial intelligence in our homes? Is our desire for a voice assistant that “just works” so great that we’d be happy with one that might also try to teach my eight-year-old about quantum physics? Personally, I think a reliable, voice-controlled smart home system that knows what I mean when I say “Turn off the goddam lights” is the Holy Grail here, not an omniscient intelligence running my home.
While the promise of an inherently competent, eminently intuitive voice assistant — a flawless butler for your home — is very appealing, I fear the reality could be more Space Odyssey than Downton Abbey. But let’s see if I’m proven wrong.
Updated Monday, February 27, 10:25AM: Added clarifying points about how the Josh.ai system works.