We saw Spot run, jump, and even dance... but now we can see Spot talk. In a somewhat unsettling video posted by Boston Dynamics, we see its robot dog outfitted with a top hat, mustache, and googly eyes as it chats with staff members in a British accent, taking them on a tour of the company’s facilities.
“Shall we commence our journey?” Spot asks. “The charging stations, where Spot robots rest and recharge, is our first point of interest. Follow me, gentlemen.” As shown in the demo, Spot is capable of answering questions and even opens its “mouth” to make it seem like it’s actually speaking.
To make Spot “talk,” Boston Dynamics used OpenAI’s ChatGPT API, along with some open-source large language models (LLMs), to carefully shape its responses. It then outfitted the bot with a speaker, added text-to-speech capabilities, and made its mouth — er... gripper — mimic speech “like the mouth of a puppet.”
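Boston Dynamics hasn’t published how the puppet-mouth effect works, but one plausible approach is to map the loudness of the speech audio to how far the gripper opens. The sketch below is purely illustrative — the function name, window size, and normalization are assumptions, not the company’s implementation:

```python
# Hypothetical sketch: drive a puppet-style "mouth" from speech audio.
# Maps short-window audio loudness (RMS) to a gripper aperture in [0, 1].
import math

def mouth_openings(samples, window=160, max_amp=1.0):
    """Return one gripper aperture per audio window, normalized to [0, 1]."""
    openings = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        openings.append(min(rms / max_amp, 1.0))
    return openings

# A loud stretch of audio should open the "mouth" wider than a quiet one.
quiet = [0.05 * math.sin(i / 5) for i in range(160)]
loud = [0.8 * math.sin(i / 5) for i in range(160)]
apertures = mouth_openings(quiet + loud)
```

In a real system these aperture values would be streamed to the gripper in sync with audio playback, so the “mouth” moves while the speaker plays the synthesized voice.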
Matt Klingensmith, the principal software engineer at Boston Dynamics, says the team gave Spot a “very brief script” for each of the rooms at its facilities. The bot then combined that script with the imagery it gets from the cameras on its gripper and body, allowing it to “get more information about what it sees before generating a response.” According to the company, Spot uses Visual Question Answering models to essentially caption images and answer questions about them.
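The workflow Klingensmith describes — a brief per-room script, plus a caption of what the cameras currently see, fed to the LLM together with the visitor’s question — can be sketched as simple prompt assembly. The function name and prompt wording below are illustrative assumptions, not Boston Dynamics’ actual code:

```python
# Hypothetical sketch of the prompting approach described above: combine a
# brief per-room script with an auto-generated image caption (standing in
# for the Visual Question Answering model's output) before querying the LLM.

def build_tour_prompt(persona, room_script, image_caption, visitor_question):
    """Assemble a single prompt from the per-room script and camera caption."""
    return (
        f"You are a tour-guide robot with this persona: {persona}.\n"
        f"Room notes: {room_script}\n"
        f"What your camera sees: {image_caption}\n"
        f"Visitor asks: {visitor_question}\n"
        "Answer in character, in one or two sentences."
    )

prompt = build_tour_prompt(
    persona="fancy butler with a British accent",
    room_script="The charging stations, where Spot robots rest and recharge.",
    image_caption="Several Spot robots docked on charging pads.",
    visitor_question="What happens in this room?",
)
# The resulting string would then be sent to a chat model (e.g. via the
# ChatGPT API); the network call itself is omitted here.
```

The appeal of this structure is that the same tiny script works for every persona — only the `persona` field changes between the butler, the archaeologist, and the Shakespearean time traveler.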
The “fancy butler” is not the only persona Spot assumes during the video. The four-legged bot also takes on the personality of a 1920s archaeologist, a teenager, and a Shakespearean time traveler. It even adopts a sarcastic persona that, when asked to come up with a haiku, offered: “Generator hums low in a room devoid of joy. Much like my soul.”
Boston Dynamics says it uncovered a few surprises when experimenting with Spot as a tour guide. In one instance, the team asked Spot who its “parents” were, and it went over to where the older Spot models are displayed in the company’s office. The company also notes that it still ran into some instances where the LLM made things up, such as suggesting that Stretch, its robot designed to move boxes, was made for yoga.
“We’re excited to continue exploring the intersection of artificial intelligence and robotics,” Klingensmith writes in a post on Boston Dynamics’ site. “These models [LLMs] can help provide cultural context, general commonsense knowledge, and flexibility that could be useful for many robotics tasks — for example, being able to assign a task to a robot just by talking to it would help reduce the learning curve for using these systems.”