
Watch this Nvidia demo and imagine actually speaking to AI game characters

When gaming and AI wholly collide... it’ll hopefully look this good but sound much better.

At Computex 2023 in Taipei, Nvidia CEO Jensen Huang just gave the world a glimpse of what it might be like when gaming and AI collide — with a graphically breathtaking rendering of a cyberpunk ramen shop where you can actually talk to the proprietor.

Seriously, instead of clicking on dialogue options, it imagines you could hold down a button, just say something with your own voice, and get an answer from a video game character. Nvidia’s calling it a “peek at the future of games.”

Unfortunately, the actual dialogue leaves a lot to be desired — maybe try GPT-4 or Sudowrite next time, Nvidia?

Here’s the entire conversation I hastily transcribed:

Player: Hey Jin, how are you?

Jin: Unfortunately not so good.

Player: How come?

Jin: I am worried about the crime around here. It’s gotten bad lately. My ramen shop got caught in the crossfire.

Player: Can I help?

Jin: If you want to do something about this, I have heard rumors that the powerful crime lord Kumon Aoki is causing all kinds of chaos in the city. He may be the root of this violence.

Player: I’ll talk to him, where can I find him?

Jin: I have heard he hangs out in the underground fight clubs on the city’s east side. Try there.

Player: OK, I’ll go.

Jin: Be careful, Kai.

Watching a single video of a single conversation, it’s hard to see how this is any better than picking from an NPC dialogue tree, but the impressive part is that the generative AI is reacting to natural speech. Hopefully Nvidia will release the demo so we can try it ourselves and get some radically different outcomes.

Screenshot by Sean Hollister / The Verge

The demo was built by Nvidia and partner Convai to help promote the tools that were used to create it — specifically a suite of middleware called Nvidia ACE (Avatar Cloud Engine) for Games that can run both locally and in the cloud. The entire ACE suite includes the company’s NeMo tools for deploying large language models (LLMs), Riva speech-to-text and text-to-speech, among other bits.

The demo uses more than just those, of course: it’s built in Unreal Engine 5 with loads of ray tracing, and it’s so visually stunning that the chatbot part feels lackluster to me by comparison. At this point, we’ve simply seen much more compelling dialogue from chatbots, trite and derivative as they can sometimes be.

Screenshot by Sean Hollister / The Verge

In a Computex pre-briefing, Nvidia VP of GeForce Platform Jason Paul told me that yes, the tech can scale to more than one character at a time and could theoretically even let NPCs talk to each other — but admitted that he hadn’t actually seen that tested.

It’s not clear if any developer will embrace the entire ACE toolkit the way the demo attempts, but S.T.A.L.K.E.R. 2: Heart of Chernobyl and Fort Solis will use the part Nvidia calls “Omniverse Audio2Face,” which tries to match a 3D character’s facial animation to their voice actor’s speech.

Correction, 11:25PM ET: It was Nvidia’s Jason Paul, not Rev Lebaredian, who answered my question. I regret the error.