Nvidia shows off AI model that turns a few dozen snapshots into a 3D-rendered scene


Starring: a would-be Andy Warhol

From 2D to 3D with the help of AI.
Image: Nvidia

Nvidia’s latest AI demo is pretty impressive: a tool that quickly turns a “few dozen” 2D snapshots into a 3D-rendered scene. In the video below you can see the method in action, with a model dressed like Andy Warhol holding an old-fashioned Polaroid camera. (Don’t overthink the Warhol connection: it’s just a bit of PR scene dressing.)

The tool is called Instant NeRF, referring to “neural radiance fields” — a technique developed by researchers from UC Berkeley, Google Research, and UC San Diego in 2020. If you want a detailed explainer of neural radiance fields, you can read one here, but in short, the method maps the color and light intensity of different 2D shots, then generates data to connect these images from different vantage points and render a finished 3D scene. In addition to images, the system requires data about the position of the camera.
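For the curious, the rendering step can be sketched in a few lines of Python. This is a simplified illustration of the compositing idea from the original 2020 NeRF work, not Nvidia's Instant NeRF code: it walks along a single camera ray and blends the color and density at each sample point into one pixel. In a real system those density and color values would come from a trained neural network; here they are hard-coded, and the function name `composite_ray` and its parameters are made up for the example.

```python
import math


def composite_ray(densities, colors, deltas):
    """Blend samples along one camera ray into a single RGB pixel.

    densities: how opaque the scene is at each sample point (assumed inputs)
    colors:    [r, g, b] emitted at each sample point (assumed inputs)
    deltas:    distance between consecutive sample points
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by earlier samples
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        # Whatever this sample absorbed is hidden from samples behind it.
        transmittance *= 1.0 - alpha
    return pixel


# A ray through empty space, then one that hits an opaque red surface
# sitting in front of a green one.
empty = composite_ray([0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])
blocked = composite_ray([1e9, 5.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 1.0])
```

Repeat that for every pixel from a given camera position and you have a rendered view, which is why the system also needs to know where each snapshot was taken from.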

Researchers have been improving this sort of 2D-to-3D model for a couple of years now, adding more detail to finished renders and increasing rendering speed. Nvidia says its new Instant NeRF model is one of the fastest yet developed, cutting rendering time from a few minutes to a process that finishes “almost instantly.”

As the technique becomes quicker and easier to implement, it could be used for all sorts of tasks, says Nvidia in a blog post describing the work.

“Instant NeRF could be used to create avatars or scenes for virtual worlds, to capture video conference participants and their environments in 3D, or to reconstruct scenes for 3D digital maps,” writes Nvidia’s Isha Salian. “The technology could be used to train robots and self-driving cars to understand the size and shape of real-world objects by capturing 2D images or video footage of them. It could also be used in architecture and entertainment to rapidly generate digital representations of real environments that creators can modify and build on.” (Sounds like the metaverse is calling.)

In a paper describing the work, Nvidia’s researchers said they were able to export scenes at a resolution of 1920 × 1080 “in tens of milliseconds.” The researchers also shared source code for the project, allowing others to implement their methods. It seems NeRF renders are progressing quickly, and could start having a real-world impact in the years to come.

Update March 25th, 3:50PM ET: Updated story with link to research paper and source code.