Here’s how Google’s experimental 3D telepresence booth works

An 8K screen, four GPUs, four microphones, and a whole bunch of cameras

Google’s promotional video for Starline, released in May.

In a new research paper, Google has detailed the tech behind its impressive Project Starline demo from this year’s I/O conference. Project Starline is essentially a 3D video chat booth that aims to replace a one-on-one 2D video conference call with an experience that feels like you’re actually sitting in front of a real human being. 

It sounds simple, but Google’s research paper highlights just how many challenges there are in tricking your brain into thinking there’s a real human being sitting just a few feet away from you. Obviously the image needs to be high resolution and free of distracting artifacts, but it also needs to look correct from your relative position in the booth. Audio is another challenge, as the system needs to make it sound like a person’s words are coming from their actual mouth. And then there’s the small matter of eye contact.

But the eventual hope is that Project Starline could offer a feeling of presence similar to that of virtual or augmented reality, without users needing to wear bulky headsets or trackers.

The display unit and its various tracking hardware.
Image: Google

The paper details exactly how much hardware is needed to start solving these problems. The system is built around a large 65-inch 8K panel running at 60Hz. Around it, Google’s engineers have arranged three “capture pods” capable of recording both color imagery and depth data. The system also includes four additional tracking cameras, four microphones, two loudspeakers, and infrared projectors. Color images are captured from four viewpoints, along with three depth maps, for a total of seven video streams. Audio is captured at 44.1kHz and encoded at 256kbps.
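
For a sense of scale, here’s a rough Python sketch that simply tallies the streams described above. The structure and names are illustrative, and the assumption that the fourth color viewpoint comes from outside the three pods is mine, not a detail from Google’s paper.

```python
# Tally of the capture rig as described: three "capture pods" that each
# produce a color image and a depth map, plus one additional color viewpoint
# (assumption: it comes from a separate camera, not one of the pods), and
# mono audio captured at 44.1kHz, encoded at 256kbps.
from dataclasses import dataclass

@dataclass
class CapturePod:
    color_streams: int = 1
    depth_streams: int = 1

pods = [CapturePod() for _ in range(3)]   # three color + depth capture pods
extra_color_viewpoints = 1                # fourth color viewpoint (assumed separate)

color = sum(p.color_streams for p in pods) + extra_color_viewpoints
depth = sum(p.depth_streams for p in pods)
print(f"color: {color}, depth: {depth}, total video streams: {color + depth}")
# -> color: 4, depth: 3, total video streams: 7

# Encoded audio footprint at the stated 256kbps:
audio_kbps = 256
print(f"encoded audio: ~{audio_kbps * 60 / 8 / 1000:.2f} MB per minute")  # ~1.92 MB/min
```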

Obviously all of this hardware generates a lot of data that needs to be transmitted, and Google says that transmission bandwidth ranges anywhere from 30Mbps up to 100Mbps, depending on “the texture detail in the user’s clothes and the magnitude of their gestures.” So it’s significantly more than a standard Zoom call, but nothing a typical office in a metropolitan area couldn’t handle. Project Starline is equipped with four high-end Nvidia graphics cards (two Quadro RTX 6000 cards and two Titan RTXs) to encode and decode all this data. End-to-end latency reportedly averages 105.8 milliseconds.
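
To put those figures in context, here’s a quick back-of-envelope calculation. The ~3Mbps figure for an ordinary HD video call and the ~150ms conversational-latency guideline are my own rough points of comparison, not numbers from Google’s paper.

```python
# Back-of-envelope math on the quoted figures: 30-100Mbps of transmission
# bandwidth and an average end-to-end latency of 105.8ms.
def gb_per_hour(mbps: float) -> float:
    """Sustained bitrate in Mbps -> gigabytes transferred per hour (decimal GB)."""
    return mbps * 3600 / 8 / 1000

for label, mbps in [("Starline (low end)", 30),
                    ("Starline (high end)", 100),
                    ("typical HD call (assumed)", 3)]:
    print(f"{label:>26}: {mbps:5.0f} Mbps ~ {gb_per_hour(mbps):5.1f} GB/hour")
# -> roughly 13.5 to 45 GB/hour for Starline vs. ~1.4 GB/hour for an assumed 3Mbps call.

# The 105.8ms average sits under the ~150ms one-way delay commonly cited
# (e.g. in ITU-T G.114) as comfortable for conversation -- my reading, not
# a claim made in the paper.
latency_ms = 105.8
print(f"headroom vs. ~150 ms guideline: {150 - latency_ms:.1f} ms")
```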

The system is made up of a backlight unit and display unit.
Image: Google

The way Google’s research paper tells it, employees who’ve used Starline across the three sites where it’s been installed found it better than traditional videoconferencing at creating a feeling of presence and personal connection, as well as at holding attention and gauging reactions. The company says that over nine months, 117 participants held a total of 308 meetings in its telepresence booths, with an average meeting length of just over 35 minutes.

It all sounds very promising, but as yet there’s no indication of when, or even if, the system might be commercialized. There’s also very little information about how much Starline’s extensive array of hardware would actually cost (although Table 4 in the research paper lists the tracking and display hardware it uses, if you fancy doing some math). For now, Google says it’s expanding Project Starline’s availability “in more Google offices around the United States.”