How Microsoft built its smart Surface camera

Microsoft always wanted to upgrade the Surface Hub 2 camera

“From day one of Surface Hub 2, we knew we were going to make our cameras smart,” explains Steven Bathiche, who oversees all hardware innovation for Microsoft devices, in an interview with The Verge. Microsoft’s surprise $799.99 Surface Hub 2 Smart Camera debuted last week, offering automatic reframing without the warping and distortions you might typically see on other conference room cameras.

It can detect faces and bodies to make sure everyone in a room is visible during meetings, whether they’re close to the camera or up to eight meters away. The Surface Hub 2 Smart Camera can see almost an entire conference room thanks to its 136-degree field of view, which keeps the people at the front in focus alongside those in the back.

Microsoft had always planned to upgrade its Surface Hub 2 camera, even before the pandemic put hybrid meetings into focus. That’s why it’s modular and can be detached from the top of the 55- or 85-inch displays. “We knew we were going to evolve the experience. We didn’t know exactly how, but we knew that was going to change and needed to change with people’s needs, the evolution of the conference room, and even how our culture will essentially adapt towards meetings,” says Bathiche.

Large devices like the 85-inch Surface Hub 2 presented challenges for capturing everyone in a meeting room with a traditional camera. “We needed a camera to handle bigger rooms,” says Bathiche, so Microsoft got to work.

Bathiche and his team have created Microsoft’s own optics, AI model, and edge computer to go into the Surface Hub 2 Smart Camera and power its computational photography. “It has onboard compute, 1 teraflops of compute that essentially houses a really large AI model that we’ve built,” says Bathiche. “It includes the autoframing application, it resides in the camera, so what comes out is just a 4K image so it literally looks like a webcam to the Surface Hub.”

That means all of the AI work is done on the camera itself and is never sent to the cloud, or even over the wire to the Surface Hub 2, for processing. The camera runs the AI model, processes all the data, and decides how to crop the image. Beyond framing everyone in a room automatically, the Smart Camera also uses tilt compensation to adjust the image for the camera’s position and create more natural eye contact. It can also remove the fisheye effect of wide-angle lenses so people don’t look distorted or stretched inside meeting rooms.

“We designed an 11 element, completely glass lens with super sharp focus and basically close to the refraction limits,” explains Bathiche. Behind the lens is a 12-megapixel sensor (4000 x 3000), paired with an f/1.8 aperture, which together generate the cropped 4K image. “The actual lens is a 184-degree field of view, so the camera can look behind itself.”
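
To put those numbers in context, here is a minimal arithmetic sketch in Python. It checks that a 4000 x 3000 sensor is 12 megapixels and shows how much room a 4K crop has to move around inside it. The article only says "4K," so the 3840 x 2160 (UHD) output size is an assumption, as are the function names; none of this reflects how Microsoft's firmware actually works.

```python
# Sensor resolution quoted in the article.
SENSOR_W, SENSOR_H = 4000, 3000

# ASSUMPTION: "4K" is taken to mean UHD, 3840 x 2160. The article does not
# give the exact output resolution.
UHD_W, UHD_H = 3840, 2160


def megapixels(width: int, height: int) -> float:
    """Return the pixel count in millions."""
    return width * height / 1_000_000


def crop_slack(sensor_w: int, sensor_h: int, crop_w: int, crop_h: int) -> tuple[int, int]:
    """Return how many pixels a native-resolution crop can shift
    horizontally and vertically while staying inside the sensor."""
    if crop_w > sensor_w or crop_h > sensor_h:
        raise ValueError("crop does not fit inside the sensor")
    return sensor_w - crop_w, sensor_h - crop_h


if __name__ == "__main__":
    print(megapixels(SENSOR_W, SENSOR_H))
    print(crop_slack(SENSOR_W, SENSOR_H, UHD_W, UHD_H))
```

Under that assumption, a full-resolution 4K window has little horizontal slack but plenty of vertical slack, which would fit with a camera that corrects for its mounting height and tilt.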

Microsoft built custom parts for its Surface Smart Camera.
Image: Microsoft

All of this hardware is nothing without the AI models that power the Surface Smart Camera, though. Microsoft started this project before the pandemic, but it had to train its AI models during the pandemic, which presented the obvious challenges of filling meeting rooms with people.

“We went to New Zealand because they had zero COVID-19 cases and we had offices there,” explains Bathiche. “We hired actors and actresses to do data collection in all sorts of rooms. Our data set is absolutely massive.”

Microsoft trained its AI model on faces and bodies to ensure it’s fully inclusive and will detect people who aren’t always facing the camera. The company even used synthetic people and faces to improve the data set’s diversity across situations and people. “We have a really cool internal technology that can generate synthetic data, so we were able to generate synthetic people and faces,” adds Bathiche.

The Smart Camera isn’t trained to detect pets or animals, though, so it shouldn’t try to automatically reframe a meeting if an office cat or dog steps into view. Microsoft has also applied its responsible AI principles to this project, which include a committee and set of tools to ensure the fairness and inclusiveness of AI.

The Surface Smart Camera has on-board compute.
Image: Microsoft

“If you look at our data set, it’s absolutely amazing across the board in terms of disparity between the different groups: race, gender, skin tone, hair styles, etc,” explains Bathiche. “I think one of the things that’s built into the camera that people might not see on the box is that robustness and inclusiveness that the model has.”

Bathiche says Microsoft has “sat there and tuned the heck” out of the autoframing capabilities of its Smart Camera over the past year to make sure it’s neither too jumpy nor so slow that it misses content. “Every frame that the camera gets, it decides whether it’s worth it to move or recrop the image.”
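
Microsoft hasn't said how that per-frame decision is made, but the trade-off Bathiche describes, between jumpy and sluggish, can be illustrated with a simple sketch. The Python below is hypothetical throughout: the `Box` type, the `next_crop` function, and the `margin`, `min_iou`, and `smoothing` values are all assumptions for illustration, not Microsoft's algorithm. It keeps the current crop while it still overlaps well with the people detected, and otherwise eases part of the way toward a new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in normalized image coordinates (0.0 to 1.0)."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


def union(boxes: list[Box]) -> Box:
    """Smallest box that contains every box in the list."""
    return Box(min(b.left for b in boxes), min(b.top for b in boxes),
               max(b.right for b in boxes), max(b.bottom for b in boxes))


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    inter = Box(max(a.left, b.left), max(a.top, b.top),
                min(a.right, b.right), min(a.bottom, b.bottom)).area()
    total = a.area() + b.area() - inter
    return inter / total if total > 0 else 0.0


def next_crop(current: Box, people: list[Box], margin: float = 0.05,
              min_iou: float = 0.8, smoothing: float = 0.2) -> Box:
    """Decide, for one frame, whether the crop is worth moving.

    All thresholds are hypothetical tuning knobs, not values from Microsoft.
    """
    if not people:
        return current  # nobody detected: hold the current framing

    # Ideal framing: everyone detected, plus a small margin, kept in bounds.
    u = union(people)
    target = Box(max(0.0, u.left - margin), max(0.0, u.top - margin),
                 min(1.0, u.right + margin), min(1.0, u.bottom + margin))

    # Close enough already: staying put avoids a jumpy picture.
    if iou(current, target) >= min_iou:
        return current

    # Otherwise move only a fraction of the way, so the reframe is gradual.
    def blend(a: float, b: float) -> float:
        return a + smoothing * (b - a)

    return Box(blend(current.left, target.left), blend(current.top, target.top),
               blend(current.right, target.right), blend(current.bottom, target.bottom))
```

In this sketch, raising `min_iou` or `smoothing` makes the framing react faster but risks jitter, while lowering them makes it steadier but slower to catch someone walking in. That is the kind of tuning the quote points to.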

You might be wondering if you could use this $799.99 camera on a regular Windows PC, but it’s not quite that simple. While all the compute and AI models are housed inside the Surface Hub 2 Smart Camera, it’s not really designed to be a regular webcam. “Its design point was specifically for Hub. The elevation, the angles, and the AI was designed for multiple people near and far,” explains Bathiche. “While you could technically design a mount and plug it into a PC, I don’t think it’ll work as good as you want it to.”

This isn’t the first time Microsoft has focused on improving its webcams and cameras, either. The Surface Pro X already has an AI-powered eye contact feature that makes it appear like you’re always making eye contact no matter where you’re looking during a video call. Apple added a similar FaceTime Attention Correction feature to iOS 13. “The algorithms we used in eye contact [for the Surface Pro X] are the same algorithms for the faces that we use inside this camera,” says Bathiche.

Microsoft clearly designed this Smart Camera for the Surface Hub 2, but with persistent rumors around Surface-branded webcams, it’s possible we’ll see a powerful webcam from Microsoft one day instead of the affordable ones that exist today. “This area of using computation to bring people together and make people feel like they’re in the same room... I think is something we’ve always been passionate about and will continue being passionate about, and we’ll continue evolving our devices as you see in the Surface Pro X,” says Bathiche.