Waymo is making some of its self-driving car data available for free to researchers

12 million 3D labels, 1,000 driving segments, and a partridge in a pear tree

Photo by Vjeran Pavic / The Verge

The data collected by self-driving cars used to be a closely guarded secret. But recently, many companies developing autonomous driving systems have begun to release their data to the research community in dribs and drabs. The latest to do so is Waymo, the self-driving unit of Alphabet, which today is making some of the high-resolution sensor data gathered by its fleet of autonomous vehicles available to researchers.

Waymo says its dataset contains 1,000 driving segments, each capturing 20 seconds of continuous driving. At a capture rate of 10 Hz, those clips add up to 200,000 frames per sensor, which will allow researchers to develop their own models to track and predict the behavior of everyone using the road, from drivers to pedestrians to cyclists.
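
For readers who want to check the math, the frame count follows directly from the figures Waymo quotes. Here is a minimal sketch in Python; the constant and function names are made up for illustration and are not part of any Waymo tooling.

```python
# Figures quoted by Waymo in its announcement.
NUM_SEGMENTS = 1_000      # driving segments in the dataset
SEGMENT_SECONDS = 20      # seconds of continuous driving per segment
SAMPLE_RATE_HZ = 10       # frames captured per second, per sensor


def frames_per_sensor(num_segments: int, segment_seconds: int, sample_rate_hz: int) -> int:
    """Total frames one sensor contributes across all segments."""
    return num_segments * segment_seconds * sample_rate_hz


# Frames in a single 20-second clip, then across the whole dataset.
frames_per_segment = SEGMENT_SECONDS * SAMPLE_RATE_HZ
total_frames = frames_per_sensor(NUM_SEGMENTS, SEGMENT_SECONDS, SAMPLE_RATE_HZ)

print(f"Frames per segment: {frames_per_segment}")
print(f"Frames per sensor across the dataset: {total_frames:,}")
```

Each 20-second clip works out to 200 frames per sensor, and 1,000 such clips give the 200,000 total.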

The data was collected by Waymo’s fleet in four cities: San Francisco, Mountain View, Phoenix, and Kirkland, Washington. It includes images captured by each vehicle’s sensors, which include LIDAR, cameras, and radar. Images containing vehicles, pedestrians, cyclists, and signage have been carefully labeled, for a total of 12 million 3D labels and 1.2 million 2D labels.

“To me, it’s a bit of a labor of love,” Drago Anguelov, Waymo’s head of research, said in a briefing with reporters on Tuesday. “I think that also creating such a data set is a lot of work. And it takes many months to label the data, ensure that all the relevant parts are to the highest standards that one expects, making sure that the right utilities are available for researchers to be able to make progress without being hamstrung.”

Waymo isn’t the first company to release an open dataset. In March, Aptiv became one of the first large AV operators to publicly release a set of its sensor data. Uber and Cruise, the autonomous division of General Motors, have also released their AV visualization tools to the public. In June at the Computer Vision and Pattern Recognition conference in Long Beach, Waymo and Argo AI both said they would release their datasets, too. Argo’s is out. Today’s announcement is Waymo making good on that promise.

Waymo also claims its dataset is more detailed and nuanced than those released by other companies. Most of the previous datasets have been limited to camera data. Aptiv’s nuScenes dataset included LIDAR and radar data in addition to camera images. Waymo is providing data from five LIDAR sensors, compared to just one in the Aptiv dataset.

And this is just the first step; Anguelov says Waymo intends to release further datasets in the future.