If you've ever wanted to take a look at raw data produced by the Large Hadron Collider, but lack the necessary physics PhD, here's your chance: CERN has published more than 300 terabytes of LHC data online for free. The release covers roughly half the data collected by the LHC's CMS detector during 2011, with a press release from CERN explaining that this amounts to about 2.5 inverse femtobarns of data, or around 250 trillion particle collisions. Best not to download this on a mobile connection then.
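For the curious, getting from inverse femtobarns to a collision count is simple arithmetic: multiply the integrated luminosity by the cross-section, which is the effective "target size" of a proton-proton collision. The sketch below is a back-of-the-envelope check, not CERN's own calculation. The figure of 100 millibarns is an assumption on our part, chosen because it is close to published measurements of the total proton-proton cross-section at 7 TeV, and the function name is purely illustrative.

```python
# Back-of-the-envelope check: collisions = integrated luminosity x cross-section.
# ASSUMPTION: a total proton-proton cross-section of ~100 millibarns at 7 TeV.
# The value is not taken from CERN's press release.

FEMTOBARN_IN_BARNS = 1e-15  # 1 fb = 1e-15 b, so 1 fb^-1 = 1e15 b^-1
MILLIBARN_IN_BARNS = 1e-3   # 1 mb = 1e-3 b


def expected_collisions(integrated_lumi_inv_fb, cross_section_mb):
    """Estimate the collision count (hypothetical helper for illustration)."""
    lumi_inv_barn = integrated_lumi_inv_fb / FEMTOBARN_IN_BARNS
    sigma_barn = cross_section_mb * MILLIBARN_IN_BARNS
    return lumi_inv_barn * sigma_barn


n = expected_collisions(2.5, 100.0)
print(f"{n:.2e} collisions")
```

Under that assumption, the result comes out at roughly 250 trillion, in line with the figure quoted above. A smaller cross-section, such as one counting only inelastic collisions, would give a proportionally smaller number.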
Despite the intimidating nature of the data, CERN has made it as digestible as possible. The information is available for download in two formats: "primary datasets" used by CERN researchers, and lightweight "derived datasets" intended for a wider audience. CERN says the latter "requires a lot less computing power [to process] and can be readily analyzed by university or high-school students." To help with this, the organization has also made free to download a virtual machine image based on CernVM, its in-house virtual machine environment, which comes preloaded with the software needed to analyze the CMS data.
An "event display" of a particle collision in the LHC. (Image credit: CERN / CMS)
"Once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly," said CMS physicist Kati Lassila-Perini, who leads the detector's data-preservation efforts. "The benefits are numerous, from inspiring high-school students to the training of the particle physicists of tomorrow. And personally, as CMS’s data-preservation co-ordinator, this is a crucial part of ensuring the long-term availability of our research data."
Releases like this are about more than just CERN's commitment to transparency and data preservation. Back in 2014, 27 terabytes of LHC data collected in 2010 were published, which led physicists around the world to examine aspects of particle collisions that CERN's own researchers had not had time to cover. In a press release, CMS physicist Salvatore Rappoccio said these publications "[provide] a scientific benefit to our field as a whole. While it is a difficult and daunting task with much left to do, the release of CMS data is a giant step in the right direction."