The vast storage space the web offers, as well as the ease with which files can be moved or copied, has made archiving public documents far easier in some ways, but it also creates new problems. Paramount among these is link rot: the slow process by which addresses change, breaking hyperlinks from other pages and turning online bibliographies into useless strings of dead URLs. For the past five years, the Chesapeake Digital Preservation Group has documented link rot in four online legal archives, mapping the rate at which links to thousands of digital items become obsolete.
The Chesapeake Digital Preservation Group stores duplicate copies of documents from Harvard, the Maryland State Law Library, and other places, so the study pulls up a list of archived files and then checks whether the original links still work. It looks at two sets: one sample of about 600 that was archived in 2007 and 2008, and another set of 800 created from 2007 to 2012, representing the current state of the archive. In the latter set, the researchers found that 214, or nearly 26 percent, of the links no longer worked. The most recent ones, those from 2012, were fine, but link rot apparently began to set in after a year: it was seen in 8 percent of pages stored in 2011 and rose by 5 or 10 percentage points for each year further back. In the list of older links, about 38 percent had stopped working by 2012.
This study is limited to one small section of the web, and almost all the links were from state, government, or nonprofit top-level domains. This means it's not representative of the wider online world, or even of all online archives. Nonetheless, it's a great look at how much the web can change in just a few years and a reminder of the work that goes into making sure information is actually accessible once it's been put online.