
Researchers develop Bookworm-Arxiv searchable scientific journal database

Researchers at Harvard and Princeton are working on a new breed of tools to help us categorize and analyze our collective body of scientific literature and other written works.

Moving our literature and written content to the digital realm brings with it tremendous possibilities for access and collective insight, and the New York Times takes a look at several groups working on tools to let us take advantage of the shift. At Harvard, a group called The Cultural Observatory is preparing to deploy a searchable database of over 740,000 scientific papers. The system, called Bookworm-Arxiv, lets users search for a handful of keywords and returns a graph of how often those terms were used over time, providing insight into the origins and dissemination of certain schools of academic thought. Even better, because the repository itself is free and open, users can drill down into the papers themselves for instant access. The Cultural Observatory previously worked with Google on the Ngram Viewer, which provides similar search capabilities for Google's vast array of scanned books, but because of copyright concerns Ngram Viewer cannot provide access to many of those texts, limiting its overall utility.
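
To give a rough sense of what a keyword-over-time search involves, here is a minimal Python sketch. It is not Bookworm-Arxiv's actual code: the toy corpus, the `term_usage_by_year` function, and the choice to measure usage as a share of each year's words are all assumptions made for illustration, and it handles single-word terms only.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (year, text) pairs standing in for paper abstracts.
PAPERS = [
    (1995, "string theory and quantum gravity"),
    (1995, "quantum field theory on a lattice"),
    (2005, "dark energy and the accelerating universe"),
    (2005, "string theory landscape and dark energy"),
]


def term_usage_by_year(papers, terms):
    """Return {term: {year: share of that year's words equal to the term}}."""
    counts = defaultdict(Counter)  # year -> per-word counts
    totals = Counter()             # year -> total number of words
    for year, text in papers:
        words = text.lower().split()
        counts[year].update(words)
        totals[year] += len(words)
    return {
        term: {
            year: counts[year][term.lower()] / totals[year]
            for year in sorted(totals)
        }
        for term in terms
    }


usage = term_usage_by_year(PAPERS, ["string", "dark"])
for term, series in usage.items():
    print(term, series)
```

Each term's per-year values are the points a tool like this would plot as a line on its graph. Dividing by the year's total word count keeps years with more papers from dominating the chart.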

The founder of Arxiv is also incorporating "probabilistic topic modeling," courtesy of Princeton University's David M. Blei. Blei helps craft computer algorithms that digest massive amounts of literature, books, and scientific journals to determine their focus and themes. Blei sees this new breed of tools as one that can help us gain greater understanding from our collective written output than we could on our own. "We don't have the human power to read and tag all this information," Blei told the Times. "Human categorization can only go so far."