

Here’s how your favorite classic novel made a computer feel


Where big data meets high-school English class — plus Harry Potter

Juan Antonio F. Segal / Flickr

As anyone who’s wasted hours reading about plot devices on TV Tropes can attest, breaking fiction into its constituent parts is strangely satisfying. Combine this feeling with the cultural curiosity and anxiety around creative machines, and you’ve got something delightfully addictive: a data mining program that can express Shakespearian tragedy in the form of a line graph.

The data is part of a larger series on happiness at Hedonometer. In this particular happiness-measuring project, which MIT Technology Review discusses in detail, the researchers were trying to break down stories into basic narrative categories like rags-to-riches tales or tragedies. Their system scans prose and plays for positive and negative words, then balances the two categories to create an emotional timeline. In Romeo and Juliet, for example, "weep" and "hurt" suggest sadness, while "love" and "friend" push the needle toward happiness. Unsurprisingly, it thinks Romeo and Juliet starts fairly happily, offers a brief high, and then rolls straight down to its bitter climax.
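For a rough sense of how a word-scanning system like this could turn prose into a timeline, here is a minimal Python sketch. It is not the researchers' code: the word scores are made-up illustrative values on a 1 (negative) to 9 (positive) scale, and the function name, window size, and averaging rule are all assumptions chosen for the example.

```python
import re

# Hypothetical scores on a 1 (negative) to 9 (positive) scale.
# These numbers are illustrative only, not the project's real ratings.
TOY_SCORES = {"love": 8.0, "friend": 7.5, "weep": 2.0, "hurt": 2.0}


def emotional_timeline(text, scores, window=4, step=4):
    """Average the scores of rated words in successive windows of text.

    Returns one value per window; a window with no rated words yields None.
    """
    words = re.findall(r"[a-z']+", text.lower())
    timeline = []
    # Slide across the text, always producing at least one window.
    for start in range(0, max(len(words) - window + 1, 1), step):
        chunk = words[start:start + window]
        rated = [scores[w] for w in chunk if w in scores]
        timeline.append(sum(rated) / len(rated) if rated else None)
    return timeline


sample = "love friend love friend weep hurt weep hurt"
arc = emotional_timeline(sample, TOY_SCORES)
print(arc)
```

Plotting the returned list against window position would give a crude arc: high where upbeat words cluster, low where the weeping starts.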


The overall results, based on 1,737 popular titles from Project Gutenberg, are interesting and worth a look — even if they’re aiming at an incredibly amorphous target and require a lot of caveats. But by far the best part is that there’s a searchable interactive database containing every single entry. What does a computer program think about Frankenstein, Tess of the d'Urbervilles, or Hamlet? Now is your chance to find out.

Our first impulse at The Verge was to try to stump the tool, which seemed eminently doable, by trying it out on unreliable narrators, avant-garde metafiction, and (in a long shot) Donald Trump’s Twitter account. But since our examples were too recent for Project Gutenberg’s public domain library, we settled for looking up all the books we vaguely remembered reading in high school and college.

Dracula was my first pick because, as I recall, it’s both eventful and fairly straightforward. The chart bears this out to some extent: maybe the first big dip is our introduction to vampires, and the second is a particularly plot-relevant biting incident?

Dracula Emotional Story Arc

I barely remember Jane Eyre at all, but this graph makes it look fascinating. It’s nearly a sine wave of pronounced emotional highs and lows, leading up to a respectably happy ending:

Jane Eyre Emotional Story Arcs

My colleague Kaitlyn Tiffany pointed out that lumping together scripts and novels might be a mistake — the former might have less exposition surrounding the events, and they don’t necessarily follow the same dramatic structures. But what’s really odd is the occasional nonfiction book that sneaks in. Apparently John Stuart Mill’s political treatise On Liberty has a pretty smooth arc:

On Liberty Emotional Story Arc

Most of the charts at least roughly match my memories, but occasionally you’ll get something that doesn’t fit at all. Every problem in A Christmas Carol gets quickly resolved in the end, for example. But according to the emotional arc generator, it’s rather tragic:

Christmas Carol Emotional Story Arc

"We're just looking at bigger groups of words, so we'll miss events that are mentioned only in one sentence where the surrounding text does not reflect the same sentiment," says the study’s lead author, Andrew Reagan. Maybe A Christmas Carol’s ending was a little too quick.

The other wrinkle is a slider called the "lens" at the bottom. As Reagan explained to us, this controls which words the system uses to make its decision. Words are rated on a scale of 1 to 9, from negative to positive, and by default the lens is set to include only the particularly "strong" words at each end of the scale. As you pull the slider toward the center, you’ll change the shape of the chart by adding in more neutral and potentially less useful words, including the word "the," rated right in the middle. "The neutral words tend to dampen the sentiment detection," Reagan tells us. You can also click the bars in the right-hand corner if you’d like to look at the specific words that influence a section’s score.
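The dampening effect Reagan describes is easy to see in a toy example. The Python sketch below is an assumption-laden illustration rather than the project's implementation: the scores, the cutoffs of 4 and 6, and the function names `apply_lens` and `mean_score` are all hypothetical.

```python
# Hypothetical scores on the 1-to-9 scale; "the" sits in the middle,
# as the article describes. The values are illustrative only.
TOY_SCORES = {"love": 8.0, "weep": 2.0, "the": 5.0}


def apply_lens(scores, lower=4.0, upper=6.0):
    """Keep only words rated at or below `lower` or at or above `upper`.

    The cutoffs are assumed defaults for this sketch, not the tool's.
    """
    return {w: s for w, s in scores.items() if s <= lower or s >= upper}


def mean_score(words, scores):
    """Average the scores of the words that have a rating, or None."""
    rated = [scores[w] for w in words if w in scores]
    return sum(rated) / len(rated) if rated else None


passage = ["the", "love", "the", "the"]
with_neutral = mean_score(passage, TOY_SCORES)                 # neutral words included
strong_only = mean_score(passage, apply_lens(TOY_SCORES))      # neutral words filtered out
print(with_neutral, strong_only)
```

With the neutral word counted, three instances of "the" drag the average toward the middle of the scale; with the lens applied, only the strongly rated word is left to set the score.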


Fiction is a lot more complicated than a happy/sad binary. But the project is still a fantastic little tool for examining stories in a new way.

And finally, one bonus from outside Project Gutenberg: the entire Harry Potter series, book by book or plotted in one graph.

Harry Potter Story Arcs