What can Python and the New York Times tell us about gender?

The New York Times hq (1020)

How much can a handful of Python scripts reveal about our world? Neal Caren, an assistant professor at the University of North Carolina at Chapel Hill, decided to see how The New York Times writes about men and women. When you open a newspaper, how much of it is devoted to discussing each gender? Which topics are "male," and which are "female"? And what does that tell us about the world we live in?

Caren started out by identifying words that would indicate the gender of a given person. Instead of working from scratch, he relied on the word list from a well-known browser extension called Jailbreak the Patriarchy, which swaps gendered words on a web page ("he" becomes "she," "wife" becomes "husband," and so forth). From there, he pulled a full week of New York Times articles from late February and early March 2013, a total of around 1,400 pieces, excluding corrections and paid obituaries. Using his scripts, Caren computed some general metrics as well as specifics about the kinds of words that were used.
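The sentence-tagging step described above could be sketched roughly as follows. This is a minimal illustration, not Caren's actual script: the word lists are short, made-up stand-ins for the extension's list, the sentence splitter is deliberately naive, and the names `split_sentences` and `classify_sentence` are hypothetical.

```python
import re

# Illustrative word lists only (assumption); the real list came from the
# Jailbreak the Patriarchy extension and is much longer.
MALE_WORDS = {"he", "him", "his", "man", "men", "husband", "father", "son", "mr"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "wife", "mother",
                "daughter", "ms", "mrs"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace.

    This will wrongly split on abbreviations such as "Mr. Smith".
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_sentence(sentence):
    """Label a sentence 'male', 'female', 'both' or 'none' by its gendered words."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    has_male = bool(words & MALE_WORDS)
    has_female = bool(words & FEMALE_WORDS)
    if has_male and has_female:
        return "both"
    if has_male:
        return "male"
    if has_female:
        return "female"
    return "none"


sample = "He met the governor on Monday. She published a memoir. The bank closed early."
labels = [classify_sentence(s) for s in split_sentences(sample)]
print(labels)
```

Sentences that mention both genders, or neither, are kept apart here so that they do not count toward either group; whether the original analysis treated them that way is an assumption.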

The results were, in many cases, predictable and depressing. For every sentence referencing women, 3.2 sentences referenced men, or a total of around 29,700 versus 6,200. After determining the thousand most common words used in these sentences, he removed items such as proper nouns, weighted the counts to compensate for the larger number of male sentences, and found the 50 words that were most disproportionately associated with each gender. For men, those were largely sports or political words: "male" sentences got 61 mentions of "governor," for example, compared to 2 mentions for "female" ones. "Baseball," "teammates," "bank," "economy," and "political" also skewed heavily male.
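One way to picture the weighting and ranking step is the sketch below. It rests on several assumptions the article does not confirm: counts are normalized by the number of sentences in each group, add-one smoothing is used so that a word missing from one group does not cause a division by zero, and proper-noun filtering is left out. The function names `word_counts` and `skew_ratios` are hypothetical.

```python
import re
from collections import Counter


def word_counts(sentences):
    """Count lowercase alphabetic tokens across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z]+", sentence.lower()))
    return counts


def skew_ratios(male_sentences, female_sentences, top_n=1000):
    """Return {word: male rate / female rate} for the top_n most common words.

    Rates are per sentence, which compensates for one group being larger.
    Both sentence lists must be non-empty.
    """
    male = word_counts(male_sentences)
    female = word_counts(female_sentences)
    common = [word for word, _ in (male + female).most_common(top_n)]
    ratios = {}
    for word in common:
        # Add-one smoothing (an assumption) keeps both rates above zero.
        male_rate = (male[word] + 1) / len(male_sentences)
        female_rate = (female[word] + 1) / len(female_sentences)
        ratios[word] = male_rate / female_rate
    return ratios


male_sents = ["he met the governor", "the governor said he agreed", "he plays baseball"]
female_sents = ["she wrote a novel", "she read the novel aloud"]
ratios = skew_ratios(male_sents, female_sents)

# Words sorted from most male-skewed to most female-skewed.
ranked = sorted(ratios, key=ratios.get, reverse=True)
print(ranked[:3], ranked[-3:])
```

A ratio above 1 marks a word that appears at a higher rate in male sentences, and a ratio below 1 marks one that skews female. In practice the gendered words themselves ("he," "she") would also need to be dropped before ranking.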

On the female side, words tended to relate to fashion, entertainment, or women's reproductive capabilities. "Memoir," "novel," "fashion," and "singing" were all female-skewed words, as were "gender," "kids," and "abortion." The words "victim," "cancer," and "violence" were also female-skewed. As with many terms, the raw counts were roughly equal for men and women, but because there were far fewer female sentences, these words made up a much larger proportion of women's mentions.

"To be honest, I was a little shocked at how stereotypical the words used in the women subject sentences were," Caren writes. His findings, however, are hardly exhaustive. For one thing, the most disproportionately used words weren't the most common words for either men or women, many of which likely showed more overlap between genders. His data was also shaped by current events: "suffrage," one of the more disproportionate words, likely showed up because it was the hundredth anniversary of the Woman Suffrage Parade.

What these scripts do give us is an interesting look at the most specifically masculine or feminine topics we're likely to see covered, whether because of editorial choice or as a reflection of the real world. Caren's work is similar to that of media theorists who have looked at how often women appear in fiction, or to real-world comparisons of women's work or wages with those of men. Together, these studies are a continuing reminder that we often live in a heavily gendered world, and that in many cases this comes with a deep imbalance of power.