Aggregating Twitter data to answer the age-old question: soda or pop?

Twitter's Edwin Chen aggregates data from the site to map one of the most well-known examples of US linguistic difference: how people refer to soft drinks.

As more people move their conversations online and into the public eye through social networks, gathering information about regional dialects is becoming easier than ever. Twitter data scientist Edwin Chen collected and analyzed tweets from around the world to map one of the best-known markers of linguistic difference: how people refer to soft drinks. First, he sampled location-tagged messages and filtered them for mentions of "soda," "pop," or "coke." Next, he used the surrounding words to confirm that each term referred to a drink ("drink a pop," for example) and to filter out references to Coke the brand. Finally, he aggregated the tweets by location and mapped the results.
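The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Chen's actual code: the article doesn't say which context words he relied on, so the drink cues (`DRINK_PATTERN`), the brand filter (`BRAND_PATTERN`), the `(location, text)` input format, and the sample tweets are all hypothetical stand-ins.

```python
import re
from collections import Counter, defaultdict

# Hypothetical cue phrases suggesting the term names a drink,
# e.g. "drink a pop", "bottle of soda", "can of coke".
DRINK_PATTERN = re.compile(
    r"\b(?:drinking|drink|drank|bottle of|can of|glass of)\s+"
    r"(?:a\s+|some\s+)?(soda|pop|coke)\b",
    re.IGNORECASE,
)

# Hypothetical cues suggesting "coke" means the Coca-Cola brand specifically.
BRAND_PATTERN = re.compile(r"\b(?:diet coke|coke zero|coca[- ]cola)\b", re.IGNORECASE)


def classify(text):
    """Return the generic soft-drink term used in `text`, or None."""
    if BRAND_PATTERN.search(text):
        return None  # skip explicit brand references
    match = DRINK_PATTERN.search(text)
    return match.group(1).lower() if match else None


def aggregate(geotagged_tweets):
    """Count generic-term usage per location.

    `geotagged_tweets` is an iterable of (location, text) pairs, assumed to be
    already restricted to tweets that carry a location tag.
    """
    counts = defaultdict(Counter)
    for location, text in geotagged_tweets:
        term = classify(text)
        if term is not None:
            counts[location][term] += 1
    return counts


def dominant_term(counts):
    """Map each location to its most frequent term (what a map would color)."""
    return {loc: c.most_common(1)[0][0] for loc, c in counts.items()}


# Made-up sample tweets for illustration only.
tweets = [
    ("Ohio", "gonna drink a pop after practice"),
    ("Ohio", "grabbing a bottle of pop"),
    ("New York", "need to drink some soda"),
    ("Georgia", "want to drink a coke"),
    ("Georgia", "diet coke is the best"),  # brand reference, filtered out
    ("Texas", "pop music all day"),        # not about drinks, filtered out
]

print(dominant_term(aggregate(tweets)))
# → {'Ohio': 'pop', 'New York': 'soda', 'Georgia': 'coke'}
```

Keyword matching plus a short list of context cues is crude, but it shows why the disambiguation step matters: without it, "pop music" would count as a vote for "pop," and every mention of Diet Coke would inflate the South's "coke" numbers.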

The results largely match those of similar earlier work. In the US, "pop" is used most in the Midwest, "soda" on the coasts, and "coke" in the South. Internationally, "coke" is the most popular term; Chen has also run the numbers for terms like "fizzy drink" worldwide, though these charts are most reliable for English-speaking countries. What's particularly interesting is comparing his methodology and results with those of the popular Pop vs. Soda Page, which relied on users submitting data to the site. Twitter linguistics isn't the future of all regional dialect mapping, but it provides a wealth of data for quick analysis.