Why’d You Push That Button is back for season 3, and our first episode is a relatively serious one. Vox’s Kaitlyn Tiffany and I catch up on our summers and then dive into everyone’s favorite social media platform: Twitter. We need to discuss tweets. Are they worth deleting, or should we preserve our limited-character history? Who needs to worry about their tweets? What happens if a potential employer searches your Twitter? What will they find?
Kaitlyn and I reflect on our tweet history, and we take it to other users and experts. First, we talk to Max Read, an editor at New York Magazine, and then we chat with Brianna Wu, a woman who ran for Congress this year and was previously a target of Gamergate. Then we talk to Alison Green of the Ask A Manager website / book / podcast universe. (She is Ask A Manager!) And we wrap the show chatting with Mark Graham, director of the Wayback Machine, which attempts to archive the web. It’s true: you could think you deleted a tweet only to discover someone else on the internet has already saved it for you. A truly spooky possibility in the spirit of Halloween.
You can listen to the show below and follow along with Mark’s interview transcript. Feel free to subscribe anywhere you typically get your podcasts. You know our usual places: Apple Podcasts, Pocket Casts, Spotify, Google Podcasts, and our RSS feed. Subscribe your friends, too! Steal their phones and just sign them up for the podcast; they’ll love it.
Ashley Carman: First things first: can you explain what the Internet Archive is and what the Wayback Machine is?
Mark Graham, director of the Wayback Machine: So the Wayback Machine is part of the Internet Archive. The Internet Archive is a 21-year-old nonprofit in San Francisco. And for all of that time, we’ve been backing up large portions of the public web, and for much of that time, making those available through the Wayback Machine.
Ashley: Just to be clear: are tweets involved in the Wayback Machine or just the Internet Archive in general?
So we archive some tweets — not all of them and not even nearly most of them — but we do archive tens of millions of tweets every week, and we archive them from a variety of sources. For example, there’s a service on the Wayback Machine called “Save Page Now,” and anyone can go to web.archive.org, and they can put a URL into the Save Page Now feature, and they can archive that URL.
That’s used actually tens of millions of times a week to archive individual URLs, many of which are tweets. In addition to that, though, we do also archive tweets from various feeds. People have constructed their own lists of tweets to archive, and then there are other feeds that we also follow. In addition to archiving individual tweets, we archive the URLs that are in tweets so that can be a webpage that’s referred to in a tweet. It could be a YouTube video, for example. So as a result of us parsing URLs in tweets, we actually archive several hundred thousand YouTube videos every single week.
Kaitlyn Tiffany: So when people are submitting these URLs from tweets to save, what kinds of tweets are they typically saving?
It runs the gamut. Obviously, there are people who are passionate about a particular person or domain, a subject matter. It may be a politician. It may be a government or an NGO or a celebrity. We don’t know. That process is anonymous where people choose to save things to the Wayback Machine through the Save Page Now function. But the net result of it is that we end up archiving a lot of tweets. You may remember the 2016 election. At one point, Michael Flynn tweeted about Hillary Clinton, a reference to sex crimes with children. This was an actual tweet that he put out during the heat of the election that was accusing Hillary Clinton of being tied up with Pizzagate, basically. That tweet lasted for a while, and then it was removed, but not before it was archived to the Wayback Machine and made available. Available to journalists and others to be able to basically help set the record straight. Help hold people accountable for what they say in public.
Ashley: So do you just have Donald Trump’s Twitter feed on automatic?
Pretty much. That’s an easy one. It’s not just “we,” like lots of people do. We’re part of a community of people all over the world of web archivists. Some of them are supported by governments, and as I said earlier, by libraries, museums, NGOs, individual citizens that are working to preserve the things that they’re passionate about. That they think are most important.
Ashley: What do you think makes an archived URL better than a screenshot of a tweet, for example, or a screenshot of a website or something from the internet?
We archive more than a billion URLs a week here with the Wayback Machine, so a billion screenshots is a pretty tall order. But more importantly, it’s the ability to audit the capture. We refer to these as “captures,” and to maintain all of the associated information about the HTTP request, the headers, for example, the individual timestamps of each element of a given page. So as a result of this — a result of the fact that we’ve been doing this for a couple of decades and we do it in a very open fashion where our systems are well-documented, and we have public APIs, and there’s a lot of history and experience doing what we’re doing — there’s a high amount of confidence in the credibility of what we archive. Such that, many courts in the United States, for example, have ruled that archives from the Wayback Machine are admissible as evidence in courts.
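For readers curious what a timestamped capture and a public API look like in practice, here is a minimal Python sketch. It assumes the Wayback Machine's availability endpoint at archive.org/wayback/available and its 14-digit capture timestamps (YYYYMMDDhhmmss). The function names and the sample response are hypothetical, made up for illustration, and the sketch makes no network calls.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed public endpoint for checking whether a URL has been captured.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build a query URL asking for the capture closest to `timestamp`."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20161101" or a full 14 digits
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def parse_capture_timestamp(ts):
    """Turn a 14-digit capture timestamp into a datetime (assumed to be UTC)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def closest_capture(response):
    """Return (replay_url, captured_at) from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], parse_capture_timestamp(closest["timestamp"])


# Hypothetical response, shaped like the JSON the endpoint is assumed to return.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20161102030405/https://example.com/",
            "timestamp": "20161102030405",
            "status": "200",
        }
    }
}

print(availability_query("example.com", "20161101"))
print(closest_capture(sample))
```

The point of the timestamp in the replay URL is that every capture can be cited and checked on its own, which is much harder to do with a screenshot.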
Kaitlyn: Just extrapolating a bit here: that holistic archive of the internet then requires a pretty large-scale culture of participation, right?
It can, but again, I want to bring it back to you. We’re talking about the web here, not really the internet, right? Although we are working to expand into other protocols and other kinds of platforms. But it is a collaborative effort. That’s true. I highlighted the Save Page Now feature of the Wayback Machine, but of the more than a billion URLs that we archive every week, those that are initiated via Save Page Now by end users, really crowdsourced, that’s only a few tens of millions. There are many, many other hundreds of millions of URLs that we archive as a result of a series of processes that we’ve evolved over the decades. But we also have our subscription service at Internet Archive called Archive-It. And there are more than 600 partners. These are museums, libraries, governments, and others that use the Archive-It service to archive URLs from lists that they have handcrafted.
Ashley: Can you explain the difference between the internet and the web?
So the web is, generally speaking, what you get via a browser, and it rides on top of a lower-level infrastructure called the internet.
Kaitlyn: A little bit of what we talked about with our other guest was when the concept of an archive is almost weaponized. Specifically, during Gamergate, when people were going very, very far back in someone’s tweets in order to dig up something that they could use, devoid of its original context in order to make semi-public figures look really bad. Is that something that you think about or discuss, or how do you feel about that?
Sure, obviously we think about trying to help set the record straight. And a big part of that is context. So we care about the ability for people to be able to understand what they see on the web within an appropriate context. And that context may involve other tweets, other web-based resources, adaptions, webpages, etc. In fact, we have a project here at the Internet Archive to begin to stitch some of these components together.
We’ve created a Chrome extension, and today, it’s used primarily to help people archive webpages and then replay pages that they’ve archived. We’ve recently added some new features to this extension that, for example, allow someone to be on a webpage and then see the tweets that are made about that webpage. And we’re also working with organizations that are in the fact-checking space. So if someone is looking at a tweet or a webpage, we want to make it easier for people to get more context about what they’re looking at, and maybe there’s been some analysis done on that tweet or that webpage and someone has written up some background for it. So context is important, and certainly, as we rely more and more on getting information from the web and also from the internet at large about what’s really important in the world, being able to see these things within a larger context becomes more and more critical.
Ashley: So it sounds like, for you, a large part of this archive endeavor is about keeping people who are in power honest and accountable. Do you think that archiving what regular folks on the internet say is important? Do you think their tweets should be archived and why?
It’s hard to say before the fact. Like, I don’t know, you may do something really important in the world at some point. Something that you write might be critical for our understanding of what that is, but I guess as a practical matter, no, probably not.
Kaitlyn: What about you, personally? Do you tweet, and if you do, do you feel obligated to never delete?
I do tweet. @markgraham is my Twitter handle. No, I don’t feel obligated to never delete. I make spelling errors sometimes, and it’s embarrassing. And so sometimes I will just quickly delete a tweet, but it’s okay. I try to be open, and if I make a factual claim that I later learn is false, I’ll go back and attempt to set the record straight. But I don’t know, that’s just me. I mean, everyone’s going to be different about that. We don’t want to create a platform where we say, “Everything that you say is part of the record, and it could be taken out of context and held against you.”
Maybe that’s why I actually think it’s important that we attempt to record things and facts, so that the context can be there for review. You can look at a whole body of work. The other thing I’ll say about Twitter, in particular, is we’re talking about individual people and individual tweets, but the bigger picture here is all of the tweets. The tweets that are malicious. The tweets that are manufactured. The tweets that are part of what’s referred to as computational propaganda. They’re made by organizations that are attempting to influence some outcome in the world in a completely manipulative kind of way, and in order to more fully understand what’s going on, what’s happening in this incredible environment called Twitter, one has to be able to see large numbers of tweets and be able to see them within context. To be able to see, for example, where a tweet originates; what the history of that Twitter account has been; what other tweets they’ve made; how those tweets are being propagated and amplified throughout a network; of those amplification points, are they human beings or are they other known robotic entities; and who’s behind them; etc.
So this is something where we’re even now just beginning to understand what happened in 2016, for example, around this. There are a lot of reasons to have the ability to go back and look at what happens in social media that aren’t obvious in the moment, but become obvious and important after the fact. And since the Wayback Machine is, unfortunately, not a real time machine, we can’t go back and get stuff we didn’t get, so we’re going to continue to do the best job we can, day in and day out, to get as much as we can of the things that we think are probably going to be important going forward.