Archiving the Internet in the Trump Era
On a recent Friday afternoon, around 70 librarians, programmers, and graduate students lined up for tacos in the basement of San Francisco’s Internet Archive. The gathering was a weekly custom, a chance for the curious to tour the digital library born in the early days of the web. Today, the lunch was filled to capacity. Donald Trump’s victory spawned dozens of “guerrilla archiving” events across the country, organized by university professors, scientists, and computer engineers who feared what might happen to research living on government websites once the new president took charge. Since the election, more people had been calling and emailing and visiting the Internet Archive than anyone could remember.
In the middle of the room, three couches encircled a wide flat-screen monitor. On a cluttered side table: two Nintendo Game Boys from the early 1990s, a stack of game cartridges, and a 51-year-old copy of The Radio Amateur’s Handbook. Old hard drives, VCRs, and audio equipment were piled up on metal shelves. Near the windows, a long banner commemorated the day the archive backed up ten petabytes, or 10,000,000,000,000,000 bytes, of web pages.
A staff member rang a cowbell to signal it was time to eat. Then introductions, and an icebreaker: What was the most embarrassing web address or username you ever owned?
The Internet Archive has been saving copies of the web for almost as long as the web has been around. Brewster Kahle, the archive’s founder, studied artificial intelligence at MIT in the 1980s. Later he helped found two technology companies: Wide Area Information Server, a system for text-searching databases on remote computers, which was bought by AOL, and Alexa Internet, which helped catalog the web and was acquired by Amazon. Kahle launched the Internet Archive in 1996, in a San Francisco attic. Over the years, a few computers have blossomed into one of the largest digital libraries in the world, encompassing 279 billion web pages, 12 million books, and millions more copies of music, films, television shows, and software. (In the lobby, a new arcade machine lets visitors play 500 vintage games from the past 40 years.)
After lunch, a small crowd gathered around Mark Graham, a cheery man wearing a yellow button-down and khakis. Graham is the director of the Wayback Machine, an online tool the archive launched in 2001 that captures websites as they existed in a particular moment, allowing them to be viewed later — like a time capsule. Graham led the group up a set of gold-carpeted stairs and into the great room, a vast space with a domed ceiling and tall, sun-filled stained glass windows. When Kahle bought the building from the Church of Christian Scientists in 2009, he took it as fate that it resembled the archive’s logo: a classical Greek facade, meant to symbolize the group’s mission as a temple for “universal access to all knowledge.”
Graham directed the group’s attention to the back of the room, where six stacks of black computer servers hummed along, lights blinking — most of the archive’s collection is stored there. A typical web page lasts 100 days before being altered or deleted, Graham explained, and it was this ephemeral nature of the web that inspired the archive to begin preserving it.
On the day after the election, Kahle published a blog post addressed to the Internet Archive’s supporters. “I am a bit shell-shocked — I did not think the election would go the way it did,” he wrote. “As we take the next weeks to have this sink in, I believe we will come to find we will have new responsibilities, increased roles to play, in keeping the world an open and free environment.”
The archive had already started backing up copies of every government website that existed during the Obama administration, a practice it began at the end of George W. Bush’s presidency. And this January, it released a searchable database containing 520 hours of Trump’s televised speeches, interviews, and news broadcasts.
Still, the archive’s staff were not prepared for the spike in public attention after Trump’s election. A few days after the inauguration, Reuters reported that White House officials had ordered the Environmental Protection Agency to take down its climate change page. People sent messages to the archive, asking whether it planned to preserve the information. Similar questions came when the Department of Agriculture abruptly removed thousands of documents from its website, including animal welfare inspection records for some 9,000 labs, zoos, and breeders across the country. “We have all that,” Graham said. Lately the archive has started receiving phone calls from people claiming to have inside information about government websites under threat of getting scrubbed.
Kahle once said that the entire internet could fit inside one cargo container. It is now vastly bigger, and also more fragmented and cloistered, with the rise of YouTube, social media, mobile apps, and fleeting content like Snapchat. All of that makes the work of internet archiving increasingly complicated. And as ever more things are born digital, they become more fragile, Graham told me. He leaned over to touch a shelf full of vinyl records. “This right here, this is harder to destroy.”
Graham believes that the most endangered parts of the internet live beyond American borders. In the fall, the Internet Archive partnered with a physicist from Turkey to help recover the work of dozens of news outlets that had been banned by the Turkish government after the failed coup in 2016. Graham is now looking for volunteers to help him identify digital information that may be at risk in places like Russia, Ethiopia, and India. At least so far, he said, in the United States “we’ve made assumptions that we value history, truth, and facts.”
Still, a few weeks after the election, Kahle announced that the Internet Archive would build a mirror copy of its collection in Canada. The new Trump administration promised radical change, he wrote, and for the archive, that meant “preparing for a web that may face greater restrictions.” Kahle wrote that the Library of Alexandria, once the largest library in the world, is now “best known for its disappearance” and noted that the Internet Archive, like any library, is susceptible to earthquakes, changing legal regimes, and institutional failure. The archive, he reminded everyone, exists so that no one can change the past simply because there is no digital record. “The web needs a memory,” he said. “The ability to look back.”