Future researchers can rest easy: Know Your Meme, Urban Dictionary, Creepypasta and Cute Overload have all been preserved by the Library of Congress. So has the band website for They Might Be Giants and the entire published output of The Toast, the humor site that shut down in 2016.

And while the Library of Congress owns a rare print copy of the Gutenberg Bible, the web archive features the LOLCat Bible Translation Project, which rendered the Bible in LOLspeak.

For the past 20 years, a small team of archivists at the Library of Congress has been collecting the web, quietly and dutifully in its way. The initiative was born out of a desire to collect and preserve open-access materials from the web, especially U.S. government content around elections, which makes this the team’s busy season.

But the project has turned into a sweeping catalog of internet culture, defunct blogs, digital chat rooms, web comics, tweets and most other aspects of online life.

“Suddenly, these new technologies and social media platforms come in, and these new types of ways people were communicating or sharing data online,” said Abbie Grotke, who leads the archiving team and has worked for the program since 2002, two years after its founding. “And we had to keep up with it all. There’s always something new the web is throwing at us.”

March turned out to be particularly chaotic. With an entire team working from home, the web archivists are participating in an international project to collect content around the coronavirus, as well as adding to the library’s own collections about the pandemic. And, of course, it’s still technically campaign season.

“We do an all-hands-on-deck,” Ms. Grotke said. “And we don’t delete anything. We’re digital hoarders.”


[Images: We asked the Library of Congress digital archivists to riff on popular memes.]

The web archive team has grown from one librarian who used to read newspapers and circle mentions of websites to a staff of five, along with employees from other departments who pitch in. It is hardly adequate, given their monumental task.

Already the library has amassed more than 2.129 petabytes of data — or put another way, 18 billion digital documents. And that’s just a sliver of the internet.

“In the vastness of the web, what is the sampling of stuff that we can pull together that demonstrates what’s going on now?” said John Fenn, the head of research and programs at the American Folklife Center. He is also one of about 80 recommending officers, who make suggestions for the library’s archive — in Mr. Fenn’s case, for the Web Cultures collection. (It is one of several thematic groupings in the archive, along with the Webcomics collection, American Music Creators and dozens more.)

“It’s like whack-a-mole,” said Gina Jones, a digital projects coordinator on the team.

The criteria for selection typically used by print archivists — value to future scholars, uniqueness of the material — still apply to the web archivists, though the high extinction rate of digital matter factors into decision making. One of the most recent acquisitions is the recently defunct Design Sponge, an interior decorating website that ran for 15 years. (Though it will cease to exist as a website, every single blog post will be fully accessible through the Library’s web archive.)

The earliest material in the archive dates to the 2000 elections, when the web archive was still a pilot program. After the terrorist attacks of 9/11, when heart-rending memorials and fierce political debates played out online, the library recognized the need for an official digital record.

For years, collecting was keyed to major news events: the Iraq War, the 2004 elections. Then, around 2009, came a more continuous, expanded approach that sought to reflect the web in all its dizzying newness.

It is inevitable that many things go uncollected or are lost forever. The recommending officers have regrets.

But the Library of Congress digital collection carries with it the heft of the federal government and the official stamp of American history. Digital material that is chosen by the web archivists will live alongside the rough draft of the Declaration of Independence, “Moby Dick” and other sacrosanct print holdings.

Ms. Grotke, 52, is of the generation who were adults when they first learned about the internet. In her case, it was back around 1993, at a house in the Dupont Circle neighborhood of Washington, D.C., where a friend of her brother’s lived. “He brought us over and we got to see Mosaic, an early browser,” she said. “I remember clicking and, like, whoa, there’s hyperlinks.”

That wide-eyed reaction to the internet has morphed, over 18 years of trying to corral it, into a more seasoned outlook. “The web is messy, and the web archives are messier,” she likes to say.

In addition to running the team, Ms. Grotke’s other current task is to make the public aware of the archive’s existence. The archive’s website is available to anyone with an internet connection, but after 20 years it remains underutilized by the general public and the scholars it may be most beneficial to.

Ian Milligan, an associate professor of history at the University of Waterloo in Canada, has used the web archive to research the 1990s, and as a teaching tool in the classroom. Historians have had a long tradition of sitting down in a reading room, looking through tidily packaged print material, he said.

But with a digital archive, “we’re talking petabytes of information. You need technical skills to work with this — skills that are beyond almost anyone in the social sciences,” or the general public.

Ms. Grotke said that if the archive is slightly impenetrable at present, it’s a consequence of limited resources and the ever-expanding ocean of digital content. “We don’t have time to stop and make it more user friendly. We’re just trying to collect it all” before it disappears, she said.

“Our archives are just massive and keep growing and growing,” Ms. Grotke said. “And I have the same number of staff.”

source: nytimes.com
