A new President and the End of Term Web Archive


Obama inauguration, 2009

Preserving online information is a difficult problem in general.

Federal government websites are especially vulnerable at the end of a presidential term. The End of Term (EOT) Web Archive has preserved snapshots of them at each transition since 2008.

I have seen web posts that speak of a frantic effort to preserve government information. They attribute it to fear of the incoming Trump administration.

Don’t believe such mindless hysteria.

Regardless of who won the election, the End of Term Web Archive team would be hard at work. Even when a President is reelected, turnover in the cabinet and at other agencies can be high. The new team often takes down the old sites to make room for its own.

If a government document exists in print, some archive preserves it. The National Archives and the presidential libraries preserve a great deal. Government information that exists only on the web can easily disappear without a trace, just like any other web-based information.

Who is responsible for archiving the government’s web presence? No one. Some sites have a mandated custodian. Many do not. Of all the PDFs on .gov websites in 2008, 83% had disappeared by 2012.

Some sites simply cease to exist. For example, the National Institute of Literacy had a website until 2011. Then it disappeared. The EOT Archive provides access to it as it existed in 2008.
Other sites get folded into sites operated by larger agencies. Still others undergo major reorganization, and much of the material gets new URLs.

The first End of Term Web Archive


Library of Congress, Thomas Jefferson Building


The idea of an EOT archive came out of the 2008 meeting of the International Internet Preservation Consortium (IIPC).

The National Archives had crawled .gov sites in 2004 and decided not to repeat the effort. Several American institutions belong to the IIPC, including:

  • Library of Congress
  • Internet Archive
  • California Digital Library
  • University of North Texas
  • Government Publishing Office

And so these and other American attendees discussed the problem. They realized that they already collected government material for their own organizations. Pooling their efforts in a large-scale collaboration seemed the obvious solution.

The George W. Bush administration was coming to an end. A new administration would take over regardless of how the election turned out. How would government websites change in the transition?

The new archive’s first task was to document the answer to that question. The press release that announced the initiative put the lifespan of a government website at 44 days.

The partner organizations asked librarians and other information specialists for help. Volunteers would select and prioritize the sites to include. The partners began crawling government websites in August 2008 and collected the nominated URLs in December.

They collected the same URLs after the inauguration in January 2009, and again in the spring and fall of that year. In all, they gathered some 15.9 terabytes of data documenting how the sites changed.

Second and third End of Term Web Archives


Internet Archive servers

Another EOT harvest began in November 2011 to document changes from Obama’s first term to his second.
More people took part in nominating URLs this time. The Pratt Institute School of Information in New York volunteered a class of library school students.
And now the 2016 effort will be the largest to date. It will concentrate on sites most likely to change or disappear during the transition.

As the Obama administration comes to an end and the Trump administration prepares to take office, the Internet Archive and its partners are harvesting webpages from more than 6,000 .gov and .mil domains and more than 200,000 hosts.

These pages include material from all three branches of the federal government, as well as from regulatory agencies. The partnership is also collecting social media feeds from about 10,000 official federal accounts.

The EOT Web Archive process

Collection of 2016 EOT material began in July 2016. A vast army of people identify and nominate URLs to preserve. Volunteers include government documents specialists, subject experts, researchers, librarians, educators, and students.

Project specialists weed out duplicates and any out-of-scope nominations. They assign each URL a weighted score to determine its priority level.

The Internet Archive hosts a searchable and browsable public access copy. The Library of Congress keeps a preservation copy. The University of North Texas also holds a copy for data analysis.

A project that occurs once every four years can’t possibly preserve all electronic data from the federal government. But it permanently documents the inevitable changes after every presidential election.

Government website harvest enlists librarians, educators, students / Lisa Peet. Library Journal. December 13, 2016.
Preserving U.S. government websites and data as the Obama term ends / Jefferson. Internet Archive Blogs. December 15, 2016.

Photo credits:
Obama inauguration. Public domain from Wikimedia Commons
Library of Congress Jefferson Building. Public domain from Wikimedia Commons
Internet Archive servers. Some rights reserved by John Blyberg
