Locally Archived Websites

Moveable library shelves

What do you do with a website once it has reached the end of its lifecycle? There are any number of reasons to create a website, but what should you do when you’re ready to take it down or move it to another platform or content management system (CMS), such as WordPress, Drupal, or Squarespace?

With the advent of Web 2.0 tools and easy-to-use online publishing platforms, personal websites have proliferated across the web. Because technologies keep developing, we often make major changes to the platforms we use to build our web presence, so keeping an archive of expiring sites is crucial to maintaining a record of our work. An archive lets you keep a viewable copy of a website after it has been removed from the internet.

Whether you’re upgrading your personal site to a new platform or are an instructor who wants to keep copies of student projects at the end of a quarter, being able to keep a digital record of your content is important.

Thankfully, there are tools available online to help you create an active archive of sites prior to their removal. Site mirroring tools allow you to create static HTML copies of websites. One such tool is HTTrack.

Dr. Miriam Posner introduced me to the tool during our work for her DH101 class here at UCLA. Before I learned about HTTrack, I would create an archive by saving PDFs or screenshots of each page of the site, a time-consuming method that also lacked a clear organizational structure. The PDFs were also static, meaning none of the material was clickable or interactive.

HTTrack solves many of the issues associated with static PDF copies. As its website explains: “HTTrack is a free and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure.” As the name of the process suggests, mirroring creates a mirror image of your website. When you open the copy in your web browser, you can move through the site from link to link, as if you were viewing it online.
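To make the idea of preserving a site’s relative link structure more concrete, here is a minimal Python sketch of the general technique a mirroring tool relies on: each page URL is mapped to a local file, and links between pages are rewritten as paths relative to the page that contains them, so the copy still works when opened from your hard drive. This is an illustration under simplifying assumptions (it ignores query strings, fragments, and extensionless page URLs), not HTTrack’s actual code; the function names `local_path` and `relative_link` are hypothetical.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def local_path(url: str) -> str:
    """Map a page URL to a file path inside a hypothetical mirror directory.

    Simplified: query strings and fragments are ignored, and URLs ending in
    "/" are saved as index.html inside the matching folder.
    """
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory-style URLs become index.html files
    return parts.netloc + path  # e.g. "example.com/blog/post.html"


def relative_link(from_url: str, href: str) -> str:
    """Rewrite a link found on from_url so it points at the local copy."""
    target = urljoin(from_url, href)  # resolve relative or root-relative hrefs
    src = "/" + local_path(from_url)  # leading "/" keeps relpath independent of cwd
    dst = "/" + local_path(target)
    return posixpath.relpath(dst, posixpath.dirname(src))
```

For example, a link to `/about/` on the page `https://example.com/blog/post.html` would become `../about/index.html`, which a browser can follow offline without contacting the original server.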

The Windows version is more straightforward to install and use: it has a graphical user interface (GUI), a front end with buttons, and great documentation. If you are on a Mac, you’ll have to run the program from the command line, since there is no GUI for Macs. For the tutorial I created for this post, I focused on installation and use through the Mac command line. Archiving the sites themselves is relatively easy; the tricky part is setting up the program on your Mac! An alternative to HTTrack is the Wget command, which has no GUI on either Mac or Windows and runs only from the command line.
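For readers working from the command line, the following Python sketch assembles typical invocations of both tools and runs one of them. It assumes `httrack` or `wget` is already installed and on your PATH; the helper names `httrack_command`, `wget_command`, and `archive_site` are hypothetical, and the flags shown are a common starting point rather than the only reasonable configuration. HTTrack’s basic form is `httrack <URL> -O <output directory>`; for Wget, the options below recurse through the site, download images and stylesheets, and convert links for offline viewing.

```python
import shutil
import subprocess


def httrack_command(url: str, out_dir: str) -> list[str]:
    # Basic HTTrack form: mirror `url` into `out_dir` (-O sets the output path).
    return ["httrack", url, "-O", out_dir]


def wget_command(url: str, out_dir: str) -> list[str]:
    return [
        "wget",
        "--mirror",             # recursion + timestamping, infinite depth
        "--convert-links",      # rewrite links so the copy works offline
        "--adjust-extension",   # save HTML pages with a .html suffix
        "--page-requisites",    # also fetch images, CSS, etc. each page needs
        "--no-parent",          # never climb above the starting directory
        f"--directory-prefix={out_dir}",  # where to save the mirror
        url,
    ]


def archive_site(url: str, out_dir: str, tool: str = "httrack") -> int:
    """Run the chosen tool and return its exit code (hypothetical helper)."""
    if tool == "httrack":
        cmd = httrack_command(url, out_dir)
    elif tool == "wget":
        cmd = wget_command(url, out_dir)
    else:
        raise ValueError(f"unknown tool: {tool!r}")
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    return subprocess.run(cmd, check=False).returncode
```

A call such as `archive_site("https://example.com/", "student-project-archive", tool="wget")` would save the mirror under `student-project-archive`, after which you can open the downloaded HTML files in your browser. Building the argument list separately from running it makes it easy to print and check the exact command before downloading anything.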

Whichever archiving method you choose, whether screenshots, PDF copies of webpages, or a tool like HTTrack, make sure you keep a copy of your online work before it disappears from the web.

Image: GersteinLibrary.jpg by user Raysonho @ Open Grid Scheduler / Grid Engine. Image is used under the Creative Commons CC0 1.0 Universal Public Domain Dedication. (https://commons.wikimedia.org/wiki/File:GersteinLibrary.jpg)

Wendy Perla Kurtz is a PhD student in the UCLA Department of Spanish & Portuguese, and the Senior Research and Instructional Technology Consultant (RITC) at the Center for Digital Humanities. Her research focuses on contemporary Peninsular literature and film regarding the recuperation of historical memory from the Spanish Civil War. In her dissertation titled “Representations of Loss and Recovery in Contemporary Iberian Culture: Historical Memory from the Real to the Virtual,” she explores different strategies applied to represent loss and recovery in textual and visual media on historical memory from the 20th and 21st centuries in Spain.