Part One: Accessing the Archived Web (Responsibly)
Internet Archive’s Wayback Machine: http://archive.org/web/
- We’ll begin by trying a URL search: e.g. https://web.archive.org/web/*/www.sfu.ca
- We’ll then try a keyword search: e.g. https://web.archive.org/web/*/Simon%20Fraser%20University
- We’ll then explore provenance – why have certain sites been collected? Why are there gaps?
- We’ll then explore temporal coherence – how can we tell whether the page we are viewing ever existed? e.g. http://web.archive.org/web/19980213154824/http://www13.geocities.com/
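The URL patterns above can also be built and taken apart in code, which helps when checking temporal coherence across many captures. The following is a minimal Python sketch, not an official Wayback Machine client: the helper names (`calendar_url`, `parse_capture`, `capture_spread`) are our own, and it assumes capture URLs follow the `/web/<14-digit timestamp>/<original URL>` pattern seen in the examples, optionally with a short suffix such as `im_` after the timestamp.

```python
from datetime import datetime
from urllib.parse import quote

WAYBACK_PREFIX = "https://web.archive.org/web"


def calendar_url(query: str) -> str:
    """Build the '*' (all captures) URL for a site address or a keyword phrase."""
    return f"{WAYBACK_PREFIX}/*/{quote(query, safe='/:')}"


def parse_capture(url: str):
    """Split a capture URL into (capture time, original URL).

    Assumes the /web/<timestamp>/<original> pattern; only the first
    14 characters of the timestamp segment are read, so a suffix such
    as 'im_' is ignored.
    """
    _, _, rest = url.partition("/web/")
    stamp, _, original = rest.partition("/")
    return datetime.strptime(stamp[:14], "%Y%m%d%H%M%S"), original


def capture_spread(page_url: str, resource_urls: list):
    """Largest time gap between a page capture and its embedded resources.

    A large gap suggests the replayed page is a composite that may
    never have existed in that form.
    """
    page_time, _ = parse_capture(page_url)
    return max(abs(parse_capture(u)[0] - page_time) for u in resource_urls)
```

To use it, copy the capture URL of a page and of a few of its images or stylesheets from the browser's developer tools, then pass them to `capture_spread`. A gap of months or years is a hint worth investigating, not proof of a temporal violation.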
Explore your own content. Try to find sites, try to find gaps, explore provenance questions, and find some examples of temporal violations. Can you find a site that might have never existed?
Part Two: Conceiving a Research Question
Let’s now begin to think about what we could do with web archives. Try to think of a simple research question you could explore with five to ten websites from the Web, one that involves:
- Using plain text (we will use something like Voyant Tools – a quick demo);
- Using hyperlinks (I’ll show some quick examples of what you can do with hyperlinks – a quick demo using MIT Immersion);
- Looking only at their visual layout (e.g. in the Wayback Machine).
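To make the hyperlink option a little more concrete, here is a small Python sketch of one thing you can do with links: count which hosts a page points to. It uses only the standard library; the names `LinkCollector` and `linked_hosts` are our own, and it is an illustration rather than the method used by any of the tools demonstrated in the workshop.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> in an HTML document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, href))


def linked_hosts(html: str, base_url: str) -> Counter:
    """Count how many links on a page point to each host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return Counter(urlparse(link).netloc for link in collector.links)
```

Running `linked_hosts` over the same page captured in different years gives a rough picture of how a site's outbound links changed over time.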
Let’s take a few minutes to write down our ideas.
Part Three: Rolling your Own Web Archives
Let’s think about some of the difficulties of crawling the Web. In particular, let’s use WebRecorder.io.
Then let’s try to create a relatively curated web collection.
Try to go to those five to ten websites from your research question above and begin downloading relevant content.
Make sure to download the WARC files and remember where they are.
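Once the WARC files are on disk, it can be reassuring to peek inside them. The sketch below is a deliberately minimal reader written with the Python standard library only, to show how a WARC file is laid out: a version line, named headers, a blank line, then `Content-Length` bytes of content. The function names are our own. It ignores details such as folded header lines, so for real work a dedicated WARC library is the safer choice.

```python
import gzip


def open_warc(path: str):
    """Open a .warc or .warc.gz file as a binary stream."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_records(stream):
    """Yield (headers, body) for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        body = stream.read(int(headers.get("Content-Length", 0)))
        yield headers, body


def captured_urls(path: str) -> list:
    """List the target URI of every 'response' record in a WARC file."""
    with open_warc(path) as stream:
        return [
            headers["WARC-Target-URI"]
            for headers, _ in iter_warc_records(stream)
            if headers.get("WARC-Type") == "response"
            and "WARC-Target-URI" in headers
        ]
```

Calling `captured_urls` on a downloaded file (the path is whatever you chose when saving it) should list the pages and resources you visited while recording.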
Part Four: Archive-It Special Collections
Now let’s think about other collections we can find around the Web. In particular, let’s check out the Canadian Political Parties and Political Interest Groups collection, the University of Toronto’s page, and Simon Fraser University’s page.
Then do some explorations.
- What sorts of collections can you find in Canada?
- Can you imagine using any of these in research?
We’ll have a quick chat about how you might begin to explore access to these, as well as some of the options that Archive-It Research Services and the Archives Unleashed project might be able to offer.
Part Five: WASAPI for Fun and Profit
We’ll then highlight some of the exciting API work going on with special guest Nick Ruest! WASAPI can be found here.
Part Six: Unleashing Archives with the Archives Unleashed Toolkit
Now we’ll do our hands-on work with accessing web archives.