Web archives have a lot of very useful information in them! As websites disappear every second on the Web, we need to save sites now. Luckily, we’ve been saving sites since 2005: even if they don’t exist on the live web today, we may have them saved for historical research.
This is where WebArchives.ca comes in. We’ve been “softly” launching it this week – a public kicking of the tires (tell your friends about us). This is hopefully the first of many portals that we’ll be putting up on this site, using different research tools. In a nutshell, we provide access to the University of Toronto’s Archive-It Collection of Canadian Political Parties and Political Interest Groups, which they have been collecting since late 2005. For information on what is within this collection, please see the University of Toronto’s page. This site uses the UK Web Archive’s Shine interface, which they have made available here.
For example, did you know that the Green Party of Canada ran a public blog on their website back in 2008, where anybody could write in? Today, if you try to visit it, you’ll receive a “403 Access Denied” error. Look for yourself: on our “advanced search” page, you can search “harper” and “fascist” with a proximity of “25” to see some provocative posts on this Green Party blog (results here). This is just one example: you can certainly find hundreds more as you begin to explore through our portal.
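If the idea of a “proximity” search is new to you, here is a rough Python sketch of what it means: two terms count as a match only when they appear within a given number of words of each other. This is only an illustration, not how the portal’s search backend works. The function name `within_proximity` is made up for this example, and counting distance in whole words is an assumption.

```python
import re


def within_proximity(text, term_a, term_b, max_distance=25):
    """Return True if term_a and term_b occur within max_distance words
    of each other in text (hypothetical helper, for illustration only)."""
    # Lowercase the text and split it into word tokens.
    words = re.findall(r"\w+", text.lower())
    # Record the word positions at which each term occurs.
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # A match is any pair of positions at most max_distance words apart.
    return any(
        abs(a - b) <= max_distance
        for a in positions_a
        for b in positions_b
    )
```

Calling `within_proximity(post_text, "harper", "fascist", 25)` on the text of a blog post would tell you whether that post could turn up in a search like the one above.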
With literally millions of pages – there are 14,490,355 “documents” in the archive found here – you sometimes need to pull your gaze back to see how ideas have risen and fallen. For example, we can discover how terms like “depression” and “recession” waned and rose over time, through our trends view. We’ve tentatively found that left-wing groups tended to use the word “depression” more than centrist or right-wing groups, who used “recession” more during the economic crisis. There is a treasure trove of stories to be found in these collections, limited only by your imagination.
Acknowledgement and Thanks to the Team
This has been a joint production! At Waterloo, I’ve been working with Shawn Dickinson and Danielle McDonald on implementing this portal (I have two other RAs – Dave Hussey and Jeremy Wiebe – who’ve been working on other projects related to digging into the WARCs themselves). Jimmy Lin, newly arrived at Waterloo, has been making it possible for us to index material – using warcbase and the UK Web Archive’s Hadoop indexer – in less than a week per collection. At York University, where this server sits, Nick Ruest has been doing the heavy lifting to make this site a (pretty) reality. At Toronto, Nicholas Worby gave us access to these files. At the Internet Archive, Jefferson Bailey got the ball rolling with the Archive-It Research Services and by connecting us to the Toronto folks. Finally, at Western, Bill Turkel is also providing support and, soon, some cool Mathematica hacks.
And – of course – the UK Web Archive got the ball rolling with Shine!