Announcing my Ontario Early Researcher Award: Web Archives for Historical Research

I’ve been sitting on this good news for a few months now, but the official word is out: the Ontario Ministry of Research and Innovation has funded my web archive project with an Ontario Early Researcher Award. These grants are designed to help early career researchers build up research teams by hiring graduate students, postdoctoral fellows, and research associates – all things that I’m hoping to do over the next five years (we also received some complementary funding that will be announced in due course). It gives me $150,000 over the next five years to begin building the Web Archives for Historical Research Group.

Since May 2015, I’ve been able to hire three research assistants with this line: Jeremy Wiebe, a PhD candidate; as well as MA candidates Shawn Dickinson and Danielle Robinson. David Hussey, my MA student who’s been working on a digital history of the Canadian video games industry, has also been working on the project as part of some complementary funding. Their profiles are available here, along with Nick Ruest and Bill Turkel who are joining me as affiliate faculty for a broader, separate grant.

The University of Waterloo announcement is here, and I think it does a good job explaining what the project does. I wanted to really thank the amazing folks at the University of Waterloo’s Office of Research, the Arts Research Office, and the Department of History for helping with the groundwork that made this possible. UW truly has offered amazing resources to get my project off the ground.

Creating Link Graphs with Warcbase

I was at the Columbia Web Archiving Collaboration: New Tools and Models conference this Thursday and Friday, and gave a quick demo. Here’s a bit more detail on it.

If you install Warcbase (this handy guide covers installing it on OS X) and follow the scripts, you will eventually come up with a data file that looks a bit like this.

200510	acq.osd.mil	acq.osd.mil	96
200510	acq.osd.mil	akss.dau.mil	12
200510	agoracosmopolite.com	agorabookcafe.com	325
200510	agoracosmopolite.com	agoracosmopolitan.com	271
200510	agoracosmopolite.com	agoracosmopolite.com	8319
200510	agoracosmopolite.com	genesmedia.com	325
200510	bloc.org	go.microsoft.com	22
200510	blocpot.qc.ca	blocpot.qc.ca	104
200510	blocpot.qc.ca	marijuanaparty.org	16
200510	blocpot.qc.ca	norml.org	16
200510	blocquebecois.org	bernardbigras.qc.ca	16
200510	blocquebecois.org	bloc.org	1069
200510	blocquebecois.org	blocquebecois.org	276682

You can download your own sample file here, which draws on the Canadian Political Party and Political Interest Groups collection.

But how do you turn this into a beautiful Gephi visualization?

Easy! Continue reading
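One possible route, sketched here in Python and not necessarily the one the full post takes: reshape the tab-separated rows into an edge list. The sketch assumes each row is date, source domain, target domain, and link count, as in the sample above, and that Gephi's spreadsheet importer is given a CSV with `Source`, `Target`, and `Weight` columns. The function names and the extra `Date` column are illustrative choices, not part of Warcbase.

```python
import csv
import io

# Three rows copied from the sample above (tab-separated).
SAMPLE = (
    "200510\tacq.osd.mil\takss.dau.mil\t12\n"
    "200510\tblocpot.qc.ca\tmarijuanaparty.org\t16\n"
    "200510\tblocquebecois.org\tbloc.org\t1069\n"
)


def parse_link_file(lines):
    """Turn 'date<TAB>source<TAB>target<TAB>count' rows into edge dicts."""
    edges = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        date, source, target, count = line.split("\t")
        edges.append(
            {"Source": source, "Target": target, "Weight": int(count), "Date": date}
        )
    return edges


def write_edge_csv(edges, out):
    """Write edges as CSV with Source/Target/Weight headers (Date kept as an extra column)."""
    writer = csv.DictWriter(out, fieldnames=["Source", "Target", "Weight", "Date"])
    writer.writeheader()
    writer.writerows(edges)


edges = parse_link_file(SAMPLE.splitlines())
buffer = io.StringIO()
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```

To work on a real file, pass an open file handle to `parse_link_file` and another to `write_edge_csv`, then load the resulting CSV into Gephi as an edges table.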

New Article: “A Haven for Perverts, Criminals, and Goons”

I’ve got a new article in Histoire Sociale/Social History’s May 2015 issue that uses web archives to explore 1990s battles over Web regulation in Canada.

I had a lot of fun researching, writing, and editing this piece as it brought me back to some earlier themes in my research. It combined some of the earlier writing I’d done around the role administrative tribunals have played in Canadian history, as well as youth and childhood history.

The abstract:

While we today take a largely free and unregulated Internet for granted, our present regulatory environment was established in the 1990s thanks in part to a fight around the role of children on the World Wide Web. Public pressure, coupled with a national debate around cyberporn, led to serious calls for its regulation under the prism of child protection. This article explores the tensions and early fights over whether individuals and families should regulate the Internet, or, as some strenuously argued, the government had a responsibility to impose regulation. Children were the focal point of these debates.

While currently paywalled, you can find it on Project MUSE here.

The 1990s are history now, right? While the peer reviewers had lots of very helpful suggestions, the periodization didn’t even come up as an aside.

Running Shine Locally on a Collection of ARC/WARC Files

TL;DR? You can find a walkthrough here. To find out what it does, read on.

As Twitter followers will know, I’ve been playing with the UK Web Archive’s Shine front-end over the last week or so. I think it’s a fantastic front-end to a collection: it gives you a bird’s-eye view of the collection and also lets you dive into concordances or the pages themselves. I’m currently indexing the material in the University of Toronto’s Canadian Political Parties and Political Interest Groups Archive-It collection and exploring it.

Results are still indexing (I’ve got about 10% left) so there may be changes to the data below, but it allows us to do things like this:

[Four screenshots, 2015-05-27]

Observers of the Canadian political scene may find some of the above examples intriguing. Again, once indexing is done tonight and I’m able to port it over to another machine, I’ll put some results up. Continue reading

International Internet Preservation Consortium Annual General Meeting 2015: Recap

(x-posted from Web Archives for Historians)

I had a fantastic time at the International Internet Preservation Consortium’s Annual General Meeting this year, held on the beautiful campus of Stanford University (with a day trip down to the Internet Archive in San Francisco). It’s hard to write these sorts of recaps: I had such an amazing time, my head filled with great ideas, that it’s difficult to give everything the justice it deserves. Many of the presentation slide decks are available on the schedule, and videos will be forthcoming.

My main takeaways: we’re continuing to see the development of sophisticated access tools to these repositories, coupled with increasingly exciting and sophisticated researcher use of them. There’s a recognition that context matters when understanding archived webpages, a phrase that came up a few times throughout the event. Crucially, there was a lot of energy in the room: there’s a real enthusiasm towards making these as accessible as possible and facilitating their use. I wasn’t exaggerating when I noted to one of the organizers that I wish every conference was like this: leaving me on my flight home with lots of fantastic ideas, hope for the future, and excitement about what can be done. As the recent “Conference Manifesto” in the New York Times noted, that’s not the experience at all conferences!

Read on for a short day-by-day breakdown, with apologies for presentations I couldn’t include or didn’t give full justice to: Continue reading

Using Longitudinal Link Structures to Look at Three Major Canadian Political Parties

Another day, another short post. We’ve been working with Jimmy Lin’s cluster at the University of Maryland – Jimmy’s been helping us run Pig scripts and get access to things. Some more results have been illuminating.

The first script that we ran pulled out all the links to social media platforms (YouTube, Twitter, and Facebook) and aggregated them by top-level domains. This means that if, say, liberal.ca linked to twitter.com, and so did liberal.ca/page1 and liberal.ca/page2, the resulting chart would say liberal.ca linked to twitter.com three times. Repeat that over 11,620,105 sites collected by the University of Toronto between 2005 and 2009 and you get some neat results. Here we’re looking at Canadian political parties and interest groups.
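To make the aggregation concrete, here is a minimal Python sketch of the same idea. It is not the Pig script we ran on the cluster: the function names, the `www.` stripping, and the in-memory list of (source page, target URL) pairs are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Platforms named in the post; matched on exact domain after stripping "www.".
SOCIAL_DOMAINS = {"youtube.com", "twitter.com", "facebook.com"}


def domain_of(url):
    """Reduce a URL (with or without a scheme) to its lowercase host, minus 'www.'."""
    if "://" not in url:
        url = "//" + url  # lets urlparse treat the leading part as the host
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def count_social_links(links):
    """Count links per (source domain, social media domain) pair."""
    counts = Counter()
    for source_url, target_url in links:
        source, target = domain_of(source_url), domain_of(target_url)
        if target in SOCIAL_DOMAINS:
            counts[(source, target)] += 1
    return counts


# The example from the text: three pages on liberal.ca each link to twitter.com.
links = [
    ("liberal.ca", "twitter.com"),
    ("liberal.ca/page1", "http://twitter.com/"),
    ("http://www.liberal.ca/page2", "https://www.twitter.com/"),
    ("liberal.ca", "example.org"),  # not a social media link, so ignored
]
counts = count_social_links(links)
print(counts[("liberal.ca", "twitter.com")])
```

Counting per source domain rather than per page is what collapses the three page-level links into a single liberal.ca to twitter.com edge with a weight of three.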

It goes without saying that none of this could happen without collaboration. Had I run that first question locally, my desktop would probably still be crunching along. Continue reading

A Webarchiving Short Story: The Liberal Party of Canada, 2006-2008


We’ve been playing with Jimmy Lin’s warcbase a bit more, and have been extracting all links from a series of WARCs. Here I decided to zoom in on the Liberal Party of Canada’s website, drag the websites in its modularity class close to it, and see how the links can tell us the story of the party.

In short: we see the election of a new leader (Stéphane Dion), the announcement of his new plan (the Green Shift), the pre-election fundraising (VictoryFund), the rise of an attack ad industry, and the end of it.

Here are the frames below: Continue reading