I’ve been sitting on this good news for a few months now, but the official word is out: the Ontario Ministry of Research and Innovation has funded my web archive project with an Ontario Early Researcher Award. These grants are designed to help early career researchers build up research teams by hiring graduate students, postdoctoral fellows, and research associates – all things that I’m hoping to do over the next five years (we also received some complementary funding that will be announced in due course). It gives me $150,000 over the next five years to begin building the Web Archives for Historical Research Group.
Since May 2015, I’ve been able to hire three research assistants with this line: Jeremy Wiebe, a PhD candidate; as well as MA candidates Shawn Dickinson and Danielle Robinson. David Hussey, my MA student who’s been working on a digital history of the Canadian video games industry, has also been working on the project as part of some complementary funding. Their profiles are available here, along with Nick Ruest and Bill Turkel who are joining me as affiliate faculty for a broader, separate grant.
The University of Waterloo announcement is here, and I think it does a good job explaining what the project does. I wanted to really thank the amazing folks at the University of Waterloo’s Office of Research, the Arts Research Office, and the Department of History for helping with the groundwork that made this possible. UW truly has offered amazing resources to get my project off the ground.
I was at the Columbia Web Archiving Collaboration: New Tools and Models conference this Thursday and Friday, and gave a quick demo. Here’s a bit more detail on it.
If you use Warcbase (this handy guide covers installing it on OS X) and follow the scripts, you will eventually come up with a data file that looks a bit like this.
200510 acq.osd.mil acq.osd.mil 96
200510 acq.osd.mil akss.dau.mil 12
200510 agoracosmopolite.com agorabookcafe.com 325
200510 agoracosmopolite.com agoracosmopolitan.com 271
200510 agoracosmopolite.com agoracosmopolite.com 8319
200510 agoracosmopolite.com genesmedia.com 325
200510 bloc.org go.microsoft.com 22
200510 blocpot.qc.ca blocpot.qc.ca 104
200510 blocpot.qc.ca marijuanaparty.org 16
200510 blocpot.qc.ca norml.org 16
200510 blocquebecois.org bernardbigras.qc.ca 16
200510 blocquebecois.org bloc.org 1069
200510 blocquebecois.org blocquebecois.org 276682
You can download your own sample file here, which draws on the Canadian Political Party and Political Interest Groups collection.
But how do you turn this into a beautiful Gephi visualization?
Easy! Continue reading
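As a rough sketch of one possible route (not necessarily the one the full walkthrough takes), a few lines of Python can reshape a file like the one above into an edge list that Gephi's spreadsheet importer can read. This assumes the four columns are crawl month, source domain, target domain, and link count, which is my reading of the sample rather than a documented format. The function names are made up for illustration, and dropping self-links is a design choice you may not want.

```python
import csv
import io
from collections import defaultdict


def parse_links(lines):
    """Yield (crawl_date, source, target, count) from whitespace-separated rows."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        date, source, target, count = parts
        yield date, source, target, int(count)


def to_gephi_edges(rows, drop_self_links=True):
    """Sum link counts per (source, target) pair across all crawl dates."""
    weights = defaultdict(int)
    for _date, source, target, count in rows:
        if drop_self_links and source == target:
            continue  # assumption: a site linking to itself is not interesting here
        weights[(source, target)] += count
    return sorted((s, t, w) for (s, t), w in weights.items())


def write_edge_csv(edges, handle):
    """Write a Source,Target,Weight edge list, the header Gephi's CSV import expects."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


# Three rows taken from the sample above.
sample = """\
200510 acq.osd.mil acq.osd.mil 96
200510 acq.osd.mil akss.dau.mil 12
200510 blocquebecois.org bloc.org 1069
"""

edges = to_gephi_edges(parse_links(sample.splitlines()))
out = io.StringIO()
write_edge_csv(edges, out)
print(out.getvalue())
```

To use it on a real file, swap the `StringIO` buffer for an open file handle and import the resulting CSV into Gephi as an edges table.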
I’ve got a new article in Histoire Sociale/Social History’s May 2015 issue that uses web archives to explore 1990s battles over Web regulation in Canada.
I had a lot of fun researching, writing, and editing this piece as it brought me back to some earlier themes in my research. It combined some of the earlier writing I’d done around the role administrative tribunals have played in Canadian history, as well as youth and childhood history.
While we today take a largely free and unregulated Internet for granted, our present regulatory environment was established in the 1990s thanks in part to a fight around the role of children on the World Wide Web. Public pressure, coupled with a national debate around cyberporn, led to serious calls for its regulation under the prism of child protection. This article explores the tensions and early fights over whether individuals and families should regulate the Internet, or, as some strenuously argued, the government had a responsibility to impose regulation. Children were the focal point of these debates.
While currently paywalled, you can find it on Project MUSE here.
The 1990s are history now, right? While the peer reviewers had lots of very helpful suggestions, the periodization didn’t even come up as an aside.
TL;DR? You can find a walkthrough here. To find out what it does, read on.
As Twitter followers will know, I’ve been playing with the UK Web Archive’s Shine front-end over the last week or so. I think it’s a fantastic front-end to a collection: it gives you a bird’s-eye view of the collection as well as the ability to dive into concordances or the pages themselves. I’m currently indexing the material in the University of Toronto’s Canadian Political Parties and Political Interest Groups Archive-It collection and exploring it.
Results are still indexing (I’ve got about 10% left) so there may be changes to the data below, but it allows us to do things like this:
Observers of the Canadian political scene may find some of the above examples intriguing. Again, once indexing is done tonight and I’m able to port it over to another machine, I’ll put some results up. Continue reading
(x-posted from Web Archives for Historians)
I had a fantastic time at the International Internet Preservation Consortium’s Annual General Meeting this year, held on the beautiful campus of Stanford University (with a day trip down to the Internet Archive in San Francisco). It’s hard to write these sorts of recaps: I had such an amazing time, my head filled with great ideas, that it’s difficult to give everything the justice it deserves. Many of the presentation slide decks are available on the schedule, and videos will be forthcoming.
My main takeaways: we’re continuing to see the development of sophisticated access tools to these repositories, coupled with increasingly exciting and sophisticated researcher use of them. There’s a recognition that context matters when understanding archived webpages, a phrase that came up a few times throughout the event. Crucially, there was a lot of energy in the room: there’s a real enthusiasm towards making these as accessible as possible and facilitating their use. I wasn’t exaggerating when I noted to one of the organizers that I wish every conference was like this: leaving me on my flight home with lots of fantastic ideas, hope for the future, and excitement about what can be done. As the recent “Conference Manifesto” in the New York Times noted, that’s not the experience at all conferences!
Read on for a short day-by-day breakdown, with apologies for presentations I couldn’t include or didn’t do full justice to: Continue reading
Another day, another short post. We’ve been working with Jimmy Lin’s cluster at the University of Maryland – Jimmy’s been helping us run Pig scripts and get access to things. Some more results have been illuminating.
The first script that we ran pulled out all the links to social media platforms (YouTube, Twitter, and Facebook) and aggregated them by domain. This means that if, say, liberal.ca linked to twitter.com, and so did liberal.ca/page1 and liberal.ca/page2, the resulting chart would say liberal.ca linked to twitter.com three times. Repeat that over 11,620,105 sites collected by the University of Toronto between 2005 and 2009 and you get some neat results. Here we’re looking at Canadian political parties and interest groups.
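To make that aggregation concrete, here is a small Python sketch of the same counting logic. It is not the Pig script we ran on the cluster, just an illustration: the function names are invented, and stripping a leading "www." is a crude stand-in for proper domain extraction.

```python
from collections import Counter
from urllib.parse import urlsplit

# The three platforms named above.
SOCIAL = {"youtube.com", "twitter.com", "facebook.com"}


def host(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    if "//" not in url:
        url = "//" + url  # lets urlsplit treat a bare 'liberal.ca/page1' as host + path
    name = urlsplit(url).hostname or ""
    return name[4:] if name.startswith("www.") else name


def count_social_links(links):
    """Count links to social platforms, rolled up to the linking domain."""
    counts = Counter()
    for source_url, target_url in links:
        target = host(target_url)
        if target in SOCIAL:
            counts[(host(source_url), target)] += 1
    return counts


# The liberal.ca example from the paragraph above, plus one non-social link.
links = [
    ("liberal.ca", "twitter.com"),
    ("liberal.ca/page1", "twitter.com"),
    ("liberal.ca/page2", "twitter.com"),
    ("liberal.ca/page2", "example.org"),
]

counts = count_social_links(links)
print(counts[("liberal.ca", "twitter.com")])
```

Each page-level link is folded into its parent domain, so the three pages count as three links from liberal.ca to twitter.com, and the non-social link is ignored.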
It goes without saying that none of this could happen without collaboration. If I had run that first question locally, my desktop would probably still be crunching along. Continue reading
We’ve been playing with Jimmy Lin’s warcbase a bit more, and have been extracting all links from a series of WARCs. Here I decided to zoom in on the Liberal Party of Canada’s website, drag the websites in its modularity class close to it, and see how the links can tell us the story of the party.
In short: we see the election of a new leader (Stéphane Dion), the announcement of his new plan (the Green Shift), the pre-election fundraising (VictoryFund), the rise of an attack ad industry, and the end of it.
Here are the frames below: Continue reading