New Article: “If These Crawls Could Talk: Studying and Documenting Web Archives Provenance”

I’m part of a team that’s just published a new article, “If These Crawls Could Talk: Studying and Documenting Web Archives Provenance” in the Journal of the Association for Information Science and Technology. If your institution subscribes, you can find the article here. Alternatively, we have a preprint here.

Our abstract hopefully does a good job of explaining what the article is about. Read on if you're curious.

New Grant: “Continuing Education to Advance Web Archiving”

We heard some exciting news yesterday! I'm part of an interdisciplinary team, led by Virginia Tech Libraries and the Virginia Tech Department of Computer Science, and working in collaboration with Los Alamos, Old Dominion University, the Internet Archive, and George Washington University Libraries, that will be exploring "Continuing Education to Advance Web Archiving." The project was funded by the Institute of Museum and Library Services' Laura Bush 21st Century Librarian Program.

The overall grant is valued at $248,451.00 USD, and here at the University of Waterloo we'll be using $20,000 USD of it to support our work. In particular, this funding will help support a PhD candidate as well as some knowledge mobilization activities.

I can’t wait to see our grant vision be realized and to help assemble “a collection of educational resources, cyberinfrastructure for deploying tools to support the curriculum (including source code), and other related resources.”

Ethics and the Archived Web Presentation: “The Ethics of Studying GeoCities”

I had the great pleasure of speaking at the Ethics and Archiving the Web conference at the New Museum in New York City. My own contribution to the conference was a talk on the "Ethics of Studying GeoCities."

The livestream of the whole conference is available here.

Hi everybody, and thanks so much for coming to my talk today. What I want to do is discuss the "ethics of studying GeoCities," which to me gets at both the potential and the risks of doing a lot of this web archival research.

New Article: “Ten Simple Rules for Collaborative Lesson Development”

I'm part of a great team that's just published a new article: "Ten Simple Rules for Collaborative Lesson Development." It's part of the "Ten Simple Rules" series at PLOS Computational Biology.

The first paragraph of our introduction sets the stage:

Lessons take significant effort to build and even more to maintain. Most academics do this work on their own, but leveraging a community approach can make educational resource development more sustainable, robust, and responsive. Treating lessons as a community resource to be updated, adapted, and improved incrementally can free up valuable time while increasing quality.

If you're curious, read on! The article can be found here, and you can find a nicely laid-out PDF here as well.

Web Archive Analysis Workshop

You can follow along through the links in this presentation

I was recently out at Simon Fraser University with Nick Ruest, where we ran a “Twitter and Web Analysis at Scale” workshop. We had a great and hardy band of students (including librarians, graduate students, and faculty) who braved the uncharacteristic snow atop Burnaby Mountain to learn about all things web archives and social media. My sincerest thanks again to SFU for being such amazing hosts, and for their fantastic “Data Love-In” programming.


My role in the workshop focused primarily on how to use web archives: I introduced students to the Wayback Machine (from searching it to learning about temporal violations and provenance) and, of course, to the Archives Unleashed Toolkit (AUT). We ended up taking data and running analysis on it in AUT, which worked for the most part. The workshop then concluded with work in Gephi.

As part of this, I made an interactive presentation: feel free to explore it, click on the many, many hyperlinks in it, and learn a bit about web archives. I hope I get the opportunity to run this workshop a few more times: given how much work goes into putting these things together, it's always nice to see some dividends.

We’ve Been Busy! The Archives Unleashed Project in 2017

Happy 2018!

Last year (2017) was a busy year on many fronts (from my own parental leave to launching our Archives Unleashed project!). Our project manager, Samantha Fritz, has a great write-up on the project's activities to date. Please check it out!

We have many more exciting things planned, from our Toronto datathon in April 2018 to the Archives Unleashed Cloud, so please stay tuned and subscribe to our Medium blog or e-mail list if you're curious!

The Death of Storify, Difficult Alternatives, and the Need to Steward our Data Responsibly

Storify is dead. The service, which let you take social media content like Twitter and Facebook posts and aggregate it into stories, has announced that it will shut down and delete all content as of March 16th, 2018. It's not as bad as some platform shutdowns (there is notice, and at least you can export your own content, one story at a time), but it's still a reminder of how vulnerable user-generated content can be online.

This hits all users hard. Within academia, Storify seems to be the go-to tool for documenting controversies or, more commonly, conferences (say, the proceedings of an online conference or the hard work that went into documenting a presidential address). And why not? It's an intuitive platform, far better than grabbing screenshots. The other standard method, embedding tweets in, say, a blog post, is equally vulnerable to an external service (in this case Twitter itself) changing its access or business model, or failing altogether.

So what should we do?