Investigating Curatorial Models as the Marshall McLuhan Centenary Fellow in Digital Sustainability

Most importantly, I'll be getting to work in my favourite building in North America!

Great news! Starting on July 1st, I’m the inaugural Marshall McLuhan Centenary Fellow in Digital Sustainability, held at the University of Toronto’s Digital Curation Institute, which is housed in their Faculty of Information. The DCI is led by Christoph Becker, who I’m really looking forward to working with more over the next 12 months (as well as his great graduate students).

What does this mean? Basically, over the next year I’ll be hosting the following public events in Toronto. These will primarily take place in the January – May timeframe, and I will be in Toronto roughly once a week during this period. It is also an excuse to be physically proximate to great collaborators: folks at the DCI, Toronto libraries (especially Nich Worby, who I’ve worked with quite a bit), and York (where my frequent collaborator Nick Ruest is based).

  • Run Workshops: I’ll run a web archiving analysis workshop in Toronto, probably focusing on the warcbase platform – perhaps riding the coattails of the great virtual machine and repository that Nick Ruest developed. I would also like to run a workshop on Twitter archiving and analysis.
  • Give an Invited Lecture: I’ll be giving a Coach House Institute lecture on the findings of the Fellowship research project, discussed below.
  • Organize a Marquee Event: I’d like to help the DCI bring in a high-profile invited speaker to discuss web archiving. Maybe I can score some free canapés.

Most importantly, I’ll be carrying out a research project on qualitative comparisons of web archival content, specifically the kinds of content curated using a social media approach versus a manually-curated professional one.

Archives Unleashed, Part Two: Unlocking Library of Congress Collections with Warcbase

As part of the Archives Unleashed hackathon, the Library of Congress graciously provided access to several of their collections. Jimmy Lin and I worked with one of the teams, “The Supremes,” to see if we could generate useful scholarly derivatives from the underlying collections.

The team was called “The Supremes” for an apt reason: we worked with web archival data around the nominations for Justice Alito and Justice Roberts. Both nominations began in 2005, and the collections contain legal blogs, Senatorial discussions, and other content relevant to those nominations.

As it was a datathon with limited time and resources, we used data subsets:

  • Alito – 51 GB, 1.8 million records, 1.2 million pages
  • Roberts – 41 GB, 1.4 million records, 1.0 million pages

Given the age of these collections, rather than being in WARC format, they were actually in the earlier (now deprecated) ARC format. But still, we were able to generate results quickly.

After Jimmy spent two hours painstakingly hunting down some malfunctioning ARC files, ARC being the web archival container format (the juicy details on how we’re going to fix that can be found here), the analysis began.

Within five minutes, we had useful scholarly derivatives and were already raising research questions.

Archives Unleashed, Part One: Unlocking Web Archives through Interdisciplinary Collaboration

Me presenting our final datathon projects at the closing symposium.

Last week, I had the pleasure of co-hosting our “Archives Unleashed 2.0 Hackathon” at the Library of Congress, along with Matthew Weber (Rutgers), Jimmy Lin (Waterloo), Nathalie Casemajor (Université du Québec en Outaouais), and Nicholas Worby (Toronto). While a lot of our time was taken up by facilitating the smooth running of the event – providing virtual machines, ensuring people had great test datasets, making sure that people knew when fresh coffee arrived – we also had time to participate and hack within some of the teams.

Why did this datathon matter?

I was asked to give a short presentation about the datathon to the Saving the Web Symposium, organized by Dame Wendy Hall and the Kluge Center immediately following our hackathon.

Web Archives and Born-Digital Sources Workshop: Challenges, Future Steps, and the Field

On June 8th, I had the pleasure of attending “Born digital big data and approaches for history and the humanities,” a workshop hosted by the University of London’s School of Advanced Study. You can see the full program of the day here. It’s part of an AHRC research network that I belong to.

With Peter Webster and Jason Webber, I participated in a roundtable discussion on web archives, moderated by Jane Winters. Jane asked us four questions:

  1. What do you think is unique about web archives, particularly in relation to other types of born digital data?
  2. What are the key challenges facing researchers when working with web archives?
  3. What should we be doing that we’re not currently doing, in order to ensure that web archives can be accessed now and in the future? What are the barriers?
  4. Talk about the most interesting project/piece of research you’ve been involved with.

I had a few responses:

New Article: “An Open-Source Strategy for Documenting Events”

Nick Ruest and I have a piece that’s just come out in Code4Lib Journal. The article takes readers through (a) why Twitter matters for event archiving and future historical research; (b) how you can collect data yourself; and (c) how you can analyze the data. You can read the abstract below, and check out the article here!

As always, hope you enjoy reading it, and if you have any comments, questions, or anything, we are always happy to hear from you.

Abstract follows after the fold.

New Article: “The Great WARC Adventure: Using SIPS, AIPS, and DIPS to document SLAPPs”

Nick Ruest, Anna St-Onge, and I have a piece that’s just come out in the open-access journal Digital Studies / Le champ numérique. The deliberately acronym-heavy title introduces an article that takes us through the process of (a) creating a web archive; (b) preserving and providing access to the files; and (c) running some basic analysis on it from the perspective of a historian. While some of the text analysis done in the latter part of the article predates more recent warcbase developments, I think it still provides a useful conceptual approach.

You can find the article here, and the abstract below. Hope you enjoy it.

Obama and Twitter: Actually, Mr. President, We think Social Media Will Matter in the Future

Obama tweeting in happier days.

I don’t normally take partisan positions here at ianmilligan.ca, especially in the rough and tumble world of American politics. But sometimes a line is crossed, and I cannot stay silent! 😉

Speaking at a journalism event in late March 2016, American President Obama had this to say, according to the Washington Examiner:

Ten, 20, 50 years from now, no one seeking to understand our age is going to be searching the tweets that got the most retweets, or the post that got the most likes … They’ll look for the kind of reporting, the smartest investigative journalism that told our story — lifted up the contradictions in our societies and asked the hard questions and forced people to see the truth even when it was uncomfortable.

I guess all this really shows is that President Obama doesn’t follow us on GitHub or on Twitter, or else he’d know that Nick Ruest and I have been tackling these very questions.