As part of our Web Archives for Longitudinal Knowledge project, we’ve been signing Memorandums of Agreement with Canadian Archive-It partners, ingesting their web archival collections into our Compute Canada system, and generating derivative datasets. An example of these collections is something like the University of Toronto’s Canadian Political Parties collection: a discrete collection on a focused topic, exploring matters of interest to researchers and everyday Canadians. As the size and number of collections begins to creep upwards – we’ve got about 15 TB of data spread over 46 collections (primarily from Alberta, Toronto, and Victoria) – our workflows need to scale to deal with this material.
More importantly, when one’s productivity is impacted by unfolding world events (it’s been a long week!), scripting means that the work still gets done. Continue reading