I had the opportunity to present “Studying the Web in the Shadow of Uncle Sam,” a paper that Tom Smyth (Library and Archives Canada) and I proposed to the National Webs workshop that Niels Brügger and Ditte Laursen hosted in Aarhus, Denmark. Our paper abstract is below, as are the slides that I presented.
Hopefully more cool things can be announced soon. The paper should appear in an edited collection, ideally in early 2018; our drafts are due in March.
What is the Canadian Web? While Canada does have the .ca top-level domain (TLD), this does not capture the whole of the Canadian Web: universities, governmental institutions, and some companies use the .ca TLD, but many other corporations, small businesses, bloggers, and others generally gravitate towards .com, .org, or .net (a question we will briefly explore in our paper). In short, the .ca domain is a relatively niche player. Analyses using just the top-level domain would be skewed towards certain forms of content providers. This question presents a considerable challenge for national libraries and researchers working from a national perspective on an inherently global network.
We will approach this question in three ways within this paper. First, we present the state of the Canadian Web. Drawing on initial work by Library and Archives Canada and twenty-five Archive-It partners in Canada, we discuss what it means to study the Canadian Web. Second, we explore what work has been done to date: how Library and Archives Canada and Canadian partners have embraced the challenge and what they have been collecting. How do librarians and archivists select their seeds in this context, and do the resulting collections approach a national web? This collection development strategy is an interim one, beginning to lay the foundation for greater capacity for domain crawling. “Thematic web collections” steward parts of the Canadian Web, while recognizing that this approach is a stopgap. Finally, we outline various paths forward towards a domain crawl of the “Canadian Web,” highlighting the Web Archives for Longitudinal Knowledge (WALK) project that is beginning to integrate disparate web archives across the country.
Click on the first slide and you can view it as a slideshow!