Most of my research updates can be found on my Twitter feed. But in short, I am working on:

- how digital methods are transforming historical work (building off an earlier blog post)
- how children and youth became central to early debates around Internet regulation in Canada, for both sides
- how we can use the Internet Archive as a historical resource

Read down for my two major projects (which are, of course, linked):

An Infinite Archive? Developing HistoryCrawler to Explore the Internet Archive as a Historical Resource

This project, “An Infinite Archive? Developing HistoryCrawler to Explore the Internet Archive as a Historical Resource,” will work with a groundbreaking new data resource, an Internet Archive sweep, to explore how historians can use and develop new digital tools to carry out historical research on previously inconceivable amounts of information. The project will primarily aim to create an innovative research tool, tentatively named HistoryCrawler, that historians without much technical expertise can use to create “on-the-fly” finding aids and quickly run textual analyses on material. Additionally, the project will look to create a proof-of-concept Internet social history, explore the disciplinary implications of the Internet Archive, and help train highly qualified personnel to deal with these challenges. Bringing together faculty and students at the University of Waterloo and Western University, this Insight Development Grant is envisioned as an early stage of a bigger project with either the Insight or Partnership Development programmes.

This project is well timed because of a data resource that has recently become available. On 26 October 2012, the Internet Archive announced that it would make an entire crawl of the Internet available to interested researchers: the entire collection amounts to 80 terabytes of information. A subset of “.ca” domains is available, some 160,884 websites. This would be an unparalleled snapshot of the Internet as captured by the Internet Archive crawl between 9 March 2011 and 23 December 2011: how it changed and what its content was. We would gain fascinating insight into the everyday lives of Canadians. What would a social or cultural history of 2011, conducted through these large-scale born-digital collections, reveal?

Several methods will be employed, ranging from sentiment analysis (where the emotional content of text can be assessed) and tracing the words that surround a given concept to counting word frequencies, identifying recurring topics of discussion throughout the Internet, and finding ways to quickly move from distant readings of all files towards a close reading of a single webpage. The idea is to move between the macro-scale of “information overload” and the micro-scale of individual documents. The tool, HistoryCrawler, will enable historians to use this resource. It will enhance and combine several open-source tools, including the Wayback Machine (which displays websites as they originally appeared), WARC tools (which compile a full-text archive), and various visualization techniques that will allow users to quickly see overall data and then move to relevant points. Such computational methods offer the only fruitful way for a social or cultural historian to explore collections such as the Internet Archive.
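To make two of these methods concrete, here is a minimal Python sketch of word frequencies and of tracing the words that surround a given concept, with a simple keyword-in-context listing as the step back towards close reading. It is an illustration only, not HistoryCrawler's code: the sample pages, the function names, and the three-word window are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def context_words(tokens, keyword, window=3):
    """Count words found within `window` tokens of each occurrence of `keyword`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts

def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with its neighbours, for close reading."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start = max(0, i - window)
            lines.append(" ".join(tokens[start:i + 1 + window]))
    return lines

# Invented sample text standing in for pages extracted from a web archive.
pages = {
    "page1": "The hockey season starts soon and hockey fans are excited.",
    "page2": "Local hockey rink closes; fans are upset about the rink.",
}

frequencies = Counter()   # distant reading: word counts across all pages
contexts = Counter()      # words that surround the chosen concept
concordance = []          # close reading: each hit shown in its context

for text in pages.values():
    tokens = tokenize(text)  # pages are processed separately so windows never span two pages
    frequencies.update(tokens)
    contexts.update(context_words(tokens, "hockey"))
    concordance.extend(keyword_in_context(tokens, "hockey"))

print(frequencies.most_common(3))
print(contexts.most_common(3))
for line in concordance:
    print(line)
```

On a real crawl the same loop would run over text extracted from the archive files rather than over an in-memory dictionary, and the window size would be worth tuning for the question being asked.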

HistoryCrawler will be deployed online, and all code will be made publicly accessible under a Creative Commons Attribution-ShareAlike license. Findings will be disseminated through conventional peer-reviewed publications, academic conferences, and an ongoing blog.


My next major project, “Postwar English-Canadian Youth Cultures: A Digital History, 1945-1990,” aims to expand our understanding of youth cultures through new and emerging digital methodologies. Previous approaches, while fruitful, have focused on a small number of influential youth (often those who went to university or assumed leadership positions). They left records that enable historians to find and interview them, produced documents that were preserved, and in some cases continue to maintain influence. Digital history, the application of digital methodologies to historical questions, offers a means to widen our perspective from these youth alone to gain a synoptic view of youth culture. We can now trace the rise and fall of cultural ideas. Historical digital sources have reached a scale where they defy conventional analysis. The Internet Archive has 2.9 million texts; there are 2.6 million pages of newspapers at the Library of Congress’ Chronicling America; the McCord Museum at McGill has over 80,000 photographs; Google Books has digitized fifteen million books. The amount of accessible digital information grows on a daily basis, making digital humanities projects increasingly feasible, and for that matter, necessary.

My most recent born-digital archival navigator, here running on a web archive hosted by Library and Archives Canada (late Feb 2012).

Historians can use computational tools to make sense of and process these digital sources. Text mining, for example, enables us to isolate recurring words or phrases, note their frequency in a given year or within specific contexts, and normalize the information with respect to dates. I will trace the evolution of anxieties surrounding youth amongst youth themselves, observers, and government. Lyrics will also be key, as I will analyze thousands of songs (not just those by the oft-studied Dylan, the Beatles, and the Rolling Stones) to see the relative rise and fall of ideas and certain key words over time. I will approach this project by writing software in the Mathematica programming language, an integrated platform for technical computing that will allow me to process, visualize, and interact with digital historical sources.
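The normalization step can be illustrated with a short sketch. It is written in Python rather than Mathematica purely for illustration, and the function name, the per-1,000-words scale, and the sample lines (invented placeholder text, not real lyrics) are assumptions. The point is that raw counts are divided by the total number of words for each year, so that years with more surviving text do not simply look more preoccupied with a term.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency_by_year(documents, term):
    """Return {year: occurrences of `term` per 1,000 words} for (year, text) pairs."""
    term_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for year, text in documents:
        tokens = tokenize(text)
        total_counts[year] += len(tokens)
        term_counts[year] += tokens.count(term.lower())
    return {
        year: 1000 * term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0  # skip years with no usable text
    }

# Invented placeholder text; a real run would load dated lyrics or articles.
documents = [
    (1965, "rebel rebel youth on the street"),
    (1965, "the youth dance all night"),
    (1975, "quiet town quiet youth rebel no more"),
]

rates = relative_frequency_by_year(documents, "rebel")
for year, rate in rates.items():
    print(year, round(rate, 1))
```

Plotting these rates against the year gives the kind of rise-and-fall curve described above, with the caveat that a single-word match ignores spelling variants and changes in meaning.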

Want to learn more? All the nitty-gritty can be found in my SSHRC postdoctoral research proposal; I have been approved to hold the fellowship at the University of Western Ontario.

