Accessing Treasure Troves of Data: Empowering your own Research

[x-posted with]

This post is a bit technical. My goal is to explain technical concepts related to digital history so people can save time and not have to rely on experts. The worst thing that could happen to digital history is for knowledge to consolidate among a handful of experts.

From the holdings of Library and Archives Canada, to the Internet Archive, or smaller repositories like digitized presidential diaries, or Roman Empire transcriptions, there are a lot of digitized primary sources out there on the Web. You don’t need to be a “digital historian” to realize that sometimes there is a benefit to having copies of these sources on your own computer. You can add them to your own research database, make them into Word Clouds (I know, they’re not perfect), or find ways to manipulate them with tools such as Voyant-Tools, spreadsheet software, or the many other tools that are available. If you can download sources, you may not have to physically travel to an archive, which to me suggests more democratic access to sources.

Digital historians have been working on teaching users how to access the databases that run online archival collections and how to harness this information for your own research. In this post, I want to give readers a quick overview of some of the resources out there that you can use to build your own repositories of information. If you ever find yourself clicking at your computer, hitting ‘right click’ and then ‘save page as,’ or downloading PDF after PDF after PDF… this post will help you better utilize your computer’s tools, making the digital research process a bit quicker.

Public Lecture at Acadia University on “Big Data and History,” November 28th

I’ll be giving a talk entitled “Big Data and History: How Web Archives Will Challenge, Complement and Enhance the Historical Profession” at the Acadia Institute for Data Analytics on November 28th at 3PM, in BAC 132. I’m going to be making the case that historians need to start thinking about data, drawing on arguments around digital preservation, understanding sources, the rise of web archives, and featuring some examples from my own work with GeoCities and the Wide Web Scrape.

A more eloquent abstract:

“Big Data and History” argues that we need to understand the implications of the arrival of new archives: web collections. These collections of websites aggregated into single files necessitate a rethinking of how historians will approach their professional standards and training, with particular implications for historians studying topics involving the 1980s onwards. While historians are normally accustomed to not having enough information about their topic, the problem for many is now shifting towards having far too much data. How can humanities-based researchers begin to grapple with these problems?

If you can make it, the event page is available here. I’m really excited to have the opportunity to go back to beautiful Wolfville, Nova Scotia (my partner did her undergraduate degree there, and my friend and colleague Thomas Peace used to teach there, so I’ve heard so much about the university).

SSHRC’s Research Data Archiving Policy and Historians

Whenever I even think about archival trips, my back pre-emptively aches. It involves sitting or standing near documents, taking digital photographs. And I know that if I looked around the archive, chances are that nine out of ten of my colleagues are doing something similar (and yes, because the plural of an anecdote is not data, Ithaka S+R has reported on this widespread trend in historical research).

When we all travel home to our universities, those historians who are travelling on SSHRC’s dime will surely deposit their research data (photos?) after a reasonable amount of time with their organization’s institutional repository or on some other sharing website, right, to make sure their publicly-funded research is made accessible? SSHRC just wants “qualitative information in digital format,” so maybe our photos, or just our notes, right? </sarcasm>

I wager – unscientifically, based only on anecdotal conversations at the Canadian Historical Association, on Twitter, and in hallways – that the vast majority of historians in Canada would be opposed to the very idea, even if their work was generously funded. The value of our work is too wrapped up in the scarcity of sources themselves, rather than just the narratives that we weave with them.

Short Interview on CBC’s The Current on Historians and Big Data

I did a short interview on CBC Radio’s The Current with Anna Maria Tremonti, which aired this morning. I was responding to some of the utopian arguments made by Christian Rudder’s book Dataclysm, noting that while the historical record is going to be enriched by digital sources, we’ve got to consider issues of access, preservation, and funding. I was nervous, but I think I got my main points across pretty decently.

The talk is available here: “Historians want Canada to give them access to Big Data.”

It was a fantastic experience, and it really did get me thinking about how it would be rewarding to build the capital to get website legal deposit in Canada — or at the very least, to get the preservation of digital resources a little bit more on the table. Maybe I’ll try my hand at some popular writing.

The Future of the Library in the Digital Age? Worrying about Preserving our Knowledge

X-Posted with

By Ian Milligan

Yesterday afternoon, in the atrium of the University of Waterloo’s Stratford Campus, a packed room forewent what was likely the last nice weekend of summer to join Peter Mansbridge and guests for a discussion around “What’s the future of the library in the age of Google?” It was aired on CBC’s Cross Country Checkup on CBC Radio One, available here. It was an interesting discussion, tackling major issues such as what local libraries should do in the digital age, issues of universal accessibility, and whether we should start shifting away from a model of physically acquiring sources (notably books) towards new models for the 21st century. Historians, and those who care about history, have much to contribute to these sorts of conversations. Those who know me or have read my writings over the last three years know that I’m not a luddite. But I came away worried about some of the assumptions made in the conversation, and what they mean for us who write about the past.

A big crowd of folks who care enough about libraries to spend a beautiful Sunday afternoon in a university building lobby.

I don’t want to rehash the conversation, as you could rewatch it, but a brief summary of some of the main themes might help. The broadcast began with Peter Mansbridge asking the major question: “Digital technology is changing the way we store information, and how we learn from it. Does it make sense to stack printed books in costly buildings when virtual libraries are just a mouse-click away?” Mansbridge was joined by Christine McWebb, director of academic programs at the Waterloo Stratford Campus, and Ken Roberts, former chief librarian of the Hamilton Public Library and a member of the Royal Society of Canada’s Expert Panel on the Future of Libraries and Archives in Canada.

Back to School (Teaching for Fall 2014)

I think even my colleagues are surprised when they’re reminded that I’m beginning my third year at the University of Waterloo. Last year was a fantastic one: great students, fun colleagues, and getting more involved in the life of the university (from sitting on doctoral and Master’s committees to attending fun events).

I’m teaching four courses this year, two in the Fall and two in the Winter. This term, I’m teaching the second-year historical methodology course for our majors, minors, and lovers-of-history, and a fourth-year honours seminar on Canadian social movements. If you’re curious, feel free to click through on the syllabus thumbnails below to read them in full. As with every class, there are things that have been left out, but that’s the way of things!

My digital history course, which a few folks on Twitter have asked about, will be offered again in Winter 2015.

History 250 – The Art and Craft of History

History 403A – Canadian Honours Seminar

Using ImagePlot to Explore Web Archived Images

A low-resolution version of the EnchantedForest visualization. Read on for higher-resolution downloads.

ImagePlot, developed by Lev Manovich’s Software Studies Initiative, promises to help you “explore patterns in large image collections.” It doesn’t disappoint. In this short post, I want to demonstrate what we can learn by visualizing the 243,520 images of all formats that make up the child-focused EnchantedForest neighbourhood of the GeoCities web archive.

Setting it Up

Loading web archived images into ImagePlot (a set of macros that work with the open-source program ImageJ) requires an extra step, which works for both Wide Web Scrape and GeoCities data. Images need to be 24-bit RGB to work. In my experience, odd file formats broke the macros (e.g. an ico file, or other junk that you do get in a web archive), so I used ImageMagick to convert the files.
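
For readers who want to try the conversion step themselves, here is a minimal sketch in Python. It is not the exact command used for the EnchantedForest images; it is one plausible way to do it, under some assumptions. It assumes ImageMagick is installed and reachable as `convert` (ImageMagick 7 calls the same tool `magick`), that writing everything out as JPEG is acceptable, and the function names and folder layout are hypothetical. The `-type TrueColor` and `-depth 8` options ask ImageMagick for 8 bits per channel of RGB, which is the 24-bit RGB that ImagePlot wants, and the `[0]` suffix takes only the first frame of animated GIFs or multi-size ico files.

```python
import subprocess
from pathlib import Path


def build_convert_command(src: Path, dest_dir: Path) -> list[str]:
    """Build an ImageMagick command that rewrites one image as a 24-bit RGB JPEG."""
    # Keep the original extension in the new name so that logo.gif and
    # logo.png do not overwrite each other in the output folder.
    dest = dest_dir / (src.name + ".jpg")
    return [
        "convert",            # assumption: ImageMagick 6 name; use "magick" on version 7
        f"{src}[0]",          # first frame only (animated GIFs, multi-size ico files)
        "-type", "TrueColor", # force RGB even if the source is greyscale or palette-based
        "-depth", "8",        # 8 bits per channel, i.e. 24 bits per pixel
        str(dest),
    ]


def convert_all(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Convert every file in src_dir; return the files ImageMagick could not read."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue
        result = subprocess.run(build_convert_command(src, dest_dir), capture_output=True)
        if result.returncode != 0:
            # Web archives contain plenty of junk; note it and move on.
            failures.append(src)
    return failures
```

Calling `convert_all(Path("raw_images"), Path("rgb_images"))` (both folder names are placeholders) would fill the second folder with converted copies and hand back a list of the files that failed, which is usually the junk you want to leave out of ImagePlot anyway. If ImageMagick is not installed, the call raises `FileNotFoundError`.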