Accessing Historical Data en Masse

Click to download the slide deck.

This webpage was prepared for the “Accessing Historical Data en masse” workshop. It was held as part of a broader “Research, Teaching, and Digital Humanities” event at the Massachusetts Institute of Technology’s Department of History on Tuesday, January 13th. My thanks to Heather Lee, who helped talk out the shape of this workshop with me, as well as her fellow organizers Jeffrey Ravel and Sana Aiyar. I’d also like to thank Margo Collett for her logistical help.

The Slide Deck and Overall Approach

The slide deck is the most important resource. It is available as a PDF download here [9.8MB]. It doubled as both my slides for parts of the presentation and a handout to help participants follow along.

I wavered over whether to keep us all on the same page as we moved through these lessons, or to provide links and the handout, have the talk videotaped, and make myself available for twenty minutes of Q&A and hands-on time at the end. I elected for the latter, since it increases the ground we can cover. We’re not going to make people programmers in 80 minutes, but we’re going to awaken an idea of why they might want to learn, what they can do with these skills, and how to conceptualize what they might want to do.

Links to Tools and Resources Used

Dream Cases and Links to Important Repositories (Advanced Searches)
Epigraphic Database Heidelberg (provides an ‘export results to CSV’ option)
Commonwealth War Graves Commission (provides a ‘download results’ button)
Google Books Advanced Search (remember to select ‘full view only’, which will make sure you can download PDFs and ePUB files)
Internet Archive Advanced Search
Hathi Trust Advanced Search
Shawn Graham’s list of data repositories for HIST 3907B (Carleton University)

API Example
Canadiana’s API Example
Example of a Canadiana API Call, JSON format.
Example of a full-text record generated using the Canadiana API.
Example bash script to download files.
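
The workflow these API links illustrate can be sketched in Python: build a query URL, ask for JSON, and pull the record identifiers out of the response. This is a rough sketch under stated assumptions, not Canadiana’s documented interface. The base URL is a placeholder, and the `q`, `fmt`, and `pg` parameters and the `docs`/`key` field names are hypothetical, so compare them against the live API example before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint, for illustration only (not a real service).
BASE_URL = "http://search.example.org/search"


def build_query_url(term, page=1, base_url=BASE_URL):
    """Return a search URL that asks for one page of results as JSON.

    The parameter names (q, fmt, pg) are assumptions for this sketch.
    """
    params = {"q": term, "fmt": "json", "pg": page}
    return base_url + "?" + urlencode(params)


def extract_keys(payload):
    """Pull record identifiers out of a decoded JSON response.

    Assumes a response shaped like {"docs": [{"key": "..."}, ...]};
    records without a "key" field are skipped.
    """
    return [doc["key"] for doc in payload.get("docs", []) if "key" in doc]


def fetch_keys(term, page=1):
    """Request one page of results and return its record identifiers."""
    with urlopen(build_query_url(term, page)) as response:
        payload = json.load(response)
    return extract_keys(payload)
```

Calling `fetch_keys("ottawa")` against a real endpoint would return the identifiers on the first page of results; looping over `page` values collects the rest, and that list can then be handed to a download script.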

Outwit Hub Example
Outwit Hub Light
Suda On-Line Database: A Massive 10th Century Byzantine Greek Encyclopedia
Search results page, right click to ‘view source.’

Computational Resources
Introduction to the Bash Command Line, Programming Historian lesson by Ian Milligan and James Baker.
Automated Downloading with Wget, Programming Historian lesson by Ian Milligan.
Downloading in Bulk with Wget, blog post by the Internet Archive.
Boston Public Library Anti-Slavery Collection
Data Mining the Internet Archive Collection, Programming Historian lesson by Caleb McDaniel.
Downloading Multiple Records Using Query Strings, Programming Historian lesson by Adam Crymble.
HistoryCrawler Virtual Machine.
