This webpage was prepared for the “Accessing Historical Data en masse” workshop. It was held as part of a broader “Research, Teaching, and Digital Humanities” event at the Massachusetts Institute of Technology’s Department of History on Tuesday, January 13th. My thanks to Heather Lee, who helped talk out the shape of this workshop with me, as well as her fellow organizers Jeffrey Ravel and Sana Aiyar. I’d also like to thank Margo Collett for her logistical help.
The Slide Deck and Overall Approach
The slide deck is the most important resource. It is available as a PDF download here [9.8MB]. It served both as my slides for parts of the presentation and as a handout to help participants follow along.
I wavered about whether to keep us all on the same page as we moved through these lessons, or to provide links and the handout, have the talk videotaped, and make myself available for twenty minutes of Q&A and hands-on time at the end. I chose the latter, because it increases the ground we can cover. We’re not going to make people programmers in 80 minutes, but we can awaken an idea of why they might want to learn, what they can do with these skills, and how to conceptualize what they might want to do.
Links to Tools and Resources Used
Dream Cases and Links to Important Repositories (Advanced Searches)
Epigraphic Database Heidelberg (provides an ‘export results to CSV’ option)
Commonwealth War Graves Commission (provides a ‘download results’ button)
Google Books Advanced Search (remember to select ‘full view only’, which will make sure you can download PDFs and ePUB files)
Internet Archive Advanced Search
Hathi Trust Advanced Search
Shawn Graham’s list of data repositories for HIST 3907B (Carleton University)
Introduction to the Bash Command Line, Programming Historian lesson by Ian Milligan and James Baker.
Automated Downloading with Wget, Programming Historian lesson by Ian Milligan.
Downloading in Bulk with Wget, blog post by the Internet Archive.
Boston Public Library Anti-Slavery Collection
“Data Mining the Internet Archive Collection”, Programming Historian lesson by Caleb McDaniel.
“Downloading Multiple Records Using Query Strings”, Programming Historian lesson by Adam Crymble.
HistoryCrawler Virtual Machine.
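Several of the resources above rest on one idea: an advanced search is usually expressed as a query string at the end of a URL, so changing a parameter by hand or in a loop gives you the next page of results. Here is a minimal Python sketch of that idea. The endpoint `https://example.org/search` and the parameter names `q`, `page`, and `rows` are assumptions made up for illustration, not the interface of any repository listed here; each site uses its own names, which you can read off the address bar after running a search. The sketch only builds URLs and downloads nothing.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint and parameter names, for illustration only.
# Substitute the values you see in a real repository's search URL.
BASE_URL = "https://example.org/search"


def build_search_urls(query, pages, per_page=50):
    """Return one search URL per results page, without fetching anything."""
    urls = []
    for page in range(1, pages + 1):
        # urlencode escapes spaces and special characters for us.
        params = {"q": query, "page": page, "rows": per_page}
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


# Print the URLs for the first three pages of a sample search.
for url in build_search_urls("anti-slavery letters", pages=3):
    print(url)
```

If the printed URLs are saved to a text file, that file could then be handed to Wget with its `-i` option, which is the kind of workflow the Wget lessons above walk through in detail.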