Those who were at the Web Archives 2015 conference and stuck around for my closing keynote saw some glimpses of one project that I’m part of: making warcbase accessible to humanities researchers. Warcbase is a powerful platform for hosting and analyzing web archives, developed by Jimmy Lin of the University of Waterloo.
Long-time readers know that I’ve been a Mathematica programmer for the last five or six years. I’ve loved the notebook metaphor: a way to mix rich text, code, outputs, and visualizations together in a single document. You can explain what you’re doing in marked-up text, run code, manipulate outputs, and end up with a data-rich document that shows exactly what’s going on.
Those of you using the Jupyter platform (which grew out of the IPython project, which itself continues) have been able to experience this metaphor yourselves.
The idea was to bring this richness to Warcbase: once you have it up and running, you can use a GUI to run our scripts, rapidly prototype your own scripts (using a smaller subset of data to see if things work), and get a sense of the overall contours of your collection.
Want to see it for yourself?
- This walkthrough, “Installing and Running Spark on OS X,” written by undergraduate research assistant extraordinaire Alice Zhou, shows you how to get everything set up.
- This walkthrough, “Spark on EC2 or Compute Canada,” written by me (and thus far less extraordinary), shows you how to set up warcbase on a vanilla Ubuntu machine that you could spin up on Amazon or another service provider (such as Compute Canada).
Have fun warcbasing!