Using Warcbase with a Spark Notebook: What it is, and how to set it up

Those who were at the Web Archives 2015 conference and stuck around for my closing keynote saw glimpses of a project that I’m part of: making Warcbase, a powerful platform for hosting and analyzing web archives developed by Jimmy Lin of the University of Waterloo, accessible to humanities researchers.

Dynamically exploring a collection of ARC files, seeing what’s in them using Spark Notebook.

Long-time readers know that I’ve been a Mathematica programmer for the last five or six years. I’ve loved the notebook metaphor: a way to mix rich text, code, outputs, and visualizations together into a document. You can explain what you’re doing in rich marked-up text, run code, manipulate outputs, and basically have a data-rich document that helps inform what’s going on.

Those of you using the Jupyter platform (which grew out of the IPython project, which itself continues) have been able to experience this metaphor yourselves.

From the Jupyter website, showing a living document

The idea was to take this richness and let you use Warcbase with it: once you have it up and running, you can use a GUI to run our scripts, rapidly prototype your own scripts (using a smaller subset of data to see if things work), and get a sense of your overall collection contours.

Want to see it for yourself?

  • This walkthrough, “Installing and Running Spark on OS X,” written by undergraduate research assistant extraordinaire Alice Zhou, shows you how to get everything set up.
  • This walkthrough, “Spark on EC2 or Compute Canada,” written by me (and thus far less extraordinary), shows you how to set up Warcbase on a vanilla Ubuntu machine that you could spin up on Amazon or another service provider (such as Compute Canada here in Canada).

Have fun warcbasing!