Running Shine Locally on a Collection of ARC/WARC Files

TL;DR? You can find a walkthrough here. To find out what it does, read on.

As Twitter followers will know, I’ve been playing with the UK Web Archive’s Shine front-end over the last week or so. I think it’s a fantastic front-end to a collection: it gives you a bird’s-eye view of the whole collection and lets you dive into concordances or the pages themselves. I’m currently indexing and exploring the material in the University of Toronto’s Canadian Political Parties and Political Interest Groups Archive-It collection.

The collection is still indexing (about 10% remains), so the data below may change, but it already lets us do things like this:

[Four screenshots of Shine results for the collection, taken 27 May 2015]

Observers of the Canadian political scene may find some of the above examples intriguing. Again, once indexing finishes tonight and I’m able to port it over to another machine, I’ll put some results up.

Do you want to do this yourself?

One of my RAs, Shawn Dickinson, got this running on our OS X machines. Shawn is an incoming MA student at the University of Waterloo who has done quite a bit of web-based digital humanities work (his web-based app for creating manuscript corpora is especially intriguing). The setup was then tested by David Hussey, a current MA student. David has also been working on some great Java code for turning MALLET output into time-ordered data, which we’ll share soon.

You can find the walkthrough here. Please send along any suggestions or changes: it’s very much a living document.

Stay tuned for more news later next week. Things have been a bit quiet because we’re sprinting to have some cool stuff to show off at the Web Archiving Collaboration conference at Columbia.
