An Aside: An Ode to Exploratory Research

With the semester done, the search committee I was on having wrapped up, and – finally – my two article drafts (one on moral panics on the early Canadian Internet and one on web archiving) completed, I had an entire afternoon of guilt-free exploratory research.

I love exploratory research. A forthcoming article grew out of exploratory research (blogged about here), when I was messing around with dissertations and citation counts. On the other hand, that obscures the days and days that I’ve literally spent hitting dead ends, banging my head against the wall over bad data, technical limitations, or sources that never really went anywhere. I have terabytes of external hard drives, filled up with datasets, some of them representing a few days of work that won’t soon see the light of day.

So it’s nice to have these guilt-free days to just play, in a constructive way. I think of it as akin to Google’s 20% time. To check out what’s new at the Internet Archive. To take an abstract problem and play with it in a programming language. To listen to a CBC debate on Canadian history.

So what have I discovered today?

– Well, Jim Clifford and I spent a good hour figuring out how to do a pattern match for a list made up of mixed integers and strings in Mathematica. Sounds boring, eh? Next time this comes up, it won’t take an hour – it’ll take a second. That’s the joy of programming. And I’ve hopefully made a small contribution to the understanding of global historical commodity flows. 🙂

And because it took so damn long: Cases[test, {x_ /; 1620 <= x <= 1629, _, _}] got us the dates we needed.
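For anyone who doesn’t read Mathematica, here is a rough sketch of the same idea in Python. It is not what we actually ran: the sample rows and the in_decade helper are made up for illustration, and I’m assuming each record is a three-item list with the year in the first slot. The point is just that the test only passes when the first item is a number in range, so rows where a string sits in the year slot fall out quietly.

```python
# Hypothetical sample rows: [year, commodity, quantity], mixing ints and strings.
test = [
    [1618, "tallow", 40],
    [1620, "hemp", 12],
    [1625, "flax", "n/a"],
    [1629, "tallow", 7],
    [1630, "hemp", 3],
    ["unknown", "flax", 9],
]


def in_decade(row, start=1620, end=1629):
    """Return True for a three-item row whose first item is an int in [start, end]."""
    if len(row) != 3:
        return False
    year = row[0]
    # Guard the comparison: a string in the year slot should simply not match.
    return isinstance(year, int) and not isinstance(year, bool) and start <= year <= end


# Keep only the rows dated 1620 through 1629.
matches = [row for row in test if in_decade(row)]
print(matches)
```

One difference worth noting: this sketch only accepts integers in the first slot, which is stricter than the Mathematica condition.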

– You can download Internet Archive collections en masse, thanks to their openness to using wget on their collections! Check out the blog post here, and try generating your own list here. I’m downloading a massive collection of magazines right now. Once again, there’s a good chance that it’ll just sit on my external hard drive and collect digital dust. But who knows. (p.s. you can check wget out at the Programming Historian)

This command, which you can see broken down on the Internet Archive blog, is downloading an entire item list for me – just the text files and some of the images.

wget -r -H -nc -np -nH --cut-dirs=2 -t 1 -A .txt,.jpg -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'

You could use that to download cookbooks en masse, or statistical reports, or really anything you might want.
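If you are curious what the item list contributes, here is a small Python sketch of how I understand the -i and -B options to combine: each identifier in itemlist.txt is treated as a relative link and resolved against the base URL. This is my reading rather than wget’s actual code, and the identifiers and the item_urls helper below are invented for illustration.

```python
from urllib.parse import urljoin

# Same base URL as the -B option in the wget command above.
BASE = "http://archive.org/download/"


def item_urls(lines, base=BASE):
    """Resolve each non-blank identifier line against the base URL."""
    return [urljoin(base, line.strip()) for line in lines if line.strip()]


# Hypothetical identifiers, standing in for the lines of itemlist.txt.
sample = ["example-item-one\n", "\n", "example-item-two\n"]
urls = item_urls(sample)
for url in urls:
    print(url)
```

Printing the URLs like this before starting a long download is a cheap way to check that the list looks right.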

– I discovered a new OS X command, textutil. There’s probably a reason I’ve never heard of it before, but it worked on some Lynx-formatted web archives where html2text balked. This is probably more my fault than html2text’s, but it’s still nice to have a built-in terminal command for this stuff.

– Also, the coffee over at Engineering’s appropriately named “Coffee and Donuts” has more caffeine than the President’s Choice stuff I normally drink here. Gah!

Come Monday morning, I’ll be back at the main projects: wrangling these articles together, preparing a conference presentation, responding to the e-mails that are stacking up, and so forth. But for the rest of the day, it’s play time.
