Prompted by a comment from Jonathan Goodwin on my last post, about extracting weather data for specified dates in a historical document, I looked into the Wolfram|Alpha API. Luckily, it’s pretty powerful, and you can use it to replicate some of those findings in your own programming language or environment. For those of you who haven’t used an API before, I want to give you a quick sense of how you can call this information yourself. This is a very basic introduction (essentially letting you know the API exists), but I remember my first tinkering with APIs, so hopefully this helps.
The API could also be used to jump-start work on the other things I’ve been doing with Mathematica.
Step One: Sign up for an API key. Go to http://products.wolframalpha.com/api/ and sign up. If you already have a Wolfram account, you can use that; otherwise, just set one up.
A free account is limited to 2,000 calls a day. For your own non-commercial research, that should be enough for at least a proof of concept. It’s not perfect, since you might be dealing with big data, but it will do in the meantime.
Step Two: Start querying Wolfram|Alpha using the API key. Your application key pops up when you first create it, but you can also click on the ‘Edit’ button below it to see your AppID.
Let’s begin with a very basic query:
You’ll need to replace APPID with the one you just got from Wolfram, but then the following data should pop up:
You could also make a more specific query. For example, this one chose Toronto, Ontario (correctly), but you might have wanted Toronto, Iowa or Toronto, South Dakota. In that case, you would have to be more specific.
That brings up the correct data.
To recap, the URL breaks down in the following way:
A query: what you’re asking. You can copy it from the Wolfram|Alpha website proper, and refine it based on the results you’re getting there.
An app ID: the code that lets Wolfram know who you are and that you’re allowed to get this information.
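As a rough illustration of how those two parts fit together, here is a minimal Python sketch. It assumes the v2 query endpoint at http://api.wolframalpha.com/v2/query with `input` and `appid` parameters (check the API documentation for the current form), and the query text is just a made-up example, not the one used above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Wolfram|Alpha query API; confirm it in the API docs.
BASE_URL = "http://api.wolframalpha.com/v2/query"


def build_query_url(query, app_id):
    """Combine a plain-English query and an app ID into one request URL."""
    # urlencode escapes the query for use in a URL (spaces become '+').
    params = urlencode({"input": query, "appid": app_id})
    return f"{BASE_URL}?{params}"


# Hypothetical query; swap "APPID" for your own key before using the URL.
url = build_query_url("weather Toronto, Ontario 31 January 2003", "APPID")
print(url)
```

Pasting the printed URL into a browser (with a real AppID in place of APPID) is an easy way to check a query before writing any more code.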
Step Three: Start running documents through it.
Let’s say you then wanted to extract dates from a document and run them through the Wolfram|Alpha API. How could you do that if you didn’t have Mathematica? I’m just going to give a few vague steps:
(a) Use a regular expression to extract all of the dates. Regular expressions are patterns that can find particular strings of text, such as dates written in a consistent format. You’ll unfortunately have to tinker with this, but if your document collection writes its dates consistently, you should be able to put one together.
These are very complicated. Luckily, with your Google-Fu, you can usually find some pre-existing regular expressions in libraries. They’re pretty epic, though. For example, to match dates like 31 January 2003, here is the code:
Luckily, Regular Expressions are supported by tons of programming languages, such as Python.
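To give a sense of what this looks like in Python, here is a small sketch using the built-in `re` module. The pattern is my own simplified one, not a library-grade expression: it assumes dates are written as a day, a full English month name, and a four-digit year, and it does not check that the day is valid for the month.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Day (1-31, optional leading zero), full month name, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(0?[1-9]|[12][0-9]|3[01]) (" + MONTHS + r") (\d{4})\b"
)


def extract_dates(text):
    """Return every 'day Month year' date found in the text, in order."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]


# Made-up sample sentence for illustration.
sample = "We left on 31 January 2003 and came back on 2 February 2003."
print(extract_dates(sample))
```

If your documents abbreviate months or put the month first, the pattern would need adjusting, which is the tinkering mentioned above.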
(b) With these dates, you could then convert them into the input form. Say you wanted weather: you would set up a quick program that took that list of dates, added pluses in between each part (i.e. 31 January 2003 becomes 31+January+2003), and made them into strings like:
(c) Then put them into the API format: i.e.
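Steps (b) and (c) might be sketched in Python as follows. This is only an illustration under a few assumptions: the endpoint URL and the `input`/`appid` parameter names are taken to be those of the v2 query API, the "weather" wording and the default place are placeholders, and both function names are my own.

```python
from urllib.parse import quote_plus

# Assumed endpoint; confirm it against the API documentation.
BASE_URL = "http://api.wolframalpha.com/v2/query"


def date_to_query(date, place="Toronto, Ontario"):
    """Step (b): build a URL-safe query string, with pluses for spaces."""
    return quote_plus(f"weather {place} {date}")


def query_to_url(query, app_id="APPID"):
    """Step (c): drop the already-encoded query into the API URL."""
    return f"{BASE_URL}?input={query}&appid={app_id}"


# Hypothetical list of dates, e.g. the output of a date-extraction step.
dates = ["31 January 2003", "2 February 2003"]
urls = [query_to_url(date_to_query(d)) for d in dates]
for u in urls:
    print(u)
```

Letting `quote_plus` do the encoding also takes care of characters like the comma in the place name, which would otherwise need escaping by hand.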
(d) Start querying these, and getting the results. You’ll then have a ton of information in XML format. You can then begin tinkering around with the kind of results you actually want, by appending extra information after your APPID. For example, to get plaintext results, you could append
to the end.
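For step (d), a sketch of fetching and reading the results could look like the following. It assumes the API accepts a `format=plaintext` parameter and returns XML in which each `pod` element holds `plaintext` elements; the sample response below is invented and heavily trimmed just to show the parsing, so treat the layout as an assumption to check against a real response.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xml(url):
    """Download one API response. Each call counts against your daily limit."""
    # Assumes the URL already has a query string, so '&' is the right joiner.
    with urllib.request.urlopen(url + "&format=plaintext") as response:
        return response.read()


def pod_texts(xml_data):
    """Map each pod title to the non-empty plaintext strings inside that pod."""
    root = ET.fromstring(xml_data)
    results = {}
    for pod in root.iter("pod"):
        texts = [node.text for node in pod.iter("plaintext") if node.text]
        if texts:
            results[pod.get("title", "")] = texts
    return results


# Invented response, trimmed to the assumed pod/subpod/plaintext layout.
SAMPLE = """<queryresult success="true">
  <pod title="Input interpretation">
    <subpod title=""><plaintext>weather | Toronto, Ontario</plaintext></subpod>
  </pod>
  <pod title="Empty pod">
    <subpod title=""><plaintext></plaintext></subpod>
  </pod>
</queryresult>"""

print(pod_texts(SAMPLE))
```

With a real AppID, `pod_texts(fetch_xml(url))` would run the same parsing on a live response; the sample is used here so the sketch runs without spending any API calls.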
At this point, you should have at least seen some of the potential here. Check out the API documentation, and start thinking about how you could harness Wolfram|Alpha to your primary sources.