Topic Modelling in the Lyrics Database, Part Three: Talking to Wolfram Alpha

(And here my series continues… I’m blogging through August mainly to keep the work going, since it is so easy to drift away from it, and this is more of an internal diary than anything else!)

Importing economic data into Mathematica – it really is this easy…

Mathematica is made by Wolfram Research, the same company that brings us Wolfram Alpha, the computational knowledge engine that powers parts of Siri and is an all-around fun resource for historians, tinkerers, or, well, anybody (I’ve written about it before). As a diversion, I thought I would start comparing economic data to the topics that I am finding through MALLET.

Using free-form input, let’s get annual figures for unemployment in the US, 1964-1989.

With that done, we can then manipulate our data – getting them into comparable datasets – and begin to run correlations. Let’s see if we can find correlations in topic occurrences against the unemployment rate…
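For readers without Mathematica, here is a minimal sketch of the "comparable datasets, then correlate" step in Python. It is not the code I ran: the function names `align_by_year` and `pearson` are made up for illustration, and the numbers are invented placeholders, not real unemployment figures or topic weights. It assumes each series is keyed by year and that the correlation wanted is the ordinary Pearson coefficient.

```python
from math import sqrt


def align_by_year(series_a, series_b):
    """Keep only the years present in both {year: value} dicts, in year order."""
    years = sorted(set(series_a) & set(series_b))
    return [series_a[y] for y in years], [series_b[y] for y in years]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented placeholder values, NOT actual data.
unemployment = {1964: 5.0, 1965: 4.0, 1966: 3.0, 1967: 4.0}
topic_weight = {1965: 0.2, 1966: 0.1, 1967: 0.2, 1968: 0.3}

# Only 1965-1967 appear in both series, so only those years are compared.
xs, ys = align_by_year(unemployment, topic_weight)
r = pearson(xs, ys)
print(round(r, 3))
```

The alignment step matters because the two sources rarely cover exactly the same years; correlating unaligned lists would silently pair the wrong years together.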

Where do we find the Max and Min correlations?
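A small Python sketch of that lookup, again with invented placeholder correlations and hypothetical topic numbers rather than my real results. It also shows a distinction worth keeping in mind: the minimum correlation is the most negative one, which is not the same thing as the topic with the weakest relationship (the value closest to zero).

```python
# Maps a hypothetical topic number to its correlation with unemployment.
# Placeholder values for illustration only, NOT real results.
correlations = {0: 0.12, 1: 0.81, 2: -0.77, 3: -0.05}

# Topic that rises most strongly with unemployment.
most_positive = max(correlations, key=correlations.get)

# Topic that falls most strongly as unemployment rises (the "Min").
most_negative = min(correlations, key=correlations.get)

# Topic with the weakest linear relationship in either direction.
closest_to_zero = min(correlations, key=lambda t: abs(correlations[t]))

print(most_positive, most_negative, closest_to_zero)
```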

The highest correlation is with topic 17, “don tonight night give wanna fire rock inside lover good lose start body light burning crazy dream hot waiting” and the lowest (the Min) is with topic 39 “baby girl time world soul ya goodbye boy hear goin don girls fool man ring lies waiting make tears.”

So when the US unemployment rate is high, song lyrics contain references to fire, rock, lover, burning, crazy, dream, hot, waiting; and when it is low, the topic about baby, girls, world, goodbye, boy, fool, waiting, tears is the most common.

What is happening here?

Well, we actually end up getting a sense of the “golden age” of low American unemployment versus the more dour days of the 1980s. It turns out topic 39 is the most popular topic for the late 1960s: a time of hope, yes, but also of low unemployment. Topic 17, in turn, is the most popular topic in the 1980s, especially during Reagan’s first term.

The swings in the unemployment rate are so wild, with the 1980s being highest and the 1960s being lowest, that they show up here.

More digging will have to be done, I guess.

2 thoughts on “Topic Modelling in the Lyrics Database, Part Three: Talking to Wolfram Alpha”

  1. tedunderwood says:

    Very intriguing. One way you might check the significance of correlations like these is to control for the “false discovery rate” (see Wikipedia). Basically, any time you check a bunch of different time series for correlations at the same time, you’re going to get some striking correlations more or less by accident. Controlling for the false discovery rate is a way to tell whether the correlations you’ve discovered are *so* striking that they’re unlikely to have occurred by accident.

  2. Ian Milligan says:

    Thanks, Ted – I’m reading up on that now and will try to implement the control into this. It’s a good thing to incorporate, esp. as my datasets get bigger and more topics may be added. Really appreciate the suggestion!
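One common way to control the false discovery rate suggested above is the Benjamini-Hochberg procedure. The sketch below is a plain-Python illustration, not something already wired into this project: it assumes each topic's correlation comes with a p-value (computing those is a separate step not shown), the function name `benjamini_hochberg` is my own, and the p-values are invented placeholders.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the result survives
    Benjamini-Hochberg control of the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Everything ranked at or below the cutoff is kept as a discovery.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[idx] = True
    return survives


# Invented placeholder p-values, one per hypothetical topic.
p_values = [0.001, 0.045, 0.02, 0.20, 0.008]
flags = benjamini_hochberg(p_values, alpha=0.05)
print(flags)
```

Note that the second placeholder p-value (0.045) would pass a naive "p < 0.05" check but does not survive here, which is exactly the kind of accidental striking correlation the procedure is meant to filter out when many topics are tested at once.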
