Topic Modelling in the Lyrics Database, Part Three: Talking to Wolfram Alpha

(and here my series continues… I’m blogging through August mainly to keep the work going when it can be so easy to sneak away, and this is more of an internal diary than anything else!)

Importing economic data into Mathematica – it really is this easy…

Mathematica is made by Wolfram Research, the same company that brings us Wolfram Alpha – the computational knowledge engine that powers parts of Siri, and a fun resource for historians, tinkerers, or, well, anybody (I’ve written about it before). As a diversion, I thought I would start comparing economic data to the topics I am finding through MALLET.

Using free-form input, let’s get annual figures for unemployment in the US, 1964-1989.

With that done, we can then manipulate our data – getting them into comparable datasets – and begin to run correlations. Let’s see if we can find correlations between topic occurrences and the unemployment rate…
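A minimal Python sketch of the correlation step follows. It is an illustration, not the Mathematica workflow from the post, and it assumes both series have already been reduced to one number per year: for example, a topic’s average share of that year’s songs, and the annual unemployment rate. The function names `align_by_year` and `pearson_r` are hypothetical. “Getting them into comparable datasets” here just means keeping the years both series share, in order, before computing a Pearson correlation coefficient.

```python
from math import sqrt


def align_by_year(a: dict[int, float], b: dict[int, float]) -> tuple[list[float], list[float]]:
    """Keep only the years present in both series, in chronological order."""
    years = sorted(set(a) & set(b))
    return [a[y] for y in years], [b[y] for y in years]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalised because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

Usage would look like `pearson_r(*align_by_year(topic_share, unemployment))`, where both arguments are (hypothetical) dicts keyed by year. With only 26 annual points for 1964–1989, a coefficient should be read cautiously rather than taken as evidence of a link on its own.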

Topic Modelling in the Lyrics Database, Part Two: Finding Trends

A bit of a mess of a visualization, but here’s what we get if we plot all the different topics against each other. Using tooltips, we can figure out which topics are most common throughout, and turn them on and off.

In yesterday’s post, I introduced some of the work I’ve been doing with MALLET and provided a list of topics, sparklines, etc. Today I wanted to pull some of that data out and see what trends we could find in the database. Many of the topics found had simple little spikes: a single year where the topic was significant, but not part of a broader trend. Twenty, however, showed either rises or falls, and looked worth investigating further.

I divide them into four groups: (1) those that became more prominent between 1964 and 1989; (2) those that became less prominent; (3) those that were always prominent; and (4) those that displayed other sorts of statistical behaviour. Let’s take a closer look.
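One way to make a four-way grouping like this repeatable is sketched below in Python. It is an illustration, not the procedure used for this post. It assumes each topic is stored as a dict mapping year to the topic’s share of songs, fits an ordinary least-squares line through the yearly values, and uses the slope (change in share per year) to spot rising or falling topics and the minimum yearly share to spot topics that were always prominent. The thresholds `slope_tol` and `prominent`, and their default values, are hypothetical and would need tuning against the real data.

```python
def ols_slope(years: list[int], values: list[float]) -> float:
    """Least-squares slope of values against years (change in share per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


def classify_topic(shares: dict[int, float],
                   slope_tol: float = 0.001,   # hypothetical threshold
                   prominent: float = 0.02) -> str:  # hypothetical threshold
    """Assign a topic to one of four groups based on its yearly shares."""
    years = sorted(shares)
    if len(years) < 2:
        raise ValueError("need at least two years of data")
    values = [shares[y] for y in years]
    slope = ols_slope(years, values)
    if slope > slope_tol:
        return "more prominent"
    if slope < -slope_tol:
        return "less prominent"
    # No clear trend: "always prominent" only if every year clears the bar.
    if min(values) >= prominent:
        return "always prominent"
    return "other"
```

A single-year spike in the middle of the period barely changes the fitted slope, so it tends to land in "other" rather than in a trend group; a spike near either end can tilt the line, which is one reason to keep checking the sparklines by eye.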

In a future post, I will look at individual songs in detail to find exemplars of these topics. This post is more speculation about what might be happening, and about how these topics could help us in our scholarship…

Topic Modelling in the Lyrics Database, Part One: Checking Out Topics

I’ve been playing a lot with MALLET (MAchine Learning for LanguagE Toolkit), a command-line program developed at UMass Amherst. Combining it with my Top 40 Lyrics DB, which I’ve discussed elsewhere, I’ve been able to pick out frequently occurring clusters of words (or topics – hence “topic modelling”). With this corpus, after some experimentation, I began by picking out the top 50 topics that appeared.

As I spend the last pre-teaching month of the summer trying to program at least half a day every day (the other half is book and article writing/revising time), I’ve been having a lot of fun tinkering with this material. Topic modelling is proving even more fruitful than keyword searching, mainly because the data comes to me rather than the other way around.

The only downside of MALLET is that the output can be a bit opaque without putting it into another environment. Shawn Graham has a great series on using the Gephi GUI to process it (if you want to use MALLET yourself, his how-to guide is an amazing resource; we have a forthcoming piece in the Programming Historian 2 that will also help new users). I’ve been importing it into Mathematica, my own programming platform of choice. Below is my first pass at visualizing the output: a series of sparklines, one per topic. After this, I can take a topic’s number, plug it into another Mathematica cell, and look at the findings in more detail.
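MALLET’s document-topic file is the raw material for plots like these sparklines. Below is a minimal Python sketch (the post itself uses Mathematica) of reading that file and averaging one topic’s share per year. It assumes the tab-separated layout written by `--output-doc-topics` in MALLET 2.0.8 and later: document index, document name, then one proportion per topic. Older releases wrote topic/proportion pairs instead, which this parser does not handle. It also assumes, hypothetically, that each song’s file name contains its four-digit chart year; the helper names `read_doc_topics` and `mean_share_by_year` are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: a standalone four-digit year somewhere in the document name.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def read_doc_topics(lines):
    """Parse doc-topics lines, assuming the MALLET 2.0.8+ layout:
    doc index, doc name, then one proportion per topic, tab-separated."""
    docs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment, if present
        fields = line.split("\t")
        docs[fields[1]] = [float(p) for p in fields[2:]]
    return docs


def mean_share_by_year(docs, topic):
    """Average proportion of one topic across all documents from each year."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, props in docs.items():
        match = YEAR_RE.search(name)
        if match is None:
            continue  # no recognisable year in the name; skip rather than guess
        year = int(match.group())
        totals[year] += props[topic]
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}
```

The resulting year-to-share dictionary is the kind of series a sparkline draws. If the real file paths contain other four-digit numbers (a folder named after a year, say), the regular expression would need tightening.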

A Picture is Worth a Thousand Words: Visualizing the Past

An inspiring historical visualization of Napoleon’s 1812 campaign.

I have recently been trying to figure out good ways of representing large amounts of historical information in a way that makes sense to everybody who might stumble across my work! I think that a good graphic has the ability to draw readers into what we do, letting us convey the scope, joy, or horror of history without needing to read through often dense prose. In this post, I want to give a sense of what I think works, what doesn’t, and why we should start thinking about cool maps, graphs, and charts!
