Topic Modelling in the Lyrics Database, Part Two: Finding Trends

A bit of a messy visualization, but here’s what happens if we plot all the different topics against each other. Using tooltips, we can identify the most common topics throughout, and turn them on and off.

In yesterday’s post, I introduced some of the work I’ve been doing with MALLET and provided a list of topics, sparklines, etc. Today I wanted to pull some of that data out and see what trends we could find in the database. Many of the topics had simple little spikes: a single year where the topic was significant, but not part of a broader trend. Twenty, however, showed either rises or falls and looked worth further investigation.
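One hypothetical way to make the spike-versus-trend distinction repeatable is a simple heuristic, assuming each topic’s data is a list of yearly proportions: flag a topic as a “spike” when its largest year is at least three times the median of all the other years. The function name and the threshold of 3 are illustrative assumptions, not part of MALLET.

```python
from statistics import median


def is_spike(series: list[float], ratio: float = 3.0) -> bool:
    """Return True if a single year dominates the series.

    Hypothetical heuristic: the largest yearly value is at least `ratio`
    times the median of all the *other* years.
    """
    if len(series) < 3:
        return False  # too few years to tell a spike from noise
    peak_index = max(range(len(series)), key=series.__getitem__)
    others = series[:peak_index] + series[peak_index + 1:]
    baseline = median(others)
    if baseline == 0:
        # Any nonzero peak over a zero baseline counts as a spike.
        return series[peak_index] > 0
    return series[peak_index] / baseline >= ratio
```

A topic that climbs gradually will fail this test even if its final year is its highest, which is exactly the behaviour that separates the twenty trending topics from the one-year blips.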

I divide them into four groups: (1) those that became more prominent between 1964 and 1989; (2) those that became less prominent; (3) those that were always prominent; and (4) those that displayed other sorts of statistical behaviour. Let’s take a closer look.
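The four-way grouping could also be sketched in code. This hypothetical version fits an ordinary least-squares line to each topic’s yearly series, divides the slope by the series mean so that topics of very different sizes are comparable, and then uses the coefficient of variation to separate the “always prominent” topics from everything else. The thresholds (`trend`, `steady`) are illustrative guesses that would need tuning against the actual sparklines.

```python
from statistics import mean, pstdev


def slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def classify(years: list[int], series: list[float],
             trend: float = 0.02, steady: float = 0.25) -> str:
    """Assign a topic's yearly series to one of the four groups.

    trend:  hypothetical minimum |slope| per year, relative to the series mean
    steady: hypothetical maximum coefficient of variation for "always prominent"
    """
    m = mean(series)
    if m == 0:
        return "other"
    rel_slope = slope(years, series) / m
    if rel_slope >= trend:
        return "more prominent"
    if rel_slope <= -trend:
        return "less prominent"
    if pstdev(series) / m <= steady:
        return "always prominent"
    return "other"
```

Dividing by the mean matters here: without it, a large topic like “love” would register a steep slope from a small relative change, while a small topic could double in prominence and barely move.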

In a future post, I will go into detail with individual songs to find exemplars of these topics. This post is more speculation about what might be happening, and about how these topics could help us in our scholarship.

First, it’s worth noting that these are all relative frequencies, with each topic compared against itself rather than on a common scale. The visualization above shows what they look like on a common scale: the pink peaks in the back represent a “love” topic. It will be fun to find the most representative song for each of these topics.
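The difference between the two views is just a choice of normalization. A minimal sketch, assuming each topic is stored as a list of yearly proportions: `self_relative` rescales a series against its own minimum and maximum, as the sparklines do, while `common_scale` divides every series by the single largest value across all topics, as in the combined visualization. Both function names are hypothetical.

```python
def self_relative(series: list[float]) -> list[float]:
    """Rescale a series to 0..1 against its own min and max (sparkline-style)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # a flat series has no shape to show
    return [(v - lo) / (hi - lo) for v in series]


def common_scale(topics: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale every topic against the single largest value across all topics."""
    top = max(max(s) for s in topics.values())
    return {name: [v / top for v in s] for name, s in topics.items()}
```

On the self-relative view, a small topic’s peak reaches 1.0 just as a large topic’s does; on the common scale it shrinks to a sliver, which is why the “love” peaks dominate the combined chart.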

More Prominent Since 1964

Selected Topics, Top 40 Lyrics Database, showing an increase post-1964.

Here we’ve isolated four topics, and we can use their sparklines to get a sense of their trajectories. The four topics are:

6. “heart time chorus wanna turn living dancing money ready man radio loved lot matter pop til power forever moon”

17. “don tonight night give wanna fire rock insider lover good lose start body light burning crazy dream hot waiting”

41. “control wild boys war round daddy foolish danger afraid closer gimme starts hero wrap colors gas tearing shape”

46. “ooh feel night back gonna feeling head roll face make play show ll crazy days dream anymore slow part”
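For a quick look at trends like these without a charting tool, a text sparkline can stand in for the images. This sketch assumes a list of yearly proportions per topic and scales each series to its own range, so, like the sparklines in this post, it shows shape rather than absolute size. The `sparkline` helper is hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(series: list[float]) -> str:
    """Render a series as a one-line text sparkline scaled to its own range."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(series)  # flat series: draw a flat line
    # Map each value onto one of the eight bar heights.
    return "".join(BARS[round((v - lo) / span * (len(BARS) - 1))] for v in series)
```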

Our data isn’t perfect (the token “popupwindow”, an artifact of the scraping, had snuck into one of these topics; I’ve removed it above), but it’s pretty good. Topic 41 is pretty self-explanatory: a topic that expresses fear, gas, war, control, boys, tapping into mid-1980s paranoia in response to the Cold War. Stranger, however, is that it also has its first peak in 1964 (!). This calls for further exploration. As noted, in a future post we’ll take exemplars of these topics to read more closely into the data. For now, though, let’s continue at the macro level.
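Scraping debris like this can be filtered out of the topic lists after the fact. This sketch assumes the tab-separated layout of MALLET’s `--output-topic-keys` file (topic number, Dirichlet weight, then the space-separated top words); `ARTIFACTS` is a hypothetical, hand-maintained set of tokens to drop. A more thorough fix would be to exclude such tokens before training, for example with MALLET’s `--extra-stopwords` option at import time, since post-hoc filtering only tidies the display and does not change the model itself.

```python
# Hypothetical, hand-maintained list of scraping debris to hide.
ARTIFACTS = {"popupwindow"}


def parse_topic_keys(text: str, artifacts: set[str] = ARTIFACTS) -> dict[int, list[str]]:
    """Parse topic-keys lines into {topic: [words]}, dropping artifact tokens.

    Assumes each line is: topic<TAB>weight<TAB>space-separated words.
    """
    topics = {}
    for line in text.strip().splitlines():
        topic, _weight, words = line.split("\t", 2)
        topics[int(topic)] = [w for w in words.split() if w not in artifacts]
    return topics
```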

The other topics are not as self-explanatory and require some parsing: topic 6 refers to money, dancing, pop, forever, living, a 1980s exuberance similar to the “wanna fire rock inside lover good lose start body light burning crazy dream hot waiting” topic (17). It’s unique in showing a very steady increase: a topic that’s always there, but is one of the most popular by the late 1980s.

I’m really looking forward to pulling out representative exemplars of these topics.

Less Prominent Since 1964

Selected Topics, Top 40 Lyrics Database, showing a decrease post-1964.

Six topics now: we can see a decline in the more smarmy, poppy love songs of the early 1960s. Love is a little less popular, which jibes well with our SEASR Sentiment Analysis: the darlings, the daddies, the “happy”s, the “doo”s, the “crying” and the “happiness.”

There’s also a spike in topic 42 around the early 1970s: “funky” makes its appearance, as it does in any visualization of early 1970s music!

Again, I look forward to getting into the exemplars in a future post.

Consistent Topics

Selected Topics, Top 40 Lyrics Database, showing consistent appearances.

Strangely, 1964 seems to be a bit of an outlier, something future work will have to look into (is there an error in the database? Or is 1964 just the last vestige of earlier topics?). These all represent additional love topics that persist throughout: “love,” “friends,” “girl,” “kiss,” “man,” etc.

It would be neat to find the most representative song.
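Finding the most representative song could start from MALLET’s per-document topic proportions (written with `--output-doc-topics`). Because that file’s column layout has varied between MALLET versions, this sketch assumes the proportions have already been loaded into a dictionary mapping each song to a list of proportions indexed by topic number; `most_representative` is a hypothetical helper that simply ranks songs by their share of one topic.

```python
def most_representative(doc_topics: dict[str, list[float]], topic: int,
                        n: int = 3) -> list[tuple[str, float]]:
    """Return the n songs with the highest proportion of `topic`."""
    ranked = sorted(doc_topics.items(), key=lambda item: item[1][topic], reverse=True)
    return [(song, props[topic]) for song, props in ranked[:n]]
```

Ranking by raw proportion favours short songs dominated by a few repeated words, so the top hits are a starting point for close reading rather than a verdict.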


Oddball Topics

Selected Topics, Top 40 Lyrics Database, showing “oddball” appearances.

Here we see three topics that each have two distinct peaks. It would be neat to see where these are coming from, especially topic 7, where “refuse gun” is particularly alluring. Could we have a “retro” topic here, or are these coincidences?
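Two-peaked topics like these could be flagged automatically with a simple local-maximum search. This hypothetical sketch assumes a list of yearly proportions and a hand-picked `min_height` below which bumps are ignored; a topic counts as two-peaked when exactly two years qualify.

```python
def peaks(series: list[float], min_height: float) -> list[int]:
    """Indices of local maxima at or above min_height (hypothetical threshold)."""
    found = []
    for i, v in enumerate(series):
        left = series[i - 1] if i > 0 else float("-inf")
        right = series[i + 1] if i < len(series) - 1 else float("-inf")
        # Strict on the left, non-strict on the right: a flat-topped peak
        # is counted once, at its first year.
        if v >= min_height and v > left and v >= right:
            found.append(i)
    return found


def has_two_peaks(series: list[float], min_height: float) -> bool:
    return len(peaks(series, min_height)) == 2
```

SciPy’s `scipy.signal.find_peaks` offers a more complete version of the same idea, with options such as `height` and `distance` for ignoring small or closely spaced bumps.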

None of this yet leads to firm conclusions, but it is starting to give us a sense of the overall contours of this collection. It also gives me a few ideas of things to run through my n-gram visualizer tool.

Next steps:

(1) Get into the individual songs, looking at case studies – this can help bring the topics into relief.
(2) Begin to look for interconnections using SEASR and some of my other Mathematica work.
