This is the first in a series of posts exploring some of the work I’ve been doing with SEASR. It has been complementing my ongoing work in Mathematica well, and it has occasionally proven fruitful to incorporate SEASR workflows into Mathematica, as I’m always looking for neat things to play with.
This month, I was at the University of Victoria’s Digital Humanities Summer Institute, attending the SEASR Analytics course. I’m glad I did: the tools and skills that I learned there have enabled me to set up what I consider to be a pretty top-notch suite of textual analysis tools. In this post, I just want to quickly introduce you to the environment and show you a few of the neat things that you can quickly do.
Setting up the Meandre workbench, which is the user environment, is a bit on the tricky side, although not impossible for beginning and intermediate users. If you do not have institutional support, you will have to set up a server on your home system and access it locally. There is documentation available online, however, to help you through this process. It can be a bit persnickety, but once it is running, apart from the occasional reboot, you should be good to go. The downside to the power of the environment is that it is somewhat resource intensive: my fairly poky Mac with 4 GB of RAM chokes when I dump too much data into it.
I’ve now installed it on two systems, and am happy to chat with anybody who’s having any trouble.
Once it is up and running, the environment is superb. As you can see here, it is a graphical approach to creating workflows. For this first example, I will show you a simple workflow so you can get a sense of how it works. Then we’ll conclude with some genuinely cool examples.
I’ve chosen a simple one, “Demo POS,” or part-of-speech tagging. What this does is take text and tag each word with its part of speech. Let’s think about the flow that this data takes: you start with input; the system figures out what kind of data it is, extracts the text, detects sentences, tokenizes them (splitting them up into chunks), and then applies part-of-speech tags. The remaining components put everything into a usable format. Let’s run it.
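To make those stages concrete, here is a toy sketch of the same flow in plain Python. Everything about it is illustrative: the real workflow chains trained statistical components, while this sketch uses regular expressions for sentence detection and tokenization, and a hypothetical mini-lexicon in place of a real tagger.

```python
import re

# Hypothetical mini-lexicon standing in for a trained POS tagger.
TOY_LEXICON = {
    "the": "DT", "quick": "JJ", "brown": "JJ", "fox": "NN",
    "jumps": "VBZ", "over": "IN", "lazy": "JJ", "dog": "NN",
}

def detect_sentences(text):
    # Naive sentence detection: split after ., !, or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Split a sentence into word chunks and punctuation chunks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag(tokens):
    # Look each token up; unknown tokens default to NN (noun).
    return [(t, TOY_LEXICON.get(t.lower(), "NN")) for t in tokens]

text = "The quick brown fox jumps over the lazy dog."
for sent in detect_sentences(text):
    print(tag(tokenize(sent)))
```

The point is only the shape of the pipeline: each step consumes the previous step’s output, exactly as the boxes and wires do in the workbench.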
The first thing that emerges in this version is an Input window. You can provide your inputs as text and forgo this step, or even process an entire directory. But here we are:
Within a few seconds, we now get our output:
What the heck is all this? We can use this legend to decode it. It does a pretty decent job of breaking my text into parts of speech. Not too useful on its own, but we could begin to look at adjectives, changing patterns, etc. But in any case, this is a good example of how the system works!
In the next few posts, I’ll explore different areas.
But let me tease you with some great examples:
SEASR has a fascinating sentiment analysis extraction workflow. Read more about it here, but essentially you’re tracking the emotional sentiment within a given text or corpus. For example, political speeches – here I’ve used Stephen Harper’s 2012 Throne Speech (our Canadian version of a State of the Union) – have a consistent form. Bad news comes in the middle:
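The core idea behind this kind of workflow can be sketched in a few lines: split a speech into segments and count emotion-bearing words in each one. The mini-lexicon below is hypothetical and tiny; real sentiment workflows draw on large curated lexicons with many emotion categories.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mapping words to emotion categories.
EMOTION_WORDS = {
    "prosperity": "joy", "growth": "joy", "proud": "joy",
    "crisis": "fear", "threat": "fear",
    "deficit": "sadness", "decline": "sadness",
    "attack": "anger",
}

def emotion_profile(text, n_segments=3):
    """Count emotion words in each of roughly n_segments chunks,
    so shifts in sentiment across a speech become visible."""
    words = re.findall(r"[a-z']+", text.lower())
    size = max(1, len(words) // n_segments)
    profiles = []
    for i in range(0, len(words), size):
        counts = Counter(EMOTION_WORDS[w] for w in words[i:i + size]
                         if w in EMOTION_WORDS)
        profiles.append(dict(counts))
    return profiles
```

Plotting those per-segment counts is what produces the arc you see in the speech visualizations, with the dip of bad news in the middle.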
Let’s apply the same test to ALL Throne Speeches between 1867 and 1972 (when the formatting changes a bit):
Or if we take every song that charted on the Billboard Top 40 between 1964 and 1989, we see a general reduction in happiness and a slight increase in anger. It doesn’t seem too pronounced, but trust me, these are statistically significant shifts.
Or here, taking John A. Macdonald’s entry in the Dictionary of Canadian Biography, I ran some processes that extract entities (e.g. people, places) and find relations between them. Two visualizations follow (the last one reminds me of Tank Wars from my childhood):
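For a rough sense of what an entity-and-relations workflow does under the hood, here is a crude Python sketch. The capitalized-word heuristic and the same-sentence co-occurrence rule are stand-ins for the trained entity extractor and relation finder the actual workflow uses, so expect false positives (it will happily grab sentence-initial words).

```python
import re
from itertools import combinations

def extract_entities(sentence):
    # Runs of capitalized words, a rough stand-in for a trained
    # named-entity recognizer.
    return re.findall(r"(?:[A-Z][a-z]+\s)*[A-Z][a-z]+", sentence)

def cooccurrence_edges(text):
    # Treat entities appearing in the same sentence as related,
    # yielding edges for a network visualization.
    edges = set()
    for sent in re.split(r"(?<=[.!?])\s+", text):
        ents = extract_entities(sent)
        for a, b in combinations(sorted(set(ents)), 2):
            edges.add((a, b))
    return edges
```

Feeding the resulting edges into a graph layout is what produces network pictures like the ones above.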
Over the next few weeks, time permitting (book manuscript revisions take up half of my time), I hope to share more experiences with you.