Quick Post: Web Archive Sentiment Analysis with Mathematica

The Green Party tends to be happy, I guess? At least in October 2005.

I’m continuing to be impressed with the new features found within Mathematica 10.2 (see my recent posts on geo-extraction and extracting person entities). Sentiment analysis is a snap, although the findings will probably need a bit more exploration. We’re trying it out on the Canadian Political Party and Political Interest Groups collection, which you can also play with at webarchives.ca.

In short, “positive” sentiment analysis of a political party’s pages tends to find happy taglines, advertisements for community meetings that really do stress “fun” and “entertainment,” and announcements referring to great “fanfare” and to meetings held at various venues. The Green Party, Canada’s smallest major party, had a lot of fairly casual content back in 2005 (pub nights, for example), which lends itself to this.

A positive example:

The Halton Federal EDA is hosting a Fall Fun Day Oct. 16th from 11:30am-3:30pm at Lowville Park on Guelph line just south of Derry Road. We will have pumpkin decorating, corn on the cob, scavenger hunt and back pack safety tips to save your posture and your back. We welcome greens to attend this fun afternoon with us. This is the first event for this newly formed association which is showing its strength with an event so soon after its assiciations creation on August 25 2005.

On the other hand, their negative content speaks to their frustration as they try to make their way as a party:

Many people are cynical about politics. They think that nothing will ever change. They say we can never have the government we really want. The Green Party understands that frustration – we are frustrated too. But cynicism and frustration will not solve our problem – we can curse the darkness, or light a candle. Will we ever have a better choice than the lesser of two evils? Yes, if we vote for a party who will not settle for the status quo. What will we do when our lands, our waters and our ecosystems can no longer support the demands we make of them? If we manage our resources well, they will sustain us. The greatest mistake we can make is to think that we have no power.

Unfortunately, the negative analysis also swept up a lot of “Internal Server Error” messages, which are rather sad indeed. Because those messages are identical to one another, deleting duplicate text removed all but one of them.
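The clean-up described above could be sketched roughly as follows. This is only a minimal sketch, not the code used for this post: the `sampleLines` list and the `dropBoilerplate` helper are hypothetical, and it assumes the error pages can be recognised by the literal phrase "Internal Server Error".

```mathematica
(* Hypothetical sample standing in for lines imported from the archive *)
sampleLines = {
   "Internal Server Error",
   "Join us for a fun pub night!",
   "Internal Server Error",
   "Join us for a fun pub night!",
   "Many people are cynical about politics."
   };

(* Hypothetical helper: drop exact duplicate lines, then drop any line
   containing the server-error phrase (the phrase is an assumption) *)
dropBoilerplate[lines_List] := Select[
   DeleteDuplicates[lines],
   StringFreeQ[#, "Internal Server Error", IgnoreCase -> True] &
   ];

dropBoilerplate[sampleLines]
```

Filtering before the sentiment pass, rather than after it, would also remove the one error message that survives de-duplication, at the cost of hard-coding the phrase to look for.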

The timings on this were impressive: each of the positive and negative sentiment passes took about 420 seconds to process an annual datescrape of the Green Party of Canada’s page.
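For anyone wanting to take a similar measurement, one way to time a pass is with `AbsoluteTiming`. This is a sketch under assumptions rather than how the figure above was obtained: `sampleLines` is a hypothetical two-line stand-in for the imported text, and the "PositiveSentiment" content type is taken from the code at the end of this post.

```mathematica
(* Hypothetical stand-in for the imported lines *)
sampleLines = {
   "We welcome greens to attend this fun afternoon with us.",
   "Many people are cynical about politics."
   };

(* AbsoluteTiming returns {elapsed seconds, result} *)
{seconds, positive} =
  AbsoluteTiming[TextCases[sampleLines, "PositiveSentiment"]];

seconds
```

The first call in a session may include one-off start-up costs, so a second run is probably the fairer number to report.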

Our next step is to compare these results with those for the other political parties.

Code below:

(* read the scraped text, one list element per line *)
text = Import["/users/ianmilligan1/Dropbox/WAHR-Private/sample-text/200510-greenparty.txt", "Lines"];

(* with a list of lines, each result holds one list of matches per line *)
positive = TextCases[text, "PositiveSentiment"];
negative = TextCases[text, "NegativeSentiment"];
neutral = TextCases[text, "NeutralSentiment"];

(* to display, keep single-match results and delete duplicate boilerplate data *)
DeleteDuplicates[Cases[positive, {_}]]
DeleteDuplicates[Cases[negative, {_}]]
DeleteDuplicates[Cases[neutral, {_}]]
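To make the display step above a little more concrete, here is a small illustration on mock data. The `mock` list is hypothetical and simply imitates the shape of a per-line result, where each element is the list of matches found in one line.

```mathematica
(* Hypothetical per-line results: no match, one match, two matches,
   and a repeat of the one-match line *)
mock = {{}, {"Fall Fun Day"}, {"a", "b"}, {"Fall Fun Day"}};

(* {_} matches only lists with exactly one element, so the empty and
   two-element results are dropped; DeleteDuplicates removes the repeat *)
DeleteDuplicates[Cases[mock, {_}]]
(* {{"Fall Fun Day"}} *)
```

Using `{__}` instead of `{_}` would also keep lines with more than one match, which may or may not be what you want.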
