Using Mathematica to Plot Locations Mentioned in Web Archives

We’ve been using warcbase to extract entities from different domains within the Canadian political parties and interest groups collection. I’ve previously used the Google Maps API or Many Eyes to visualize these entities quickly, but this morning I wondered what we could do with Mathematica 10’s (relatively) new geographic visualization services.

The results were promising. Semantic Interpretation, in particular, is pretty good, although I still need to learn how to tune it – consider the results here:

Each correctly isolated entity carries quite a bit of information: Calgary, for example, is also attached to the administrative unit of Alberta and the country of Canada.

As a quick and dirty workaround to ambiguity, I used Wolfram|Alpha to grab the latitude and longitude of each point and then map them. The results, pictured here for Conservative.ca, were very promising:

Distribution of locations mentioned in the Conservative Party of Canada’s website from February 2009.

We can see, for example, the relatively high frequency of Calgary (home to the Conservative Party of Canada), and the extremely low frequency of the city of Toronto (Canada’s largest city, albeit not a large base for the party).

We can also zoom in on different sections of the world. Switching to the New Democratic Party of Canada, here is just “Canada” (using CountryData["Canada"] as my GeoRange):

[Screenshot: location map with GeoRange restricted to Canada]

Or even the European Union:

[Screenshot: location map with GeoRange restricted to the European Union]

This is all easily automated.

Here’s the code, all of which is also available on GitHub.

th = 150; (* number of entries to keep from the top of the file, e.g. 150 *)

locfreqraw = 
  Import["~/dropbox/Warcbase-NER-Visualization/ndp-200902-loc-freq.txt", "Lines"];

processedfreq = {StringTrim[
       StringSplit[#, i : ("" | "$" ~~ NumberString) :> i]][[3]], 
     ToExpression[
      StringTrim[
        StringSplit[#, 
         i : ("" | "$" ~~ NumberString) :> i]][[2]]]} & /@ locfreqraw;

loclist = 
  Interpreter["Location"][#] & /@ Take[processedfreq[[All, 1]], th];

list = Transpose@{Take[loclist, th], 
    Take[processedfreq[[All, 2]], th]};

delpos = Position[list, _Entity, Infinity][[All, 1]];
cleanlist = Delete[list, Partition[delpos, 1]];

(* PlotRange was given twice; only the first setting (700) takes effect, so the second is dropped *)
GeoRegionValuePlot[cleanlist, (* PlotLegends -> Placed[Histogram, Below], *)
 PlotLabel -> "Distribution of Locations", 
 ColorFunction -> ColorData["BrightBands"], PlotRange -> 700, 
 PlotMarkers -> GeoMarker, ImageSize -> Large, GeoRange -> "World"]
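The parsing step in the code above depends on the exact layout of the frequency file, which isn't shown in this post. As a rough sketch only, assuming each line is a count followed by a place name (the sample lines and counts below are made up for illustration), the same {name, count} pairs could be pulled out with a single string pattern:

```mathematica
(* Hypothetical sample lines: the counts and the "count name" layout are assumptions *)
sampleLines = {"   412 Canada", "   187 Calgary", "    96 New York"};

(* Optional leading spaces, a run of digits, whitespace, then the place name. *)
(* Lines that do not match give {} and are dropped by Flatten. *)
parsed = Flatten[
   StringCases[sampleLines,
    StartOfString ~~ (WhitespaceCharacter ...) ~~ (n : (DigitCharacter ..)) ~~
      Whitespace ~~ name__ ~~ EndOfString :> {StringTrim[name], FromDigits[n]}],
   1]
```

With these sample lines, parsed comes out as {{"Canada", 412}, {"Calgary", 187}, {"New York", 96}}, which has the same shape as processedfreq. If the real file is laid out differently, the pattern would need adjusting.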

You can switch "World" to values like CountryData["EuropeanUnion"] to restrict the map to that region.
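To produce several of these maps in one go, the plotting call can be wrapped in a small function. This is only a sketch: plotForRange is a hypothetical helper name, the options are copied from the call above, and cleanlist is assumed to already exist from the earlier code.

```mathematica
(* plotForRange is a hypothetical helper; the options are copied from the plotting call above *)
plotForRange[data_List, range_] := GeoRegionValuePlot[data,
   PlotLabel -> "Distribution of Locations",
   ColorFunction -> ColorData["BrightBands"], PlotRange -> 700,
   PlotMarkers -> GeoMarker, ImageSize -> Large, GeoRange -> range];

(* The three ranges mentioned in this post *)
ranges = {"World", CountryData["Canada"], CountryData["EuropeanUnion"]};

(* One map per range; cleanlist is assumed to be defined by the code above *)
maps = plotForRange[cleanlist, #] & /@ ranges;
```

Each element of maps is then one plot, in the same order as ranges.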
