Another day, another short post. We’ve been working with Jimmy Lin’s cluster at the University of Maryland – Jimmy’s been helping us run Pig scripts and get access to things. Some more results have been illuminating.
The first script that we ran pulled out all the links to social media platforms (YouTube, Twitter, and Facebook) and aggregated them by domain. This means that if, say, liberal.ca linked to twitter.com, and so did liberal.ca/page1 and liberal.ca/page2, the resulting chart would say liberal.ca linked to twitter.com three times. Repeat that over 11,620,105 sites collected by the University of Toronto between 2005 and 2009 and you get some neat results. Here we’re looking at Canadian political parties and interest groups.
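To make that counting rule concrete, here is a minimal Python sketch of the same aggregation. It is not the Pig script we ran on the cluster; the function names, the `www.` stripping, and the sample links are assumptions for illustration only.

```python
from collections import Counter
from urllib.parse import urlparse

# The three platforms named above.
SOCIAL = {"twitter.com", "youtube.com", "facebook.com"}

def host(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc

def count_social_links(links):
    """Count (source domain, social platform) pairs, one per linking page."""
    counts = Counter()
    for source_url, target_url in links:
        target = host(target_url)
        if target in SOCIAL:
            counts[(host(source_url), target)] += 1
    return counts

# Hypothetical sample: three pages on one domain, each linking to Twitter.
sample_links = [
    ("http://liberal.ca/", "http://twitter.com/"),
    ("http://liberal.ca/page1", "http://twitter.com/"),
    ("http://liberal.ca/page2", "http://twitter.com/"),
    ("http://liberal.ca/page2", "http://example.org/"),  # not social, ignored
]

social_counts = count_social_links(sample_links)
print(social_counts[("liberal.ca", "twitter.com")])
```

Every page on a domain contributes its own count, which is why the example in the paragraph above comes out as three rather than one.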
It goes without saying that none of this could happen without collaboration. If I had run that first question locally, my desktop would probably still be crunching along.
For all of these results, we have a time slider at the bottom in Gephi that lets us track change over time.
Here we’re seeing which social media nodes political parties and interest groups linked to between 2005 and 2009. Who adopted Twitter first? (The Liberals, it appears.) Who stuck to Facebook by 2009, and who embraced YouTube? (Everybody.) It’s a neat way to track communication medium change over time. Note: I’ve been exploring a few ways to launch interactive and dynamic visualizations, but haven’t found one that does temporal aspects well yet. So you’ll have to just trust me. Linkurious might be the best solution I’ve found to put things online, in theory, but it isn’t quite running for this post.
The second script did the same for links within the collection, but instead of Twitter, YouTube, and Facebook, we matched URLs containing the following strings: conservative, conservateur, liberal, ndp, npd. The results, once we plotted them in Gephi, naturally mimicked the structure of Canadian politics:
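As a rough sketch of that second pass, the Python below matches link targets against those five strings and groups them by party. The mapping of strings to parties, matching on the host rather than the full URL, and the sample links are all assumptions for illustration; the actual script ran in Pig.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed grouping of the five search strings (English and French spellings).
PARTY_KEYWORDS = {
    "conservative": "Conservative",
    "conservateur": "Conservative",
    "liberal": "Liberal",
    "ndp": "NDP",
    "npd": "NDP",
}

def host(url):
    """Return the lowercased host of a URL."""
    return urlparse(url).netloc.lower()

def party_for(url):
    """Return the party whose keyword appears in the host, or None."""
    h = host(url)
    for keyword, party in PARTY_KEYWORDS.items():
        if keyword in h:
            return party
    return None

def count_party_links(links):
    """Count (source host, target party) pairs across all links."""
    counts = Counter()
    for source_url, target_url in links:
        party = party_for(target_url)
        if party is not None:
            counts[(host(source_url), party)] += 1
    return counts

# Hypothetical sample links, not data from the collection.
sample_links = [
    ("http://www.ndp.ca/page1", "http://www.liberal.ca/"),
    ("http://www.npd.ca/", "http://www.liberal.ca/"),
    ("http://example.org/", "http://www.conservateur.ca/"),
    ("http://example.org/", "http://example.com/"),  # no keyword, ignored
]

party_counts = count_party_links(sample_links)
```

Plain substring matching is crude: a short string like ndp could also turn up in an unrelated host name, so results from a filter like this deserve a quick manual check.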
Same idea as above, just minus my shaky handwriting. We see the sites that are only linked to/from the NDP, those that are linked to/from the NDP and the Liberals (a substantial number), those that are linked to/from all three, and those that only link to the Conservatives or the Liberals. What I found fascinating: within this collection, the number of sites that linked only to the Liberals was limited.
Now, of course, we’re seeing seed list limitations here. The Globe isn’t included, so you’d need to zoom in to see the directional arrows (i.e. those are links from the Liberals to theglobeandmail.com, not the other way around). This is another vote in favour of us getting access to seed lists.
Again, we have longitudinal information so we can begin to see moments in time. Here, for example, is the 2005 federal election. We’re seeing the NDP heavily linking to Liberal pages, presumably in some sort of attack situation.
Next up: getting Gephi properly working, so we can begin to export it easily to the Web. If you move the slider, you can begin to see a story about each political party.
Of course, the weight has to be used carefully. If a link to a website is put into every page, for example, it’ll be heavily weighted; does that mean more than one carefully placed link on the front splash page?
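To illustrate the concern, here is a small Python sketch comparing raw link counts with a log-damped weight. The damping formula is just one possible way to soften the effect of site-wide boilerplate links; it is not something we applied to these results, and the site names and counts are made up.

```python
import math
from collections import Counter

def raw_weights(links):
    """Edge weight = number of times the (source, target) pair occurs."""
    return Counter(links)

def damped_weights(links):
    """Edge weight = 1 + log10(count), so repeated links grow slowly."""
    return {edge: 1 + math.log10(n) for edge, n in Counter(links).items()}

# Hypothetical data: site A links to the target from a footer on 500 pages,
# site B links once from its front page.
links = [("site-a.example", "target.example")] * 500
links.append(("site-b.example", "target.example"))

raw = raw_weights(links)
damped = damped_weights(links)

for edge in raw:
    print(edge, raw[edge], round(damped[edge], 2))
```

With raw counts, site A’s edge is 500 times heavier than site B’s; with damping it is under four times heavier. Neither number settles whether the single front-page link matters more, which is the open question.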
All food for thought. But for three days of tinkering around with web archives, we’re really beginning to see results.
Three cheers for collaboration!