Topic Modeling Web Archive Modularity Classes

This is a brief follow-up to Tuesday’s post. By allowing some recursive downloading, I grabbed quick snapshots from the Wayback Machine of the sites that fell within the modularity classes of both the Conservative and Liberal party websites in 2006 and 2014. After converting to text, the Mark I eyeball found some interesting things: economic development for the Conservatives, more social justice websites for the Liberals.
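The "converting to text" step can be sketched with nothing but the Python standard library. This is a minimal illustration, not the tool actually used for this post: the `TextExtractor` class and `html_to_text` function are hypothetical names, and the only assumption is that each downloaded snapshot is available as an HTML string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased from HTMLParser.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Note that this keeps link text but drops the URLs themselves; the topic lists in this post clearly retain URL fragments (http, www, jpg), so the real conversion was evidently less aggressive.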

But with web archives, the Mark I eyeball isn’t enough, and topic modelling turned up some interesting results. I’ve pasted the findings below, but some highlights:

  • Finding child care plans in the 2006 Liberal modularity class: a perennial promise of that party, this is a good thing to find;
  • Way more emphasis on Aboriginal issues in the Conservative sphere from 2006. I’m not quite sure why, but it’s at least an area to dig into more;
  • Current data is very good: the Conservatives care about our economic action plan, and that appears in 2014;
  • The Liberals’ attachment to social movements comes through.

Let’s look deeper, shall we?

Here, for example, are twenty-five topics in the 2014 sphere:

1. today year work business building communities health family announced great
 2. web www http bloomberg news markets bonds east energy law
 3. im http web jpg ca content wp uploads feature jim
 4. html nytimes events trade free june nyt st international immigration
 5. mp support https announces facebook status larry calgary days post
 6. ottawa en time archive office public internet machine net project
 7. canadian web http aug resources gif liveleak edge foreign families
 8. parliament member women bill website members hill rights videos july
 9. minister riding canada prime harper stephen profile uniter statement parliamentary
 10. de la du le les des pour infrastructure au en
 11. media ca im www http jpg png resampled big markwarawa
 12. government day national tax seniors funding fund ukraine constituents supports
 13. canada canadians jobs community country ndp people email summer sign
 14. web ca http www photo gallery multimedia video releases buttonvid
 15. http web www maximebernier ca constituency category brucestanton tag saint
 16. gc ca federal links conservative pm service made act response
 17. id page contact house org report disclosure commons society travel
 18. web http ca www darylkramp jasonkenney bevshipley peterbraid larrymaguire patriciadavidson
 19. david program canada million opportunities john view funding wilks anniversary
 20. web http www philmccolemanmp financial jamesrajottemp canadianmanufacturing personal room finance
 21. index dec read eng local budget html pages call list
 22. ca web http www home images articles reports tillygordon liberal
 23. canada services twitter economic plan action government search randykamp online
 24. news honourable development asp state years behalf social press economy
 25. php information world centre issues south jan veterans open war

Some key ones: economic development, family, immigration, legislation, women’s issues, senior issues, Ukrainians, constituency offices, some prominent (and not-so-prominent) MPs, and of course, our economic action plan. Given what I know about Canadian politics, this makes sense.
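For readers wondering where word lists like these come from: each numbered line is the ten highest-weighted words of one topic. Below is a toy sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, in plain Python. It is not the software used for this post; the function names and the `alpha`, `beta`, and `iterations` defaults are assumptions chosen for illustration, and it expects documents that are already tokenised into lists of words.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to how much the document
                # uses each topic and how much each topic uses this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return topic_word, topic_total


def top_words(topic_word, n=10):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]
```

Calling `lda_gibbs(docs, num_topics=25)` and then `top_words(topic_word, n=10)` yields lists in the same shape as the ones in this post. A pure-Python sampler like this is only practical on small corpora; a dedicated topic-modelling toolkit is the sensible choice for a full crawl.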

And here are the Liberal topics, also from 2014:

1. http uk web net dpac category action cuts social education
 2. element ontario cupe home issues poverty child options coalition street
 3. liberal party justin trudeau day statement canada ca newsroom national
 4. de la des le les du femmes pour au une
 5. org contact www facebook archive work birminghammail machine legal internet
 6. rights php index length var week case policy human open
 7. return function var utm join id https flipboard object source
 8. www ca web http gc page aug index code eng
 9. en ca events aspx lang http mi change digital cfuw
 10. ca news https yahoo video family good blogs man videos
 11. web http ca www health english support mentalhealthcommission yourlegalrights bc
 12. http web wordpress theglobeandmail globe share johnnyvoid sign disabled sanctions
 13. http www web straight account world business site year link
 14. web im images png media city files cdn convention council
 15. http web ca membership www budget canadianlabour vimeo date labour
 16. web people information june centre life august workers meeting justice
 17. web http cbc html twitter revenuegroup time archives sports days
 18. www ca http web im macleans jpg content wp uploads
 19. web program http srchc community group committee apc resources free
 20. www http web womensworlds stop channel torontostopthecuts arts ipetitions mobile
 21. web ca http women services www qc fr ftq report
 22. event cbc email live est password pm blog search instructions
 23. canada radio ici ca img src nouvelles mp sn vancouver
 24. http web www ca npd menu main ford psacunion press
 25. toronto public node ocap sites read march housing smugmug form

Very different: Justin Trudeau (the new leader), cuts to social programs, child poverty, mental health, municipal issues, labour, workers, Stop the Cuts, and housing. Again, this isn’t based on nearest-neighbour links; this is the modularity community that has popped up around the Liberal party website.

Compare to the two groups from 2006, where things get curious. Here are the Conservatives:

1. images jpg nova bc scotia ubcic indians union splash sen
 2. starbulletin history relationship water nov centre shtml danielnpaul order releases
 3. gif fpimages spacer mph front aug crest armoirie navbar text
 4. rights treaty peoples make conservative home party fpnp accord climate
 5. http web page www shawnmurphymp browser regina halifax statement administration
 6. html people canadian email uregina system article justice annett live
 7. site information change search javascript alberta harper resources premier environmental
 8. index php events option view task bcafn day year itemid
 9. web ca www http turning weblogs education gc macleans yahoo
 10. htm support jan archives church united university calendar years house
 11. web www http epa gov cgi bin epalink unique heritage
 12. org archive library building profit https website work children research
 13. canada du aboriginal gouvernement genocide land schedule parti nato computer
 14. point residential services forum schools legal students shawn general law
 15. web http ca www usask net srpska pre irsss property
 16. de la des en les le sur pour ressources au
 17. contact indian logo iisd node november paul file december act
 18. time gc response indigenous crawl fran society ais declaration issues
 19. dec pdf nation book columbia british family long members line
 20. directory cfm government taxonomy parentid office kevin network industrial action
 21. machine archive internet wayback subject initiative terms feb document kosovo
 22. id asp news default item thisdate links gallery international mi
 23. digital form sites artifacts cultural chief federal organization member top
 24. http ca im arts knet main media staff community global
 25. nations council summit health leadership open include inuit process development

A ton of stuff on First Nations issues! I find this really curious. If anything, it helps us think about a new question to ask of this archive: why is this such a big issue? Perhaps there was a big campaign against the Conservatives that had to do with the end of the Paul Martin First Nations accord? But for whatever reason, aboriginal issues have popped up.

Some other ones: education issues, university issues, legal issues.

My other suspicion is that the Conservative site wasn’t as linked into an incipient blogosphere back then, at least in our web archive, as opposed to the Liberal one (note my video from before).

Questions are appearing, which is good…

The 2006 Liberal one is also interesting:

1. web http www im nav annamather athabascau annemclellan acadie cat
 2. time htm links response work crawl search service family online
 3. northern behcho territories northwest cho tli yellowknife canadians programs life
 4. asp events siskinds pdf click sites campaign school call default
 5. index web http ca www uwinnipeg bevdesjarlais hosting design directory
 6. site services id cfm community bill city toronto access jan
 7. de la en dec des vous est une au par
 8. le les include open electoral pour system sur systems dans
 9. women public office canadian riding feb education comments english election
 10. gif spacer ndp email ford future blank pix main javascript
 11. web www http htm mail high eastlink webmail igs electsusanthompson
 12. ca http www web gc unb mailto parl garymerasty number
 13. canada liberal member university find report good forward change strong
 14. aug issues rights press world reserved simcom political history year
 15. html php contact students people current index business ontario departments
 16. ca votemalhi hs dvobzmez svtsfnxbf fg iwkbomr dexqscr wsspbc necshkvfxiokytab
 17. internet machine archive wayback initiative terms subject privacy free policy
 18. news htm party government releases website full communities release story
 19. org php archive https mp heritage worldbank alberta temp gallery
 20. government parliament care child minister national learning harper st early
 21. images jpg page home aspx interbaun global splash icons button
 22. information support federal years research conservative vehicle showroom liberal questions
 23. digital building form cultural library profit artifacts media development today
 24. du content section law image publications view task herald agent
 25. web http www entrust net resources mun ssl peelsb technical

Again, community questions, electoral topics, universities, human rights, child care support (which makes sense!!) and so forth.

3 thoughts on “Topic Modeling Web Archive Modularity Classes”

  • Ian,

    First Nations issues have been a key priority for the Conservative Party under Harper, so their presence in web links should not be a surprise. For many years, Tom Flanagan was an important member of the inner circle and helped shape policy. He also spent his career writing about aboriginal issues. Those ideas are not popular with all Canadian academics, but everyone recognizes that Flanagan considers aboriginal issues a priority. Once in government, Harper’s Conservatives also brought through some controversial bills around property rights on reserves and aboriginal governance. Interestingly, they also seem to have more electoral success in heavily aboriginal ridings (Nunavut, Kenora, Labrador, northern Saskatchewan) than previous Conservative campaigns.

    David Zylberberg

  • Hi David,

    Thanks for this – that occurred to me as well, and I think around 2006 there was quite a bit of discussion of these issues (esp. around the relative merits of the Kelowna Accord). I think the best part of this approach is that it’s hopefully generating some questions to inquire of the data: the metadata isn’t the whole story, but very suggestive…

    Appreciate the comments!

