Exploring Canada’s Parliamentary History through data.gc.ca: Occupations, 1867-2010

I’m not a political historian, although I did serve as the Secretary-Webmaster of the Political History Group for two years. Today, with my class all prepared but not quite in the right frame of mind to look over some manuscript drafts, I decided to play with some of the data that you can find in Canada’s open data repository.

I view this as sort of “putting in the hours” like a pilot would: I now find myself so wrapped up in writing and in using off-the-shelf analysis software that it’s good to keep my data analysis and programming skills a bit honed. But I was also thinking that it’s a great example of what, with a bit of computational skill, you can learn in about thirty minutes of unstructured data play. And, like my post from yesterday, my dream with all of this is that somebody will stumble across this and decide that there is some potential for their own work.

In any event, this morning I stumbled across History of the Federal Electoral Ridings, 1867-2010 and grabbed the English-language CSV file. It’s a big one, containing information on 38,778 candidates for federal office in Canada. It’s a thirteen-column file, with entries including:

Election Date
Election Type
Last Name
First Name
Votes (%)

Using Mathematica, I could either pull out all of the information in a given column, or hold one (or more) columns fixed and look at how the other variables varied. For example, in the following line of code, we can replace ‘x’ with the sitting of Parliament (between 1 and 40, held in the 3rd field) to get all of the values in the 9th field (Occupation).

Cases[data, {_, _, x, _, _, _, _, _, _, _, _, _, _}][[All, 9]]
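The same filter can be wrapped in a small helper. This is only a sketch on made-up rows, not the code behind the results in this post: `mkRow` and `occupationsFor` are names invented here, every field other than the 3rd (Parliament) and 9th (Occupation) is left blank, and it assumes the Parliament number was imported as an integer rather than a string.

```mathematica
(* Hypothetical helper: a 13-field row with only Parliament (field 3) and
   Occupation (field 9) filled in; the blanks stand in for the other columns. *)
mkRow[parl_Integer, occ_String] :=
  ReplacePart[ConstantArray["", 13], {3 -> parl, 9 -> occ}]

toy = {mkRow[14, "lawyer"], mkRow[14, "farmer"], mkRow[15, "teacher"]};

(* Keep rows whose 3rd field is the requested Parliament; return their 9th field. *)
occupationsFor[rows_List, parl_Integer] :=
  Cases[rows, row_List /; row[[3]] === parl :> row[[9]]]

occupationsFor[toy, 14]
(* {"lawyer", "farmer"} *)

(* One list per Parliament, i.e. the nested shape that a later Flatten would collapse. *)
occupations = Table[occupationsFor[toy, p], {p, 14, 15}]
```

Testing the field by position rather than writing out thirteen blanks keeps the line readable if the column count ever changes.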

If we take that result, we get an unordered list of every occupation. The following command ‘flattens’ the occupations so that they form one big list (they were previously divided into nested lists, one per Parliament), tallies the entries, sorts them from most to least common, and keeps the top 30.

In[470]:= Take[Sort[Tally[Flatten[occupations]],#1[[2]]>#2[[2]]&],30]

Out[470]= {{lawyer,3730},{farmer,2587},{,2308},{teacher,1415},{merchant,1194},{businessman,1125},{physician,999},{barrister,981},{parliamentarian,816},{student,795},{journalist,497},{retired,476},{manufacturer,425},{manager,355},{Member of Parliament,351},{administrator,298},{accountant,271},{consultant,267},{contractor,267},{notary,224},{engineer,223},{housewife,196},{salesman,195},{insurance agent,190},{professor,184},{secretary,179},{editor,164},{barrister-at-law,163},{educator,145},{insurance broker,144}}

So we see the usual suspects: lawyers, farmers, unentered data (2,308, which mainly comes from defeated candidates before the 14th Parliament, where data was very unevenly entered if at all), teachers, merchants, businessmen, doctors, etc. At a glance, we can see one problem with this data: something like “merchant” and “businessman” might fall into the same category, as might “manufacturer” and “manager.” We’d have to be very careful with this data, especially as some re-elected MPs apparently just write ‘Member of Parliament’ or ‘parliamentarian’ as their occupation. If this were a real research project, I would probably spend a week or two trying to normalize all of this as best I could. But again, this is a quick tinker.
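To give a flavour of what that normalization might look like, here is a rough sketch that merges a few labels before tallying. The `synonyms` table and the `normalize` helper are invented for illustration; which labels really belong together (is a barrister always a “lawyer”?) is exactly the judgment call that would take the week or two.

```mathematica
(* Hypothetical merge table: these pairings are assumptions, not part of the dataset. *)
synonyms = <|
   "barrister" -> "lawyer",
   "barrister-at-law" -> "lawyer",
   "Member of Parliament" -> "parliamentarian"
   |>;

(* Trim stray whitespace, then swap in the canonical label if one is listed. *)
normalize[occ_String] :=
  With[{key = StringTrim[occ]}, Lookup[synonyms, key, key]]

sample = {"lawyer", "barrister", " barrister-at-law", "farmer",
   "Member of Parliament", "parliamentarian"};

(* Same tally-and-sort idiom as before, applied to the cleaned labels. *)
Sort[Tally[normalize /@ sample], #1[[2]] > #2[[2]] &]
(* {{"lawyer", 3}, {"parliamentarian", 2}, {"farmer", 1}} *)
```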

Let’s ask some real questions. I’m curious about lawyers. How common are lawyers within the candidate pool? Moreover, do they have a disproportionate level of success at being elected? We know that they were damn common back in 1867, and anecdotally, they seem to be today.

I thus generated these two charts, drawing on the Parliaments from 1921 onwards – the 14th sitting onwards. Before that, defeated candidates usually did not have their occupation recorded. Consider the two following figures (the x axis refers to sittings of Parliament – we could quickly map those into dates, but again, we’re moving quickly here):

[Two figures: lawyers as a percentage of all candidates, and of elected candidates, by sitting of Parliament]

From this, we see that in the 14th Parliament nearly 11% of all candidates for seats listed their occupation as lawyer (there were some solicitors too, but lawyer was overwhelmingly the way they recorded their occupation). Yet if we drop all the defeated candidates, we see that almost 20% of the successful candidates that year were lawyers. So there were a lot of lawyers up for election, and they were disproportionately successful at getting elected. Things have dramatically declined since, although, keeping an eye on the y axis, we’re not going too low: still ~9% of our elected candidates in the 40th Parliament listed lawyer as their occupation.
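The two percentages being compared are simple shares, which a sketch on made-up rows can illustrate. Only the positions of the Parliament (3rd) and Occupation (9th) fields come from this post; placing the elected flag in the 13th field, coding a win as 1, and the names `mkRow` and `share` are all assumptions made for the example.

```mathematica
(* Fields 3 and 9 are from the post; electedCol = 13 and "1 = elected" are guesses. *)
parliamentCol = 3; occupationCol = 9; electedCol = 13;

(* Hypothetical helper: a 13-field row with only those three fields filled in. *)
mkRow[parl_, occ_, won_] :=
  ReplacePart[ConstantArray["", 13],
   {parliamentCol -> parl, occupationCol -> occ, electedCol -> won}]

toy = {mkRow[14, "lawyer", 1], mkRow[14, "lawyer", 0], mkRow[14, "farmer", 0],
   mkRow[14, "teacher", 1], mkRow[14, "farmer", 0]};

(* Fraction of rows whose occupation equals label; Missing[...] if there are no rows. *)
share[rows_List, label_String] :=
  If[rows === {}, Missing["NoRows"],
   N[Count[rows[[All, occupationCol]], label]/Length[rows]]]

rows14 = Select[toy, #[[parliamentCol]] === 14 &];
winners14 = Select[rows14, #[[electedCol]] === 1 &];

(* Share among all candidates, then among winners only. *)
{share[rows14, "lawyer"], share[winners14, "lawyer"]}
(* {0.4, 0.5} *)
```

When the second number is larger than the first, as in this toy case, the occupation is over-represented among winners relative to the candidate pool.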

Again, we’re relying on the self-reported data behind this chart: many more lawyers may have listed their occupation as businessman, perhaps, or simply parliamentarian.

But the occupational data is still fun. Let’s take every Liberal Party candidate’s occupation from 1962 onwards and compare the list to the occupations of New Democratic Party candidates (the fifty most common for each party, Liberals first and then the NDP). It speaks volumes about the two parties:


{{lawyer,737},{parliamentarian,412},{businessman,251},{farmer,212},{Member of Parliament,142},{teacher,138},{administrator,82},{consultant,71},{politician,68},{physician,56},{barrister,56},{merchant,54},{manager,53},{economist,52},{chartered accountant,49},{accountant,44},{journalist,43},{professor,41},{retired,38},{engineer,37},{manufacturer,36},{businesswoman,31},{insurance broker,31},{educator,30},{barrister and solicitor,29},{business person,27},{broadcaster,26},{,25},{school principal,25},{public servant,24},{insurance agent,22},{executive director,21},{cabinet minister,21},{publisher,20},{notary,19},{contractor,19},{management consultant,18},{housewife,17},{professional engineer,16},{barrister-at-law,16},{mayor,16},{executive,15},{business executive,14},{medical doctor,13},{student,13},{social worker,12},{clergyman,12},{veterinarian,11},{realtor,11},{sales manager,11}}


{{teacher,484},{student,192},{lawyer,179},{farmer,150},{professor,71},{retired,70},{union representative,69},{social worker,52},{parliamentarian,51},{Member of Parliament,48},{journalist,43},{businessman,43},{administrator,38},{consultant,37},{university professor,37},{housewife,36},{electrician,34},{economist,33},{,32},{secretary,31},{educator,31},{representative,31},{physician,29},{clergyman,29},{high school teacher,27},{salesman,27},{researcher,25},{school teacher,23},{writer,22},{manager,22},{self-employed,20},{minister,19},{organizer,18},{steelworker,18},{machinist,17},{business manager,17},{business agent,16},{trade unionist,16},{engineer,16},{clerk,16},{accountant,14},{contractor,14},{college instructor,13},{executive assistant,13},{instructor,13},{executive director,12},{unemployed,12},{nurse,12},{truck driver,12},{sociologist,12}}
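One quick way to put the two lists side by side is to divide one party’s count by the other’s for occupations that appear in both. The sketch below uses four counts copied from the lists above; the variable names are invented, and the ratios are raw counts, not adjusted for how many candidates each party ran, so treat them as a rough guide only.

```mathematica
(* Counts copied from the two lists above; only four occupations, for brevity. *)
liberal = <|"lawyer" -> 737, "businessman" -> 251, "farmer" -> 212, "teacher" -> 138|>;
ndp = <|"lawyer" -> 179, "businessman" -> 43, "farmer" -> 150, "teacher" -> 484|>;

(* Keep only occupations present in both lists, then take Liberal count / NDP count. *)
ratios = Merge[KeyIntersection[{liberal, ndp}], N[#[[1]]/#[[2]]] &]
```

Values well above 1 (lawyer, businessman) lean Liberal, while values well below 1 (teacher) lean NDP.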

Anyways, I think there is some cool stuff to be had with this dataset. If anybody is interested in collaborating or knows any neat questions to ask, drop me a line. As it stands, this is obviously outside of my comfort zone!
