I’ve been working recently with a large quantity of federal parliamentary transcripts (from 1996 to the present day – downloaded with Mathematica), which is forcing me to tweak some of my existing workflows. It’s a massive collection: 2,177 individual transcripts (one per day), 802.5MB of plain text. It’s chronologically ordered and is of, as far as I can tell, extremely high quality.
One of the issues I have had is finding the information I want quickly. The online index isn’t complete enough for my needs, many of my visualization tools in Mathematica creak under that much data, and – needless to say – Spotlight on OS X is basically useless. So read on for one off-the-shelf tool that I’m playing with.
I’ve finally been experimenting with DEVONthink Pro, but the one tool that I’m currently using is EasyFind. It’s a free program, available in the Mac App Store, that carries out searches with a lot more options than you get in Spotlight. The downside is that it doesn’t pre-index everything like Spotlight does, so things take a bit longer, but I’m getting through that Hansard collection in something approaching 10 or 15 seconds (it’s all on a flash drive for me, so YMMV).
Basically, the glory is that it supports full boolean searches on your system (Spotlight has minimal support). These include the regular AND, OR, and NOT, but also – critically – the NEAR/n command. A search like this:
~children NEAR/30 internet
highlights documents where the words ‘children’ and ‘internet’ appear within 30 words of each other. The tilde before ‘children’ signals that I’m okay with truncated versions of the word, so ‘child’ will match too.
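If you want to see roughly what a NEAR/n search is doing, here is a minimal Python sketch of the idea. To be clear, this is my own illustration and not how EasyFind works internally: the function names (tokenize, positions, near, matching_files) are made up, "within n words" is assumed to mean the two word positions differ by at most n, and the tilde behaviour is only approximated by a prefix match on a stem like 'child'.

```python
import re
from bisect import bisect_left
from pathlib import Path


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(words, term, prefix=False):
    """Return the indices of words equal to `term`, or starting with it if prefix=True."""
    term = term.lower()
    if prefix:
        return [i for i, w in enumerate(words) if w.startswith(term)]
    return [i for i, w in enumerate(words) if w == term]


def near(text, a, b, n, prefix_a=False, prefix_b=False):
    """True if some occurrence of `a` lies within n words of some occurrence of `b`.

    Assumption: "within n words" means the word positions differ by at most n.
    """
    words = tokenize(text)
    pos_a = positions(words, a, prefix_a)
    pos_b = positions(words, b, prefix_b)  # already sorted in ascending order
    for i in pos_a:
        # First occurrence of b at position >= i - n; check it is also <= i + n.
        k = bisect_left(pos_b, i - n)
        if k < len(pos_b) and pos_b[k] <= i + n:
            return True
    return False


def matching_files(folder, a, b, n, **kwargs):
    """List the .txt files in a (hypothetical) folder whose text satisfies near()."""
    return [
        path
        for path in sorted(Path(folder).glob("*.txt"))
        if near(path.read_text(encoding="utf-8", errors="ignore"), a, b, n, **kwargs)
    ]
```

Something like matching_files("transcripts", "child", "internet", 30, prefix_a=True) would then approximate the search above, where "transcripts" is a placeholder folder of plain-text files. It re-reads every file on each search, much like the non-indexed approach described earlier, so expect it to be slow on a collection this size.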
For more refined searches, it also supports AFTER, BEFORE, NEXT, and everything you might remember from your old Grade 10 library visit (at least for me, before Google let me get lazy).