William J. Turkel and I have been working a bit on getting WARC files to play with Mathematica. For larger numbers of files, warcbase is still the solution. But for a small collection – say a few WARCs created with webrecorder.io – this might be a lighter-weight approach. Indeed, I can see myself doing this if I went out around the web with WebRecorder, grabbed some sites (say public history sites or the like), and wanted to do some analysis on them.
Bill and I developed this together: he cooked up the record to association bit (which is really the core of this code), and I worked on getting us to be able to process entire WARCs and generate some basic analysis. It was also fun getting back into Mathematica, after living in Scala and Bash.
What does this mean? Basically, over the next year I’ll be hosting the following public events in Toronto. These will primarily take place between January and May, and I will be in Toronto roughly once a week during this period. It is also an excuse to be physically proximate to great collaborators: folks at the DCI, Toronto libraries (especially Nich Worby, who I’ve worked with quite a bit), and York (where my frequent collaborator Nick Ruest is based).
Nick Ruest and I have a piece that’s just come out in Code4Lib Journal. The article takes readers through (a) why Twitter matters for event archiving and future historical research; (b) how you can collect data yourself; and (c) how you can analyze the data. You can read the abstract below, and check out the article here!
As always, hope you enjoy reading it, and if you have any comments, questions, or anything, we are always happy to hear from you.
Nick Ruest, Anna St-Onge, and I have a piece that’s just come out in the open-access journal Digital Studies / Le champ numérique. The deliberately acronym-heavy title introduces an article that takes us through the process of (a) creating a web archive; (b) preserving and providing access to the files; and (c) running some basic analysis on it from the perspective of a historian. While some of the text analysis in the latter part of the article predates more recent warcbase developments, I hope it provides a useful conceptual approach.
I don’t normally take partisan positions here at ianmilligan.ca, especially in the rough and tumble world of American politics. But sometimes a line is crossed, and I cannot stay silent! 😉
Speaking at a journalism event in late March 2016, American President Obama had this to say, according to the Washington Examiner:
Ten, 20, 50 years from now, no one seeking to understand our age is going to be searching the tweets that got the most retweets, or the post that got the most likes … They’ll look for the kind of reporting, the smartest investigative journalism that told our story — lifted up the contradictions in our societies and asked the hard questions and forced people to see the truth even when it was uncomfortable.