Last year (2017) was a busy year on many fronts (from my own parental leave to launching our Archives Unleashed project!). Our project manager, Samantha Fritz, has a great write-up on the project’s activities to date. Please check it out!
This hits all users hard. Within academia, Storify seems to be the go-to tool for documenting controversy or, more commonly, conferences (say, the proceedings of an online conference or the hard work that went into documenting a presidential address). And why not? It’s an intuitive platform, far better than grabbing screenshots, and the other standard method (embedding Tweets in, say, a blog post) is equally vulnerable to an external service, that of Twitter itself, changing its access or model, or failing altogether.
By Ziquan Wang, Borui Lin, Ian Milligan, and Jimmy Lin
While Americans are busy enjoying their Fourth of July, we Canadians are digging into data… and indeed, we wanted to share some research recently presented at the Web Archives and Digital Libraries workshop.
Shortly after Donald Trump’s inauguration as President of the United States, eagle-eyed observers noted a crucial difference in his webpage as compared to that of his predecessor, President Obama. Whereas Obama’s information page had listed the three branches of the US government (executive, judicial, and legislative), Trump’s page listed only two.
Examples like this made our research team at the University of Waterloo wonder: could we systematically begin to track the changes in discourses, priorities, topics, and beyond between two US Presidential elections and, moreover, could we do so on a budget? As I’ve argued elsewhere, web archives are of crucial importance for historians seeking to understand any period after 1996. Yet the scale of these archives requires us to turn to digital methods. We cannot go page by page through websites; instead, we need tools to extract the information that we need. Could we “distantly read” websites to notice shifts like the ones observers spotted in the early days of the Trump administration?
Luckily for us, students had just finished taking Jimmy Lin’s (awesome) Big Data Infrastructure course and wanted to exercise their skills. The amazing Ziquan Wang and Borui Lin joined us and set out to explore shifts between two American presidential administrations.
The University of Waterloo and York University have been awarded a grant from the Andrew W. Mellon Foundation to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past.
The grant, valued at $610,625, supports Archives Unleashed, a project that will develop web archive search and data analysis tools to enable scholars and librarians to access, share, and investigate recent history since the early days of the World Wide Web. It is additionally supported by generous in-kind and financial contributions from Start Smart Labs, Compute Canada, York University Libraries and the University of Waterloo’s Faculty of Arts.
Warcbase is great, so we often get lots of questions about how to install it – never straightforward when using a piece of software with multiple dependencies. While we’re hoping to make some significant changes to the process of installing and using Warcbase in the near future, in the short to medium term I wanted to make a slightly simpler guide.
If you’re interested in using Warcbase to analyze your web archives, you might find this PDF helpful. I will have some tutorials available soon to run some scripts on your own data – and generate a standard set of derivatives.