Work in Progress

Too often, our work is hidden away from the public and even from our colleagues! To help change that, I have decided to post abstracts and working notes from my current projects. If you see anything that interests you, please let me know.

ABSTRACTS:

“Digging into Music: An Interactive Textual Analysis of the Top 40 Billboard Lyrics Database”

What would you do if you could access word and phrase frequency data across every song that charted on the Billboard Top 40 between 1964 and 1989? In this paper, I will introduce my ongoing work in data mining and textual analysis. Lyrics pose severe copyright problems, but by taking them from the web and distilling them into abstract frequency data, we can avoid this issue. Using the Mathematica programming language, an integrated platform for high-performance computing, I have created an interactive program that allows you to chart the evolution of ideas and content over time (similar to the Google Books/culturomics n-gram viewer project). I will also demonstrate a number of other visualization techniques, such as word clouds and phrase maps.

This will be an interactive presentation, with my program running on my personal computer. After a brief introduction, I would like to collaborate with the audience and run searches together. My background is in history and the digital humanities, and I would love to hear what an interdisciplinary audience thinks about the potential of this tool, and whether they would be interested in using it for their own work.
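To give a rough sense of what charting a word or phrase "over time" involves, here is a minimal Python sketch. It is not the Mathematica program described above. It assumes the lyrics have already been gathered as hypothetical (year, lyrics) pairs, and it reports, for each year, the share of all n-grams of the same length that match the search phrase, which is roughly the measure an n-gram viewer plots. The sample strings are invented placeholders, not real lyrics.

```python
import re
from collections import defaultdict

# Hypothetical input: (year, lyrics) pairs. These strings are invented
# placeholders standing in for the real chart data.
SONGS = [
    (1964, "love is all we need, love"),
    (1964, "dance all night"),
    (1989, "we need love tonight"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(songs, phrase):
    """Map each year to (matching n-grams) / (all n-grams of that length) for that year."""
    target = tuple(tokenize(phrase))
    if not target:
        raise ValueError("phrase must contain at least one word")
    n = len(target)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, lyrics in songs:
        grams = ngrams(tokenize(lyrics), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years whose songs are too short to contain any n-gram of this length.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


if __name__ == "__main__":
    print(relative_frequency_by_year(SONGS, "love"))
    print(relative_frequency_by_year(SONGS, "we need"))
```

Normalizing by each year's total, rather than plotting raw counts, keeps years with more (or longer) charting songs from dominating the curve.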

“Mining the ‘Internet Graveyard’: Exploring Canada’s Digital Collections Projects”

Between 1996 and 2004, Canada’s Digital Collections was run by Industry Canada “to provide young Canadians with skills and experience in preparing digital Canadian content of local, regional and international interest.” Over 650 websites were created on historical topics, from “1841: A Census of Prince Edward Island,” to “Images of Montreal, Canadian Metropolis, 1872-1898,” to the “York Factory Cree Nation: A Cultural Journey Back in Time.” Once the project was shut down, the websites were archived online at Library and Archives Canada (LAC). Now, a visitor to this somewhat hidden corner of LAC’s site is met with a disclaimer that “information may be out of date and some functionality lost.”

I have recovered these files by mirroring them onto my own system, and I am now data mining this massive array of textual and graphical information. First, I hope to learn how young Canadians viewed the past, what they felt was important, and how they sought to convey it to readers in these early days of the World Wide Web.

More importantly, however, this project aims to make methodological contributions. When future historians research the 1990s, they will have to engage with a tremendous array of digital sources: thousands of websites, often hidden and slightly incompatible, but offering a treasure trove of information. My presentation will discuss several ways that we can engage with these sources: through mirroring and preservation (or exporting them from the Internet Archive), large-scale textual analysis, and cutting-edge visualization techniques. Hyperlinked information presents new issues of analysis and interpretation, and we need to understand the archival benefits and pitfalls of this format. In this way, historians can be ready to study the digital past.
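As an illustration of one first step in large-scale textual analysis of mirrored websites, the Python sketch below strips the markup from HTML pages and counts word frequencies. It is an assumption-laden sketch, not the method used in this project: it uses only the standard library's `html.parser`, skips `<script>` and `<style>` contents, and `load_mirror` assumes a hypothetical local directory of `.htm`/`.html` files. Older pages often use legacy encodings, so undecodable bytes are replaced rather than causing an error.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect a page's text content, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Return the text of one HTML page with the markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_counts(html_pages):
    """Count lowercase word tokens across an iterable of HTML strings."""
    counts = Counter()
    for html in html_pages:
        counts.update(re.findall(r"[a-z']+", page_text(html).lower()))
    return counts


def load_mirror(root):
    """Yield the contents of every .htm/.html file under a (hypothetical) mirror directory."""
    for path in Path(root).rglob("*.htm*"):
        yield path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Invented sample page, not taken from the actual collection.
    sample = ("<html><body><p>York Factory history</p>"
              "<script>var history = 1;</script><p>History of Montreal</p></body></html>")
    print(word_counts([sample]).most_common(3))
```

With a real mirror on disk, `word_counts(load_mirror("path/to/mirror"))` would produce collection-wide counts. Because the function accepts any iterable of pages, it works just as well on pages exported from the Internet Archive.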
