I was fortunate to receive a travel grant to present my research in a short, three-minute slot plus poster at the Herrenhäuser Konferenz: Big Data in a Transdisciplinary Perspective in Hanover, Germany. Here’s what I’ll be saying (pretty strictly) in my slot this afternoon. Some of it is designed to respond to the time format (if you scroll down you will see that there is an actual bell).
Big Data is coming to history. The advent of web archived material from 1996 onwards presents a challenge. In my work, I explore what tools, methods, and approaches historians need to adopt to study web archives.
GeoCities lets us test this. It will be one of the largest records of the lives of non-elite people ever. The Old Bailey Online can rightfully describe its 197,000 trials, held between 1674 and 1913, as the “largest body of texts detailing the lives of non-elite people ever published.” But GeoCities, drawing on the material we have between 1996 and 2009, has over thirty-eight million pages.
These are the records of everyday people who published on the Web, reaching audiences far bigger than previously imaginable.
I approach GeoCities by asking questions such as: did they have a real community? How did people understand the Web in its earliest days?
GeoCities was a unique place. It grew rapidly, reaching millions of users in a few years. Its concept helped to bridge the locally-based networks and BBSes of the early 1990s with the wide open Web: it clustered users together in neighbourhoods, from the children-focused EnchantedForest, to the family-focused Heartland, to the education-focused Athens. Users relied on each other to find content: from living next to each other in neighbourhoods, to linking to each other using Web rings.
This unique early experiment in the history of the Web came to a halt in 2009 when Yahoo! shuttered it. If it hadn’t been for the timely intervention of Archive Team and others such as the Internet Archive, GeoCities would have been lost forever.
It would be as if the Old Bailey had been thrown on the fire pit of history.
Through text analysis and data mining, we can begin to approach the question of whether GeoCities users experienced community, and whether the neighbourhood system worked. We can do this in a few quick ways:
- we can use topic modeling to take various neighbourhoods and communities and see what recurring concepts appear; do these line up with the descriptions? Was there coherence?
- we can find the ‘community leaders’ that provided instructional support, extract their pages, and run content analysis to see what they provided – more than a word cloud, but it fit on the slide!
- we can move between distant reading and close reading, using link structures and topics, to tell detailed stories;
- and we can extract hundreds of thousands of images, arrange them, and begin to see different profiles that tie neighbourhoods together.
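To give a flavour of the link-structure side of this, here is a minimal sketch in Python, not the actual pipeline behind these findings. It assumes that a page's neighbourhood is the first path segment of its geocities.com URL (for example `/Athens/1234/`), which will not hold for every archived page. The function names and the sample links are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def neighbourhood(url):
    """Return the first path segment of a geocities.com URL, or None.

    Assumption: the first path segment names the neighbourhood.
    """
    parsed = urlparse(url)
    if not parsed.netloc.lower().endswith("geocities.com"):
        return None
    segments = [s for s in parsed.path.split("/") if s]
    return segments[0] if segments else None


def link_profile(links):
    """Count links that stay within a neighbourhood, cross into
    another neighbourhood, or involve a page outside GeoCities."""
    counts = Counter()
    for source, target in links:
        src, dst = neighbourhood(source), neighbourhood(target)
        if src is None or dst is None:
            counts["external"] += 1
        elif src == dst:
            counts["within"] += 1
        else:
            counts["across"] += 1
    return counts


# Made-up (source, target) pairs, standing in for links extracted
# from archived pages.
SAMPLE_LINKS = [
    ("http://www.geocities.com/Athens/1234/index.html",
     "http://www.geocities.com/Athens/5678/"),
    ("http://www.geocities.com/Athens/1234/index.html",
     "http://www.geocities.com/Heartland/4321/"),
    ("http://www.geocities.com/EnchantedForest/1111/",
     "http://www.example.com/"),
]

profile = link_profile(SAMPLE_LINKS)
print(dict(profile))
```

A high share of "within" links would be one rough hint that neighbours really did link to neighbours; it is a starting point for close reading rather than an answer in itself.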
My findings suggest that, yes, the web archives of GeoCities reveal a vibrant, interconnected community of users taking their first steps out on the Web. This makes a case for why this sort of work matters, both to our understanding of the early Web and to our understanding of human culture more generally in the 1990s.
Web archives are the archives of the future, and to access them requires a look towards Big Data.
And, no lightning talk post would be complete without a picture of the timekeeper: in this case, the dreaded bell.