ImagePlot, developed by Lev Manovich’s Software Studies Initiative, promises to help you “explore patterns in large image collections.” It doesn’t disappoint. In this short post, I want to demonstrate what we can learn by visualizing the 243,520 images of all formats that make up the child-focused EnchantedForest neighbourhood of the GeoCities web archive.
Setting it Up
Loading web-archived images into ImagePlot (a set of macros that work with the open-source program ImageJ) requires an extra step, which works for both Wide Web Scrape and GeoCities data. Images need to be 24-bit RGB to work. My experience was that weird file formats broke the macros (e.g. an ico file, or other junk that you do get in a web archive), so I used ImageMagick to convert the files:
mogrify -path /volumes/lacie/geocities-images/enchantedforest-JPGs -format jpg -type TrueColor "*.*"
This takes all of the files and converts them into TrueColor JPGs. You run it from the folder with all your images, and it sends the converted copies to the specified directory (in this case, enchantedforest-JPGs). Even better, mogrify lets you put the file pattern in quotation marks, so the wildcard is expanded by mogrify rather than by the shell, which gets around the annoying and delay-inducing “Argument list too long” error.
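Before converting, it can help to see which file types are actually sitting in the folder, since odd formats were what broke the macros. The following is a minimal sketch using only the Python standard library; the function name tally_extensions is my own invention for illustration and is not part of ImageMagick or ImagePlot.

```python
from collections import Counter
from pathlib import Path


def tally_extensions(folder):
    """Count the files directly inside `folder` by lowercase extension."""
    counts = Counter()
    for path in Path(folder).iterdir():
        if path.is_file():
            # Files with no extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling tally_extensions(".").most_common() from the image folder lists the extensions from most to least frequent, which makes stray formats such as .ico easy to spot before running mogrify.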
After doing this, I used the ImagePlot macro ImageMeasure.txt, included in the ‘extras’ folder of the full package, to generate brightness, saturation, and hue values for each image. This takes a long time, but there’s an output window you can follow along with.
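To give a sense of what these three values mean, here is a rough sketch of computing median hue, saturation, and brightness from a list of RGB pixels with Python's standard colorsys module. This is an illustration of the general idea only, not ImageMeasure's actual implementation; the function name median_hsb is hypothetical, and the 0 to 1 scale used here may differ from the scale the macro writes out.

```python
import colorsys
from statistics import median


def median_hsb(pixels):
    """Return (median hue, median saturation, median brightness), each in 0-1.

    `pixels` is an iterable of (r, g, b) tuples with 8-bit channels (0-255).
    """
    # Convert every pixel from RGB to hue/saturation/value (brightness).
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    if not hsv:
        raise ValueError("no pixels given")
    hues, sats, vals = zip(*hsv)
    return median(hues), median(sats), median(vals)
```

A mostly white clipart image would come out with high brightness and low saturation under this measure. Note that hue is a circular quantity, so a plain median of hue is only a rough summary.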
Once it’s finished, you’ll have a text file, measurements.txt, with the values. When you then run the ImagePlot.txt macro, you select it as your tab-delimited text file and then select the directory with all the images (in my case, the enchantedforest-JPGs directory). I decided to set it up so that the x axis is the median brightness of an image, and the y axis is the median saturation.
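Since the plotting run takes hours, a quick sanity check of the measurements file beforehand may save a wasted run. The sketch below reads a tab-delimited file and checks that every listed image exists in the image directory. The column names filename, brightness_median, and saturation_median are assumptions on my part; check the header row of your own measurements.txt and pass the real names in. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def load_measurements(tsv_path, x_col="brightness_median",
                      y_col="saturation_median", name_col="filename"):
    """Read a tab-delimited file and return (filename, x, y) tuples.

    The default column names are assumed, not confirmed; adjust as needed.
    """
    rows = []
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        missing = {name_col, x_col, y_col} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"missing columns: {sorted(missing)}")
        for row in reader:
            rows.append((row[name_col], float(row[x_col]), float(row[y_col])))
    return rows


def missing_images(rows, image_dir):
    """List filenames from the measurements that are absent from image_dir."""
    image_dir = Path(image_dir)
    return [name for name, _, _ in rows if not (image_dir / name).is_file()]
```

If missing_images returns an empty list, every file named in the measurements is present in the directory you are about to hand to the plotting macro.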
The results again take a few hours to generate, and you can run it so it slowly populates. It’s a fun thing to have running in the background. The final results were worth the wait, however. Here’s a video of my findings:
You can download a full JPG here [43MB download].
The findings were really useful! As you see in the tour below, the broad contours of the web archive are demonstrated. We see:
- In the middle, we have a cluster of dark background images, mostly clipart. Neat, but not transformative in its findings.
- Above the middle, we have a dense cluster of digital photographs. After trawling around the image, I found that the vast majority of them are located here. This collection is not digital photograph heavy, so it will be neat to compare it to another corpus (I am currently generating the .ca TLD as I write this).
- Around the periphery, we have a vast amount of clipart. In the upper right is the focus, the white-background clipart, below it are more colourful cliparts, and throughout we see cartoon characters occupying the periphery. Recurring images come out very well.
A few minutes with this, and it becomes clear that this actually was a child-focused neighbourhood, that digital photography was at a minimum, and that white-background cartoons and icons dominated to the detriment of more colourful images.
This is neat in itself, but we need to generate comparative data (which I’m doing right now). If we do this for a variety of TLDs and GeoCities neighbourhoods, what patterns emerge? Could we use this as part of a finding aid to learn about a neighbourhood by ‘distantly reading’ the images?
I think we can, especially when combined with other methods (colour analysis, montages, extensions, facial extraction). Let’s hope I can get this together soon, as I am trying to write a short piece on this.
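One way such a comparison between collections might be made concrete is to bin each image's (brightness, saturation) pair into a coarse grid and compare the share of images falling in each cell. The following is a sketch under stated assumptions, not something I have run on these corpora: it assumes non-negative values on a 0 to 255 scale (adjust max_value if your measurements use another range), and the function names are hypothetical.

```python
from collections import Counter


def grid_shares(points, bins=4, max_value=255.0):
    """Map (x, y) points onto a bins-by-bins grid and return each cell's share."""
    points = list(points)
    if not points:
        return {}
    counts = Counter()
    for x, y in points:
        # Clamp so that a value equal to max_value lands in the last bin.
        col = min(int(x / max_value * bins), bins - 1)
        row = min(int(y / max_value * bins), bins - 1)
        counts[(col, row)] += 1
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}


def share_differences(points_a, points_b, bins=4, max_value=255.0):
    """Return per-cell (share in A minus share in B), largest gaps first."""
    a = grid_shares(points_a, bins, max_value)
    b = grid_shares(points_b, bins, max_value)
    diffs = {cell: a.get(cell, 0.0) - b.get(cell, 0.0) for cell in set(a) | set(b)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

The cells with the largest differences point to the regions of the plot where two neighbourhoods or TLDs diverge most, which is where it would make sense to start looking at the images themselves.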
p.s. thanks to all who’ve shared advice and tips via Twitter.