Creating Link Graphs with Warcbase

I was at the Columbia Web Archiving Collaboration: New Tools and Models conference this Thursday and Friday, and gave a quick demo. Here’s a bit more detail on it.

If you install Warcbase (this handy guide covers installing it on OS X) and follow the scripts, you will eventually end up with a data file that looks a bit like this.

200510	acq.osd.mil	acq.osd.mil	96
200510	acq.osd.mil	akss.dau.mil	12
200510	agoracosmopolite.com	agorabookcafe.com	325
200510	agoracosmopolite.com	agoracosmopolitan.com	271
200510	agoracosmopolite.com	agoracosmopolite.com	8319
200510	agoracosmopolite.com	genesmedia.com	325
200510	bloc.org	go.microsoft.com	22
200510	blocpot.qc.ca	blocpot.qc.ca	104
200510	blocpot.qc.ca	marijuanaparty.org	16
200510	blocpot.qc.ca	norml.org	16
200510	blocquebecois.org	bernardbigras.qc.ca	16
200510	blocquebecois.org	bloc.org	1069
200510	blocquebecois.org	blocquebecois.org	276682

You can download your own sample file here, which draws on the Canadian Political Party and Political Interest Groups collection.
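If you want to poke at the file before visualizing it, here is a minimal sketch of reading it in Python. It assumes the four tab-separated columns are the crawl month (yyyymm), the source domain, the target domain, and the number of links; the names `LinkRecord` and `parse_links` are my own labels for illustration, not part of Warcbase.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type: the field names are illustrative labels
# for the four tab-separated columns in the link file.
LinkRecord = namedtuple("LinkRecord", ["timeint", "source", "target", "weight"])

def parse_links(handle):
    """Yield one LinkRecord per well-formed, tab-separated line."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        timeint, source, target, weight = row
        yield LinkRecord(timeint, source, target, int(weight))

# A few rows copied from the sample above, standing in for a real file.
sample = io.StringIO(
    "200510\tacq.osd.mil\tacq.osd.mil\t96\n"
    "200510\tacq.osd.mil\takss.dau.mil\t12\n"
    "200510\tblocquebecois.org\tbloc.org\t1069\n"
)
records = list(parse_links(sample))

# Links that leave their own domain are often the interesting ones.
external = [r for r in records if r.source != r.target]
for r in external:
    print(r.timeint, r.source, "->", r.target, r.weight)
```

To run it against a real file, pass `open("your-file.txt", newline="")` to `parse_links` in place of `sample`.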

But how do you turn this into a beautiful Gephi visualization?

Easy!

Step One: Convert Link Output into GDF Format

You could do this manually by creating a CSV file, adding ‘TimeInt Source Target Weight’ in the first line of the file (tabs separating each value), and so forth, or you could use our handy script pig2gdf.py.

Usage is as follows:

pig2gdf.py usage:

$ ./pig2gdf.py <file> > <output file>

OR

$ cat <file> | ./pig2gdf.py

OR

$ ./pig2gdf.py < <file>
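If you are curious what a conversion like this involves, here is a rough sketch. It is not the actual pig2gdf.py, just an illustration under a few assumptions: the input has the four tab-separated columns shown earlier, the output follows the general GDF layout (a `nodedef>` section followed by an `edgedef>` section), and the month ends up in an edge column that I have called `timeint`. The function name `links_to_gdf` is hypothetical.

```python
def links_to_gdf(lines):
    """Return GDF text for lines of 'month<TAB>source<TAB>target<TAB>count'."""
    nodes = []    # node names in first-seen order, so output is stable
    seen = set()
    edges = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        timeint, source, target, weight = parts
        for name in (source, target):
            if name not in seen:
                seen.add(name)
                nodes.append(name)
        edges.append((source, target, timeint, int(weight)))

    # Node section: one name per line.
    out = ["nodedef>name VARCHAR"]
    out.extend(nodes)
    # Edge section: the 'timeint' column name is an assumption of this sketch.
    out.append(
        "edgedef>node1 VARCHAR,node2 VARCHAR,"
        "timeint VARCHAR,weight DOUBLE,directed BOOLEAN"
    )
    for source, target, timeint, weight in edges:
        out.append(f"{source},{target},{timeint},{weight},true")
    return "\n".join(out) + "\n"

# Two rows from the sample data, standing in for a real file.
sample = [
    "200510\tbloc.org\tgo.microsoft.com\t22\n",
    "200510\tblocpot.qc.ca\tblocpot.qc.ca\t104\n",
]
gdf = links_to_gdf(sample)
print(gdf, end="")
```

Passing `sys.stdin` instead of `sample` would turn the sketch into a filter that can be used with the same kind of shell redirection shown above.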

Step Two: Import into Gephi

You now want to take it into Gephi. Start Gephi (if you have trouble running it, this tutorial might help). Open the GDF file that you just generated. Click OK.

Now visit the ‘Data Laboratory’ panel and do the following. Select the ‘edges’ table so it looks like this.

[Screenshot: the ‘edges’ table in the Data Laboratory]

Click on the ‘Merge Columns’ button and do this:

[Screenshot: the ‘Merge Columns’ dialog]

Make sure to parse dates as yyyymm.
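To make the date format concrete: a value like 200510 is a four-digit year followed by a two-digit month, so it stands for October 2005. The sketch below shows that reading in Python. It only illustrates the format and is not a description of how Gephi builds its intervals internally; `month_interval` is a hypothetical helper name.

```python
import calendar
from datetime import date, datetime

def month_interval(timeint):
    """Return (first_day, last_day) for a 'yyyymm' string such as '200510'."""
    start = datetime.strptime(timeint, "%Y%m").date()  # day defaults to 1
    last_day = calendar.monthrange(start.year, start.month)[1]
    return start, date(start.year, start.month, last_day)

start, end = month_interval("200510")
print(start.isoformat(), end.isoformat())  # prints: 2005-10-01 2005-10-31
```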

The final step is to click on ‘nodes’ in the upper left, click ‘copy data to other column,’ select ‘id,’ and copy to ‘label.’

[Screenshot: copying the ‘id’ column to ‘label’ in the nodes table]

Bob’s your uncle! You’ll notice that you’ve got the option to enable a dynamic timeslider at the bottom.

Select ‘Overview,’ run the ‘Force Atlas’ layout, tinker with the settings, and you have a dynamic web graph!

[Screenshot: the resulting dynamic web graph in Gephi]
