BookScale: Understanding Big Data in Conventional Terms

I’ve been doing a lot of work with tweets lately, thinking about (1) how they provide a fascinating window into the everyday lives of users; and (2) what an incredible source they are going to be for historians, sooner than we think. Here’s one way I’ve been thinking about explaining this issue – with tongue only slightly in cheek:

For 90 minutes on the evening of 3 October 2012, American President Barack Obama and his Republican challenger, Mitt Romney, faced each other on a stage in Denver. In homes, workplaces, pubs, gyms, and university study rooms, millions followed along – and tens of thousands participated in their own, social way. The next morning, amidst the controversy around an ostensibly listless president and energetic challenger, another note was made: that social media had arrived. Online commentary during the debate could be measured in Tweets per Minute (TPM): from 94,409 TPM during the minute when Romney joked about Obama spending his 20th wedding anniversary with him, to an astounding 158,690 TPM when the moderator, Jim Lehrer, humorously interjected. Overall, 10.3 million tweets were sent about the debate on that early October evening.

Each tweet could be up to 140 characters. The mean length of a tweet (based on a sample of one million tweets) is 67.9 characters, and the median is 60. Taking the median, 10.3 million tweets works out to a whopping 618,000,000 characters. Let us put this in book terms. An average word (well, it’s more complicated than this, but still) has 5 characters plus a space, so dividing by 6 gives us 103,000,000 words. If an average page in a hardcover book has 300 words, this would be about 343,333 pages.

343,333 pages on a single night! From a single historical event! Of real-time observations, jokes, quirks, and so forth.

Historians can’t always grasp just how big this Twitter repository will be, or how much it will change the way we do social and cultural history. But the ‘book’ resonates.

So BookScale: 

((t*60)/6)/300

Where t is the number of tweets. So if we take 500,000 tweets:

Step one: 500,000 * 60 = the number of tweets times the median tweet length, which gives the number of characters. We get 30,000,000 characters.

Step two: 30,000,000/6 = an average word is 5 characters, plus a space. So if we take the number of characters and divide by 6, we get the approximate number of words. So we would have 5,000,000 words.

Step three: 5,000,000/300 = an average page has 300 words, so we now see that those 5,000,000 words would result in 16,666.7 pages.
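
For anyone who would rather let a computer do the arithmetic, the three steps can be sketched in a few lines of Python. This is only a rough sketch: the function name bookscale and its parameter names are my own labels for illustration, and the defaults are simply the assumptions used above (a 60-character median tweet, 6 characters per word including the space, 300 words per page).

```python
def bookscale(tweets, chars_per_tweet=60, chars_per_word=6, words_per_page=300):
    """Estimate how many book pages a given number of tweets would fill.

    The defaults are the rough assumptions from the post, not measured
    constants: median tweet length, 5 characters plus a space per word,
    and 300 words per hardcover page.
    """
    characters = tweets * chars_per_tweet      # step one: total characters
    words = characters / chars_per_word        # step two: approximate words
    return words / words_per_page              # step three: pages


if __name__ == "__main__":
    # The worked example: 500,000 tweets.
    print(round(bookscale(500_000), 1))
    # The debate night: 10.3 million tweets.
    print(round(bookscale(10_300_000)))
```

Because the assumptions are parameters, it is easy to see how much the answer moves if you swap in, say, the mean tweet length of 67.9 characters instead of the median.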
