What Are People Tweeting About #Syria in Your Neck of the Woods?
Update: Sept 9, 2013 – View Surprising Stats From Mining One Million Tweets About #Syria for a more comprehensive analysis of content in this post.
In the opening chapter of Mining the Social Web, 2nd Edition, we begin the journey into mining social data by way of Twitter. To paraphrase an early section entitled “Why Is Twitter All the Rage?”, technology must amplify a meaningful aspect of our human experience in order to be successful, and Twitter’s success has largely depended on its ability to do this quite well. Although you could describe Twitter as just a “free, high-speed, global text-messaging service,” that would miss the much larger point that Twitter scratches some of the most fundamental itches of our humanity:
- We want to be heard
- We want to satisfy our curiosity
- We want it easy
- We want it now
In this short blog post, I’ll demonstrate how you could use the tools and example code from the book to very quickly answer the question, “What are people tweeting about #Syria in your neck of the woods?” in order to craft an interactive visualization for exploring a compelling set of data. Think of this post as the first in a series that’s designed to introduce example code to help you satisfy three of the four items on the list. It’ll then be up to you if you’d like to satisfy the fourth item (to be heard) by sharing the insights that you discover with others.
In addition to the introduction to mining Twitter data presented in Chapter 1, Chapter 9 features a cookbook of more than two dozen recipes for mining Twitter data. The recipes are fairly atomic and designed to be composed as simple building blocks that can be copied, pasted, and minimally massaged to get you on your way. (In a nutshell, that’s actually the purpose of the entire book for the broader social web: to give you the tools that you need to transform curiosity into insight as quickly and easily as possible.)
In this post, we’ll adapt concepts from three primary recipes:
- Accessing Twitter’s API for Development Purposes (Example 9-1)
- Saving and Accessing JSON Data with MongoDB (Example 9-7)
- Sampling the Twitter Firehose with the Streaming API (Example 9-8)
You can review these short examples in the nicely rendered IPython Notebook for all of the finer details, but the combined approach is straightforward: use Twitter’s Streaming API to collect tweets containing #Syria and store them away. Although you may want to add a little extra robustness to the code for certain exceptional circumstances, the essence of the combined recipes fits in a few lines:
    import twitter
    import pymongo

    # Our query of interest
    q = '#Syria'

    # See Example 9-1 (Accessing Twitter's API...) for API access
    twitter_stream = twitter.TwitterStream(auth=twitter.oauth.OAuth(...))

    # See https://dev.twitter.com/docs/streaming-apis for more options
    # in filtering the firehose
    stream = twitter_stream.statuses.filter(track=q)

    # Connect to the database
    client = pymongo.MongoClient()
    db = client["StreamingTweets"]

    # Read tweets from the stream and store them to the database
    # (insert_one replaces Collection.insert, which PyMongo 4 removed)
    for tweet in stream:
        db["Syria"].insert_one(tweet)
In other words, you just filter the public stream for a search term and stash away the results to a convenient storage medium like MongoDB. It really is that easy! Over the past 72 hours, I’ve collected just over 500,000 tweets using exactly this approach, and you could do the very same thing for any particular topic of interest.
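As one possible take on the “extra robustness” mentioned above, here is a minimal sketch that skips stream messages that are not tweets (the Streaming API can interleave notices such as delete or limit messages with statuses) and optionally stops after a fixed number of tweets. The names `store_tweets`, `ListCollection`, and `limit` are hypothetical and not from the book, and the sketch assumes that each tweet arrives as a dict with a `text` key.

```python
def store_tweets(stream, collection, limit=None):
    """Store tweets from an iterable of decoded stream messages.

    `collection` is any object with an insert_one() method, such as a
    pymongo collection. Returns the number of tweets stored.
    """
    stored = 0
    for message in stream:
        # Assumption: a real tweet is a dict carrying a "text" field;
        # anything else (notices, None, etc.) is skipped.
        if not isinstance(message, dict) or "text" not in message:
            continue
        collection.insert_one(message)
        stored += 1
        if limit is not None and stored >= limit:
            break
    return stored


class ListCollection:
    """Hypothetical in-memory stand-in for a database collection."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)
```

With the streaming code above, this could be called as `store_tweets(stream, db["Syria"])`. The in-memory `ListCollection` is only there so that you can try the function without a running database.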
As discussed at length in Chapter 1, roughly 5KB of metadata accompanies the 140 characters that you commonly think of as a tweet, and geo-coordinates are among the metadata fields. Although the percentage of Twitter users who have geo-coordinates enabled for their tweets has consistently been less than 1% in my own experience, that still leaves plenty of data points from our collection to plot on a map and explore.
The minutiae of running a MongoDB query to parse out the geo-coordinates, along with a couple of other convenient fields to support the visualization such as the tweet’s id, text, and author, are fairly uninteresting and won’t be repeated here. Suffice it to say that you can rather trivially munge the format of your data in just a few lines of additional code and export it to an exciting new format called GeoJSON that is both convenient and portable. As it turns out, GitHub automatically provides a terrific GeoJSON visualization for any GeoJSON file that’s checked into a repository or stored as a gist.
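The munging step might look something like the sketch below. It is not the code used for this post; it assumes that each stored tweet is a dict whose `coordinates` field, when present, is already a GeoJSON Point (longitude first, then latitude), and it carries along the `id_str`, `text`, and `user.screen_name` fields. The function name `tweets_to_geojson` and the two sample tweets are made up for illustration.

```python
import json


def tweets_to_geojson(tweets):
    """Build a GeoJSON FeatureCollection from tweets that have coordinates."""
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            # Most tweets carry no geo-coordinates; skip them
            continue
        features.append({
            "type": "Feature",
            "geometry": point,  # already a GeoJSON Point: [longitude, latitude]
            "properties": {
                "id": tweet.get("id_str"),
                "text": tweet.get("text"),
                "author": (tweet.get("user") or {}).get("screen_name"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample data, for illustration only
sample_tweets = [
    {"id_str": "1", "text": "hello #Syria", "user": {"screen_name": "alice"},
     "coordinates": {"type": "Point", "coordinates": [-86.78, 36.16]}},
    {"id_str": "2", "text": "no location", "user": {"screen_name": "bob"},
     "coordinates": None},
]

geojson = tweets_to_geojson(sample_tweets)
print(json.dumps(geojson, indent=2))
```

Saving the printed output to a file with a `.geojson` extension gives you something you can check into a repository or store as a gist.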
Without further delay, click on the screenshot below to try out the interactive visualization, and note also that you can click on the styling markers to view the referenced metadata.
Bearing in mind that the results we have collected represent up to 1% of the firehose, we do see tweets all over the map, with higher concentrations in parts of the world that have been routinely making headlines, such as the United States and Great Britain. Do you see anything that surprises you as you peruse the tweets? Are the tweets from your “neck of the woods” similar to your own sentiments about the political situation?
Although we could have opted to use other recipes in the cookbook to just search for the most recent tweets about #Syria within a given locale, we wouldn’t have been able to as easily explore a sample of worldwide tweets about #Syria on a map using this approach. However, searching for tweets within a particular area once you’ve identified a locale of interest is exactly what you’d want to do in order to exploit some portion of the data. For example, you might wonder what the sentiment is on the western coast of Africa regarding #Syria and zoom in on that content for closer review. Perhaps this is something that you’d like to see in the next post?
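If you would rather zoom in on data you have already collected than run a new search, one simple option is a bounding-box filter over GeoJSON features. Note that this is a local filter and not the search recipe mentioned above. The function name `features_in_bbox`, the sample points, and the very rough box standing in for part of the western coast of Africa are all illustrative assumptions, and the sketch does not handle boxes that cross the antimeridian.

```python
def features_in_bbox(feature_collection, west, south, east, north):
    """Keep only the Point features that fall inside a lon/lat bounding box."""
    selected = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        # GeoJSON positions are ordered longitude, latitude
        lon, lat = geometry["coordinates"][:2]
        if west <= lon <= east and south <= lat <= north:
            selected.append(feature)
    return {"type": "FeatureCollection", "features": selected}


# Made-up features, for illustration only
collected = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-17.4, 14.7]},
         "properties": {"text": "inside the box"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-0.1, 51.5]},
         "properties": {"text": "outside the box"}},
    ],
}

# Very rough, illustrative box: longitudes -18 to -7, latitudes 4 to 17
nearby = features_in_bbox(collected, west=-18, south=4, east=-7, north=17)
print(len(nearby["features"]), "feature(s) inside the box")
```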
Thanks for reading. I hope you’ve enjoyed this post and that you’ll adapt the sample code to transform your curiosity into great insights about the world around us.
If you’ve enjoyed this content, you may want to tweet about it using the blog’s sidebar widget or consider purchasing a copy of Mining the Social Web, 2nd Edition, which is in its final stages of production. (I recommend the DRM-free ebook, which is already available as an “Early Access” product, but paper copies should be available in approximately one month as well.)
Read more about the journey of authoring Mining the Social Web, 2nd Edition and how I tried to apply lean practices to make it the best possible product for you in Reflection on Authoring a Minimum Viable Book.
Mining the Social Web, 2nd Edition also adds an entire chapter devoted to mining GitHub data, because GitHub is quickly becoming one of the most mainstream hubs for collaboration anywhere on the social web.