Super Simple Storage for Social Web Data with MongoDB (Computing Twitter Influence, Part 4)

In the last few posts in this series on computing Twitter influence, we’ve reviewed some of the considerations in calculating a base metric for influence and how to acquire the necessary data to begin analysis. This post finishes up the prerequisite machinery before the real data science fun begins by introducing MongoDB as a staple in your social web mining toolkit and showing how to employ it for storing social data such as Twitter API responses.

As Easy As It Should Be

MongoDB is an excellent option to consider if you need a quick and easy fix for your data science experiments, and if you like Python, there’s a good chance you’ll enjoy MongoDB as well. Much like Python, MongoDB is easy to pick up along the way: it scales up fairly well as the size of your data grows without too much fuss, the online documentation is excellent, the community is robust, language bindings are plentiful, and it’s generally just as easy as it should be to shuttle data to and from Python.


MongoDB is document-oriented, which (for our purposes) basically means that it stores JSON data, enabling you to easily archive the responses that you get back from most social web APIs. It’s easy enough to query the data with the standard find() operator, but a more powerful aggregation framework is also available for constructing more nuanced data pipelines.
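For example, here’s a minimal sketch of what working with Twitter-style JSON looks like from Python via pymongo. It assumes a local mongod running on the default port and a reasonably recent pymongo (3.x); the database, collection, and field names are purely illustrative.

import pymongo # pip install pymongo

# Connect to a local MongoDB instance (assumes the default host/port)
client = pymongo.MongoClient('localhost', 27017)
db = client['twitter_sandbox']

# A MongoDB document is just a JSON (BSON) object, so API responses can be
# stored more or less as-is
db['statuses'].insert_one({'id_str': '1234', 'text': 'Hello, world', 'retweet_count': 3})

# Simple querying with find()
for doc in db['statuses'].find({'retweet_count': {'$gte': 1}}):
    print doc['text']

# The aggregation framework handles more nuanced pipelines, e.g. counting
# statuses grouped by retweet_count
pipeline = [{'$group': {'_id': '$retweet_count', 'count': {'$sum': 1}}}]
print list(db['statuses'].aggregate(pipeline))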

A full MongoDB primer is beyond the scope of this post, but if you have a copy of the book on hand, Chapter 6 (Mining Mailboxes) introduces MongoDB as a sort of surrogate API for mail data. (The first half of that chapter focuses on normalizing arbitrarily sourced mail data so that it can be ingested into MongoDB for standardized analysis.)

Saving and accessing JSON data with MongoDB (Example 9-7 from the Twitter Cookbook) introduces two functions for storing and retrieving Twitter API data from MongoDB that we’ll adapt in the next section for our immediate needs. Take a moment to review this recipe if you haven’t previously encountered it. The functions that it provides are little more than load/store convenience wrappers.
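If you don’t have the recipe in front of you, the sketch below approximates what those wrappers boil down to; the actual Example 9-7 code differs in its details, but the call signature used later in this post, save_to_mongo(data, database, collection_name), is the part that matters.

import pymongo # pip install pymongo

def save_to_mongo(data, mongo_db, mongo_db_coll, host='localhost', port=27017):
    # Store a JSON-serializable document in the given database and collection
    client = pymongo.MongoClient(host, port)
    coll = client[mongo_db][mongo_db_coll]
    return coll.insert_one(data).inserted_id

def load_from_mongo(mongo_db, mongo_db_coll, criteria=None, projection=None,
                    host='localhost', port=27017):
    # Retrieve documents matching optional criteria/projection as a plain list
    client = pymongo.MongoClient(host, port)
    coll = client[mongo_db][mongo_db_coll]
    return list(coll.find(criteria or {}, projection))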

Storing Millions of Twitter Followers

Recall from the last post in this series that a recipe like Getting all friends or followers for a user (Example 9-19 from the Twitter Cookbook) is fundamentally limited by the amount of memory that’s available. It buffers API responses in memory and accumulates 75,000 long integer values every 15 minutes, and although this is fine for a user with a “reasonable” number of followers, it won’t work at all for celebrity users with millions of followers. Even if we did have unlimited heap space, we’d still want to strive for a low memory profile as well as maintain a persistent archive for more convenient analysis that’s unconstrained by rate limits and network latency. After all, once you have the data, you won’t want to go to the trouble of fetching it again unless absolutely necessary since this process can be quite time consuming.
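To get a feel for the timescales involved, here’s a back-of-the-envelope estimate based on the rate limit mentioned above (15 cursored requests of 5,000 ids per 15-minute window, or roughly 75,000 ids per window); the follower counts are purely illustrative.

# Rough harvest-time estimate given Twitter's cursoring rate limits:
# 15 requests per 15-minute window x 5,000 ids per request = 75,000 ids/window
IDS_PER_WINDOW = 15 * 5000
WINDOW_MINUTES = 15

def estimated_harvest_hours(num_followers):
    windows = -(-num_followers // IDS_PER_WINDOW) # ceiling division
    return windows * WINDOW_MINUTES / 60.0

for count in [75000, 1000000, 30000000]:
    print '%d followers: ~%.1f hours' % (count, estimated_harvest_hours(count))

An account with 30 million followers works out to roughly 400 rate-limit windows, or about 100 hours of continuous polling, which is exactly why persisting each response as it arrives matters.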

To illustrate just how easy it is to adapt a recipe from the cookbook like Example 9-19, take a look at this revised version of get_friends_followers_ids, renamed store_friends_followers_ids, and compare it to the original version. The primary substance of the change is simply the introduction of a save_to_mongo call for persisting each API response (along with a few tweaks to make this possible).

import sys
from sys import maxint
from functools import partial

# oauth_login, make_twitter_request, and save_to_mongo are defined in other
# Twitter Cookbook recipes (save_to_mongo comes from Example 9-7, above)

def store_friends_followers_ids(twitter_api, screen_name=None, user_id=None,
                                friends_limit=maxint, followers_limit=maxint,
                                database=None):

    # Must have either screen_name or user_id (logical xor)
    assert (screen_name != None) != (user_id != None), "Must have screen_name or user_id, but not both"

    # See https://dev.twitter.com/docs/api/1.1/get/friends/ids and
    # https://dev.twitter.com/docs/api/1.1/get/followers/ids
    # for details on API parameters

    get_friends_ids = partial(make_twitter_request, twitter_api.friends.ids, count=5000)
    get_followers_ids = partial(make_twitter_request, twitter_api.followers.ids, count=5000)

    for twitter_api_func, limit, label in [
                                 [get_friends_ids, friends_limit, "friends"],
                                 [get_followers_ids, followers_limit, "followers"]
                             ]:

        if limit == 0: continue

        total_ids = 0
        cursor = -1
        while cursor != 0:

            # Use make_twitter_request via the partially bound callable...
            if screen_name:
                response = twitter_api_func(screen_name=screen_name, cursor=cursor)
            else: # user_id
                response = twitter_api_func(user_id=user_id, cursor=cursor)

            if response is not None:
                ids = response['ids']
                total_ids += len(ids)
                # Persist each page of ids as its own document in a collection
                # named "friends_ids" or "followers_ids" within the given database
                save_to_mongo({"ids" : ids}, database, label + "_ids")
                cursor = response['next_cursor']

            print >> sys.stderr, 'Fetched {0} total {1} ids for {2}'.format(total_ids, label, (user_id or screen_name))
            sys.stderr.flush()

            # Each page of ids has already been persisted to MongoDB above,
            # which provides an additional layer of protection from
            # exceptional circumstances

            if response is None or total_ids >= limit:
                print >> sys.stderr, 'Last cursor', cursor
                print >> sys.stderr, 'Last response', response
                break

# Sample usage follows...

screen_names = ['SocialWebMining', 'LadyGaga']

twitter_api = oauth_login()

for screen_name in screen_names:

    # friends_limit=0 skips friend ids entirely, and each user's follower ids
    # land in a MongoDB database named after the screen name
    store_friends_followers_ids(twitter_api, screen_name=screen_name,
                                friends_limit=0, database=screen_name)

print "Done"

That’s really all there is to it. We’re now at the point where we can reliably harvest and store arbitrary volumes of Twitter data.
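As a quick sanity check, you can read the data right back out with pymongo (or the load_from_mongo wrapper). The sketch below assumes the naming scheme used by store_friends_followers_ids above: a database per screen name containing a followers_ids collection, where each document holds one page of ids.

import pymongo # pip install pymongo

client = pymongo.MongoClient('localhost', 27017)

screen_name = 'SocialWebMining'
coll = client[screen_name]['followers_ids']

# Each stored document holds one page of up to 5,000 ids, so flatten the
# pages and count the distinct follower ids
follower_ids = set()
for doc in coll.find():
    follower_ids.update(doc['ids'])

print '%s has %d follower ids stored' % (screen_name, len(follower_ids))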

It may be worthwhile to review the prior posts in this series as a reminder of just how far we’ve come. With all of the necessary machinery and prerequisite discussion in place, we’ll return to the original proposition of computing Twitter influence in the next post with an initial review of data for a few well-known Twitter accounts.


3 Comments on “Super Simple Storage for Social Web Data with MongoDB (Computing Twitter Influence, Part 4)”

  1. Matthew, is it possible to make tutorials on how to integrate Django and mining to show the tweets I mined in a web interface?
    The problem here is that Django is highly relational in nature, and even if I were to use http://django-mongodb-engine.readthedocs.org/en/latest/tutorial.html or http://mongoengine-odm.readthedocs.org/en/latest/tutorial.html, I would still need to define model classes with relations (which ruins the advantages of such an approach).
    What I want is to store tweets in MongoDB as JSON and then (without writing models) show them in the admin and user interfaces (maybe only screen names, date, retweet status) – but I still need to save the whole tweet, not only certain fields.
    What’s your advice?

    • To be honest, I probably don’t understand enough about your situation to offer too much prescriptive advice here, and a lot of my initial questions back would be directed at figuring out if you *really* need Django+MongoDB specifically, or if there are existing admin UIs for MongoDB that might work, or other approaches altogether.

      Not sure if this link provides any administration UIs that might be useful to your situation, but I thought I’d go ahead and pass it on in case you hadn’t run across it: http://docs.mongodb.org/ecosystem/tools/administration-interfaces/

      More to your point, though, I don’t have any great suggestions on how to do what you are asking apart from finding an existing admin UI for MongoDB or writing some custom model code.
