In this blog post, I’ll explain how to obtain data from a specified Twitter account using tweepy and Python. Let’s jump straight into the code!
As usual, we’ll start off by importing dependencies. I’ll use the datetime module and the Counter class from collections later on to do some simple analysis tasks.
```python
from tweepy import OAuthHandler
from tweepy import API
from tweepy import Cursor
from datetime import datetime, date, time, timedelta
from collections import Counter
import sys
```
The next bit creates a tweepy API object that we will use to query for data from Twitter. As usual, you’ll need to create a Twitter application in order to obtain the relevant authentication keys and fill in those empty strings. You can find a link to a guide about that in one of the previous articles in this series.
```python
# Fill these in with the keys from your own Twitter application.
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
auth_api = API(auth)
```
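If you'd rather not paste secrets into the script itself, one option is to read them from environment variables. The sketch below is only an alternative, not part of the original script, and the variable names (`TWITTER_CONSUMER_KEY` and so on) are hypothetical; use whatever names suit your setup.

```python
import os

# Hypothetical environment variable names; rename to suit your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four OAuth credentials as a tuple, failing loudly if any is missing or empty."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)
```

The returned tuple can be unpacked into `consumer_key, consumer_secret, access_token, access_token_secret` and passed to `OAuthHandler` and `set_access_token()` exactly as above.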
Names of accounts to be queried will be passed in as command-line arguments. I’m going to exit the script if no args are passed, since there would be no reason to continue.
```python
account_list = []
if len(sys.argv) > 1:
    account_list = sys.argv[1:]
else:
    print("Please provide a list of usernames at the command line.")
    sys.exit(1)  # a non-zero exit status signals that something went wrong
```
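The same check can be handed off to argparse from the standard library, which also gives you a `--help` message for free. This is a swapped-in technique rather than what the script above does, and the helper name `parse_accounts` is hypothetical.

```python
import argparse


def parse_accounts(argv=None):
    """Parse one or more Twitter screen names from the command line.

    argv defaults to sys.argv[1:] when None, as with parse_args()."""
    parser = argparse.ArgumentParser(
        description="Print profile details and hashtag/mention stats for Twitter accounts."
    )
    # nargs="+" requires at least one name; if none is given, argparse prints
    # a usage message and exits with status 2.
    parser.add_argument("accounts", nargs="+", metavar="screen_name",
                        help="Twitter screen name to query")
    args = parser.parse_args(argv)
    # Drop a leading "@" in case the user typed one.
    return [name.lstrip("@") for name in args.accounts]
```

In the script, `account_list = parse_accounts()` would replace the whole `if`/`else` block.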
Next, let’s iterate through the account names passed and use tweepy’s API.get_user() to obtain a few details about the queried account.
```python
if len(account_list) > 0:
    for target in account_list:
        print("Getting data for " + target)
        # Passing screen_name as a keyword works in both tweepy 3.x and 4.x.
        item = auth_api.get_user(screen_name=target)
        print("name: " + item.name)
        print("screen_name: " + item.screen_name)
        # The description can be empty or missing, so fall back to "".
        print("description: " + (item.description or ""))
        print("statuses_count: " + str(item.statuses_count))
        print("friends_count: " + str(item.friends_count))
        print("followers_count: " + str(item.followers_count))
```
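To see the shape of the output without making API calls, here is a small helper that builds similar summary lines from any object exposing the same User fields. The helper name `format_profile` is hypothetical, and the `SimpleNamespace` below is a made-up stand-in for a tweepy User object.

```python
from types import SimpleNamespace


def format_profile(user):
    """Return "field: value" lines for the User fields printed in this post."""
    fields = ("name", "screen_name", "description",
              "statuses_count", "friends_count", "followers_count")
    # f-strings convert integers (or a None description) to text, so no
    # explicit str() calls or string concatenation are needed.
    return [f"{field}: {getattr(user, field)}" for field in fields]


# A stand-in for a tweepy User object, with made-up values.
fake_user = SimpleNamespace(name="Example", screen_name="example",
                            description="Just a test account",
                            statuses_count=1234, friends_count=56,
                            followers_count=789)
for line in format_profile(fake_user):
    print(line)
```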
Twitter User objects contain a created_at field that holds the creation date of the account. We can use it to calculate the age of the account, and since we also know how many Tweets that account has published (statuses_count), we can work out the average number of Tweets per day for that account. Tweepy returns time-related values as datetime objects, which makes calculations such as time deltas easy.
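```python
        tweets = item.statuses_count
        account_created_date = item.created_at
        # tweepy 4.x returns timezone-aware UTC datetimes (3.x returns naive ones);
        # drop the tzinfo so the value can be subtracted from the naive utcnow().
        if account_created_date.tzinfo is not None:
            account_created_date = account_created_date.replace(tzinfo=None)
        delta = datetime.utcnow() - account_created_date
        account_age_days = delta.days
        print("Account age (in days): " + str(account_age_days))
        if account_age_days > 0:
            print("Average tweets per day: " + "%.2f" % (float(tweets) / float(account_age_days)))
```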
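Pulled out into a function, the age and rate calculation is easy to check by hand. This is a sketch with a hypothetical name, `tweets_per_day`; it assumes, as I understand tweepy's behaviour, that older 3.x releases return naive UTC datetimes and 4.x returns timezone-aware ones, so it normalises both to aware UTC.

```python
from datetime import datetime, timezone


def tweets_per_day(statuses_count, created_at, now=None):
    """Return (account_age_days, average tweets per day, or None if under a day old).

    created_at may be naive (treated as UTC) or timezone-aware;
    now, if given, must be timezone-aware."""
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    age_days = (now - created_at).days
    # Avoid dividing by zero for accounts less than a day old.
    rate = statuses_count / age_days if age_days > 0 else None
    return age_days, rate


age, rate = tweets_per_day(5000, datetime(2020, 1, 1),
                           now=datetime(2020, 1, 11, tzinfo=timezone.utc))
print(age, f"{rate:.2f}")  # 10 500.00
```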
Next, let’s iterate through the user’s Tweets using tweepy’s API.user_timeline(). Tweepy’s Cursor takes care of pagination for us, fetching successive pages of results behind the scenes so we don’t have to request each batch manually. This method returns at most the 3,200 most recent Tweets from an account, and fetching all of them can take a while. To make things quicker, and to show another example of datetime usage, we’re going to break out of the loop as soon as we reach a Tweet that is more than 30 days old. While looping, we’ll collect lists of all the hashtags and mentions seen in Tweets.
```python
        hashtags = []
        mentions = []
        tweet_count = 0
        end_date = datetime.utcnow() - timedelta(days=30)
        for status in Cursor(auth_api.user_timeline, screen_name=target).items():
            created = status.created_at
            if created.tzinfo is not None:
                created = created.replace(tzinfo=None)  # compare as naive UTC
            # Tweets arrive newest first, so stop at the first one older than 30 days
            # (before counting it, so it doesn't inflate tweet_count).
            if created < end_date:
                break
            tweet_count += 1
            entities = getattr(status, "entities", None) or {}
            for ent in entities.get("hashtags", []):
                if ent and ent.get("text"):
                    hashtags.append(ent["text"])
            for ent in entities.get("user_mentions", []):
                if ent and ent.get("screen_name"):
                    mentions.append(ent["screen_name"])
```
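To try the hashtag and mention extraction without Twitter credentials, here is a sketch that moves the same idea into a function and feeds it made-up statuses. The function name `collect_entities` is hypothetical; it assumes statuses arrive newest first (as user_timeline returns them), each with a timezone-aware `created_at` and an `entities` dict shaped like Twitter's.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def collect_entities(statuses, days=30, now=None):
    """Collect hashtags and mentions from statuses newer than `days` days.

    Returns (hashtags, mentions, number_of_statuses_processed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    hashtags, mentions, processed = [], [], 0
    for status in statuses:
        if status.created_at < cutoff:
            break  # newest-first order means every later status is older still
        processed += 1
        entities = getattr(status, "entities", None) or {}
        hashtags += [h["text"] for h in entities.get("hashtags", [])
                     if h and h.get("text")]
        mentions += [m["screen_name"] for m in entities.get("user_mentions", [])
                     if m and m.get("screen_name")]
    return hashtags, mentions, processed


# Made-up statuses standing in for what Cursor(...).items() would yield.
now = datetime(2021, 6, 30, tzinfo=timezone.utc)
sample = [
    SimpleNamespace(created_at=now - timedelta(days=1),
                    entities={"hashtags": [{"text": "python"}],
                              "user_mentions": [{"screen_name": "alice"}]}),
    SimpleNamespace(created_at=now - timedelta(days=5),
                    entities={"hashtags": [{"text": "python"}, {"text": "osint"}],
                              "user_mentions": []}),
    SimpleNamespace(created_at=now - timedelta(days=40),
                    entities={"hashtags": [{"text": "old"}],
                              "user_mentions": [{"screen_name": "bob"}]}),
]
tags, names, count = collect_entities(sample, now=now)
print(tags, names, count)  # ['python', 'python', 'osint'] ['alice'] 2
```

The 40-day-old status is never counted: the loop stops as soon as it sees it.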
Finally, we’ll use Counter.most_common() to print out the ten most used hashtags and mentions.
```python
        print()
        print("Most mentioned Twitter users:")
        for name, count in Counter(mentions).most_common(10):
            print(name + "\t" + str(count))
        print()
        print("Most used hashtags:")
        for tag, count in Counter(hashtags).most_common(10):
            print(tag + "\t" + str(count))
        print()
        print("All done. Processed " + str(tweet_count) + " tweets.")
        print()
```
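Here is what Counter.most_common() does on its own, using a made-up list of mentions in place of the one collected from the timeline.

```python
from collections import Counter

# Made-up mention list; in the script this is built inside the timeline loop.
mentions = ["alice", "bob", "alice", "carol", "alice", "bob"]

# most_common(n) returns up to n (item, count) pairs, highest count first.
top = Counter(mentions).most_common(2)
print(top)  # [('alice', 3), ('bob', 2)]

for name, count in top:
    print(f"{name}\t{count}")
```

When counts tie, most_common() keeps the order in which items were first encountered, so the ranking is stable for a given input.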
And that’s it. A simple tool. But effective. And, of course, you can extend this code in any direction you like.