How To Get Twitter Follower Data Using Python And Tweepy

Andrew Patel

27.02.18 8 min. read

In January 2018, I wrote a couple of blog posts outlining some analysis I’d performed on followers of popular Finnish Twitter profiles. A few people asked that I share the tools used to perform that research. Today, I’ll share a tool similar to the one I used to conduct that research, and at the same time, illustrate how to obtain data about a Twitter account’s followers.

This tool uses Tweepy to connect to the Twitter API. In order to enumerate a target account’s followers, I like to start by using Tweepy’s followers_ids() function to get a list of Twitter ids of accounts that are following the target account. This call completes in a single query (returning up to 5,000 ids), and gives us a list of Twitter ids that can be saved for later use (since both screen_name and name can be changed, but the account’s id never changes). Once I’ve obtained a list of Twitter ids, I can use Tweepy’s lookup_users(user_ids=batch) to obtain Twitter User objects for each Twitter id. As far as I know, this isn’t exactly the documented way of obtaining this data, but it suits my needs. /shrug
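In case it helps to see those two calls in isolation, here’s a minimal sketch of the idea (assuming api is an already-authenticated tweepy.API object and target is a screen name; the full, runnable version follows below):

# Sketch only: api and target are assumed to exist already
follower_ids = api.followers_ids(screen_name=target)   # up to 5,000 ids per call

# lookup_users() accepts at most 100 ids per request, so query in batches
users = []
for i in range(0, len(follower_ids), 100):
    batch = follower_ids[i:i+100]
    users.extend(api.lookup_users(user_ids=batch))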

Once a full set of Twitter User objects has been obtained, we can perform analysis on it. In the following tool, I chose to look at the account age and friends_count of each account returned, print a summary, and save a summarized form of each account’s details as json, for potential further processing. Here’s the full code:

from tweepy import OAuthHandler
from tweepy import API
from collections import Counter
from datetime import datetime
import calendar
import sys
import json
import os
import io
import re

# Helper functions to load and save intermediate steps
def save_json(variable, filename):
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write(json.dumps(variable, indent=4, ensure_ascii=False))

def load_json(filename):
    ret = None
    if os.path.exists(filename):
        try:
            with io.open(filename, "r", encoding="utf-8") as f:
                ret = json.load(f)
        except (IOError, ValueError):
            pass
    return ret

# Load a cached result from disk if it exists, otherwise run the processing
# function and save its output (this tool only ever uses json files)
def try_load_or_process(filename, processor_fn, function_arg):
    if os.path.exists(filename):
        print("Loading " + filename)
        return load_json(filename)
    ret = processor_fn(function_arg)
    print("Saving " + filename)
    save_json(ret, filename)
    return ret

# Some helper functions to convert between different time formats and perform date calculations
def twitter_time_to_object(time_string):
    # Twitter's created_at looks like "Tue Feb 27 09:00:00 +0000 2018";
    # strip the UTC offset so strptime can parse the rest
    twitter_format = "%a %b %d %H:%M:%S %Y"
    match_expression = r"^(.+)\s(\+[0-9]{4})\s([0-9]{4})$"
    match = re.search(match_expression, time_string)
    if match is not None:
        first_bit = match.group(1)
        last_bit = match.group(3)
        new_string = first_bit + " " + last_bit
        return datetime.strptime(new_string, twitter_format)
    return None

def time_object_to_unix(time_object):
    # The parsed datetime is naive UTC, so convert it with timegm
    # (strftime("%s") is not portable and assumes local time)
    return calendar.timegm(time_object.timetuple())

def twitter_time_to_unix(time_string):
    return time_object_to_unix(twitter_time_to_object(time_string))

def seconds_since_twitter_time(time_string):
    input_time_unix = twitter_time_to_unix(time_string)
    current_time_unix = get_utc_unix_time()
    return current_time_unix - input_time_unix

def get_utc_unix_time():
    return calendar.timegm(datetime.utcnow().timetuple())

# Get a list of follower ids for the target account
def get_follower_ids(target):
    return auth_api.followers_ids(target)

# The Twitter API allows us to batch query 100 accounts at a time
# So we'll create batches of 100 follower ids and gather Twitter User objects for each batch
def get_user_objects(follower_ids):
    batch_len = 100
    num_batches = (len(follower_ids) + batch_len - 1) // batch_len
    batches = (follower_ids[i:i+batch_len] for i in range(0, len(follower_ids), batch_len))
    all_data = []
    for batch_count, batch in enumerate(batches):
        sys.stdout.write("\rFetching batch: " + str(batch_count + 1) + "/" + str(num_batches))
        sys.stdout.flush()
        users_list = auth_api.lookup_users(user_ids=batch)
        all_data += [u._json for u in users_list]
    sys.stdout.write("\n")
    return all_data

# Creates one-week-length ranges and finds items that fit into those range boundaries
def make_ranges(user_data, num_ranges=20):
    range_max = 604800 * num_ranges
    range_step = range_max // num_ranges

    # We create ranges and labels first and then iterate these when going through the whole list
    # of user data, to speed things up
    ranges = {}
    labels = {}
    for x in range(num_ranges):
        start_range = x * range_step
        end_range = x * range_step + range_step
        label = "%02d" % x + " - " + "%02d" % (x+1) + " weeks"
        labels[label] = []
        ranges[label] = {}
        ranges[label]["start"] = start_range
        ranges[label]["end"] = end_range
    for user in user_data:
        if "created_at" in user:
            account_age = seconds_since_twitter_time(user["created_at"])
            for label, timestamps in ranges.items():
                if timestamps["start"] <= account_age < timestamps["end"]:
                    entry = {}
                    id_str = user["id_str"]
                    entry[id_str] = {}
                    fields = ["screen_name", "name", "created_at", "friends_count", "followers_count", "favourites_count", "statuses_count"]
                    for f in fields:
                        if f in user:
                            entry[id_str][f] = user[f]
                    labels[label].append(entry)
    return labels

if __name__ == "__main__":
    account_list = []
    if len(sys.argv) > 1:
        account_list = sys.argv[1:]

    if len(account_list) < 1:
        print("No parameters supplied. Exiting.")
        sys.exit(0)

    # Fill in your own Twitter application credentials here
    consumer_key = ""
    consumer_secret = ""
    access_token = ""
    access_token_secret = ""

    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth_api = API(auth)

    for target in account_list:
        print("Processing target: " + target)

        # Get a list of Twitter ids for followers of target account and save it
        filename = target + "_follower_ids.json"
        follower_ids = try_load_or_process(filename, get_follower_ids, target)

        # Fetch Twitter User objects from each Twitter id found and save the data
        filename = target + "_followers.json"
        user_objects = try_load_or_process(filename, get_user_objects, follower_ids)
        total_objects = len(user_objects)

        # Record a few details about each account that falls between specified age ranges
        ranges = make_ranges(user_objects)
        filename = target + "_ranges.json"
        save_json(ranges, filename)

        # Print a few summaries
        print("")
        print("\t\tFollower age ranges")
        print("\t\t===================")
        total = 0
        following_counter = Counter()
        for label, entries in sorted(ranges.items()):
            print("\t\t" + str(len(entries)) + " accounts were created within " + label)
            total += len(entries)
            for entry in entries:
                for id_str, values in entry.items():
                    if "friends_count" in values:
                        following_counter[values["friends_count"]] += 1
        print("\t\tTotal: " + str(total) + "/" + str(total_objects))
        print("")
        print("\t\tMost common friends counts")
        print("\t\t==========================")
        total = 0
        for num, count in following_counter.most_common(20):
            total += count
            print("\t\t" + str(count) + " accounts are following " + str(num) + " accounts")
        print("\t\tTotal: " + str(total) + "/" + str(total_objects))
        print("")
        print("")
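The script takes one or more screen names as command-line arguments (the file name follower_analysis.py below is just whatever you saved the script as), so a run against the accounts discussed in this post might look like this:

python follower_analysis.py realDonaldTrump niinisto r0zetta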

Let’s run this tool against a few accounts and see what results we get. First up: @realDonaldTrump

Age ranges of new accounts following @realDonaldTrump

As we can see, over 80% of @realDonaldTrump’s last 5000 followers are very new accounts (less than 20 weeks old), with a majority of those being under a week old. Here are the top friends_count values of those accounts:

Most common friends_count values seen amongst the new accounts following @realDonaldTrump

No obvious pattern is present in this data.

Next up, an account I looked at in a previous blog post – @niinisto (the president of Finland).

Age ranges of new accounts following @niinisto

Many of @niinisto’s last 5000 followers are also new Twitter accounts, although not in as large a proportion as in the @realDonaldTrump case. In both of the above cases, this is to be expected, since both accounts are recommended to new users of Twitter. Let’s look at the friends_count values for the above set.

Most common friends_count values seen amongst the new accounts following @niinisto

In some cases, clicking through the creation of a new Twitter account (next, next, next, finish) will create an account that follows 21 Twitter profiles. This can explain the high proportion of accounts in this list with a friends_count value of 21. However, we might expect to see the same (or an even stronger) pattern with the @realDonaldTrump account, and we don’t. I’m not sure why this is the case, but it could be that Twitter has some automation in place to auto-delete programmatically created accounts. If you look at the output of my script, you’ll see that between fetching the list of Twitter ids for the last 5000 followers of @realDonaldTrump and fetching the full Twitter User objects for those ids, 3 accounts “went missing” (and hence the tool only collected data for 4997 accounts).
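If you’re curious which ids those were, you can diff the saved id list against the User objects the tool fetched. Here’s a quick sketch of how that might look (the file names simply follow the naming convention the script uses, and I’m assuming you’ve already run it against the target):

import json, io

# Which of the saved follower ids returned no User object?
with io.open("realDonaldTrump_follower_ids.json", "r", encoding="utf-8") as f:
    follower_ids = json.load(f)
with io.open("realDonaldTrump_followers.json", "r", encoding="utf-8") as f:
    user_objects = json.load(f)

found = set(u["id_str"] for u in user_objects)
missing = [i for i in follower_ids if str(i) not in found]
print(str(len(missing)) + " ids returned no User object: " + str(missing))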

Finally, just for good measure, I ran the tool against my own account (@r0zetta).

Age ranges of new accounts following @r0zetta

Here you see a distribution that’s probably common for non-celebrity Twitter accounts. Not many of my followers have new accounts. What’s more, there’s absolutely no pattern in the friends_count values of these accounts:

Most common friends_count values seen amongst the new accounts following @r0zetta

Of course, there are plenty of other interesting analyses that can be performed on the data collected by this tool. Once the script has been run, all data is saved on disk as json files, so you can process it to your heart’s content without having to run additional queries against Twitter’s servers. As usual, have fun extending this tool to your own needs, and if you’re interested in reading some of my other guides or analyses, here’s a full list of those articles.
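As one small example of that kind of further processing, here’s a sketch that reloads one of the saved _followers.json files (the file name is again just the script’s naming convention) and summarizes the followers_count values of those accounts, without making any further API calls:

import json, io
from collections import Counter

with io.open("r0zetta_followers.json", "r", encoding="utf-8") as f:
    user_objects = json.load(f)

# Tally how many followers each of these accounts has
followers_counter = Counter(u["followers_count"] for u in user_objects if "followers_count" in u)
for value, count in followers_counter.most_common(10):
    print(str(count) + " accounts have " + str(value) + " followers")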
