A month and a half ago I posted an article in which I uncovered a series of Twitter accounts advertising adult dating (read: scam) websites. If you haven’t read it yet, I recommend taking a look at it before reading this article, since I’ll refer back to it occasionally.
To start with, let’s recap. In my previous research, I used a script to recursively query Twitter accounts for specific patterns, and found just over 22,000 Twitter bots using this process. That figure is a lower bound: I stopped my script after it had queried only 3000 of the 22,000 discovered accounts, and I suspect it would have uncovered a lot more accounts had I let it run longer.
This week, I decided to re-query all the Twitter IDs I found in March, to see if anything had changed. To my surprise, I was only able to query 2895 of the original 21964 accounts, indicating that Twitter has taken action on most of those accounts.
In order to find out whether the culled accounts were deleted or suspended, I wrote a small Python script that used the requests module to directly query each account’s URL. A 404 error indicated that the account had been removed or renamed; any other reply indicated that the account was suspended. Of the 19069 culled accounts checked, 18932 were suspended, and 137 were deleted/renamed.
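A minimal sketch of such a check might look like the following. It is not the original script: the profile URL pattern, the function names, and the rule that every non-404 reply counts as "suspended" are assumptions made for illustration. The HTTP getter is passed in as a parameter so that `requests.get` (or a stub) can be supplied.

```python
def classify_status(status_code):
    """Map an HTTP status code to the two categories used in the text."""
    # A 404 means the profile URL no longer resolves: removed or renamed.
    if status_code == 404:
        return "deleted_or_renamed"
    # Assumption: any other reply for a culled account means "suspended".
    return "suspended"


def check_accounts(screen_names, http_get):
    """Return {screen_name: category} for each culled account.

    http_get is any callable shaped like requests.get, for example
    requests.get itself.
    """
    results = {}
    for name in screen_names:
        # Assumption: profiles are reachable at twitter.com/<screen_name>.
        response = http_get("https://twitter.com/" + name, timeout=10)
        results[name] = classify_status(response.status_code)
    return results
```

With the requests module installed, this could be called as `check_accounts(names, requests.get)`, where `names` is a list of screen names.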
I also checked the surviving accounts in a similar manner, using requests to identify which ones were “restricted” (by checking for specific strings in the html returned from the query). Of the 2895 surviving accounts, 47 were set to restricted and the other 2848 were not.
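The string check could be sketched roughly as below. The marker strings are hypothetical placeholders: the exact text searched for in the returned HTML is not given here, so real markers would have to be taken from an actual restricted profile page.

```python
# Hypothetical placeholder markers; replace with strings observed in the
# HTML of a real restricted profile page.
RESTRICTED_MARKERS = ("placeholder restricted marker",)


def is_restricted(html, markers=RESTRICTED_MARKERS):
    """Return True if any marker string occurs in the page HTML."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in markers)


def count_restricted(pages, markers=RESTRICTED_MARKERS):
    """Split {account: html} into (restricted, not_restricted) counts."""
    restricted = sum(1 for html in pages.values() if is_restricted(html, markers))
    return restricted, len(pages) - restricted
```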
As noted in my previous article, the accounts identified during my research had creation dates ranging from a few days old to over a decade in age. I checked the creation dates of both the culled set and the survivor set (using my previously recorded data) for patterns, but I couldn’t find any. Here they are, for reference:
Based on the connectivity I recorded between the original bot accounts, I’ve created a new graph visualization depicting the surviving communities. Of the 2895 survivors, only 402 presumably still belong to the communities I observed back then. The rest of the accounts were likely orphaned. Here’s a representation of what the surviving communities might look like, if the entity controlling these accounts didn’t make any changes in the meantime.
By the way, I’m using Gephi to create these graph visualizations, in case you were wondering.
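Before loading a graph like that into Gephi, the recorded links have to be reduced to the surviving accounts. One way to do that is sketched below. It assumes that an account "still belongs to a community" when it has at least one recorded link to another survivor; that definition, and the function names, are assumptions for illustration rather than the method actually used.

```python
def surviving_edges(edges, survivors):
    """Keep only the recorded links whose two endpoints both survived."""
    alive = set(survivors)
    return [(a, b) for a, b in edges if a in alive and b in alive and a != b]


def split_survivors(edges, survivors):
    """Return (connected, orphaned) sets of surviving accounts."""
    connected = set()
    for a, b in surviving_edges(edges, survivors):
        connected.update((a, b))
    return connected, set(survivors) - connected
```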
Erik Ellason (@slickrockweb) contacted me recently with some evidence that the bots I’d discovered might be re-tooling. He pointed me to a handful of accounts that contained the shortened URL in a pinned tweet (instead of in the account’s description). Here’s an example profile:
Fetching a user object using the Twitter API will also return the last Tweet that account published, but I’m not sure it would necessarily return the pinned Tweet. In fact, I don’t think there’s a way of identifying a pinned Tweet using the standard API. Hence, searching for these accounts by their promotional URL would be time-consuming and problematic (you’d have to iterate through their Tweets).
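For reference, reading that last Tweet off an already-fetched user object might look like the sketch below. It assumes the v1.1 user-object layout, in which the most recent Tweet is embedded under a `status` key with a `created_at` timestamp; the helper name is made up for illustration, and the field may be absent (for example, for accounts that have never Tweeted).

```python
from datetime import datetime

# Timestamp layout assumed for v1.1 "created_at" fields,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def last_tweet(user):
    """Return (text, created_at) for the embedded last Tweet, or None."""
    status = user.get("status")
    if not status:
        return None
    created = datetime.strptime(status["created_at"], TWITTER_TIME_FORMAT)
    return status.get("text", ""), created
```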
Fortunately, automating discovery of Twitter profiles similar to those Erik showed me was fairly straightforward. Like the previous botnet, the accounts could be crawled because they follow each other. Also, all of these new accounts had text in their descriptions that followed a predictable pattern. Here’s an example of a few of those sentences:
look url in last post
go on link in top tweet
go at site in last post
It was trivial to construct a simple regular expression to find all such sentences:
desc_regex = "(look|go on|go at|see|check|click) (url|link|site) in (top|last) (tweet|post)"
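As a quick sanity check, the expression can be run against the example sentences above. The helper name and the case-insensitive flag are assumptions added for this sketch; the pattern itself is unchanged.

```python
import re

desc_regex = "(look|go on|go at|see|check|click) (url|link|site) in (top|last) (tweet|post)"

# Assumption: match case-insensitively, since descriptions vary in casing.
DESC_PATTERN = re.compile(desc_regex, re.IGNORECASE)


def matches_bot_description(description):
    """Return True if the description contains one of the stock phrases."""
    return DESC_PATTERN.search(description or "") is not None


examples = [
    "look url in last post",
    "go on link in top tweet",
    "go at site in last post",
]
for sentence in examples:
    print(sentence, "->", matches_bot_description(sentence))
```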
I modified my previous script to include the above regular expression, seeded it with the handful of accounts that Erik had provided me, and let it run. After 24 hours, my new script had identified just over 20000 accounts. Mapping the follower/following relationships between these accounts gave me the following graph:
Zooming in, you’ll notice that these accounts are far more connected than those in the older botnet. The 20,000 or so accounts identified at this point map to just over 100 separate communities. With roughly the same number of accounts, the previous botnet contained over 1000 communities.
Zooming in further shows the presence of “hubs” in each community, similar to those in the previous botnet.
Given that this botnet showed a greater degree of connectivity than the previous one studied, I decided to continue my discovery script and collect more data. The discovery rate of new accounts slowed slightly after the first 24 hours, but remained steady for the rest of the time it was running. After 4 days, my script had found close to 44,000 accounts.
And eight days later, the total was just over 80,000.
Here’s another way of visualizing that data:
Here’s the size distribution of communities detected for the 80,000 node graph. Smaller community sizes may indicate places where my discovery script didn’t yet look. The largest communities contained over 1000 accounts. There may be a way of searching more efficiently for these accounts by prioritizing crawling within smaller communities, but this is something I’ve yet to explore.
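A size distribution like this can be computed from any account-to-community mapping, however the communities were detected. The sketch below assumes such a mapping is available as a plain dictionary (for example, exported from Gephi); the function name and input shape are illustrative assumptions.

```python
from collections import Counter


def community_size_distribution(membership):
    """Return Counter {community_size: number_of_communities}.

    membership maps each account to its community label.
    """
    # First count how many accounts fall into each community...
    sizes = Counter(membership.values())
    # ...then count how many communities share each size.
    return Counter(sizes.values())
```

For instance, `community_size_distribution({"a": 0, "b": 0, "c": 1})` reports one community of size 2 and one of size 1.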
I shut down my discovery script at this point, having queried just over 30,000 accounts. I’m fairly confident this rabbit hole goes a lot deeper, but it would have taken weeks to query the next 50,000 accounts, not to mention the countless more that would have been added to the list during that time.
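The discovery process described above can be sketched as a breadth-first crawl. This is a simplified illustration, not the actual script: `get_description` and `get_neighbours` are hypothetical callables standing in for Twitter API lookups (profile text, and followers/following), and `max_queries` is an assumed cap on how many accounts get queried.

```python
import re
from collections import deque

DESC_PATTERN = re.compile(
    "(look|go on|go at|see|check|click) (url|link|site) in (top|last) (tweet|post)",
    re.IGNORECASE,  # assumption: ignore casing differences
)


def crawl(seeds, get_description, get_neighbours, max_queries=1000):
    """Return (discovered accounts, recorded edges, number of accounts queried)."""
    discovered = set(seeds)
    queue = deque(seeds)
    edges = []
    queried = 0
    while queue and queried < max_queries:
        account = queue.popleft()
        queried += 1
        for other in get_neighbours(account):
            if other not in discovered:
                # Only accounts whose description matches the pattern are kept.
                if not DESC_PATTERN.search(get_description(other) or ""):
                    continue
                discovered.add(other)
                queue.append(other)
            edges.append((account, other))
    return discovered, edges, queried
```

Because newly matched accounts are appended to the queue, the list of discovered accounts grows faster than it can be queried, which is consistent with having 80,000 discovered but only 30,000 queried.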
As with the previous botnet, the creation dates of these accounts spanned over a decade.
Here’s the oldest account I found.
Using the same methodology I used to analyze the survivor accounts from the old botnet, I checked which of these new accounts were restricted by Twitter. There was an almost exactly even split between restricted and non-restricted accounts in this new set.
Given that these new bots show many similarities to the previously discovered botnet (similar avatar pictures, same URL shortening services, similar usage of the English language), we might speculate that this new set of accounts is being managed by the same entity as those older ones. If this is the case, a further hypothesis is that said entity is re-tooling based on Twitter’s action against their previous botnet (for instance, to evade automated detection).
Because these new accounts use a pinned Tweet to advertise their services, we can test this hypothesis by examining the creation dates of the most recent Tweet from each account. If the entity is indeed re-tooling, all of the accounts should have Tweeted fairly recently. However, a brief examination of last Tweet dates for these accounts revealed a rather wide spread, reaching back as far as 2012. The distribution had a long tail, with a majority of the most recent Tweets having been published within the last year. Here’s the last year’s worth of data graphed.
Here’s the oldest Tweet I found:
This data, on its own, would refute the theory that the owner of this botnet has been recently retooling. However, a closer look at some of the discovered accounts reveals an interesting story. Here are a few examples.
This account took a 6 year break from Twitter, and switched language to English.
This account mentions a “url in last post” in its bio, but there isn’t one.
This account went from posting in Korean to posting in English, with a 3 year break in between. However, the newer Tweet mentions “url in bio”. Sounds vaguely familiar.
Examining the text contained in the last Tweets from these discovered accounts revealed around 76,000 unique Tweets. Searching these Tweets for links containing the URL shortening services used by the previous botnet revealed 8,200 unique Tweets. Here’s a graph of the creation dates of those particular Tweets.
As we can see, the Tweets containing shortened URLs date back only 21 days. Here’s a distribution of domains seen in those Tweets.
My current hypothesis is that the owner of the previous botnet has purchased a batch of Twitter accounts (of varying ages) and has been, at least for the last 21 days, repurposing those accounts to advertise adult dating sites using the new pinned-Tweet approach.
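For anyone wanting to reproduce the shortened-URL filtering that this hypothesis rests on, a rough sketch follows. The input shape (pairs of Tweet text and creation time), the function name, and the domain list are assumptions; the sketch also assumes the Tweet text contains the expanded links rather than Twitter’s own wrapped ones.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")


def tweets_with_shorteners(tweets, shortener_domains):
    """Count unique Tweets linking to the given domains.

    tweets is an iterable of (text, created_at) pairs, created_at being
    a datetime. Returns (per_day, per_domain) Counters.
    """
    domains = {d.lower() for d in shortener_domains}
    seen = set()
    per_day = Counter()
    per_domain = Counter()
    for text, created_at in tweets:
        if text in seen:  # count each unique Tweet text only once
            continue
        seen.add(text)
        hosts = {urlparse(url).netloc.lower() for url in URL_PATTERN.findall(text)}
        hits = hosts & domains
        if hits:
            per_day[created_at.date()] += 1
            per_domain.update(hits)
    return per_day, per_domain
```

Passing in the domains of the shortening services seen in the previous botnet would yield both the per-day counts and the per-domain distribution.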
One final thing – I checked the 2895 survivor accounts from the previously discovered botnet to see if any had been reconfigured to use a pinned Tweet. At the time of checking, only one of those accounts had been changed.
If you’re interested in looking at the data I collected, I’ve uploaded names/ids of all discovered accounts, the follower/following mappings found between these accounts, the Gephi save file for the 80,000-node graph, and a list of accounts queried by my script (in case someone would like to continue iterating through the unqueried accounts). You can find all of that data in this GitHub repo.