Phishing Campaign targeting French Industry

We have recently observed an ongoing phishing campaign targeting French industry. Among the targets are organizations involved in chemical manufacturing, aviation, automotive, banking, industrial software, and IT services. Since the beginning of October 2018, we have seen multiple phishing emails that follow a similar pattern and share similar indicators, with the obfuscation evolving quickly over the course of the campaign. This post gives a quick look at how the campaign has evolved, what it is about, and how you can detect it.

Phishing emails

The phishing emails usually refer to a document that is either attached or can supposedly be obtained by visiting the provided link. The French used appears to be native and is very convincing.

The email subject typically follows the prefix of the attachment name. The attachments are HTML or PDF files, usually named “document“, “preuves“, or “fact“, optionally followed by an underscore and six digits. Here are some of the attachment names we have observed (a rough detection pattern for these names is sketched after the list):

  • fact_395788.xht
  • document_773280.xhtml
  • 474362.xhtml
  • 815929.htm
  • document_824250.html
  • 975677.pdf
  • 743558.pdf
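
The naming convention above is consistent enough to be matched with a simple regular expression. The following is a minimal Python sketch (our own illustration, not a signature we ship) that could serve as a starting point for flagging suspicious attachment names at a mail gateway:

import re

# Matches the observed names: an optional "document"/"preuves"/"fact" prefix
# followed by an underscore, then six digits and an HTML/PDF extension.
ATTACHMENT_NAME = re.compile(
    r"^(?:(?:document|preuves|fact)_)?\d{6}\.(?:xht|xhtml|htm|html|pdf)$",
    re.IGNORECASE,
)

for name in ["fact_395788.xht", "document_824250.html", "975677.pdf", "report.docx"]:
    print(name, "->", bool(ATTACHMENT_NAME.match(name)))
# The first three match; report.docx does not.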

Here is the content of an example XHTML attachment from the 15th of November:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta content="UTF-8" />
</head>
<body onload='document.getElementById("_y").click();'>
<h1>
<a id="_y" href="https://t[.]co/8hMB9xwq9f?540820">Lien de votre document</a>
</h1>
</body>
</html>

 

Evolution of the campaign

The first phishing emails observed at the beginning of October contained unobfuscated payload addresses, for example:

  • hxxp://piecejointe[.]pro/facture/redirect[.]php
  • hxxp://mail-server-zpqn8wcphgj[.]pw?client=XXXXXX

These links appeared inside HTML/XHTML/HTM attachments or simply as links in the email body. The attachment names used were mostly document_[randomized number].xhtml.

Towards the end of October, these payload addresses were further obfuscated by placing them behind redirects. The author developed a simple piece of JavaScript to obfuscate a set of .pw domains:

var _0xa4d9=["\x75\x71\x76\x6B\x38\x66\x74\x75\x77\x35\x69\x74\x38\x64\x73\x67\x6C\x63\x7A\x2E\x70\x77",
"\x7A\x71\x63\x7A\x66\x6E\x32\x6E\x6E\x6D\x75\x65\x73\x68\x38\x68\x74\x79\x67\x2E\x70\x77",
"\x66\x38\x79\x33\x70\x35\x65\x65\x36\x64\x6C\x71\x72\x37\x39\x36\x33\x35\x7A\x2E\x70\x77",
"\x65\x72\x6B\x79\x67\x74\x79\x63\x6F\x6D\x34\x66\x33\x79\x61\x34\x77\x69\x71\x2E\x70\x77",
"\x65\x70\x72\x72\x39\x71\x79\x32\x39\x30\x65\x62\x65\x70\x6B\x73\x6D\x6B\x62\x2E\x70\x77",
"\x37\x62\x32\x64\x75\x74\x62\x37\x76\x39\x34\x31\x34\x66\x6E\x68\x70\x36\x63\x2E\x70\x77",
"\x64\x69\x6D\x76\x72\x78\x36\x30\x72\x64\x6E\x7A\x36\x63\x68\x6C\x77\x6B\x65\x2E\x70\x77",
"\x78\x6D\x76\x6E\x6C\x67\x6B\x69\x39\x61\x39\x39\x67\x35\x6B\x62\x67\x75\x65\x2E\x70\x77",
"\x62\x72\x75\x62\x32\x66\x77\x64\x39\x30\x64\x38\x6D\x76\x61\x70\x78\x6E\x6C\x2E\x70\x77",
"\x68\x38\x39\x38\x6A\x65\x32\x68\x74\x64\x64\x61\x69\x38\x33\x78\x63\x72\x37\x2E\x70\x77",
"\x6C\x32\x6C\x69\x69\x75\x38\x79\x64\x7A\x6D\x64\x66\x30\x31\x68\x69\x63\x72\x2E\x70\x77",
"\x63\x79\x6B\x36\x6F\x66\x6D\x75\x6E\x6C\x35\x34\x72\x36\x77\x6B\x30\x6B\x74\x2E\x70\x77",
"\x7A\x78\x70\x74\x76\x79\x6F\x64\x6A\x39\x35\x64\x77\x63\x67\x6B\x6C\x62\x77\x2E\x70\x77",
"\x35\x65\x74\x67\x33\x6B\x78\x6D\x69\x78\x67\x6C\x64\x73\x78\x73\x67\x70\x65\x2E\x70\x77",
"\x38\x35\x30\x6F\x6F\x65\x70\x6F\x6C\x73\x69\x71\x34\x6B\x71\x6F\x70\x6D\x65\x2E\x70\x77",
"\x6F\x6D\x63\x36\x75\x32\x6E\x31\x30\x68\x38\x6E\x61\x71\x72\x30\x61\x70\x68\x2E\x70\x77",
"\x63\x30\x7A\x65\x68\x62\x74\x38\x6E\x77\x67\x6F\x63\x35\x63\x6E\x66\x33\x30\x2E\x70\x77",
"\x68\x36\x6A\x70\x64\x6B\x6E\x7A\x76\x79\x63\x61\x36\x6A\x67\x33\x30\x78\x74\x2E\x70\x77",
"\x74\x64\x32\x6E\x62\x7A\x6A\x6D\x67\x6F\x36\x73\x6E\x65\x6E\x6A\x7A\x70\x72\x2E\x70\x77",
"\x6C\x69\x70\x71\x76\x77\x78\x63\x73\x63\x34\x75\x68\x6D\x6A\x36\x74\x6D\x76\x2E\x70\x77",
"\x31\x33\x72\x7A\x61\x75\x30\x69\x64\x39\x79\x76\x37\x71\x78\x37\x76\x6D\x78\x2E\x70\x77",
"\x6B\x64\x33\x37\x68\x62\x6F\x6A\x67\x6F\x65\x76\x6F\x63\x6C\x6F\x7A\x77\x66\x2E\x70\x77",
"\x66\x75\x67\x65\x39\x69\x6F\x63\x74\x6F\x38\x39\x63\x6B\x36\x7A\x62\x30\x76\x2E\x70\x77",
"\x70\x6D\x63\x35\x6B\x71\x6C\x78\x6C\x62\x6C\x78\x30\x65\x67\x74\x63\x37\x32\x2E\x70\x77",
"\x30\x71\x38\x31\x73\x73\x72\x74\x68\x69\x72\x63\x69\x62\x70\x6A\x62\x33\x38\x2E\x70\x77","\x72\x61\x6E\x64\x6F\x6D","\x6C\x65\x6E\x67\x74\x68","\x66\x6C\x6F\x6F\x72","\x68\x74\x74\x70\x3A\x2F\x2F","\x72\x65\x70\x6C\x61\x63\x65","\x6C\x6F\x63\x61\x74\x69\x6F\x6E"];
var arr=[_0xa4d9[0],_0xa4d9[1],_0xa4d9[2],_0xa4d9[3],_0xa4d9[4],_0xa4d9[5],_0xa4d9[6],_0xa4d9[7],_0xa4d9[8],_0xa4d9[9],_0xa4d9[10],_0xa4d9[11],_0xa4d9[12],_0xa4d9[13],_0xa4d9[14],_0xa4d9[15],_0xa4d9[16],_0xa4d9[17],_0xa4d9[18],_0xa4d9[19],_0xa4d9[20],_0xa4d9[21],_0xa4d9[22],_0xa4d9[23],_0xa4d9[24]];
var redir=arr[Math[_0xa4d9[27]](Math[_0xa4d9[25]]()* arr[_0xa4d9[26]])];
window[_0xa4d9[30]][_0xa4d9[29]](_0xa4d9[28]+ redir)

This JavaScript code, which was part of the attachment, deobfuscates an array of [random].pw domains that redirect the user to the payload domain. In this particular campaign, the payload domain had changed to hxxp://email-document-joint[.]pro/redir/.
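
The obfuscation itself is just standard JavaScript hex string escaping, so the domain list can be recovered offline without executing the script. A minimal Python sketch (our own illustration, using two entries excerpted from the array above):

# Decode the \xNN escape sequences the same way a JavaScript engine would.
escaped_entries = [
    r"\x75\x71\x76\x6B\x38\x66\x74\x75\x77\x35\x69\x74\x38\x64\x73\x67\x6C\x63\x7A\x2E\x70\x77",
    r"\x68\x74\x74\x70\x3A\x2F\x2F",
]
for entry in escaped_entries:
    print(entry.encode("ascii").decode("unicode_escape"))
# -> uqvk8ftuw5it8dsglcz.pw
# -> http://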

However, the use of JavaScript code inside attachments appears not to have been a huge success: only days later, the JavaScript for domain deobfuscation and redirection was moved behind pste.eu, a Pastebin-like service for HTML code. The phishing emails thereafter contained links to pste.eu, such as hxxps[://]pste[.]eu/p/yGqK[.]html.

In the next iteration, during November, we observed a few different styles. Some emails contained links to subdomains of random .pw or .site domains, such as:

  • hxxp://6NZX7M203U[.]p95jadah5you6bf1dpgm[.]pw
  • hxxp://J8EOPRBA7E[.]jeu0rgf5apd5337[.]site

At this point, PDF files were also seen as attachments in the phishing emails. These PDFs contained similar links, pointing to random subdomains of .site or .website domains.

A few days later, on the 15th of November, the attackers added another layer of redirection in front of the pste.eu URLs by using Twitter-shortened URLs. They used a Twitter account to post 298 pste.eu URLs and then included the t.co equivalents in their phishing emails. The Twitter account appears to be some sort of advertising account with very little activity since its creation in 2012. Most of its tweets and retweets are related to Twitter advertising campaigns, products, lotteries, and the like.

 

The pste.eu links in Twitter

 

Example of the URL redirections

The latest links used in the campaign are random .icu domains leading to a chain of 302 redirects. The delivery method remains XHTML/HTML attachments or links in the email body. The campaign appears to be evolving fairly quickly, and the attackers are actively generating new domains and new methods of redirection and obfuscation. At the time of writing, the payload URLs lead to an advertising redirection chain involving multiple domains and URLs known for malvertising.

 

Infrastructure

The campaign has mostly used compromised Wanadoo email accounts, and later email accounts on the attackers' own domains, such as rault@3130392E3130322E37322E3734.lho33cefy1g.pw, to send out the emails. The subdomain is the name of the sending email server and is a hex representation of the server's public IP address, in this case 109.102.72.74.
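
This encoding is easy to verify; the hex blob is simply the ASCII representation of the IP address:

# Decode the hex-encoded subdomain from rault@3130392E3130322E37322E3734.lho33cefy1g.pw
encoded = "3130392E3130322E37322E3734"
print(bytes.fromhex(encoded).decode("ascii"))   # prints 109.102.72.74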

The server behind the .pw domain appears to be a Postfix email server that is already listed on multiple blacklists. The compromised email accounts used to send the phishing emails always belong to .fr domains.

The links in the emails go through multiple URLs in redirection chains, and most of the websites involved are hosted on the same servers.

Following the redirections beyond the payload domains (e.g. email-document-joint[.]pro or the .pw payload domains) later in November, we were redirected to domains such as ffectuermoi[.]tk or eleverqualit[.]tk. These were hosted on the same servers as many similar-looking domains. Closer investigation of these servers revealed that they are known for hosting PUP/adware programs and further malvertising URLs.

Continuing on to the ffectuermoi[.]tk domain eventually leads to doesok[.]top, which serves advertisements while setting cookies along the way. The servers hosting doesok[.]top are also known for hosting PUP/adware/malware.

 

Additional Find

During the investigation we came across an interesting artifact on VirusTotal, submitted from France. The file is a .zip archive that contained the following:

  • “All in One Checker” tool – a tool that can be used to verify email account/password dumps for valid accounts/combinations
  • .vbs dropper – a script that drops a backdoor onto the user’s system upon executing the checker tool
  • Directory created by the checker tool – named with the date and time of the tool's execution, containing results in these text files:
    • Error.txt – contains any errors
    • Good.txt – verified results
    • Ostatok.txt – Ostatok means “the rest” or “remainder”

Contents of the .zip file. 03.10_17:55 is the directory created by the tool, containing the checker results. Both .vbs files are exactly the same backdoor dropper. The rest are configuration files and the checker tool itself.

 

Contents of the directory created by the checker tool

Almost all of the email accounts inside these .txt files are from .fr domains, and one of them is actually the same address we saw used as a sender in one of the phishing emails on the 19th of October. Was this tool used by the attackers behind this campaign? It seems rather fitting.

But what caused them to zip up this tool, along with the results, and submit it to VirusTotal?

When opening the All In One Checker tool, you are greeted with a lovely message, and pressing continue will attempt to install the backdoor.

We replaced the .vbs dropper with a Wscript.Echo() alert

 

Hey great!

Perhaps they wanted to check the files because they accidentally infected themselves with a backdoor.

 

Indicators

This is a non-exhaustive list of indicators observed during the campaign.

2bv9npptni4u46knazx2.pw
p95jadah5you6bf1dpgm.pw
lho33cefy1g.pw
mail-server-zpqn8wcphgj.pw
http://piecejointe.pro/facture/redirect.php
http://email-document-joint.pro/redir/
l45yvbz21a.website
95plb963jjhjxd.space
sjvmrvovndqo2u.icu
jeu0rgf5apd5337.site
95.222.24.44 - Email Server
109.102.72.74 - Email Server
83.143.150.210 - Email Server
37.60.177.228 - Web Server / Malware C2  
87.236.22.87 Web Server / Malware C2 
207.180.233.109 - Web Server
91.109.5.170 - Web Server
162.255.119.96 - Web Server
185.86.78.238 - Web Server
176.119.157.62 - Web Server
113.181.61.226

The following indicators have been observed but are benign and can cause false positives.

https://pste.eu
https://t.co


Ethics In Artificial Intelligence: Introducing The SHERPA Consortium

In May of this year, Horizon 2020 SHERPA project activities kicked off with a meeting in Brussels. F-Secure is a partner in the SHERPA consortium – a group consisting of 11 members from six European countries – whose mission is to understand how the combination of artificial intelligence and big data analytics will impact ethics and human rights issues today, and in the future (https://www.project-sherpa.eu/).

As part of this project, one of F-Secure’s first tasks will be to study security issues, dangers, and implications of the use of data analytics and artificial intelligence, including applications in the cyber security domain. This research project will examine:

  • ways in which machine learning systems are commonly mis-implemented (and recommendations on how to prevent this from happening)
  • ways in which machine learning models and algorithms can be adversarially attacked (and mitigations against such attacks)
  • how artificial intelligence and data analysis methodologies might be used for malicious purposes

We’ve already done a fair bit of this research*, so expect to see more articles on this topic in the near future!

 

As strange as it sounds, I sometimes find PowerPoint a good tool for arranging my thoughts, especially before writing a long document. As an added bonus, I have a presentation ready to go, should I need it.

 

 

Some members of the SHERPA project recently attended WebSummit in Lisbon – a four day event with over 70,000 attendees and over 70 dedicated discussions and panels. Topics related to artificial intelligence were prevalent this year, ranging from tech presentations on how to develop better AI, to existential debates on the implications of AI on the environment and humanity. The event attracted a wide range of participants, including many technologists, politicians, and NGOs.

During WebSummit, SHERPA members participated in the Social Innovation Village, where they joined forces with projects and initiatives such as Next Generation Internet, CAPPSI, MAZI, DemocratieOuverte, grassroots radio, and streetwize to push for “more social good in technology and more technology in social good”. Here, SHERPA researchers showcased the work they’ve already done to deepen the debate on the implications of AI in policing, warfare, education, health and social care, and transport.

The presentations attracted the keen interest of representatives from more than 100 large and small organizations and networks in Europe and further afield, including the likes of Founder’s Institute, Google, and Amazon, and also led to a public commitment by Carlos Moedas, the European Commissioner for Research, Science and Innovation. You can listen to the highlights of the conversation here.

To get a preview of SHERPA’s scenario work and take part in the debate click here.

 


* If you’re wondering why I haven’t blogged in a long while, it’s because I’ve been hiding away, working on a bunch of AI-related research projects (such as this). Down the road, I’m hoping to post more articles and code – if and when I have results to share 😉



Spam campaign targets Exodus Mac Users

We’ve seen a small spam campaign that attempts to target Mac users that use Exodus, a multi-cryptocurrency wallet.

The theme of the email focuses mainly on Exodus. The attachment was named “Exodus-MacOS-1.64.1-update.zip” and the sender domain was “update-exodus[.]io”, suggesting that it wanted to associate itself with the organization. It was trying to deliver a fake Exodus update using the subject “Update 1.64.1 Release – New Assets and more”, even though the latest released version of Exodus is 1.63.1.

Fake Exodus Update email

Extracting the attached archive yields an application that was apparently created just a day earlier.

Spytool’s creation date

The application contains a Mach-O binary with the filename “rtcfg”. The legitimate Exodus application, however, uses “Exodus”.

We checked out the strings and found a bunch of references to the “realtime-spy-mac[.]com” website.
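
For reference, pulling printable strings out of a Mach-O binary takes only a few lines; the sketch below mimics the classic strings utility (the "rtcfg" filename is the one from this sample, and the snippet is a generic illustration rather than the exact tooling we used):

import re
import sys

def extract_strings(path, min_len=6):
    # Return runs of printable ASCII characters, similar to the `strings` utility.
    with open(path, "rb") as f:
        data = f.read()
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

if __name__ == "__main__":
    binary = sys.argv[1] if len(sys.argv) > 1 else "rtcfg"
    for s in extract_strings(binary):
        if b"realtime-spy" in s:
            print(s.decode("ascii"))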

On the website, the developer describes the software as a cloud-based surveillance and remote spy tool. The standard offering costs $79.95 and comes with a cloud-based account where users can view the images and data that the tool uploads from the target machine. The strings extracted from the Mac binary in the spam email coincide with the features mentioned on the realtime-spy-mac[.]com website.

Strings inside the Realtime-Spy tool

Searching for similar instances of the Mac keylogger in our repository yielded other samples using these filenames:

  • taxviewer.app
  • picupdater.app
  • macbook.app
  • launchpad.app

Based on the spy tool's website, it appears to support not only Mac but also Windows. It's not the first time we've seen Windows threats target Mac. As Windows crimeware threat actors take advantage of the cryptocurrency trend, they too seem to want to expand their reach, and have thus ended up targeting Mac users as well.

Indicators of Compromise

SHA1:

  • b6f5a15d189f4f30d502f6e2a08ab61ad1377f6a – rtcfg
  • 3095c0450871f4e1d14f6b1ccaa9ce7c2eaf79d5 – Exodus-MacOS-1.64.1-update.zip
  • 04b9bae4cc2dbaedc9c73c8c93c5fafdc98983aa – picupdater.app.zip
  • c22e5bdcb5bf018c544325beaa7d312190be1030 – taxviewer.app.zip
  • d3150c6564cb134f362b48cee607a5d32a73da66 – launchpad.app.zip
  • bf54f81d7406a7cfe080b42b06b0a6892fcd2f37 – macbook.app.zip

Detection:

  • Monitor:OSX/Realtimespy.b6f5a15d18!Online

Domain:

  • realtime-spy-mac[.]com
  • update-exodus[.]io

 



Value-Driven Cybersecurity

The Constructing an Alliance for Value-driven Cybersecurity (CANVAS) project launched roughly two years ago with F-Secure as a member. The goal of the EU project is “to unify technology developers with legal and ethical scholars and social scientists to approach the challenge of how cybersecurity can be aligned with European values and fundamental rights.” (That's a mouthful, right?) Basically, Europe wants to align cybersecurity and human rights.

If you don't see the direct connection between human rights and cybersecurity, consider this: the EU's General Data Protection Regulation (GDPR) is human rights law. Everybody's data is covered by GDPR. Meanwhile, in the USA… California's legislature is working on a data privacy bill, and there's now a growing number of lobbyists fighting over how to define just what a “consumer” is. So, in the USA, data protection is not human rights law, it's consumer protection law (and there are likely to be plenty of legal loopholes). And in the end, not everybody's data will be covered.

So there you go, the EU sees cybersecurity as something that affects everybody, and the CANVAS project is part of its efforts to ensure that the rights of all are respected.

As part of the project, on May 28th & 29th of this year, a workshop was organized by F-Secure at our HQ on ethics-related challenges that cybersecurity companies and cooperating organizations face in their research and operations. Which is to say, what are the considerations that cybersecurity companies and related organizations must take into account to be upstanding citizens?

The theme made for excellent workshop material. Also, the weather was uncharacteristically cooperative (we picked May to increase the odds in our favor), the presentations were great, and the resulting discussions were lively.

Topics included:

  • Investigation of nation-state cyber operations.
  • Vulnerability disclosure and the creation of proof-of-concept code for: public awareness; incentivizing vulnerability fixing efforts; security research; penetration testing; and other purposes.
  • Control of personal devices. Backdoors and use of government sponsored “malware” as possible countermeasures to the ubiquitous use of encryption.
  • Ethics, artificial intelligence, and cybersecurity.
  • Assisting law enforcement agencies without violating privacy, a CERT viewpoint.
  • Targeted attacks and ethical choices arising due to attacker and defender operations.
  • Privacy and its assurance through data economy and encryption, balancing values with financial interests of companies.

The workshop participants included a mix of cybersecurity practitioners and representatives from policy-focused organizations. The Chatham House Rule (in particular, a no-recording policy) was applied to allow for free and open discussion.

So, in that spirit, names and talks won't be included in the text of this post. But, for those who are interested in reading more, approved bios and presentation summaries can be found in the workshop report (final draft).

Next up on the CANVAS agenda for F-Secure?

CISO Erka Koivunen will be in Switzerland next week (September 5th & 6th) at The Bern University of Applied Sciences attending a workshop on: Cybersecurity Challenges in the Government Sphere – Ethical, Legal and Technical Aspects.

Erka has worked in government in the past, so his perspective covers both sides of the fence. His presentation is titled: Serve the customer by selling… tainted goods?! Why F-Secure too will start publishing Transparency Reports.



Taking Pwnie Out On The Town

Black Hat 2018 is now over, and the winners of the Pwnie Awards have been published. The Best Client-Side Bug was awarded to Georgi Geshev and Rob Miller for their work called “The 12 Logic Bug Gifts of Christmas.”

Pwnies2018 The Pwnie Awards

Georgi and Rob work for MWR Infosecurity, which (as some of you might remember) was acquired by F-Secure earlier this year. Both MWR and F-Secure have a long history of geek culture. One thing we've done for years is to take our trophies (or “Poikas”) for a trip around the town. For an example, see this.

So, while the Pwnie is still here in Helsinki before going home to the UK, we took it around the town!

Pwnie2018 HQ

Pwnie at the F-Secure HQ.

Pwnie2018 Kanava

Pwnie at Ruoholahdenkanava.

Pwnie2018 Harbour

Pwnie at the West Harbour.

Pwnie2018 Baana

Pwnie looks at the Baana.

Pwnie2018 Museum

Pwnie and some Atari 2600s and laptops at the Helsinki computer museum.

Pwnie2018 MIG

Pwnie looking at a Mikoyan-Gurevich MiG-21 supersonic interceptor.

Pwnie2018 Tennispalatsi

Pwnie at the Art Museum.

Pwnie2018 awards

Pwnie chilling on our award shelf.

Pwnie2018 Parliament

Pwnie at the house of parliament.

Pwnie2018 Tuomiokirkko

Pwnie at the Tuomiokirkko cathedral.

Thanks for all the bugs, MWR!

Mikko

P.S. If you’re wondering about the word “Poika” we use for trophies, here’s a short documentary video that explains it in great detail.



How To Locate Domains Spoofing Campaigns (Using Google Dorks) #Midterms2018

The government accounts of US Senator Claire McCaskill (and her staff) were targeted in 2017 by APT28 A.K.A. “Fancy Bear” according to an article published by The Daily Beast on July 26th. Senator McCaskill has since confirmed the details.

And many of the subsequent (non-technical) articles that have been published have focused almost exclusively on the fact that McCaskill is running for re-election in 2018. But is it really conclusive that this hacking attempt was about the 2018 midterms? After all, Senator McCaskill is the top-ranking Democrat on the Homeland Security & Governmental Affairs Committee and also sits on the Armed Services Committee. Perhaps she and her staffers were instead targeted for insights into ongoing Senate investigations?

Senator Claire McCaskill's Committee Assignments

Because if you want to target an election campaign, you should target the candidate’s campaign server, not their government accounts. (Elected officials cannot use government accounts/resources for their personal campaigns.) In the case of Senator McCaskill, the campaign server is: clairemccaskill.com.

Which appears to be a WordPress site.

clairemccaskill.com/robots.txt

Running on an Apache server.

clairemccaskill.com Apache error log

And it has various e-mail addresses associated with it.

clairemccaskill.com email addresses

That looks interesting, right? So… let’s do some Google dorking!

Searching for “clairemccaskill.com” in URLs while discarding the actual site yielded a few pages of results.

Google dork: inurl:clairemccaskill.com -site:clairemccaskill.com

And on page two of those results, this…

clairemccaskill.com.de

Definitely suspicious.

What is com.de? It's a domain on the .de TLD (not a TLD itself).

.com.de

Okay, so… what other interesting domains associated with com.de are there to discover?

How about additional US Senators up for re-election such as Florida Senator Bill Nelson? Yep.

nelsonforsenate.com.de

Senator Bob Casey? Yep.

bobcasey.com.de

And Senator Sheldon Whitehouse? Yep.

whitehouseforsenate.com.de

But that’s not all. Democrats aren’t the only ones being spoofed.

Iowa Senate Republicans.

iowasenaterepublicans.com.de

And “Senate Conservatives“.

senateconservatives.com.de
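
For anyone who wants to repeat this kind of hunt against their own list of campaign domains, the process is easy to script. The sketch below builds the corresponding Google dork strings and checks whether a ".com.de"-style lookalike currently resolves; the domain list and the use of socket.gethostbyname are illustrative assumptions, not the tooling used in this investigation:

import socket

campaign_domains = ["clairemccaskill.com", "nelsonforsenate.com", "bobcasey.com"]

for domain in campaign_domains:
    # The same dork used above, generalized to any campaign domain.
    print("dork: inurl:%s -site:%s" % (domain, domain))
    lookalike = domain + ".de"            # e.g. clairemccaskill.com.de
    try:
        ip = socket.gethostbyname(lookalike)
        print("  %s resolves to %s" % (lookalike, ip))
    except socket.gaierror:
        print("  %s does not currently resolve" % lookalike)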

Hmm. Well, while we are no closer to knowing whether or not Senator McCaskill's government accounts were actually targeted because of the midterm elections, the domains shown above are definitely shady AF. And they are enough to give cause for concern that the 2018 midterms are indeed being targeted, by somebody.

(Our research continues.)

Meanwhile, the FBI might want to get in touch with the owners of com.de.



Video: Creating Graph Visualizations With Gephi

I wanted to create a how-to blog post about creating Gephi visualizations, but I realized it'd probably need to include, like, a thousand embedded screenshots. So I made a video instead.



Pr0nbots2: Revenge Of The Pr0nbots

A month and a half ago I posted an article in which I uncovered a series of Twitter accounts advertising adult dating (read: scam) websites. If you haven’t read it yet, I recommend taking a look at it before reading this article, since I’ll refer back to it occasionally.

To start with, let's recap. In my previous research, I used a script to recursively query Twitter accounts for specific patterns, and found just over 22,000 Twitter bots using this process. I reached that figure after querying only 3000 of the 22,000 discovered accounts, at which point I concluded my research (stopped my script). I suspect the script would have uncovered a lot more accounts had I let it run longer.

This week, I decided to re-query all the Twitter IDs I found in March, to see if anything had changed. To my surprise, I was only able to query 2895 of the original 21964 accounts, indicating that Twitter has taken action on most of those accounts.

In order to find out whether the culled accounts were deleted or suspended, I wrote a small Python script that used the requests module to query each account's URL directly. A 404 error indicated that the account had been removed or renamed, while a response indicated that the account had been suspended. Of the 19069 culled accounts checked, 18932 were suspended and 137 were deleted or renamed.

I also checked the surviving accounts in a similar manner, using requests to identify which ones were “restricted” (by checking for specific strings in the html returned from the query). Of the 2895 surviving accounts, 47 were set to restricted and the other 2848 were not.
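
A minimal sketch of that status check is shown below. The exact markers Twitter serves for suspended or restricted profiles change over time, so the strings used here are illustrative assumptions rather than the ones used in the actual script:

import requests

def account_status(screen_name):
    r = requests.get("https://twitter.com/" + screen_name)
    if r.status_code == 404:
        return "deleted or renamed"
    page = r.text.lower()
    if "account suspended" in page:        # assumed marker string
        return "suspended"
    if "temporarily restricted" in page:   # assumed marker string
        return "restricted"
    return "active"

print(account_status("some_account"))      # hypothetical handle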

As noted in my previous article, the accounts identified during my research had creation dates ranging from a few days old to over a decade in age. I checked the creation dates of both the culled set and the survivors' set (using my previously recorded data) for patterns, but I couldn't find any. Here they are, for reference:

Based on the connectivity I recorded between the original bot accounts, I’ve created a new graph visualization depicting the surviving communities. Of the 2895 survivors, only 402 presumably still belong to the communities I observed back then. The rest of the accounts were likely orphaned. Here’s a representation of what the surviving communities might look like, if the entity controlling these accounts didn’t make any changes in the meantime.

By the way, I’m using Gephi to create these graph visualizations, in case you were wondering.

Erik Ellason (@slickrockweb) contacted me recently with some evidence that the bots I’d discovered might be re-tooling. He pointed me to a handful of accounts that contained the shortened URL in a pinned tweet (instead of in the account’s description). Here’s an example profile:

Fetching a user object using the Twitter API will also return the last tweet that account published, but I’m not sure it would necessarily return the pinned Tweet. In fact, I don’t think there’s a way of identifying a pinned Tweet using the standard API. Hence, searching for these accounts by their promotional URL would be time consuming and problematic (you’d have to iterate through their tweets).

Fortunately, automating discovery of Twitter profiles similar to those Erik showed me was fairly straightforward. Like the previous botnet, the accounts could be crawled due to the fact that they follow each other. Also, all of these new accounts had text in their descriptions that followed a predictable pattern. Here are a few examples of those sentences:

look url in last post
go on link in top tweet
go at site in last post

It was trivial to construct a simple regular expression to find all such sentences:

desc_regex = "(look|go on|go at|see|check|click) (url|link|site) in (top|last) (tweet|post)"
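
A quick sanity check of that expression against the example sentences above:

import re

desc_regex = "(look|go on|go at|see|check|click) (url|link|site) in (top|last) (tweet|post)"
examples = [
    "look url in last post",
    "go on link in top tweet",
    "go at site in last post",
]
for text in examples:
    print(text, "->", bool(re.search(desc_regex, text)))
# All three sentences match.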

I modified my previous script to include the above regular expression, seeded it with the handful of accounts that Erik had provided, and let it run. After 24 hours, my new script had identified just over 20,000 accounts. Mapping the follower/following relationships between these accounts gave me the following graph:

As we zoom in, you'll notice that these accounts are far more connected than those in the older botnet. The 20,000 or so accounts identified at this point map to just over 100 separate communities. With roughly the same number of accounts, the previous botnet contained over 1000 communities.

Zooming in further shows the presence of “hubs” in each community, similar to those in the previously discovered botnet.

Given that this botnet showed a greater degree of connectivity than the previous one studied, I decided to continue my discovery script and collect more data. The discovery rate of new accounts slowed slightly after the first 24 hours, but remained steady for the rest of the time it was running. After 4 days, my script had found close to 44,000 accounts.

And eight days later, the total was just over 80,000.

Here’s another way of visualizing that data:


Here’s the size distribution of communities detected for the 80,000 node graph. Smaller community sizes may indicate places where my discovery script didn’t yet look. The largest communities contained over 1000 accounts. There may be a way of searching more efficiently for these accounts by prioritizing crawling within smaller communities, but this is something I’ve yet to explore.

I shut down my discovery script at this point, having queried just over 30,000 accounts. I’m fairly confident this rabbit hole goes a lot deeper, but it would have taken weeks to query the next 50,000 accounts, not to mention the countless more that would have been added to the list during that time.

As with the previous botnet, the creation dates of these accounts spanned over a decade.

Here’s the oldest account I found.

Using the same methodology I used to analyze the survivor accounts from the old botnet, I checked which of these new accounts were restricted by Twitter. There was an almost exactly even split between restricted and non-restricted accounts in this new set.

Given that these new bots show many similarities to the previously discovered botnet (similar avatar pictures, same URL shortening services, similar usage of the English language) we might speculate that this new set of accounts is being managed by the same entity as those older ones. If this is the case, a further hypothesis is that said entity is re-tooling based on Twitter’s action against their previous botnet (for instance, to evade automation).

Because these new accounts use a pinned Tweet to advertise their services, we can test this hypothesis by examining the creation dates of the most recent Tweet from each account. If the entity is indeed re-tooling, all of the accounts should have Tweeted fairly recently. However, a brief examination of last tweet dates for these accounts revealed a rather large distribution, tracing back as far as 2012. The distribution had a long tail, with a majority of the most recent Tweets having been published within the last year. Here’s the last year’s worth of data graphed.

Here’s the oldest Tweet I found:

This data, on its own, would refute the theory that the owner of this botnet has been recently re-tooling. However, a closer look at some of the discovered accounts reveals an interesting story. Here are a few examples.

This account took a 6 year break from Twitter, and switched language to English.

This account mentions a “url in last post” in its bio, but there isn’t one.

This account went from posting in Korean to posting in English, with a 3 year break in between. However, the newer Tweet mentions “url in bio”. Sounds vaguely familiar.

Examining the text contained in the last Tweets from these discovered accounts revealed around 76,000 unique Tweets. Searching these Tweets for links containing the URL shortening services used by the previous botnet revealed 8,200 unique Tweets. Here’s a graph of the creation dates of those particular Tweets.

As we can see, the Tweets containing shortened URLs date back only 21 days. Here’s a distribution of domains seen in those Tweets.

My current hypothesis is that the owner of the previous botnet has purchased a batch of Twitter accounts (of varying ages) and has been, at least for the last 21 days, repurposing those accounts to advertise adult dating sites using the new pinned-Tweet approach.

One final thing – I checked the 2895 survivor accounts from the previously discovered botnet to see if any had been reconfigured to use a pinned Tweet. At the time of checking, only one of those accounts had been changed.

If you’re interested in looking at the data I collected, I’ve uploaded names/ids of all discovered accounts, the follower/following mappings found between these accounts, the gephi save file for the 80,000 node graph, and a list of accounts queried by my script (in case someone would like to continue iterating through the unqueried accounts.) You can find all of that data in this github repo.



Marketing “Dirty Tinder” On Twitter

About a week ago, a Tweet I was mentioned in received a dozen or so “likes” over a very short time period (about two minutes). I happened to be on my computer at the time, and quickly took a look at the accounts that generated those likes. They all followed a similar pattern. Here’s an example of one of the accounts’ profiles:

This particular avatar was very commonly used as a profile picture in these accounts.

All of the accounts I checked contained similar phrases in their description fields. Here’s a list of common phrases I identified:

  • Check out
  • Check this
  • How do you like my site
  • How do you like me
  • You love it harshly
  • Do you like fast
  • Do you like it gently
  • Come to my site
  • Come in
  • Come on
  • Come to me
  • I want you
  • You want me
  • Your favorite
  • Waiting you
  • Waiting you at

All of the accounts also contained links to URLs in their description field that pointed to domains such as the following:

  • me2url.info
  • url4.pro
  • click2go.info
  • move2.pro
  • zen5go.pro
  • go9to.pro

It turns out these are all shortened URLs, and the service behind each of them has the exact same landing page:

“I will ban drugs, spam, porn, etc.” Yeah, right.

My colleague, Sean, checked a few of the links and found that they landed on “adult dating” sites. Using a VPN to change the browser’s exit node, he noticed that the landing pages varied slightly by region. In Finland, the links ended up on a site called “Dirty Tinder”.

Checking further, I noticed that some of the accounts either followed, or were being followed by other accounts with similar traits, so I decided to write a script to programmatically “crawl” this network, in order to see how large it is.

The script I wrote was rather simple. It was seeded with the dozen or so accounts that I originally witnessed, and was designed to iterate friends and followers for each user, looking for other accounts displaying similar traits. Whenever a new account was discovered, it was added to the query list, and the process continued. Of course, due to Twitter API rate limit restrictions, the whole crawler loop was throttled so as to not perform more queries than the API allowed for, and hence crawling the network took quite some time.
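
The sketch below illustrates the general shape of that crawler using Tweepy's 2018-era (3.x) API names. The seed handles, the trait regular expression, and the use of wait_on_rate_limit are illustrative assumptions rather than details of the original script:

import re
import time
from collections import deque

import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Rough "similar traits" check based on the description phrases listed above.
TRAIT = re.compile(r"(check out|check this|come to my site|waiting you)", re.IGNORECASE)

def looks_suspicious(user):
    return bool(user.description and TRAIT.search(user.description))

seeds = ["seed_account_1", "seed_account_2"]          # hypothetical seed handles
queue = deque(api.get_user(screen_name=s).id for s in seeds)
seen = set(queue)
edges = []

while queue:
    uid = queue.popleft()
    # Walk both followers and friends of each account in the queue.
    for relation in (api.followers, api.friends):
        for page in tweepy.Cursor(relation, user_id=uid, count=200).pages():
            for user in page:
                edges.append((user.id, uid))
                if user.id not in seen and looks_suspicious(user):
                    seen.add(user.id)
                    queue.append(user.id)
    time.sleep(1)   # extra throttling on top of Tweepy's rate-limit handling

print("discovered %d accounts, %d follow relationships" % (len(seen), len(edges)))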

My script recorded a graph of which accounts were following/followed by which other accounts. After a few hours I checked the output and discovered an interesting pattern:

Graph of follower/following relationships between identified accounts after about a day of running the discovery script.

The discovered accounts seemed to be forming independent “clusters” (through follow/friend relationships). This is not what you’d expect from a normal social interaction graph.

After running for several days the script had queried about 3000 accounts, and discovered a little over 22,000 accounts with similar traits. I stopped it there. Here’s a graph of the resulting network.

Pretty much the same pattern I’d seen after one day of crawling still existed after one week. Just a few of the clusters weren’t “flower” shaped. Here’s a few zooms of the graph.

 

Since I’d originally noticed several of these accounts liking the same tweet over a short period of time, I decided to check if the accounts in these clusters had anything in common. I started by checking this one:

Oddly enough, there were absolutely no similarities between these accounts. They were all created at very different times and all Tweeted/liked different things at different times. I checked a few other clusters and obtained similar results.

One interesting thing I found was that the accounts were created over a very long time period. Some of the accounts discovered were over eight years old. Here’s a breakdown of the account ages:

As you can see, this group has fewer new accounts in it than older ones. That big spike in the middle of the chart represents accounts that are about six years old. One reason why there are fewer new accounts in this network is that Twitter's automation seems to be able to flag behaviors or patterns in fresh accounts and automatically restrict or suspend them. In fact, while my crawler was running, many of the accounts on the graphs above were restricted or suspended.

Here are a few more breakdowns – Tweets published, likes, followers and following.

Here’s a collage of some of the profile pictures found. I modified a python script to generate this – far better than using one of those “free” collage making tools available on the Internets. 🙂

So what are these accounts doing? For the most part, it seems they’re simply trying to advertise the “adult dating” sites linked in the account profiles. They do this by liking, retweeting, and following random Twitter accounts at random times, fishing for clicks. I did find one that had been helping to sell stuff:

Individually the accounts probably don’t break any of Twitter’s terms of service. However, all of these accounts are likely controlled by a single entity. This network of accounts seems quite benign, but in theory, it could be quickly repurposed for other tasks including “Twitter marketing” (paid services to pad an account’s followers or engagement), or to amplify specific messages.

If you’re interested, I’ve saved a list of both screen_name and id_str for each discovered account here. You can also find the scraps of code I used while performing this research in that same github repo.



How To Get Twitter Follower Data Using Python And Tweepy

In January 2018, I wrote a couple of blog posts outlining some analysis I’d performed on followers of popular Finnish Twitter profiles. A few people asked that I share the tools used to perform that research. Today, I’ll share a tool similar to the one I used to conduct that research, and at the same time, illustrate how to obtain data about a Twitter account’s followers.

This tool uses Tweepy to connect to the Twitter API. In order to enumerate a target account's followers, I like to start by using Tweepy's followers_ids() function to get a list of Twitter ids of accounts that are following the target account. This call completes in a single query, and gives us a list of Twitter ids that can be saved for later use (since both screen_name and name can be changed, but the account's id never changes). Once I've obtained a list of Twitter ids, I can use Tweepy's lookup_users(user_ids=batch) to obtain Twitter User objects for each Twitter id. As far as I know, this isn't exactly the documented way of obtaining this data, but it suits my needs. /shrug

Once a full set of Twitter User objects has been obtained, we can perform analysis on it. In the following tool, I chose to look at the account age and friends_count of each account returned, print a summary, and save a summarized form of each account’s details as json, for potential further processing. Here’s the full code:

from tweepy import OAuthHandler
from tweepy import API
from collections import Counter
from datetime import datetime, date, time, timedelta
import sys
import json
import os
import io
import re
import time

# Helper functions to load and save intermediate steps
def save_json(variable, filename):
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write(unicode(json.dumps(variable, indent=4, ensure_ascii=False)))

def load_json(filename):
    ret = None
    if os.path.exists(filename):
        try:
            with io.open(filename, "r", encoding="utf-8") as f:
                ret = json.load(f)
        except:
            pass
    return ret

def try_load_or_process(filename, processor_fn, function_arg):
    load_fn = None
    save_fn = None
    if filename.endswith("json"):
        load_fn = load_json
        save_fn = save_json
    else:
        load_fn = load_bin
        save_fn = save_bin
    if os.path.exists(filename):
        print("Loading " + filename)
        return load_fn(filename)
    else:
        ret = processor_fn(function_arg)
        print("Saving " + filename)
        save_fn(ret, filename)
        return ret

# Some helper functions to convert between different time formats and perform date calculations
def twitter_time_to_object(time_string):
    twitter_format = "%a %b %d %H:%M:%S %Y"
    match_expression = "^(.+)\s(\+[0-9][0-9][0-9][0-9])\s([0-9][0-9][0-9][0-9])$"
    match = re.search(match_expression, time_string)
    if match is not None:
        first_bit = match.group(1)
        second_bit = match.group(2)
        last_bit = match.group(3)
        new_string = first_bit + " " + last_bit
        date_object = datetime.strptime(new_string, twitter_format)
        return date_object

def time_object_to_unix(time_object):
    return int(time_object.strftime("%s"))

def twitter_time_to_unix(time_string):
    return time_object_to_unix(twitter_time_to_object(time_string))

def seconds_since_twitter_time(time_string):
    input_time_unix = int(twitter_time_to_unix(time_string))
    current_time_unix = int(get_utc_unix_time())
    return current_time_unix - input_time_unix

def get_utc_unix_time():
    dts = datetime.utcnow()
    return time.mktime(dts.timetuple())

# Get a list of follower ids for the target account
def get_follower_ids(target):
    return auth_api.followers_ids(target)

# Twitter API allows us to batch query 100 accounts at a time
# So we'll create batches of 100 follower ids and gather Twitter User objects for each batch
def get_user_objects(follower_ids):
    batch_len = 100
    num_batches = len(follower_ids) / 100
    batches = (follower_ids[i:i+batch_len] for i in range(0, len(follower_ids), batch_len))
    all_data = []
    for batch_count, batch in enumerate(batches):
        sys.stdout.write("\r")
        sys.stdout.flush()
        sys.stdout.write("Fetching batch: " + str(batch_count) + "/" + str(num_batches))
        sys.stdout.flush()
        users_list = auth_api.lookup_users(user_ids=batch)
        users_json = (map(lambda t: t._json, users_list))
        all_data += users_json
    return all_data

# Creates one week length ranges and finds items that fit into those range boundaries
def make_ranges(user_data, num_ranges=20):
    range_max = 604800 * num_ranges
    range_step = range_max/num_ranges

# We create ranges and labels first and then iterate these when going through the whole list
# of user data, to speed things up
    ranges = {}
    labels = {}
    for x in range(num_ranges):
        start_range = x * range_step
        end_range = x * range_step + range_step
        label = "%02d" % x + " - " + "%02d" % (x+1) + " weeks"
        labels[label] = []
        ranges[label] = {}
        ranges[label]["start"] = start_range
        ranges[label]["end"] = end_range
    for user in user_data:
        if "created_at" in user:
            account_age = seconds_since_twitter_time(user["created_at"])
            for label, timestamps in ranges.iteritems():
                if account_age > timestamps["start"] and account_age < timestamps["end"]:
                    entry = {} 
                    id_str = user["id_str"] 
                    entry[id_str] = {} 
                    fields = ["screen_name", "name", "created_at", "friends_count", "followers_count", "favourites_count", "statuses_count"] 
                    for f in fields: 
                        if f in user: 
                            entry[id_str][f] = user[f] 
                    labels[label].append(entry) 
    return labels

if __name__ == "__main__": 
    account_list = [] 
    if (len(sys.argv) > 1):
        account_list = sys.argv[1:]

    if len(account_list) < 1:
        print("No parameters supplied. Exiting.")
        sys.exit(0)

    consumer_key=""
    consumer_secret=""
    access_token=""
    access_token_secret=""

    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth_api = API(auth)

    for target in account_list:
        print("Processing target: " + target)

# Get a list of Twitter ids for followers of target account and save it
        filename = target + "_follower_ids.json"
        follower_ids = try_load_or_process(filename, get_follower_ids, target)

# Fetch Twitter User objects from each Twitter id found and save the data
        filename = target + "_followers.json"
        user_objects = try_load_or_process(filename, get_user_objects, follower_ids)
        total_objects = len(user_objects)

# Record a few details about each account that falls between specified age ranges
        ranges = make_ranges(user_objects)
        filename = target + "_ranges.json"
        save_json(ranges, filename)

# Print a few summaries
        print
        print("\t\tFollower age ranges")
        print("\t\t===================")
        total = 0
        following_counter = Counter()
        for label, entries in sorted(ranges.iteritems()):
            print("\t\t" + str(len(entries)) + " accounts were created within " + label)
            total += len(entries)
            for entry in entries:
                for id_str, values in entry.iteritems():
                    if "friends_count" in values:
                        following_counter[values["friends_count"]] += 1
        print("\t\tTotal: " + str(total) + "/" + str(total_objects))
        print
        print("\t\tMost common friends counts")
        print("\t\t==========================")
        total = 0
        for num, count in following_counter.most_common(20):
            total += count
            print("\t\t" + str(count) + " accounts are following " + str(num) + " accounts")
        print("\t\tTotal: " + str(total) + "/" + str(total_objects))
        print
        print

Let’s run this tool against a few accounts and see what results we get. First up: @realDonaldTrump


Age ranges of new accounts following @realDonaldTrump

As we can see, over 80% of @realDonaldTrump’s last 5000 followers are very new accounts (less than 20 weeks old), with a majority of those being under a week old. Here’s the top friends_count values of those accounts:


Most common friends_count values seen amongst the new accounts following @realDonaldTrump

No obvious pattern is present in this data.

Next up, an account I looked at in a previous blog post – @niinisto (the president of Finland).

Age ranges of new accounts following @niinisto

Many of @niinisto's last 5000 followers are new Twitter accounts, though not in as large a proportion as in the @realDonaldTrump case. In both of the above cases this is to be expected, since both accounts are recommended to new users of Twitter. Let's look at the friends_count values for the above set.

Most common friends_count values seen amongst the new accounts following @niinisto

In some cases, clicking through the creation of a new Twitter account (next, next, next, finish) will create an account that follows 21 Twitter profiles. This can explain the high proportion of accounts in this list with a friends_count value of 21. However, we might expect to see the same (or an even stronger) pattern with the @realDonaldTrump account, and we don't. I'm not sure why this is the case, but it could be that Twitter has some automation in place to auto-delete programmatically created accounts. If you look at the output of my script, you'll see that between fetching the list of Twitter ids for the last 5000 followers of @realDonaldTrump and fetching the full Twitter User objects for those ids, 3 accounts “went missing” (and hence the tool only collected data for 4997 accounts).

Finally, just for good measure, I ran the tool against my own account (@r0zetta).

Age ranges of new accounts following @r0zetta

Here you see a distribution that’s probably common for non-celebrity Twitter accounts. Not many of my followers have new accounts. What’s more, there’s absolutely no pattern in the friends_count values of these accounts:

Most common friends_count values seen amongst the new accounts following @r0zetta

Of course, there are plenty of other interesting analyses that can be performed on the data collected by this tool. Once the script has been run, all data is saved on disk as json files, so you can process it to your heart’s content without having to run additional queries against Twitter’s servers. As usual, have fun extending this tool to your own needs, and if you’re interested in reading some of my other guides or analyses, here’s full list of those articles.
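
As a small example of such post-processing, the snippet below loads the cached follower objects written by the tool (the "<target>_followers.json" file, here assuming the @r0zetta run) and tallies the most common friends_count values without making any further API calls:

import json
from collections import Counter

with open("r0zetta_followers.json", "r") as f:
    followers = json.load(f)

counts = Counter(user.get("friends_count", 0) for user in followers)
for value, count in counts.most_common(10):
    print("%d accounts are following %d accounts" % (count, value))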


