Processing Quote Tweets With Twitter API

I’ve been writing scripts to process Twitter streaming data via the Twitter API. One of those scripts looks for patterns in metadata and associations between accounts, as streaming data arrives. The script processes retweets, and I decided to add functionality to also process quote Tweets.

Retweets “echo” the original by embedding a copy of the Tweet in a field called retweeted_status:

twitter_API_retweeted_status

Twitter’s API reference entry for retweeted_status

According to Twitter’s own API documentation, a quote Tweet should work in a similar way. (A quote Tweet is like wrapping your tweet around somebody else’s.) A Tweet object containing the quoted Tweet should be available in the quoted_status field.

twitter_API_quoted_status

Twitter’s API reference entry for quoted_status

I wrote some code to fetch and process quoted_status in a similar way to how I was already processing retweeted_status, but it didn't work. I "asked" Google for answers, but didn't really find anything, so I decided to dig into what the API was actually returning in the quoted_status field.

It turns out it’s not a Tweet object. Here’s what a quoted_status field actually looks like:

{u'contributors': None, 
 u'truncated': False, 
 u'text': u'', 
 u'is_quote_status': False, 
 u'in_reply_to_status_id': None, 
 u'id': 0, 
 u'favorite_count': 0, 
 u'source': u'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 
 u'retweeted': False, 
 u'coordinates': None, 
 u'entities': {u'user_mentions': [], 
               u'symbols': [], 
               u'hashtags': [], 
               u'urls': []}, 
 u'in_reply_to_screen_name': None, 
 u'id_str': u'', 
 u'retweet_count': 0, 
 u'in_reply_to_user_id': None, 
 u'favorited': False, 
 u'user': {u'follow_request_sent': None, 
           u'profile_use_background_image': True, 
           u'default_profile_image': False, 
           u'id': 0, 
           u'verified': True, 
           u'profile_image_url_https': u'https://pbs.twimg.com/profile_images/', 
           u'profile_sidebar_fill_color': u'FFFFFF', 
           u'profile_text_color': u'FFFFFF', 
           u'followers_count': 0, 
           u'profile_sidebar_border_color': u'FFFFFF', 
           u'id_str': u'0', 
           u'profile_background_color': u'FFFFFF', 
           u'listed_count': 0, 
           u'profile_background_image_url_https': u'https://abs.twimg.com/images/', 
           u'utc_offset': -18000, 
           u'statuses_count': 0, 
           u'description': u"", 
           u'friends_count': 0, 
           u'location': None, 
           u'profile_link_color': u'FFFFFF', 
           u'profile_image_url': u'http://pbs.twimg.com/profile_images/', 
           u'following': None, 
           u'geo_enabled': True, 
           u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/', 
           u'profile_background_image_url': u'http://abs.twimg.com/images/', 
           u'name': u'', 
           u'lang': u'en', 
           u'profile_background_tile': False, 
           u'favourites_count': 0, 
           u'screen_name': u'', 
           u'notifications': None, 
           u'url': None, 
           u'created_at': u'Fri Nov 27 23:14:06 +0000 2009', 
           u'contributors_enabled': False, 
           u'time_zone': u'', 
           u'protected': False, 
           u'default_profile': True, 
           u'is_translator': False}, 
 u'geo': None, 
 u'in_reply_to_user_id_str': None, 
 u'lang': u'en', 
 u'created_at': u'Thu Jun 22 00:33:13 +0000 2017', 
 u'filter_level': u'low', 
 u'in_reply_to_status_id_str': None, 
 u'place': None}

So, it's a data structure that contains some of the information you might find in a Tweet object. But it's not an actual Tweet object. Kinda makes sense if you think about it. A quote Tweet can quote other quote Tweets, which can quote other quote Tweets. (Some folks created rather long quote Tweet chains when the feature was first introduced.) So, if the API returned a fully-hydrated Tweet object for a quoted Tweet, that object could contain another Tweet object in its own quoted_status field, and so on, and so on.

Here’s a small piece of python code that looks for retweets and quote Tweets in a stream and retrieves the screen_name of the user who published the original Tweet, if it finds one. It illustrates the differences between handling retweets and quote Tweets.

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy import API

consumer_key="add your own key here"
consumer_secret="add your own secret here"
access_token="add your own token here"
access_token_secret="add your own secret here"

class StdOutListener(StreamListener):
    def on_status(self, status):
        screen_name = status.user.screen_name

        if hasattr(status, 'retweeted_status'):
            # retweeted_status comes through as a fully-hydrated Tweet object, so attribute access works
            retweet = status.retweeted_status
            if hasattr(retweet, 'user'):
                if retweet.user is not None:
                    if hasattr(retweet.user, "screen_name"):
                        if retweet.user.screen_name is not None:
                            retweet_screen_name = retweet.user.screen_name
                            print screen_name + " retweeted " + retweet_screen_name

        if hasattr(status, 'quoted_status'):
            # quoted_status is a plain dictionary rather than a Tweet object, so we use key access
            quote_tweet = status.quoted_status
            if 'user' in quote_tweet:
                if quote_tweet['user'] is not None:
                    if "screen_name" in quote_tweet['user']:
                        if quote_tweet['user']['screen_name'] is not None:
                            quote_tweet_screen_name = quote_tweet['user']['screen_name']
                            print screen_name + " quote tweeted " + quote_tweet_screen_name
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth_api = API(auth)
    print "Signing in as: "+auth_api.me().name
    print "Preparing stream"

    stream = Stream(auth, l, timeout=30.0)
    searches = ['donald', 'trump', ]
    while True:
        if 'searches' in locals():
            print "Filtering on: " + str(searches)
            stream.filter(track=searches)
        else:
            print "Getting 1% sample"
            stream.sample()


Super Awesome Fuzzing, Part One

An informative guide on using AFL and libFuzzer.

Posted on behalf of Atte Kettunen (Software Security Expert) & Eero Kurimo (Lead Software Engineer) – Security Research and Technologies.


The point of security software is to make a system more secure. When developing software, one definitely doesn’t want to introduce new points of failure, or to increase the attack surface of the system the software is running on. So naturally, we take secure coding practices and software quality seriously. One good example of how we strive to improve software quality and security at F-Secure is our Vulnerability Reward Program that’s been running for almost two years now. And it’s still running, so do participate if you have a chance! Earlier this year, we posted an article detailing what we learned during the first year. It goes without saying that we have many processes in-house to catch potential bugs and vulnerabilities in our software. In this article, we’d like to explain one of the many processes we use in-house to find vulnerabilities before they reach our customers, and our dear bug bounty hunters.

One method for bug hunting that has proven to be very effective is a technique called fuzzing, where the target program is injected with unexpected or malformed data, in order to reveal input handling errors leading to, for example, an exploitable memory corruption. To create fuzz test cases, a typical fuzzer will either mutate existing sample inputs, or generate test cases based on a defined grammar or ruleset. An even more effective way of fuzzing is coverage guided fuzzing, where program execution paths are used to guide the generation of more effective input data for test cases. Coverage guided fuzzing tries to maximize the code coverage of a program, such that every code branch present in the program is tested. With the emergence of open source coverage guided fuzzing tools such as American Fuzzy Lop (AFL), LLVM libFuzzer, and HonggFuzz, using coverage guided fuzzing has never been easier or more approachable. You no longer need to master arcane arts, spend countless hours writing test case generator rules, or collect input samples that cover all functionality of the target. In the simplest cases you can just compile your existing tool with a different compiler, or isolate the functionality you want to fuzz, write just a few lines of code, and then compile and run the fuzzer. The fuzzer will execute thousands or even tens of thousands of test cases per second, and collect a set of interesting results from triggered behaviors in the target.

If you want to get started with coverage guided fuzzing yourself, here are a couple of examples showing how you'd fuzz libxml2, a widely used XML parsing and toolkit library, with two fuzzers we prefer in-house: AFL and LLVM libFuzzer.

Fuzzing with AFL

Using AFL for a real world example is straightforward. On Ubuntu 16.04 Linux you can get fuzzing libxml2, via its xmllint utility, with just seven commands.

First we install AFL and get the source code of libxml2-utils.

$ apt-get install -y afl
$ apt-get source libxml2-utils

Next we configure libxml2 build to use AFL compilers and compile the xmllint utility.

$ cd libxml2/
$ ./configure CC=afl-gcc CXX=afl-g++
$ make xmllint

Lastly, we create a sample file with the content "<a></a>" for AFL to start with, and run afl-fuzz.

$ echo "" > in/sample
$ LD_LIBRARY_PATH=./.libs/ afl-fuzz -i ./in -o ./out -- ./.libs/lt-xmllint -o /dev/null @@

AFL will continue fuzzing indefinitely, writing inputs that trigger new code coverage to ./out/queue/, crash-triggering inputs to ./out/crashes/, and inputs causing hangs to ./out/hangs/. For more information on how to interpret AFL's status screen, see: http://lcamtuf.coredump.cx/afl/status_screen.txt

Fuzzing with LLVM libFuzzer

Let’s now fuzz libxml2 with the LLVM libFuzzer. To start fuzzing, you’ll first need to introduce a target function, LLVMFuzzerTestOneInput, that receives the fuzzed input buffer from libFuzzer. The code looks like this.

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
 DoSomethingInterestingWithMyAPI(Data, Size);
 return 0; // Non-zero return values are reserved for future use.
 }

For fuzzing libxml2, Google’s fuzzer test suite provides a good example fuzzing function.

// Copyright 2016 Google Inc. All Rights Reserved.
// Licensed under the Apache License, Version 2.0 (the "License");
#include <stdint.h>
#include <stddef.h>
#include "libxml/xmlversion.h"
#include "libxml/parser.h"
#include "libxml/HTMLparser.h"
#include "libxml/tree.h"

// Silence libxml2's error output so parse errors don't spam the console during fuzzing.
void ignore (void * ctx, const char * msg, ...) {}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  xmlSetGenericErrorFunc(NULL, &ignore);
  if (auto doc = xmlReadMemory(reinterpret_cast<const char *>(data), size, "noname.xml", NULL, 0))
    xmlFreeDoc(doc);
  return 0;
}

Before compiling our target function, we need to compile all dependencies with clang and -fsanitize-coverage=trace-pc-guard, to enable SanitizerCoverage coverage tracing. It is a good idea to also use -fsanitize=address,undefined in order to enable both the AddressSanitizer (ASAN) and the UndefinedBehaviorSanitizer (UBSAN), which catch many bugs that otherwise might be hard to find.

$ git clone https://github.com/GNOME/libxml2 libxml2
$ cd libxml2
$ FUZZ_CXXFLAGS="-O2 -fno-omit-frame-pointer -g -fsanitize=address,undefined -fsanitize-coverage=trace-pc-guard"
$ ./autogen.sh
$ CXX="clang++-5.0 $FUZZ_CXXFLAGS" CC="clang-5.0 $FUZZ_CXXFLAGS" CCLD="clang++-5.0 $FUZZ_CXXFLAGS" ./configure
$ make

As of writing this post, libFuzzer is not shipped with precompiled clang-5.0 packages of http://apt.llvm.org/, so you’ll still need to checkout and compile libFuzzer.a yourself as documented in http://llvm.org/docs/LibFuzzer.html#getting-started, but this might change in the near future.

The second step is to compile our target function, with the same flags, and link it with both the libFuzzer runtime and the libxml2 we compiled earlier.

$ clang++-5.0 -std=c++11 $FUZZ_CXXFLAGS -lFuzzer ./libxml-test.cc -I ./include ./.libs/libxml2.a -lz -llzma -o libxml-fuzzer

Now we are ready to run our fuzzer.

$ mkdir ./output
$ ./libxml-fuzzer ./output/

We didn’t use any sample inputs, so libFuzzer starts by generating random data in order to find inputs that trigger new code paths in our libxml2 target function. All inputs that trigger new coverage are stored as sample files in ./output. As libFuzzer runs in-process, if a bug is found, it saves the test case and exits. On a high-end laptop, a single instance of libFuzzer reached over 5000 executions per second, slowing down to around 2000 once it started to generate test cases with more coverage. For more information on how to interpret the output see: http://llvm.org/docs/LibFuzzer.html#output

Creating a corpus

If your target is fast, meaning hundreds or even thousands of executions per second, you can try generating a base corpus out of thin air. With coverage guided fuzzing it is possible to do this even with more complex formats like the AFL author Michał Zalewski did with JPEG-files, but to save time, you should get a good representation of typical files for the application that are as small as possible. The smaller the files, the faster they are to fuzz.

AFL does not give you any additional flags to tinker with when generating a corpus out of thin air. Just give it a small sample input, for example "<a></a>" as an XML sample, and run AFL like you normally would.

With libFuzzer you have more flags to experiment with. For example, for XML you might want to try with -only_ascii=1. One good technique for most formats is to execute multiple short runs while incrementing the maximum sample size of the fuzzer on each round, and then merge all the results to form the output corpus.

$ mkdir ./temp-corpus-dir ./corpus
$ for foo in 4 8 16 32 64 128 256 512; do \
./libxml-fuzzer -max_len=$foo -runs=500000 ./temp-corpus-dir; \
done
$ ./libxml-fuzzer -merge=1 ./corpus ./temp-corpus-dir

With this approach, we first collect interesting inputs with a maximum length of 4 bytes; the second run analyses the 4 byte inputs and uses those as a base for 8 byte inputs, and so on. This way we discover "easy" coverage with faster, smaller inputs, and when we move to larger files we have a better initial set to start with.

To get some numbers for this technique, we did three runs with the example script.

On average, running the corpus generation script took about 18 minutes on our laptop. LibFuzzer was still frequently discovering new coverage at the end of iterations where -max_len was larger than 8 bytes, which suggests that, for those lengths, libFuzzer should be allowed to run longer.

For comparison, we also ran libFuzzer with default settings for three rounds, which likewise took about 18 minutes.

$ ./libxml-fuzzer -max_total_time=1080 ./temp-corpus-dir
$ ./libxml-fuzzer -merge=1 ./corpus ./temp-corpus-dir;

From these results we see that our runs with the corpus generation script on average executed more test cases and generated a larger set of files, triggering more coverage and features than the set generated with the default values. This is due to the size of test cases generated by libFuzzer using default settings. Previously libFuzzer used a default -max_len of 64 bytes, but at the time of writing libFuzzer was just updated to have a default -max_len of 4096 bytes. In practice, sample sets generated by this script have been good working starting points for fuzzing, but we haven't collected data on how the results differ from the default settings in long, continuous fuzzing.

Corpus generation out of thin air is an impressive feat, but if we compare these results to the coverage from the W3C XML test suite, we see that it is a good idea to also include sample files from different sources in your initial corpus, as you'll get much better coverage before you've even fuzzed the target.

$ wget https://www.w3.org/XML/Test/xmlts20130923.tar.gz -O - | tar -xz
$ ./libxml-fuzzer -merge=1 ./samples ./xmlconf
$ ./libxml-fuzzer -runs=0 ./samples
  #950        DONE   cov: 18067 ft: 74369 corp: 934/2283Kb exec/s: 950 rss: 215Mb

Merging our generated corpus into the W3C test suite increased the block coverage to 18727, so not that much, but we still got a total of 83972 features, increasing the total throughput of these test cases. Both improvements are most probably due to small samples triggering error conditions that were not covered by the W3C test suite.

Trimming your corpus

After fuzzing the target for a while, you’ll end up with a huge set of fuzzed files. A lot of these files are unnecessary, and trimming them to a much smaller set will provide you with the same code coverage of the target. To achieve this, both projects provide corpus minimization tools.

AFL gives you the afl-cmin shell script that you can use to minimize your corpus. For the previous example, to minimize the corpus generated in the ./out directory, you can generate a minimized set of files into the ./output_corpus directory.

$ afl-cmin -i ./out/queue -o ./output_corpus -- ./.libs/lt-xmllint -o /dev/null @@

AFL also offers another tool, afl-tmin, that can be used to minimize individual files while maintaining the same coverage as observed initially. Be aware that running afl-tmin on a large set of files can take a very long time, so first do a couple of iterations with afl-cmin before trying afl-tmin.

LibFuzzer doesn’t have an external trimming tool – it has the corpus minimization feature, called merge, built-in.

$ ./libxml-fuzzer -merge=1 <output directory> <input directory 1> <input directory 2> ... <input directory n>

LibFuzzer merge is a little easier to use since it looks for files recursively from any number of input directories. Another nice feature in libFuzzer merge is the -max_len flag. Using -max_len=X, libFuzzer will only use the first X bytes from each sample file, so you can collect random samples without caring about their sizes. Without the max_len flag, libFuzzer uses a default maximum length of 1048576 bytes when doing a merge.

With libFuzzer merge, you can use the same technique as you did to generate a corpus out of thin air.

$ for foo in 4 8 16 32 64 128 256 512 1024; do
mkdir ./corpus-max_len-$foo;
./libxml-fuzzer -merge=1 -max_len=$foo ./corpus-max_len-$foo ./corpus-max_len-* <input-directories>;
done
$ mkdir output_corpus;
$ ./libxml-fuzzer -merge=1 ./output_corpus ./corpus-max_len-*;

With this trimming strategy, libFuzzer will first collect new coverage-triggering 4 byte chunks from each input sample, then merge those samples into 8 byte chunks, and so on, until you have the optimized set out of all the different length chunks.

A simple merge won't always help you with performance issues. Sometimes your fuzzer can stumble upon very slow code paths, causing the collected samples to start dragging down your fuzzing throughput. If you don't mind sacrificing a few code blocks for performance, libFuzzer can easily be used to remove overly slow samples from your corpus. When libFuzzer is run with a list of files as an argument instead of a folder, it will execute every file individually and print out the execution time for each file.

$ ./libxml-fuzzer ./corpus-dir/*
INFO: Seed: 3825257193
INFO: Loaded 1 modules (237370 guards): [0x13b3460, 0x149b148), 
./libxml2/libxml-fuzzer: Running 1098 inputs 1 time(s) each.
Running: ./corpus-dir/002ade626996b33f24cb808f9a948919799a45da
Executed ./corpus-dir/002ade626996b33f24cb808f9a948919799a45da in 1 ms
Running: ./corpus-dir/0068e3beeeaecd7917793a4de2251ffb978ef133
Executed ./corpus-dir/0068e3beeeaecd7917793a4de2251ffb978ef133 in 0 ms

With a snippet of awk, this feature can be used to print out the names of files that took too long to run, in our example more than 100 milliseconds, and then we can just remove those files.

$ ./libxml-fuzzer ./corpus-dir/* 2>&1 | awk '$1 == "Executed" && $4 > 100 {print $2}' | xargs -r -I '{}' rm '{}'

Running both fuzzers in parallel

Now that you have a good base corpus, and you know how to maintain it, you can kick off some continuous fuzzing runs. You could run your favorite fuzzer alone, or run both fuzzers separately, but if you've got enough hardware available you can also easily run multiple fuzzers simultaneously on the same corpus. That way you get to combine the best of both worlds, while the fuzzers share all the new coverage they find.

It’s easy to implement a simple script that will run both fuzzers simultaneously, while restarting the fuzzers every hour to refresh their sample corpus.

$ mkdir libfuzzer-output; echo "" > ./libfuzzer-output/1
$ while true; do \
afl-fuzz -d -i ./libfuzzer-output/ -o ./afl-output/ -- ./libxml/afl-output/bin/xmllint -o /dev/null @@ 1>/dev/null & \
./libxml/libxml-fuzzer -max_total_time=3600 ./libfuzzer-output/; \
pkill -15 afl-fuzz; \
sleep 1; \
mkdir ./libfuzzer-merge; \
./libxml/libxml-fuzzer -merge=1 ./libfuzzer-merge ./libfuzzer-output/ ./afl-output/; \
rm -rf ./afl-output ./libfuzzer-output; \
mv ./libfuzzer-merge ./libfuzzer-output; \
done

Because the example script only runs one hour per iteration, AFL is used in "quick & dirty mode" to skip all the deterministic steps. Even one large file can cause AFL to spend hours, or even days, on deterministic steps, so it's more reliable to run AFL without them when running on a time budget. Deterministic steps can be run manually, or automatically on another instance that copies new samples to ./libfuzzer-output.

Dictionaries

You have your corpus, and you’re happily fuzzing and trimming. Where do you go from here?

Both AFL and libFuzzer support user-provided dictionaries. These dictionaries should contain keywords, or other interesting byte patterns, that would be hard for the fuzzer to determine. For some useful examples, take a look at Google libFuzzer’s XML dictionary and this AFL blog post about dictionaries.

Since these tools are quite popular nowadays, some good base dictionaries can already be found online. For example, Google has collected quite a few dictionaries: https://chromium.googlesource.com/chromium/src/+/master/testing/libfuzzer/fuzzers/dicts. Also, the AFL source code contains a few example dictionaries. If you don't have the source code, you can check out an AFL mirror on GitHub: https://github.com/rc0r/afl-fuzz/tree/master/dictionaries

Both AFL and libFuzzer also collect dictionary entries during execution. AFL collects them when performing deterministic fuzzing steps, while libFuzzer derives them from its coverage instrumentation as the run progresses.

When running libFuzzer with a time or test case limit, libFuzzer will output a recommended dictionary upon exit. This feature can be used to collect interesting dictionary entries, but it is recommended to do manual sanity checks on all automatically collected entries. libFuzzer builds those dictionary entries as it discovers new coverage, so the entries often build up towards the final keyword.

"ISO-"
"ISO-1"
"ISO-10"
"ISO-1064"
"ISO-10646-"
"ISO-10646-UCS-2"
"ISO-10646-UCS-4"

We tested dictionaries with three 10 minute runs: without a dictionary, with the recommended dictionary from the first run, and with Google's libFuzzer XML dictionary. Results can be seen in the table below.

Surprisingly, there was no significant difference between the results from the run without a dictionary and the run with the recommended dictionary from the first run, but with a "real" dictionary there was a dramatic change in the amount of coverage discovered during the run.

Dictionaries can really change the effectiveness of fuzzing, at least on short runs, so they are worth the investment. Shortcuts, like the libFuzzer recommended dictionary, can help, but you still need to do the extra manual effort to leverage the potential in dictionaries.

Fuzzing experiment

Our goal was to do a weekend-long run on a couple of laptops. We ran two instances each of AFL and libFuzzer, fuzzing the above example. The first instance was started without any corpus, and the second one with the trimmed corpus from the W3C XML Test Suite. The results could then be compared by performing a dry run of the minimized corpus from all four sets. Results from these fuzzers are not directly comparable, since both fuzzers use different instrumentation to detect executed code paths and features. libFuzzer measures two things for assessing new sample coverage: block coverage, that is, isolated blocks of code visited, and feature coverage, that is, a combination of different code path features such as transitions between code blocks and hit counts. AFL doesn't offer a direct count for the observed coverage, but we use the overall coverage map density in our comparisons. The map density indicates how many branch tuples we have hit, in proportion to how many tuples the coverage map can hold.

Our first run didn't go quite as expected. After 2 days and 7 hours we were reminded about the downsides of using deterministic fuzzing on large files. Our afl-cmin minimized corpus contained a couple of over-100kB samples that caused AFL to slow down to a crawl after processing just under 38% of the first round. It would have taken days for AFL to get through a single file, and we had four of those in our sample set, so we decided to restart the instances after removing all samples over 10kB. Sadly, on Sunday night at 11PM, "backup first" wasn't the first thing on our minds, and the AFL plot data was accidentally overwritten, so no cool plots from the first round. We managed to save the AFL UI before aborting.

Full results of our 2 day fuzzing campaign can be found in the image/table below.

We had actually never tried to pit these fuzzers against each other before. Both fuzzers were surprisingly even in our experiment. Starting from the W3C samples, the difference in discovered coverage, as measured by libFuzzer, was only 1.4%. Also, both fuzzers found pretty much the same coverage. When we merged all the collected files from the four runs, and the original W3C samples, the combined coverage was only 1.5% higher than the coverage discovered by libFuzzer alone. Another notable thing is that without initial samples, even after 2 days, neither libFuzzer nor AFL had discovered more coverage than our earlier corpus-out-of-thin-air demonstration achieved repeatedly in 10 minutes.

We also generated a chart of coverage discovery during the libFuzzer fuzzing run with the W3C samples.

Which one should I use?

As we detailed, AFL is really simple to use, and can be started with virtually no setup. AFL takes care of handling found crashes and the like. However, if you don't have a ready-made command line tool like xmllint, and would need to write some code to enable fuzzing, it often makes sense to use libFuzzer for its superior performance.

In comparison to AFL, libFuzzer has built-in support for sanitizers, such as AddressSanitizer and UndefinedBehaviorSanitizer, which help in finding subtle bugs during fuzzing. AFL has some support for sanitizers, but depending on your target there might be some serious side effects. The AFL documentation suggests running the fuzzing without sanitizers and then running the output queue separately against a sanitizer-enabled build, but there is no actual data available to determine whether that technique can catch the same issues as ASAN-enabled fuzzing. For more info about AFL and ASAN you can check docs/notes_for_asan.txt from the AFL sources.

In many cases, however, it makes sense to run both fuzzers, as their fuzzing, crash detection, and coverage strategies are slightly different.

If you end up using libFuzzer, you really should check out Google's great libFuzzer tutorial.

Happy fuzzing!
Atte & Eero



TrickBot Goes Nordic… Once In A While

We’ve been monitoring the banking trojan TrickBot since its appearance last summer.

During the past few months, the malware underwent several internal changes and improvements, such as more generic info-stealing, support for Microsoft Edge, and encryption/randomization techniques to make analysis and detection more difficult. Unlike the very fast expansion of the list of targeted banks during the first few months of activity, that number has remained rather constant since then… until two weeks ago.

Initially we saw PayPal appearing in the configuration, the first and only financial transaction website victimized by TrickBot so far that is not a traditional bank. A surprising development, but apparently just a little taste of what was coming next. Last Wednesday, we observed a change in the list of targeted banks which is probably the largest expansion in TrickBot's history thus far.

Those familiar with TrickBot will meanwhile know that the trojan features two different MitB injection techniques, similar to those seen in the Dyre trojan: "Static Injection" to replace login pages with rogue ones, and "Dynamic Injection" to redirect browser requests to the C&C. Both injection configurations now contain banks located in at least 9 countries that were not part of TrickBot's rather questionable list of victims before.

In the Dynamic Injection list, the following French banks were added:

  • allianzbanque.fr
  • banque-*.fr
  • banquedelareunion.fr
  • banquepopulaire.fr
  • barclays.fr
  • ca-*.fr
  • cic.fr
  • cm-cic-bail.com
  • creatis.fr
  • credit-agricole.fr
  • credit-du-nord.fr
  • creditmutuel.fr
  • lcl.fr
  • palatine.fr
  • smc.fr
  • tarneaud.fr

And one bank located in Bahrain:

  • bank-abc.com

The Static Injection list suddenly tripled from 109 bank login URLs to a whopping 333, and these are not only added entries – the list is in fact entirely different. A closer look reveals that everything in Australia, New Zealand, Singapore, India, and Canada disappeared – the only leftovers are banks from the UK and Ireland. Instead, new countries include Switzerland, France, Lithuania, the Netherlands, and Luxembourg, but particularly interesting for us as a Finnish company are the 40 new Nordic banks. These are the targeted Finnish domains:

  • aktia.fi
  • danskebank.fi
  • nordea.fi
  • op-pohjola.fi
  • op.fi
  • osuuspankki.fi
  • pohjola.fi
  • poppankki.fi
  • s-pankki.fi
  • saastopankki.fi
  • seb.fi

Sweden:

  • carnegie.se
  • catella.se
  • dnb.se
  • ekobanken.se
  • expressfaktura.se
  • folksam.se
  • forex.se
  • marginalen.se
  • maxm.se
  • nb.se
  • nordea.se
  • penser.se
  • plusgirot.se
  • resurs.se
  • seb.se
  • sparbankenoresund.se
  • volvofinans.se

Norway:

  • banknorwegian.no
  • danskebank.no
  • netfonds.no
  • nettkonto.no
  • nordea.no
  • remember.no
  • sbm.no
  • seb.no
  • skandiabanken.no
  • storebrand.no

Denmark:

  • danskebank.dk
  • nordea.dk

The complete Static Injection configuration can be found here: https://gist.github.com/hexlax/e93f4b0ccbf54cea55b2084121b1b863

The Static Injection technique replaces the actual login page with a rogue version created by the attackers. Here are a few examples – left is the original page, right is the TrickBot version.

There are only some very subtle differences: the Chrome icon on the upper right indicating that some elements on the page are not from a secure source, the slightly different date format at the bottom of the Nordea page… not exactly things that an average user pays attention to.

But just when you thought that the TrickBot authors had provided enough surprises… nothing could be further from the truth. Last Friday, all new entries in the Static Injection list had disappeared again, which basically reverted the list to its previous state of 109 URLs. And the story is not over. Yesterday evening, another new version popped up, this time with 235 URLs, about 100 fewer than before. Several UK banks that were added last week didn't make it into the new list, but all the Nordic banks did. In other words, TrickBot's attack on the Nordic banks started last Wednesday, but was suspended over the weekend.

So why that rollback on Friday? Was the updated configuration a mistake by the authors? A test? Could the C&C not handle the sudden rise in traffic? Or perhaps they just wanted an easy weekend? We can only guess, but it will be interesting to see which tricks this bot has in store.

By the way, these recent changes in the configuration are not a coincidence. New malware versions are often accompanied by a campaign – this time was no different. On Wednesday we observed large spam campaigns delivering TrickBot, which can be seen in the graph below. The spam was spread using the Necurs botnet, which is also quite remarkable as we have seen it only distributing a very limited number of malware families, such as Dridex and Jaff.

TrickBot Linegraph

Graph, and screenshots below, courtesy of Päivi.

Again, the emails have a rather generic subject, but enough to attract the victim’s attention. A few examples of the spam content.

Opening the attached document eventually leads to launching a script which downloads the TrickBot binary, an infection chain we also found in recent campaigns delivering the ransomware Jaff. Since we already had detections in place for these documents, customers of our security products were protected.



OSINT For Fun And Profit: Hung Parliament Edition

The 2017 UK general election just concluded, with the Conservatives gaining the most votes out of all political parties. But they didn’t win enough seats to secure a majority. The result is a hung parliament.

Both the Labour and Conservative parties gained voters compared to the previous general election. Some of those wins came from defecting UKIP supporters. The rest, most of which went to Labour, came from young voters. And that was definitely reflected in social media.

The #VoteLabour hashtag was immensely popular in the lead-up to the elections.

#VoteLabour and #VoteConservative hashtags in the two weeks leading up to the 2017 UK election.

#VoteLabour continued to make a strong appearance during the week of the election and increased significantly as election day approached.

#VoteLabour and #VoteConservative hashtags during the election week.

The #VoteLabour hashtag completely overshadowed all other party hashtags all the way until polls closed.

Party hashtags on election day 2017.

On the day of the election, the #voted hashtag trended. Of those that tweeted the hashtag (in conjunction with election-related hashtags such as #GE2017), a majority of tweets referenced the Labour party. Here are the numbers when the polls closed (recorded during the day of the election).

Labour = 530
Conservative = 50
Libdems = 44
SNP = 111
UKIP = 19

Did we see any obvious external interference in the 2017 UK elections? Nope.

The top URLs shared over the two weeks leading up to the election included the following:

  • BBC’s Election Coverage website (3 links)
  • A number of skwawkbox.org pages, including the following headlines:
    • MAY REPORTED TO POLICE FOR ABBOTT COMMENT ELECTORAL BREACH #GE17 #BBCQT
    • BBC STILL MISREPORTING KUENSSBERG’S KNOWN-FALSE #CORBYN #SHOOTTOKILL REPORT #GE17
    • EXPLOSIVE: RUDD TRIES TO CENSOR ELECTION OPPONENT TO HIDE SAUDI TERROR ALLEGATIONS #GE17
    • 9 TORIES OF 120 #BBCQT AUDIENCE ASK 29% OF QUESTIONS. STILL #CORBYNWINS #GE17
  • Labour party campaign site (http://www.electiondaypledge.co.uk/)
  • A YouTube video about Tory NHS cuts
  • A guide for tactically voting against the Tories (https://voteforeurope.website)

Most of the popular URLs shared on Twitter were supportive of the Labour party, a reflection of Labour’s strong social media campaign. These findings support the fact that young voter turnout was, across the board, higher than in previous elections. Labour-run campaigns encouraging young people to vote were clearly successful, and in some constituencies, the youth vote actually changed the outcome.

Non-authoritative opinion-piece articles made up less than 10% of all URLs shared during the same time period. Notable examples included:

  • Sputnik: “Labour’s Poll Surge Has Establishment ‘Pundits’ in a Flap” (pro-Labour)
  • RT: “Tories ‘gagged’ us to prevent criticism of Theresa May, charities claim” (pro-Labour)
  • Daily Express: “Corbyn ready to hit homes with new garden tax which could TREBLE average council tax bills” (pro-Conservative)
  • RT: “BBC presenter confesses broadcaster ignores complaints of bias” (pro-Conservative)
  • RT: “Tory record on terrorism ‘very weak, deeply worrying,’ security expert tells RT” (pro-Labour)
  • RT: “Revealed! Big money bankrolling Tory campaign linked to claims of fraud, tax dodging” (pro-Labour)

Although the headlines look sensational, they’re nothing compared to what politically-oriented UK tabloids (such as The Sun and The Mirror) usually print.

In general, “non-authoritative” articles linked in Twitter weren’t politically biased towards one particular party. This is in stark contrast to the French presidential elections, where the majority of URLs shared on Twitter pointed to anti-Macron articles.

Articles from "US alt-right" sources (such as Breitbart) that dominated Twitter during the French elections were notably absent this time around.

I couldn't find any popular hashtags exhibiting bot-like behavior. An insignificant number of the top Twitter posters during the past two weeks were from outside the UK. Those Twitter users who did post at regular intervals were news agencies and self-confessed bots designed to Tweet on regular schedules.

No blaming “outside interference” for this election outcome.



Why Is Somebody Creating An Army Of Twitter Bots?

There’s been some speculation this week regarding Donald Trump’s Twitter account. Why? Because its follower count “dramatically” increased (according to reports) due to a bunch of bots. Since Twitter analytics are my thing at the moment, I decided to do some digging.

Sean examined some of Trump’s new followers and found they had something in common. They aren’t just following Donald Trump, they’re following lots of popular accounts.

Popular person, Barack Obama

So, I wrote and ran a script that queried Twitter for the last 5,000 accounts to follow the “top 100” Twitter accounts (Twitter accounts with the highest number of followers). The output of that script was a list of roughly 200,000 unique accounts.

Of those 200,000, over 20,000 accounts follow 5 or more of the top 100 Twitter accounts. Roughly 8,000 of those 20,000 accounts were created on the 1st of June 2017, have a default profile, no profile picture, and haven’t Tweeted.
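
For reference, here's a rough sketch of how that kind of query and filtering might look with tweepy. This isn't the exact script I ran; the top_100 list, the thresholds, and the output handling below are simplified assumptions for illustration.

import tweepy

# Placeholder credentials; substitute your own.
auth = tweepy.OAuthHandler("consumer_key", "consumer_secret")
auth.set_access_token("access_token", "access_token_secret")
api = tweepy.API(auth, wait_on_rate_limit=True)

top_100 = ["BarackObama", "katyperry", "justinbieber"]  # hypothetical subset of the top 100

followed_counts = {}  # screen_name -> number of top accounts it follows
suspicious = set()    # empty-looking accounts created on 1st June 2017

for name in top_100:
    # Walk the most recent followers of each popular account (up to 5,000 per account).
    for follower in tweepy.Cursor(api.followers, screen_name=name, count=200).items(5000):
        followed_counts[follower.screen_name] = followed_counts.get(follower.screen_name, 0) + 1
        if (follower.default_profile and follower.default_profile_image
                and follower.statuses_count == 0
                and follower.created_at.strftime("%Y-%m-%d") == "2017-06-01"):
            suspicious.add(follower.screen_name)

overlap = [n for n, c in followed_counts.items() if c >= 5]
print("accounts following 5 or more of the top accounts: %d" % len(overlap))
print("of those, empty accounts created on 2017-06-01: %d" % len(suspicious & set(overlap)))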

947 of those accounts follow @realDonaldTrump.

Over 2,000, or roughly a quarter, of the above 8,000 accounts follow exactly 21 Twitter users (436 of those follow @realDonaldTrump).

New Twitter Bots

My scripts harvested tons of this… stuff in just a few hours.

What do these accounts have in common?

  • Many of the accounts are named using Arabic or Chinese characters.
  • Most of the accounts have no followers. Those accounts that are being followed have picked up p0rnbots that automatically follow new Twitter accounts. The p0rnbot accounts don’t appear to be affiliated with the group creating these new Twitter accounts.
  • Some of the accounts are “themed”. For instance I came across a few that were following NASA and a number of science-related Twitter accounts. I found others following mostly celebrities or musicians. I also found African and Indian themes (accounts following politicians/groups in those regions).
  • Checking back in at a later time, I noticed that these accounts are slowly being “evolved” to look more “natural” (by liking Tweets and adding followers/followed).

Apparently somebody’s real busy cultivating a huge number of Twitter accounts at this very moment. As to why they’re doing it, we can only speculate.

  • The accounts will be sold off at a later date.
  • They’re being prepared for use by follower-boosting services.
  • They’re being cultivated for later “political” use.

Whatever the reason, this stuff isn’t being done in a very stealthy manner. And creating new Twitter accounts is easily automated. (I just created a new account using a Gmail alias; account_name+alias@gmail.com.)

I assume the folks at Twitter must see this activity. And I’m just wondering why they’re not doing anything about it. Creating accounts doesn’t even require a CAPTCHA.

P.S. – As an added bonus for those who like numbers p0rn, I checked which of the 200,000 unique accounts followed at least 10 of the top 100 accounts. It turns out roughly 7,000 of them do. Of these 7,000, around 3,000 were created on 1st June 2017, have a default profile, no profile picture, and haven’t Tweeted. 367 of those users follow @realDonaldTrump.

About 1,000 of those accounts follow exactly 21 other Twitter accounts. 160 of those follow @realDonaldTrump.



Now Hiring: Developers, Researchers, Data Scientists

We're hiring right now, and if you check out our careers page, you'll find over 30 new positions ranging from marketing (meh) to malware analysis (woot!). A select number of these new positions are in F-Secure Labs. If you're on the lookout for a job in cyber security, you might find one of these positions interesting.

Our Cloud Platforms team builds and maintains a lot of the back end infrastructure used by Labs. They also build the systems that help power our breach detection technologies, and they’re looking for a couple of developers to add to that effort. Below are two open positions on that team. Both are located here in beautiful and temperate Helsinki.

Our Threat Protection team is in charge of researching and reverse engineering threats, and designing new and interesting ways to thwart them. They’re looking for researchers familiar with Windows as well as Linux, Android, and macOS. If you’re interested in this sort of stuff, you’ll find two open positions (also in Helsinki) below.

And then for those of you who feel that Helsinki is just too far south, we have a position available in the lovely city of Oulu. Our AML (Android, Mac, and Linux) Security Core team designs and builds anti-malware technologies for non-Windows platforms. This is a junior position, and a great way to get your foot in the door of the cyber security field.

Finally, you might have noticed several data scientist positions listed on our recruitment pages. We’re heavily bolstering our capabilities in the field of machine learning and data science, and we’ve formed a whole new department just for it. Matti Aksela, the head of that department, recently penned an article about what we’re doing in the field of AI. He’s also hiring.

Of course, as I mentioned above, we’re looking for great people to fill a whole range of open positions. And if you read this blog, you’re probably exactly the sort of person we’re looking for. So don’t delay – head on over and apply now!



WannaCry, Party Like It’s 2003

Let’s take a moment to collect what we know about WannaCry (W32/WCry) and what we can learn from it.

When looked at from a technical perspective, WCry (in its two binary components) has the following properties.

  • Comprised of two Windows binaries.
    • mssecsvc.exe: a worm that handles spreading and drops the payload.
    • tasksche.exe: a ransomware trojan that is dropped by the worm.
  • The Windows binaries had some variants, but in limited number and no polymorphism was used.
  • Spreads over SMB port 445 using an exploit for the MS17-010 vulnerability that was publicly available at exploit-db[.]com.
    • Originally the exploit was sourced from NSA tools leaked by a group called “Shadow Brokers”.
    • The MS17-010 vulnerability was in all versions of Windows up to Windows Server 2016, so this attack affected more than just Windows XP. But XP/Server 2003 had no public updates available prior to May 14th.
  • Includes "kill switch" functionality: it checks www[.]iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea[.]com, a domain which the WannaCry author(s) had not registered, and which @malwaretechblog was thus able to register to halt its spread.
    • Some "hex-edit idiot" variants of WCry have been seen; people modifying the original malware with a file editor in hopes of disabling the kill switch or otherwise changing its functionality.
  • Scans local network and internet for new hosts to infect.
  • Despite the early reports of distribution over email, no-one has been able to confirm any other infection vector than SMB.

All in all, writing the above makes me feel like it's 2003 rather than 2017. In a perfect world, this malware outbreak should not have been able to happen. And the fact that the outbreak wasn't even worse is thanks to the diligence of IT admins everywhere applying patches and keeping up firewall configurations. Without their work the outbreak would have been far worse. For example, a lowball estimate of the number of computers infected by the W32/Blaster worm was 8 million, and the real figure could have been as high as 16 million.

With the exception of the ransomware payload, the worm is very similar to the W32/Blaster worm from 2003, which attacked a vulnerability in RPC/DCOM rather than SMB. All in all, the attackers were not exactly super hackers. It is rather obvious that the attackers did not know what they were dealing with when they created the worm; they just used an exploit they found, and were not expecting this kind of massive distribution and attention. It feels like somebody using a sledgehammer for a fly swatter. It is very likely that the attackers are running for the hills right now, as law enforcement around the world are definitely going to coordinate to hunt them down.

The answer to why WCry's outbreak was able to happen is most likely the same as why e-mail based attacks first died back in 2008-2010 and are now again a prevalent vector. Security systems that do not get challenged are not seen as critical, and thus tend to atrophy. Major internet and local network worms have not been a problem for several years, and thus organizations have neglected firewall configuration maintenance. Also, host firewall configuration is often done lazily: SMB port 445 is needed outbound from workstations to the file server, and administrators often allow it to be bi-directional just in case.

The initial run of WCry is now on the decline, but the vulnerable systems remain, so it is important to reflect back on the measures that killed past network worms over time.

And the most important thing that killed network worms was host firewall configuration done according to recommended best practices.

Which, briefly put, are…

  • Workstations are not supposed to accept inbound traffic, except from administration workstations and the domain controller.
  • Servers are supposed to be servers and not to make outbound traffic to workstations or other servers, except where needed.

This means that workstations should have inbound ports 135, 137, 138, and 445 blocked from everything but sources that are supposed to use those services for maintenance purposes. Servers obviously need to have open the ports they need for providing their services, but even where inbound traffic is allowed, outbound traffic should be blocked.

With this kind of configuration, even if a host is infected with a network worm, it is unable to infect other workstations, and even if it manages to infect a server, that server cannot pass the infection back to other workstations. This configuration also makes it difficult for an APT attacker to do lateral movement, especially if you block the Windows Remote Management ports 5985 and 5986 from anything but administration workstations.

Of course there are special cases, such as certain hospital MRI machines which run Windows XP that cannot be patched and which act as SMB servers for access to the MRI images. And as these systems cannot be touched, it is critical to make sure that every system that is allowed to connect to such a resource is well protected. If all systems that can connect to such an MRI device are protected by their own firewalls, they cannot be infected by WCry or other copy-cat attacks, and thus cannot pass the infection on to a device that cannot be protected.



WCry: Knowns And Unknowns

WCry, WannaCry, Wana Decrypt0r. I’m sure at this point you’ve heard something about what the industry has dubbed the largest crypto ransomware outbreak in history. Following its debut yesterday afternoon, a lot of facts have been flying around. Here’s what we know, and don’t know.

WCry has currently made a measly $25,000

The spread of WCry was slowed by the actions of an “accidental hero” who registered a “killswitch” domain name he found in the code.

But, it only takes a small edit of that code, and a re-release to get the thing spreading like wildfire again.

It’s been featured in many public places, such as a train station in Frankfurt…

…in high street stores…

…and in academia.

It is reportedly super-easy to reverse engineer.

Microsoft has released a patch for Windows XP because of this malware…

…to the relief of many…

…including the guys running the Trident program.

Even Microsoft haven’t figured out the initial entry vector.

In case you were wondering, yes, F-Secure’s products block the WCry ransomware trojan. In fact, we block multiple mechanisms in the infection vector. Here are the WCry-associated detection names our systems have reported so far:

Gen:Variant.Graftor.374377
Trojan.GenericKD.5054801
Gen:Variant.Graftor.369176
Application:W32/Generic.e889544aff!Online
Gen:Variant.Ransom.WannaCryptor.1
Trojan.Ransom.WannaCryptor.A
Gen:Trojan.Heur.RP.JtW@aePsbmpi
Trojan.GenericKD.5057843
Application:W32/Generic.5ff465afaa!Online
Suspicious:W32/Malware.c5e6c97e27!Online
Application:W32/Generic.47a9ad4125!Online
Trojan.Ransom.WannaCryptor.D
Gen:Trojan.Heur.RP.JtW@aePsbmp
Trojan.GenericKD.5057554
Suspicious:W32/Malware.e889544aff!Online
Suspicious:W32/Malware.5ff465afaa!Online
Suspicious:W32/Malware.51e4307093!Online
Application:W32/Generic.e3712f9d19!Online

Here’s where we’ve been blocking it.

As a final note, the usual advice still applies. Patch your systems. Don’t run XP. And don’t click “enable content”.

You can also check out our other blog post about this outbreak.

Update: Here’s a link to our threat description.



OSINT For Fun And Profit: #Presidentielle2017 Edition

As I mentioned in a previous post, I’m writing scripts designed to analyze patterns in Twitter streams. One of the goals of my research is to follow Twitter activity around a newsworthy event, such as an election. For example, last weekend France went to the polls to vote for a new president. And so I tuned the parameters of my scripts to see what I could find.

The script in question receives a stream of Tweets based on a list of search parameters. Here are the parameters I gave it:

['macron', 'lepen', 'presidentielle2017', 'presidentielles2017', 'MarineLePen', 'Marine2017', 'ElectionPresidentielle2017', 'enmarche', 'aunomdupeuple', 'jevote', 'emmanuelmacron', 'choisirlafrance', 'MLP', 'debat2017', 'debatpresidentiel', 'jevotepour']

I kicked the script off on the afternoon of Friday May 5th, just before 14:00 French time, and terminated it at 22:00 on Sunday May 7th, a few hours after the election results had been called. The script received a stream of Twitter status objects matching the search terms above. The number of Tweets per hour varied from about 18,000 (in the middle of the night, French time) to as many as 79,000 (in the last few hours before the polls closed). Processing involved extracting metadata such as tweet language, hashtags, URLs, and mentions to a set of output files.
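
For illustration, the extraction step can be as simple as the sketch below. This is a trimmed-down stand-in for my actual script; the output file name and the exact fields written are assumptions.

import json

from tweepy.streaming import StreamListener

class MetadataListener(StreamListener):
    # Minimal sketch: pull language, hashtags, URLs and mentions out of each status.
    def on_status(self, status):
        record = {
            "created_at": str(status.created_at),
            "lang": status.lang,
            "hashtags": [h["text"].lower() for h in status.entities.get("hashtags", [])],
            "urls": [u["expanded_url"] for u in status.entities.get("urls", [])],
            "mentions": [m["screen_name"] for m in status.entities.get("user_mentions", [])],
        }
        # One JSON object per line, appended to an output file (name is illustrative).
        with open("stream-metadata.jsonl", "a") as out:
            out.write(json.dumps(record) + "\n")
        return True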

Quite quickly after starting the script, it became apparent that there were a fair number of URLs pointing to English language political opinion pieces being shared on the stream. As the weekend went on, it was obvious that a majority of them were positive about Le Pen and negative about Macron. Here are some examples of the sort of headlines that were being shared:

  • "BREAKING: WikiLeaks confirms leaked Macron emails authentic"
  • “BREAKING: Macron emails lead to allegations of drug use, homosexual adventurism and Rothschild money”
  • “Betting Markets Flip to Marine Le Pen in Final Hours Before Election”
  • “French Police Defy Their Unions to Vote For Le Pen”
  • “BOMBSHELL REPORT : Email Leak Shows Macron on Gay Lifestyle Mailing List”

One article, whose headline read "Macron Whistleblower Dies Under Suspicious Circumstances", insinuated that a member of the Macron campaign had been assassinated using a "heart-attack gun". Here's a quote from that story:

“Intelligence agencies have been using ‘heart-attack gun’ technology for years, according to a Congressional testimony video filmed in 1975. Could it be that Corinne Erhel was the victim of such technology?”

Right. Anyway, moving on…

Regardless of the configured search terms, my scripts tend to always pick up a fair number of URLs pointing to non-authoritative opinion pieces. This stuff is usually "background noise", but last weekend, the volume had definitely been turned up. It wasn't until late Sunday evening that stories in French, by French publications, started to show up in the URL feed.

Since I was monitoring data about the French elections, I figured it would be interesting to see how many Tweets were in French as opposed to English. On the whole, there were more Tweets flagged as ‘fr’ by Twitter than those flagged as ‘en’. One particular moment during the weekend caught my eye, though. Have a look at this graph that depicts Tweets by language between the afternoon of Saturday May 6th and the afternoon of Sunday May 7th.

Tweets by language Saturday 6th May 14:00 - Sunday 7th May 20:00

The orange line is clearly what we'd expect – after midnight on the 6th of May, the number of Tweets in French starts to drop off as people presumably went to sleep. That number then picks up again on the morning of Sunday May 7th, as people began their day. The blue line shows Tweets in English, which spiked at 01:00 French time. I don't know what caused this spike, but the time zone lines up with early evening on the American continent.

Interesting patterns were also observed with regards to hashtags. When I started the script up, and for the first few hours, top hashtags included #Macron, #LePen and #Presidentielle2017. Later in the evening of Friday May 5th, the #MacronGate hashtag started showing up. DFR Lab wrote a great article explaining the mechanisms behind this phenomenon. I highly recommend reading it. (tl;dr Bots!) The data I collected also points to patterns indicating the use of automation to push this hashtag. For instance, take a look at the following graph.

Selected hashtags per hour 03:00-11:00 Sunday May 7th 2017

The above graph shows the number of times my script saw one of the four hashtags during each hour between 03:00 and 11:00 French time on May 7th, 2017. What you'll notice is that the #Macron, #LePen, and #Presidentielle2017 hashtags were low-volume during the night (again, as expected, since everyone was probably asleep), and picked up as folks woke up. However, the #MacronLeaks hashtag maintained a fairly steady volume across this entire time-slice. In fact, the #MacronLeaks hashtag remained at the same steady volume all the way from its introduction on Friday evening until the election results were called. It then dropped like a stone to less than 5% of its previous volume during that hour, as the bot infrastructure was shut off.
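
Producing those per-hour counts from the collected metadata is straightforward. Here's a small sketch of the idea, assuming each record carries a timestamp and a list of lowercase hashtags (the field layout is an assumption, matching the metadata extraction described earlier):

from collections import Counter

def hourly_hashtag_counts(records, tracked=("macron", "lepen", "presidentielle2017", "macronleaks")):
    # records: iterable of (created_at, hashtags) pairs, where created_at is a
    # datetime and hashtags is a list of lowercase strings.
    counts = Counter()
    for created_at, hashtags in records:
        hour = created_at.strftime("%Y-%m-%d %H:00")
        for tag in hashtags:
            if tag in tracked:
                counts[(hour, tag)] += 1
    return counts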

Both the URLs and the #MacronLeaks hashtag were predominantly shared by "American Alt-Right" Twitter accounts. In some cases, these accounts even tweeted/retweeted in French. At the end of the whole weekend, the most shared URL was a link to a YouTube video entitled "The Truth About Macron". Next was the pastebin page containing links to the stolen Macron data. Seven out of the ten top shared URLs were links to non-authoritative news sources. Luckily, DFR Lab's article made it into sixth position.

While the above analysis looks to be pretty doom and gloom, things really aren’t as bad as you might think. A vast majority of Twitter users probably wouldn’t have noticed the URL and hashtag flooding going on at all. Why? Well, performing a search in Twitter provides “Top” results by default, which ranks Tweets using an algorithm. And that algorithm appears to filter by some sort of quality (that tends to separate the wheat from the chaff). All that spamming by bot accounts going on in the background doesn’t appear to register. The same also goes for the “News” tab and the list of top 10 trending hashtags. The only place you’ll readily see the background noise is in the “Latest” tab.

So, if all that noise no longer generates much signal, why create it in the first place? The answer lies in the fact that the press and the media do put in the effort to dig into raw data looking for a story to run. When they find this otherwise “hidden” data, they run with it. In effect, the press end up doing the bots’ job for them.

The French presidential election was an ideal moment for me to refine the scripts I’ve been writing to find usage patterns associated with “active measures” around upcoming elections and world events. The UK general election is just a few weeks away, so I’ll get to see how well my changes work. I’m sure I’ll have something interesting to report after that event!



Unicode Phishing Domains Rediscovered

There is a variant of phishing attack that is currently receiving a lot of attention in the security community. It’s called an IDN homograph attack, and it takes advantage of the fact that many Unicode characters look alike. The use of Unicode in domain names makes it easier to spoof websites, as the visual representation of an internationalized domain name in a web browser may be indistinguishable from that of the legitimate site. For example, Unicode character U+0117, Latin small letter E with dot above, looks similar to the ASCII Latin small letter E. Hence it is possible to register a domain such as labsblog.xn--f-secur-z8a.com, which is equivalent to labsblog.f-securė.com.
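Python’s built-in idna codec makes that equivalence easy to see. A quick sketch (note that the built-in codec implements the older IDNA 2003 rules, so results can differ from what registrars actually accept):

    # Encode an internationalized domain name into its Punycode (ACE) form
    # using Python's built-in "idna" codec.
    unicode_domain = "labsblog.f-securė.com"
    ascii_domain = unicode_domain.encode("idna").decode("ascii")
    print(ascii_domain)  # expected: labsblog.xn--f-secur-z8a.com

    # ...and decode it back to the Unicode form.
    print(ascii_domain.encode("ascii").decode("idna"))  # labsblog.f-securė.com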

This topic has already been thoroughly discussed. Security researchers have been warning about it for over a decade, but it has only relatively recently gained wider attention – including from the bad guys. To trace this dangerous trend, we’re going to use a combination of the DNS reconnaissance tool dnstwist (which I created some time ago) and some command-line kung fu to gather and analyze all the information we find.

Grabbing the data

We will start by pulling the list of the most popular websites worldwide published by Alexa Internet. This seems to be a good representative group, since the sites at the very top should be tempting targets for phishing attacks.

The ZIP file contains a million domain names, so we’ll narrow that down to a more manageable scope of 100. That gives us something that looks like this.

Grabbing Alexa Top100
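For illustration, a minimal Python sketch of that step might look like the following. The Alexa top-1m URL below is the one that was publicly distributed at the time and should be treated as an assumption, as is the top100.txt output file name:

    import csv
    import io
    import urllib.request
    import zipfile
    from itertools import islice

    # The Alexa top-1m list as it was distributed at the time (an assumption).
    ALEXA_URL = "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"

    # Download the ZIP archive and read the CSV inside it.
    raw = urllib.request.urlopen(ALEXA_URL).read()
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        with zf.open("top-1m.csv") as f:
            reader = csv.reader(io.TextIOWrapper(f, encoding="utf-8"))
            # Each row is "rank,domain"; keep the first 100 domains.
            top100 = [row[1] for row in islice(reader, 100)]

    # Save one domain per line for later use as dnstwist seeds.
    with open("top100.txt", "w") as out:
        out.write("\n".join(top100) + "\n")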

Finding Unicode phishing domains

We will use dnstwist, which provides a convenient way to generate domain name variations using a range of techniques, including Unicode homograph attacks. The idea is quite straightforward: the tool takes the previously prepared list of 100 domains as a seed, generates a list of potential phishing domains, and then queries WHOIS servers for registration dates.

dnstwist data extracted
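A sketch of that loop, driving dnstwist from Python and writing one CSV per seed domain. The dnstwist flag names below are assumptions on my part and have changed between versions, so check dnstwist --help before relying on them:

    import subprocess

    # Seed domains prepared earlier, one per line.
    with open("top100.txt") as f:
        seeds = [line.strip() for line in f if line.strip()]

    for domain in seeds:
        # One CSV of candidate domains (with WHOIS data) per seed domain.
        with open(domain + ".csv", "w") as out:
            subprocess.run(
                # Flag names are assumptions; verify against your dnstwist version.
                ["dnstwist", "--registered", "--whois", "--format", "csv", domain],
                stdout=out,
                check=False,
            )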

An hour later, we have 100 files named after the corresponding domain names. Since we’re focusing on Unicode domains, we need to filter the results down to domain names which, when encoded with Punycode, start with the xn-- prefix. The data is comma-delimited, so we cut out the column with the registration date. Finally, we group it by month and count the number of occurrences in order to plot a nice graph.
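A hedged sketch of that post-processing step follows. The CSV column names ("domain", "whois_created") are my assumptions about dnstwist’s header and may need adjusting to match the output of your version:

    import csv
    import glob
    from collections import Counter

    # month ("YYYY-MM") -> number of registered Unicode (xn--) domains
    per_month = Counter()

    for path in glob.glob("*.csv"):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                # Keep only internationalized (Punycode) domains.
                if not row.get("domain", "").startswith("xn--"):
                    continue
                created = row.get("whois_created", "")
                if created:
                    per_month[created[:7]] += 1  # e.g. "2017-04"

    for month in sorted(per_month):
        print(month, per_month[month])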

Conclusions

The data collected clearly shows that attackers have been using Unicode-based domains for a long time.

Unicode phishing domains by month

The top three phishing targets are Google, Facebook and Amazon.

Unicode phishing domains by target

Because the life span of a phishing domain is rather short, and because we lack data from a wider period, it is difficult to demonstrate a clear upward trend. However, given the recent interest in the subject, it can be assumed that attacks of this nature will occur more often.

Side note

While conducting this research, we inadvertently discovered a domain running an active phishing site that seems to target Facebook users in China. We have notified Facebook’s security team about this incident.

Facebook .cn phish



F-Secure XFENCE (Little Flocker)

I use Macs both at home and at work, and as a nerd, I enjoy using interesting stand-alone tools and apps to keep my environment secure. Some of my favorites are knockknock, ransomwhere?, and taskexplorer, from the objective-see website. I’ve also recently been playing around with (and enjoying) Monitor.app from FireEye. When I heard that […]

2017-04-25

Ransomware Timeline: 2010 – 2017

I’ve seen numerous compliments for this graphic by Micke, so… here’s a high-res version. Enjoy! Source: State of Cyber Security 2017

2017-04-18

The Callisto Group

We’ve published a White Paper today titled: The Callisto Group. And who/what is the Callisto Group? A good question; here’s the paper’s summary. Heavy use of spear phishing, and malicious attachments sent via legitimate, but compromised, email accounts. Don’t click “OK”.

2017-04-13

OSINT For Fun & Profit: @realDonaldTrump Edition

I’ve just started experimenting with Tweepy to write a series of scripts attempting to identify Twitter bots and sockpuppet rings. It’s been a while since I last played around with this kind of stuff, so I decided to start by writing a couple of small test scripts. In order to properly test it, I needed to point […]

2017-04-10

“Cloud Hopper” Example Of Upstream Attack

There’s news today of a BAE/PWC report detailing a Chinese-based hacking group campaign dubbed “Operation Cloud Hopper”. Chinese Group Is Hacking Cloud Providers to Reach Into Secure Enterprise Networks https://t.co/Le4E4Se2Hc pic.twitter.com/adpDyWYa6C — News from the Lab (@FSLabs) April 5, 2017 This operation is what’s known as an upstream attack, a method of compromise that we […]

2017-04-05

Massive Dridex Spam Runs, Targeting UK

Yesterday, between 9:00 and midnight GMT, we observed three massive malware spam runs. Their magnitude clearly stood out from the average daily volume of spam with attachments. The campaigns were largely sent to accounts with email addresses in the .co.uk TLD. The first run, with subject lines such as “Your Booking 938721” (numbers vary), started at […]

2017-03-31

Real-Time Location Sharing Redux

Google announced on Wednesday that it will soon add real-time location sharing to Google Maps. The feature set appears to be very reminiscent of Google Latitude, which was introduced (way back) in 2009. Location sharing will undoubtedly be a popular option for many, but, it may come with OPSEC considerations for others. Here’s what I wrote about […]

2017-03-23

It’s Not New To Us

A Turkish hacking group is reportedly attempting to extort Apple over a compromised cache of iCloud account data. This activity is on the heels of last week’s Turkish-related Twitter account hacks via a service called Twitter Counter. And that brings to mind this article (by Andy)… OVER THE PAST FEW YEARS, you’ve probably heard […]

2017-03-22

FAQ Related To CIA WikiLeaks Docs

We’ve been asked numerous questions about WikiLeaks’ March 7th CIA document dump. Did the news surprise you? No. Spies spy. And that spies use hacking tools… is expected. (“Q” does cyber these days.) Does this mean that the CIA will have to start over and rebuild a completely new set of tools? Does it need […]

2017-03-09

Apple, Google, And The CIA

Apple and Google have issued statements to the media regarding WikiLeaks’ March 7th publication of CIA documents. Here’s Apple’s statement via BuzzFeed News. According to Apple, its “products and software are designed to quickly get security updates” to its customers. So, just how well does that statement hold up to what we see in-the-wild? Well, […]

2017-03-09