Mitigations Against Adversarial Attacks

This is the fourth and final article in a series of four articles on the work we’ve been doing for the European Union’s Horizon 2020 project codenamed SHERPA. Each of the articles in this series contain excerpts from a publication entitled “Security Issues, Dangers And Implications Of Smart Systems”. For more information about the project, the publication itself, and this series of articles, check out the intro post here.

This article explores currently proposed methods for hardening machine learning systems against adversarial attacks.


Most machine learning models ‘in the wild’ at present are trained without regard to possible adversarial inputs. As noted in the previous article in this series, the methods required to attack machine learning models are fairly straightforward, and work in a multitude of scenarios. Research into mitigation against commonly proposed attacks on machine learning models has proceeded hand-in-hand with studies on performing those attacks. Naturally, a lot more thinking has gone into understanding how to defend systems that are under attack on a daily basis compared to those being attacked in purely academic settings.

Adversarial attacks against machine learning models are hard to defend against because there are very many ways for attackers to force models into producing incorrect outputs. Most of the time, machine learning models work very well on a small subset of all possible inputs they might encounter. As models become more complex, and must partition between more possible inputs, hardening against potential attacks becomes more difficult. Unfortunately, most of the mitigations that have been proposed to date are not adaptive, and they are only able to close a small subset of all potential vulnerabilities.

From the implementation point of view, a machine learning model itself can be prone to the types of bugs attackers leverage in order to gain system access (such as buffer overflows), just like any other computer program. However, the task of hardening a machine learning model extends beyond the task of hardening a traditionally developed application. Penetration testing processes (such as fuzzing – the technique of providing invalid, unexpected, or random inputs into a computer program) and thorough code reviewing are commonly used to identify vulnerabilities in traditionally developed applications. The process of hardening a machine learning model additionally involves identifying inputs that cause the model to produce incorrect verdicts, such as false positives, false negatives, or incorrect policy decisions, and identifying whether confidential data can be extracted from the model.


Adversarial training

One proposed method for mitigating adversarial attacks is to create adversarial samples and include them in the training set. This approach allows a model to be trained to withstand common adversarial sample creation techniques. Unfortunately, there are plenty of other adversarial samples that can be created that still fool a model created in this way, and hence adversarial training itself only provides resilience against the simplest of attack methods.

Adversarial training is a natural accompaniment to data augmentation – the process of modifying samples in a training set in order to improve the generalization and robustness of a model. For instance, when training an image classifier, data augmentation is achieved by flipping, cropping, and adjusting the brightness of each input sample, and adding these to the training set.


Gradient masking

Gradient masking is a method designed to create models that are resistant to white box probing for decision boundaries, typically in neural network-based models. Mapping a target model’s decision boundaries involves crafting new input samples based on the gradient observed across outputs from previously crafted samples. Gradient masking hampers this process by creating sharper decision boundaries as illustrated below.

Commonly referenced techniques for masking gradients include defensive distillation and defensive dropout. Defensive distillation is a process whereby a second model is created from the output of one or more initially trained models. The second model is trained on modified Softmax output values of the first model (as opposed to the hard labels that were used to train the initial model). Dropout – the process of randomly disabling a portion of the model’s cells – is a method commonly used during model training as a regularization technique to encourage models to generalize better. Defensive dropout applies the dropout technique during the model inference phase. Stochastic activation pruning is another gradient masking technique similar to defensive dropout. We note that gradient masking techniques do not make a model resistant to adversarial samples in general.

Detecting and cleaning adversarial inputs

A machine learning model can be shielded from adversarial inputs by placing safeguard mechanisms between the public interface to the model’s input and the model itself. These mechanisms detect and clean adversarial perturbations in the raw input, prior to it reaching the model. Detection and cleaning can be performed in separate steps, or as part of a single step.

Generative Adversarial Networks (GANs) are a type of machine learning model designed to generate images, or other types of data. Training a GAN involves training two neural network models simultaneously. One model, the generator, attempts to generate samples (e.g. images) from random noise. A second model, the discriminator, is fed both real samples and the samples created by the generator model. The discriminator decides which samples are real, and which are fake. As training proceeds, the generator gets better at fooling the discriminator, while the discriminator gets better at figuring out which samples are real or fake. At the end of training, the generator model will be able to accurately generate samples (for instance, convincing high-resolution photographs), and the trained discriminator model will be able to accurately detect the difference between real and fake inputs. Thus, discriminator models can be used to detect adversarial perturbations.

Suggested cleaning methods include using the output of the GAN generator model as the input to the target model, using a separate mechanism to generate an image similar to the original input, or modifying the input image to remove perturbations.

This two- (or three-) step process can actually be accomplished using a single step. Introspective neural networks are classifiers that contain built-in discriminator and generator components as part of a single end-to-end model. Trained models can be used both as a generator, and a classifier, and are resistant to adversarial inputs due to the presence of the discriminator component.

Another proposed solution, replaces the ‘detect and clean’ approach with a simple sanitization step that normalizes inputs prior to their reaching the safeguarded model.

Differential privacy

Differential privacy is a general statistical technique that aims to provide means to maximize the accuracy of query responses from statistical databases while measuring (and thereby hopefully minimizing) the privacy impact on individuals whose information is in the database. It is one proposed method for mitigating against confidentiality attacks.

One method for implementing differential privacy with machine learning models is to train a series of models against separate, unique portions of training data. At inference time, the input data is fed into each of the trained models, and a small amount of random noise is added to each model’s output. The resulting values become ‘votes’, the highest of which becomes the output. A detailed description of differential privacy, and why it works, can be found here.


One possible implementation of differential privacy for machine learning models. Source:

Differential privacy is a hot topic at the moment, and online services such as OpenMined have sprung up to facilitate the generation of privacy-protected models based on this technique.


Cryptographic techniques for privacy-preserving model training and inference

Cryptographic techniques are a natural choice for ensuring confidentiality and integrity, and there is a growing interest in applying those techniques to data and model protection in machine learning. Several cryptographic methods have been successfully utilized, individually and in combination, for scenarios where data and model owners are different entities which do not trust each other. The two main use cases can be Informally described as follows:

  • model training, when multiple data owners either provide their data to a single party constructing a model or exchange parts of their data for learning a model in a distributed fashion;
  • inference, when a trained model is used for processing inputs provided by data owners to produce an output, such as a classification decision or a prediction.

Confidentiality of data is clearly a concern in both cases, and, in addition, model stealing concerns often need to be addressed in scenario (ii). Conceptually, both (i) and (ii) can be considered instances of the secure multi-party computation problem (secure MPC), where a number of parties want to jointly compute a function over their inputs while keeping the inputs private. This problem has been extensively studied in the cryptographic community since the 1970s, and a number of protocols have recently found application in machine learning scenarios. We will now introduce several popular approaches.

Homomorphic encryption makes it possible to compute functions on encrypted data. This enables data owners to encrypt their data and send the encrypted inputs to a model owner and, possibly, other data owners. The model is then applied to the encrypted inputs (or, more generally, a desired function is computed on the encrypted inputs), and the result is communicated to appropriate parties, which can decrypt it and obtain desired information, for example, the model output in scenario (ii). So-called fully homomorphic encryption enables parties to compute a broad class of functions, covering essentially all cases of practical interest. However, all of the currently developed fully homomorphic encryption techniques are very computationally expensive and their use is limited. A more practical alternative is so-called semi-homomorphic encryption methods. While those are suitable only for computing narrower classes of functions, they are utilized, usually in combination with other techniques, in several machine learning applications, for example, in collaborative filtering.

Secret sharing-based approaches can be used to distribute computation among a set of non-colluding servers, which operate on cryptographically derived shares of data owner inputs (thus, having no information about the actual inputs) and generate partial results. Such partial results can then be combined by another party to obtain the final result. One example of this approach is a privacy-preserving system for performing Principle Component Analysis developed on top of the ShareMind technology by Cybernetica.

Garbled circuit protocols, based on the oblivious transfer technique, are used for secure two-party computation of functions presented as Boolean circuits and can be employed in scenario (ii). In such protocols, one party prepares a garbled (encrypted) version of a circuit that implements the function to be computed, garbles their own input, and collaborates with the other party to garble their input in a privacy-preserving manner. The other party uses then the garbled circuit and inputs for computing a garbled output and collaborates with the first party to derive the actual function output. Garbling methods are often used for privacy-preserving machine learning in combination with semi-homomorphic encryption.

There are several noteworthy limitations of the use of cryptographic techniques in machine-learning-based systems. In particular, most of such techniques are applicable only to certain (a small number of) machine learning algorithms and are computationally expensive.  Besides, one has to carefully check assumptions, which the security guarantees of cryptographic methods are based on. For example, the non-collusion assumption in secret sharing-based approaches may or may not be plausible in specific applications.

Despite the limitations and challenges, a number of platforms and tools for privacy-preserving machine learning have been developed (such as Faster CryptoNets and Gazelle), and this remains a domain of active theoretical and applied research.


Defending against poisoning attacks

Poisoning attacks have been popular for many years. Some of the largest tech companies in the world put a great deal of effort into building defences against these attacks. Mitigation strategies against poisoning attacks can be grouped into four categories – rate limiting, regression testing, anomaly detection, and human intervention.

Rate limiting strategies attempt to limit the influence entities or processes have over the model or algorithm. Numerous mechanisms exist to do this. The defender can:

  • Take steps to ensure that a small group of entities, including IPs or users, cannot account for a large fraction of the model training data.
  • Put mechanisms in place to prevent over-weighting of false positives and false negatives reported by users.
  • Limit the number of examples that each user can contribute, for instance, by the use of decaying weights.
  • Slow potential attacks or suspicious activity via mechanisms such as CAPTCHA.
  • Give higher weighting to registered, or ’high-quality’ users.
  • Calculate validity scores for registered accounts based on relevant metrics such as activity patterns, connecting IP addresses, behaviour, and so on.

In order to curb poisoning attacks, regression testing is a useful practice. It is less likely that attacks might slip through the cracks if newly trained models are checked against baseline standards. Good regression testing practices include:

  • Compare each newly trained model to the previous one to estimate how much has changed. Alert on larger than expected changes and inspect the training data if an alert happens.
  • Use A/B testing to compare the output of a previous and new model on real-world inputs.
  • Implement continuous testing against a dataset containing attacks and normal behavioural data that a model must accurately handle.

Anomaly detection methods can be useful in finding suspicious usage patterns. Maintainers of social networks and other online services prone to poisoning attacks should be able to implement fairly intelligent anomaly detection methods using metadata they have available. These can include:

  • IP-based anomaly detection.
  • Heuristic analyses (look for ‘unpopular’ items that suddenly have many co-visitations with other items).
  • Analysis of temporal dynamics of visits and co-visits.
  • Implementation of one or more of the many proposed graph-based Sybil attack detection methods.

Although data analysis techniques and machine learning methods can be used to detect some suspicious activity, understanding how attacks are being carried out, and finding edge cases that are being abused is an activity most suited to humans. Much of the manual work required to defend against poisoning attacks relies on the creation of hand-written rules, and human moderation.

Whenever humans are involved in moderation activities and the processing of data, ethical considerations apply. Often, companies are faced with decisions such as what data a human moderator can work with, how good or bad content is defined, how to write rules that automatically filter ‘bad’ content, and how to handle feedback. These issues often force companies to tread a fine line between political beliefs and definitions of free speech. However, as long as attackers are human, it will take other humans to think as creatively as the attackers in order to defend their systems from attack. This fact will not change in the near future.



Adversarial attacks against machine learning models are hard to defend against because there are very many ways for attackers to force models into producing incorrect outputs. Complex problems can sometimes only be solved with the application of sophisticated machine learning models. However, such models are difficult to harden against attack. Research into mitigations against commonly proposed attacks has proceeded hand-in-hand with studies on performing those attacks.

It is important to bear in mind that methods of defending machine-learning-based systems against attacks and mitigating malicious use of machine learning may lead to serious ethical issues. For instance, tight security monitoring may negatively affect users’ privacy and certain security response activities may weaken their autonomy.

In an effort to remain competitive, companies or organizations may forgo ethical principles, ignore safety concerns, or abandon robustness guidelines in order to push the boundaries of their work, or to ship a product ahead of a competitor. This trend towards low quality, fast time-to-market is already prevalent in the Internet of Things industry, and is considered highly problematic by most cyber security practitioners. Similar recklessness in the AI space could be equally negatively impactful, especially when we consider the fact that our knowledge of how to mitigate adversarial attacks against machine learning models is so young. It took many years, if not decades, for traditional software development process to include practises such as threat modelling and fuzz testing. We would hope that similar practises are introduced into the AI development process much more quickly.

This concludes our series of articles. As mentioned, if you’re interested in reading the entire document, you can find it from here.

Adversarial Attacks Against AI

This article is the third in a series of four articles on the work we’ve been doing for the European Union’s Horizon 2020 project codenamed SHERPA. Each of the articles in this series contain excerpts from a publication entitled “Security Issues, Dangers And Implications Of Smart Systems”. For more information about the project, the publication itself, and this series of articles, check out the intro post here.

This article explains how attacks against machine learning models work, and provides a number of interesting examples of potential attacks against systems that utilize machine learning methodologies.


Machine learning models are being used to make automated decisions in more and more places around us. As a result of this, human involvement in decision processes will continue to decline. It is only natural to assume that adversaries will eventually become interested in learning how to fool machine learning models. Indeed, this process is well underway. Search engine optimization attacks, which have been conducted for decades, are a prime example. The algorithms that drive social network recommendation systems have also been under attack for many years. On the cyber security front, adversaries are constantly developing new ways to fool spam filtering and anomaly detection systems. As more systems adopt machine learning techniques, expect to see new, previously un-thought-of attacks surface.

This article details how attacks against machine learning systems work, and how they might be used for malicious purposes.


Types of attacks against AI systems

Depending on the adversary’s access, attacks against machine learning models can be launched in either ‘white box’ or ‘black box’ mode.

White Box Attacks

White-box attack methods assume the adversary has direct access to a model, i.e. the adversary has local access to the code, the model’s architecture, and the trained model’s parameters. In some cases, the adversary may also have access to the data set that was used to train the model. White-box attacks are commonly used in academia to demonstrate attack-crafting methodologies.

Black Box Attacks

Black box attacks assume no direct access to the target model (in many cases, access is limited to performing queries, via a simple interface on the Internet, to a service powered by a machine learning model), and no knowledge of its internals, architecture, or the data used to train the model. Black box attacks work by performing iterative queries against the target model and observing its outputs, in order to build a copy of the target model. White box attack techniques are then performed on that copy.

Techniques that fall between white box and black box attacks also exist. For instance, a standard pre-trained model similar to the target can be downloaded from the Internet, or a model similar to the target can be built and trained by an attacker. Attacks developed against an approximated model often work well against the target model, even if the approximated model is architecturally different to the target model, and even if both models were trained with different data (assuming the complexity of both models is similar).


Attack classes

Attacks against machine learning models can be divided into four main categories based on the motive of the attacker.

Attacks against machine learning models

Confidentiality attacks expose the data that was used to train the model. Confidentiality attacks can be used to determine whether a particular input was used during the training of the model.

Scenario: obtain confidential medical information about a high-profile individual for blackmail purposes

An adversary obtains publicly available information about a politician (such as name, social security number, address, name of medical provider, facilities visited, etc.), and through an inference attack against a medical online intelligent system, is able to ascertain that the politician has been hiding a long-term medical disorder. The adversary blackmails the politician. This is a confidentiality attack.

Integrity attacks cause a model to behave incorrectly due to tampering with the training data. These attacks include model skewing (subtly retraining an online model to re-categorize input data), and supply chain attacks (tampering with training data while a model is being trained off-line). Adversaries employ integrity attacks when they want certain inputs to be miscategorised by the poisoned model. Integrity attacks can be used, for instance, to avoid spam or malware classification, to bypass network anomaly detection, to discredit the model / SIS owner, or to cause a model to incorrectly promote a product in an online recommendation system.

Scenario: discredit a company or brand by poisoning a search engine’s auto-complete functionality

An adversary employs a Sybil attack to poison a web browser’s auto-complete function so that it suggests the word “fraud” at the end of an auto-completed sentence with a target company name in it. The targeted company doesn’t notice the attack for some time, but eventually discovers the problem and corrects it. However, the damage is already done, and they suffer long-term negative impact on their brand image. This is an integrity attack (and is possible today).

Availability attacks refer to situations where the availability of a machine learning model to output a correct verdict is compromised. Availability attacks work by subtly modifying an input such that, to a human, the input seems unchanged, but to the model, it looks completely different (and thus the model outputs an incorrect verdict). Availability attacks can be used to ‘disguise’ an input in order to evade proper classification, and can be used to, for instance, defeat parental control software, evade content classification systems, or provide a way of bypassing visual authentication systems (such a facial or fingerprint recognition). From the attacker’s goal point of view, availability attacks are similar to integrity ones, but the techniques are different: poisoning the model vs. crafting the inputs.

Scenario: trick a self-driving vehicle

An adversary introduces perturbations into an environment, causing self-driving vehicles to misclassify objects around them. This is achieved by, for example, applying stickers or paint to road signs, or projecting images using light or laser pointers. This attack may cause vehicles to ignore road signs, and potentially crash into other vehicles or objects, or cause traffic jams by fooling vehicles into incorrectly determining the colour of traffic lights. This is an availability attack.

Replication attacks allow an adversary to copy or reverse-engineer a model. One common motivation for replication attacks is to create copy (or substitute) models that can then be used to craft attacks against the original system, or to steal intellectual property.

Scenario: steal intellectual property

An adversary employs a replication attack to reverse-engineer a commercial machine-learning based system. Using this stolen intellectual property, they set up a competing company, thus preventing the original company from earning all the revenue they expected to. This is a replication attack.

Availability attacks against classifiers

Classifiers are a type of machine learning model designed to predict the label of an input (for instance, when a classifier receives an image of a dog, it will output a value indicative of having detected a dog in that image). Classifiers are some of the most common machine learning systems in use today, and are used for a variety of purposes, including web content categorization, malware detection, credit risk analysis, sentiment analysis, object recognition (for instance, in self-driving vehicles), and satellite image analysis. The widespread nature of classifiers has given rise to a fair amount of research on the susceptibility of these systems to attack, and possible mitigations against those attacks.

Classifier models often partition data by learning decision boundaries between data points during the training process.

A decision boundary.

Adversarial samples can be created by examining these decision boundaries and learning how to modify an input sample such that data points in the input cross these decision boundaries. In white box attacks, this is done by iteratively applying small changes to a test input, and observing the output of the target model until a desired output is reached. In the example below, notice how the value for “dog” decreases, while the value for “cat” increases after a perturbation is applied. An adversary wishing to misclassify this image as “cat” will continue to modify the image until the value for “cat” exceeds the value for “dog”.

The creation of adversarial samples often involves first building a ‘mask’ that can be applied to an existing input, such that it tricks the model into producing the wrong output. In the case of adversarially created image inputs, the images themselves appear unchanged to the human eye. The below illustration depicts this phenomenon. Notice how both images still look like pandas to the human eye, yet the machine learning model classifies the right-hand image as “gibbon”.

Source: Explaining and Harnessing Adversarial Examples, Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy,

Adversarial samples created in this way can even be used to fool a classifier when the image is printed out, and a photo is taken of the printout. Even simpler methods have been found to create adversarial images. In February 2018, research was published demonstrating that scaled and rotated images can cause misclassification. In February 2018, researchers at Kyushu University discovered a number of one-pixel attacks against image classifiers. The ease, and the number of ways in which adversarial samples designed to fool image recognition models can be created, illustrates that should these models be used to make important decisions (such as in content filtering systems), mitigations (described in the fourth article in this eries) should be carefully considered before production deployment.

Scenario: bypass content filtering system

An attacker submits adversarially altered pornographic ad banners to a popular, well-reputed ad provider service. The submitted images bypass their machine learning-based content filtering system. The pornographic ad banner is displayed on frequently visited high-profile websites. As a result, minors are exposed to images that would usually have been blocked by parental control software. This is an availability attack.

Researchers have recently demonstrated that adversarial samples can be crafted for areas other than image classification. In August 2018, a group of researchers at the Horst Görtz Institute for IT Security in Bochum, Germany, crafted psychoacoustic attacks against speech recognition systems, allowing them to hide voice commands in audio of birds chirping. The hidden commands were not perceivable to the human ear, so the audio tracks were perceived differently by humans and machine-learning-based systems.

Scenario: perform a targeted attack against an individual using hidden voice commands

An attacker embeds hidden voice commands into video content, uploads it to a popular video sharing service, and artificially promotes the video (using a Sybil attack). When the video is played on the victim’s system, the hidden voice commands successfully instruct a digital home assistant device to purchase a product without the owner knowing, instruct smart home appliances to alter settings (e.g. turn up the heat, turn off the lights, or unlock the front door), or to instruct a nearby computing device to perform searches for incriminating content (such as drugs or child pornography) without the owner’s knowledge (allowing the attacker to subsequently blackmail the victim). This is an availability attack.

Scenario: take widespread control of digital home assistants

An attacker forges a ‘leaked’ phone call depicting plausible scandalous interaction involving high-ranking politicians and business people. The forged audio contains embedded hidden voice commands. The message is broadcast during the evening news on national and international TV channels. The attacker gains the ability to issue voice commands to home assistants or other voice recognition control systems (such as Siri) on a potentially massive scale. This is an availability attack.

Availability attacks against natural language processing systems

Natural language processing (NLP) models are used to parse and understand human language. Common uses of NLP include sentiment analysis, text summarization, question/answer systems, and the suggestions you might be familiar with in web search services. In an anonymous submission to ICLR (the International Conference on Learning Representations) during 2018, a group of researchers demonstrated techniques for crafting adversarial samples to fool natural language processing models. Their work showed how to replace words with synonyms in order to bypass spam filtering, change the outcome of sentiment analysis, and fool a fake news detection model. Similar results were reported by a group of researchers at UCLA in April, 2018.

Scenario: evade fake news detection systems to alter political discourse

Fake news detection is a relatively difficult problem to solve with automation, and hence, fake news detection solutions are still in their infancy. As these techniques improve and people start to rely on verdicts from trusted fake news detection services, tricking such services infrequently, and at strategic moments would be an ideal way to inject false narratives into political or social discourse. In such a scenario, an attacker would create a fictional news article based on current events, and adversarially alter it to evade known respected fake news detection systems. The article would then find its way into social media, where it would likely spread virally before it can be manually fact-checked. This is an availability attack.

Scenario: trick automated trading algorithms that rely on sentiment analysis

Over an extended period of time, an attacker publishes and promotes a series of adversarially created social media messages designed to trick sentiment analysis classifiers used by automated trading algorithms. One or more high-profile trading algorithms trade incorrectly over the course of the attack, leading to losses for the parties involved, and a possible downturn in the market. This is an availability attack.

Availability attacks – reinforcement learning

Reinforcement learning is the process of training an agent to perform actions in an environment. Reinforcement learning models are commonly used by recommendation systems, self-driving vehicles, robotics, and games. Reinforcement learning models receive the current environment’s state (e.g. a screenshot of the game) as an input, and output an action (e.g. move joystick left). In 2017, researchers at the University of Nevada published a paper illustrating how adversarial attacks can be used to trick reinforcement learning models into performing incorrect actions. Similar results were later published by Ian Goodfellow’s team at UC Berkeley.

Two distinct types of attacks can be performed against reinforcement learning models.

A strategically timed attack modifies a single or small number of input states at a key moment, causing the agent to malfunction. For instance, in the game of pong, if a strategic attack is performed as the ball approaches the agent’s paddle, the agent will move its paddle in the wrong direction and miss the ball.

An enchanting attack modifies a number of input states in an attempt to “lure” the agent away from a goal. For instance, an enchanting attack against an agent playing Super Mario could lure the agent into running on the spot, or moving backwards instead of forwards.

Scenario: hijack autonomous military drones

By use of an adversarial attack against a reinforcement learning model, autonomous military drones are coerced into attacking a series of unintended targets, causing destruction of property, loss of life, and the escalation of a military conflict. This is an availability attack.

Scenario: hijack an autonomous delivery drone

By use of a strategically timed policy attack, an attacker fools an autonomous delivery drone to alter course and fly into traffic, fly through the window of a building, or land (such that the attacker can steal its cargo, and perhaps the drone itself). This is an availability attack.

Availability attacks – wrap up

The processes used to craft attacks against classifiers, NLP systems, and reinforcement learning agents are similar. As of writing, all attacks crafted in these domains have been purely academic in nature, and we have not read about or heard of any such attacks being used in the real world. However, tooling around these types of attacks is getting better, and easier to use. During the last few years, machine learning robustness toolkits have appeared on github. These toolkits are designed for developers to test their machine learning implementations against a variety of common adversarial attack techniques. IBM Adversarial Robustness Toolbox, developed by IBM, contains implementations of a wide variety of common evasion attacks and defence methods, and is freely available on github. Cleverhans, a tool developed by Ian Goodfellow and Nicolas Papernot, is a Python library to benchmark machine learning systems’ vulnerability to adversarial examples. It is also freely available on github.


Replication attacks: transferability attacks

Transferability attacks are used to create a copy of a machine learning model (a substitute model), thus allowing an attacker to “steal” the victim’s intellectual property, or craft attacks against the substitute model that work against the original model. Transferability attacks are straightforward to carry out, assuming the attacker has unlimited ability to query a target model.

In order to perform a transferability attack, a set of inputs are crafted, and fed into a target model. The model’s outputs are then recorded, and that combination of inputs and outputs are used to train a new model. It is worth noting that this attack will work, within reason, even if the substitute model is not of absolutely identical architecture to the target model.

It is possible to create a ‘self-learning’ attack to efficiently map the decision boundaries of a target model with relatively few queries. This works by using a machine learning model to craft samples that are fed as input to the target model. The target model’s outputs are then used to guide the training of the sample crafting model. As the process continues, the sample crafting model learns to generate samples that more accurately map the target model’s decision boundaries.


Confidentiality attacks: inference attacks

Inference attacks are designed to determine the data used during the training of a model. Some machine learning models are trained against confidential data such as medical records, purchasing history, or computer usage history. An adversary’s motive for performing an inference attack might be out of curiosity – to simply study the types of samples that were used to train a model – or malicious intent – to gather confidential data, for instance, for blackmail purposes.

A black box inference attack follows a two-stage process. The first stage is similar to the transferability attacks described earlier. The target model is iteratively queried with crafted input data, and all outputs are recorded. This recorded input/output data is then used to train a set of binary classifier ‘shadow’ models – one for each possible output class the target model can produce. For instance, an inference attack against an image classifier than can identify ten different types of images (cat, dog, bird, car, etc.) would create ten shadow models – one for cat, one for dog, one for bird, and so on. All inputs that resulted in the target model outputting “cat” would be used to train the “cat” shadow model, and all inputs that resulted in the target model outputting “dog” would be used to train the “dog” shadow model, etc.

The second stage uses the shadow models trained in the first step to create the final inference model. Each separate shadow model is fed a set of inputs consisting of a 50-50 mixture of samples that are known to trigger positive and negative outputs. The outputs produced by each shadow model are recorded. For instance, for the “cat” shadow model, half of the samples in this set would be inputs that the original target model classified as “cat”, and the other half would be a selection of inputs that the original target model did not classify as “cat”. All inputs and outputs from this process, across all shadow models, are then used to train a binary classifier that can identify whether a sample it is shown was “in” the original training set or “out” of it. So, for instance, the data we recorded while feeding the “cat” shadow model different inputs, would consist of inputs known to produce a “cat” verdict with the label “in”, and inputs known not to produce a “cat” verdict with the label “out”. A similar process is repeated for the “dog” shadow model, and so on. All of these inputs and outputs are used to train a single classifier that can determine whether an input was part of the original training set (“in”) or not (“out”).

This black box inference technique works very well against models generated by online machine-learning-as-a-service offerings, such as those available from Google and Amazon. Machine learning experts are in low supply and high demand. Many companies are unable to attract machine learning experts to their organizations, and many are unwilling to fund in-house teams with these skills. Such companies will turn to machine-learning-as-a-service’s simple turnkey solutions for their needs, likely without the knowledge that these systems are vulnerable to such attacks.


Poisoning attacks against anomaly detection systems

Anomaly detection algorithms are employed in areas such as credit card fraud prevention, network intrusion detection, spam filtering, medical diagnostics, and fault detection. Anomaly detection algorithms flag anomalies when they encounter data points occurring far enough away from the ‘centers of mass’ of clusters of points seen so far. These systems are retrained with newly collected data on a periodic basis. As time goes by, it can become too expensive to train models against all historical data, so a sliding window (based on sample count or date) may be used to select new training data.

Poisoning attacks work by feeding data points into these systems that slowly shift the ‘center of mass’ over time. This process is often referred to as a boiling frog strategy. Poisoned data points introduced by the attacker become part of periodic retraining data, and eventually lead to false positives and false negatives, both of which render the system unusable.


Attacks against recommenders

Recommender systems are widely deployed by web services (e.g., YouTube, Amazon, and Google News) to recommend relevant items to users, such as products, videos, and news. Some examples of recommender systems include:

  • YouTube recommendations that pop up after you watch a video
  • Amazon “people who bought this also bought…”
  • Twitter “you might also want to follow” recommendations that pop up when you engage with a tweet, perform a search, follow an account, etc.
  • Social media curated timelines
  • Netflix movie recommendations
  • App store purchase recommendations

Recommenders are implemented in various ways:

Recommendation based on user similarity
This technique finds users most similar to a target user, based on items they’ve interacted with. They then predict the target user’s rating scores for other items based on the rating scores of those similar users. For instance, if user A and user B both interacted with item 1, and user B also interacted with item 2, recommend item 2 to user A.

Recommendation based on item similarity
This technique finds common interactions between items and then recommends a target user items based on those interactions. For instance, if many users have interacted with both items A and B, then if a target user interacts with item A, recommended B.

Recommendation based on both user and item similarity
These techniques use a combination of both user and item similarity-matching logic. This can be done in a variety of ways. For instance, rankings for items a target user has not interacted with yet are predicted via a ranking matrix generated from interactions between users and items that the target already interacted with.

An underlying mechanism in many recommendation systems is the co-visitation graph. It consists of a set of nodes and edges, where nodes represent items (products, videos, users, posts) and edge weights represent the number of times a combination of items were visited by the same user.

The most widely used attacks against recommender systems are Sybil attacks (which are integrity attacks, see above). The attack process is simple – an adversary creates several fake users or accounts, and has them engage with items in patterns designed to change how that item is recommended to other users. Here, the term ‘engage’ is dependent on the system being attacked, and could include rating an item, reviewing a product, browsing a number of items, following a user, or liking a post. Attackers may probe the system using ‘throw-away’ accounts in order to determine underlying mechanisms, and to test detection capabilities. Once an understanding of the system’s underlying mechanisms has been acquired, the attacker can leverage that knowledge to perform efficient attacks on the system (for instance, based on knowledge of whether the system is using co-visitation graphs). Skilled attackers carefully automate their fake users to behave like normal users in order to avoid Sybil attack detection techniques.

Motives include:

  • promotion attacks – trick a recommender system into promoting a product, piece of content, or user to as many people as possible
  • demotion attacks – cause a product, piece of content, or user to not be promoted as much as it should
  • social engineering – in theory, if an adversary already has knowledge on how a specific user has interacted with items in the system, an attack can be crafted to target that user with a recommendation such as a YouTube video, malicious app, or imposter account to follow.

Numerous attacks are already being performed against recommenders, search engines, and other similar online services. In fact, an entire industry exists to support these attacks. With a simple web search, it is possible to find inexpensive purchasable services to poison app store ratings, restaurant rating systems, and comments sections on websites and YouTube, inflate online polls, and engagement (and thus visibility) of content or accounts, and manipulate autocomplete and search results.

Buying YouTube views is cheap and easy. Source: first hit from a search on Google.

The prevalence and cost of such services indicates that they are probably widely used. Maintainers of social networks, e-commerce sites, crowd-sourced review sites, and search engines must be able to deal with the existence of these malicious services on a daily basis. Detecting attacks on this scale is non-trivial and takes more than rules, filters, and algorithms. Even though plenty of manual human labour goes into detecting and stopping these attacks, many of them go unnoticed.

From celebrities inflating their social media profiles by purchasing followers, to Cambridge Analytica’s reported involvement in meddling with several international elections, to a non-existent restaurant becoming London’s number one rated eatery on TripAdvisor, to coordinated review brigading ensuring that conspiratorial literature about vaccinations and cancer were highly recommended on Amazon , to a plethora of psy-ops attacks launched by the alt-right, high profile examples of attacks on social networks are becoming more prevalent, interesting, and perhaps disturbing. These attacks are often discovered long after the fact, when the damage is already done. Identifying even simple attacks while they are ongoing is extremely difficult, and there is no doubt many attacks are ongoing at this very moment.


Attacks against federated learning systems

Federated learning is a machine learning setting where the goal is to train a high-quality centralized model based on models locally trained in a potentially large number of clients, thus, avoiding the need to transmit client data to the central location. A common application of federated learning is text prediction in mobile phones. Each phone contains a local machine learning model that learns from its user (for instance, which recommended word they clicked on). The phone transmits its learning (the phone’s model’s weights) to an aggregator system, and periodically receives a new model trained on the learning from all of the other phones participating.

Attacks against federated learning can be viewed as poisoning or supply chain (integrity) attacks. A number of Sybils, designed to poison the main model, are inserted into a federated learning network. These Sybils collude to transmit incorrectly trained model weights back to the aggregator which, in turn, pushes poisoned models back to the rest of the participants. For a federated text prediction system, a number of Sybils could be used to perform an attack that causes all participants’ phones to suggest incorrect words in certain situations. The ultimate solution to preventing attacks in federated learning environments is to find a concrete method of establishing and maintaining trust amongst the participants of the network, which is clearly very challenging.


The understanding of flaws and vulnerabilities inherent in the design and implementation of systems built on machine learning and the means to validate those systems and to mitigate attacks against them are still in their infancy, complicated – in comparison with traditional systems –  by the lack of explainability to the user, heavy dependence on training data, and oftentimes frequent model updating. This field is attracting the attention of researchers, and is likely to grow in the coming years. As understanding in this area improves, so too will the availability and ease-of-use of tools and services designed for attacking these systems.

As artificial-intelligence-powered systems become more prevalent, it is natural to assume that adversaries will learn how to attack them. Indeed, some machine-learning-based systems in the real world have been under attack for years already. As we witness today in conventional cyber security, complex attack methodologies and tools initially developed by highly resourced threat actors, such as nation states, eventually fall into the hands of criminal organizations and then common cyber criminals. This same trend can be expected for attacks developed against machine learning models.

This concludes the third article in this series. The fourth and final article explores currently proposed methods for hardening machine learning systems against adversarial attacks


Malicious Use Of AI

This article is the second in a series of four articles on the work we’ve been doing for the European Union’s Horizon 2020 project codenamed SHERPA. Each of the articles in this series contain excerpts from a publication entitled “Security Issues, Dangers And Implications Of Smart Systems”. For more information about the project, the publication itself, and this series of articles, check out the intro post here.

This article explores how machine-learning techniques and services that utilize machine learning techniques might be used for malicious purposes.


The tools and resources needed to create sophisticated machine learning models have become readily available over the last few years. Powerful frameworks for creating neural networks are freely available, and easy to use. Public cloud services offer large amounts of computing resources at inexpensive rates. More and more public data is available. And cutting-edge techniques are freely shared – researchers do not just communicate ideas through their publications nowadays – they also distribute code, data, and models. As such, many people who were previously unaware of machine learning techniques are now using them.

Organizations that are known to perpetuate malicious activity (cyber criminals, disinformation organizations, and nation states) are technically capable enough to verse themselves with these frameworks and techniques, and may already be using them. For instance, we know that Cambridge Analytica used data analysis techniques in order to target specific Facebook users with political content via Facebook’s targeted advertising service (a service which allows ads to be sent directly to users whose email addresses are already known). This simple technique proved to be a powerful political weapon. Just recently, these techniques were still being used by pro-leave Brexiteer campaigners, to drum up support for a no-deal Brexit scenario.

An example of a targeted adverts sent to users of Facebook during the UK referendum in 2016. Source:

As the capabilities of machine-learning-powered systems evolve, we will need to understand how they might be used maliciously. This is especially true for systems that can be considered dual-use. The AI research community should already be discussing and developing best practices for distribution of data, code, and models that may be put to harmful use. Some of this work has already begun with efforts such as RAIL (Responsible AI Licenses).

This article suggests some forward-thinking examples of the potential malicious use of machine learning.

Intelligent automation

Machine learning methodologies have significant potential in the realm of offensive cyber security (a proactive and adversarial approach to protecting computer systems, networks and individuals from cyber attacks.) Password-guessing suites have recently been improved with Generative Adversarial Network (GAN) techniques, fuzzing tools now utilize genetic algorithms to generate payloads, and web penetration testing tools have started to implement reinforcement learning methodologies. Offensive cyber security tools are a powerful resource for both ‘black-’ and ‘white hat’ hackers. While advances in these tools will make cyber security professionals more effective in their jobs, cyber criminals will also benefit from these advances. Better offensive tools will enable more vulnerabilities to be discovered and responsibly fixed by the white hat community. However, at the same time, black hats may use these same tools to find software vulnerabilities for nefarious uses.

Intelligent automation will eventually allow current “advanced” CAPTCHA prompts to be solved automatically (most of the basic ones are already being solved with deep learning techniques). This will lead to the introduction of yet more cumbersome CAPTCHA mechanisms, hell-bent on determining whether or not we are robots.

The future of intelligent automation promises a number of potential malicious applications:

  • Swarm intelligence capabilities might one day be added to botnets to deliver optimized DDoS attacks and spam campaigns, and to automatically discover new targets to infect.
  • Malware of the future may be designed to function as an adaptive implant – a self-contained process that learns from the host it is running on in order to remain undetected, search for and classify interesting content for exfiltration, search for and infect new targets, and discover new pathways or methods for lateral movement.
  • A report published in February, 2019 by ESET claimed that the Emotet malware exhibited behaviour that would be difficult to achieve without the aid of machine learning. The author explained that, because different types of infected hosts received different payloads (in particular, to prevent security researchers from analysing the malware), the malware’s authors must have developed some sort of machine learning logic to decide which payload each victim received. From these claims, one might imagine that Emotet’s back ends employ host profiling logic that is derived by clustering a set of features received from connecting hosts, assigning labels to each identified cluster, and then deploying specific payloads to each machine, based on its cluster label. Even though it is more likely that Emotet’s back ends simply use hand-written rules to determine which payloads each infected host receives, this story illustrates a practical, and easy to implement use of machine learning in malicious infrastructure.
  • Futuristic end-to-end models could be designed to learn optimal strategies for the automated generation of efficient, undetectable poisoning attacks against search engines, recommenders, anomaly detection systems and federated learning systems.

Analytics, disinformation, and fake news

Data analysis and machine learning methods can be used for both benign and malicious purposes. Analytics techniques used to plan marketing campaigns can be used to plan and implement effective regional or targeted spam campaigns. Data freely available on social media platforms can be used to target users or groups with scams, phishing, or disinformation. Data analysis techniques can also be used to perform efficient reconnaissance and develop social engineering strategies against organizations and individuals in order to plan a targeted attack.

The potential impact of combining powerful data analysis techniques with carefully crafted disinformation is huge. Disinformation now exists everywhere on the Internet and remains largely unchecked. The processes required to understand the mechanisms used in organized disinformation campaigns are, in many cases, extremely complex. After news of potential social media manipulation of opinions during the 2016 US elections, the 2016 UK referendum on Brexit, and elections across Africa, and Germany many governments are now worried that well-organized disinformation campaigns may target their voters during an upcoming election. Election meddling via social media disinformation is common in Latin American countries. However, in the west, disinformation on social media and the Internet is no longer solely focused on altering the course of elections – it is about creating social divides, causing confusion, manipulating people into having more extreme views and opinions, and misrepresenting facts and the perceived support that a particular opinion has.

An example of training material published by the alt-right prior to the 2016 US Presidential elections. Source:

Social engineering campaigns run by entities such as the Internet Research Agency, Cambridge Analytica, and the far-right demonstrate that social media advert distribution platforms (such as those on Facebook) have provided a weapon for malicious actors which is incredibly powerful, and damaging to society. The disruption caused by these recent political campaigns has created divides in popular thinking and opinion that may take generations to repair. Now that the effectiveness of these social engineering tools is apparent, what we have seen so far is likely just an omen of what is to come.

The disinformation we hear about is only a fraction of what is actually happening. It requires a great deal of time and effort for researchers to find evidence of these campaigns. Twitter data is open and freely available, and yet it can still be extremely tedious to find evidence of disinformation and sentiment amplification campaigns on that platform. Facebook’s targeted ads are only seen by the users who were targeted in the first place. Unless those who were targeted come forward, it is almost impossible to determine what sort of ads were published, who they were targeted at, and what the scale of the campaign was. Although social media platforms now enforce transparency on political ads, the source of these ads must still be determined in order to understand what content is being targeted at whom.

An example of the sort of data that is used in targeted political advertising. Source:

Many individuals on social networks share links to “clickbait” headlines that align with their personal views or opinions, sometimes without having read the content behind the link. Fact checking can be cumbersome for people who do not have a lot of time. As such, inaccurate or fabricated news, headlines, or “facts” propagate through social networks so quickly that even if they are later refuted, the damage is already done. Fake news links are not just shared by the general public – celebrities and high-profile politicians may also knowingly or unknowingly share such content. This mechanism forms the very basis of malicious social media disinformation. A well-documented example of this was the UK’s “Leave” campaign that was run before the Brexit referendum in 2016. Some details of that campaign are documented in the recent Channel 4 film: “Brexit: The Uncivil War”. The problem is now so acute that in February, 2019 the Council of Europe published a warning about the risk of algorithmic processes being used to manipulate social and political behaviours.

Despite what we know about how social media manipulation tactics were used during the Brexit referendum, multiple pro-Leave organizations are still funding social media ads promoting a “no deal” Brexit on a massive scale. The source of these funds, and the groups that are running these campaigns are not documented.

A new pro-Leave UK astroturfing campaign, “Turning Point UK”, funded by the far-right in both the UK and US, was kicked off in February 2019. It created multiple accounts on social media platforms to push its agenda. At the time of writing, right-wing groups are heavily manipulating sentiment on social media platforms in Venezuela. Across the globe, the alt-right continues to manipulate social media, and artificially amplify pro-right-wing sentiment. For instance, in the US, multitudes of high-volume #MAGA (Make America Great Again) accounts amplify sentiment. In France, at the beginning of 2019 a pro-LePen #RegimeChange4France hashtag amplification push was documented on Twitter, clearly originating from agents working outside of France. In the UK during early 2019, a far-right advert was promoted on YouTube. This five-and-a-half minute anti-Muslim video was unskippable.

A node-edge graph of global far-right Twitter amplification captured in May 2019.

During the latter half of 2018, malicious actors uploaded multiple politically motivated videos to YouTube, and amplified their engagement through views and likes. These videos, designed to evade YouTube’s content detectors, showed up on recommendation lists for average YouTube users.


Disinformation campaigns will become easier to run and more prevalent in coming years. As the blueprints laid out by companies such as Cambridge Analytica are followed, we might expect these campaigns to become even more widespread and socially damaging.

A potentially dystopian outcome of social networks was outlined in a blog post written by François Chollet in May 2018, in which he describes social media becoming a “Psychological Panopticon”. The premise for his theory is that the algorithms that drive social network recommendation systems have access to every user’s perceptions and actions. Algorithms designed to drive user engagement are currently rather simple, but if more complex algorithms (for instance, based on reinforcement learning) were to be used to drive these systems, they may end up creating optimization loops for human behaviour, in which the recommender observes the current state of each target and keeps tuning the information that is fed to them, until the algorithm starts observing the opinions and behaviours it wants to see. In essence the system will attempt to optimize its users. Here are some ways these algorithms may attempt to ‘train’ their targets:

  • The algorithm may choose to only show a target user content that it believes the user will engage or interact with, based on the algorithm’s notion of the target’s identity or personality. Thus, it will cause reinforcement of certain opinions or views in the target, based on the algorithm’s own logic. (This is partially occurring already).
  • If the target user publishes a post containing a viewpoint that the algorithm does not ‘wish’ the user to hold, it will only share it with users who would view the post negatively. The target will, after being flamed or down-voted enough times, stop sharing such views.
  • If the target user publishes a post containing a viewpoint the algorithm ‘wants’ the user to hold, it will only share it with other users that view the post positively. The target will, after some time, likely share more of the same views.
  • The algorithm may place a target user in an ‘information bubble’ where the user only sees posts from associates that share the target’s views (and that are desirable to the algorithm).
  • The algorithm may notice that certain content it has shared with a target user caused their opinions to shift towards a state (opinion) the algorithm deems more desirable. As such, the algorithm will continue to share similar content with the user, moving the target’s opinion further in that direction. Ultimately, the algorithm may itself be able to generate content to those ends.

Chollet goes on to mention that, although social network recommenders may start to see their users as optimization problems, a bigger threat still arises from external parties gaming those recommenders in malicious ways. The data available about users of a social network can already be used to predict when a user is suicidal, or when a user will fall in love or break up with their partner, and content delivered by social networks can be used to change users’ moods. We also know that this same data can be used to predict which way a user will vote in an election, and the probability of whether that user will vote or not.

If this optimization problem seems like a thing of the future, bear in mind that, at the beginning of 2019, YouTube made changes to its recommendation algorithms precisely because of problems it was causing for certain members of society. Guillaume Chaslot posted a Twitter thread in February 2019 that described how YouTube’s algorithms favoured recommending conspiracy theory videos, guided by the behaviours of a small group of hyper-engaged viewers. Fiction is often more engaging than fact, especially for users who spend substantial time watching YouTube. As such, the conspiracy videos watched by this group of chronic users received high engagement, and thus were pushed up by the recommendation system. Driven by these high engagement numbers, the makers of these videos created more and more content, which was, in-turn, viewed by this same group of users. YouTube’s recommendation system was optimized to pull more and more users into chronic YouTube addiction. Many of the users sucked into this hole have since become indoctrinated with right-wing extremist views. One such user became convinced that his brother was a lizard, and killed him with a sword. In February, 2019 the same algorithmic misgiving was found to have assisted the creation of a voyeur ring for minors on YouTube. Chaslot has since created a tool that allows users to see which of these types of videos are being promoted by YouTube.

Between 2008 and 2013, over 120 bogus computer-generated papers were submitted, peer-reviewed, and published by the Springer and Institute of Electrical and Electronics Engineers (IEEE) organizations. These computer-generated papers were likely created using simple procedural methods, such as context-free grammars or Markov chains. Text synthesis methods have matured considerably since 2013. A 2015 blog post by Andrej Karpathy [4] illustrated how recurrent neural networks can be used to learn from specific text styles, and then synthesize new, original text in a similar style. Andrej illustrated this technique with Shakespeare, and then went on to train models that were able to generate C source code, and Latex sources for convincing-looking algebraic geometry papers. It is entirely possible that these text synthesis techniques could be used to submit more bogus papers to IEEE in the future.

Andrej Karpathys’ rnn-generated algebraic geometry papers. Source:

A 2018 blog post by Chengwei Zhang demonstrated how realistic Yelp reviews can be easily created on a home computer using standard machine learning frameworks. The blog post included links to all the tools required to do this. Given that there are online services willing to pay for fake reviews, it is plausible that these tools are already being used by individuals to make money (while at the same time, corrupting the integrity of Yelp’s crowdsourced ranking systems.)

In 2017, Jeff Kao discovered that over a million ‘pro-repeal net neutrality’ comments submitted to the Federal Communications Commission (FCC) were auto-generated. The methodology used to generate the comments was not machine learning – the sentences were ‘spun’ by randomly replacing words and phrases with synonyms. A quick search on Google reveals that there are commercial tools available precisely to auto-generate content in this manner. The affiliates of this software suite provide almost every tool you might potentially need to run a successful disinformation campaign.

An example of some of the fake comments submitted to the FCC. Phrases in the same colour are synonyms. Source:

The use of machine learning will certainly hinder the possibility of detecting fake textual content. In February 2019, OpenAI published an article about a text synthesis model (GPT-2) they had created that was capable of generating realistic written English. The model, designed to predict the next word in a sentence, was trained on over 40GB of text. The results were impressive – feed the model a few sentences of seed text, and it will generate as many pages of prose as you want, all following the theme of the input. The model was also able to remember names it had quoted, and re-used them in the same text, despite having no in-built memory mechanisms.

OpenAI chose not to release the trained model to the public, and instead opted to offer private demos of the technology to visiting journalists. This was seen by many as a controversial move. While OpenAI acknowledged that their work would soon be replicated by others, they stated that they preferred to open a dialog about the potential misuse of such a model, and what might be done to curb this misuse, instead of putting the model directly in the hands of potentially malicious actors. While the GPT-2 model may not be perfect, it represents a significant step forward in this field.

An example of text generated by OpenAI’s GPT-2 model. Source:

Unfortunately, the methods developed to synthesize written text (and other types of content) are far outpacing technologies that can determine whether that text is real or synthesized. This will start to prove problematic in the near future, should such synthesis methods see widespread adoption.

Phishing and spam

Phishing is the practise of fraudulently attempting to obtain sensitive information such as usernames, passwords and credit card details, or access to a user’s system (via the installation of malicious software) by masquerading as a trustworthy entity in an electronic communication. Phishing messages are commonly sent via email, social media, text message, or instant message, and can include an attachment or URL, along with an accompanying message designed to trick the recipient into opening the attachment or clicking on the link. The victim of a phishing message may end up having their device infected with malware, or being directed to a site designed to trick them into entering login credentials to a service they use (such as webmail, Facebook, Amazon, etc.) If a user falls for a phishing attack, the adversary who sent the original message will gain access to their credentials, or to their computing device. From there, the adversary can perform a variety of actions, depending on what they obtained, including: posing as that user on social media (and using the victim’s account to send out more phishing messages to that user’s friends), stealing data and/or credentials from the victim’s device, attempting to gain access to other accounts belonging to the victim (by re-using the password they discovered), stealing funds from the victim’s credit card, or blackmailing the victim (with stolen data, or by threatening to destroy their data).

Phishing messages are often sent out in bulk (for instance, via large spam email campaigns) in order to trawl in a small percentage of victims. However, a more targeted form of phishing, known as spear phishing, can be used by more focused attackers in order to gain access to specific individuals’ or companies’ accounts and devices. Spear phishing attacks are generally custom-designed to target only a handful of users (or even a single user) at a time. On the whole, phishing messages are hand-written, and often carefully designed for their target audiences. For instance, phishing emails sent in large spam runs to recipients in Sweden might commonly be written in the Swedish language, use a graphical template similar to the Swedish postal service, and claim that the recipient has a parcel waiting for them at the post office, along with a malicious link or attachment. A certain percentage of recipients of such a message may have been expecting a parcel, and hence may be fooled into opening the attachment, or clicking on the link.

In 2016, researchers at the cyber security company ZeroFOX created a tool called SNAP_R (Social Network Automated Phishing and Reconnaissance). Although mostly academic in nature, this tool demonstrated an interesting proof of concept for the generation of tailored messages for social engineering engagement purposes. Although such methodology would be currently too cumbersome for cyber criminals to implement (compared to current phishing techniques), in the future one could envision an easy way to use the tool that implements an end-to-end reinforcement learning and natural language generation model to create engaging messages specifically optimized for target groups or individuals. There is already evidence that threat actors are experimenting with social network bots that talk to each other. If they could be designed to act naturally, it will become more and more difficult to separate real accounts from fake ones.

One of the most feared applications of written content generation is that of automated spam generation. If one envisions the content classification cat-and-mouse game running to its logical conclusion, it might look something like this:

Attacker: Generate a single spam message and send it to thousands of mailboxes.
Defender: Create a regular expression or matching rule to detect the message.

Attacker: Replace words and phrases based on a simple set of rules to generate multiple messages with the same meaning.
Defender: Create more complex regular expressions to handle all variants seen.

Attacker: Use context-free grammars to generate many different looking messages with different structures.
Defender: Use statistical models to examine messages.

Attacker: Train an end-to-end model that generates adversarial text by learning the statistical distributions a spam detection model activates on.
Defender: ???

By and large, the spam cat-and-mouse game still operates at the first stage of the above illustration.

Generation of audio-visual content

Machine learning techniques are opening up new ways to generate images, videos, and human voices. As this section will show, these techniques are rapidly evolving, and have the potential to be combined to create convincing fake content.

Generative Adversarial Networks (GANs) have evolved tremendously in the area of image generation since 2014, and are now at the level where they can be used to generate photo-realistic images.

How GAN-generated images have evolved over the past four years. Source:

Common Sybil attacks against online services involve the creation of multiple ‘sock puppet’ accounts that are controlled by a single entity. Currently, sock puppet accounts utilize avatar pictures lifted from legitimate social media accounts, or from stock photos. Security researchers can often identify sock puppet accounts by reverse-image searching their avatar photos. It is now possible to generate unique profile pictures generated by GANs, using online services such as These pictures are not reverse-image searchable, and hence it will become increasingly difficult to determine whether sock puppet accounts are real or fake. In fact, in March 2019, a sockpuppet account was discovered using a GAN-generated avatar picture, and linking to a website containing seemingly machine-learning synthesized text. This discovery was probably one of the first of its kind.

A sockpuppet using a GAN-generated avatar. Source:

A collage of faces generated from

GANs can be used for a variety of other image synthesis purposes. For instance, a model called CycleGAN can modify existing images to change the weather in a landscape scene, perform object transfiguration (e.g. turn a horse into a zebra, or an apple into an orange), and to convert between paintings and photos. A model called pix2pix, another technique based on GANs, has enabled developers to create image editing software which can build photo-realistic cityscapes from simple drawn outlines.

CycleGAN examples. Source:

The ability to synthesize convincing images opens up many social engineering possibilities. Scams already exist that send messages to social media users with titles such as “Somebody just put up these pictures of you drunk at a wild party! Check ’em out here!” in order to entice people to click on links. Imagine how much more convincing these scams would be if the actual pictures could be generated. Likewise, such techniques could be used for targeted blackmail, or to propagate faked scandals.

DeepFakes is a machine learning-based image synthesis technique that can be used to combine and superimpose existing images and videos onto source images or videos. DeepFakes made the news in 2017, when it was used to swap the faces of actors in pornographic movies with celebrities’ faces. Developers working in the DeepFakes community created an app, allowing anyone to create their own videos with ease. The DeepFakes community was subsequently banned from several high-profile online communities. In early 2019, a researcher created the most convincing face-swap video to date, featuring a video of Jennifer Lawrence, with Steve Buscemi’s face superimposed, using the aforementioned DeepFakes app.

Since the introduction of DeepFakes, video synthesis techniques have become a lot more sophisticated. It is now possible to map the likeness of one individual onto the full-body motions of another, and to animate an individual’s facial movements to mimic arbitrary speech patterns.

In the area of audio synthesis, it is now possible to train speech synthesizers to mimic an individual’s voice. Online services, such as, provide a simple web interface that allows any user to replicate their own voice by repeating a handful of phrases into a microphone (a process that only takes a few minutes). Lyrebird’s site includes fairly convincing examples of voices synthesized from high-profile politicians such a Barack Obama, Hilary Clinton, and Donald Trump. lyrebird’s synthesized voices aren’t flawless, but one can imagine that they would sound convincing enough if transmitted over a low-quality signal (such as a phone line), with some added background noise. Using audio synthesis techniques, one might appreciate how easy it will be, in the near future, to create faked audio of conversations for political or social engineering purposes.

Impersonation fraud is a social engineering technique used by scammers to trick an employee of a company into transferring money into a criminal’s bank account. The scam is often perpetrated over the phone – a member of a company’s financial team is called by a scammer, posing as a high-ranking company executive or CEO, and is convinced to transfer money urgently in order to secure a business deal. The call is often accompanied by an email that adds to the believability and urgency of the request. These scams rely on being able to convince the recipient of the phone call that they are talking to the company’s CEO, and would fail if the recipient noticed something wrong with the voice on the other end of the call. Voice synthesis techniques could drastically improve the reliability of such scams.

A combination of object transfiguration, scene generation, pose mimicking, adaptive lip-syncing, and voice synthesis opens up the possibility for creation of fully generated video content. Content generated in this way would be able to place any individual into any conceivable situation. Fake videos will become more and more convincing as these techniques evolve (and new ones are developed), and, in turn, determining whether a video is real or fake will become much more difficult.


In August 2018, IBM published a proof-of-concept design for malware obfuscation that they dubbed “DeepLocker”. The proof of concept consisted of a benign executable containing an encrypted payload, and a decryption key ‘hidden’ in a deep neural network (also embedded in the executable). The decryption key was generated by the neural network when a specific set of ‘trigger conditions’ (for example, a set of visual, audio, geolocation and system-level features) were met. Guessing the correct set of conditions to successfully generate the decryption key is infeasible, as is deriving the key from the neural network’s saved parameters. Hence, reverse engineering the malware to extract the malicious payload is extremely difficult. The only way to access the extracted payload would be to find an actual victim. Sophisticated nation-state cyber attacks sometimes rely on distributing hidden payloads (in executables) that activate only under certain conditions. As such, this technique may attract interest from nation-state adversaries.


While recent innovations in the machine learning domain have enabled significant improvements in a variety of computer-aided tasks, machine learning systems present us with new challenges, new risks, and new avenues for attackers. The arrival of new technologies can cause changes and create new risks for society, even when they are not deliberately misused. In some areas, artificial intelligence has become powerful to the point that trained models have been withheld from the public over concerns of potential malicious use. This situation parallels to vulnerability disclosure, where researchers often need to make a trade-off between disclosing a vulnerability publicly (opening it up for potential abuse) and not disclosing it (risking that attackers will find it before it is fixed).

Machine learning will likely be equally effective for both offensive and defensive purposes (in both cyber and kinetic theatres), and hence one may envision an “AI arms race” eventually arising between competing powers. Machine-learning-powered systems will also affect societal structure with labour displacement, privacy erosion, and monopolization (larger companies that have the resources to fund research in the field will gain exponential advantages over their competitors).

The use of machine learning methods and technologies are well within the capabilities of the engineers that build malware and its supporting infrastructure. Tools in the offensive cyber security space already use machine learning techniques, and these tools are as available to malicious actors as they are to security researchers and specialists. Since it is almost impossible to observe how malicious actors operate, no evidence of the use of such methods have yet been witnessed (although some speculation exists to support that possibility). Thus, we speculate that by-and-large, machine learning techniques are still not being utilized heavily for malicious purposes.

Text synthesis, image synthesis, and video manipulation techniques have been strongly bolstered by machine learning in recent years. Our ability to generate fake content is far ahead of our ability to detect whether content is real or faked. As such, we expect that machine-learning-powered techniques will be used for social engineering and disinformation in the near future. Disinformation created using these methods will be sophisticated, believable, and extremely difficult to refute.

This concludes the second article in this series. The next article explains how attacks against machine learning models work, and provides a number of interesting examples of potential attacks against systems that utilize machine learning methodologies.


Bad AI

This article is the first in a series of four articles on the work we’ve been doing for the European Union’s Horizon 2020 project codenamed SHERPA. Each of the articles in this series contain excerpts from a publication entitled “Security Issues, Dangers And Implications Of Smart Systems”. For more information about the project, the publication itself, and this series of articles, check out the intro post here.

This article details the types of flaws that can arise when developing machine learning models. It also includes some advice on how to avoid introducing flaws while developing your own models.

A brief recap of what machine learning is

Machine learning is the process of training an algorithm (model) to learn from data without the need for rules-based programming. In traditional software development processes, a developer hand-writes code such that a known set of inputs are transformed into desired outputs. With machine learning, an algorithm is iteratively configured to transform a set of known inputs into a set of outputs optimizing desired characteristics. Many different machine learning architectures exist, ranging from simple logistic regression to complex neural network architectures (sometimes referred to as “deep learning”). Common uses of machine learning include:

  • Classification – assigning a label (class) to an input (e.g. determining whether there is a dog in an image)
  • Sequential – predicting a sequence (e.g. translating a sentence from English to French, predicting the next words in a sentence, the next notes in a musical sequence, or the next price of a stock)
  • Policy – controlling an agent in an environment (e.g. playing a video game, driving a car)
  • Clustering – grouping a number of inputs by similarity (e.g. finding anomalies in network traffic, identifying demographic groups)
  • Generative – generating artificial outputs based on inputs the model was trained on (e.g. generating face images)

Methods employed to train machine learning models depend on the problem space and available data. Supervised learning techniques are used to train a model on fully labelled data. Semi-supervised learning techniques are used to train a model with partially labelled data. Unsupervised learning techniques are used to process completely unlabelled data. Reinforcement learning techniques are used to train agents to interact with environments (such as playing video games or driving a car).

Recent innovations in the machine learning domain have enabled significant improvements in a variety of computer-aided tasks, including:

  • Image and video recognition, tagging, labelling, and captioning systems
  • Speech-to-text and speech-to-speech conversion
  • Language translation
  • Linguistic analysis
  • Text synthesis
  • Chatbots and natural language understanding tasks
  • Financial modelling and automated trading
  • Image synthesis
  • Content generation and artistic tools
  • Image and video manipulation
  • Game playing
  • Self-driving vehicles
  • Robot control systems
  • Marketing analytics
  • Recommendation systems and personal digital assistants
  • Network anomaly detection
  • Penetration testing tools
  • Content categorization, filtering, and spam detection

Machine learning-based systems are already deployed in many domains, including finance, commerce, science, military, healthcare, law enforcement, and education.  In the future, more and more important decisions will be made with the aid of machine learning. Some of those decisions may even lead to changes in policies and regulations. Hence it will be important for us to understand how machine learning models make decisions, predict ways in which flaws and biases may arise, and determine whether flaws or biases are present in finished models. A growing interest in understanding how to develop attacks against machine learning systems will also accompany this evolution, and, as machine learning techniques evolve they will inevitably be adopted by ‘bad actors’, and used for malicious purposes.

This series of articles explore how flaws and biases might be introduced into machine learning models, how machine learning techniques might, in the future, be used for offensive or malicious purposes, how machine learning models can be attacked, and how those attacks can presently be mitigated. Machine learning systems present us with new challenges, new risks, and new avenues for cyber attackers. Further articles in this series will explore the implications of attacks against these systems and how they differ from attacks against traditional systems.

Bad Artificial Intelligence (AI)

If a machine learning model is designed or trained poorly, or used incorrectly, flaws may arise. Designing and training machine learning models is often a complex process, and there are numerous ways in which flaws can be introduced.

A flawed model, if not identified as such, can pose risks to people, organizations, or even society. In recent years, machine-learning-as-a-service (such as Amazon SageMaker, Azure Machine Learning Service, and Google Cloud Machine Learning Engine) offerings have enabled individuals to train machine learning models on their own data, without the need for deep technical domain knowledge. While these services have lowered the barrier to adoption of machine learning techniques, they may have also inadvertently introduced the potential for widespread misuse of those techniques.

This article enumerates the most common errors made while designing, training, and deploying machine learning models. Common flaws can be broken into three categories – incorrect design decisions, deficiencies in training data, and incorrect utilization choices.

Flaws arising from design decisions

Machine learning models are essentially functions that accept a set of inputs, and return a set of outputs. It is up to a machine learning model’s designer to select the features that are used as inputs to a model, such that it can be trained to generate accurate outputs. This process is often called feature engineering. If a designer of a model chooses features that are signal-poor (have little effect on the decision that is made by the model), irrelevant (have no effect on the decision), or introduce bias (inclusion or omission of inputs and / or features that favour/disfavour certain results), the model’s outputs will be inaccurate.

If features do not contain information relevant to solving the task at hand, they are essentially useless. For instance, it would be impossible to build a model that can predict the optimal colour for targeted advertisements with data collected from customer’s calls for technical support. Unfortunately, the misconception that throwing more data at a problem will suddenly make it solvable is all too real, and such examples do occur in real life.

A good example of poor feature engineering can be observed in some online services that are designed to determine whether Twitter users are fake or bots. Some of these services are based on machine learning models whose features are derived only from the data available from a Twitter account’s “user” object (the length of the account’s name, the date the account was created, how many Tweets the account has published, how many followers and friends the account has, and whether the account has set a profile picture or description). These input features are relatively signal-poor for determining whether an account is a bot, which often manifests in incorrect classification.

This is the information (circled) that some “bot or not” services use to determine whether an account is a bot. Or not.

Another common design flaw is inappropriately or suboptimally chosen model architecture and parameters. A potentially huge number of combinations of architectures and parameters are available when designing a machine learning model, and it is almost impossible to try every possible combination. A common approach to solving this problem is to find an architecture that works best, and then use an iterative process, such as grid search or random search to narrow down the best parameters. This is a rather time-consuming process – in order to test each set of parameters, a new model must be trained – a process that can take hours, days, or even weeks. A designer who is not well-practiced in this field may simply copy a model architecture and parameters from elsewhere, train it, and deploy it, without performing proper optimization steps.

An illustration of some of the design decisions available when building a machine learning model.


Incorrect choices in a model’s architecture and parameters can often lead to the problem of overfitting, when a model learns to partition the samples it has been shown accurately, but fails to generalize on real-world data. Overfitting can also arise from training a model on data that contains only a limited set of representations of all possible inputs, which can happen even when a training set is large if there’s a lack of diversity in that dataset. Problems related to training data will be discussed in greater detail in the next subsection.


Overfitting can be minimized by architectural choices in the model – such as dropout in the case of neural networks. It can also be minimized by data augmentation – the process of creating additional training data by modifying existing training samples. For instance, in order to augment the data used to train an image classification model, you might create additional training samples by flipping each image, performing a set of crops on each image, and brightening/darkening each image.

Flaws arising from deficient training data

It is common practice to evaluate a model on a separate set of samples after training (often called a test set). However, if the test set contains equally limited sample data, the trained model will appear to be accurate (until put into production). Gathering a broad enough set of training examples is often extremely difficult. However, model designers can iteratively test a model on real-world data, update training and test sets with samples that were incorrectly classified, and repeat this process until satisfactory real-world results are achieved. This process can be time-consuming, and thus may not always be followed in practice.

Supervised learning methods require a training set that consists of accurately labelled samples. Labelled data is, in many cases, difficult or costly to acquire – the process of creating a labelled set can include manual work by human beings. If a designer wishes to create a model using supervised learning, but doesn’t have access to an appropriate labelled set of data, one must be created. Here, shortcuts may be taken in order to minimize the cost of creating such a set. In some cases, this might mean “working around” the process of manually labelling samples (i.e. blanket collection of data based on the assumption that it falls under a specific label). Without manually checking data collected in this way, it is possible that the model will be trained with mislabelled samples.

If a machine learning model is trained with data that contains imbalances or assumptions, the output of that model will reflect those imbalances or assumptions. Imbalances can be inherent in the training data, or can be “engineered” into the model via feature selection and other designers’ choices. For example, evidence from the US suggests that models utilized in the criminal justice system are more likely to incorrectly judge black defendants as having a higher risk of reoffending than white defendants. This flaw is introduced into their models both by the fact that the defendant’s race is used as an input feature, and the fact that the historical data might excessively influence decision-making.

In another recent example, Amazon attempted to create a machine learning model to classify job applicants. Since the model was trained on the company’s previous hiring decisions, it led them to building a recruitment tool that reinforced their company’s historical hiring policies. The model penalized CVs that included the word “women’s”, downgraded graduates from women’s colleges, and highly rated aggressive language. It also highly rated applicants with the name “Jared” who had previously played lacrosse.

A further example of biases deeply embedded in historical data can be witnessed in natural language processing (NLP) tasks. The creation of word vectors is a common precursor step to other NLP tasks. Word vectors are usually more accurate when trained against a very large text corpus, such as a large set of scraped web pages and news articles (for example, the “Google News data” set). However, when running simple NLP tasks, such as sentiment analysis, using word vectors created in this manner, bias in English-language news reporting becomes apparent. Simple experiments reveal that word vectors trained against the Google News text corpus exhibit gender stereotypes to a disturbing extent (such as associating the phrase “computer programmer” to man and the word “homemaker” to woman).

Word vector examples. Source:


The idea that bias can exist in training data, that it can be introduced into models, and that biased models may be used to make important decisions in the future is the subject of much attention. Anti-bias initiatives already exist (such as AlgorithmWatch (Berlin), and Algorithmic Justice League, (US), and several technical solutions to identify and fix bias in machine learning models are now available (such as IBM’s Fairness 360 kit, Facebook’s Fairness Flow, an as-yet-unnamed tool from Microsoft, and Google’s “what if” tool). Annual events are also arranged to discuss such topics, such as FAT-ML (Fairness, Accountability, and Transparency in Machine Learning). Groups from Google and IBM have proposed a standardized means of communicating important information about their work, such as a model’s use cases, a dataset’s potential biases, or an algorithm’s security considerations. Here are links to a few papers on the subject.

AI is reportedly transforming many industries, including lending and loans, criminal justice, and recruitment. However, participants in a recent Twitter thread started by Professor Gina Neff  discussed the fact that imbalances in datasets is incredibly difficult to find and fix, given that it arises for social and organizational reasons, in addition to technical reasons. This was illustrated by the analogy that despite being technically rooted, both space shuttle accidents were ultimately caused by societal and organizational failures. The thread concluded that bias in datasets (and thus the machine learning models trained on those datasets) is a problem that no single engineer, company or even country can conceivably fix.

Flaws arising from incorrect utilization of a machine learning model

Machine learning models are very specific to the data they were trained on and, more generally, the machine learning paradigm has serious limitations. This is often difficult for humans to grasp – their overly high expectations come from naively equating machine intelligence with human intelligence. For example, humans are able to recognize people they know regardless of different weather and lighting conditions. The fact that someone you know is a different colour under nightclub lighting, or is wet because they have been standing in the rain does not make it any more difficult for you to recognize them. However, this is not necessarily the case for machine learning models. It is also important to observe that modelling always involves certain assumptions, so applying a machine-learning-based model in situations when the respective assumptions do not hold will likely lead to poor results.

Going beyond the above examples, people sometimes attempt to solve problems that simply cannot (or should not) be solved with machine learning, perhaps due to a lack of understanding of what can and cannot be done with current methodologies.

One good example of this is automated grading of essays, a task where machine learning with its current limitations should not be used at all. School districts in certain parts of the world have however created machine learning models using historically collected data – essays, and the grades that were assigned to them. The trained model takes a new essay as input and outputs a grade. The problem with this approach is that the model is unable to understand the content of the essay (a task that is far beyond the reach of current machine learning capabilities), and simply grades it based on patterns found in the text – sentence structure, usage of fancy words, paragraph lengths, and usage of punctuation and grammar. In some cases, researchers have written tools to generate nonsensical text designed to always score highly in specific essay grading systems.

What to keep in mind when planning, building and utilizing machine learning systems

The process of developing and deploying a machine learning model differs from standard application development in a number of ways. The designer of a machine learning model starts by collecting data or building a scenario that will be used to train the model, and writes the code that implements the model itself. The developer then runs a training phase, where the model is exposed to the previously prepared training data or scenario and, through an iterative process, configures its internal parameters in order to fit the model. Once training has ended, the resulting model is tested for the key task-specific characteristics, such as accuracy, recall, efficiency, etc. The output of training a machine learning model is the code that implements the model, and a serialized data structure that describes the learned parameters of that model. If the resulting model fails to pass tests, the model’s developer adjusts its parameters and/or architecture and perhaps even modifies the training data or scenario and repeats the training process until a satisfactory outcome is achieved. When a suitable model has been trained, it is ready to be deployed into production. The model’s code and parameters are plugged into a system that accepts data from an external source, processes it into inputs that the model can accept, feeds the inputs into the model, and then routes the model’s outputs to intended recipients.

Depending on the type and complexity of a chosen model’s architecture, it may or may not be possible for the developer to understand or modify the model’s logic. As an example, decision trees are often highly readable and fully editable. At the other end of the spectrum, complex neural network architectures can contain millions of internal parameters, rendering them almost incomprehensible. Models that are readable are also explainable, and it becomes much easier to detect flaws and bias in such models. However, these models are often relatively simple, and may be unable to handle more complex tasks. Tools exist to partially inspect the workings of complex neural networks, but finding bias and flaws in such models can be an arduous task that may often involve guesswork. Hence, rigorous testing is required to ensure the absence of potential flaws and biases. Testing a machine learning model against all possible inputs is impossible. In contrast, where an interface exists in traditionally built applications, defined processes and tools are available that enable developers to identify inputs that can catch all potential errors and corner cases.

An example of a decision tree. Source:

Machine learning models receive inputs that have been pre-processed and then vectorized into fixed-size structures. Vectorization is the process of converting an input (such as an image, piece of text, audio signal, or game state) into a set of numerical values, often in the form of an array, matrix (two-dimensional array), or tensor (multi-dimensional array). Bugs may be introduced into the code that interfaces the model with external sources or performs vectorization; these may find their way in via code invoking machine learning methods implemented in popular libraries, or may be introduced in decision-making logic. Detecting such bugs is non-trivial.

Based on SHERPA partners’ experiences and knowledge gained while working in the field, we recommend following these guidelines while planning, building and utilizing machine learning models, so that they function correctly and do not exhibit bias:

Understand the problem domain

  • Familiarize yourself with basic guidelines and practices in the field of machine learning.
  • Understand how different machine learning techniques work, how they can be used, and what their limitations are.
  • Understand your problem domain and whether the problem is even possible to solve using machine learning techniques.
  • Research and read up on similar work in your problem area. Understand the methodologies that were used to solve the problem. Pay close attention to any experiments detailed in the research, and how they were conducted.

Prepare your training data

  • Understand that a lot of published work is based on standard, well-labelled academic datasets. If your model requires a training set that is not one of these, understand the steps that will be required to create a good training set for your purposes.
  • Evaluate whether the inputs you have available to you are relevant to the task you wish to accomplish.
  • If you need to create your own labelled set, propose methods to accurately label the dataset, and to validate the accuracy of the labels.
  • If your process includes choosing features for a model’s input, think about whether those features might contribute to social bias (e.g. the use of race, gender, religion, age, country of origin, home address, area code, etc. as inputs). If you do choose a feature that is known to introduce social bias, be prepared to explain why that input is relevant to your process, and why it won’t introduce social bias.

Design your model

  • Start by prototyping your own model based on an existing model that was used for similar purposes. Experiment with changing the model’s architecture and parameters during prototyping. Get a feeling for the amount of training that might be required across your own dataset, and the accuracy and other important model characteristics you might achieve, based on your prototypes.
  • Check your prototype models early against real-world data, if possible. Start iteratively improving your training set along with your model.
  • Once you’ve settled on an architecture, inputs, and a well-rounded training set, use automation to explore model parameters (such as random search).
  • Check for overfitting. If it is a problem, try to understand what is causing it, and take appropriate measures to alleviate it.

Implement production processes

  • Use a bias detection framework or develop your own methodology to explore potential bias and accuracy issues on your trained model, during development, to pinpoint and fix issues. Be prepared to provide details on the steps taken to remove bias and inaccuracies from your model.
  • Have defined processes in place to quickly fix any issues found in your model.
  • Consider implementing a process that can allow third parties to audit your model.
  • Strongly consider implementing mechanisms that enable your model to explain how it made each decision. Note that explainability can sometimes trade-off with model quality, so care should be taken.
  • If you’re doing work in the NLP domain, check for biases that might be introduced by word vectors. Consider using unbiased word vectors such as those being developed in projects such as ConceptNet.

The above guidelines do not include measures that designers might want to take to safeguard machine learning models from adversarial attacks. Adversarial attack techniques and mitigations against them are discussed in later articles in this series.

It is worth noting that design decisions made at an early stage of a model’s development will affect the robustness of the systems powered by that model. For instance, if a model is being developed to power a facial recognition system (which is used in turn to determine access to confidential data), the model should be robust enough to differentiate between a person’s face and a photograph. In this example, the trade-off between model complexity and efficiency must be considered at this early stage.

Some application areas may also need to consider the trade-off between privacy and feature set. An example of such a trade-off can be illustrated by considering the design of machine learning applications in the cyber security domain. In some cases, it is only possible to provide advanced protection capabilities to users or systems when fine-grained details about the behaviour of those users or systems are available to the model. If the transmission of such details to a back-end server (used to train the model) is considered to be an infringement of privacy, the model must be trained locally on the system that needs to be protected. This may or may not be possible, based on the resources available on that system.


There are many variables that can affect the outcome of a machine learning model. Designing and training a successful model often takes a great deal of time, effort, and expertise. There’s a lot of competition in the machine learning space, especially with regards to implementing novel services based on existing machine learning techniques. As such, shortcuts may be taken in the design and development of the machine learning models powering services that are rushed to market. These services may contain flaws, biases, or may even attempt to do things they shouldn’t. Due to the inherent complexity of the field, it is unlikely that lay people will understand how or why these services are flawed, and simply use them. This could lead to potentially unwelcome outcomes.

This concludes the first article in this series. The next article explores how machine-learning techniques and services that utilize machine learning techniques might be used for malicious purposes.

Security Issues, Dangers, And Implications of Smart Information Systems

F-Secure is participating in an EU-funded Horizon 2020 project codenamed SHERPA (as mentioned in a previous blog post). F-Secure is one of eleven partners in the consortium. The project aims to develop an understanding of how machine learning will be used in society in the future, what ethical issues may arise, and how those issues might be addressed.

One of the initial aims of the project was to develop an understanding of how machine learning is being used at present, and extrapolate that baseline into a series of potential future scenarios (a look at what things might be like in the year 2025). Some of the scenarios are already online (and make for some interesting reading). Examples include the use of machine learning in education, policing, warfare, human assistance, and transport. Some of the other SHERPA deliverables, such as a series of case studies, have also already been published.

F-Secure are the technical partner in this project and, as such, we are there to provide technical advice to the other partners (such as explanations on how machine learning methodologies work, what can and can’t be done with these methodologies, and how they might be improved or innovated on in the future). As part of this project, we also aim to propose technical solutions to address some of the ethical concerns that are discovered.

One of F-Secure’s first tasks in this project was to conduct a study of security issues, dangers, and implications of the use of data analytics and artificial intelligence, which included assessing applications in the cyber security domain. The research project primarily examined:

  • ways in which machine learning systems are commonly mis-implemented (and recommendations on how to prevent this from happening)
  • ways in which machine learning models and algorithms can be adversarially attacked (and mitigations against such attacks)
  • how artificial intelligence and data analysis methodologies might be used for malicious purposes

The output of this task was a report that was written in collaboration with our partners. The document covers both technical and ethical implications of machine learning technologies and uses as they exist today, with some minor extrapolation into the future. The full document can be found here. It is quite a lengthy read, so we’ve decided to post a short series of articles that contain excerpts. There are four articles in this series (in addition to this introduction). Each article covers a different section in the final report. We’ve opted to keep this series technical, so if you’re interested in reading the ethical findings, you can find them in the original document. The articles cover the following topics:

  • Bad AI
    • This article details the types of flaws that can arise when developing machine learning models. It includes some advice on how to avoid introducing flaws while developing your own models.
  • Malicious use of AI
    • This article explores how machine-learning techniques and services that utilize machine learning techniques might be used for malicious purposes.
  • Adversarial attacks against AI
    • This article explains how attacks against machine learning models work, and provides a number of interesting examples of potential attacks against systems that utilize machine learning methodologies.
  • Mitigations against adversarial attacks
    • This article explores currently proposed methods for hardening machine learning systems against adversarial attacks.

Each article contains a link to the next, so if you wish to read the series in sequence, just follow the link at the end of this article. If one particular topic is  of interest to you, just follow one of the links above. Of course, you can also download the entire document for reading offline by going here. Enjoy!



Yesterday, a colleague of mine, Eero Kurimo, told me about something odd he’d seen on Twitter. Over the past few days, a number of pictures of cute puppies had shown up on his timeline as promoted tweets. Here’s an example:

“Mainostettu” is the Finnish word Twitter uses to denote that a tweet has been promoted. Eero checked a few of the promoted tweets to find out why they had shown up on his timeline. One of them was targeted at people over 35 years of age living in Finland:

Another had been promoted at men living in Finland.

Humorously, Twitter didn’t translate the word “men” from English to Finnish.

Clearly someone is buying ads on Twitter to promote tweets that contain pictures of puppies. We observed four different accounts promoting tweets in this manner, and in each case, the promoted tweet contained nothing more than an image. As we were digging around the Twitter interface, we noticed that, as we were typing one of the account names (“SabrinaTaranti”) into the search dialog, Twitter auto-suggested a number of accounts with different numbers appended to the end of the username (“SabrinaTaranti1”, “SabrinaTaranti2”, “SabrinaTaranti3”, etc.) Some of those accounts still existed. Others had been suspended. All of the related accounts we found contained a profile picture, and a written description. They were also all created in April 2019. Some of these accounts followed others. This was interesting enough to do a bit more digging.

I wrote a script to look at the followers and following of the users we identified, plus the multiple “SabrinaTaranti…” users, searching for other accounts that:

  • were created during April 2019
  • contained a location field
  • contained a description
  • contained a profile picture
  • whose last tweet was an image with no text

The script also contained logic to search for other accounts with numbers at the end of the username. If a username ended with a one- or two-digit number, the script attempted to locate accounts ending in all numbers from 1 to 30.

Based on the above criteria, I found a total of 65 active accounts of interest. All of them were set up to look like accounts owned by female users, with US-sounding names, and US-based locations.

Here are their profile pictures:

Some of the accounts had duplicated description fields. Here are some examples of the descriptions used:

And here is a collage of the images posted in their most recent tweets:

My script also found 113 related accounts that Twitter has already suspended:

Additionally, I found 56 accounts similar to the above, that weren’t suspended, but that weren’t behaving in a similar manner to the other sock puppet accounts. For instance, they didn’t have profile pictures, descriptions, or locations set, or their last tweet wasn’t a picture with no text. These accounts, however, had the same exact “name” field as the others. Obviously, whoever created these sock puppets has decided to create multiple identical personas, perhaps to evade mass suspension. Here’s an example of a few of those other accounts:

The 65 sock puppet accounts identified haven’t published many tweets (between 8 and 28). In almost all cases, these accounts started their lives retweeting travel-related content (from accounts such as @NatGeoTravel), and motivational/inspirational quotes before switching to tweeting pictures of cats and dogs (with no text). Some accounts have retweeted content in other languages, suggesting that perhaps these sock puppets are being used to target Twitter users in other countries via a similar tweet promotion scheme.

As to why the owner of these sock puppet accounts has been paying to promote pictures of puppies, my guess is that they’re doing it to gain engagement and followers, in order to make the accounts more “legit”. Since all of the sock puppet accounts contain female avatars, I would imagine that the choice to target male Twitter users is to gather followers. One might speculate that people over 35 are more likely to engage with pictures of puppies, and hence that was the reason to target people over 35. Finland probably has a rather small Twitter user base (it isn’t a very popular platform here), and hence putting out ads to target certain demographics across the whole of Finland probably doesn’t cost much money.

Due to the inexpensive and relatively inconspicuous nature of this targeted ads campaign, it might just be an experiment to see what works, what doesn’t, and what gets accounts suspended by Twitter’s automation.

Whatever the sock puppet’s owners are trying to do, they’ve somewhat succeeded – people are liking and retweeting the pictures, and some of those accounts have gained followers.

I have no idea what these accounts will evolve into. They may start promoting goods and services, they may be used to perpetrate scams, or they could even be used to spread disinformation. Whatever does end up happening, this whole thing smells really fishy.


Live Coverage Of A Disinformation Operation Against The 2019 EU Parliamentary Elections

I recently worked with investigative journalists from Yle, attempting to uncover disinformation on social media around the May 2019 European elections. This work was also part of F-Secure’s participation in the SHERPA project, which involves developing an understanding of adversarial attacks against machine learning systems – in this case, recommendation systems on social networks. My contribution to the project was to analyze Twitter traffic for manipulation and poisoning attacks.

In the initial phase of the project we ran a script to capture tweets matching any of the official EU election hashtags. We collected data over the course of about 10 days. While analyzing the collected data, we stumbled upon some highly political tweets that received a great deal of suspiciously inorganic engagement – the tweets had received more retweets than likes. Here’s an example of one of them:

Replies to that tweet show that it was a lie. The incident in the video didn’t happen in Italy, but the tweet attempted to present it that way. It worked – thousands of far-right accounts retweeted it.

Here’s another example. A few days ago this tweet had received over 6000 retweets.


While writing this article, both of the tweets above were deleted by the accounts’ owners.

The suspicious tweets we identified originated from two accounts – NewsCompact and PartisanDE. These accounts link to each other in their descriptions. For instance,  PartisanDE identifies itself as the CEO of NewsCompact. The language used in tweets published by these accounts is clearly not written by native English speakers. Both accounts receive a great deal of engagement from far-right Twitter profiles (many of which I’m familiar with from past research). They share a great deal of politically inflammatory content, racist content, and mistruths. The accounts also share many URLs to sources such as VoiceOfEurope, RT, Sputnik, and some other non-authoritative “news” domains.

Both accounts follow, and are followed by many very new Twitter accounts. For example, here’s a histogram of the ages of accounts following PartisanDE. The graph below shows that a great deal of accounts created during the last year follow PartisanDE. The graph for NewsCompact is very similar.

By the looks of it, both accounts readily follow back accounts that follow them. As such, they both follow very many new accounts, and the age distributions graphs of accounts they follow look very similar to the above. There is a clear overlap between their followers and followed.

PartisanDE is a relatively new account (about a year and a half old), whereas NewsCompact is six years old, and has built up a following, apparently in spurts, during that time, as illustrated below.

We decided to dig a little deeper into what these accounts were doing, so we set up a script to follow just the two accounts. The script received tweets from those accounts, and tweets where either of the accounts were mentioned (retweeted, replied to, quoted, or @mentioned). We allowed the script to run for a few days, and then took a look at the data it had gathered.

Visualizing the retweets interactions captured by our script shows that the accounts received engagement from largely separate sources, with a few (rather high volume) accounts (in between the two in the visualization below) interacting with both.

Mapping interactions with tweets gives this visualization:

In the above, tweets are denoted by the nodes with 8-digit numerical labels. Tweet IDs are much longer, so the node labels only contain the last 8 digits of the actual tweet ID (for readability). The thick blue lines denote tweets from PartisanDE. The thick brown lines denote tweets from NewsCompact. This visualization shows that, by-and-large, each tweet received engagement from a separate group of users, with only slight overlap – i.e. each published tweet received engagement (retweets) from a separate, unique group of accounts. This is especially obvious for the pink, green, and orange portions of the above graph.

In addition to plenty of engagement from “completely legitimate” pro-leave, football fan, father of two sons, love Europe, hate the EU profiles, we found that plenty of content from these two accounts was also being amplified by accounts that looked like they shouldn’t be engaging with that content. In one example, we witnessed accounts with Korean, Japanese, Arabic, and continential European language profiles, and accounts self-identifying as US Trump supporters retweeting content about Tommy Robinson. We took a look at the details of all accounts that had engaged with NewsCompact and PartisanDE in the collected data, and found quite a variety of Twitter profiles. Here are some examples.

We found several Korean language accounts that have published a very large number of tweets, like this one:

We found several Japanese accounts, including this one that that likes to frequently retweet McDonalds Japan, and content about Freemasons:

We found accounts that tweet a lot about crypto currencies, like this one:

We found this sockpuppet account that has a Finnish-looking name, is supposedly based in Paris. The account retweets content in a huge variety of languages.

The avatar picture on the JukkaIsorinne account successfully reverse-image searched directly on Google.

We also found this account that identifies as Indian, and has a keen interest in Russian military hardware:

Hopefully the above examples paint a picture of the variety of accounts that engage with NewsCompact and PartisanDE. All of these accounts retweeted highly political content related to the EU. However, this isn’t really the interesting part of the story. By sorting the list of accounts in our dataset by creation date, we found 200 accounts that were created on the 20th and 21st May 2019 (this Monday and Tuesday, so two days ago at time of writing). I uploaded a full list of the user IDs of these accounts to my github repo. Here’s a small random selection of them:

All of these accounts retweeted the same tweet from PartisanDE. The tweet includes a video stamped with the logo (the official campaign operation for the “leave” side of Brexit during the 2016 referendum):

As you may have noticed from the image above, at the time of writing, this tweet had received 388 retweets. Two hundred of those retweets came from the new accounts identified in this research. Most of the new accounts were similar. They mostly didn’t have followers or accounts following them, and had only published a handful of tweets. Here’s one example:

The accounts retweeted a variety of completely different content, and it is fairly obvious that they belong to a “purchase retweets” service. This indicates that PartisanDE paid for a service to retweet at least one of their politically motivated tweets at least 200 times.

Out of interest, we decided to check what else these 200 accounts had retweeted. Here are some of the other accounts they’ve retweeted in the past two days:

The list above includes verified accounts (such as justinsuntron, paulwrblanchard, and Mark_Beech), accounts selling products (inlcuding books and health supplements), people in the music industry, a fashion blogger, finance “experts”, “social influencers” (surprise, surprise), and even “CEOs”. All of the below are purchasing retweets from the same company as PartisanD. Apparently verified accounts can freely purchase retweets (and probably followers) with impunity.

We also found this interesting account that was created two days ago has already published almost 800 far-right tweets.

NewsCompact and PartisanDE were both in the top three most engaged accounts in the EU election conversation space on Twitter two weeks ago. This blog post conclusively illustrates that these two accounts are heavily fabricating engagement and, at least PartisanDE is also purchasing retweets. “Twitter Marketing” services that allow users to pay for retweets don’t seem to care about the political implications of the services they provide.

Since both accounts post content that is far-right, racist, and highly partisan, one has to wonder what effect they’ve had on the EU election conversation space on Twitter over the past few weeks. I wouldn’t be surprised if what I’ve found is just the tip of the iceberg.







Spam Trends: Top attachments and campaigns

Malware authors tend to prefer specific types of file attachments in their campaigns to distribute malicious content.  During our routine threat landscape monitoring in the last three months, we observed some interesting patterns about the attachment types that are being used in various campaigns.

In February and March, we saw huge spam campaigns using ZIP files to send out GandCrab ransomware, and  DOC and XLSM files to distribute Trickbot banking trojan. In the same time period, we saw a similarly large campaign targeting American Express, and a ‘Winner’ scam, both using PDF file attachments.

We also noticed a new trend of disc image files (ISO and IMG) being used to spread malware, with a few small campaigns distributing AgentTesla InfoStealer and NanoCore RAT.

To give some background or context, our spam feeds show that malware authors do use a variety of attachment types:

When we view the feeds as a time chart however, it’s clear that ZIPs, PDF, and MS office files such as DOC and XLSM file attachments were more commonly used in huge spam campaigns.

ZIP files deliver GandCrab ransomware

In February and March, there were huge spam campaigns using ZIP files to deliver GandCrab ransomware. The files were designed to appear to be sending a photo to someone.

The ZIP contains a obfuscated JavaScript downloader, which executes a PowerShell script that downloads and executes the GandCrab ransomware binary.

If the payload is successfully downloaded and executed, it then encrypts the victim’s machine and displays a ransomware note:


DOC and XLSM files deliver Trickbot

In March, there were also huge spikes in spam campaigns using DOC and XLSM files to deliver Trickbot – a modular banking trojan that is also capable of delivering other payloads we’ve been seeing before.

The office doc attachments contain a malicious macro which downloads and executes the payload using bitsamin tool.

On successful download and execution, the Trickbot sample starts execution and creates modules on the victim’s machine:

PDF files used for phishing targeting American Express

One of the highest spikes in the graph that used PDF is a phishing campaign targeting American Express during March.

When the PDF file is opened, it shows a link that leads the user to a “secure message” pretending to be from the American Express Business Card Customer Security Team.

The link leads the victim to a shortened URL ( from GoDaddy – a trick many other phishing campaigns have been using to steal banking credentials. A recent example from another campaign using the similar shortened URL is a phishing link targeting Bank of America.

PDF files used for ‘Winner’ scam 

The second-highest campaign that uses a PDF file attachment is a “Winner” scam from Google as shown below:

The scam asks the victim to provide personal details such as full name, address, country/nationality, telephone/mobile number, occupation, age/gender, and private email address.

(Newly rising players) ISO and IMG: AgentTesla and NanoCore RAT

Though it does not produce the spikes in certain file types seen in the spam campaigns mentioned above, since July 2018 we’ve also noted an increasingly popular trend of attackers using disc image files to deliver malware. We have seen campaigns using this technique delivering AgentTesla InfoStealer and NanoCore RAT.

Interestingly, we also have seen a recent spam campaign delivering two types of attachments: A malicious office doc and ISO image file – both installs an AgentTesla infostealer.

The malicious doc will execute a macro to download and execute the payload.

While the ISO file contains the malicious binary inside.

Regardless of which of the two attachment types the victim chooses to open, either will install AgentTesla – an infostealer that is capable of collecting the victim’s system information and credentials from popular installed software such as browsers, email clients, and ftp clients.

F-Secure customers are protected as we block all the detected threats even at early stages of infections by DeepGuard.

Indicators of Compromise (SHA1s)



American Express Phishing:

Google Scam:



Discovering Hidden Twitter Amplification

As part of the Horizon 2020 SHERPA project, I’ve been studying adversarial attacks against smart information systems (systems that utilize a combination of big data and machine learning). Social networks fall into this category – they’re powered by recommendation algorithms (often based on machine learning techniques) that process large amounts of data in order to display relevant information to users. As such, I’ve been trying to determine how attackers game these systems. This post is a follow-up to F-Secure’s recent report about brexit-related amplification.

In this article, I’d like to share methodology I’ve been developing to observe “behind the scenes” amplification on Twitter. I will illustrate how I have applied this methodology in an attempt to discover political disinformation around pro-leave brexit topics, and to hopefully further clarify how much overlap exists between accounts promoting far-right ideology in the US, and accounts pushing pro-leave ideology in the UK.

Step 1: Collect a list of candidate accounts

I wrote a simple crawler using the Twitter API. It does the following:

  • Pull a new target account from a queue of targets
  • Gather Twitter user objects of the accounts the target is following
  • For each user object, run a set of string matches and regular expressions against text fields (name, description, screen_name)
    • In this experiment, string searches included things like “yellowvest”, “hate the eu”, “istandwithtommy”, “voted brexit”, “qanon”, and “maga”.
  • Check the list of accounts followed against a pre-compiled list of roughly 500 influencer accounts. This list was collected over several months by hand-visiting Twitter accounts identified through data analysis and manual research.
  • If any of the above matched, add the account name to the queue (if it isn’t already on it, and if we haven’t already queried the account), and save the user object for later processing steps.
  • Also, save a node-edge graph representation of the network, as it is crawled
  • Repeat

The crawler was seeded with a couple of random troll accounts I found while browsing Twitter.

As with any Twitter account crawl, the process never finishes – the length of the queue grows faster than it can be consumed. I ran the crawler for just a couple of days (between April 2nd and 3rd 2019), during which it collected about 260,000 “interesting” Twitter user objects. For this experiment, I filtered the objects into a few different groups:

  • Filtering by account creation date, in order to find recently created accounts. (during, or after March 2019). This yielded a list of just over 2,000 accounts.
  • By filtering on strings in description and name fields, I collected lists of accounts self-associating with Tommy Robinson. This yielded a list of just over 900 accounts.
  • By filtering based on who the accounts were following, I collected a list of accounts that followed at least 10 of the pre-compiled list of influencer accounts. This yielded a list of just over 23,000 accounts.

Step 2: Observe candidate account behaviour

I wrote a second script that uses Twitter API’s statuses/filter functionality to follow the activity of a list of candidate accounts collected in the previous step. I ran this script with the first two candidate lists – Twitter’s standard API allows a maximum of 5,000 accounts to be followed in this way. Full tweet objects were pre-processed (abreviated to include only metadata I was interested in looking at), and written sequentially to disk for later processing. I allowed this script to run for a few days (between April 2nd and 3rd 2019), in order to capture a representative sample of the candidate accounts’ activities. I then used Python and Jupyter notebooks to analyze the collected data.


The “new accounts” group tweeted about a variety of topics, and retweeted accounts from both the US and UK. Here’s a graph visualization of their activity:

Names in larger fonts indicate accounts that were retweeted more often.

In comparison, the list of “tommy” accounts promoted very samey content, giving rise to this rather awkward visual:

When analyzing the collected data, I imediately noticed that both groups were promoting a few almost brand new accounts. Here’s the most prominent of them – BringUkip. Notice how the account’s pinned tweet doesn’t even spell UKip party leader Gerard Batten’s name correctly. I suspect this isn’t an official UKIP account.

During the (less than two days) collection period, close to 850 of the 2000 accounts in the “new accounts” group retweeted BringUkip almost 1800 times. In that same time span, over 600 of the approximately 900 accounts in the “tommy” group retweeted BringUkip close to 1000 times.

Here are some of the other brand new accounts being amplified by the “new accounts” group:

Some really surprising phenomena can be observed in the above chart. For instance, Fish_171, an account that was created on March 2nd 2019 (almost exactly a month ago) has over 12,000 followers (and follows almost 11,000 accounts). WillOfThePpl11, an account that was created on March 16th 2019 (about two weeks ago at time of writing) has published over 10,000 Tweets.

Here’s a similar chart, but for the “tommy” group. Not as much new account amplification was being done by those accounts:

Here are a few of the other new accounts being promoted by these groups. Red26478680 was created on March 30th 2019 (4 days ago at time of writing).

I happened to spot this account while browsing Twitter on my way into work this morning. It was promoting itself in a reply to Breaking911 (one of the accounts on my list of influencers):

Another hotly promoted account is HomeRuleNow, an account that was created on March 29th 2019 (5 days ago).

This account self-identifies with #Bluehand – a group that Twitter appears to have been actively clamping down on. This account accrued over 1000 followers in the five days since it was created.

Here’s one more – AGirlToOne:

This account was created on March 23rd 2019 (just over a week ago). It was retweeted by over 1000 of the approximately 2000 “new accounts” users during the collection period.

While browsing through a random selection of timelines in the “new accounts” group, I found some interesting things. Here is a tweet from one of the users (ianant4) discussing how to evade Twitter’s detection mechanisms:

I also found eggman25503141:

This account is quite bot-like. All tweets have the same format, starting with “YOU WONT SEE THIS ON #BBCNEWS”, an extra line break, and then a sentence, followed by a few pictures. All of this account’s tweets promote extreme anti-islamic content. Most of this account’s tweets have received no engagement. Except this one:

The above tweet received, at the time of capture, 48 replies, 374 retweets, and 254 likes. Many of the accounts collected in the first step of my collection process participated in retweeting the above.

For fun, here’s one of the seed accounts used.

The methodology outlined in this article works well due to the fact that the studied demographic uses plenty of self-identifying keywords in their Twitter descriptions. For other subject matter, gathering a candidate list may be more problematic.

Given that the account list I gathered was based on following-follower relationships between Twitter accounts, it is clear that many pro-brexit Twitter accounts and MAGA accounts follow each other. The content these accounts promote is, however, somewhat separate, and dependent on the account’s persona (MAGA versus brexit). Some accounts I looked at promoted both types of content, but they were in the minority. Most of the accounts observed in this research were/are being operated, at least in part, by actual human beings. In between heavily retweeting content, these accounts occasionally publish original tweets, converse with each other, and troll other people. I believe the results of this piece of research strongly prove the existence of a well-organized and potentially large network of individuals that are creating and operating multiple Twitter accounts in order to purposefully promote political content directly under our noses.

Mira Ransomware Decryptor

We investigated some recent Ransomware called Mira (Trojan:W32/Ransomware.AN) in order to check if it’s feasible to decrypt the encrypted files.

Most often, decryption can be very challenging because of missing keys that are needed for decryption. However, in the case of Mira ransomware, it appends all information required to decrypt an encrypted file into the encrypted file itself.

Encryption Process

The ransomware first initializes a new instance of the Rfc2898DeriveBytes class to generate a key. This class takes a password, salt, and iteration count.


The password is generated using the following information:

  • Machine name
  • OS Version
  • Number of processors


The salt, on the other hand, is generated by a Random Number Generator (RNG):


The ransomware then proceeds to use the Rijndael algorithm to encrypt files:


After encryption, it appends a ‘header‘ structure to the end of the file.

This header conveniently contains the salt and the password hash. In addition to that, the iteration count is hard-coded into the sample, in this case, the value was 20.

By retrieving the password, salt, and the iteration count from the ransomware itself, we were able to obtain all the information needed to create a decryption tool for the encrypted files.

Decryption tool

You can download our decryption tool from here.

Here’s a video of how you can use our tool:


A Hammer Lurking In The Shadows

And then there was ShadowHammer, the supply chain attack on the ASUS Live Update Utility between June and November 2018, which was discovered by Kaspersky earlier this year, and made public a few days ago. In short, this is how the trojanized Setup.exe works: An executable embedded in the Resources section has been overwritten by […]


Analysis of LockerGoga Ransomware

We recently observed a new ransomware variant (which our products detect as Trojan.TR/LockerGoga.qnfzd) circulating in the wild. In this post, we’ll provide some technical details of the new variant’s functionalities, as well as some Indicators of Compromise (IOCs). Overview Compared to other ransomware variants that use Window’s CRT library functions, this new variant relies heavily […]


Analysis Of Brexit-Centric Twitter Activity

This is a rather long blog post, so we’ve created a PDF for you to download, if you’d like to read it offline. You can download that from here. Executive Summary This report explores Brexit-related Twitter activity occurring between December 4, 2018 and February 13, 2019. Using the standard Twitter API, researchers collected approximately 24 […]


Why Social Network Analysis Is Important

I got into social network analysis purely for nerdy reasons – I wanted to write some code in my free time, and python modules that wrap Twitter’s API (such as tweepy) allowed me to do simple things with just a few lines of code. I started off with toy tasks, (like mapping the time of […]


NRSMiner updates to newer version

More than a year after the world first saw the Eternal Blue exploit in action during the May 2017 WannaCry outbreak, we are still seeing unpatched machines in Asia being infected by malware that uses the exploit to spread. Starting in mid-November 2018, our telemetry reports indicate that the newest version of the NRSMiner cryptominer, […]


Phishing Campaign targeting French Industry

We have recently observed an ongoing phishing campaign targeting the French industry. Among these targets are organizations involved in chemical manufacturing, aviation, automotive, banking, industry software providers, and IT service providers. Beginning October 2018, we have seen multiple phishing emails which follow a similar pattern, similar indicators, and obfuscation with quick evolution over the course […]


Ethics In Artificial Intelligence: Introducing The SHERPA Consortium

In May of this year, Horizon 2020 SHERPA project activities kicked off with a meeting in Brussels. F-Secure is a partner in the SHERPA consortium – a group consisting of 11 members from six European countries – whose mission is to understand how the combination of artificial intelligence and big data analytics will impact ethics […]


Spam campaign targets Exodus Mac Users

We’ve seen a small spam campaign that attempts to target Mac users that use Exodus, a multi-cryptocurrency wallet. The theme of the email focuses mainly on Exodus. The attachment was “” and the sender domain was “update-exodus[.]io”, suggesting that it wanted to associate itself to the organization. It was trying to deliver a fake Exodus […]


Value-Driven Cybersecurity

Constructing an Alliance for Value-driven Cybersecurity (CANVAS) launched ~two years ago with F-Secure as a member. The goal of the EU project is “to unify technology developers with legal and ethical scholars and social scientists to approach the challenge of how cybersecurity can be aligned with European values and fundamental rights.” (That’s a mouthful, right?) […]


Taking Pwnie Out On The Town

Black Hat 2018 is now over, and the winners of the Pwnie Awards have been published. The Best Client-Side Bug was awarded to Georgi Geshev and Rob Miller for their work called “The 12 Logic Bug Gifts of Christmas.” Georgi and Rob work for MWR Infosecurity, which (as some of you might remember) was acquired by F-Secure […]