I got into social network analysis purely for nerdy reasons – I wanted to write some code in my free time, and python modules that wrap Twitter’s API (such as tweepy) allowed me to do simple things with just a few lines of code. I started off with toy tasks, (like mapping the time of day that @realDonaldTrump tweets) and then moved onto creating tools to fetch and process streaming data, which I used to visualize trends during some recent elections.
The more I work on these analyses, the more I’ve come to realize that there are layers upon layers of insights that can be derived from the data. There’s data hidden inside data – and there are many angles you can view it from, all of which highlight different phenomena. Social network data is like a living organism that changes from moment to moment.
Perhaps some pictures will help explain this better. Here’s a visualization of conversations about Brexit that happened between between the 3rd and 4th of December, 2018. Each dot is a user, and each line represents a reply, mention, or retweet.
Tweets supportive of the idea that the UK should leave the EU are concentrated in the orange-colored community at the top. Tweets supportive of the UK remaining in the EU are in blue. The green nodes represent conversations about UK’s Labour party, and the purple nodes reflect conversations about Scotland. Names of accounts that were mentioned more often have a larger font.
Here’s what the conversation space looked like between the 14th and 15th of January, 2019.
Notice how the shape of the visualization has changed. Every snapshot produces a different picture, that reflects the opinions, issues, and participants in that particular conversation space, at the moment it was recorded. Here’s one more – this time from the 20th to 21st of January, 2019.
Every interaction space is unique. Here’s a visual representation of interactions between users and hashtags on Twitter during the weekend before the Finnish presidential elections that took place in January of 2018.
And here’s a representation of conversations that happened in the InfoSec community on Twitter between the 15th and 16th of March, 2018.
I’ve been looking at Twitter data on and off for a couple of years now. My focus has been on finding scams, social engineering, disinformation, sentiment amplification, and astroturfing campaigns. Even though the data is readily available via Twitter’s API, and plenty of the analysis can be automated, oftentimes finding suspicious activity just involves blind luck – the search space is so huge that you have to be looking in the right place, at the right time, to find it. One approach is, of course, to think like the adversary. Social networks run on recommendation algorithms that can be probed and reverse engineered. Once an adversary understands how those underlying algorithms work, they’ll game them to their advantage. These tactics share many analogies with search engine optimization methodologies. One approach to countering malicious activities on these platforms is to devise experiments that simulate the way attackers work, and then design appropriate detection methods, or countermeasures against these. Ultimately, it would be beneficial to have automation that can trace suspicious activity back through time, to its source, visualize how the interactions propagated through the network, and provide relevant insights (that can be queried using natural language). Of course, we’re not there yet.
The way social networks present information to users has changed over time. In the past, Twitter feeds contained a simple, sequential list of posts published by the accounts a user followed. Nowadays, Twitter feeds are made up of recommendations generated by the platform’s underlying models – what they understand about a user, and what they think the user wants to see.
A potentially dystopian outcome of social networks was outlined in a blog post written by François Chollet in May 2018, which he describes social media becoming a “psychological panopticon”.
The premise for his theory is that the algorithms that drive social network recommendation systems have access to every user’s perceptions and actions. Algorithms designed to drive user engagement are currently rather simple, but if more complex algorithms (for instance, based on reinforcement learning) were to be used to drive these systems, they may end up creating optimization loops for human behavior, in which the recommender observes the current state of each target (user) and keeps tuning the information that is fed to them, until the algorithm starts observing the opinions and behaviors it wants to see. In essence the system will attempt to optimize its users. Here are some ways these algorithms may attempt to “train” their targets:
Chollet goes on to mention that, although social network recommenders may start to see their users as optimization problems, a bigger threat still arises from external parties gaming those recommenders in malicious ways. The data available about users of a social network can already be used to predict when a when a user is suicidal or when a user will fall in love or break up with their partner, and content delivered by social networks can be used to change users’ moods. We also know that this same data can be used to predict which way a user will vote in an election, and the probability of whether that user will vote or not.
If this optimization problem seems like a thing of the future, bear in mind that, at the beginning of 2019, YouTube made changes to their recommendation algorithms exactly because of problems it was causing for certain members of society. Guillaume Chaslot posted a Twitter thread in February 2019 that described how YouTube’s algorithms favored recommending conspiracy theory videos, guided by the behaviors of a small group of hyper-engaged viewers. Fiction is often more engaging than fact, especially for users who spend all day, every day watching YouTube. As such, the conspiracy videos watched by this group of chronic users received high engagement, and thus were pushed up the recommendation system. Driven by these high engagement numbers, the makers of these videos created more and more content, which was, in-turn, viewed by this same group of users. YouTube’s recommendation system was optimized to pull more and more users into a hole of chronic YouTube addiction. Many of the users sucked into this hole have since become indoctrinated with right-wing extremist views. One such user actually became convinced that his brother was a lizard, and killed him with a sword. Chaslot has since created a tool that allows users to see which of these types of videos are being promoted by YouTube.
Social engineering campaigns run by entities such as the Internet Research Agency, Cambridge Analytica, and the far-right demonstrate that social media advert distribution platforms (such as those on Facebook) have provided a weapon for malicious actors that is incredibly powerful, and damaging to society. The disruption caused by their recent political campaigns has created divides in popular thinking and opinion that may take generations to repair. Now that the effectiveness of these social engineering techniques is apparent, I expect what we’ve seen so far is just an omen of what’s to come.
The disinformation we hear about is only a fraction of what’s actually happening. It requires a great deal of time and effort for researchers to find evidence of these campaigns. As I already noted, Twitter data is open and freely available, and yet it can still be extremely tedious to find evidence of disinformation campaigns on that platform. Facebook’s targeted ads are only seen by the users who were targeted in the first place. Unless those who were targeted come forward, it is almost impossible to determine what sort of ads were published, who they were targeted at, and what the scale of the campaign was. Although social media platforms now enforce transparency on political ads, the source of these ads must still be determined in order to understand who’s being targeted, and by what content.
Many individuals on social networks share links to “clickbait” headlines that align with their personal views or opinions (sometimes without having read the content behind the link). Fact checking is uncommon, and often difficult for people who don’t have a lot of time on their hands. As such, inaccurate or fabricated news, headlines, or “facts” propagate through social networks so quickly that even if they are later refuted, the damage is already done. This mechanism forms the very basis of malicious social media disinformation. A well-documented example of this was the UK’s “Leave” campaign that was run before the Brexit referendum. Some details of that campaign are documented in the recent Channel 4 film: “Brexit: The Uncivil War”.
Its not just the engineers of social networks that need to understand how they work and how they might be abused. Social networks are a relatively new form of human communication, and have only been around for a few decades. But they’re part of our everyday lives, and obviously they’re here to stay. Social networks are a powerful tool for spreading information and ideas, and an equally powerful weapon for social engineering, disinformation, and propaganda. As such, research into these systems should be of interest to governments, law enforcement, cyber security companies and organizations that seek to understand human communications, culture, and society.
The potential avenues of research in this field are numerous. Whilst my research with Twitter data has largely focused on graph analysis methodologies, I’ve also started experimenting with natural language processing techniques, which I feel have a great deal of potential.
We don’t yet know how much further social networks will integrate into society. Perhaps the future will end up looking like the “Majority Rule” episode of The Orville, or the “Nosedive” episode of Black Mirror, both of which depict societies in which each individual’s social “rating” determines what they can and can’t do and where a low enough rating can even lead to criminal punishment.