Unicode Phishing Domains Rediscovered

There is a variant of phishing attack that is currently receiving a lot of attention in the security community. It’s called the IDN homograph attack, and it takes advantage of the fact that many different Unicode characters look alike. Allowing Unicode in domain names makes websites easier to spoof, because an internationalized domain name (IDN) displayed in a web browser may look indistinguishable from the legitimate one. For example, the Unicode character U+0117, Latin small letter E with dot above (ė), looks similar to the ASCII Latin small letter E. Hence an attacker can register xn--f-secur-z8a.com and serve labsblog.xn--f-secur-z8a.com, which is the Punycode-encoded form of labsblog.f-securė.com.
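To see the encoding in action, here is a small Python sketch using the standard library's built-in `idna` codec. That codec implements the older IDNA 2003 rules; modern browsers follow newer standards, but for a simple label like this one the Punycode output should be the same.

```python
import unicodedata

# U+0117 LATIN SMALL LETTER E WITH DOT ABOVE, written as an escape so the
# lookalike character is unambiguous in the source.
spoofed = "labsblog.f-secur\u0117.com"

# The built-in "idna" codec Punycode-encodes every non-ASCII label and
# prefixes it with "xn--"; pure-ASCII labels pass through unchanged.
ascii_form = spoofed.encode("idna").decode("ascii")

# Decoding reverses the process, turning the xn-- label back into Unicode.
round_trip = ascii_form.encode("ascii").decode("idna")

print(ascii_form)                           # labsblog.xn--f-secur-z8a.com
print(round_trip == spoofed)                # True
print(unicodedata.name("\u0117"))           # LATIN SMALL LETTER E WITH DOT ABOVE
print(spoofed == "labsblog.f-secure.com")   # False: the strings differ by one code point
```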

This topic has already been thoroughly discussed. Security researchers have been warning about it for over a decade, but only relatively recently has it gained wider attention, including from the bad guys. To trace this dangerous trend, we’re going to combine the DNS reconnaissance tool dnstwist (which I created some time ago) with some command-line kung fu to gather and analyze the information we find.

Grabbing the data

We will start by pulling the list of the most popular websites worldwide published by Alexa Internet. It seems to be a good representative group, because the sites at the very top of the list should be tempting targets for phishing attacks.

The ZIP file contains a million domain names, so we’ll narrow it down to a manageable set: the top 100. This gives us something that looks like this:

[Figure: Grabbing Alexa Top100]
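The narrowing step can be sketched in Python as well. This is a minimal sketch, assuming the archive holds a single CSV member named `top-1m.csv` with one `rank,domain` row per line, already sorted by rank; that layout is an assumption based on how the list used to be distributed (Alexa has since retired the service). The `top_domains` helper is hypothetical and not part of dnstwist.

```python
import csv
import io
import zipfile


def top_domains(zip_bytes: bytes, n: int = 100, member: str = "top-1m.csv") -> list[str]:
    """Return the first n domains from a ranked CSV stored inside a ZIP archive.

    Assumes (hypothetically) one 'rank,domain' row per line, sorted by rank.
    """
    domains = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(member) as raw:
            # ZipFile.open yields bytes; wrap it so the csv module gets text.
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            for row in reader:
                if len(domains) >= n:
                    break  # stop early instead of parsing the whole file
                if len(row) >= 2:
                    domains.append(row[1])
    return domains


# Usage with a small synthetic archive (made-up rows, not real rankings).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("top-1m.csv", "1,example.com\n2,example.net\n3,example.org\n")

print(top_domains(buf.getvalue(), n=2))  # ['example.com', 'example.net']
```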

Finding Unicode phishing domains

We will use dnstwist, which provides a convenient way of generating domain name variations using a range of techniques, including Unicode homographs. The idea is quite straightforward: the tool takes the previously prepared list of 100 domains as seeds, generates a list of potential phishing domains for each one, and then queries WHOIS servers for their registration dates.

[Figure: dnstwist data extracted]
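dnstwist has its own, much larger homoglyph tables and also performs the WHOIS lookups. The sketch below only illustrates the underlying idea under simplifying assumptions: a hypothetical three-letter `HOMOGLYPHS` table, substitution of a single character in the first label, and no network queries. It is not dnstwist's implementation.

```python
# HYPOTHETICAL, deliberately tiny homoglyph table for illustration only;
# dnstwist ships its own, far larger tables.
HOMOGLYPHS = {
    "a": ["\u00e0", "\u00e1", "\u0430"],  # à, á, CYRILLIC SMALL LETTER A
    "e": ["\u00e9", "\u0117", "\u0435"],  # é, ė, CYRILLIC SMALL LETTER IE
    "o": ["\u00f3", "\u043e"],            # ó, CYRILLIC SMALL LETTER O
}


def homograph_variants(domain: str) -> list[str]:
    """Swap one character of the first label for a lookalike; return Punycode forms."""
    name, _, rest = domain.partition(".")  # e.g. "f-secure" and "com"
    variants = set()
    for i, ch in enumerate(name):
        for glyph in HOMOGLYPHS.get(ch, []):
            candidate = f"{name[:i]}{glyph}{name[i + 1:]}.{rest}"
            # Encode to the ASCII (xn--) form that would actually be registered.
            variants.add(candidate.encode("idna").decode("ascii"))
    return sorted(variants)


for v in homograph_variants("f-secure.com"):
    print(v)
```

For `f-secure.com`, the two letters `e` times three lookalikes yield six candidates, one of which is `xn--f-secur-z8a.com` from the example at the top of this post.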

An hour later we have 100 files, each named after the corresponding seed domain. Since we’re focusing on Unicode domains, we keep only the domain names that start with the xn-- prefix once encoded with Punycode. The data is comma-delimited, so we cut out the registration date column. Finally, we group the dates by year and count the occurrences in order to plot a nice graph.
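The original pipeline used shell tools; the same filtering and counting can be sketched in Python. The column names (`domain`, `created`) and the ISO `YYYY-MM-DD` date format are assumptions: adjust them to whatever header your dnstwist version writes and whatever date format the WHOIS servers return. Both helpers are hypothetical.

```python
import csv
import io
from collections import Counter


def is_unicode_domain(domain: str) -> bool:
    """True if any label of the domain is Punycode (xn--) once IDNA-encoded."""
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # not a valid IDNA name; skip it
    return any(label.lower().startswith("xn--") for label in ascii_domain.split("."))


def registrations_per_period(csv_text: str, domain_col: str = "domain",
                             date_col: str = "created", width: int = 4) -> list[tuple[str, int]]:
    """Count registration dates of Unicode domains, grouped by a date prefix.

    Assumes ISO 'YYYY-MM-DD' dates: width=4 groups by year, width=7 by month.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        date = (row.get(date_col) or "").strip()
        # Keep only xn-- domains that actually have a registration date.
        if is_unicode_domain(row.get(domain_col) or "") and len(date) >= width:
            counts[date[:width]] += 1
    return sorted(counts.items())


# Synthetic rows (made-up data) covering Unicode, Punycode, ASCII and a missing date.
sample = (
    "domain,created\n"
    "xn--f-secur-z8a.com,2016-03-01\n"
    "f-secur\u00e9.com,2017-05-02\n"
    "f-secure.com,2015-01-01\n"
    "xn--f-secur-z8a.com,\n"
)
print(registrations_per_period(sample))           # [('2016', 1), ('2017', 1)]
print(registrations_per_period(sample, width=7))  # [('2016-03', 1), ('2017-05', 1)]
```

Encoding every name with IDNA before checking the prefix means the filter works whether the tool stored the domain in Unicode or already in Punycode.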


The data collected clearly shows that attackers have been using Unicode-based domains for a long time.

[Figure: Unicode phishing domains by month]

The top three phishing targets are Google, Facebook and Amazon.

[Figure: Unicode phishing domains by target]
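Ranking the targets is the same kind of aggregation, keyed by seed domain instead of by date. A minimal sketch, assuming the per-domain output files have already been parsed into a hypothetical mapping from each seed domain to the generated domains found to be registered:

```python
from collections import Counter


def is_unicode_domain(domain: str) -> bool:
    """True if any label of the domain is Punycode (xn--) once IDNA-encoded."""
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    return any(label.lower().startswith("xn--") for label in ascii_domain.split("."))


def rank_targets(results: dict[str, list[str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank seed domains by how many registered Unicode lookalikes each has."""
    counts = Counter({target: sum(is_unicode_domain(d) for d in found)
                      for target, found in results.items()})
    return counts.most_common(top)


# Synthetic input (made-up data): seed domain -> registered lookalike domains.
results = {
    "example.com": ["ex\u00e4mple.com", "ex\u0430mple.com", "examp1e.com"],
    "example.net": ["\u0435xample.net"],
    "example.org": [],
}
print(rank_targets(results, top=2))  # [('example.com', 2), ('example.net', 1)]
```

Note that `examp1e.com` is a plain ASCII lookalike and is therefore not counted, since only Unicode (xn--) domains are of interest here.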

Because phishing domains are short-lived, and because we lack data covering a longer period, it is difficult to demonstrate a clear upward trend. However, given the recent interest in the subject, it is reasonable to expect that attacks of this nature will become more frequent.

Side note

While conducting this research, we came across a domain hosting an active phishing site that appears to target Facebook users in China. We have notified Facebook’s security team about this incident.

[Figure: Facebook .cn phish]
