New study: Phishing Landscape 2020

My colleagues Greg Aaron, Dr. Colin Strutt, Lyman Chapin and I have published a new research report, Phishing Landscape 2020: A Study of the Scope and Distribution of Phishing.

The report can be found at http://www.interisle.net/PhishingLandscape2020.html

Our goal in this study was to capture and analyze a large set of information about phishing attacks, to better understand how much phishing is taking place and where, and to see if the data suggests better ways to fight phishing. We studied where phishers get the resources they need to perpetrate their crimes: where they obtain domain names, and what web hosting they use. We identify where additional phishing detection and mitigation efforts are needed, and we can identify vulnerable providers.

We collected URLs, domain names, IP addresses, and other data about phishing attacks from four widely used and respected threat data providers: the Anti-Phishing Working Group (APWG), OpenPhish, PhishTank, and Spamhaus. (We greatly appreciate the cooperation of these providers.)

Over a three-month collection period, we learned about more than 100,000 newly discovered phishing sites.

Our major findings and conclusions are based on the data we collected:

  1. Most phishing is concentrated at small numbers of domain registrars, domain registries, and hosting providers.
  2. Phishers themselves register more than half of the domain names on which phishing occurs.
  3. Domain name registrars and registry operators can prevent and mitigate large amounts of phishing by finding and suspending maliciously registered domains.
  4. Registries, registrars, and hosting providers should focus on both mitigation and prevention.
  5. The problem of phishing is bigger than is reported, and the exact size of the problem is unknown. This is due to gaps in detection and in data sharing. The over-redaction of contact data in WHOIS is contributing to the under-detection problem.
  6. Sixty-five percent of maliciously registered domain names are used for phishing within five days of registration.
  7. New top-level domains introduced since 2014 account for 9% of all registered domain names, but 18% of the domain names used for phishing.
  8. About 9% of phishing occurs at a small set of providers that offer subdomain services.

The data set that we collected for this study is quite interesting. Key statistics:

  • 298,012 phishing reports. This is the number of URLs and domains that were added to the four feeds during the study period. Duplicates, i.e., URLs reported separately by one or more of the sources, were removed.
  • 122,092 phishing attacks. From the reports, we identified phishing sites (web locations) that each targeted a specific brand or entity. We call these "attacks" (the methodology describes how we identified attacks).
  • 99,412 unique domain names. This is the number of unique "registrations", e.g., second-level domain names and third-level domain names where the relevant registry offers third-level registrations (such as domain.co.uk).
  • 439 top-level domains. This is the number of TLDs where at least one phish was reported.
  • 414 registrars. This is the number of registrars that sponsored gTLD domains that were used for phishing. (A registrar is a business that processes domain registrations.)
  • 2,169 different Autonomous Systems (AS). This is the number of Autonomous Systems where a phish was reported in at least one IP space delegated from the AS. 
  • 619 attacks on URLs that contained IPv4 addresses and no domain name.
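
To make the counting above concrete, here is a minimal sketch of how reports from several feeds might be merged, de-duplicated, and reduced to registered domains. It is only an illustration, not the code used in the study: the feed names, the sample URLs, the helper names (merge_feeds, registered_domain), and the one-entry suffix set are all hypothetical, and a real implementation would consult the full Public Suffix List. It also assumes every URL includes a scheme such as http://.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical stand-in for a real public suffix list: suffixes under which
# the registry offers third-level registrations (e.g. domain.co.uk).
THIRD_LEVEL_SUFFIXES = {"co.uk"}


def merge_feeds(feeds):
    """Map each unique URL to the set of feed names that reported it."""
    seen = {}
    for name, urls in feeds.items():
        for url in urls:
            seen.setdefault(url.strip(), set()).add(name)
    return seen


def registered_domain(url):
    """Return the registered domain for a URL, or None for bare IP addresses."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return None
    try:
        ipaddress.ip_address(host)
        return None  # URL contains an IP address and no domain name
    except ValueError:
        pass
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in THIRD_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


# Made-up sample data using reserved example names and addresses.
feeds = {
    "feed_a": ["http://login.example.co.uk/a", "http://shop.example.com/x"],
    "feed_b": ["http://login.example.co.uk/a", "http://192.0.2.10/pay"],
}
reports = merge_feeds(feeds)
domains = {registered_domain(u) for u in reports} - {None}
ip_only = [u for u in reports if registered_domain(u) is None]
```

In this toy data, the URL reported by both feeds is counted once, and the IP-only URL is counted separately from the domain names, mirroring the distinction drawn in the list above.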

Of the 99,412 domains used for phishing, we identified 60,935 that we believe were registered maliciously, by phishers. The rest were “compromised domains,” owned by innocent parties on vulnerable hosting.
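
For readers who want the proportions, the arithmetic behind "more than half" can be checked with a few lines. This is just a sketch using the two figures quoted above; the variable names are mine, and the compromised count is simply the remainder.

```python
# Figures quoted in this post
phishing_domains = 99_412   # unique domain names used for phishing
malicious = 60_935          # believed to be registered by phishers

compromised = phishing_domains - malicious        # the remainder
malicious_share = malicious / phishing_domains
compromised_share = compromised / phishing_domains

print(f"maliciously registered: {malicious:,} ({malicious_share:.1%})")
print(f"compromised: {compromised:,} ({compromised_share:.1%})")
```

That works out to roughly 61% maliciously registered and 39% compromised.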

During our collection period, phishers targeted a whopping 684 brands; that is, the phishing sites emulated 684 different entities. The most-attacked targets identified by our data sources were, in alphabetical order: Amazon, Apple, AT&T, Chase, Facebook, LinkedIn, Microsoft, Outlook (owned by Microsoft), PayPal, and WhatsApp. These top ten targets suffered 50% of the identified phishing attacks.

One of our most important findings is that:

"Domain name registrars and registry operators are [ ] in an excellent position to find and prevent the majority of phishing, which takes place on maliciously registered domains. It is possible for registrars and registry operators to identify maliciously registered phishing domains with a high degree of accuracy, often at the time of registration... Registrars also possess dispositive information that no one else does: the registrant’s identity (contact information, now mostly redacted in public WHOIS as allowed by a recent change in ICANN policy), the registrant’s payment information, the registrant’s IP address, and the registrant’s purchase history."

From this finding, I'll suggest that:

With great (or in this case, unique) power comes great responsibility. 

ICANN should contemplate carefully how its WHOIS policy has affected phishing and other cyber attack response and mitigation. Registrars and registries are now the only parties who can reliably access contact data in a timely manner (e.g., in minutes or hours). In our report, we describe proactive or preemptive measures that ICANN's contracted parties could adopt to quash some phishing attacks before harm is done. It's time to step up your game.