Anti-Malware and Anti-phishing

In New Study, Interisle Reveals Excessive Withholding of Internet WHOIS Data

My Interisle colleagues, together with Greg Aaron, have completed an in-depth analysis of the effects of ICANN policy for WHOIS, a public lookup service that until recently made it possible to identify who registered and controls a domain name.

The European Union’s General Data Protection Regulation (GDPR), which took effect in May 2018, restricted the publication of personally identifiable data in WHOIS. In response, the Internet Corporation for Assigned Names and Numbers (ICANN) established a new policy that allows registrars and registry operators to redact (withhold) personally identifiable data from publication in WHOIS. The implementation of this policy has been widely criticized, in particular for failing to distinguish between legal entities and natural persons and for failing to limit redaction to parties operating or residing within the EU's jurisdiction. This over-redaction is alleged to interfere with parties who have legitimate reasons to contact domain owners (e.g., to notify a victim of a phishing attack) or who are investigating the thousands of domains used daily to perpetrate fraud (business email compromise) or extortion (ransomware), or to foment political unrest (state-sponsored election interference) or social uncertainty (anti-science rhetoric).

In 2013, ICANN commissioned NORC/University of Chicago to conduct a WHOIS Registrant Identification Study. Despite the obvious benefit of having more recent data to inform policy, ICANN avoided studying the "demographics" of domain name registrations and instead allowed its community to develop policy without answers to the following relevant and compelling questions:

  1. What percentage of gTLD domains have actual registrant data on record?
  2. What percentage of gTLD domains are under privacy/proxy services, and which services?
  3. What percentage of gTLD domains have contact data that is redacted/hidden under ICANN’s Temporary Specification?
  4. What percentage of gTLD domains have redacted contact data but are not subject to GDPR? 
  5. What percentages of gTLD registrants are natural versus legal persons? Of these, how many are inside versus outside the jurisdiction of the European Union? What is the relative percentage of privacy/proxy use among legal persons?
  6. What are the percentages for gTLD domains registered for malicious purposes (cybercrimes such as malware and phishing)?

We adopted the NORC methodology terms of reference and conducted our own study to answer these questions in our WHOIS Contact Data Availability and Registrant Classification Study, where we also compare the answers to questions 1-6 above with the state that existed in early 2018, before the GDPR and ICANN’s ill-advised policy took effect.

Some takeaways from the study:

ICANN’s GDPR-driven policy has resulted in the redaction of contact data for 57% of all generic Top-level Domain (gTLD) names.

ICANN’s policy has allowed registrars and registry operators to hide much more contact data than is required by the GDPR—perhaps five times as much.

Including “proxy-protected” domains, for which the identity of the domain owner is deliberately concealed, 86.5% of registrants can no longer be identified via WHOIS—up from 24% before the ICANN policy went into effect.

The implications of this ICANN policy change are profound: consumers can no longer use WHOIS to confirm the identities of parties they may want to transact with on the Internet, it is harder for law enforcement personnel and security professionals to identify criminals and cybercrime victims, and brand owners face greater challenges defending against misuse of their intellectual property.

We hope that our study gives policy decision makers, regulators, and legislators the basis to make more informed policy or, if need be, to impose regulatory obligations that (i) continue to offer GDPR privacy protections to the intended parties and (ii) end the needless suppression of contact data that is needed to maintain a secure and interoperable Internet.

New study: Phishing Landscape 2020

My colleagues Greg Aaron, Dr. Colin Strutt, Lyman Chapin and I have published a new research report, Phishing Landscape 2020: A Study of the Scope and Distribution of Phishing.

The report can be found at

Our goal in this study was to capture and analyze a large set of information about phishing attacks, to better understand how much phishing is taking place and where it is taking place, and to see if the data suggests better ways to fight phishing. We studied where phishers are getting the resources they need to perpetrate their crimes: where they obtain domain names, and what web hosting they use. We identify where additional phishing detection and mitigation efforts are needed and which providers are vulnerable.

We collected URLs, domain names, IP addresses, and other data about phishing attacks from four widely used and respected threat data providers: the Anti-Phishing Working Group (APWG), OpenPhish, PhishTank, and Spamhaus. (We greatly appreciate the cooperation of these providers.)

Over a three-month collection period, we learned about more than 100,000 newly discovered phishing sites.

Our major findings and conclusions are based on the data we collected:

  1. Most phishing is concentrated at small numbers of domain registrars, domain registries, and hosting providers.
  2. Phishers themselves register more than half of the domain names on which phishing occurs.
  3. Domain name registrars and registry operators can prevent and mitigate large amounts of phishing by finding and suspending maliciously registered domains.
  4. Registries, registrars, and hosting providers should focus on both mitigation and prevention.
  5. The problem of phishing is bigger than is reported, and the exact size of the problem is unknown. This is due to gaps in detection and in data sharing. The over-redaction of contact data in WHOIS is contributing to the under-detection problem.
  6. Sixty-five percent of maliciously registered domain names are used for phishing within five days of registration.
  7. New top-level domains introduced since 2014 account for 9% of all registered domain names, but 18% of the domain names used for phishing.
  8. About 9% of phishing occurs at a small set of providers that offer subdomain services.

The data set that we collected for this study is quite interesting. Key statistics:

  • 298,012 phishing reports. This is the number of URLs and domains that were added to the four feeds during the study period. Duplicates, i.e., URLs reported separately by one or more of the sources, were removed.
  • 122,092 phishing attacks. From the reports, we identified phishing sites (web locations) that each targeted a specific brand or entity. We call these "attacks" (the methodology describes how we identified attacks).
  • 99,412 unique domain names. This is the number of unique "registrations", e.g., second-level domain names and third-level domain names where the relevant registry offers third-level registrations (such as
  • 439 top-level domains. This is the number of TLDs where at least one phish was reported.
  • 414 registrars. This is the number of registrars that sponsored gTLD domains that were used for phishing. (A registrar is a business that processes domain registrations.)
  • 2,169 different Autonomous Systems (AS). This is the number of Autonomous Systems where a phish was reported in at least one IP space delegated from the AS. 
  • 619 attacks on URLs that contained IPv4 addresses and no domain name.
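Two of the bookkeeping steps behind these statistics, removing duplicate reports across feeds and spotting URLs whose host is a bare IPv4 address, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual method: the feed names and URLs are invented sample data, and exact string matching is only one possible definition of "duplicate" (the report's methodology section describes what was really done).

```python
import ipaddress
from typing import Dict, List, Set
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase host part of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def is_ipv4_host(url: str) -> bool:
    """True when the URL's host is an IPv4 literal rather than a domain name."""
    try:
        ipaddress.IPv4Address(host_of(url))
        return True
    except ipaddress.AddressValueError:
        return False


def merge_feeds(feeds: Dict[str, List[str]]) -> Set[str]:
    """Union the URLs from all feeds so that each URL is counted once."""
    merged: Set[str] = set()
    for urls in feeds.values():
        merged.update(u.strip() for u in urls)
    return merged


# Invented sample data; the feed names and URLs are placeholders.
feeds = {
    "feed_a": ["http://login.example.com/verify", "http://192.0.2.10/paypal/"],
    "feed_b": ["http://login.example.com/verify", "http://secure.example.net/a"],
}

reports = merge_feeds(feeds)
ip_only = sorted(u for u in reports if is_ipv4_host(u))

print(len(reports), "unique reports")
print(ip_only)
```

With this sample, the URL reported by both feeds is counted once, and the single URL hosted on an IP address is separated from those that use a domain name.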

Of the 99,412 domains used for phishing, we identified 60,935 that we believe were registered maliciously by phishers. The rest were “compromised domains,” owned by innocent parties on vulnerable hosting.
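As a quick check on how these two counts relate to the finding that phishers register more than half of the domains used for phishing, the arithmetic can be worked through as follows. The variable names are chosen for this illustration only; the two counts are the ones reported above.

```python
# Counts reported in the study summary above.
PHISHING_DOMAINS = 99_412      # unique domain names used for phishing
MALICIOUS_DOMAINS = 60_935     # believed to be registered by phishers

# Everything not maliciously registered is treated as compromised.
compromised_domains = PHISHING_DOMAINS - MALICIOUS_DOMAINS

malicious_share = MALICIOUS_DOMAINS / PHISHING_DOMAINS
compromised_share = compromised_domains / PHISHING_DOMAINS

print(f"Maliciously registered: {MALICIOUS_DOMAINS:,} ({malicious_share:.1%})")
print(f"Compromised:            {compromised_domains:,} ({compromised_share:.1%})")
```

Roughly 61% of the phishing domains were maliciously registered and roughly 39% were compromised.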

During our collection period, phishers targeted a whopping 684 brands; that is, the phishing sites emulated 684 different entities. The most-attacked targets identified by our data sources were, in alphabetical order: Amazon, Apple, AT&T, Chase, Facebook, LinkedIn, Microsoft, Outlook (owned by Microsoft), PayPal, and WhatsApp. These top ten targets accounted for 50% of the identified phishing attacks.

One of our most important findings is that:

"Domain name registrars and registry operators are [ ] in an excellent position to find and prevent the majority of phishing, which takes place on maliciously registered domains. It is possible for registrars and registry operators to identify maliciously registered phishing domains with a high degree of accuracy, often at the time of registration... Registrars also possess dispositive information that no one else does: the registrant’s identity (contact information, now mostly redacted in public WHOIS as allowed by a recent change in ICANN policy), the registrant’s payment information, the registrant’s IP address, and the registrant’s purchase history."

From this finding, I'll suggest that:

With great (or in this case, unique) power comes great responsibility. 

ICANN should contemplate carefully how its WHOIS policy has affected response to and mitigation of phishing and other cyber attacks. Registrars and registries are now the only parties who can reliably access contact data in a timely manner (e.g., in minutes or hours). In our report, we describe proactive or preemptive measures that ICANN's contracted parties could adopt to quash some phishing attacks before harm is done. It's time to step up your game.