Domain names and DNS

Facts & Figures: Whois Policy Changes Impair Blocklisting Defenses

In October 2018, APWG and M3AAWG jointly conducted a survey of 300+ cyber first responders, researchers, and law enforcement investigators to learn whether or not ICANN's Temporary Specification for Whois was interfering with efforts to mitigate security threats. The survey respondents overwhelmingly indicated that the Temp Spec, and in particular, the redaction of Whois contact data, was impeding investigations.
 
Having identified Whois redaction as an impediment, the cybersecurity community should try to measure its impact.
 
I invited my colleagues Jeff Chan and Joe Wein of SURBL, and Ivo Bitter of Spamhaus, to work with me to conduct studies to measure the impact of redacted Whois contact data. I chose SURBL and Spamhaus because I've worked with my colleagues there for years and know their dedication to scientific methods. I was also keen to include widely used reputation services. Both of these services qualify: Spamhaus and SURBL blacklists protect billions of mailboxes and hundreds of millions of end users daily. We agreed to compare blocklist counts prior to, and following, the adoption of the Whois changes.
 
We conclude from two independently conducted studies that the onset of masking Whois contact data has had the direct, corresponding, and ongoing effect of reducing the number of blocklisted domains, dramatically undermining the efficiency of this and other security countermeasures.
 
Further, this interference exposes users of government and private networks, internet and hosting providers to various online threats that could have been preemptively stopped had Whois contact data remained available.
 
A briefing we've prepared for legislators and ICANN policy consideration follows. A more comprehensive report is forthcoming.

Facts & Figures: Whois Policy Changes Impair Blacklisting Defenses

Dave Piscitello, Interisle Consulting Group; Joe Wein, SURBL; Jeff Chan, SURBL; Ivo Bitter, Spamhaus

Summary

Let's begin with a summary of the problem space we intended to study:

  • Whois databases hold the records and details of who owns domain names.
  • Cyber attackers register domain names for political influence campaigns, fraud, malware hosting, and spam.
  • Investigators use Whois contact data to identify other domains with some or all of the same contact data that are owned by the same attackers.
  • Changes to Whois that were implemented on May 25, 2018 mask the Whois contact data.

We studied domain name tracking and blacklisting reporting provided by two industry-leading and influential reputation providers, SURBL and Spamhaus, to compare their blocklist counts prior to, and following, the adoption of the Whois changes.

We have determined that changes to Whois impair blocklisting, and expose users of government and private networks, internet and hosting providers to various online threats that could have been preemptively stopped had Whois contact data remained available. 

Background

Criminals and fraudsters use global criminal networks to distribute phishing scams by email, inauthentic web site content meant to incite, radicalize, or recruit terrorists, malware, illegal pharmaceutical sales (including opioids), and more. These threat actors register thousands of domain names to support these activities, cheaply and often in bulk, and use them in massive campaigns to thwart detection and mitigation activities.

  • Private actors, threat researchers, and reputation providers (e.g., SURBL, Spamhaus, and APWG) investigate online criminal activities in real time, and use Whois contact data to identify threat actors and blacklist the domains they use.
  • They extract domain names from billions of email, text, and messaging app correspondences that they process daily, to detect cyber threats.
  • Having found these domain names, investigators next ask, "Who registered these, and what other domains have they registered?"

Prior to May 25, 2018, contact data from Whois databases around the world answered that question, and the resulting data was confidently used by IT staff to protect the users of government agency networks, ISPs, private networks, and hosting providers.

Whois contact data provides the means to find related domains

Threat detection and blocklisting are by necessity ongoing and iterative processes:

  • Criminals typically register hundreds or thousands of domains at a time, and can readily replace any domains that are identified as malicious and blocklisted and are therefore no longer useful to them.
  • Investigators query Whois databases constantly, in real time, as part of their analysis methodologies.

Whois contact data (whether owner name, email or postal address, or telephone number) that is associated with confirmed malicious domains is essential to finding other domains with some or all of the same contact data. Some of these domains may be registered but not yet used, so this processing adds a valuable preemptive element to blocklisting. When matches are found, and other abuse criteria are satisfied, these domains are added to a blocklist. Investigators use the lists to dismantle the criminals’ attack networks. IT administrators use the lists to protect users and networks from threats. Cybersecurity analysts need to find virtually ALL of a malefactor’s domain names in order to stop criminal campaigns.
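To make the idea of finding related domains concrete, here is a minimal Python sketch of matching on shared contact data. It is an illustration only, not the method SURBL or Spamhaus use: the record layout, the field names, the `.test` domains, and the helper names `build_contact_index` and `related_domains` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical contact fields; real Whois output varies by registry and registrar.
CONTACT_FIELDS = ("registrant_name", "email", "phone", "postal_address")


def normalize(value):
    """Lowercase and trim a contact value so trivial variations still match."""
    return (value or "").strip().lower()


def build_contact_index(records):
    """Map each (field, value) pair to the set of domains that carry it."""
    index = defaultdict(set)
    for record in records:
        for field in CONTACT_FIELDS:
            value = normalize(record.get(field))
            if value:
                index[(field, value)].add(record["domain"])
    return index


def related_domains(seed_domain, records):
    """Return domains sharing at least one contact value with seed_domain."""
    index = build_contact_index(records)
    seed = next(r for r in records if r["domain"] == seed_domain)
    related = set()
    for field in CONTACT_FIELDS:
        value = normalize(seed.get(field))
        if value:
            related |= index[(field, value)]
    related.discard(seed_domain)
    return related


# Made-up records for illustration; none of these are real registrations.
records = [
    {"domain": "bad-example-1.test", "registrant_name": "J. Doe",
     "email": "jd@mail.test", "phone": "+1.5550100"},
    {"domain": "bad-example-2.test", "registrant_name": "John D",
     "email": "JD@mail.test", "phone": "+1.5550199"},
    {"domain": "unrelated.test", "registrant_name": "A. Nother",
     "email": "an@mail.test", "phone": "+1.5550123"},
    {"domain": "masked.test", "registrant_name": "REDACTED FOR PRIVACY",
     "email": "", "phone": ""},
]

candidates = related_domains("bad-example-1.test", records)
print(sorted(candidates))
```

In this sketch a match only nominates a candidate; as noted above, other abuse criteria would still have to be satisfied before a domain is listed. The masked record shows the problem in miniature: once contact fields are redacted, there is nothing left to match on.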

Two Independent Studies, Similar Results

We conducted a study of domains that SURBL and Spamhaus identified and blocklisted from January 2018 to January 2019.

From January 2018 through May 24, 2018, SURBL and Spamhaus had access to Whois contact data for nearly 200 million domain names, and were able to identify lists of registrants that engaged in illicit or cyber threat activities.

On May 25, 2018 many Whois service operators began masking point-of-contact data. Only a handful of Top-level domains and registrars continue to provide unmasked Whois contact data.

To illustrate the impact of masking Whois, we graphed SURBL and Spamhaus blocklistings from January 2018 through January 2019. The counts in these graphs represent domains for which SURBL and Spamhaus had access to Whois contact data, and could thus find domains with similar Whois contact data "indicators".

Findings

Studies show that Whois contact data availability correlates with blocklisting efficiency.

Figure A illustrates the precipitous drop in the number of criminal or cyber domain names that are identifiable, and thus trackable using Whois contact data, after May 25, 2018.

Neither SURBL nor Spamhaus is able to determine whether registrations created after May 25, 2018 are part of a known criminal actor’s arsenal of domains, which also adversely affects the ability to separate good from bad. Knowing who the good actors are is extremely valuable information in conducting threat assessment.

Figure A: Data Available from Whois

Studies illustrate how the masking of Whois contact data impairs blacklisting services.

SURBL provided two sets of blacklist counts for the studies:

  • Set One, depicted in Figure 1, represents the TLDs .us and .gdn. These Top-Level Domains still provide unmasked Whois contact data, so contact data was available for the entire study period.
  • Set Two, depicted in Figure 2, represents counts of all other Top-level Domains. In this set, contact data was available until May 25, 2018 and masked thereafter.

Figure 1: US and GDN Blocklistings

Figure 2: Other TLD Blocklistings

Figure 1 illustrates that SURBL is still able to blacklist “.us” and “.gdn” domains using Whois point-of-contact data to identify a criminal or malicious actor, and then to find other domains registered by that actor. Figure 2 shows a dramatic and continuing decline in our ability to blocklist domains based on Whois point-of-contact indicators where Whois is masked. The number of domains registered by bad actors grows continuously, so the number of blacklisted domains should also grow over time.

Spamhaus did not study US and GDN separately but instead provided a set of blacklist counts for all TLDs.

Figure 3 represents all Top-Level Domains blocklisted by Spamhaus from January 2018 through January 2019. The counts in these graphs represent domains for which Spamhaus had access to Whois contact data and could thus find domains with similar Whois indicators.

Figure 3: Spamhaus All TLDs

Spamhaus experienced similar results despite using different data sources and methodologies than SURBL. As the trend line again illustrates, there is a dramatic decline in blocklisting to a fraction (~30%) of blocklistings as compared to pre-May 25, 2018 counts, despite two significant campaigns that triggered exceptionally large blocklisting events.

Note that the trend lines from Figures 2 and 3 track closely with the decline of available Whois contact data in Figure A.

Conclusions

Greg Aaron and I reported in an ICANN blog that reputation services (RBLs), "are used ubiquitously and are a proven way to protect Internet users. RBLs have been in use for twenty years. During that time, they have been one of the most widely deployed and effective security solutions on the Internet. It is likely every type of entity relies on RBLs, including companies, governments, nongovernmental organizations (NGOs), mobile networks, Internet service providers, email service providers, and social networking sites."

Policies that impair reputation services have wide-reaching consequences, and thus studies to measure any impairment are merited. From the findings of two independently conducted studies, we conclude that:

The onset of masking Whois contact data has had the direct, corresponding, and ongoing effect of reducing the number of blocklisted domains, dramatically undermining the efficiency of this, and other, security countermeasures. Some, but not all, domains that are associated with a known criminal actor may be blocklisted using alternative data, but not in the timely manner that modern organizational or government agency risk mitigation dictates. Incomplete or delayed information increases vulnerability and puts users, organizations, and sensitive data at increased risk.


Conservative abuse reporting throws new TLD program under the bus

ICANN has released a January 2019 domain abuse report generated from the Domain Abuse Activity Reporting system (DAAR). DAAR is a system for studying and reporting on domain name registration and security threat (domain abuse) behavior across top-level domain (TLD) registries and registrars. While at ICANN, I was actively involved with DAAR from inception to early production, and I’m pleased that ICANN has begun monthly reporting.  

According to ICANN’s DAAR project page, one purpose of the project is to “provide the ICANN community with a reliable, persistent, and reproducible set of data from which security threat (abuse) analyses could be performed.” The January 2019 report provides a distribution of domains identified as security threats and a breakdown of security threats by class: phishing, botnet command-control, spam, and malware hosting. It does so for all new and legacy registries for which the DAAR project can collect TLD zone data. 

The report is promising in the sense that ICANN has finally begun reporting abuse, so we should celebrate this landmark publication.

The report is disappointing for several reasons.

Top-level Domain Reporting Tells a Partial Story

The report provides only summary statistics for the new and legacy Top-level Domains (TLDs), in pie-chart format, and thus provides the operational community with “findings” that are not actionable. Most importantly, the data tell a largely misleading story. To explain further, I’ve reproduced two pie charts from the report, side by side:

[Figure: two pie charts reproduced from ICANN’s January 2019 DAAR report]

Together, these charts say the following:

Over one-half of the domains identified as security threats are
registered in one-eighth of the generic TLD name space.

This is an actionable conclusion.

Let’s assume that you wear a network or email administrator’s blocking policy hat. An important part of your role is to mitigate risk for your organization in the most straightforward and expeditious manner. Your pragmatic conclusion from this finding may well be…

Administrators can reduce an organization’s exposure to over one-half
of domain-related security threats by BLOCKING ALL new TLDs.

Administrators can implement this security policy at firewalls, mail servers, proxies, or DNS resolvers. It is an appropriate blocking rule based on the limited insights that ICANN’s report offers.
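As a rough illustration of what a rule like this amounts to, here is a minimal Python sketch of a TLD-based block check. It is a sketch under stated assumptions, not a configuration for any particular firewall, mail server, or resolver: the helper names `tld_of` and `is_blocked` are hypothetical, and the blocked set uses the reserved placeholder TLDs `example` and `invalid` because the report does not name any TLDs.

```python
def tld_of(hostname):
    """Return the rightmost DNS label, lowercased, ignoring a trailing dot."""
    return hostname.rstrip(".").lower().rsplit(".", 1)[-1]


def is_blocked(hostname, blocked_tlds):
    """True if the hostname's TLD is in the administrator's blocked set."""
    return tld_of(hostname) in blocked_tlds


# Placeholder TLDs only; a real policy would list the TLDs the
# administrator has actually decided to block.
blocked = {"example", "invalid"}

for name in ("www.shop.example.", "mail.example.com", "LOGIN.PORTAL.INVALID"):
    verdict = "block" if is_blocked(name, blocked) else "allow"
    print(f"{name}: {verdict}")
```

Internationalized TLDs would need to be compared in one consistent form (for instance, as A-labels), which this sketch does not attempt.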

Conservative reporting in this case does more harm than good. By failing to be open and transparent about the high levels of abuse in new TLDs, ICANN actively frustrates efforts to promote Universal Acceptance of domain names and email addresses and calls future new TLD delegations into question.

An Opportunity Lost

Later in the report, ICANN could have helped administrators refine blocking policies, where it reports that

“of the 781,795 domains identified as security threats reported in
 341 new gTLDs:

• 35 percent were in the 5 most-exploited new gTLDs.
• 52 percent were in the 10 most-exploited new gTLDs.
• 88 percent were in the 25 most-exploited new gTLDs.
• 98 percent were in the 50 most-exploited new gTLDs.”

Administrators now know that only 341 new TLDs have abuse domains reported. That’s less than one-third of the generic TLDs. They also now know that 98% of the new gTLD security threats are concentrated in 50 of the new gTLDs, so they can refine the above rule:

Administrators can reduce an organization’s exposure to nearly one-half
of domain-related security threats by BLOCKING 50 new TLDs.

Except… ICANN does not publish the names of TLDs in the report.

With only the findings that ICANN’s report publishes, the most risk-averse blocking rule we can compose throws nearly the entire new TLD program under the bus.
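To put the quoted percentages in terms of domain counts, here is a quick back-of-the-envelope calculation in Python. The total and the cumulative shares are taken from the report text quoted above; the counts are my own rounding of those percentages, not figures that ICANN publishes, and the function name `approx_counts` is just for this example.

```python
TOTAL_THREAT_DOMAINS = 781_795   # security-threat domains in new gTLDs, as quoted
TLDS_WITH_THREATS = 341          # new gTLDs with threats reported, as quoted

# Cumulative share (percent) held by the N most-exploited new gTLDs, as quoted.
CUMULATIVE_PERCENT = {5: 35, 10: 52, 25: 88, 50: 98}


def approx_counts(total, percent_by_top_n):
    """Convert cumulative percentages into approximate domain counts."""
    return {n: round(total * pct / 100) for n, pct in percent_by_top_n.items()}


counts = approx_counts(TOTAL_THREAT_DOMAINS, CUMULATIVE_PERCENT)
for top_n, count in counts.items():
    print(f"top {top_n:>2} new gTLDs: ~{count:,} domains")

# What is left for every other new gTLD with threats reported.
remaining_domains = TOTAL_THREAT_DOMAINS - counts[50]
remaining_tlds = TLDS_WITH_THREATS - 50
print(f"other {remaining_tlds} new gTLDs: ~{remaining_domains:,} domains combined")
```

By this arithmetic, roughly 766,000 of the threat domains sit in 50 TLDs, while the other 291 TLDs share about 15,600 between them. That is exactly the distinction a blanket rule cannot make.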

The Truth is Out There

ICANN’s Context Document for the January 2019 report lists the reputation feeds that DAAR employs. The largest and most popularly used among these are the Spamhaus and SURBL data. With some work, you can make use of the statistics published at Spamhaus World’s Most Abused TLDs or SURBL’s Most Abused TLDs page to approximate the 10, 20, or 50 new TLDs that DAAR reports as concentrations of abuse domains. But you have to ask, “Why would an administrator go through this effort if the governance body for domain names is unwilling to commit itself to publishing actionable data or at least data that can inform its own policy community where the problems lie?”

Naming may be shaming, but it can also be enlightening. It’s been my experience that some TLD operators do not actively investigate abuse. Some TLDs have had high badness indexes for years.  Public disclosure might be the forcing function that instigates change in abuse mitigation.

To illustrate a use case for naming to enlighten, I’ve reproduced recent values for the Spamhaus “badness index” for the city TLDs delegated as new TLDs:

amsterdam = 0.0% bad
barcelona = 0.0% bad  
brussels = 0.0% bad  
capetown = 0.0% bad  
kyoto = 0.0% bad    
moscow = 0.0% bad
москва = 0.0% bad
stockholm = 0.0% bad
taipei = 0.0% bad
wien = 0.0% bad
佛山 = 0.0% bad
广东 = 0.0% bad
hamburg = 0.4% bad
nyc = 0.4% bad

berlin = 0.5% bad
sydney = 0.6% bad  
melbourne = 0.7% bad
paris = 0.7% bad
vegas = 0.9% bad
cologne = 1.2% bad  
miami = 2.0% bad
quebec = 3.0% bad
istanbul = 4.0% bad
osaka = 9.1% bad  
nagoya = 45.0% bad
yokohama = 44.0% bad
tokyo = 59.5% bad

Using TLD data only, several Japanese cities appear to be loci of abuse. Like it or not, many organizations use Spamhaus or SURBL data and may choose to block all of the domain names delegated from these TLDs.
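Here is a minimal Python sketch of how an administrator might turn a listing like the one above into a candidate block list. The values are copied from the list in this post; the 10% cutoff and the helper names `parse_badness` and `tlds_at_or_above` are assumptions for illustration, not a threshold that Spamhaus or anyone else recommends.

```python
import re

# Values copied from the city-TLD listing above.
BADNESS_LISTING = """
amsterdam = 0.0% bad
barcelona = 0.0% bad
brussels = 0.0% bad
capetown = 0.0% bad
kyoto = 0.0% bad
moscow = 0.0% bad
москва = 0.0% bad
stockholm = 0.0% bad
taipei = 0.0% bad
wien = 0.0% bad
佛山 = 0.0% bad
广东 = 0.0% bad
hamburg = 0.4% bad
nyc = 0.4% bad
berlin = 0.5% bad
sydney = 0.6% bad
melbourne = 0.7% bad
paris = 0.7% bad
vegas = 0.9% bad
cologne = 1.2% bad
miami = 2.0% bad
quebec = 3.0% bad
istanbul = 4.0% bad
osaka = 9.1% bad
nagoya = 45.0% bad
yokohama = 44.0% bad
tokyo = 59.5% bad
"""

LINE_RE = re.compile(r"^\s*(\S+?)\s*=\s*([\d.]+)%\s*bad\s*$")


def parse_badness(text):
    """Parse 'tld = N% bad' lines into a {tld: percent} dictionary."""
    scores = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


def tlds_at_or_above(scores, threshold_percent):
    """Return TLDs at or above the threshold, worst first."""
    flagged = [tld for tld, pct in scores.items() if pct >= threshold_percent]
    return sorted(flagged, key=lambda tld: scores[tld], reverse=True)


scores = parse_badness(BADNESS_LISTING)
# Hypothetical cutoff chosen only for this example.
print(tlds_at_or_above(scores, 10.0))
```

With that cutoff the sketch flags tokyo, nagoya, and yokohama and leaves the other 24 city TLDs alone. A blanket rule against new TLDs would block all 27.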

But again, you’re only seeing part of the story. Ask any operational security investigator or reputation feed operator and they will tell you that attackers and criminals are TLD agnostic or opportunistic. Sponsoring registrar data often reveals more about the loci of abuse than TLD data.

No Data on Registrar Abuse

For reasons that only ICANN can provide, the January 2019 report does not contain registrar data. Again, the Truth is Out There. Spamhaus also publishes a Top 10 Most Abused Registrars List, where you can observe that Asia-Pacific is a geo-locus of registrars with extraordinarily high badness indexes. If ICANN were to publish registrar data, administrators might be able to learn more about the interrelationships between TLDs and registrar abuse, and refine blocking policies accordingly; more importantly, the ICANN community would have data that could influence future registrar accreditation agreement deliberations.

Urge ICANN to Meet Commitments

The ICANN community harps on the need for ICANN organization to operate transparently and accountably. The community, especially the contracted TLD and registrar parties, should be held to the same standards. Urge ICANN to publish data that identify the outliers. Urge ICANN, too, to publish registrar data. These data are essential if ICANN organization and community intend to meet the commitment to “provide the community with a reliable, persistent, and reproducible set of data from which security threat (abuse) analyses could be performed.”