Facts & Figures: Whois Policy Changes Impair Blacklisting Defenses
Dave Piscitello, Interisle Consulting Group; Joe Wein, SURBL; Jeff Chan, SURBL; Ivo Bitter, Spamhaus
Let's begin with a summary of the problem space we intended to study:
- Whois databases hold the records and details of who owns domain names.
- Cyber attackers register domain names for political influence campaigns, fraud, malware hosting, and spam.
- Investigators use Whois contact data to identify other domains with some or all of the same contact data that are owned by the same attackers.
- Changes to Whois that were implemented on May 25, 2018 mask the Whois contact data.
We studied domain name tracking and blacklisting reporting provided by two industry-leading and influential reputation providers - SURBL and Spamhaus - to compare their blocklist counts prior to, and following, the adoption of the Whois changes.
We have determined that changes to Whois impair blocklisting, and expose users of government and private networks, internet and hosting providers to various online threats that could have been preemptively stopped had Whois contact data remained available.
Criminals and fraudsters use global criminal networks to distribute phishing email, malware, inauthentic web content meant to incite, radicalize, or recruit terrorists, offers of illegal pharmaceuticals (including opioids), and more. These threat actors register thousands of domain names to support these activities, cheaply and often in bulk, and use them in massive campaigns to thwart detection and mitigation.
- Private actors, threat researchers, and reputation providers (e.g., SURBL, Spamhaus, and APWG) investigate online criminal activities in real time, and use Whois contact data to identify threat actors and blacklist the domains they use.
- They extract domain names from billions of email, text, and messaging app correspondences that they process daily, to detect cyber threats.
- Having found these domain names, investigators next ask, "Who registered these, and what other domains have they registered?"
Prior to May 25, 2018, contact data from Whois databases around the world answered that question, and the resulting data was confidently used by IT staff to protect the users of government agency networks, ISPs, private networks, and hosting providers.
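The extraction step described above can be sketched in a few lines. This is an illustration only, not the pipeline SURBL or Spamhaus actually run: the pattern is a deliberately simplified hostname matcher, and the names `HOSTNAME_RE`, `extract_hostnames`, and `count_hostnames` are hypothetical.

```python
import re
from collections import Counter

# Simplified hostname pattern (an assumption for illustration): one or more
# dot-terminated labels followed by an alphabetic top-level label.
HOSTNAME_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)


def extract_hostnames(message: str) -> set[str]:
    """Return the distinct lowercase hostnames found in one message body."""
    return {m.group(0).lower() for m in HOSTNAME_RE.finditer(message)}


def count_hostnames(messages) -> Counter:
    """Count how many messages mention each hostname."""
    counts = Counter()
    for message in messages:
        counts.update(extract_hostnames(message))
    return counts


messages = [
    "Verify at http://login.example-bank.test/reset",
    "Contact help@example.org or visit EXAMPLE.org today.",
]
print(count_hostnames(messages))
```

A pattern this loose also matches strings such as file names, and it returns full hostnames rather than registered domains. A production system would likely reduce each hostname to its registrable domain, for example with the Public Suffix List, before querying Whois.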
Whois contact data provides the means to find related domains
Threat detection and blocklisting are by necessity ongoing and iterative processes:
- Criminals typically register hundreds or thousands of domains at a time, and can readily replace any domains that are identified as malicious and blocklisted and are therefore no longer useful to them.
- Investigators query Whois databases constantly, in real time, as part of their analysis methodologies.
Whois contact data (owner name, email address, postal address, or telephone number) that is associated with confirmed malicious domains is essential to finding other domains with some or all of the same contact data. Some of these domains may be registered but not yet used, so this processing adds a valuable preemptive element to blocklisting. When matches are found, and other abuse criteria are satisfied, these domains are added to a blocklist. Investigators use the lists to dismantle criminals' attack networks. IT administrators use the lists to protect users and networks from threats. Cybersecurity analysts need to find virtually ALL of a malefactor's domain names in order to stop criminal campaigns.
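The matching process described in this section can be sketched as an inverted index over contact fields, expanded iteratively from a set of confirmed malicious domains. This is a minimal sketch under stated assumptions, not the method either provider uses: the `WhoisRecord` layout, the normalization rule, and the function names are all hypothetical, and the output is only a list of candidates that would still need to satisfy other abuse criteria.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

CONTACT_FIELDS = ("name", "email", "phone", "address")


@dataclass(frozen=True)
class WhoisRecord:
    """Hypothetical, simplified view of one domain's Whois contact data."""
    domain: str
    name: str = ""
    email: str = ""
    phone: str = ""
    address: str = ""


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(value.lower().split())


def build_index(records):
    """Map each (field, normalized value) indicator to the domains that use it."""
    index = defaultdict(set)
    for rec in records:
        for field in CONTACT_FIELDS:
            value = normalize(getattr(rec, field))
            if value:
                index[(field, value)].add(rec.domain)
    return index


def related_domains(seeds, records):
    """Return {candidate domain: matched indicators}, expanding from the seeds."""
    seed_set = set(seeds)
    by_domain = {rec.domain: rec for rec in records}
    index = build_index(records)
    seen, queue, candidates = set(seed_set), deque(seed_set), {}
    while queue:
        rec = by_domain.get(queue.popleft())
        if rec is None:
            continue  # no Whois record available for this domain
        for field in CONTACT_FIELDS:
            value = normalize(getattr(rec, field))
            if not value:
                continue  # empty field: nothing to pivot on
            for other in index[(field, value)] - {rec.domain}:
                if other not in seed_set:
                    candidates.setdefault(other, set()).add((field, value))
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return candidates


records = [
    WhoisRecord("bad1.example", name="J. Doe", email="jd@mail.example"),
    WhoisRecord("bad2.example", email="JD@mail.example ", phone="+1.5550100"),
    WhoisRecord("bad3.example", phone="+1.5550100"),
    WhoisRecord("good.example", email="admin@corp.example"),
]
print(related_domains({"bad1.example"}, records))
```

The sketch also shows why masking matters: a masked field is either empty, so there is nothing to pivot on, or holds a placeholder shared by many unrelated registrants, which would have to be excluded from the index to avoid false matches.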
Two Independent Studies, Similar Results
We conducted a study of domains that SURBL and Spamhaus identified and blocklisted from January 2018 to January 2019.
From January 2018 through May 24, 2018, SURBL and Spamhaus had access to Whois contact data for nearly 200 million domain names, and were able to identify lists of registrants that engaged in illicit or cyber threat activities.
On May 25, 2018, many Whois service operators began masking point-of-contact data. Only a handful of Top-Level Domains and registrars continue to provide unmasked Whois contact data.
To illustrate the impact of masking Whois, we graphed SURBL and Spamhaus blocklistings from January 2018 through January 2019. The counts in these graphs represent domains for which SURBL and Spamhaus had access to Whois contact data, and could thus find domains with similar Whois contact data "indicators".
The studies show that Whois contact data availability correlates with blocklisting efficiency.
Figure A illustrates the precipitous drop in the number of criminal or cyber threat domain names that are identifiable, and thus trackable using Whois contact data, after May 25, 2018.
Neither SURBL nor Spamhaus is able to determine whether registrations created after May 25, 2018 are part of a known criminal actor's arsenal of domains, which also adversely affects the ability to separate good from bad. Knowing who the good actors are is extremely valuable information in conducting threat assessment.
Figure A: Data Available from Whois
Studies illustrate how the masking of Whois contact data impairs blacklisting services.
SURBL provided two sets of blacklist counts for the studies:
- Set One, depicted in Figure 1, represents the TLDs .us and .gdn. These Top-Level Domains still provide unmasked Whois contact data, so contact data was available for the entire study period.
- Set Two, depicted in Figure 2, represents counts of all other Top-Level Domains. In this set, contact data was available until May 25, 2018 and masked thereafter.
Figure 1: US and GDN Blocklistings
Figure 2: Other TLD Blocklistings
Figure 1 illustrates that SURBL is still able to blacklist ".us" and ".gdn" domains using Whois point-of-contact data to identify a criminal or malicious actor, and then to find other domains registered by that actor. Figure 2 shows a dramatic and continuing decline in our ability to blocklist domains based on Whois point-of-contact indicators where Whois is masked. Because the number of domains registered by bad actors grows continuously, the number of blacklisted domains should also grow over time. A decline therefore reflects lost visibility, not reduced criminal activity.
Spamhaus did not study US and GDN separately but instead provided a set of blacklist counts for all TLDs.
Figure 3 represents all Top-Level Domains blocklisted by Spamhaus from January 2018 through January 2019. The counts in these graphs represent domains for which Spamhaus had access to Whois contact data and could thus find domains with similar Whois indicators.
Spamhaus experienced similar results despite using data sources and methodologies different from SURBL's. As the trend line again illustrates, blocklistings declined dramatically, to a fraction (~30%) of pre-May 25, 2018 counts, despite two significant campaigns that triggered exceptionally large blocklisting events.
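For readers who want to reproduce this kind of before-and-after comparison on their own blocklist data, the arithmetic can be sketched as follows. The monthly totals below are synthetic placeholders chosen only to make the calculation easy to follow. They are not SURBL or Spamhaus figures, and the function name and the choice to exclude the cutoff month are assumptions.

```python
from statistics import mean

CUTOFF_MONTH = (2018, 5)  # month containing May 25, 2018


def post_to_pre_ratio(monthly_counts, cutoff=CUTOFF_MONTH):
    """Ratio of mean monthly blocklistings after the cutoff month to before it.

    monthly_counts maps (year, month) to a count. The cutoff month itself is
    excluded because it straddles the policy change.
    """
    pre = [n for month, n in monthly_counts.items() if month < cutoff]
    post = [n for month, n in monthly_counts.items() if month > cutoff]
    if not pre or not post:
        raise ValueError("need at least one month on each side of the cutoff")
    return mean(post) / mean(pre)


# SYNTHETIC placeholder data, not taken from either study.
synthetic_counts = {
    (2018, 1): 1000, (2018, 2): 1000, (2018, 3): 1000, (2018, 4): 1000,
    (2018, 5): 800,
    (2018, 6): 300, (2018, 7): 300, (2018, 8): 300, (2018, 9): 300,
}
ratio = post_to_pre_ratio(synthetic_counts)
print(f"post/pre ratio: {ratio:.0%}")
```

Comparing means on either side of the cutoff is only one way to summarize the change. A trend line fitted to the full series, as in the figures, conveys the same decline without depending on which months are grouped together.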
Note that the trend lines from Figures 2 and 3 track closely with the decline of available Whois contact data in Figure A.
Greg Aaron and I reported in an ICANN blog that reputation services (RBLs) "are used ubiquitously and are a proven way to protect Internet users. RBLs have been in use for twenty years. During that time, they have been one of the most widely deployed and effective security solutions on the Internet. It is likely every type of entity relies on RBLs, including companies, governments, nongovernmental organizations (NGOs), mobile networks, Internet service providers, email service providers, and social networking sites."
Policies that impair reputation services have wide-reaching consequences, and thus studies to measure any impairment are merited. From the findings of two independently conducted studies, we conclude that:
The onset of masking Whois contact data has had the direct, corresponding, and ongoing effect of reducing the number of blocklisted domains, dramatically undermining the efficiency of this, and other, security countermeasures. Some, but not all, domains that are associated with a known criminal actor may be blocklisted using alternative data, but not in the timely manner that modern organizational or government agency risk mitigation dictates. Incomplete or delayed information increases vulnerability and puts users, organizations, and sensitive data at increased risk.