One of the most memorable lyrics of For What It’s Worth (Buffalo Springfield, 1967) aptly describes the current condition of the post-GDPR debate over domain registration data access:
There’s battle lines being drawn… nobody’s right if everybody’s wrong.
Cybersecurity and policy pundits are heatedly engaged over the impact of the EU General Data Protection Regulation (GDPR). Both sides have done a poor job of articulating the problem space, overlooking key aspects of the regulation and ICANN’s attempt to comply with GDPR in a Temporary Specification for Whois.
As difficult as it is to engage in this discussion dispassionately, it’s both necessary and urgent that we refocus attention on the problem space.
I’ll begin by considering comments in a recent post, Special Interests Push US Congress to Override ICANN’s Whois Policy Process. I question several of the assertions made in this article:
The author asserts that publication of Whois point of contact data exacerbates spam. This remains a poorly studied theory with no definitive conclusion. Whois may be a source of email addresses for spammers, but the true question is whether it is the most popular, least expensive, and most effective means. Compared to crawling web pages or social media sites where users commonly publish email addresses, Whois is far too slow, and the number of email addresses that someone can conceivably scrape from Whois is small compared to what is available with less latency from the web. Spammers are also quite accomplished at synthesizing the user name part of an email address and flooding mail exchanges with permutations or combinations of addresses with near zero overhead. Given that there is no popularly accepted study or any study that satisfies academic rigor on which to conclude that spammers use Whois as a primary source for email address collection, this remains speculation.
The author asserts that “Registrars and Registries must still provide reasonable access to personal data to third parties with legitimate interests that are not overridden by privacy rights, such as law enforcement agencies pursuing criminals”. The author presents this as fact, based perhaps on the Temporary Specification but evidently not on practice or recent experience. In practice, reasonable access by third parties is flawed in several important respects. There is presently no uniformity across registrars and registries with regard to interpretation and implementation of “reasonable access”, “third parties”, and “legitimate interests”. The process for requesting access is not well publicized, and at least three frameworks for accreditation have been proposed. There is no defined ICANN compliance process to contest whether a registry or registrar has failed to meet the reasonable access obligation. Some registrars and registries are responsive. Some ignore any request. Some insist on court orders. When denied access in these circumstances, investigators cannot determine whether the domain registration has inaccuracies or fraudulently composed data. As email expert John Levine explains in his recent post, investigators can’t “find connections among domains (which tend to be registered with similar information, even if it's false) to take down a whole network of them at a time.” These same flaws affect trademark protection.
The author fails to discuss the most egregious problem with redacting non-public Whois data: response time for access requests is indeterminate but, in every case, it’s longer than Whois query time prior to 25 May 2018. Investigators cannot provide timely victim notification; simply put, they can’t contact a registrant whose web site has been hacked or is hosting a phishing page. In cases involving malware hosting, or spam campaigns that deliver malware, investigators strive to make near-real-time blocking or takedown decisions. Optimally, investigators want to blocklist, suspend name resolution, or remove harmful content within one to four hours. Meeting this objective was challenging before GDPR. It is more so now, and the consequence to registrants is that security or email administrators have to make allow/deny decisions with less precision.
In an earlier post, How Far Will Email Operators Take Blocklisting to Prevent Spam?, I explained that “email administrators weigh risk against reward when they make decisions regarding how to mitigate spam. They think first or exclusively about the security of their organization, their users, or their customers.” Domain registrants should interpret this as an indication that false positives or universal acceptance carry less weight than risk mitigation. The best precision that email or security administrators may have with the limited intelligence that Whois offers may be TLD, sponsoring registrar, or name server. Blocking entire registries may become as accepted a practice as dropping traffic to ASNs with poor reputations. If you’re defending your network or protecting your users, blocking the most abused registries and registrars makes even more sense today than prior to GDPR.
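To make the loss of precision concrete, here is a minimal sketch of an allow/deny decision when the TLD is the only signal an administrator has left. The TLD labels, domain names, and function names are hypothetical placeholders, not taken from any real blocklist or mail filter.

```python
# Hypothetical set of TLDs an operator has decided to block outright.
# These labels are placeholders, not real blocklist entries.
BLOCKED_TLDS = {"badtld", "cheaptld"}


def tld_of(domain: str) -> str:
    """Return the rightmost label of a domain name, lowercased."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def decide(sender_domain: str) -> str:
    """Coarse allow/deny decision based on the TLD alone."""
    return "deny" if tld_of(sender_domain) in BLOCKED_TLDS else "allow"


# Hypothetical senders: one abusive, one legitimate, both in a blocked TLD.
senders = ["phish.badtld", "honest-bakery.badtld", "shop.example"]
decisions = {domain: decide(domain) for domain in senders}

for domain, decision in decisions.items():
    print(f"{domain}: {decision}")
```

In this sketch the legitimate registrant in the blocked TLD is denied along with the abusive one, which is the false positive that registrants should expect when administrators cannot discriminate any further.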
While post-GDPR policy suppresses attributable data and makes it very difficult to pivot from one data set to another, the web thankfully does not. Let’s consider another recent article, 90 Days of GDPR: Minimal Impact on Spam and Domain Registrations. The authors assert that there has been a decline in spam volume in the generic Top-Level Domains. John Levine notes, and I concur, that the authors are studying the wrong question. Whether or not spammers would send more spam and register more domains because GDPR came into effect “tells us nothing useful about how GDPR affects anything. It's the wrong question, it's not a question most security people are concerned with, and it ignores how spam and spammers work.”
John explains the fundamentals of spam so well in his article that I’ll encourage you to read it carefully, especially if you are not involved in spam detection or mitigation. I will add, however, that the Talos system and others try to measure spam message volume. Spam message volume fluctuates, sometimes dramatically. Many factors influence when spammers run campaigns: promotional pricing, TLD ownership, changes in spam countermeasures, and botnet takedowns all affect spam distribution. The authors should have considered such factors. For example, Global Registry Services, Ltd. has recently taken over running Famous Four Media’s new gTLD portfolio, which has been previously reported as the spammiest block of new TLDs. The new operators promise to “abandon the failed penny-domain strategy and crack down on spam”. This action will hopefully result in a clean-up of these TLDs. Ninety days is too small a measurement window to evaluate whether this factor or others, such as aggressive blocklisting, GDPR, or the dismantling of a particular spam delivery infrastructure such as Avalanche, are the collective causes of the decline in spam volume.
The authors dismiss concerns that the spam problem is growing as “popular opinions among security researchers”. This may be a popular opinion, but it’s not accurately attributed. From my experience in this field and my daily interaction with security researchers, it’s more accurate to say that operational security practitioners worry that spam mitigation is more encumbered today, and the ICANN Temporary Spec for Whois, not the GDPR, makes it so. I’ll repeat: focus on the problem space.
The authors shift to discussing changes in domain name registration volume. Like spam message volume, new registration measurements are not particularly insightful when studying spam. Spammers register thousands of domain names through registrar bulk registration services, at rates of hundreds per minute, thousands per day, periodically, in some cases 6-8 months in advance of using them. New domain registrations are not the relevant denominator in calculating spam abuse percentages. Since only names that resolve in the DNS can be used in spam, new registrations tell us more about pricing, promotions, and flocking behavior than spam trends. Again, I refer you to ICANN’s DAAR and in particular, the methodology paper.
They note that the Spamhaus Most Abused TLDs list contains many new TLDs, and then discuss how registrations among these have declined. They observe that the new registrations data that they studied “contradict the idea that spammers are focusing on registering domains that might be used for spam later. The first and most obvious is the uptick in the percentage of .com domain registrations”. How they arrived at this conclusion eludes me. The authors should have looked at the history of the new TLD program, perhaps using cached Spamhaus abused TLD pages from archive.org. A 90-day observation window tells us little that we did not observe before GDPR and misses a great deal as well. Security researchers know from longer historical data that spammers have migrated from one new TLD, or set of new TLDs, to others since the inception of the program. Early in the program, XYZ was a heavily spammed TLD. Nearly every one of the TLDs in Famous Four Media’s portfolio has been in the top 10 most abused TLDs over the past two years. Flocking to and migrating from new TLDs has largely followed pricing. Registrars that accommodate spammers with low cost and with bulk registration features that are terrifyingly similar to algorithmic domain generation will influence new registrations more than GDPR.
Lastly, the authors seem to fall victim to size bias when they assert that COM is “relatively spam free”. COM is an enormous TLD, with approximately 136M registrations. It’s nearly forty times larger than the largest new TLD (TOP). For comparative purposes, the authors should use the counts of bad domains seen. The count of bad domains seen in COM dwarfs the raw bad domains count found in all of the new TLDs listed in the Spamhaus Most Abused TLD list combined.
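The arithmetic behind size bias can be sketched in a few lines. The COM registration figure comes from the text above, and the new TLD figure is simply 136M divided by forty; the bad-domain counts are invented purely for illustration and are not measurements from Spamhaus, DAAR, or any other source.

```python
from dataclasses import dataclass


@dataclass
class TldStats:
    name: str
    registrations: int
    bad_domains: int  # HYPOTHETICAL counts, for illustration only

    @property
    def abuse_rate(self) -> float:
        """Fraction of registered names seen as bad."""
        return self.bad_domains / self.registrations


# Registrations: ~136M for COM (from the text); the new TLD is sized at
# one fortieth of that. Bad-domain counts are made up.
com = TldStats("com", registrations=136_000_000, bad_domains=1_360_000)
new_tld = TldStats("newtld", registrations=3_400_000, bad_domains=340_000)

for tld in (com, new_tld):
    print(f"{tld.name}: rate={tld.abuse_rate:.1%}, bad domains={tld.bad_domains:,}")

# How many times more bad domains the larger TLD holds in absolute terms.
ratio = com.bad_domains / new_tld.bad_domains
print(f"com holds {ratio:.0f}x as many bad domains as newtld")
```

Under these assumed numbers the large TLD looks ten times cleaner by percentage yet holds four times as many bad domains, so a percentage alone can describe a TLD as “relatively spam free” while it remains the larger source of abuse in absolute terms.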
The research in this post does not support the findings.
The findings are not relevant to finding solutions to the problem space.
We need to be better than this. Focus on the problem space.