A recent Domain Incite article quotes an ICANN Registrar Stakeholder Group (RrSG) claim that over “800,000 domain names have been suspended since the beginning of the year as a result of Whois email verification rules in the new ICANN Registrar Accreditation Agreement (RAA 2013)”. The cause for these suspensions is that registrants did not respond to a validation request sent to the email addresses that they submitted in their domain registration and thus failed to satisfy the validation criteria in the new agreement.
According to DI, the RrSG claims that the suspension figure is drawn from data collected by registrars that account for approximately 75% of registered gTLD domains (com, net, org, biz, …). To put that 800,000 figure into context, and assuming a ballpark estimate of the total count of registered domains in the gTLDs is north of 150 million, that’s less than 1 per cent. Given the numerous reported and alleged estimates of Whois inaccuracy over the years [1], [2], [3], I find it hard to understand why anyone would be surprised or alarmed by this figure. However, the figure represents new registrations over a measured period, so the percentage would seem more disturbing were it not for arguments by experts like Paul Vixie, whose analyses of the DNS lead him to say that most new domains are malicious.
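The "less than 1 per cent" arithmetic can be checked in a few lines. The sketch below uses only the two figures quoted above: 800,000 suspensions and a lower bound of 150 million gTLD registrations. The function names are illustrative. No figure is given here for new registrations in the period, so that count is left as an input the caller supplies.

```python
SUSPENDED = 800_000          # suspensions reported by the RrSG
TOTAL_GTLD = 150_000_000     # ballpark lower bound on registered gTLD domains


def share_pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def share_of_new_registrations(new_registrations: int) -> float:
    """Share of suspensions measured against new registrations in the same period.

    `new_registrations` is an assumed, caller-supplied input (for example,
    derived from zone file comparisons); no figure for it is given here.
    """
    return share_pct(SUSPENDED, new_registrations)


if __name__ == "__main__":
    # 150 million is a lower bound, so this result is an upper bound on the share.
    print(f"{share_pct(SUSPENDED, TOTAL_GTLD):.2f}% of all gTLD registrations")
```

Run as a script, this prints `0.53% of all gTLD registrations`. A smaller denominator, such as the count of new registrations the figure actually applies to, raises the percentage proportionally.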
Let me temper this seeming insensitivity to possible harm or inconvenience to registrants or Internet users. Rather than treat this single data point as alarming, or as indicative of a pattern or unintended consequence, I suggest we consider the unprecedented opportunity offered by the data set that corroborates this claim.
I applaud the registrars who took the time to collect these data.
I encourage the registrars to share the data.
With ICANN’s SSAC.
With respected members of the security, operations, research and public safety communities.
With ICANN’s Identifier Systems SSR team.
These data – the domain names and the associated registration records (Whois) – can be studied to answer important questions related to registrations (legitimate and malicious) and registrant Whois submission practices; for example:
- Are the registrant data other than email addresses evidently inaccurate?
- When did registrants first apply for the domain name?
- Have the registration data – in particular, email address or point of contact information – been modified by the registrant at any time since the original registration?
- Are the domain names on domain or URL block lists?
- Are the email addresses evidently inaccurate, one-time use (throwaway) email addresses, or validly composed email addresses that bounced?
- Are the email addresses associated with other domains outside the data set?
- Who operates the name servers of these domains?
- Do the name servers of these domains have a positive or negative reputation (e.g., are the name servers known to host malicious domains)?
- What services do these domains offer publicly (and uniquely identify in their zone data using A, CNAME, MX, or SRV resource records)?
- What can be deduced from passive DNS data associated with these domains?
- Where are these services hosted?
- Do the hosting providers of these domains have a positive or negative reputation?
- Where do these domains rank on SEO or site popularity lists?
- Do hosted services or content (e.g., web site) provide evidence that the domain is active, dormant or malicious?
- What is the characteristic use of the domain name (e.g., online presence, merchant, social medium, pay per click, mail exchange, streaming content…)?
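Several of these questions lend themselves to simple automated triage once the data are shared. These include throwaway or malformed email addresses, block list membership, and name server reputation. The sketch below is a minimal illustration built on stated assumptions:

- `SuspendedRecord` is a hypothetical record layout.
- The caller supplies three sets standing in for real feeds: a throwaway-email domain list, a domain block list, and a list of poorly reputed name servers.

Real registrar exports and feeds will differ, and a serious study would use far richer signals.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Deliberately loose syntax check: local part, "@", and a dotted domain.
EMAIL_RE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")


@dataclass
class SuspendedRecord:
    # Hypothetical fields; actual Whois exports vary by registrar.
    domain: str
    registrant_email: str
    nameservers: tuple


def email_domain(address: str):
    """Return the lowercased domain part of an email address, or None if malformed."""
    match = EMAIL_RE.match(address.strip().lower())
    return match.group(1) if match else None


def classify(rec: SuspendedRecord, throwaway: set, blocklist: set, bad_ns: set) -> list:
    """Tag one record with the red flags it raises (possibly several)."""
    tags = []
    dom = email_domain(rec.registrant_email)
    if dom is None:
        tags.append("malformed_email")
    elif dom in throwaway:
        tags.append("throwaway_email")
    if rec.domain.lower() in blocklist:
        tags.append("blocklisted")
    if any(ns.lower() in bad_ns for ns in rec.nameservers):
        tags.append("bad_ns_reputation")
    if not tags:
        tags.append("no_red_flags")
    return tags


def summarize(records, throwaway: set, blocklist: set, bad_ns: set) -> Counter:
    """Count how often each tag appears across the data set."""
    counts = Counter()
    for rec in records:
        counts.update(classify(rec, throwaway, blocklist, bad_ns))
    return counts


if __name__ == "__main__":
    # Illustrative records using reserved .example names; not real data.
    records = [
        SuspendedRecord("shop-a.example", "owner@disposable.example", ("ns1.host.example",)),
        SuspendedRecord("bad-b.example", "not-an-email", ("ns1.shady.example",)),
        SuspendedRecord("clinic-c.example", "it@clinic-c.example", ("ns1.host.example",)),
    ]
    print(summarize(records, {"disposable.example"}, {"bad-b.example"}, {"ns1.shady.example"}))
```

A record can raise several flags at once, so tag counts can exceed the record count. A record tagged `no_red_flags` is exactly the kind of case (a legitimate registrant who simply did not respond) that the study should separate out.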
This list is not exhaustive, but it does illustrate how valuable such data could be if shared with researchers. The data would remain valuable if registrars were willing to repeat the collection periodically and provide additional data points of this kind.
Single data points, especially when presented as an unqualified statistic, rarely provide sufficient insight to characterize the entire set of affected registrants. Drawing such conclusions without sharing the data or subjecting them to deeper analysis is premature. Rather than invite others to produce similar data without commensurate access to registration data, I encourage the RrSG to work in cooperation with the security, operations and public safety communities to better understand the data already collected.
As always... the opinions I express here are my own.
Thanks for the clarifications. I've corrected.
I think we agree that "Finally, and most importantly, we have no sense of either the problem we are trying to solve or how this "solution" does anything to solve it."
You're absolutely right. And the purpose of my post was to (a) ask you to share data so that (b) security researchers could study the questions I've listed to determine whether we are solving a(ny) problem.
As importantly, you can't follow this up with:
"The harm, to hundreds of thousands of registrants is now demonstrable."
with credibility unless you share the data, as I ask. I'm sympathetic if a health care provider's web site goes offline, but less so if the provider submitted a false email address: that's on the provider's IT staff. Is this an outlying case, or have thousands of health care providers been similarly affected? I can't tell without the data. What I can tell is that tens of thousands of _new_ registrations are algorithmically generated for botnets, or generated for phishing, illegal pharma or other spam. I can ask Spamhaus, APWG, or SURBL to share these with you. And Whois data are routinely inaccurate or incomplete.
Last point. Yes, bad actors will try to circumvent security measures. I've never imagined that any single security measure was a silver bullet, and I don't think this one is, either. Ideally, we would continue to add validation measures that make circumvention increasingly hard, so that collectively they become formidable enough for bad actors to change behavior, make mistakes, or both.
Posted by: The Security Skeptic | Monday, 13 April 2015 at 03:12 AM
Dave, I was the one who presented the data. A couple of things you say above are incorrect or misleading.
First, you say:
"The cause for these suspensions is inaccurate domain registration data, in particular, email addresses that do not satisfy the validation criteria in the new agreement."
This is not correct. The reason is that the registrants did not respond to the validation request. There are a ton of different reasons for this, chief among them that folks like you do good work educating (and scaring) people about the dangers of clicking a link in an email. Your statement is either an assumption or a bare assertion and certainly did not come from the data I presented or the comments that followed.
Second, you state:
"To put that 800,000 figure into context - and assuming a ballpark estimate of the total count of registered domains in the gTLDs is north of 150 million – that’s less than 1 per cent."
That is misleading and not relevant. The 800k is a materially significant % of the total NEW domains registered in the time period covered. You can do your own math on that as the zone files are public, but I think the percentage will shock you.
Finally, and most importantly, we have no sense of either the problem we are trying to solve or how this "solution" does anything to solve it. The harm, to hundreds of thousands of registrants is now demonstrable. The benefit is not even asserted. Our call was to those who asked for this, primarily LEA, to at least begin to try and document the benefit. There is no evidence whatsoever that more accurate whois data will address any ills.
You know better than we do that the bad guys will have no problem dealing with validation. It is the health care providers and community groups whose websites go down that will.
Always happy to discuss.
Posted by: Enoss | Wednesday, 25 June 2014 at 11:26 AM
Keep in mind that this number will increase.
At the current rate we could be looking at 1.6 million domain names in 6 months, and so on.
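The projection above assumes suspensions continue at a constant daily rate. Here is a minimal sketch of that linear extrapolation. It assumes two dates: counting started on 1 January 2014, and the 800,000 figure was observed on 25 June 2014, the date of this comment. Real suspension rates need not stay constant.

```python
from datetime import date


def project_linear(count: int, start: date, observed: date, target: date) -> float:
    """Extrapolate `count` to `target`, assuming a constant daily rate since `start`."""
    elapsed = (observed - start).days
    if elapsed <= 0:
        raise ValueError("observed must be after start")
    daily_rate = count / elapsed
    return count + daily_rate * (target - observed).days


if __name__ == "__main__":
    # Six months past the assumed observation date.
    projected = project_linear(800_000, date(2014, 1, 1), date(2014, 6, 25), date(2014, 12, 25))
    print(f"about {projected / 1e6:.2f} million")  # about 1.64 million
```

Under these assumptions the figure roughly doubles over the next six months, consistent with the comment's estimate.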
Posted by: Cctlddk | Wednesday, 25 June 2014 at 10:55 AM
Just a clarification for the initial premise of your article.
The 800,000 suspensions are NOT due to inaccurate whois email addresses.
They simply reflect unverified email addresses. It is theoretically possible that all of them are accurate and the users simply did not click on the verification link.
Let's avoid the bias of assuming that a suspension was due to inaccurate data.
Posted by: EnCirca | Wednesday, 25 June 2014 at 10:51 AM