The URL Block List at APWG’s eCrime eXchange (ECX UBL) is about to change in several dramatic ways.
Data volume. APWG is folding Facebook’s phish feed into the UBL database. When the merge of the Facebook and APWG UBL archives is complete, the UBL will grow from just under two million to an estimated fourteen million records. Once the merge is current, APWG anticipates that the Facebook feed will add approximately 160,000 unique phish records per day. In an email to ECX members, APWG Lead Developer Mike D'Ambrogia notes, “We have a few other large new feeds from other corporate entities beginning to submit phish URLs to us as well over the next few weeks that will continue the development of the UBL into a global corpus of crime event data.”
Data expansion. Facebook data are dominated by shortened URLs. APWG intends to add a new process that expands and tracks shortened links by following the 30x redirects to the final destination, recording the IP addresses and FQDNs encountered along the way. Mike explains that APWG intends to fold three records into the UBL: the original reported shortened URL, the expanded URL (redirects), and the landing location. APWG will also maintain a record of the parent-child relationship between these records, to “make the data interrelationships visible/accessible via the eCX UBL API to help members understand where the data originated.”
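To make the expansion process concrete, here is a minimal Python sketch of following a redirect chain and recording each hop with a pointer to its parent. It is not APWG’s implementation: the `UrlRecord` fields, the role names, and the `fetch` callable are all assumptions made for illustration, and the sketch records only FQDNs (resolving IP addresses would need a DNS lookup per hop). The `fetch` callable stands in for an HTTP client configured not to follow redirects automatically; the demo replaces it with a lookup table so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# Hypothetical interface: given a URL, return (status code, Location header or None)
# for a single request that does NOT follow redirects itself.
Fetch = Callable[[str], Tuple[int, Optional[str]]]


@dataclass
class UrlRecord:
    record_id: int
    url: str
    fqdn: str
    role: str                 # "reported", "redirect", or "landing" (illustrative names)
    parent_id: Optional[int]  # record_id of the previous hop; None for the reported URL


def expand(url: str, fetch: Fetch, max_hops: int = 10) -> List[UrlRecord]:
    """Follow 30x redirects from `url` and return one record per hop."""
    chain = [url]
    seen = {url}
    while len(chain) <= max_hops:
        status, location = fetch(chain[-1])
        if not (300 <= status < 400 and location):
            break                              # not a redirect: this is the landing page
        nxt = urljoin(chain[-1], location)     # Location may be a relative reference
        if nxt in seen:
            break                              # redirect loop: stop here
        seen.add(nxt)
        chain.append(nxt)

    records = []
    for i, hop in enumerate(chain):
        if i == 0:
            role = "reported"
        elif i == len(chain) - 1:
            role = "landing"
        else:
            role = "redirect"
        records.append(
            UrlRecord(
                record_id=i + 1,
                url=hop,
                fqdn=urlsplit(hop).hostname or "",
                role=role,
                parent_id=i if i > 0 else None,
            )
        )
    return records


# Demo with a canned redirect table (all URLs are made-up examples).
FAKE_RESPONSES = {
    "http://short.example/abc": (301, "http://hop.example/r?id=1"),
    "http://hop.example/r?id=1": (302, "/login"),
    "http://hop.example/login": (200, None),
}


def fake_fetch(u: str) -> Tuple[int, Optional[str]]:
    return FAKE_RESPONSES[u]


records = expand("http://short.example/abc", fake_fetch)
for r in records:
    print(r.record_id, r.parent_id, r.role, r.fqdn, r.url)
```

Storing a `parent_id` on each record is one simple way to keep the parent-child relationship queryable: given any landing page, you can walk back up to the shortened URL that was originally reported.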
API will evolve. The ECX API provides data query capability with replies in XML or JSON and also provides canned downloads (currently the last 72 hours or the last week). Mike explains that the API now provides “a much improved ability for you to get exactly what you want out of the data versus the current strategy of grabbing everything and then filtering the data on your end you can search for exactly what you are after”. The API currently allows members to filter by confidence percentage ranges, URL sub-strings, IP subnet ranges, and brands. As new metadata becomes associated with UBL data, APWG intends to add query options for filtering on that data and metadata, including the shortened URL data sets, whois, registry and registrar data. APWG intends to move from the current RPC-based API to a REST-based application in the future.
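The four filters named above are easy to picture as predicates over a record. The Python sketch below models them locally; it does not call the ECX API, and the `PhishRecord` fields, the parameter names, and the sample data are all hypothetical. It is essentially the "grab everything and filter on your end" approach, shown only to illustrate what each filter selects.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class PhishRecord:
    # Field names are assumptions for illustration, not the UBL schema.
    url: str
    ip: str
    brand: str
    confidence: int  # percentage, 0-100


def filter_records(
    records: Iterable[PhishRecord],
    confidence: Optional[Tuple[int, int]] = None,  # inclusive (low, high) range
    url_contains: Optional[str] = None,
    subnet: Optional[str] = None,                  # CIDR notation, e.g. "192.0.2.0/24"
    brand: Optional[str] = None,
) -> List[PhishRecord]:
    """Return the records that satisfy every filter that was supplied."""
    net = ipaddress.ip_network(subnet) if subnet else None
    matches = []
    for r in records:
        if confidence is not None and not (confidence[0] <= r.confidence <= confidence[1]):
            continue
        if url_contains is not None and url_contains not in r.url:
            continue
        if net is not None and ipaddress.ip_address(r.ip) not in net:
            continue
        if brand is not None and r.brand.lower() != brand.lower():
            continue
        matches.append(r)
    return matches


# Made-up sample data using documentation-reserved IP ranges and example domains.
SAMPLE = [
    PhishRecord("http://login.example/examplepay/verify", "192.0.2.10", "ExamplePay", 100),
    PhishRecord("http://bank.example/signin", "192.0.2.77", "ExampleBank", 60),
    PhishRecord("http://files.example/examplepay.html", "198.51.100.5", "ExamplePay", 90),
]

high_confidence_in_subnet = filter_records(SAMPLE, confidence=(80, 100), subnet="192.0.2.0/24")
by_brand = filter_records(SAMPLE, brand="examplepay", url_contains="examplepay")
print(len(high_confidence_in_subnet), len(by_brand))
```

Pushing these same predicates to the server, as the API does, saves members from downloading records they would only discard.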
My ICANN colleague Rick Lamb and I are currently experimenting with the ECX UBL to track and study phishing activity in the new TLDs. The UBL data have already proven valuable. The roughly sevenfold increase in volume, the additional data and metadata associated with shortened URLs, and the anticipated expanded search capabilities have us really excited.
APWG members can access the expanded and enhanced data using their existing eCrime eXchange accounts. Enhancements to the API are documented in each UBL section of the site (e.g., downloads, submissions).
What if you’re not currently a member and want to explore the cybercrime event data and API I’ve described here? APWG has established a trial access program for the ECX UBL, open to all eligible enterprises. Send requests for access to APWG Engineering at [email protected].