13 Mar 2025 8 min read phishing

DomainHunter: A Distributed System for Identifying Potentially Malicious Domains

Introduction

In today's digital landscape, phishing attacks remain one of the most prevalent threats to organizations and individuals. Attackers constantly register new domains that mimic legitimate services, often using sophisticated techniques to evade detection. Security teams need efficient tools to identify, analyze, and respond to these threats quickly.

In this post, I'll walk through DomainHunter, a distributed system built on Cloudflare Workers that helps security teams do this for potentially malicious domains. We'll explore the architecture, the implementation details, and how the components work together to provide a domain monitoring solution.

System Architecture Overview

DomainHunter consists of four main components:

Webhook Service: Receives domain alerts from Cloudflare's Brand Protection product and filters unwanted domains.
Enrichment Service: Enhances domain data with WHOIS information, IP resolution details, and URLscan results.
LogView Service: Provides a web interface for viewing and searching collected domain data.
Graph Service: Offers a visualization dashboard for domain threat analytics.

Let's dive into each component to understand how they work together.

The Webhook Service: Processing Domain Alerts

The webhook service is the heart of DomainHunter. It receives notifications from Cloudflare's Brand Protection product about potentially suspicious domains that might be impersonating your brand or services.

This component performs several critical functions:

Receives webhook notifications about potentially malicious domains
Refangs the domain for processing (replaces '[.]' with '.')
Validates and filters out noisy or spammy domains to conserve API resources

Here's a simplified version of the webhook handler:

async function handleWebhook(request, env) {
  // Parse the incoming request
  const alertData = await request.json();
  
  // Extract and refang the domain (replace '[.]' with '.')
  let domain, domainDefang;
  if (alertData.domain) {
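    // replaceAll swaps every '[.]' (subdomains included); replace() with a string pattern only swaps the first
    domain = alertData.domain.replaceAll('[.]', '.');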
    console.log(`Extracted Domain (refanged): ${domain}`);
    domainDefang = alertData.domain;
    console.log(`Extracted Domain (defanged): ${domainDefang}`);
  }
  
  // Get match details
  const match = alertData.match || 'Unknown';
  const matchTime = alertData.matchTime || new Date().toISOString();
  
  // Filter out false positives
  if (domain && /XXX|YYY|ZZZ/i.test(domain)) {
    console.log('Spam domain name detected, skipping processing');
    return new Response('Spam domain detected, skipping processing.', {
      status: 200,
      headers: { 'Content-Type': 'text/plain' },
    });
  }
  
  const spammyTLDs = ['XXX', 'YYY', 'ZZZ'];
  if (domain && spammyTLDs.some(tld => domain.endsWith(tld))) {
    console.log('Spammy TLD detected, skipping processing');
    return new Response('Spammy TLD detected, skipping processing.', {
      status: 200,
      headers: { 'Content-Type': 'text/plain' },
    });
  }
  
  // Processing continues...
}

Filtering Logic

The filtering logic is particularly important as it helps prevent noise and conserves your API quotas. The system filters domains based on:

Known spam patterns in the domain name
Specific TLDs frequently associated with abuse
Other custom rules you might want to add based on your threat landscape

This initial triage ensures that only domains worthy of further investigation move to the enrichment phase.
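
The triage rules described above can be pulled into one small function so that new rules are easy to add. This is only a sketch: `shouldSkipDomain`, the `exampleRules` object, and the sample patterns and TLDs are hypothetical stand-ins for the redacted values in the handler.

```javascript
// Hypothetical rule set: these patterns and TLDs are examples, not the real filters.
const exampleRules = {
  namePatterns: [/casino/i, /free-gift/i],
  spammyTlds: ['.example-tld', '.invalid'],
};

// Returns a reason string when the domain should be skipped, otherwise null.
function shouldSkipDomain(domain, rules) {
  if (!domain) return 'missing domain';
  const name = domain.toLowerCase();
  if (rules.namePatterns.some((pattern) => pattern.test(name))) {
    return 'spam pattern';
  }
  if (rules.spammyTlds.some((tld) => name.endsWith(tld.toLowerCase()))) {
    return 'spammy TLD';
  }
  return null;
}
```

Returning a reason rather than a boolean keeps the log line and the response body in sync with whichever rule fired.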

The Enrichment Service: Gathering Intelligence

One of the most valuable aspects of DomainHunter is its ability to enrich domain data with information from multiple sources, which gives security teams context about a potentially malicious domain. In the example code below, my real domain has been replaced with the placeholder [YOURDOMAIN]. If you deployed the same code (should I eventually release it), you would substitute your own domain name.

// Fetch Whois information for the domain
const whoisUrl = `https://whois.[YOURDOMAIN]/?domainName=${encodeURIComponent(domain)}`; // encode: the value comes from an external webhook
const whoisResponse = await fetch(whoisUrl, {
  method: 'GET',
  headers: {
    'Content-Type': 'application/json',
  },
});

const whoisData = await whoisResponse.json();
let registrarName = 'N/A';
let hostNames = [];

if (whoisData && whoisData.registrar) {
  registrarName = whoisData.registrar;
}

if (whoisData && Array.isArray(whoisData.hostNames)) {
  hostNames = whoisData.hostNames;
}

// Fetch IP information if IPs are available
let ipData = null;
let org = 'N/A';
let formattedIps = 'N/A';

if (whoisData && Array.isArray(whoisData.ips) && whoisData.ips.length > 0) {
  formattedIps = whoisData.ips.join('\n');
  
  const ipInfoUrl = `https://ipinfo.[YOURDOMAIN]/?ip=${whoisData.ips[0]}`;
  const ipInfoResponse = await fetch(ipInfoUrl, {
    method: 'GET',
    headers: {
      'Content-Type': 'application/json',
    },
  });
  
  ipData = await ipInfoResponse.json();
  
  if (ipData && ipData.org) {
    org = ipData.org;
  }
}

// Fetch URLScan information for the domain
const urlScanUrl = `https://urlscan.[YOURDOMAIN]/?url=${encodeURIComponent(`https://${domain}`)}`; // encode the nested URL
const urlScanResponse = await fetch(urlScanUrl, {
  method: 'GET',
  headers: {
    'Content-Type': 'application/json',
  },
});

const urlScanData = await urlScanResponse.json();
let resultUrl = 'N/A';

if (urlScanData && urlScanData.result) {
  resultUrl = urlScanData.result;
}
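
The snippet above assumes every lookup succeeds and returns JSON. Below is a minimal sketch of a more defensive pattern. `fetchJsonOrNull` is a hypothetical helper, not part of DomainHunter as published, and it accepts the fetch function as a parameter only so it can be exercised outside a Worker.

```javascript
// Hypothetical helper: returns parsed JSON, or null on a network error,
// a non-2xx status, or a body that is not valid JSON.
async function fetchJsonOrNull(url, fetchFn = fetch) {
  try {
    const response = await fetchFn(url, { method: 'GET' });
    if (!response.ok) {
      console.log(`Lookup failed (${response.status}): ${url}`);
      return null;
    }
    return await response.json();
  } catch (err) {
    console.log(`Lookup error for ${url}: ${err.message}`);
    return null;
  }
}
```

Because the existing checks already take the form `whoisData && whoisData.registrar`, a `null` result falls through to the 'N/A' defaults, so one failed lookup need not abort the whole alert.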

Understanding the Intelligence APIs

To build a comprehensive threat profile for each domain, DomainHunter leverages three specialized intelligence APIs that work together to provide a multi-dimensional view:

1. WHOIS Intelligence (whois.[YOURDOMAIN])

This API provides critical domain registration information:

Registrar details (which company the domain was registered through)
Registration and expiration dates (newly registered domains are often suspicious)
Name servers that host the domain's DNS records
IP addresses the domain resolves to
Ownership information that can help identify patterns across malicious domains
Historical registration data

This information helps security teams evaluate the legitimacy of a domain by examining its age, who registered it, and what infrastructure it uses. In my specific use case, WhoisXML generously supported my independent security research. I created a Worker wrapper around their WHOIS API to query and extract the exact data points needed for this project.
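
Since newly registered domains are often suspicious, registration age is an easy signal to compute. The sketch below assumes the WHOIS wrapper exposes the creation date as an ISO 8601 string. The function names and the 30-day threshold are illustrative choices, not part of the original system.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the domain's age in whole days, or null if the date is missing or unparseable.
function domainAgeInDays(createdDateIso, now = new Date()) {
  const created = new Date(createdDateIso);
  if (!createdDateIso || Number.isNaN(created.getTime())) return null;
  return Math.floor((now.getTime() - created.getTime()) / MS_PER_DAY);
}

// Flags domains registered within the last `thresholdDays` days (30 is an arbitrary example).
function isNewlyRegistered(createdDateIso, now = new Date(), thresholdDays = 30) {
  const age = domainAgeInDays(createdDateIso, now);
  return age !== null && age >= 0 && age <= thresholdDays;
}
```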

2. IP Intelligence (ipinfo.[YOURDOMAIN])

Once we have the IP addresses from the WHOIS data, this API enriches our understanding of the hosting infrastructure:

Hosting organization/provider (certain providers are frequently used by threat actors)
Geographic location of the server
Autonomous System Number (ASN) information
Network details (subnet, range, etc.)
Abuse contact information for the IP
Additional domains hosted on the same IP (potentially revealing campaign infrastructure)

For this project I use data from IPinfo. It helps identify hosting patterns and infrastructure connections that might not be apparent from domain data alone.

3. Content Analysis (urlscan.[YOURDOMAIN])

This API performs active analysis of the website content:

Initiates a real-time scan of the website
Captures screenshots of the site (visual evidence)
Analyzes page content for phishing indicators
Detects malicious scripts, iframes, and redirects
Identifies technologies used on the site
Maps connections to other domains and resources
Provides a detailed report URL for manual investigation

This active scanning component is crucial for understanding what's actually hosted on the domain and determining if it contains phishing content.

By combining these three intelligence sources, DomainHunter creates a multi-dimensional threat assessment of each domain, providing security teams with the context they need to make informed decisions.

Real-Time Alerts via Slack

After processing the domain, DomainHunter sends a formatted notification to a Slack channel, allowing security teams to quickly assess the threat:

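// Note: formattedHostNames is hostNames.join('\n') and must be defined before this message is built
const slackUrl = env.SLACK_WEBHOOK_URL;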
const slackMessage = {
  text: `New Detection: \`${domainDefang}\`
Detection Time: \`${matchTime}\`
Matched this query: \`${match}\`
Domain resolves to: \`\`\`${formattedIps}\`\`\`
Host/CDN: \`${org}\`
Name servers: \`\`\`${formattedHostNames}\`\`\`
Registrar Name: \`${registrarName}\`
URLscan: <${resultUrl}>`
};

const slackResponse = await fetch(slackUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(slackMessage)
});

The Slack notification provides a concise summary of all the critical information, allowing security teams to make quick assessments.

Benefits of Real-Time Alerts

Rapid response: Security teams can immediately see new detections
Actionable intel: All critical data points are included
Direct URLscan link: One-click access to visual evidence
Seamless workflow: No need to constantly check a dashboard

Persistent Storage with Cloudflare D1

All the collected information is stored in a Cloudflare D1 database for historical analysis and reporting:

// Store data in D1 database
const { d1 } = env;
const formattedHostNames = hostNames.join('\n');

const insertQuery = `INSERT INTO logs (domain, match_time, match, ips, name_servers, registrar, host_cdn, urlscan_result) 
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?)`;
const stmt = d1.prepare(insertQuery);
const result = await stmt
  .bind(domain, matchTime, match, formattedIps, formattedHostNames, registrarName, org, resultUrl)
  .run();
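
The INSERT above implies a `logs` table with these columns. The original schema is not shown in this post, so the sketch below is a guess: the column names come from the INSERT statement, while the column types, the autoincrementing `id`, and the `ensureLogsTable` helper are assumptions.

```javascript
// Assumed schema; column names come from the INSERT statement, types are guesses.
const CREATE_LOGS_TABLE = `
  CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL,
    match_time TEXT,
    "match" TEXT,
    ips TEXT,
    name_servers TEXT,
    registrar TEXT,
    host_cdn TEXT,
    urlscan_result TEXT
  )`;

// Runs the statement through the same prepare().run() calls used for the INSERT.
async function ensureLogsTable(d1) {
  return d1.prepare(CREATE_LOGS_TABLE).run();
}
```

Storing `match_time` as ISO 8601 text keeps it sortable as a string and readable by SQLite's date functions.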

Why Cloudflare D1?

Using Cloudflare D1 for storage provides several advantages:

Zero infrastructure management: No database servers to maintain
Global distribution: Data is stored close to where it's used
SQL compatibility: Familiar query language for data analysis
Automatic scaling: Handles high volumes of data without provisioning concerns
Direct integration: Seamlessly works with Cloudflare Workers

Figure: Web application for viewing and searching logs

The LogView Service: Searching and Analyzing Domains

The LogView service provides a web interface for viewing and searching all collected domain data. It features:

Basic authentication for secure access
A sortable and searchable HTML table
Pagination for browsing large datasets
Direct links to URLScan results
Export functionality for further analysis

async function handleRequest(request, env) {
  // Basic Authentication
  const authHeader = request.headers.get('Authorization');
  const expectedAuth = 'Basic ' + btoa('XXX:YYY'); // Placeholder credentials ('XXX' / 'YYY'); in practice, load them from Worker secrets rather than hardcoding

  if (authHeader !== expectedAuth) {
    return new Response('Unauthorized', {
      status: 401,
      headers: {
        'WWW-Authenticate': 'Basic realm="Logs Viewer"',
        'Content-Type': 'text/plain'
      },
    });
  }

  // Query the database for all logs
  const { d1 } = env;
  const stmt = d1.prepare('SELECT * FROM logs ORDER BY id DESC');
  const result = await stmt.all();
  
  // Generate HTML table with the results
  // Including search, sort, and pagination functionality
  // ...
}
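
One way to avoid hardcoded credentials is to decode the header and compare it against values stored as secrets. This is a sketch under assumptions: `LOGVIEW_USER` and `LOGVIEW_PASS` are hypothetical secret names, and both functions are illustrative rather than the original implementation.

```javascript
// Decodes an "Authorization: Basic ..." header into { user, pass }, or null if malformed.
function parseBasicAuth(authHeader) {
  if (!authHeader || !authHeader.startsWith('Basic ')) return null;
  let decoded;
  try {
    decoded = atob(authHeader.slice('Basic '.length));
  } catch (err) {
    return null; // not valid base64
  }
  const separator = decoded.indexOf(':');
  if (separator === -1) return null;
  // Only the first ':' separates user from password; passwords may contain ':'.
  return { user: decoded.slice(0, separator), pass: decoded.slice(separator + 1) };
}

// LOGVIEW_USER and LOGVIEW_PASS are hypothetical secret names.
function isAuthorized(request, env) {
  const creds = parseBasicAuth(request.headers.get('Authorization'));
  return creds !== null && creds.user === env.LOGVIEW_USER && creds.pass === env.LOGVIEW_PASS;
}
```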

The LogView interface allows security teams to conduct historical analysis, search for patterns, and create reports based on collected data.
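
Search and pagination can also be pushed into the query instead of loading every row. The sketch below is one possible approach, not the original code: `buildLogsQuery`, its parameter names, and the page-size cap of 200 are assumptions.

```javascript
// Builds a parameterized query for one page of logs, optionally filtered by domain substring.
function buildLogsQuery({ search = '', page = 1, pageSize = 50 } = {}) {
  const safePage = Math.max(1, Math.floor(Number(page)) || 1);
  const safeSize = Math.min(200, Math.max(1, Math.floor(Number(pageSize)) || 50));
  const params = [];
  let sql = 'SELECT * FROM logs';
  if (search) {
    sql += ' WHERE domain LIKE ?';
    params.push(`%${search}%`); // bound as a parameter, never concatenated into the SQL
  }
  sql += ' ORDER BY id DESC LIMIT ? OFFSET ?';
  params.push(safeSize, (safePage - 1) * safeSize);
  return { sql, params };
}
```

The result can be passed to D1 as `d1.prepare(sql).bind(...params).all()`. Note that `%` and `_` in the search term still act as LIKE wildcards in this sketch.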

Figure: Graph view of the threat data, which also includes search functionality

The Graph Service: Searching and Visualizing Domain Threats

The Graph service provides a searchable visualization dashboard using Chart.js to display:

Top domain registrars used by potential threat actors
Top hosting providers/CDNs where suspicious domains are hosted
Top match patterns (security rules that triggered alerts)
Top TLDs (Top-Level Domains) used in suspicious domains
Day of week when the match took place
Time of day (UTC) when the match took place
Trends over time for all metrics

async function handleRequest(request, env) {
  const { d1 } = env;
  
  // Query for top registrars
  const registrarStmt = d1.prepare(`
    SELECT registrar, COUNT(*) as count 
    FROM logs 
    WHERE registrar IS NOT NULL AND registrar != 'N/A'
    GROUP BY registrar 
    ORDER BY count DESC 
    LIMIT 10
  `);
  const registrarResult = await registrarStmt.all();
  
  // Similar queries for host_cdn, match, and TLDs
  // ...
  
  // Format data for Chart.js
  const registrarLabels = registrarResult.results.map(item => item.registrar);
  const registrarData = registrarResult.results.map(item => item.count);
  
  // Generate HTML with Chart.js visualizations
  // ...
}
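
The `logs` table has no TLD or weekday column, so those two charts need values derived from `domain` and `match_time`. One option is to derive them in JavaScript after fetching the rows. The helpers below are a hypothetical sketch: they assume `match_time` is an ISO timestamp, and the TLD extraction is naive, taking only the last DNS label.

```javascript
// Returns the last DNS label of a domain (naive: "co.uk" style suffixes yield "uk").
function extractTld(domain) {
  const labels = String(domain).toLowerCase().replace(/\.$/, '').split('.');
  return labels.length > 1 ? labels[labels.length - 1] : '';
}

// Counts TLDs across rows and returns Chart.js-style labels and data arrays.
function topTlds(rows, limit = 10) {
  const counts = new Map();
  for (const row of rows) {
    const tld = extractTld(row.domain);
    if (tld) counts.set(tld, (counts.get(tld) || 0) + 1);
  }
  const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
  return { labels: sorted.map(([tld]) => tld), data: sorted.map(([, count]) => count) };
}

// Counts matches per UTC weekday (index 0 = Sunday); unparseable timestamps are skipped.
function matchesByWeekday(rows) {
  const counts = new Array(7).fill(0);
  for (const row of rows) {
    const day = new Date(row.match_time).getUTCDay();
    if (!Number.isNaN(day)) counts[day] += 1;
  }
  return counts;
}
```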

Benefits of Visualization

The visualization dashboard provides several benefits:

Pattern recognition: Quickly identify common infrastructure used by attackers
Trend analysis: See how attack patterns evolve over time

Future Enhancements

There are several ways DomainHunter could be enhanced in the future:

Machine learning integration: Automatically classify domains based on risk scores
Threat intelligence feeds: Integrate with external threat feeds for additional context
Automated takedown workflow: Create a workflow for submitting abuse reports when there is confirmed malicious activity
Custom alerting rules: Allow security teams to define their own alerting criteria

Conclusion

DomainHunter demonstrates how Cloudflare Workers, combined with D1 and external APIs, can create a powerful system for detecting and responding to potentially malicious domains. The architecture is:

Distributed: Each component runs as a separate Cloudflare Worker
Scalable: Workers automatically scale to handle load
Responsive: Real-time alerts via Slack
Comprehensive: Enriches domain data from multiple sources
Analytical: Provides visualization tools for threat analysis

This approach to security tooling leverages the serverless paradigm to create a system that is both efficient and effective for security teams. The combination of automated detection, enrichment, and visualization allows for rapid identification and response to potential phishing threats.

For organizations looking to enhance their security posture against domain-based threats, this architecture provides a solid foundation that can be customized and extended based on specific needs.