1. Our Data-Collection Methods — At a Glance

Collection Channel | What We Take | Where It Comes From | Legal Basis*
1. Public-web crawling (Common Crawl) | • Business domain • Contact-form URL • Public business e-mail/phone if present on the same page | Monthly snapshots of the open-source Common Crawl corpus. Our parser only visits pages permitted by each site’s robots.txt and never bypasses technical controls (e.g., CAPTCHAs, paywalls). | Legitimate interest (GDPR Art. 6(1)(f)); § 1798.145(l) CCPA research exemption
2. Voluntary submissions | • Data clients upload for hygiene/validation | Direct from customer | Contract (GDPR Art. 6(1)(b))
3. Business-directory & government filings | • Firm name, reg. number, officer names, capital, NAICS/SIC codes | Open corporate registers, SEC/Companies House, ministry of commerce portals | Legitimate interest / public task
4. Opt-in enrichment partners | • Size band, revenue bracket, LinkedIn URL | Vendors certified under ISO 27001 & SOC 2 | Legitimate interest / consent obtained by partner

* Under the CCPA/CPRA we operate as a “service provider / contractor” and do not “sell” or “share” personal information for cross-context behavioural advertising. B2B records are nevertheless handled as if fully in scope.


2. Detailed Workflow

  1. Seed & Crawl

    • We seed the crawler with CC Index URLs, filter to .html, .php, .asp, etc., then apply keyword heuristics (contact, support, enquiry, form, etc.).

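    • The crawler respects robots.txt directives (Disallow, Crawl-delay) and honours noindex signals, which sites set through the robots meta tag or the X-Robots-Tag header rather than in robots.txt itself. It identifies itself with the user-agent string LeadsBigBot (+https://leadsbig.co/bot).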

    • Only HTTP 200 responses are stored; 4xx/5xx pages are discarded.
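
The seed-and-crawl rules above can be sketched roughly as follows. This is a minimal illustration using Python's standard library, not the production crawler: the function names, the extension and keyword tuples, and the idea of checking an already-downloaded robots.txt body are all assumptions made for the example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Illustrative values only; the full extension and keyword lists are assumptions.
PAGE_EXTENSIONS = (".html", ".php", ".asp")
KEYWORDS = ("contact", "support", "enquiry", "form")
USER_AGENT = "LeadsBigBot"


def is_candidate(url):
    """Keep URLs with a page-like extension and a contact-style keyword in the path."""
    path = urlparse(url).path.lower()
    return path.endswith(PAGE_EXTENSIONS) and any(k in path for k in KEYWORDS)


def robots_policy(robots_txt, url):
    """Return (allowed, crawl_delay) for a URL, given a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url), parser.crawl_delay(USER_AGENT)


def should_store(status):
    """Only HTTP 200 responses are kept; everything else is discarded."""
    return status == 200
```

A URL would pass to parsing only if is_candidate, the robots check, and should_store all succeed.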

  2. Parsing & Deduplication

    • HTML is parsed with Cheerio; a form is captured when a <form> has an action attribute and contains an <input> with type="email" or type="text", or a <textarea>.

    • Duplicate domains or URLs are removed via SHA-256 hashing before storage.
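
As a rough sketch of the form-detection and deduplication logic, the example below uses Python's built-in html.parser in place of Cheerio, so it shows the idea rather than the actual parser. The class and function names are hypothetical, a <textarea> is treated as its own element (HTML has no input type of that name), and URL normalisation is kept to whitespace trimming, which is an assumption.

```python
import hashlib
from html.parser import HTMLParser


class ContactFormFinder(HTMLParser):
    """Collect the action of each <form> that contains a qualifying field."""

    def __init__(self):
        super().__init__()
        self.actions = []        # actions of forms that qualified
        self._action = None      # action of the form currently open, if any
        self._has_field = False  # whether the open form has a qualifying field

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form":
            self._action = attr.get("action")
            self._has_field = False
        elif self._action is not None:
            input_type = (attr.get("type") or "").lower()
            if tag == "textarea" or (tag == "input" and input_type in ("email", "text")):
                self._has_field = True

    def handle_endtag(self, tag):
        if tag == "form":
            if self._action is not None and self._has_field:
                self.actions.append(self._action)
            self._action = None


def dedupe(urls):
    """Drop repeated URLs by comparing SHA-256 digests, keeping first occurrences."""
    seen = set()
    unique = []
    for url in urls:
        digest = hashlib.sha256(url.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(url.strip())
    return unique
```

Storing digests instead of raw URLs keeps the lookup set compact and fixed-width; a real pipeline would likely normalise URLs more aggressively before hashing.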

  3. Data Hygiene & Validation

    • MX-lookup and SMTP-ping verify e-mail deliverability (no message is ever sent).

    • Phone numbers are normalised.

    • We cross-check against industry opt-out/suppression lists and users who have exercised data-subject rights.
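
The suppression cross-check and phone clean-up might look like the sketch below. It assumes the suppression list is held as SHA-256 fingerprints of lower-cased e-mail addresses and that phone normalisation simply strips formatting characters; neither detail is stated above, and the function names are hypothetical. The MX and SMTP checks are omitted because they need network access.

```python
import hashlib
import re


def fingerprint(email):
    """Hash a trimmed, lower-cased e-mail so the list need not hold plain addresses."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def apply_suppression(records, suppressed):
    """Drop every record whose e-mail fingerprint is on the suppression list."""
    return [r for r in records if fingerprint(r["email"]) not in suppressed]


def normalise_phone(raw):
    """Keep digits only, preserving a leading '+' for international numbers."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.strip().startswith("+") else digits
```

For example, a record for "opt.out@example.com" is removed when the list contains the fingerprint of "Opt.Out@Example.com", because both hash to the same value after normalisation.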

  4. Human Rights & Sensitive Data Check

    • Automatic classifiers remove pages that contain special-category data (GDPR Art. 9) or content related to minors.

    • We do not collect passwords, financial account numbers, or consumer-credit data.

  5. Encryption & Storage

    • Ingestion happens on isolated VMs; data at rest is encrypted with AES-256, and data in transit is protected by TLS 1.3.

    • Access is role-based (RBAC) and logged; logs stored for 12 months.
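
To illustrate the in-transit requirement, a client can refuse anything older than TLS 1.3 by pinning the minimum protocol version. The snippet is a minimal sketch with Python's ssl module; the function name is hypothetical, and encryption at rest is not shown because it is typically configured in the storage layer rather than in application code.

```python
import ssl


def tls13_client_context():
    """Build a client context that verifies certificates and requires TLS 1.3."""
    ctx = ssl.create_default_context()            # certificate and hostname checks on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # reject TLS 1.2 and older
    return ctx
```

A connection attempt through this context fails during the handshake if the server cannot negotiate TLS 1.3.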


3. Why This Is Lawful

GDPR Principle | Our Implementation
Lawfulness, Fairness, Transparency | Public notice, this article, Art. 14 “indirect source” e-mail template, one-click opt-out
Purpose Limitation | Sole purpose: legitimate B2B outreach & lead generation
Data Minimisation | Collect only business-relevant identifiers; strip personal notes/comments
Accuracy | Monthly refreshes; stale records purged after 12 months
Storage Limitation | Raw crawl kept 90 days; curated dataset retained up to 24 months or until customer deletion
Integrity & Confidentiality | ISO 27001-aligned security programme, quarterly pen-tests
Accountability | Records of Processing Activities (RoPA), DPIA updated annually
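
The storage-limitation periods can be expressed as a simple expiry check. The sketch below is illustrative only: the names are hypothetical, and 24 months is approximated as 730 days, which is an assumption about how the period is counted.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from the table; 24 months approximated as 730 days (assumption).
RETENTION = {
    "raw": timedelta(days=90),
    "curated": timedelta(days=730),
}


def is_expired(kind, stored_at, now):
    """Return True once a record of the given kind has outlived its retention window."""
    return now - stored_at > RETENTION[kind]


# Example: a raw crawl record stored 91 days before the reference date.
now = datetime(2025, 5, 9, tzinfo=timezone.utc)
raw_is_due = is_expired("raw", now - timedelta(days=91), now)
```

A scheduled purge job could run this check over each record and delete those for which it returns True.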

Under CCPA/CPRA we:

  • honour “Do Not Sell or Share” signals via the GPC header and webform opt-out;

  • provide a contact e-mail address for privacy requests ([email protected]);

  • disclose data categories collected (§ 1798.110), purposes (§ 1798.115), and retention (§ 1798.100(a)(3)).
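
Global Privacy Control reaches a server as the request header Sec-GPC with the value 1. A minimal sketch of detecting it is shown below; the function name and the plain-dictionary representation of headers are assumptions, and what happens after detection (for example, adding the visitor to a suppression list) is not shown.

```python
def gpc_opt_out(headers):
    """Return True when the request carries a Sec-GPC header set to 1."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False
```

Any value other than 1, or a missing header, is treated as no signal.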


4. Your Rights & Opt-Out Channels

Region | Rights | How to Exercise
EU/UK | Access • Rectification • Erasure • Restriction • Objection • Portability | E-mail [email protected] or use our self-service portal
USA (California) | Know • Delete • Correct • Opt-out of Sale/Share • Limit Use of Sensitive PI | Webform “Do Not Sell” + browser Global Privacy Control
Canada | Access • Correction • Withdrawal of consent | Same portal
Global | Unsubscribe from further processing | One-click suppression link bundled with every dataset we deliver

We respond within 30 days (GDPR) or 45 days (CCPA) and maintain a rolling suppression list to ensure permanent deletion unless law requires retention.


5. Children’s Data

Leads Big does not knowingly collect or sell data relating to individuals under 16. Our keyword filters flag and purge any domain that targets minors (schools, clubs, .kids domains).
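
A keyword filter of this kind could be sketched as follows. The keyword and TLD tuples are hypothetical stand-ins drawn from the examples named above, not the actual filter lists, and simple substring matching will over-flag some domains (a business club, for instance), so flagged entries would normally need review.

```python
# Hypothetical lists based on the examples in the text (schools, clubs, .kids).
MINOR_KEYWORDS = ("school", "club")
MINOR_TLDS = (".kids",)


def targets_minors(domain):
    """Flag a domain whose name or TLD suggests an audience of minors."""
    name = domain.strip().lower().rstrip(".")
    return name.endswith(MINOR_TLDS) or any(k in name for k in MINOR_KEYWORDS)
```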


6. International Transfers

Primary storage is in Frankfurt, Germany (AWS eu-central-1).
If we must transfer data outside the EEA/UK, we rely on the Standard Contractual Clauses adopted under Commission Implementing Decision (EU) 2021/914, plus supplementary encryption and limited-purpose access.


7. Updates to This Notice

We review the workflow quarterly. Material changes trigger a banner on leadsbig.co and, where feasible, an e-mail notice to affected contacts at least 30 days beforehand.

Last updated: 9 May 2025


To request removal of your data: https://leadsbig.co/remove-my-data/