1. Our Data-Collection Methods — At a Glance
Collection Channel | What We Take | Where It Comes From | Legal Basis* |
---|---|---|---|
1. Public-web crawling (Common Crawl) | • Business domain • Contact-form URL • Public business e-mail/phone if present on the same page | Monthly snapshots of the open-source Common Crawl corpus. Our parser only visits pages permitted by each site’s robots.txt and never bypasses technical controls (e.g., CAPTCHAs, paywalls). | Legitimate interest (GDPR Art. 6 (1)(f)); § 1798.145 (l) CCPA research exemption |
2. Voluntary submissions | • Data clients upload for hygiene/validation | Direct from customer | Contract (GDPR Art. 6 (1)(b)) |
3. Business-directory & government filings | • Firm name, reg. number, officer names, capital, NAICS/SIC codes | Open corporate registers, SEC/Companies House, ministry of commerce portals | Legitimate interest / public task |
4. Opt-in enrichment partners | • Size band, revenue bracket, LinkedIn URL | Vendors certified under ISO 27001 & SOC 2 | Legitimate interest / consent obtained by partner |
* Under the CCPA/CPRA we operate as a “service provider / contractor” and do not “sell” or “share” personal information for cross-context behavioural advertising. B2B records are nevertheless handled as if fully in scope.
2. Detailed Workflow
- Seed & Crawl
  - We seed the crawler with CC Index URLs, filter to `.html`, `.php`, `.asp`, etc., then apply keyword heuristics (`contact`, `support`, `enquiry`, `form`, etc.).
  - The crawler respects `robots.txt` directives (`Disallow`, `Crawl-delay`, `Noindex`) and identifies itself with the user-agent string `LeadsBigBot (+https://leadsbig.co/bot)`.
  - Only HTTP 200 responses are stored; 4xx/5xx pages are discarded.
- Parsing & Deduplication
  - HTML is parsed with Cheerio; a form is captured when a `<form>` has an `action` attribute and contains an `<input>` with `type="email"` or `type="text"`, or a `<textarea>`.
  - Duplicate domains or URLs are removed via SHA-256 hashing before storage.
- Data Hygiene & Validation
  - MX-lookup and SMTP-ping verify e-mail deliverability (no message is ever sent).
  - Phone numbers are normalised.
  - We cross-check against industry opt-out/suppression lists and users who have exercised data-subject rights.
- Human Rights & Sensitive Data Check
  - Automatic classifiers remove pages that contain special-category data (GDPR Art. 9) or content related to minors.
  - We do not collect passwords, financial account numbers, or consumer-credit data.
- Encryption & Storage
  - Ingestion happens on isolated VMs; data at rest is encrypted with AES-256 and data in transit is protected by TLS 1.3.
  - Access is role-based (RBAC) and logged; logs are stored for 12 months.
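The crawl-side gates described in this workflow (extension and keyword filter, robots.txt check, HTTP 200 only, SHA-256 deduplication) can be sketched as follows. This is an illustration under stated assumptions, not the production crawler: the function names, the URL normalisation used before hashing, and the `status_of` callback are hypothetical, and the extension and keyword tuples contain only the examples named above.

```python
import hashlib
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "LeadsBigBot"  # product token taken from the user-agent string above
PAGE_EXTENSIONS = (".html", ".php", ".asp")  # the notice lists these "etc."
KEYWORDS = ("contact", "support", "enquiry", "form")  # likewise only an excerpt


def looks_like_contact_page(url):
    """Extension filter followed by the keyword heuristic."""
    path = urlsplit(url).path.lower()
    return path.endswith(PAGE_EXTENSIONS) and any(k in path for k in KEYWORDS)


def allowed_by_robots(url, robots_txt):
    """Check a URL against the body of an already-fetched robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def url_fingerprint(url):
    """SHA-256 of a lightly normalised URL (the normalisation is an assumption)."""
    parts = urlsplit(url)
    canonical = f"{parts.netloc.lower()}{parts.path.rstrip('/')}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_urls(candidates, robots_txt, status_of):
    """Yield URLs that pass every gate, skipping duplicates.

    status_of is a hypothetical callback returning the HTTP status for a URL.
    """
    seen = set()
    for url in candidates:
        if not looks_like_contact_page(url):
            continue
        if not allowed_by_robots(url, robots_txt):
            continue
        if status_of(url) != 200:  # 4xx/5xx (and anything else) is discarded
            continue
        digest = url_fingerprint(url)
        if digest in seen:
            continue
        seen.add(digest)
        yield url
```

Hashing a normalised form rather than the raw string means that `https://EXAMPLE.com/contact.html` and `https://example.com/contact.html` collapse to one entry; how aggressively to normalise (query strings, trailing slashes, schemes) is a design choice the notice does not specify.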
3. Why This Is Lawful
GDPR Principle | Our Implementation |
---|---|
Lawfulness, Fairness, Transparency | Public notice, this article, Art. 14 “indirect source” e-mail template, one-click opt-out |
Purpose Limitation | Sole purpose: legitimate B2B outreach & lead generation |
Data Minimisation | Collect only business-relevant identifiers; strip personal notes/comments |
Accuracy | Monthly refreshes; stale records purged after 12 months |
Storage Limitation | Raw crawl kept 90 days; curated dataset retained up to 24 months or until customer deletion |
Integrity & Confidentiality | ISO 27001-aligned security programme, quarterly pen-tests |
Accountability | Records of Processing Activities (RoPA), DPIA updated annually |
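The retention figures in the table (raw crawl 90 days, stale records purged after 12 months, curated data kept up to 24 months) amount to simple date arithmetic. The sketch below shows one way to express them; the function names are hypothetical, and measuring staleness from a "last verified" date is an assumption, since the table does not define when a record counts as stale.

```python
from datetime import date, timedelta

RAW_CRAWL_DAYS = 90       # "Raw crawl kept 90 days"
STALE_AFTER_MONTHS = 12   # "stale records purged after 12 months"
CURATED_MAX_MONTHS = 24   # "curated dataset retained up to 24 months"


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not yet complete
    return months


def raw_crawl_expired(fetched_on, today):
    """True once a raw crawl snapshot is older than the 90-day window."""
    return today - fetched_on > timedelta(days=RAW_CRAWL_DAYS)


def curated_record_expired(created_on, last_verified_on, today):
    """True when a curated record is stale or has reached the 24-month cap."""
    stale = months_between(last_verified_on, today) >= STALE_AFTER_MONTHS
    too_old = months_between(created_on, today) >= CURATED_MAX_MONTHS
    return stale or too_old
```

A customer deletion request would remove a record earlier than either limit; that path is outside this sketch.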
Under CCPA/CPRA we:
- honour “Do Not Sell or Share” signals via the GPC header and webform opt-out;
- provide an e-mail ([email protected]);
- disclose data categories collected (§ 1798.110), purposes (§ 1798.115), and retention (§ 1798.100(a)(3)).
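A browser with Global Privacy Control enabled sends the request header `Sec-GPC: 1`. The following minimal sketch shows how a server-side handler might detect that signal; the function name is hypothetical and the snippet says nothing about how the opt-out is recorded afterwards.

```python
from typing import Mapping


def gpc_opt_out(headers: Mapping[str, str]) -> bool:
    """Return True when the request carries a Global Privacy Control signal."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False
```

Any value other than `1`, or a missing header, is treated here as "no signal"; the webform opt-out remains a separate channel.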
4. Your Rights & Opt-Out Channels
Region | Rights | How to Exercise |
---|---|---|
EU/UK | Access • Rectification • Erasure • Restriction • Objection • Portability | E-mail [email protected] or use our self-service portal |
USA (California) | Know • Delete • Correct • Opt-out of Sale/Share • Limit Use of Sensitive PI | Webform “Do Not Sell” + browser Global Privacy Control |
Canada | Access • Correction • Withdrawal of consent | Same portal |
Global | Unsubscribe from further processing | One-click suppression link bundled with every dataset we deliver |
We respond within 30 days (GDPR) or 45 days (CCPA) and maintain a rolling suppression list to ensure permanent deletion unless law requires retention.
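As an illustration of the two mechanisms in the paragraph above, the sketch below computes a response due date from the stated 30-day and 45-day windows and filters records against a suppression list. It is a simplified example, not the production system: the class and function names are hypothetical, and storing SHA-256 digests of lower-cased e-mail addresses instead of the addresses themselves is an assumption.

```python
import hashlib
from datetime import date, timedelta

RESPONSE_DAYS = {"GDPR": 30, "CCPA": 45}  # windows stated in the notice


def response_due(received_on, regime):
    """Latest date by which a request under the given regime is answered."""
    return received_on + timedelta(days=RESPONSE_DAYS[regime])


def suppression_key(email):
    """Digest of a trimmed, lower-cased address (an assumed normalisation)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class SuppressionList:
    """Remembers opted-out addresses so they are not re-ingested later."""

    def __init__(self):
        self._keys = set()

    def add(self, email):
        self._keys.add(suppression_key(email))

    def filter(self, records):
        """Return only the records whose e-mail is not suppressed."""
        return [r for r in records if suppression_key(r["email"]) not in self._keys]
```

Keeping a key for a deleted contact is what lets a later crawl recognise and drop the same address again, which is the sense in which a suppression list makes a deletion stick.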
5. Children’s Data
Leads Big does not knowingly collect or sell data relating to individuals under 16. Our keyword filters flag and purge any domain that targets minors (schools, clubs, .kids domains).
6. International Transfers
Primary storage is in Frankfurt, Germany (AWS eu-central-1).
If we must transfer data outside the EEA/UK, we rely on the Standard Contractual Clauses adopted under Commission Implementing Decision (EU) 2021/914, plus supplementary encryption and limited-purpose access.
7. Updates to This Notice
We review the workflow quarterly. Material changes trigger a banner on Leadsbig.co and, where feasible, e-mail notice to affected contacts at least 30 days beforehand.
Last updated: 9 May 2025