From 78ee05f50e8ef5e1a4c519fa0827a84e37e62c1c Mon Sep 17 00:00:00 2001
From: Samir Boulahtit
Date: Sat, 28 Feb 2026 16:04:52 +0100
Subject: [PATCH] docs(prospecting): add scoring, database, and research docs

Co-Authored-By: Claude Opus 4.6
---
 docs/modules/prospecting/database.md          | 171 ++++++++++++++++++
 docs/modules/prospecting/research-findings.md |  80 ++++++++
 docs/modules/prospecting/scoring.md           | 110 +++++++++++
 3 files changed, 361 insertions(+)
 create mode 100644 docs/modules/prospecting/database.md
 create mode 100644 docs/modules/prospecting/research-findings.md
 create mode 100644 docs/modules/prospecting/scoring.md

diff --git a/docs/modules/prospecting/database.md b/docs/modules/prospecting/database.md
new file mode 100644
index 00000000..daf2d5c5
--- /dev/null
+++ b/docs/modules/prospecting/database.md
@@ -0,0 +1,171 @@
+# Database Schema
+
+## Entity Relationship Diagram
+
+```
+┌─────────────────────┐     ┌────────────────────────┐
+│ prospects           │────<│ prospect_tech_profiles │
+├─────────────────────┤     ├────────────────────────┤
+│ id                  │     │ id                     │
+│ channel             │     │ prospect_id (FK)       │
+│ business_name       │     │ cms, server            │
+│ domain_name         │     │ hosting_provider       │
+│ status              │     │ js_framework, cdn      │
+│ source              │     │ analytics              │
+│ has_website         │     │ ecommerce_platform     │
+│ uses_https          │     │ tech_stack_json (JSON) │
+│ ...                 │     └────────────────────────┘
+└─────────────────────┘
+         │
+         │               ┌───────────────────────────────┐
+         └──────────────<│ prospect_performance_profiles │
+         │               ├───────────────────────────────┤
+         │               │ id                            │
+         │               │ prospect_id (FK)              │
+         │               │ performance_score (0-100)     │
+         │               │ accessibility_score           │
+         │               │ seo_score                     │
+         │               │ FCP, LCP, TBT, CLS            │
+         │               │ is_mobile_friendly            │
+         │               └───────────────────────────────┘
+         │
+         │               ┌───────────────────────┐
+         └──────────────<│ prospect_scores       │
+         │               ├───────────────────────┤
+         │               │ id                    │
+         │               │ prospect_id (FK)      │
+         │               │ score (0-100)         │
+         │               │ technical_health_score│
+         │               │ modernity_score       │
+         │               │ business_value_score  │
+         │               │ engagement_score      │
+         │               │ reason_flags (JSON)   │
+         │               │ lead_tier             │
+         │               └───────────────────────┘
+         │
+         │               ┌───────────────────────┐
+         └──────────────<│ prospect_contacts     │
+         │               ├───────────────────────┤
+         │               │ id                    │
+         │               │ prospect_id (FK)      │
+         │               │ contact_type          │
+         │               │ value                 │
+         │               │ source_url            │
+         │               │ is_primary            │
+         │               └───────────────────────┘
+         │
+         │               ┌───────────────────────┐
+         └──────────────<│ prospect_interactions │
+         │               ├───────────────────────┤
+         │               │ id                    │
+         │               │ prospect_id (FK)      │
+         │               │ interaction_type      │
+         │               │ subject, notes        │
+         │               │ outcome               │
+         │               │ next_action           │
+         │               │ next_action_date      │
+         │               │ created_by_user_id    │
+         │               └───────────────────────┘
+         │
+         │               ┌───────────────────────┐
+         └──────────────<│ prospect_scan_jobs    │
+                         ├───────────────────────┤
+                         │ id                    │
+                         │ job_type              │
+                         │ status                │
+                         │ total_items           │
+                         │ processed_items       │
+                         │ celery_task_id        │
+                         └───────────────────────┘
+
+┌──────────────────────┐     ┌──────────────────┐
+│ campaign_templates   │────<│ campaign_sends   │
+├──────────────────────┤     ├──────────────────┤
+│ id                   │     │ id               │
+│ name                 │     │ template_id (FK) │
+│ lead_type            │     │ prospect_id (FK) │
+│ channel              │     │ channel          │
+│ language             │     │ rendered_subject │
+│ subject_template     │     │ rendered_body    │
+│ body_template        │     │ status           │
+│ is_active            │     │ sent_at          │
+└──────────────────────┘     │ sent_by_user_id  │
+                             └──────────────────┘
+```
+
+## Tables
+
+### prospects
+
+Central table for all leads — both digital (domain-based) and offline (in-person).
+
+| Column | Type | Description |
+|--------|------|-------------|
+| id | INTEGER PK | Auto-increment |
+| channel | ENUM(digital, offline) | How the lead was discovered |
+| business_name | VARCHAR(255) | Required for offline |
+| domain_name | VARCHAR(255) | Required for digital, unique |
+| status | ENUM | pending, active, inactive, parked, error, contacted, converted |
+| source | VARCHAR(100) | e.g. "domain_scan", "networking_event", "street" |
+| has_website | BOOLEAN | Determined by HTTP check |
+| uses_https | BOOLEAN | SSL status |
+| http_status_code | INTEGER | Last HTTP response |
+| address | VARCHAR(500) | Physical address (offline) |
+| city | VARCHAR(100) | City |
+| postal_code | VARCHAR(10) | Postal code |
+| country | VARCHAR(2) | Default "LU" |
+| notes | TEXT | Free-form notes |
+| tags | JSON | Flexible tagging |
+| captured_by_user_id | INTEGER FK | Who captured this lead |
+| location_lat / location_lng | FLOAT | GPS from mobile capture |
+| last_*_at | DATETIME | Timestamps for each scan type |
+
+### prospect_tech_profiles
+
+Technology stack detection results. One per prospect.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| cms | VARCHAR(100) | WordPress, Drupal, Joomla, etc. |
+| server | VARCHAR(100) | Nginx, Apache |
+| hosting_provider | VARCHAR(100) | Hosting company |
+| cdn | VARCHAR(100) | CDN provider |
+| js_framework | VARCHAR(100) | React, Vue, Angular, jQuery |
+| analytics | VARCHAR(200) | Google Analytics, Matomo, etc. |
+| ecommerce_platform | VARCHAR(100) | Shopify, WooCommerce, etc. |
+| tech_stack_json | JSON | Full detection results |
+
+### prospect_performance_profiles
+
+Lighthouse audit results. One per prospect.
+
+| Column | Type | Description |
+|--------|------|-------------|
+| performance_score | INTEGER | 0-100 |
+| accessibility_score | INTEGER | 0-100 |
+| seo_score | INTEGER | 0-100 |
+| first_contentful_paint_ms | INTEGER | FCP |
+| largest_contentful_paint_ms | INTEGER | LCP |
+| total_blocking_time_ms | INTEGER | TBT |
+| cumulative_layout_shift | FLOAT | CLS |
+| is_mobile_friendly | BOOLEAN | Mobile test |
+
+### prospect_scores
+
+Calculated opportunity scores. One per prospect. See [scoring.md](scoring.md) for algorithm details.
+
+### prospect_contacts
+
+Scraped or manually entered contact info. Many per prospect.
+
+### prospect_interactions
+
+CRM-style interaction log. Many per prospect. Types: note, call, email_sent, email_received, meeting, visit, sms, proposal_sent.
+
+### prospect_scan_jobs
+
+Background job tracking for batch operations.
+
+### campaign_templates / campaign_sends
+
+Marketing campaign templates and send tracking. Templates support placeholders like `{business_name}`, `{domain}`, `{score}`, `{issues}`.
diff --git a/docs/modules/prospecting/research-findings.md b/docs/modules/prospecting/research-findings.md
new file mode 100644
index 00000000..085cdc28
--- /dev/null
+++ b/docs/modules/prospecting/research-findings.md
@@ -0,0 +1,80 @@
+# .lu Domain Lead Generation — Research Findings
+
+Research on data sources, APIs, legal requirements, and cost analysis for the prospecting module.
+
+## 1. Data Sources for .lu Domains
+
+The official .lu registry (DNS-LU / RESTENA) does **not** publish zone files. All providers use web crawling to discover domains, so no list is 100% complete. Expect 70-80% coverage.
+
+### Providers
+
+| Provider | Domains | Price | Format | Notes |
+|----------|---------|-------|--------|-------|
+| NetworksDB | ~70,000 | $5 | Zipped text | Best value, one-time purchase |
+| DomainMetaData | Varies | $9.90/mo | CSV | Daily updates |
+| Webatla | ~75,000 | Unknown | CSV | Good coverage |
+
+## 2. Technical APIs — Cost Analysis
+
+### Technology Detection
+
+| Service | Free Tier | Notes |
+|---------|-----------|-------|
+| CRFT Lookup | Unlimited | Budget option, includes Lighthouse |
+| Wappalyzer | 50/month | Most accurate |
+| WhatCMS | Free lookups | CMS-only |
+
+**Approach used**: Custom HTML parsing for CMS, JS framework, analytics, and server detection (no external API dependency).
+
+### Performance Audits
+
+PageSpeed Insights API: **free**, with quotas of 25,000 queries per day and 400 queries per 100 seconds.
+
+### SSL Checks
+
+A simple HTTPS connectivity check (fast) is used for all prospects. The SSL Labs API is available for deep analysis of high-priority leads.
+
+### WHOIS
+
+Due to GDPR, .lu WHOIS data for private individuals is hidden. Only the owner type and country are visible, so contact info is scraped from websites instead.
+
+## 3. Legal — Luxembourg & GDPR
+
+### B2B Cold Email Rules
+
+Luxembourg has **no specific B2B cold email restrictions** per Article 11(1) of the Electronic Privacy Act (applies only to natural persons).
+
+**Requirements**:
+1. Identify yourself clearly (company name, address)
+2. Provide an opt-out mechanism in every email
+3. Message must relate to the recipient's business
+4. Store contact data securely
+5. Only contact businesses, not private individuals
+
+**Legal basis**: Legitimate interest (GDPR Art. 6(1)(f))
+
+### GDPR Penalties
+
+Fines up to EUR 20 million or 4% of global revenue for violations.
+
+**Key violations to avoid**:
+- Emailing private individuals without consent
+- No opt-out mechanism
+- Holding personal data longer than necessary
+
+### Recommendation
+
+- Focus on `info@`, `contact@`, and business role emails
+- Always include an unsubscribe link
+- Document the legitimate interest basis
+
+## 4. Cost Summary
+
+| Item | Cost | Type |
+|------|------|------|
+| Domain list (NetworksDB) | $5 | One-time |
+| PageSpeed API | Free | Ongoing |
+| Contact scraping | Free | Self-hosted |
+| Tech detection | Free | Self-hosted |
+
+Working MVP costs under $25 total.
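
The free PageSpeed item in the cost summary could feed the three score columns of `prospect_performance_profiles` roughly as sketched below. This is a minimal sketch, not the module's actual client: the function names `build_psi_url` and `extract_scores` are hypothetical, and the response layout (`lighthouseResult.categories.<name>.score` on a 0 to 1 scale) is taken from the public PageSpeed Insights v5 API, not from this patch. No network call is made here; fetching the URL and handling quota errors are left out.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

# Public PageSpeed Insights v5 endpoint.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, api_key: Optional[str] = None,
                  strategy: str = "mobile") -> str:
    """Build a request URL asking for the three categories the schema stores."""
    params = [
        ("url", page_url),
        ("strategy", strategy),
        ("category", "PERFORMANCE"),
        ("category", "ACCESSIBILITY"),
        ("category", "SEO"),
    ]
    if api_key:
        params.append(("key", api_key))
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def _to_percent(score: Optional[float]) -> Optional[int]:
    """Convert a 0-1 Lighthouse score to the 0-100 integer used in the schema."""
    return None if score is None else round(score * 100)


def extract_scores(response: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Pull category scores out of a decoded PSI JSON response (missing -> None)."""
    categories = response.get("lighthouseResult", {}).get("categories", {})
    return {
        "performance_score": _to_percent(categories.get("performance", {}).get("score")),
        "accessibility_score": _to_percent(categories.get("accessibility", {}).get("score")),
        "seo_score": _to_percent(categories.get("seo", {}).get("score")),
    }


# Hand-written sample payload, for illustration only (not real audit data).
sample = {"lighthouseResult": {"categories": {
    "performance": {"score": 0.5},
    "seo": {"score": 0.75},
}}}
print(build_psi_url("https://example.lu"))
print(extract_scores(sample))
```

Keeping URL construction and response parsing as pure functions makes both easy to unit-test, and leaves throttling against the daily and per-100-second quotas to whatever job runner issues the requests.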
diff --git a/docs/modules/prospecting/scoring.md b/docs/modules/prospecting/scoring.md
new file mode 100644
index 00000000..015140cd
--- /dev/null
+++ b/docs/modules/prospecting/scoring.md
@@ -0,0 +1,110 @@
+# Opportunity Scoring Model
+
+## Overview
+
+The scoring model assigns each prospect a score from 0-100 based on the opportunity potential for offering web services. Higher scores indicate better leads. The model supports two channels: **digital** (domain-based) and **offline** (in-person discovery).
+
+## Score Components — Digital Channel
+
+### Technical Health (Max 40 points)
+
+Issues that indicate immediate opportunities:
+
+| Issue | Points | Condition |
+|-------|--------|-----------|
+| No SSL | 15 | `uses_https = false` |
+| Very Slow | 15 | `performance_score < 30` |
+| Slow | 10 | `performance_score < 50` |
+| Moderate Speed | 5 | `performance_score < 70` |
+| Not Mobile Friendly | 10 | `is_mobile_friendly = false` |
+
+### Modernity / Stack (Max 25 points)
+
+Outdated technology stack:
+
+| Issue | Points | Condition |
+|-------|--------|-----------|
+| Outdated CMS | 15 | CMS is Drupal, Joomla, or TYPO3 |
+| Unknown CMS | 5 | No CMS detected but has website |
+| Legacy JavaScript | 5 | Uses jQuery without modern framework |
+| No Analytics | 5 | No Google Analytics or similar |
+
+### Business Value (Max 25 points)
+
+Indicators of business potential:
+
+| Factor | Points | Condition |
+|--------|--------|-----------|
+| Has Website | 10 | Active website exists |
+| Has E-commerce | 10 | E-commerce platform detected |
+| Short Domain | 5 | Domain name <= 15 characters |
+
+### Engagement Potential (Max 10 points)
+
+Ability to contact the business:
+
+| Factor | Points | Condition |
+|--------|--------|-----------|
+| Has Contacts | 5 | Any contact info found |
+| Has Email | 3 | Email address found |
+| Has Phone | 2 | Phone number found |
+
+## Score Components — Offline Channel
+
+Offline leads have a simplified scoring model based on the information captured during in-person encounters:
+
+| Scenario | Technical Health | Modernity | Business Value | Engagement | Total |
+|----------|-----------------|-----------|----------------|------------|-------|
+| No website at all | 30 | 20 | 20 | 0 | **70** (top_priority) |
+| Uses gmail/free email | +0 | +10 | +0 | +0 | +10 |
+| Met in person | +0 | +0 | +0 | +5 | +5 |
+| Has email contact | +0 | +0 | +0 | +3 | +3 |
+| Has phone contact | +0 | +0 | +0 | +2 | +2 |
+
+A business with no website met in person with contact info scores: 70 + 5 + 3 + 2 = **80** (top_priority).
+
+## Lead Tiers
+
+Based on the total score:
+
+| Tier | Score Range | Description |
+|------|-------------|-------------|
+| `top_priority` | 70-100 | Best leads, multiple issues or no website at all |
+| `quick_win` | 50-69 | Good leads, 1-2 easy fixes |
+| `strategic` | 30-49 | Moderate potential |
+| `low_priority` | 0-29 | Low opportunity |
+
+## Reason Flags
+
+Each score includes `reason_flags` that explain why points were awarded:
+
+```json
+{
+  "score": 78,
+  "reason_flags": ["no_ssl", "slow", "outdated_cms"],
+  "lead_tier": "top_priority"
+}
+```
+
+Common flags (digital):
+- `no_ssl` — Missing HTTPS
+- `very_slow` — Performance score < 30
+- `slow` — Performance score < 50
+- `not_mobile_friendly` — Fails mobile tests
+- `outdated_cms` — Using old CMS
+- `legacy_js` — Using jQuery only
+- `no_analytics` — No tracking installed
+
+Offline-specific flags:
+- `no_website` — Business has no website
+- `uses_gmail` — Uses free email provider
+- `met_in_person` — Lead captured in person (warm lead)
+
+## Customizing the Model
+
+The scoring logic is in `app/modules/prospecting/services/scoring_service.py`. You can adjust:
+
+1. **Point values** — Change weights for different issues
+2. **Thresholds** — Adjust performance score cutoffs
+3. **Conditions** — Add new scoring criteria
+4. **Tier boundaries** — Change score ranges for tiers
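
To make the four adjustable knobs concrete, the digital-channel tables and tier boundaries above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `scoring_service.py`: the `DigitalProspect` class and the `score_digital` and `lead_tier` functions are hypothetical names, the speed tiers are treated as mutually exclusive, "Has Contacts" is approximated as "has an email or a phone number", "jQuery without modern framework" is approximated as the detected framework being exactly jQuery, and the domain length is measured on the full `domain_name` string.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

OUTDATED_CMS = {"drupal", "joomla", "typo3"}


@dataclass
class DigitalProspect:
    """Hypothetical flat view of the fields the scoring tables refer to."""
    domain_name: str
    has_website: bool = True
    uses_https: bool = True
    performance_score: Optional[int] = None
    is_mobile_friendly: Optional[bool] = None
    cms: Optional[str] = None
    js_framework: Optional[str] = None
    analytics: Optional[str] = None
    ecommerce_platform: Optional[str] = None
    has_email: bool = False
    has_phone: bool = False


def lead_tier(score: int) -> str:
    """Map a total score to a tier using the boundaries in the Lead Tiers table."""
    if score >= 70:
        return "top_priority"
    if score >= 50:
        return "quick_win"
    if score >= 30:
        return "strategic"
    return "low_priority"


def score_digital(p: DigitalProspect) -> Tuple[int, List[str], str]:
    """Return (score, reason_flags, lead_tier) for a digital-channel prospect."""
    flags: List[str] = []
    total = 0

    # Technical health (max 40); only the worst matching speed tier counts.
    if not p.uses_https:
        total += 15
        flags.append("no_ssl")
    if p.performance_score is not None:
        if p.performance_score < 30:
            total += 15
            flags.append("very_slow")
        elif p.performance_score < 50:
            total += 10
            flags.append("slow")
        elif p.performance_score < 70:
            total += 5  # "Moderate Speed": no flag name is documented for it
    if p.is_mobile_friendly is False:
        total += 10
        flags.append("not_mobile_friendly")

    # Modernity / stack (max 25).
    cms = (p.cms or "").lower()
    if cms in OUTDATED_CMS:
        total += 15
        flags.append("outdated_cms")
    elif not cms and p.has_website:
        total += 5  # "Unknown CMS"
    if (p.js_framework or "").lower() == "jquery":
        total += 5
        flags.append("legacy_js")
    if not p.analytics:
        total += 5
        flags.append("no_analytics")

    # Business value (max 25).
    total += 10 if p.has_website else 0
    total += 10 if p.ecommerce_platform else 0
    total += 5 if len(p.domain_name) <= 15 else 0

    # Engagement potential (max 10).
    total += 5 if (p.has_email or p.has_phone) else 0
    total += 3 if p.has_email else 0
    total += 2 if p.has_phone else 0

    return total, flags, lead_tier(total)


example = DigitalProspect(
    domain_name="example.lu", uses_https=False, performance_score=45,
    is_mobile_friendly=True, cms="Joomla", js_framework="React",
    analytics="Google Analytics", has_email=True,
)
print(score_digital(example))
# (63, ['no_ssl', 'slow', 'outdated_cms'], 'quick_win')
```

In this example the prospect earns 25 points for technical health (no SSL, slow), 15 for an outdated CMS, 15 for business value (website, short domain), and 8 for engagement (contact plus email), giving 63 and the `quick_win` tier. Changing a point value, a threshold, or a tier boundary in a sketch like this corresponds to items 1, 2, and 4 of the list above; adding a new `if` branch corresponds to item 3.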