docs(prospecting): add scoring, database, and research docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
docs/modules/prospecting/database.md
@@ -0,0 +1,171 @@
# Database Schema

## Entity Relationship Diagram

```
┌─────────────────────┐     ┌────────────────────────┐
│      prospects      │────<│ prospect_tech_profiles │
├─────────────────────┤     ├────────────────────────┤
│ id                  │     │ id                     │
│ channel             │     │ prospect_id (FK)       │
│ business_name       │     │ cms, server            │
│ domain_name         │     │ hosting_provider       │
│ status              │     │ js_framework, cdn      │
│ source              │     │ analytics              │
│ has_website         │     │ ecommerce_platform     │
│ uses_https          │     │ tech_stack_json (JSON) │
│ ...                 │     └────────────────────────┘
└─────────────────────┘
          │
          │               ┌───────────────────────────────┐
          └──────────────<│ prospect_performance_profiles │
          │               ├───────────────────────────────┤
          │               │ id                            │
          │               │ prospect_id (FK)              │
          │               │ performance_score (0-100)     │
          │               │ accessibility_score           │
          │               │ seo_score                     │
          │               │ FCP, LCP, TBT, CLS            │
          │               │ is_mobile_friendly            │
          │               └───────────────────────────────┘
          │
          │               ┌───────────────────────┐
          └──────────────<│ prospect_scores       │
          │               ├───────────────────────┤
          │               │ id                    │
          │               │ prospect_id (FK)      │
          │               │ score (0-100)         │
          │               │ technical_health_score│
          │               │ modernity_score       │
          │               │ business_value_score  │
          │               │ engagement_score      │
          │               │ reason_flags (JSON)   │
          │               │ lead_tier             │
          │               └───────────────────────┘
          │
          │               ┌───────────────────────┐
          └──────────────<│ prospect_contacts     │
          │               ├───────────────────────┤
          │               │ id                    │
          │               │ prospect_id (FK)      │
          │               │ contact_type          │
          │               │ value                 │
          │               │ source_url            │
          │               │ is_primary            │
          │               └───────────────────────┘
          │
          │               ┌───────────────────────┐
          └──────────────<│ prospect_interactions │
          │               ├───────────────────────┤
          │               │ id                    │
          │               │ prospect_id (FK)      │
          │               │ interaction_type      │
          │               │ subject, notes        │
          │               │ outcome               │
          │               │ next_action           │
          │               │ next_action_date      │
          │               │ created_by_user_id    │
          │               └───────────────────────┘
          │
          │               ┌───────────────────────┐
          └──────────────<│ prospect_scan_jobs    │
                          ├───────────────────────┤
                          │ id                    │
                          │ job_type              │
                          │ status                │
                          │ total_items           │
                          │ processed_items       │
                          │ celery_task_id        │
                          └───────────────────────┘

┌──────────────────────┐     ┌──────────────────┐
│ campaign_templates   │────<│ campaign_sends   │
├──────────────────────┤     ├──────────────────┤
│ id                   │     │ id               │
│ name                 │     │ template_id (FK) │
│ lead_type            │     │ prospect_id (FK) │
│ channel              │     │ channel          │
│ language             │     │ rendered_subject │
│ subject_template     │     │ rendered_body    │
│ body_template        │     │ status           │
│ is_active            │     │ sent_at          │
└──────────────────────┘     │ sent_by_user_id  │
                             └──────────────────┘
```

## Tables

### prospects

Central table for all leads, both digital (domain-based) and offline (in-person).

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER PK | Auto-increment |
| channel | ENUM(digital, offline) | How the lead was discovered |
| business_name | VARCHAR(255) | Required for offline |
| domain_name | VARCHAR(255) | Required for digital, unique |
| status | ENUM | pending, active, inactive, parked, error, contacted, converted |
| source | VARCHAR(100) | e.g. "domain_scan", "networking_event", "street" |
| has_website | BOOLEAN | Determined by HTTP check |
| uses_https | BOOLEAN | SSL status |
| http_status_code | INTEGER | Last HTTP response |
| address | VARCHAR(500) | Physical address (offline) |
| city | VARCHAR(100) | City |
| postal_code | VARCHAR(10) | Postal code |
| country | VARCHAR(2) | Default "LU" |
| notes | TEXT | Free-form notes |
| tags | JSON | Flexible tagging |
| captured_by_user_id | INTEGER FK | Who captured this lead |
| location_lat / location_lng | FLOAT | GPS from mobile capture |
| last_*_at | DATETIME | Timestamps for each scan type |

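
The per-channel requirements in the `prospects` table (`business_name` required for offline, `domain_name` required for digital) can be sketched as a small validation helper. This is an illustration only: `ProspectInput` and `validate_prospect` are hypothetical names, not part of the module, and the real checks may live in the ORM or the API layer instead.

```python
from dataclasses import dataclass
from typing import List, Optional

CHANNELS = {"digital", "offline"}


@dataclass
class ProspectInput:
    """Hypothetical input object mirroring a few `prospects` columns."""
    channel: str
    business_name: Optional[str] = None
    domain_name: Optional[str] = None
    country: str = "LU"  # default taken from the table above


def validate_prospect(p: ProspectInput) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors: List[str] = []
    if p.channel not in CHANNELS:
        errors.append(f"channel must be one of {sorted(CHANNELS)}")
    # Digital leads are keyed by domain, offline leads by business name.
    if p.channel == "digital" and not p.domain_name:
        errors.append("domain_name is required for digital prospects")
    if p.channel == "offline" and not p.business_name:
        errors.append("business_name is required for offline prospects")
    if len(p.country) != 2:
        errors.append("country must be a 2-letter code")
    return errors
```

Uniqueness of `domain_name` is not covered here; that constraint is better left to the database.
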
### prospect_tech_profiles

Technology stack detection results. One per prospect.

| Column | Type | Description |
|--------|------|-------------|
| cms | VARCHAR(100) | WordPress, Drupal, Joomla, etc. |
| server | VARCHAR(100) | Nginx, Apache |
| hosting_provider | VARCHAR(100) | Hosting company |
| cdn | VARCHAR(100) | CDN provider |
| js_framework | VARCHAR(100) | React, Vue, Angular, jQuery |
| analytics | VARCHAR(200) | Google Analytics, Matomo, etc. |
| ecommerce_platform | VARCHAR(100) | Shopify, WooCommerce, etc. |
| tech_stack_json | JSON | Full detection results |

### prospect_performance_profiles

Lighthouse audit results. One per prospect.

| Column | Type | Description |
|--------|------|-------------|
| performance_score | INTEGER | 0-100 |
| accessibility_score | INTEGER | 0-100 |
| seo_score | INTEGER | 0-100 |
| first_contentful_paint_ms | INTEGER | FCP |
| largest_contentful_paint_ms | INTEGER | LCP |
| total_blocking_time_ms | INTEGER | TBT |
| cumulative_layout_shift | FLOAT | CLS |
| is_mobile_friendly | BOOLEAN | Mobile test |

### prospect_scores

Calculated opportunity scores. One per prospect. See [scoring.md](scoring.md) for algorithm details.

### prospect_contacts

Scraped or manually entered contact info. Many per prospect.

### prospect_interactions

CRM-style interaction log. Many per prospect. Types: note, call, email_sent, email_received, meeting, visit, sms, proposal_sent.

### prospect_scan_jobs

Background job tracking for batch operations.

### campaign_templates / campaign_sends

Marketing campaign templates and send tracking. Templates support placeholders like `{business_name}`, `{domain}`, `{score}`, `{issues}`.

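
As a rough illustration of how the template placeholders might be filled, the sketch below uses plain `str.format`. The function name `render_template` is hypothetical, and so is the mapping of `{domain}` to `domain_name` and of `{issues}` to a comma-joined `reason_flags` list; the actual rendering code may differ.

```python
from typing import Any, Dict


def render_template(template: str, prospect: Dict[str, Any]) -> str:
    """Fill campaign placeholders from a prospect-like dict (sketch)."""
    values = {
        "business_name": prospect.get("business_name", ""),
        # Assumption: {domain} is backed by the domain_name column.
        "domain": prospect.get("domain_name", ""),
        "score": prospect.get("score", ""),
        # Assumption: {issues} is a readable join of the reason flags.
        "issues": ", ".join(prospect.get("reason_flags", [])),
    }
    return template.format(**values)


subject = render_template(
    "Hello {business_name}, {domain} scored {score}: {issues}",
    {
        "business_name": "Acme",
        "domain_name": "acme.lu",
        "score": 78,
        "reason_flags": ["no_ssl", "slow"],
    },
)
```

A template containing a placeholder outside these four raises `KeyError` with this approach, which is one way to surface typos in templates early.
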
docs/modules/prospecting/research-findings.md
@@ -0,0 +1,80 @@
# .lu Domain Lead Generation: Research Findings

Research on data sources, APIs, legal requirements, and cost analysis for the prospecting module.

## 1. Data Sources for .lu Domains

The official .lu registry (DNS-LU / RESTENA) does **not** publish zone files. All providers use web crawling to discover domains, so no list is 100% complete. Expect 70-80% coverage.

### Providers

| Provider | Domains | Price | Format | Notes |
|----------|---------|-------|--------|-------|
| NetworksDB | ~70,000 | $5 | Zipped text | Best value, one-time purchase |
| DomainMetaData | Varies | $9.90/mo | CSV | Daily updates |
| Webatla | ~75,000 | Unknown | CSV | Good coverage |

## 2. Technical APIs: Cost Analysis

### Technology Detection

| Service | Free Tier | Notes |
|---------|-----------|-------|
| CRFT Lookup | Unlimited | Budget option, includes Lighthouse |
| Wappalyzer | 50/month | Most accurate |
| WhatCMS | Free lookups | CMS-only |

**Approach used**: Custom HTML parsing for CMS, JS framework, analytics, and server detection (no external API dependency).

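
A minimal sketch of what custom HTML parsing for tech detection could look like, using only the standard library. The signals checked here (the `generator` meta tag and script URLs) and the names `detect_tech` and `_SignalParser` are assumptions for illustration; the module's real detector is likely to use more signals, including response headers for server detection.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

KNOWN_CMS = ("WordPress", "Drupal", "Joomla", "TYPO3")


class _SignalParser(HTMLParser):
    """Collects the generator meta tag and external script URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: str = ""
        self.script_srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "generator":
            self.generator = attr.get("content") or ""
        elif tag == "script" and attr.get("src"):
            self.script_srcs.append(attr["src"].lower())


def detect_tech(html: str) -> Dict[str, Optional[str]]:
    parser = _SignalParser()
    parser.feed(html)
    generator = parser.generator.lower()
    scripts = " ".join(parser.script_srcs)

    # First known CMS name that appears in the generator string, if any.
    cms = next((name for name in KNOWN_CMS if name.lower() in generator), None)
    js_framework = "jQuery" if "jquery" in scripts else None
    analytics = None
    if "googletagmanager.com" in scripts or "google-analytics.com" in scripts:
        analytics = "Google Analytics"
    return {"cms": cms, "js_framework": js_framework, "analytics": analytics}
```

Many sites strip the `generator` tag, so a `None` result for `cms` means "not detected" rather than "no CMS".
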
### Performance Audits

PageSpeed Insights API: **free**, limited to 25,000 queries per day and 400 queries per 100 seconds.

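
To stay under the 400-queries-per-100-seconds quota during batch scans, a client-side sliding-window limiter is one option. The class below is a sketch under that assumption; `SlidingWindowLimiter` is a hypothetical name, and it does not track the separate daily quota.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allows at most `max_calls` calls in any `window_s`-second window."""

    def __init__(
        self,
        max_calls: int = 400,
        window_s: float = 100.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._calls.append(self._clock())
```

A scan loop would call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it. The injectable `clock` exists only to make the limiter testable without real delays.
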
### SSL Checks

Simple HTTPS connectivity check (fast). SSL Labs API available for deep analysis of high-priority leads.

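
One way to implement a simple HTTPS connectivity check with the standard library is to attempt a verified TLS handshake on port 443. This is a sketch, not the module's actual check: the function name `check_https` and the 5-second default timeout are assumptions, and a site that only fails certificate validation is reported as `False` here.

```python
import socket
import ssl


def check_https(domain: str, timeout: float = 5.0) -> bool:
    """Return True if a verified TLS handshake on port 443 succeeds."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((domain, 443), timeout=timeout) as sock:
            # The handshake (including certificate and hostname
            # verification) happens inside wrap_socket.
            with context.wrap_socket(sock, server_hostname=domain):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and
        # ssl.SSLError, which is a subclass of OSError.
        return False
```
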
### WHOIS

Due to GDPR, .lu WHOIS data for private individuals is hidden. Only the owner type and country are visible, so contact info is scraped from websites instead.

## 3. Legal: Luxembourg & GDPR

### B2B Cold Email Rules

Luxembourg has **no specific B2B cold email restrictions** per Article 11(1) of the Electronic Privacy Act (applies only to natural persons).

**Requirements**:
1. Identify yourself clearly (company name, address)
2. Provide opt-out mechanism in every email
3. Message must relate to recipient's business
4. Store contact data securely
5. Only contact businesses, not private individuals

**Legal basis**: Legitimate interest (GDPR Art. 6(1)(f))

### GDPR Penalties

Fines up to EUR 20 million or 4% of global revenue for violations.

**Key violations to avoid**:
- Emailing private individuals without consent
- No opt-out mechanism
- Holding personal data longer than necessary

### Recommendation

- Focus on `info@`, `contact@`, and business role emails
- Always include unsubscribe link
- Document legitimate interest basis

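
The first recommendation can be approximated with a filter on the local part of each scraped address. In this sketch only `info` and `contact` come from the text above; the other entries in `ROLE_LOCAL_PARTS` and the function name `is_role_address` are assumptions for illustration, and passing this filter is not by itself a legal check.

```python
# `info` and `contact` come from the recommendation; the rest are
# illustrative guesses at common business role mailboxes.
ROLE_LOCAL_PARTS = {"info", "contact", "office", "sales", "hello", "admin"}


def is_role_address(email: str) -> bool:
    """True if the address looks like a business role mailbox."""
    local, sep, domain = email.strip().lower().partition("@")
    # Require an "@" and a non-empty domain before checking the local part.
    return bool(sep and domain) and local in ROLE_LOCAL_PARTS


scraped = ["info@example.lu", "jean.dupont@example.lu", "Contact@Example.LU"]
role_only = [e for e in scraped if is_role_address(e)]
```
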
## 4. Cost Summary

| Item | Cost | Type |
|------|------|------|
| Domain list (NetworksDB) | $5 | One-time |
| PageSpeed API | Free | Ongoing |
| Contact scraping | Free | Self-hosted |
| Tech detection | Free | Self-hosted |

A working MVP costs under $25 in total.

docs/modules/prospecting/scoring.md
@@ -0,0 +1,110 @@
# Opportunity Scoring Model

## Overview

The scoring model assigns each prospect a score from 0 to 100 that reflects how promising the prospect is as a customer for web services. Higher scores indicate better leads. The model supports two channels: **digital** (domain-based) and **offline** (in-person discovery).

## Score Components: Digital Channel

### Technical Health (Max 40 points)

Issues that indicate immediate opportunities:

| Issue | Points | Condition |
|-------|--------|-----------|
| No SSL | 15 | `uses_https = false` |
| Very Slow | 15 | `performance_score < 30` |
| Slow | 10 | `performance_score < 50` |
| Moderate Speed | 5 | `performance_score < 70` |
| Not Mobile Friendly | 10 | `is_mobile_friendly = false` |

### Modernity / Stack (Max 25 points)

Outdated technology stack:

| Issue | Points | Condition |
|-------|--------|-----------|
| Outdated CMS | 15 | CMS is Drupal, Joomla, or TYPO3 |
| Unknown CMS | 5 | No CMS detected but has website |
| Legacy JavaScript | 5 | Uses jQuery without modern framework |
| No Analytics | 5 | No Google Analytics or similar |

### Business Value (Max 25 points)

Indicators of business potential:

| Factor | Points | Condition |
|--------|--------|-----------|
| Has Website | 10 | Active website exists |
| Has E-commerce | 10 | E-commerce platform detected |
| Short Domain | 5 | Domain name <= 15 characters |

### Engagement Potential (Max 10 points)

Ability to contact the business:

| Factor | Points | Condition |
|--------|--------|-----------|
| Has Contacts | 5 | Any contact info found |
| Has Email | 3 | Email address found |
| Has Phone | 2 | Phone number found |

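
The four digital components can be combined as in the sketch below. It is not the code in `scoring_service.py`; `DigitalSignals` and `score_digital` are hypothetical names. It assumes the three speed rows are mutually exclusive tiers (which matches the 40-point maximum), that Outdated CMS and Unknown CMS are mutually exclusive, that the domain length is measured on the full domain string, and that flags are emitted only for the issues listed under the common digital flags.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

OUTDATED_CMS = {"drupal", "joomla", "typo3"}


@dataclass
class DigitalSignals:
    """Hypothetical bundle of the scan results used for scoring."""
    uses_https: Optional[bool]
    performance_score: Optional[int]
    is_mobile_friendly: Optional[bool]
    cms: Optional[str]
    js_framework: Optional[str]
    analytics: Optional[str]
    has_website: bool
    ecommerce_platform: Optional[str]
    domain_name: str
    has_email: bool
    has_phone: bool


def score_digital(s: DigitalSignals) -> Tuple[int, List[str]]:
    flags: List[str] = []

    technical = 0
    if s.uses_https is False:
        technical += 15
        flags.append("no_ssl")
    if s.performance_score is not None:
        # Assumption: only the lowest matching speed tier counts.
        if s.performance_score < 30:
            technical += 15
            flags.append("very_slow")
        elif s.performance_score < 50:
            technical += 10
            flags.append("slow")
        elif s.performance_score < 70:
            technical += 5
    if s.is_mobile_friendly is False:
        technical += 10
        flags.append("not_mobile_friendly")

    modernity = 0
    if s.cms and s.cms.lower() in OUTDATED_CMS:
        modernity += 15
        flags.append("outdated_cms")
    elif not s.cms and s.has_website:
        modernity += 5
    if s.js_framework and s.js_framework.lower() == "jquery":
        modernity += 5
        flags.append("legacy_js")
    if not s.analytics:
        modernity += 5
        flags.append("no_analytics")

    business = 0
    business += 10 if s.has_website else 0
    business += 10 if s.ecommerce_platform else 0
    business += 5 if len(s.domain_name) <= 15 else 0

    engagement = 0
    engagement += 5 if (s.has_email or s.has_phone) else 0
    engagement += 3 if s.has_email else 0
    engagement += 2 if s.has_phone else 0

    # Cap each component at its documented maximum.
    total = (min(technical, 40) + min(modernity, 25)
             + min(business, 25) + min(engagement, 10))
    return total, flags
```

For example, a Joomla site without HTTPS, with a performance score of 45, analytics installed, a short domain, and an email contact gets 25 + 15 + 15 + 8 = 63 points under these assumptions.
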
## Score Components: Offline Channel

Offline leads have a simplified scoring model based on the information captured during in-person encounters:

| Scenario | Technical Health | Modernity | Business Value | Engagement | Total |
|----------|-----------------|-----------|----------------|------------|-------|
| No website at all | 30 | 20 | 20 | 0 | **70** (top_priority) |
| Uses gmail/free email | +0 | +10 | +0 | +0 | +10 |
| Met in person | +0 | +0 | +0 | +5 | +5 |
| Has email contact | +0 | +0 | +0 | +3 | +3 |
| Has phone contact | +0 | +0 | +0 | +2 | +2 |

A business with no website met in person with contact info scores: 70 + 5 + 3 + 2 = **80** (top_priority).

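
The offline table reduces to a base score plus additive bonuses, which the following sketch reproduces. `score_offline` is a hypothetical function, not the module's implementation, and the behaviour for an offline lead that does have a website is not specified above, so the sketch simply gives it no base points.

```python
from typing import List, Tuple


def score_offline(
    has_website: bool,
    uses_free_email: bool,
    met_in_person: bool,
    has_email: bool,
    has_phone: bool,
) -> Tuple[int, List[str]]:
    score = 0
    flags: List[str] = []
    if not has_website:
        score += 70  # 30 technical + 20 modernity + 20 business value
        flags.append("no_website")
    if uses_free_email:
        score += 10
        flags.append("uses_gmail")
    if met_in_person:
        score += 5
        flags.append("met_in_person")
    if has_email:
        score += 3
    if has_phone:
        score += 2
    return min(score, 100), flags


# The worked example from the text: no website, met in person,
# email and phone on file.
example_score, example_flags = score_offline(False, False, True, True, True)
```
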
## Lead Tiers

Based on the total score:

| Tier | Score Range | Description |
|------|-------------|-------------|
| `top_priority` | 70-100 | Best leads, multiple issues or no website at all |
| `quick_win` | 50-69 | Good leads, 1-2 easy fixes |
| `strategic` | 30-49 | Moderate potential |
| `low_priority` | 0-29 | Low opportunity |

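
The tier boundaries translate directly into a threshold lookup. The function name `lead_tier` is chosen here for illustration; the thresholds are the ones in the table.

```python
def lead_tier(score: int) -> str:
    """Map a 0-100 score to its lead tier."""
    if score >= 70:
        return "top_priority"
    if score >= 50:
        return "quick_win"
    if score >= 30:
        return "strategic"
    return "low_priority"
```
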
## Reason Flags

Each score includes `reason_flags` that explain why points were awarded:

```json
{
  "score": 78,
  "reason_flags": ["no_ssl", "slow", "outdated_cms"],
  "lead_tier": "top_priority"
}
```

Common flags (digital):
- `no_ssl`: Missing HTTPS
- `very_slow`: Performance score < 30
- `slow`: Performance score < 50
- `not_mobile_friendly`: Fails mobile tests
- `outdated_cms`: Using old CMS
- `legacy_js`: Using jQuery only
- `no_analytics`: No tracking installed

Offline-specific flags:
- `no_website`: Business has no website
- `uses_gmail`: Uses free email provider
- `met_in_person`: Lead captured in person (warm lead)

## Customizing the Model

The scoring logic is in `app/modules/prospecting/services/scoring_service.py`. You can adjust:

1. **Point values**: Change weights for different issues
2. **Thresholds**: Adjust performance score cutoffs
3. **Conditions**: Add new scoring criteria
4. **Tier boundaries**: Change score ranges for tiers
