- New scrape_content() method in enrichment_service: extracts meta
description, H1/H2 headings, paragraphs, images (filtered for size),
social links, service items, and detected languages using BeautifulSoup
- Scans 6 pages per prospect: /, /about, /a-propos, /services,
/nos-services, /contact
- Results stored as JSON in prospect.scraped_content_json
- New endpoints: POST /content-scrape/{id} and /content-scrape/batch
- Added to full_enrichment pipeline (Step 5, before security audit)
- CONTENT_SCRAPE job type for scan-jobs tracking
- "Content Scrape" batch button on scan-jobs page
- Add beautifulsoup4 to requirements.txt
Tested on batirenovation-strasbourg.fr: extracted 30 headings,
21 paragraphs, 13 images.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New model: ProspectSecurityAudit with score, grade, findings_json,
severity counts, has_https, has_valid_ssl, missing_headers, exposed
files, technologies, scan_error
- Add last_security_audit_at timestamp to Prospect model
- Add security_audit 1:1 relationship on Prospect
Part of Phase 1: Security Audit in Enrichment Pipeline. Service,
constants, migration, endpoints, and frontend to follow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrates scanning pipeline from marketing-.lu-domains app into Orion module.
Supports digital (domain scan) and offline (manual capture) lead channels
with enrichment, scoring, campaign management, and interaction tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>