Some checks failed
- Fix contact_type column: Enum(ContactType) → String(20) to match the
migration (fixes "type contacttype does not exist" on insert)
- Rewrite scrape_contacts with structured-first approach:
Phase 1: tel:/mailto: href extraction (high confidence)
Phase 2: regex fallback with SVG/script stripping, international phone
pattern (requires + prefix, min 10 digits)
Phase 3: address extraction from Schema.org JSON-LD, <address> tags,
and European street address regex (FR/DE/EN street keywords)
- URL-decode email values, strip tags to plain text for cross-element
address matching
- Add /mentions-legales to scanned paths
Tested on batirenovation-strasbourg.fr: finds 3 contacts (email, phone,
address) vs 120+ false positives and a crash before.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>