# Prospecting Contact Scraper: Fix Enum + Improve Regex

## Problem 1: DB Enum type mismatch

`ProspectContact.contact_type` is declared as `Enum(ContactType)` in the model, which SQLAlchemy maps to a PostgreSQL enum type named `contacttype`, but the DB column was created as a plain `VARCHAR` in the migration. When SQLAlchemy inserts, it casts the value to `::contacttype`, a type that doesn't exist in PostgreSQL.

**Error:** `type "contacttype" does not exist`

**File:** `app/modules/prospecting/models/prospect_contact.py`

**Fix options:**

- A) Change the model column from `Enum(ContactType)` to `String` to match the migration
- B) Create an Alembic migration to add the `contacttype` enum to PostgreSQL

Option A is simpler and consistent with how the scraper creates contacts (using plain strings like `"email"`, `"phone"`).

## Problem 2: Phone regex too loose and Luxembourg-specific

The phone pattern `(?:\+352|00352)?[\s.-]?\d{2,3}[\s.-]?\d{2,3}[\s.-]?\d{2,3}` has two issues:

1. **Too loose:** it matches any 6-9 digit sequence (CSS values, timestamps, hex colors, zip codes). On batirenovation-strasbourg.fr it found 120+ false positives.
2. **Luxembourg-only:** it only recognizes the `+352`/`00352` prefix. This is a French site with `+33` numbers.

**File:** `app/modules/prospecting/services/enrichment_service.py:274`

**Fix:** Replace with a broader international phone regex:

```python
import re

phone_pattern = re.compile(
    r'(?:\+\d{1,3}[\s.-]?)?'       # optional international prefix (+33, +352, etc.)
    r'\(?\d{1,4}\)?[\s.-]?'        # area code with optional parens
    r'\d{2,4}'                     # first group
    r'(?:[\s.-]?\d{2,4}){1,3}'     # one to three further groups
)
```

The trailing group repeats because French numbers are commonly written as five pairs (`03 88 12 34 56`); with a single optional trailing group the last pair would be cut off. The pattern on its own is still permissive, so also add a minimum-length filter (10+ digits for international numbers) and exclude matches that look like dates, hex colors, or CSS values.

## Problem 3: Email with URL-encoded space

The scraper finds `%20btirenovation@gmail.com` (from an `href="mailto:%20btirenovation@gmail.com"`) alongside the clean `btirenovation@gmail.com`. The `%20` prefix should be stripped.

**File:** `app/modules/prospecting/services/enrichment_service.py:293-303`

**Fix:** URL-decode email values before storing, or strip the `%20` prefix.

## Files to change

| File | Change |
|---|---|
| `prospecting/models/prospect_contact.py` | Change `contact_type` from `Enum` to `String` |
| `prospecting/services/enrichment_service.py` | Improve phone regex, add min-length filter, URL-decode emails |
| Alembic migration | If needed for the enum change |
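
The phone filtering from Problem 2 and the email cleanup from Problem 3 could look roughly like the sketch below. It is not the code in `enrichment_service.py`: the names `extract_phones`, `normalize_emails`, `PHONE_PATTERN`, and `DATE_PREFIX` are hypothetical, and so are the 15-digit upper bound, the lookarounds that reject digits glued to letters or `#`, and the date heuristic. The candidate pattern follows the spirit of Problem 2, with a repeatable trailing group so that numbers written as five pairs match in full.

```python
import re
from urllib.parse import unquote

# Candidate matcher (assumption: illustrative, not the production pattern).
# The lookbehind/lookahead reject digits glued to '#', letters, or other
# digits, which drops hex colors and CSS values such as "1234567890px".
PHONE_PATTERN = re.compile(
    r'(?<![\w#])'                  # not preceded by a word char or '#'
    r'(?:\+\d{1,3}[\s.-]?)?'       # optional international prefix
    r'\(?\d{1,4}\)?[\s.-]?'        # area code with optional parens
    r'\d{2,4}'                     # first group
    r'(?:[\s.-]?\d{2,4}){1,3}'     # one to three further groups
    r'(?!\w)'                      # not followed by a word char
)

# Heuristic (assumption): candidates that start like YYYY-MM-DD or
# DD.MM.YYYY are treated as dates or timestamps, not phone numbers.
DATE_PREFIX = re.compile(
    r'^(?:\d{4}[.-]\d{2}[.-]\d{2}|\d{2}[.-]\d{2}[.-]\d{4})(?!\d)'
)


def extract_phones(text, min_digits=10, max_digits=15):
    """Return phone-like substrings that pass the length and date filters."""
    phones = []
    for match in PHONE_PATTERN.finditer(text):
        candidate = match.group(0)
        digit_count = sum(ch.isdigit() for ch in candidate)
        if not min_digits <= digit_count <= max_digits:
            continue  # too short (CSS value, zip code) or implausibly long
        if DATE_PREFIX.match(candidate):
            continue  # looks like a date or timestamp
        phones.append(candidate)
    return phones


def normalize_emails(raw_emails):
    """URL-decode and trim each address, then de-duplicate in order."""
    cleaned = (unquote(email).strip() for email in raw_emails)
    return list(dict.fromkeys(email for email in cleaned if email))
```

Decoding with `unquote` and then trimming turns `%20btirenovation@gmail.com` into the same string as the clean address, so the de-duplication step keeps a single contact. The `min_digits` default of 10 follows the threshold proposed in Problem 2 and can be lowered if shorter national numbers need to be kept.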