- New scrape_content() method in enrichment_service: extracts meta
description, H1/H2 headings, paragraphs, images (filtered for size),
social links, service items, and detected languages using BeautifulSoup
- Scans 6 pages per prospect: /, /about, /a-propos, /services,
/nos-services, /contact
- Results stored as JSON in prospect.scraped_content_json
- New endpoints: POST /content-scrape/{id} and /content-scrape/batch
- Content scrape added to the full_enrichment pipeline (Step 5, before the security audit)
- CONTENT_SCRAPE job type for scan-jobs tracking
- "Content Scrape" batch button on scan-jobs page
- Add beautifulsoup4 to requirements.txt
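
For illustration, the six-page scan listed above could be driven by a URL list built roughly as follows. This is a minimal sketch, not the code from this commit: the names PAGE_PATHS and candidate_urls are hypothetical, and only the six paths come from the commit message.

```python
from urllib.parse import urljoin

# The six paths named in the commit message (English and French variants).
PAGE_PATHS = ["/", "/about", "/a-propos", "/services", "/nos-services", "/contact"]


def candidate_urls(base_url):
    """Return the absolute URLs to scan for one prospect site.

    Hypothetical helper: the real method may build its URLs differently.
    """
    # Normalise to a single trailing slash so urljoin resolves from the site root.
    root = base_url.rstrip("/") + "/"
    return [urljoin(root, path) for path in PAGE_PATHS]


if __name__ == "__main__":
    for url in candidate_urls("https://example.com"):
        print(url)
```

Listing both the English and French paths means some requests will likely return 404 on any given site, so a caller would presumably skip pages that fail to load rather than abort the whole scrape.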
Tested on batirenovation-strasbourg.fr: extracted 30 headings,
21 paragraphs, 13 images.
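
The kind of extraction described above (meta description, H1/H2 headings, paragraphs) can be sketched as below. To keep the example dependency-free it uses the standard library's html.parser instead of BeautifulSoup, so it is not the implementation in this commit; ContentSketch, scrape_sketch and the result keys are hypothetical names, and images, social links, service items and language detection are left out.

```python
from html.parser import HTMLParser


class ContentSketch(HTMLParser):
    """Collects the meta description, H1/H2 headings, and paragraph text."""

    def __init__(self):
        super().__init__()
        self.result = {"meta_description": None, "headings": [], "paragraphs": []}
        self._tag = None    # tag currently being captured (h1, h2 or p), if any
        self._buffer = []   # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.result["meta_description"] = attrs.get("content")
        elif tag in ("h1", "h2", "p") and self._tag is None:
            self._tag, self._buffer = tag, []

    def handle_data(self, data):
        # Text inside nested inline tags (em, a, ...) is kept as well.
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Collapse runs of whitespace and drop empty elements.
        text = " ".join("".join(self._buffer).split())
        if text:
            key = "paragraphs" if tag == "p" else "headings"
            self.result[key].append(text)
        self._tag = None


def scrape_sketch(html):
    """Parse one HTML document and return the collected fields as a dict."""
    parser = ContentSketch()
    parser.feed(html)
    parser.close()
    return parser.result
```

The returned dict contains only strings, lists and None, so it can be passed to json.dumps directly, which matches the idea of storing results as JSON.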
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
62 lines
1.1 KiB
Plaintext
# requirements.txt - Python 3.13.5 compatible

# Core FastAPI and web framework
starlette==0.41.3
fastapi==0.115.5
uvicorn[standard]==0.32.1
pydantic==2.10.3
pydantic-settings==2.6.1
pydantic[email]==2.10.3

# Database
sqlalchemy==2.0.36
psycopg2-binary==2.9.10
alembic==1.14.0

# Authentication and Security
python-jose[cryptography]==3.3.0
passlib[bcrypt]==1.7.4
bcrypt==4.0.1 # Changed from 4.2.1 for Python 3.13.5 compatibility
python-multipart==0.0.20

# Data processing
pandas==2.2.3
requests==2.32.3
beautifulsoup4==4.14.3

# Image processing
Pillow>=10.0.0

# System monitoring
psutil>=5.9.0

# PDF generation
weasyprint==62.3

# Templating
Jinja2>=3.1.0

# Environment and configuration
python-dotenv==1.0.1

# Payment processing
stripe>=7.0.0

# Task queue (Celery with Redis)
celery[redis]==5.3.6
redis==5.0.1
kombu==5.3.4
flower==2.0.1

# Error tracking
sentry-sdk[fastapi]>=2.0.0

# Prometheus metrics
prometheus_client>=0.20.0

# Cloud storage (S3-compatible - Cloudflare R2)
boto3>=1.34.0

# Google Wallet integration (loyalty passes)
google-auth>=2.0.0
PyJWT>=2.0.0