orion/requirements.txt
Samir Boulahtit 1828ac85eb feat(prospecting): add content scraping for POC builder (Workstream 3A)
- New scrape_content() method in enrichment_service: extracts meta
  description, H1/H2 headings, paragraphs, images (filtered for size),
  social links, service items, and detected languages using BeautifulSoup
- Scans 6 pages per prospect: /, /about, /a-propos, /services,
  /nos-services, /contact
- Results stored as JSON in prospect.scraped_content_json
- New endpoints: POST /content-scrape/{id} and /content-scrape/batch
- Added to full_enrichment pipeline (Step 5, before security audit)
- CONTENT_SCRAPE job type for scan-jobs tracking
- "Content Scrape" batch button on scan-jobs page
- Add beautifulsoup4 to requirements.txt

Tested on batirenovation-strasbourg.fr: extracted 30 headings,
21 paragraphs, 13 images.
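For illustration, the per-page extraction described above might look roughly like the sketch below. This is not the project's actual scrape_content() implementation: the function name `extract_content`, the 100px image-size filter, and the returned dict shape are all assumptions.

```python
# Hypothetical sketch of the per-page extraction step (not the real
# scrape_content()); field names and the image-size cutoff are guesses.
from bs4 import BeautifulSoup

def extract_content(html: str) -> dict:
    """Pull the fields listed in the commit message out of one page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", attrs={"name": "description"})
    return {
        "meta_description": meta.get("content") if meta else None,
        "headings": [h.get_text(strip=True) for h in soup.find_all(["h1", "h2"])],
        "paragraphs": [p.get_text(strip=True) for p in soup.find_all("p")
                       if p.get_text(strip=True)],
        # Size filter is an assumption: keep images declaring width >= 100px.
        "images": [img["src"] for img in soup.find_all("img", src=True)
                   if int(img.get("width") or 0) >= 100],
    }
```

In the pipeline this would run once per scanned path (/, /about, /a-propos, ...) and the merged results would be serialized into prospect.scraped_content_json.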

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:26:56 +02:00


# requirements.txt - Python 3.13.5 compatible
# Core FastAPI and web framework
starlette==0.41.3
fastapi==0.115.5
uvicorn[standard]==0.32.1
pydantic[email]==2.10.3
pydantic-settings==2.6.1
# Database
sqlalchemy==2.0.36
psycopg2-binary==2.9.10
alembic==1.14.0
# Authentication and Security
python-jose[cryptography]==3.3.0
passlib[bcrypt]==1.7.4
bcrypt==4.0.1 # Changed from 4.2.1 for Python 3.13.5 compatibility
python-multipart==0.0.20
# Data processing
pandas==2.2.3
requests==2.32.3
beautifulsoup4==4.14.3
# Image processing
Pillow>=10.0.0
# System monitoring
psutil>=5.9.0
# PDF generation
weasyprint==62.3
# Templating
Jinja2>=3.1.0
# Environment and configuration
python-dotenv==1.0.1
# Payment processing
stripe>=7.0.0
# Task queue (Celery with Redis)
celery[redis]==5.3.6
redis==5.0.1
kombu==5.3.4
flower==2.0.1
# Error tracking
sentry-sdk[fastapi]>=2.0.0
# Prometheus metrics
prometheus_client>=0.20.0
# Cloud storage (S3-compatible - Cloudflare R2)
boto3>=1.34.0
# Google Wallet integration (loyalty passes)
google-auth>=2.0.0
PyJWT>=2.0.0