orion/docs/deployment/infrastructure.md
Samir Boulahtit ff5b395cdd feat: add Sentry, Cloudflare R2, and CloudFlare CDN integrations
Production quick wins for improved observability and scalability:

Sentry Error Tracking:
- Add sentry-sdk[fastapi] dependency
- Initialize Sentry in main.py with FastAPI/SQLAlchemy integrations
- Add Celery integration for background task error tracking
- Feature-flagged via SENTRY_DSN (disabled when empty)

Cloudflare R2 Storage:
- Add boto3 dependency for S3-compatible API
- Create storage_service.py with StorageBackend abstraction
- LocalStorageBackend for development (default)
- R2StorageBackend for production cloud storage
- Feature-flagged via STORAGE_BACKEND setting

CloudFlare CDN/Proxy:
- Create middleware/cloudflare.py for CF header handling
- Extract real client IP from CF-Connecting-IP
- Support CF-IPCountry for geo features
- Feature-flagged via CLOUDFLARE_ENABLED setting

Documentation:
- Add docs/deployment/cloudflare.md setup guide
- Update infrastructure.md with dev vs prod requirements
- Add enterprise upgrade checklist for scaling beyond 1000 users
- Update installation.md with new environment variables

All features are optional and disabled by default for development.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 19:44:59 +01:00


# Infrastructure Guide
This guide documents the complete infrastructure for the Wizamart platform, from development to high-end production.
**Philosophy:** We prioritize **debuggability and operational simplicity** over complexity. Every component should be directly accessible for troubleshooting.
---
## Table of Contents
- [Architecture Overview](#architecture-overview)
- [Current State](#current-state)
- [Development Environment](#development-environment)
- [Production Options](#production-options)
- [Future High-End Architecture](#future-high-end-architecture)
- [Component Deep Dives](#component-deep-dives)
- [Troubleshooting Guide](#troubleshooting-guide)
- [Decision Matrix](#decision-matrix)
- [Migration Path](#migration-path)
- [Enterprise Upgrade Checklist](#enterprise-upgrade-checklist)
- [Next Steps](#next-steps)
---
## Architecture Overview
### System Components
```
┌─────────────────────────────────────────────────────────────┐
│                           CLIENTS                           │
│           (Browsers, Mobile Apps, API Consumers)            │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│                    LOAD BALANCER / PROXY                    │
│                 (Nginx, Caddy, or Cloud LB)                 │
│  - SSL termination                                          │
│  - Static file serving                                      │
│  - Rate limiting                                            │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│                     APPLICATION SERVERS                     │
│                     (FastAPI + Uvicorn)                     │
│  - API endpoints                                            │
│  - HTML rendering (Jinja2)                                  │
│  - WebSocket connections                                    │
└────────┬─────────────────────┬─────────────────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│   PostgreSQL    │   │      Redis      │   │  File Storage   │
│  (Primary DB)   │   │  (Cache/Queue)  │   │   (S3/Local)    │
└─────────────────┘   └────────┬────────┘   └─────────────────┘
                               │
                      ┌────────▼────────┐
                      │ Celery Workers  │
                      │(Background Jobs)│
                      └─────────────────┘
```
### Data Flow
1. **Request** → Nginx → Uvicorn → FastAPI → Service Layer → Database
2. **Background Job** → API creates task → Redis Queue → Celery Worker → Database
3. **Static Files** → Nginx serves directly (or CDN in production)
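Flow #2 above (API creates a task, the queue hands it to a worker, the worker writes back to the database) can be walked through with plain Python stand-ins. This is purely illustrative: a dict stands in for PostgreSQL, a deque for the Redis queue, and the function names are hypothetical, not the project's actual handlers.

```python
from collections import deque

db: dict[str, str] = {}       # stands in for PostgreSQL
queue: deque = deque()        # stands in for the Redis queue

def api_create_task(task_name: str, payload: str) -> None:
    """API handler: record the job as pending and enqueue it for a worker."""
    db[task_name] = "pending"
    queue.append({"task": task_name, "payload": payload})

def celery_worker_tick() -> None:
    """Worker loop body: pop one task off the queue and persist the result."""
    job = queue.popleft()
    db[job["task"]] = f"done: {job['payload']}"

api_create_task("send_email", "order #42 confirmation")
celery_worker_tick()
print(db["send_email"])  # -> done: order #42 confirmation
```

The real system differs only in transport: the queue lives in Redis, and Celery delivers the job to a separate worker process.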
---
## Current State
### What We Have Now
| Component | Technology | Dev Required | Prod Required | Status |
|-----------|------------|--------------|---------------|--------|
| Web Framework | FastAPI + Uvicorn | ✅ | ✅ | ✅ Production Ready |
| Database | PostgreSQL 15 | ✅ | ✅ | ✅ Production Ready |
| ORM | SQLAlchemy 2.0 | ✅ | ✅ | ✅ Production Ready |
| Migrations | Alembic | ✅ | ✅ | ✅ Production Ready |
| Templates | Jinja2 + Tailwind CSS | ✅ | ✅ | ✅ Production Ready |
| Authentication | JWT (PyJWT) | ✅ | ✅ | ✅ Production Ready |
| Email | SMTP/SendGrid/Mailgun/SES | ❌ | ✅ | ✅ Production Ready |
| Payments | Stripe | ❌ | ✅ | ✅ Production Ready |
| Task Queue | Celery 5.3 + Redis | ❌ | ✅ | ✅ Production Ready |
| Task Scheduler | Celery Beat | ❌ | ✅ | ✅ Production Ready |
| Task Monitoring | Flower | ❌ | ⚪ Optional | ✅ Production Ready |
| Caching | Redis 7 | ❌ | ✅ | ✅ Production Ready |
| File Storage | Local / Cloudflare R2 | Local | R2 | ✅ Production Ready |
| Error Tracking | Sentry | ❌ | ⚪ Recommended | ✅ Production Ready |
| CDN / WAF | Cloudflare | ❌ | ⚪ Recommended | ✅ Production Ready |
**Legend (Dev/Prod columns):** ✅ Required | ⚪ Optional/Recommended | ❌ Not needed
### Development vs Production
**Development** requires only:
- PostgreSQL (via Docker: `make docker-up`)
- Python 3.11+ with dependencies
**Production** adds:
- Redis (for Celery task queue)
- Celery workers (for background tasks)
- Reverse proxy (Nginx)
- SSL certificates
**Optional but recommended for Production:**
- Sentry (error tracking) - Set `SENTRY_DSN` to enable
- Cloudflare R2 (cloud storage) - Set `STORAGE_BACKEND=r2` to enable
- Cloudflare CDN (caching/DDoS) - Set `CLOUDFLARE_ENABLED=true` to enable
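All three integrations follow the same feature-flag pattern: the dependency is only touched when its setting is non-empty. A minimal sketch of the Sentry gate, assuming the flag is read from the environment; the function name and sample rate are illustrative, not the project's actual `main.py` code:

```python
import os
from typing import Optional

def maybe_init_sentry(dsn: Optional[str] = None) -> bool:
    """Enable Sentry only when a DSN is configured; a no-op otherwise."""
    if dsn is None:
        dsn = os.getenv("SENTRY_DSN", "")
    if not dsn:
        return False  # disabled by default in development
    import sentry_sdk  # imported lazily so the SDK stays an optional dependency
    sentry_sdk.init(dsn=dsn, traces_sample_rate=0.1)  # illustrative sample rate
    return True
```

With `SENTRY_DSN` empty the SDK is never even imported, which is what keeps development setups dependency-light.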
### What We Need for Enterprise (Future Growth)
| Component | Priority | When Needed | Estimated Users |
|-----------|----------|-------------|-----------------|
| Load Balancer | Medium | Horizontal scaling | 1,000+ concurrent |
| Database Replica | Medium | Read-heavy workloads | 1,000+ concurrent |
| Redis Sentinel | Low | Cache redundancy | 5,000+ concurrent |
| Prometheus/Grafana | Low | Advanced metrics | Any (nice to have) |
| Kubernetes | Low | Multi-region/HA | 10,000+ concurrent |
---
## Development Environment
### Local Setup (Recommended)
```bash
# 1. Start PostgreSQL and Redis
make docker-up
# 2. Run migrations
make migrate-up
# 3. Initialize data
make init-prod
# 4. Start development server
make dev
# 5. (Optional) Start Celery worker for background tasks
make celery-dev # Worker + Beat together
# 6. (Optional) Run tests
make test
```
### Services Running Locally
| Service | Host | Port | Purpose |
|---------|------|------|---------|
| FastAPI | localhost | 8000 | Main application |
| PostgreSQL | localhost | 5432 | Development database |
| PostgreSQL (test) | localhost | 5433 | Test database |
| Redis | localhost | 6380 | Cache and task broker |
| Celery Worker | - | - | Background task processing |
| Celery Beat | - | - | Scheduled task scheduler |
| Flower | localhost | 5555 | Task monitoring dashboard |
| MkDocs | localhost | 9991 | Documentation |
### Docker Compose Services
```yaml
# docker-compose.yml
services:
  db:            # PostgreSQL 15 for development
  redis:         # Redis 7 for cache/queue
  api:           # FastAPI application (profile: full)
  celery-worker: # Background task processor (profile: full)
  celery-beat:   # Scheduled task scheduler (profile: full)
  flower:        # Task monitoring UI (profile: full)
```
### Celery Commands
```bash
# Start worker only
make celery-worker
# Start scheduler only
make celery-beat
# Start worker + scheduler together (development)
make celery-dev
# Start Flower monitoring
make flower
# Check worker status
make celery-status
# Purge pending tasks
make celery-purge
```
---
## Production Options
### Option 1: Traditional VPS (Recommended for Troubleshooting)
**Best for:** Teams who want direct server access, familiar with Linux administration.
```
┌─────────────────────────────────────────────────────────────┐
│                       VPS (4GB+ RAM)                        │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
│  │    Nginx    │   │   Uvicorn   │   │ PostgreSQL  │        │
│  │  (reverse   │   │ (4 workers) │   │   (local)   │        │
│  │   proxy)    │   │             │   │             │        │
│  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘        │
│         │                 │                 │               │
│         └─────────────────┼─────────────────┘               │
│                           │                                 │
│           ┌─────────────┐   ┌─────────────┐                 │
│           │    Redis    │   │   Celery    │                 │
│           │   (local)   │   │  (workers)  │                 │
│           └─────────────┘   └─────────────┘                 │
└─────────────────────────────────────────────────────────────┘
```
**Setup:**
```bash
# On Ubuntu 22.04+ VPS
# 1. Install system packages
#    (Ubuntu 22.04 ships PostgreSQL 14; postgresql-15 requires
#     adding the PGDG apt repository first)
sudo apt update
sudo apt install -y nginx postgresql-15 redis-server python3.11 python3.11-venv
# 2. Create application user
sudo useradd -m -s /bin/bash wizamart
sudo su - wizamart
# 3. Clone and setup
git clone <repo> /home/wizamart/app
cd /home/wizamart/app
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
nano .env # Edit with production values
# 5. Setup database
sudo -u postgres createuser --pwprompt wizamart_user
sudo -u postgres createdb wizamart_db -O wizamart_user
alembic upgrade head
python scripts/init_production.py
# 6. Create systemd service
sudo nano /etc/systemd/system/wizamart.service
```
**Systemd Service:**
```ini
# /etc/systemd/system/wizamart.service
[Unit]
Description=Wizamart API
After=network.target postgresql.service redis.service
[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000 --workers 4
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
```
**Celery Workers:**
```ini
# /etc/systemd/system/wizamart-celery.service
[Unit]
Description=Wizamart Celery Worker
After=network.target redis.service
[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/celery -A app.celery worker --loglevel=info --concurrency=4
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
```
**Nginx Configuration:**
```nginx
# /etc/nginx/sites-available/wizamart
server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Static files (served directly by Nginx)
    location /static {
        alias /home/wizamart/app/static;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Uploaded files
    location /uploads {
        alias /home/wizamart/app/uploads;
        expires 7d;
    }

    # API and application
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for future real-time features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
**Troubleshooting Commands:**
```bash
# Check service status
sudo systemctl status wizamart
sudo systemctl status wizamart-celery
sudo systemctl status postgresql
sudo systemctl status redis
# View logs
sudo journalctl -u wizamart -f
sudo journalctl -u wizamart-celery -f
# Connect to database directly
sudo -u postgres psql wizamart_db
# Check Redis
redis-cli ping
redis-cli monitor # Watch commands in real-time
# Restart services
sudo systemctl restart wizamart
sudo systemctl restart wizamart-celery
```
---
### Option 2: Docker Compose Production
**Best for:** Consistent environments, easy rollbacks, container familiarity.
```yaml
# docker-compose.prod.yml
services:
  api:
    build: .
    restart: always
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  celery:
    build: .
    restart: always
    command: celery -A app.celery worker --loglevel=info --concurrency=4
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  celery-beat:
    build: .
    restart: always
    command: celery -A app.celery beat --loglevel=info
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - redis

  db:
    image: postgres:15
    restart: always
    environment:
      POSTGRES_DB: wizamart_db
      POSTGRES_USER: wizamart_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U wizamart_user -d wizamart_db"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./static:/app/static:ro
      - ./uploads:/app/uploads:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:
```
**Troubleshooting Commands:**
```bash
# View all containers
docker compose -f docker-compose.prod.yml ps
# View logs
docker compose -f docker-compose.prod.yml logs -f api
docker compose -f docker-compose.prod.yml logs -f celery
# Access container shell
docker compose -f docker-compose.prod.yml exec api bash
docker compose -f docker-compose.prod.yml exec db psql -U wizamart_user -d wizamart_db
# Restart specific service
docker compose -f docker-compose.prod.yml restart api
# View resource usage
docker stats
```
---
### Option 3: Managed Services (Minimal Ops)
**Best for:** Small teams, focus on product not infrastructure.
| Component | Service | Cost (approx) |
|-----------|---------|---------------|
| App Hosting | Railway / Render / Fly.io | $5-25/mo |
| Database | Neon / Supabase / PlanetScale | $0-25/mo |
| Redis | Upstash / Redis Cloud | $0-10/mo |
| File Storage | Cloudflare R2 / AWS S3 | $0-5/mo |
| Email | Resend / SendGrid | $0-20/mo |
**Example: Railway + Neon**
```bash
# Deploy to Railway
railway login
railway init
railway up
# Configure environment
railway variables set DATABASE_URL="postgresql://..."
railway variables set REDIS_URL="redis://..."
```
---
## Future High-End Architecture
### Target Production Architecture
```
                       ┌─────────────────┐
                       │   Cloudflare    │
                       │   (CDN + WAF)   │
                       └────────┬────────┘
                                │
                       ┌────────▼────────┐
                       │  Load Balancer  │
                       │ (HA Proxy/ALB)  │
                       └────────┬────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│  API Server 1   │    │  API Server 2   │    │  API Server N   │
│    (Uvicorn)    │    │    (Uvicorn)    │    │    (Uvicorn)    │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│   PostgreSQL    │    │      Redis      │    │   S3 / MinIO    │
│    (Primary)    │    │    (Cluster)    │    │     (Files)     │
│  ┌───────────┐  │    │  ┌───────────┐  │    │                 │
│  │  Replica  │  │    │  │ Sentinel  │  │    │                 │
│  └───────────┘  │    │  └───────────┘  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│ Celery Worker 1 │    │ Celery Worker 2 │    │   Celery Beat   │
│    (General)    │    │  (Import Jobs)  │    │   (Scheduler)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘

                 ┌─────────────────────────────┐
                 │      Monitoring Stack       │
                 │ ┌──────────┐ ┌───────────┐  │
                 │ │Prometheus│ │  Grafana  │  │
                 │ └──────────┘ └───────────┘  │
                 │ ┌──────────┐ ┌───────────┐  │
                 │ │  Sentry  │ │   Loki    │  │
                 │ └──────────┘ └───────────┘  │
                 └─────────────────────────────┘
```
### Celery Task Queues
```python
# app/celery.py (to be implemented)
from celery import Celery

from app.core.config import settings  # assumed config module, mirroring app.core.database

celery_app = Celery(
    "wizamart",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend,
)

celery_app.conf.task_queues = {
    "default": {"exchange": "default", "routing_key": "default"},
    "imports": {"exchange": "imports", "routing_key": "imports"},
    "emails": {"exchange": "emails", "routing_key": "emails"},
    "reports": {"exchange": "reports", "routing_key": "reports"},
}

celery_app.conf.task_routes = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}
```
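The effect of `task_routes` can be checked with a plain dict lookup: a task resolves to its configured queue, and anything unlisted falls back to `default`. The helper below is a simplified illustration of that resolution, not Celery's actual router:

```python
# Mirrors the task_routes mapping above; unlisted tasks fall back to "default".
TASK_ROUTES = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}

def queue_for(task_name: str) -> str:
    """Return the queue a task name resolves to."""
    return TASK_ROUTES.get(task_name, {}).get("queue", "default")

print(queue_for("app.tasks.send_email"))  # -> emails
```

This is also why the maintenance tasks in the table below can omit a route entry: they land on the `default` queue automatically.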
### Background Tasks to Implement
| Task | Queue | Priority | Description |
|------|-------|----------|-------------|
| `import_letzshop_products` | imports | High | Marketplace product sync |
| `import_letzshop_orders` | imports | High | Order sync from Letzshop |
| `send_order_confirmation` | emails | High | Order emails |
| `send_password_reset` | emails | High | Auth emails |
| `send_invoice_email` | emails | Medium | Invoice delivery |
| `generate_sales_report` | reports | Low | Analytics reports |
| `cleanup_expired_sessions` | default | Low | Maintenance |
| `sync_stripe_subscriptions` | default | Medium | Billing sync |
---
## Component Deep Dives
### PostgreSQL Configuration
**Production Settings (`postgresql.conf`):**
```ini
# Memory (adjust based on server RAM)
shared_buffers = 256MB # 25% of RAM for dedicated DB server
effective_cache_size = 768MB # 75% of RAM
work_mem = 16MB
maintenance_work_mem = 128MB
# Connections
max_connections = 100
# Write-Ahead Log
wal_level = replica
max_wal_senders = 3
# Query Planning
random_page_cost = 1.1 # For SSD storage
effective_io_concurrency = 200 # For SSD storage
# Logging
log_min_duration_statement = 1000 # Log queries > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
```
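The 25%/75% rules of thumb in the comments scale to any server size; the literal values shown (`256MB`/`768MB`) correspond to roughly 1 GB of RAM on a dedicated database server. A small helper just to illustrate the arithmetic (the function itself is not part of the project):

```python
def pg_memory_settings(ram_mb: int) -> dict:
    """Apply the rule-of-thumb ratios from the postgresql.conf comments above."""
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # ~25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # ~75% of RAM
    }

print(pg_memory_settings(1024))
# -> {'shared_buffers': '256MB', 'effective_cache_size': '768MB'}
```

For a 4 GB VPS running everything (Option 1), use less than 25% for `shared_buffers`, since Nginx, Uvicorn, Redis, and Celery share the same RAM.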
**Backup Strategy:**
```bash
#!/bin/bash
# Daily backup script
BACKUP_DIR=/backups/postgresql
DATE=$(date +%Y%m%d_%H%M%S)
pg_dump -U wizamart_user wizamart_db | gzip > "$BACKUP_DIR/wizamart_$DATE.sql.gz"

# Keep last 7 days of backups
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete
```
### Redis Configuration
**Use Cases:**
| Use Case | Database | TTL | Description |
|----------|----------|-----|-------------|
| Session Cache | 0 | 24h | User sessions |
| API Rate Limiting | 0 | 1h | Request counters |
| Celery Broker | 1 | - | Task queue |
| Celery Results | 2 | 24h | Task results |
| Feature Flags | 3 | 5m | Feature gate cache |
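The database numbers in the table match the compose file's URLs (`redis://redis:6379/0` for the cache, `/1` for the Celery broker). A small helper that derives a connection URL per use case; the base host and the helper itself are illustrative:

```python
REDIS_BASE = "redis://localhost:6379"  # assumed base URL

# Database numbers from the table above
REDIS_DB_BY_USE = {
    "cache": 0,    # sessions + rate limiting
    "broker": 1,   # Celery broker
    "results": 2,  # Celery results
    "flags": 3,    # feature-flag cache
}

def redis_url(use_case: str) -> str:
    """Build the connection URL for one of the documented use cases."""
    return f"{REDIS_BASE}/{REDIS_DB_BY_USE[use_case]}"

print(redis_url("broker"))  # -> redis://localhost:6379/1
```

Keeping each use case in its own database means `flushdb` on the broker (shown in the troubleshooting section) cannot wipe sessions or rate-limit counters.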
**Configuration (`redis.conf`):**
```ini
maxmemory 256mb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
```
### Nginx Tuning
```nginx
# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    # Buffers
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 50M;
    large_client_header_buffers 4 8k;  # JWT cookies/headers can easily exceed 1k

    # Timeouts
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_timeout 15;
    send_timeout 10;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;
}
```
---
## Troubleshooting Guide
### Quick Diagnostics
```bash
# Check all services
systemctl status wizamart wizamart-celery postgresql redis nginx
# Check ports
ss -tlnp | grep -E '(8000|5432|6379|80|443)'
# Check disk space
df -h
# Check memory
free -h
# Check CPU/processes
htop
```
### Database Issues
```bash
# Connect to database
sudo -u postgres psql wizamart_db

# Everything below runs inside the psql session:

-- Check active connections
SELECT count(*) FROM pg_stat_activity;

-- Find slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Kill a stuck query (substitute the pid found above)
SELECT pg_terminate_backend(<pid>);

-- Check table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Analyze query performance
EXPLAIN ANALYZE SELECT ...;
```
### Redis Issues
```bash
# Check connectivity
redis-cli ping
# Monitor real-time commands
redis-cli monitor
# Check memory usage
redis-cli info memory
# List all keys (careful in production!)
redis-cli --scan
# Check queue lengths
redis-cli llen celery
# Flush specific database
redis-cli -n 1 flushdb # Flush Celery broker
```
### Celery Issues
```bash
# Check worker status
celery -A app.celery inspect active
celery -A app.celery inspect reserved
celery -A app.celery inspect stats
# Purge all pending tasks
celery -A app.celery purge
# List registered tasks
celery -A app.celery inspect registered
```
### Application Issues
```bash
# Check API health
curl -s http://localhost:8000/health | jq
# View recent logs
journalctl -u wizamart --since "10 minutes ago"
# Check for Python errors
journalctl -u wizamart | grep -i error | tail -20
# Test database connection
python -c "from app.core.database import engine; print(engine.connect())"
```
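The `/health` endpoint polled with `curl … | jq` above belongs to the app itself; its exact response shape isn't documented here, but a payload along these lines is the kind of thing the pipe expects. The field names and the aggregation rule are assumptions, not the project's actual handler:

```python
def health_payload(db_ok: bool, redis_ok: bool) -> dict:
    """Aggregate dependency checks into one JSON-ready status payload."""
    return {
        "status": "ok" if db_ok and redis_ok else "degraded",
        "database": "up" if db_ok else "down",
        "redis": "up" if redis_ok else "down",
    }

print(health_payload(True, True))
# -> {'status': 'ok', 'database': 'up', 'redis': 'up'}
```

Reporting `degraded` instead of failing outright lets the Docker healthcheck and a future load balancer distinguish "app is down" from "a dependency is down".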
### Common Problems & Solutions
| Problem | Diagnosis | Solution |
|---------|-----------|----------|
| 502 Bad Gateway | `systemctl status wizamart` | Restart app: `systemctl restart wizamart` |
| Database connection refused | `pg_isready` | Start PostgreSQL: `systemctl start postgresql` |
| High memory usage | `free -h`, `ps aux --sort=-%mem` | Restart app, check for memory leaks |
| Slow queries | PostgreSQL slow query log | Add indexes, optimize queries |
| Celery tasks stuck | `celery inspect active` | Restart workers, check Redis |
| Disk full | `df -h` | Clean logs, backups, temp files |
---
## Decision Matrix
### When to Use Each Option
| Scenario | Recommended | Reason |
|----------|-------------|--------|
| Solo developer, MVP | Managed (Railway) | Focus on product |
| Small team, budget conscious | Traditional VPS | Full control, low cost |
| Need direct DB access for debugging | Traditional VPS | Direct psql access |
| Familiar with Docker, want consistency | Docker Compose | Reproducible environments |
| High availability required | Docker + Orchestration | Easy scaling |
| Enterprise, compliance requirements | Kubernetes | Full orchestration |
### Cost Comparison (Monthly)
| Setup | Low Traffic | Medium | High |
|-------|-------------|--------|------|
| Managed (Railway + Neon) | $10 | $50 | $200+ |
| VPS (Hetzner/DigitalOcean) | $5 | $20 | $80 |
| Docker on VPS | $5 | $20 | $80 |
| AWS/GCP Full Stack | $50 | $200 | $1000+ |
---
## Migration Path
### Phase 1: Development ✅ COMPLETE
- ✅ PostgreSQL 15 (Docker)
- ✅ FastAPI + Uvicorn
- ✅ Local file storage
### Phase 2: Production MVP ✅ COMPLETE
- ✅ PostgreSQL (managed or VPS)
- ✅ FastAPI + Uvicorn (systemd or Docker)
- ✅ Redis 7 (cache + task broker)
- ✅ Celery 5.3 (background jobs)
- ✅ Celery Beat (scheduled tasks)
- ✅ Flower (task monitoring)
- ✅ Cloudflare R2 (cloud file storage)
- ✅ Sentry (error tracking)
- ✅ Cloudflare CDN (caching + DDoS protection)
### Phase 3: Scale (1,000+ Users)
- ⏳ Load balancer (Nginx/HAProxy/ALB)
- ⏳ Horizontal app scaling (2-4 Uvicorn instances)
- ⏳ PostgreSQL read replica
- ⏳ Dedicated Celery workers per queue
### Phase 4: Enterprise (5,000+ Users)
- ⏳ Redis Sentinel/cluster
- ⏳ Database connection pooling (PgBouncer)
- ⏳ Full monitoring stack (Prometheus/Grafana)
- ⏳ Log aggregation (Loki/ELK)
### Phase 5: High Availability (10,000+ Users)
- ⏳ Multi-region deployment
- ⏳ Database failover (streaming replication)
- ⏳ Container orchestration (Kubernetes)
- ⏳ Global CDN with edge caching
---
## Enterprise Upgrade Checklist
When you're ready to scale beyond 1,000 concurrent users:
### Infrastructure
- [ ] **Load Balancer** - Add Nginx/HAProxy in front of API servers
- Enables horizontal scaling
- Health checks and automatic failover
- SSL termination at edge
- [ ] **Multiple API Servers** - Run 2-4 Uvicorn instances
- Scale horizontally instead of vertically
- Blue-green deployments possible
- [ ] **Database Read Replica** - Add PostgreSQL replica
- Offload read queries from primary
- Backup without impacting production
- [ ] **Connection Pooling** - Add PgBouncer
- Reduce database connection overhead
- Handle connection spikes
### Monitoring & Observability
- [ ] **Prometheus + Grafana** - Metrics dashboards
- Request latency, error rates, saturation
- Database connection pool metrics
- Celery queue lengths
- [ ] **Log Aggregation** - Loki or ELK stack
- Centralized logs from all services
- Search and alerting
- [ ] **Alerting** - PagerDuty/OpsGenie integration
- On-call rotation
- Escalation policies
### Security
- [ ] **WAF Rules** - Cloudflare or AWS WAF
- SQL injection protection
- Rate limiting at edge
- Bot protection
- [ ] **Secrets Management** - HashiCorp Vault
- Rotate credentials automatically
- Audit access to secrets
---
## Next Steps
**You're production-ready now!** Optional improvements:
1. **Enable Sentry** - Add `SENTRY_DSN` for error tracking (free tier)
2. **Enable R2** - Set `STORAGE_BACKEND=r2` for cloud storage (~$5/mo)
3. **Enable Cloudflare** - Proxy your domain for CDN + DDoS protection (free tier)
4. **Add load balancer** - When you need horizontal scaling
See also:
- [Production Deployment Guide](production.md)
- [Cloudflare Setup Guide](cloudflare.md)
- [Docker Deployment](docker.md)
- [Environment Configuration](environment.md)
- [Background Tasks Architecture](../architecture/background-tasks.md)