orion/docs/deployment/infrastructure.md
Samir Boulahtit 12b79c1ff0 docs: update infrastructure docs to reflect Celery/Redis implementation
- Update "Current State" table: Celery, Beat, Flower, Redis now production-ready
- Update "What We Need to Add": removed Celery/Redis, added S3 and Sentry as priorities
- Add Celery commands section to development environment
- Update services table with Redis, Celery, Flower ports
- Update Docker Compose services section with all Celery services
- Update Migration Path: Phase 1 & 2 marked as COMPLETE
- Update Next Steps: focus on S3, Sentry, CloudFlare
- Fix Celery app path in systemd services (app.core.celery_config)
- Add Flower systemd service configuration
- Add queue flags to Celery worker command

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 19:13:40 +01:00


Infrastructure Guide

This guide documents the complete infrastructure for the Wizamart platform, from development to high-end production.

Philosophy: We prioritize debuggability and operational simplicity over complexity. Every component should be directly accessible for troubleshooting.



Architecture Overview

System Components

┌─────────────────────────────────────────────────────────────────────────┐
│                              CLIENTS                                     │
│  (Browsers, Mobile Apps, API Consumers)                                  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         LOAD BALANCER / PROXY                            │
│  (Nginx, Caddy, or Cloud LB)                                            │
│  - SSL termination                                                       │
│  - Static file serving                                                   │
│  - Rate limiting                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         APPLICATION SERVERS                              │
│  (FastAPI + Uvicorn)                                                     │
│  - API endpoints                                                         │
│  - HTML rendering (Jinja2)                                              │
│  - WebSocket connections                                                 │
└─────────────────────────────────────────────────────────────────────────┘
                    │               │               │
                    ▼               ▼               ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│   PostgreSQL     │ │      Redis       │ │   File Storage   │
│   (Primary DB)   │ │  (Cache/Queue)   │ │  (S3/Local)      │
└──────────────────┘ └──────────────────┘ └──────────────────┘
                                │
                                ▼
                    ┌──────────────────┐
                    │  Celery Workers  │
                    │ (Background Jobs)│
                    └──────────────────┘

Data Flow

  1. Request → Nginx → Uvicorn → FastAPI → Service Layer → Database
  2. Background Job → API creates task → Redis Queue → Celery Worker → Database
  3. Static Files → Nginx serves directly (or CDN in production)
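Flow 2 is a producer/consumer hand-off:

- The API serializes a task message, pushes it onto a queue and returns.
- A separate worker process pops the message and executes it.

The sketch below reproduces that hand-off in one process. `queue.Queue` stands in for the Redis list and a thread stands in for the Celery worker. It illustrates the pattern only; it is not how Celery is implemented, and the task name and message shape are illustrative.

```python
import json
import queue
import threading

# Stand-in for the Redis list holding serialized task messages
broker = queue.Queue()
# Stand-in for the database the worker writes to
database = []


def api_create_task(order_id: int) -> None:
    """API side: serialize the task, enqueue it and return immediately."""
    message = {"task": "send_order_confirmation", "args": [order_id]}
    broker.put(json.dumps(message))


def worker() -> None:
    """Worker side: block on the queue, run each task, persist the outcome."""
    while True:
        raw = broker.get()
        if raw is None:  # sentinel used in this sketch to stop the worker
            break
        message = json.loads(raw)
        database.append((message["task"], *message["args"]))


thread = threading.Thread(target=worker)
thread.start()
for order_id in (101, 102):
    api_create_task(order_id)
broker.put(None)
thread.join()
print(database)  # [('send_order_confirmation', 101), ('send_order_confirmation', 102)]
```

`api_create_task` never waits for the work itself. That is why slow jobs such as marketplace imports belong on the queue rather than in the request path.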

Current State

What We Have Now

| Component | Technology | Status |
|---|---|---|
| Web Framework | FastAPI + Uvicorn | Production Ready |
| Database | PostgreSQL 15 | Production Ready |
| ORM | SQLAlchemy 2.0 | Production Ready |
| Migrations | Alembic | Production Ready |
| Templates | Jinja2 + Tailwind CSS | Production Ready |
| Authentication | JWT (PyJWT) | Production Ready |
| Email | SMTP/SendGrid/Mailgun/SES | Production Ready |
| Payments | Stripe | Production Ready |
| Task Queue | Celery 5.3 + Redis | Production Ready |
| Task Scheduler | Celery Beat | Production Ready |
| Task Monitoring | Flower | Production Ready |
| Caching | Redis 7 | Production Ready |
| File Storage | Local filesystem | Needs S3 for prod |

What We Need to Add

| Component | Priority | Reason |
|---|---|---|
| S3/MinIO | High | Scalable file storage |
| Sentry | High | Error tracking |
| CloudFlare | Medium | CDN + DDoS protection |
| Load Balancer | Medium | Horizontal scaling |
| Prometheus/Grafana | Low | Metrics and dashboards |

Development Environment

# 1. Start PostgreSQL and Redis
make docker-up

# 2. Run migrations
make migrate-up

# 3. Initialize data
make init-prod

# 4. Start development server
make dev

# 5. (Optional) Start Celery worker for background tasks
make celery-dev  # Worker + Beat together

# 6. (Optional) Run tests
make test

Services Running Locally

| Service | Host | Port | Purpose |
|---|---|---|---|
| FastAPI | localhost | 8000 | Main application |
| PostgreSQL | localhost | 5432 | Development database |
| PostgreSQL (test) | localhost | 5433 | Test database |
| Redis | localhost | 6380 | Cache and task broker |
| Celery Worker | - | - | Background task processing |
| Celery Beat | - | - | Periodic task scheduler |
| Flower | localhost | 5555 | Task monitoring dashboard |
| MkDocs | localhost | 9991 | Documentation |

Docker Compose Services

# docker-compose.yml
services:
  db:            # PostgreSQL 15 for development
  redis:         # Redis 7 for cache/queue
  api:           # FastAPI application (profile: full)
  celery-worker: # Background task processor (profile: full)
  celery-beat:   # Scheduled task scheduler (profile: full)
  flower:        # Task monitoring UI (profile: full)

Celery Commands

# Start worker only
make celery-worker

# Start scheduler only
make celery-beat

# Start worker + scheduler together (development)
make celery-dev

# Start Flower monitoring
make flower

# Check worker status
make celery-status

# Purge pending tasks
make celery-purge

Production Options

Option 1: Traditional VPS

Best for: Teams who want direct server access and are comfortable with Linux administration.

┌─────────────────────────────────────────────────────────────┐
│                         VPS (4GB+ RAM)                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Nginx     │  │  Uvicorn    │  │ PostgreSQL  │          │
│  │  (reverse   │  │  (4 workers)│  │  (local)    │          │
│  │   proxy)    │  │             │  │             │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│         │                │                │                  │
│         └────────────────┼────────────────┘                  │
│                          │                                   │
│  ┌─────────────┐  ┌─────────────┐                           │
│  │   Redis     │  │   Celery    │                           │
│  │  (local)    │  │  (workers)  │                           │
│  └─────────────┘  └─────────────┘                           │
└─────────────────────────────────────────────────────────────┘

Setup:

# On Ubuntu 22.04+ VPS

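# 1. Install system packages
# (Package versions differ by release: Ubuntu 22.04 ships PostgreSQL 14, so
#  postgresql-15 needs the PostgreSQL APT repository from apt.postgresql.org;
#  adjust the python3.x packages to what your release provides)
sudo apt update
sudo apt install -y nginx postgresql-15 redis-server python3.11 python3.11-venv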

# 2. Create application user
sudo useradd -m -s /bin/bash wizamart
sudo su - wizamart

# 3. Clone and setup
git clone <repo> /home/wizamart/app
cd /home/wizamart/app
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
nano .env  # Edit with production values

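# 5. Setup database
# createuser/createdb need sudo rights, so run them from an admin account
# (the wizamart user has none). --pwprompt sets the password used in DATABASE_URL.
sudo -u postgres createuser --pwprompt wizamart_user
sudo -u postgres createdb wizamart_db -O wizamart_user
# Then, as wizamart with .venv activated:
alembic upgrade head
python scripts/init_production.py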

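# 6. Create the systemd units shown below, then load and start them
sudo nano /etc/systemd/system/wizamart.service
sudo nano /etc/systemd/system/wizamart-celery.service
sudo systemctl daemon-reload
sudo systemctl enable --now wizamart wizamart-celery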

Systemd Service:

# /etc/systemd/system/wizamart.service
[Unit]
Description=Wizamart API
After=network.target postgresql.service redis.service

[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000 --workers 4
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Celery Workers:

# /etc/systemd/system/wizamart-celery.service
[Unit]
Description=Wizamart Celery Worker
After=network.target redis.service

[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/celery -A app.core.celery_config worker --loglevel=info --concurrency=4 -Q default,imports,emails,reports
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Nginx Configuration:

# /etc/nginx/sites-available/wizamart
server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Static files (served directly by Nginx)
    location /static {
        alias /home/wizamart/app/static;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Uploaded files
    location /uploads {
        alias /home/wizamart/app/uploads;
        expires 7d;
    }

    # API and application
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for future real-time features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Troubleshooting Commands:

# Check service status
sudo systemctl status wizamart
sudo systemctl status wizamart-celery
sudo systemctl status postgresql
sudo systemctl status redis

# View logs
sudo journalctl -u wizamart -f
sudo journalctl -u wizamart-celery -f

# Connect to database directly
sudo -u postgres psql wizamart_db

# Check Redis
redis-cli ping
redis-cli monitor  # Watch commands in real-time

# Restart services
sudo systemctl restart wizamart
sudo systemctl restart wizamart-celery

Option 2: Docker Compose Production

Best for: Consistent environments, easy rollbacks, container familiarity.

# docker-compose.prod.yml
services:
  api:
    build: .
    restart: always
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  celery:
    build: .
    restart: always
    command: celery -A app.core.celery_config worker --loglevel=info --concurrency=4
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  celery-beat:
    build: .
    restart: always
    command: celery -A app.core.celery_config beat --loglevel=info
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - redis

  db:
    image: postgres:15
    restart: always
    environment:
      POSTGRES_DB: wizamart_db
      POSTGRES_USER: wizamart_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U wizamart_user -d wizamart_db"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./static:/app/static:ro
      - ./uploads:/app/uploads:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:

Troubleshooting Commands:

# View all containers
docker compose -f docker-compose.prod.yml ps

# View logs
docker compose -f docker-compose.prod.yml logs -f api
docker compose -f docker-compose.prod.yml logs -f celery

# Access container shell
docker compose -f docker-compose.prod.yml exec api bash
docker compose -f docker-compose.prod.yml exec db psql -U wizamart_user -d wizamart_db

# Restart specific service
docker compose -f docker-compose.prod.yml restart api

# View resource usage
docker stats

Option 3: Managed Services (Minimal Ops)

Best for: Small teams, focus on product not infrastructure.

| Component | Service | Cost (approx.) |
|---|---|---|
| App Hosting | Railway / Render / Fly.io | $5-25/mo |
| Database | Neon / Supabase / PlanetScale | $0-25/mo |
| Redis | Upstash / Redis Cloud | $0-10/mo |
| File Storage | Cloudflare R2 / AWS S3 | $0-5/mo |
| Email | Resend / SendGrid | $0-20/mo |

Example: Railway + Neon

# Deploy to Railway
railway login
railway init
railway up

# Configure environment
railway variables set DATABASE_URL="postgresql://..."
railway variables set REDIS_URL="redis://..."

Future High-End Architecture

Target Production Architecture

                            ┌─────────────────┐
                            │   CloudFlare    │
                            │   (CDN + WAF)   │
                            └────────┬────────┘
                                     │
                            ┌────────▼────────┐
                            │  Load Balancer  │
                            │  (HA Proxy/ALB) │
                            └────────┬────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐   ┌────────▼────────┐   ┌────────▼────────┐
     │   API Server 1  │   │   API Server 2  │   │   API Server N  │
     │   (Uvicorn)     │   │   (Uvicorn)     │   │   (Uvicorn)     │
     └────────┬────────┘   └────────┬────────┘   └────────┬────────┘
              │                      │                      │
              └──────────────────────┼──────────────────────┘
                                     │
         ┌───────────────────────────┼───────────────────────────┐
         │                           │                           │
┌────────▼────────┐        ┌────────▼────────┐        ┌────────▼────────┐
│   PostgreSQL    │        │     Redis       │        │   S3 / MinIO    │
│   (Primary)     │        │   (Cluster)     │        │   (Files)       │
│        │        │        │                 │        │                 │
│   ┌────▼────┐   │        │   ┌─────────┐   │        │                 │
│   │ Replica │   │        │   │ Sentinel│   │        │                 │
│   └─────────┘   │        │   └─────────┘   │        │                 │
└─────────────────┘        └─────────────────┘        └─────────────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐   ┌────────▼────────┐   ┌────────▼────────┐
     │ Celery Worker 1 │   │ Celery Worker 2 │   │ Celery Beat     │
     │ (General)       │   │ (Import Jobs)   │   │ (Scheduler)     │
     └─────────────────┘   └─────────────────┘   └─────────────────┘

                    ┌─────────────────────────────┐
                    │       Monitoring Stack      │
                    │  ┌──────────┐ ┌───────────┐ │
                    │  │Prometheus│ │  Grafana  │ │
                    │  └──────────┘ └───────────┘ │
                    │  ┌──────────┐ ┌───────────┐ │
                    │  │  Sentry  │ │  Loki     │ │
                    │  └──────────┘ └───────────┘ │
                    └─────────────────────────────┘

Celery Task Queues

# app/core/celery_config.py
from celery import Celery
from kombu import Exchange, Queue

from app.core.config import settings  # assumed module for application settings

celery_app = Celery(
    "wizamart",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend,
)

# One direct exchange per queue; routing key = queue name
celery_app.conf.task_queues = (
    Queue("default", Exchange("default"), routing_key="default"),
    Queue("imports", Exchange("imports"), routing_key="imports"),
    Queue("emails", Exchange("emails"), routing_key="emails"),
    Queue("reports", Exchange("reports"), routing_key="reports"),
)

# Unrouted tasks go here (Celery's built-in default queue name is "celery")
celery_app.conf.task_default_queue = "default"

celery_app.conf.task_routes = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}
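The routing table above is an exact-name lookup with a fallback. A task that has no route is published to Celery's `task_default_queue`, which is `"celery"` unless you set it. A queue with that name is not among the queues declared in `task_queues`, so it is worth pointing the default at `"default"`.

The stdlib function below only mirrors that lookup so the fallback is visible. It is an illustration, not Celery's router.

```python
TASK_ROUTES = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}
DECLARED_QUEUES = {"default", "imports", "emails", "reports"}


def resolve_queue(task_name: str, default_queue: str = "celery") -> str:
    """Queue a task would be published to under exact-name routing."""
    route = TASK_ROUTES.get(task_name)
    return route["queue"] if route else default_queue


# Unrouted task with Celery's out-of-the-box default: not a declared queue
print(resolve_queue("app.tasks.cleanup_expired_sessions"))  # celery
# Same task once the default queue is set to "default"
print(resolve_queue("app.tasks.cleanup_expired_sessions", default_queue="default"))  # default
print(resolve_queue("app.tasks.send_email"))  # emails
```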

Background Tasks to Implement

| Task | Queue | Priority | Description |
|---|---|---|---|
| import_letzshop_products | imports | High | Marketplace product sync |
| import_letzshop_orders | imports | High | Order sync from Letzshop |
| send_order_confirmation | emails | High | Order emails |
| send_password_reset | emails | High | Auth emails |
| send_invoice_email | emails | Medium | Invoice delivery |
| generate_sales_report | reports | Low | Analytics reports |
| cleanup_expired_sessions | default | Low | Maintenance |
| sync_stripe_subscriptions | default | Medium | Billing sync |
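The maintenance, billing and reporting tasks in this table are natural candidates for Celery Beat. Below is a sketch of a `beat_schedule` mapping for them. Treat it as hypothetical:

- The `app.tasks.*` module paths are assumptions, not the project's actual task names.
- The intervals are assumptions, not the project's actual schedule.

Each entry's `options` are passed through when Beat publishes the task, so the queue can be pinned per entry.

```python
from datetime import timedelta

# Hypothetical schedule; assign with celery_app.conf.beat_schedule = BEAT_SCHEDULE
BEAT_SCHEDULE = {
    "cleanup-expired-sessions": {
        "task": "app.tasks.cleanup_expired_sessions",
        "schedule": timedelta(hours=1),
        "options": {"queue": "default"},
    },
    "sync-stripe-subscriptions": {
        "task": "app.tasks.sync_stripe_subscriptions",
        "schedule": timedelta(hours=6),
        "options": {"queue": "default"},
    },
    "generate-sales-report": {
        "task": "app.tasks.generate_sales_report",
        "schedule": timedelta(days=1),
        "options": {"queue": "reports"},
    },
}

# Sanity check: every entry targets a queue declared in task_queues
DECLARED_QUEUES = {"default", "imports", "emails", "reports"}
for name, entry in BEAT_SCHEDULE.items():
    assert entry["options"]["queue"] in DECLARED_QUEUES, name
```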

Component Deep Dives

PostgreSQL Configuration

Production Settings (postgresql.conf):

# Memory (adjust based on server RAM; values shown are for a dedicated 1 GB server)
shared_buffers = 256MB          # 25% of RAM for dedicated DB server
effective_cache_size = 768MB    # 75% of RAM
work_mem = 16MB
maintenance_work_mem = 128MB

# Connections
max_connections = 100

# Write-Ahead Log
wal_level = replica
max_wal_senders = 3

# Query Planning
random_page_cost = 1.1          # For SSD storage
effective_io_concurrency = 200  # For SSD storage

# Logging
log_min_duration_statement = 1000  # Log queries > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
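The memory values above follow the two rules of thumb in their comments:

- `shared_buffers` at about 25% of RAM.
- `effective_cache_size` at about 75% of RAM.

On a 1 GB machine those rules give 256MB and 768MB. A small hypothetical helper makes it easy to recompute the values for the 4GB+ VPS shown earlier.

```python
def postgres_memory_settings(ram_mb: int) -> dict[str, str]:
    """Apply the 25% / 75% rules of thumb for a dedicated database server."""
    return {
        "shared_buffers": f"{ram_mb // 4}MB",
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",
    }


print(postgres_memory_settings(1024))
# {'shared_buffers': '256MB', 'effective_cache_size': '768MB'}
print(postgres_memory_settings(4096))
# {'shared_buffers': '1024MB', 'effective_cache_size': '3072MB'}
```

On the single-VPS layout, Uvicorn, Celery and Redis share the same RAM. There it is safer to pass in the memory you intend to leave for PostgreSQL rather than the machine's total.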

Backup Strategy:

#!/bin/bash
# Daily backup script
# pipefail makes the script fail if pg_dump fails, not just if gzip does
set -euo pipefail

BACKUP_DIR=/backups/postgresql
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
pg_dump -U wizamart_user wizamart_db | gzip > "$BACKUP_DIR/wizamart_$DATE.sql.gz"

# Keep last 7 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete
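The `find -mtime +7` line prunes by file modification time. Copies that do not preserve mtimes, such as files uploaded to object storage, reset that signal. The timestamp embedded in the filename is more robust.

Below is a sketch of the same retention rule applied to filenames. It assumes the `wizamart_YYYYmmdd_HHMMSS.sql.gz` naming produced by the script above; the function itself is hypothetical.

```python
from datetime import datetime, timedelta


def backups_to_delete(filenames: list[str], now: datetime, keep_days: int = 7) -> list[str]:
    """Return backup files whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        if not (name.startswith("wizamart_") and name.endswith(".sql.gz")):
            continue  # not produced by the backup script
        stamp = name[len("wizamart_"):-len(".sql.gz")]
        try:
            taken_at = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # unexpected name; leave it alone
        if taken_at < cutoff:
            expired.append(name)
    return expired


now = datetime(2026, 1, 11, 3, 0, 0)
files = [
    "wizamart_20260110_030000.sql.gz",  # 1 day old: kept
    "wizamart_20260102_030000.sql.gz",  # 9 days old: deleted
    "notes.txt",                        # ignored
]
print(backups_to_delete(files, now))  # ['wizamart_20260102_030000.sql.gz']
```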

Redis Configuration

Use Cases:

| Use Case | Database | TTL | Description |
|---|---|---|---|
| Session Cache | 0 | 24h | User sessions |
| API Rate Limiting | 0 | 1h | Request counters |
| Celery Broker | 1 | - | Task queue |
| Celery Results | 2 | 24h | Task results |
| Feature Flags | 3 | 5m | Feature gate cache |
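The "API Rate Limiting" row describes fixed-window request counters with a 1h TTL. In Redis this is usually an `INCR` on a per-client, per-window key, plus an `EXPIRE` when the key is first created.

The sketch below reproduces that logic with dicts standing in for Redis, so it runs without a server. The key format and limit are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Fixed-window counters; dicts stand in for Redis INCR + EXPIRE."""

    def __init__(self, limit: int, window_seconds: int = 3600, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters: dict[str, int] = {}
        self.expires_at: dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        # Drop keys whose TTL has elapsed, as Redis would
        for key in [k for k, t in self.expires_at.items() if t <= now]:
            del self.counters[key]
            del self.expires_at[key]
        window_start = int(now // self.window) * self.window
        key = f"ratelimit:{client_id}:{window_start}"  # hypothetical key format
        self.counters[key] = self.counters.get(key, 0) + 1  # INCR
        if self.counters[key] == 1:
            self.expires_at[key] = window_start + self.window  # EXPIRE on first hit
        return self.counters[key] <= self.limit


fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=3600, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7") for _ in range(3)])  # [True, True, False]
fake_now[0] = 3600.0  # next window
print(limiter.allow("203.0.113.7"))  # True
```

Against a real Redis, send the `INCR` and `EXPIRE` together, for example in a `MULTI`/`EXEC` transaction. Otherwise a crash between the two commands can leave a counter with no TTL.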

Configuration (redis.conf):

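maxmemory 256mb
# The Celery broker (db 1) shares this instance, and allkeys-lru could evict queued tasks.
# volatile-lru only evicts keys that have a TTL (sessions, results, rate limits, flags).
maxmemory-policy volatile-lru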
appendonly yes
appendfsync everysec

Nginx Tuning

# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

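http {
    # (keep the existing include directives, e.g. mime.types and sites-enabled/*)

    # Buffers
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 50M;
    large_client_header_buffers 4 8k;  # 1k is too small for JWT Authorization headers and cookies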

    # Timeouts
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_timeout 15;
    send_timeout 10;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;
}

Troubleshooting Guide

Quick Diagnostics

# Check all services
systemctl status wizamart wizamart-celery postgresql redis nginx

# Check ports
sudo ss -tlnp | grep -E ':(8000|5432|6379|80|443)\s'

# Check disk space
df -h

# Check memory
free -h

# Check CPU/processes
htop
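Minimal containers may not ship `ss`. There, the same port check can be done from Python with a plain TCP connect. This is a minimal sketch: the host and the port map mirror the `grep` above and are assumptions to adjust per deployment.

```python
import socket

PORTS = {"uvicorn": 8000, "postgresql": 5432, "redis": 6379, "nginx-http": 80, "nginx-https": 443}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "open" if is_listening("127.0.0.1", port) else "closed"
        print(f"{name:12} {port:5} {state}")
```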

Database Issues

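# Connect to database
sudo -u postgres psql wizamart_db

-- The statements below run inside psql (SQL comments use --)

-- Check active connections
SELECT count(*) FROM pg_stat_activity;

-- Find slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Kill a stuck query (replace <pid> with a pid from the query above;
-- pg_cancel_backend(<pid>) cancels the query without closing the session)
SELECT pg_terminate_backend(<pid>);

-- Check table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Analyze query performance (EXPLAIN ANALYZE executes the statement,
-- so wrap INSERT/UPDATE/DELETE in BEGIN; ... ROLLBACK;)
EXPLAIN ANALYZE SELECT ...;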

Redis Issues

# Check connectivity
redis-cli ping

# Monitor real-time commands
redis-cli monitor

# Check memory usage
redis-cli info memory

# List all keys (careful in production!)
redis-cli --scan

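# Check queue lengths (the broker is db 1; each Celery queue is a Redis list named after the queue)
for q in celery default imports emails reports; do echo "$q: $(redis-cli -n 1 llen "$q")"; done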

# Flush specific database
redis-cli -n 1 flushdb  # Flush Celery broker

Celery Issues

# Check worker status
celery -A app.core.celery_config inspect active
celery -A app.core.celery_config inspect reserved
celery -A app.core.celery_config inspect stats

# Purge all pending tasks
celery -A app.core.celery_config purge

# List registered tasks
celery -A app.core.celery_config inspect registered

Application Issues

# Check API health
curl -s http://localhost:8000/health | jq

# View recent logs
journalctl -u wizamart --since "10 minutes ago"

# Check for Python errors
journalctl -u wizamart | grep -i error | tail -20

# Test database connection
python -c "from app.core.database import engine; print(engine.connect())"
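Scripted checks, such as a cron job or a deploy step, may run where `curl` and `jq` are missing. For those, a stdlib probe of the same `/health` endpoint is enough.

This sketch treats only an HTTP 200 as healthy. It parses the body as JSON when possible. The payload shape of `/health` is not documented here, so the code does not assume any fields.

```python
import json
import urllib.error
import urllib.request


def check_health(url: str = "http://localhost:8000/health", timeout: float = 5.0):
    """Return (healthy, payload); healthy is True only for an HTTP 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses
        return False, {"status_code": exc.code}
    except (urllib.error.URLError, OSError) as exc:  # refused, DNS failure, timeout
        return False, {"error": str(exc)}
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = body  # not JSON; return the raw text
    return status == 200, payload


if __name__ == "__main__":
    healthy, payload = check_health()
    print("healthy" if healthy else "UNHEALTHY", payload)
    raise SystemExit(0 if healthy else 1)
```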

Common Problems & Solutions

| Problem | Diagnosis | Solution |
|---|---|---|
| 502 Bad Gateway | `systemctl status wizamart` | Restart app: `systemctl restart wizamart` |
| Database connection refused | `pg_isready` | Start PostgreSQL: `systemctl start postgresql` |
| High memory usage | `free -h`, `ps aux --sort=-%mem` | Restart app, check for memory leaks |
| Slow queries | PostgreSQL slow query log | Add indexes, optimize queries |
| Celery tasks stuck | `celery inspect active` | Restart workers, check Redis |
| Disk full | `df -h` | Clean logs, backups, temp files |
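The "Disk full" row is easy to catch before it becomes an outage. Below is a small stdlib check that a cron job could run. The 90% threshold and the default paths are assumptions; `/backups/postgresql` is taken from the backup script above.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percent of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disks(paths=("/", "/backups/postgresql"), threshold: float = 90.0) -> list[str]:
    """Return one warning per path that is missing or at/above `threshold` percent used."""
    warnings = []
    for path in paths:
        try:
            pct = disk_usage_percent(path)
        except FileNotFoundError:
            warnings.append(f"{path}: not found")
            continue
        if pct >= threshold:
            warnings.append(f"{path}: {pct:.1f}% used")
    return warnings


if __name__ == "__main__":
    for line in check_disks():
        print(line)
```

The percentage can read a little lower than the `Use%` column of `df -h`. `df` divides by used + available space, which excludes blocks reserved for root, while `shutil.disk_usage().total` includes them.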

Decision Matrix

When to Use Each Option

| Scenario | Recommended | Reason |
|---|---|---|
| Solo developer, MVP | Managed (Railway) | Focus on product |
| Small team, budget conscious | Traditional VPS | Full control, low cost |
| Need direct DB access for debugging | Traditional VPS | Direct psql access |
| Familiar with Docker, want consistency | Docker Compose | Reproducible environments |
| High availability required | Docker + Orchestration | Easy scaling |
| Enterprise, compliance requirements | Kubernetes | Full orchestration |

Cost Comparison (Monthly)

| Setup | Low Traffic | Medium | High |
|---|---|---|---|
| Managed (Railway + Neon) | $10 | $50 | $200+ |
| VPS (Hetzner/DigitalOcean) | $5 | $20 | $80 |
| Docker on VPS | $5 | $20 | $80 |
| AWS/GCP Full Stack | $50 | $200 | $1000+ |

Migration Path

Phase 1: Current (Development) COMPLETE

  • PostgreSQL 15 (Docker)
  • FastAPI + Uvicorn
  • Local file storage

Phase 2: Production MVP COMPLETE (except S3/MinIO and Sentry)

  • PostgreSQL (managed or VPS)
  • FastAPI + Uvicorn (systemd or Docker)
  • Redis 7 (cache + task broker)
  • Celery 5.3 (background jobs)
  • Celery Beat (scheduled tasks)
  • Flower (task monitoring)
  • S3/MinIO (file storage; not yet configured)
  • Sentry (error tracking; not yet configured)

Phase 3: Scale

  • Horizontal app scaling (multiple Uvicorn instances)
  • Load balancer (Nginx/HAProxy)
  • PostgreSQL read replicas
  • Redis Sentinel/cluster
  • CDN for static assets (CloudFlare)
  • Dedicated Celery workers per queue

Phase 4: High Availability

  • Multi-region deployment
  • Database failover
  • Container orchestration (Kubernetes)
  • Full monitoring stack (Prometheus/Grafana/Loki)

Next Steps

  1. Configure S3/MinIO - For production file storage (high priority)
  2. Set up Sentry - Error tracking (high priority)
  3. Add CloudFlare - CDN + DDoS protection (medium priority)
  4. Configure load balancer - When scaling horizontally
  5. Choose production deployment - VPS or Docker based on team preference

See also: