
Infrastructure Guide

This guide documents the complete infrastructure for the Wizamart platform, from development to high-end production.

Philosophy: We prioritize debuggability and operational simplicity over architectural complexity. Every component should be directly accessible for troubleshooting.


Table of Contents

  • Architecture Overview
  • Current State
  • Development Environment
  • Production Options
  • Future High-End Architecture
  • Component Deep Dives
  • Troubleshooting Guide
  • Decision Matrix
  • Migration Path
  • Next Steps

Architecture Overview

System Components

┌─────────────────────────────────────────────────────────────────────────┐
│                              CLIENTS                                     │
│  (Browsers, Mobile Apps, API Consumers)                                  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         LOAD BALANCER / PROXY                            │
│  (Nginx, Caddy, or Cloud LB)                                            │
│  - SSL termination                                                       │
│  - Static file serving                                                   │
│  - Rate limiting                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         APPLICATION SERVERS                              │
│  (FastAPI + Uvicorn)                                                     │
│  - API endpoints                                                         │
│  - HTML rendering (Jinja2)                                              │
│  - WebSocket connections                                                 │
└─────────────────────────────────────────────────────────────────────────┘
                    │               │               │
                    ▼               ▼               ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│   PostgreSQL     │ │      Redis       │ │   File Storage   │
│   (Primary DB)   │ │  (Cache/Queue)   │ │  (S3/Local)      │
└──────────────────┘ └──────────────────┘ └──────────────────┘
                                │
                                ▼
                    ┌──────────────────┐
                    │  Celery Workers  │
                    │ (Background Jobs)│
                    └──────────────────┘

Data Flow

  1. Request → Nginx → Uvicorn → FastAPI → Service Layer → Database
  2. Background Job → API creates task → Redis Queue → Celery Worker → Database
  3. Static Files → Nginx serves directly (or CDN in production)

Current State

What We Have Now

| Component | Technology | Status |
|-----------|------------|--------|
| Web Framework | FastAPI + Uvicorn | Production Ready |
| Database | PostgreSQL 15 | Production Ready |
| ORM | SQLAlchemy 2.0 | Production Ready |
| Migrations | Alembic | Production Ready |
| Templates | Jinja2 + Tailwind CSS | Production Ready |
| Authentication | JWT (PyJWT) | Production Ready |
| Email | SMTP/SendGrid/Mailgun/SES | Production Ready |
| Payments | Stripe | Production Ready |
| Background Jobs | - | Planned (Celery) |
| Caching | - | Planned (Redis) |
| File Storage | Local filesystem | Needs S3 for prod |

What We Need to Add

| Component | Priority | Reason |
|-----------|----------|--------|
| Redis | High | Session cache, Celery broker |
| Celery | High | Background jobs (imports, emails, reports) |
| S3/MinIO | Medium | Scalable file storage |
| Sentry | Medium | Error tracking |
| Prometheus/Grafana | Low | Metrics and dashboards |

Development Environment

# 1. Start PostgreSQL
make docker-up

# 2. Run migrations
make migrate-up

# 3. Initialize data
make init-prod

# 4. Start development server
make dev

# 5. (Optional) Run tests
make test

Services Running Locally

| Service | Host | Port | Purpose |
|---------|------|------|---------|
| FastAPI | localhost | 8000 | Main application |
| PostgreSQL | localhost | 5432 | Development database |
| PostgreSQL (test) | localhost | 5433 | Test database |
| MkDocs | localhost | 8001 | Documentation |
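The local services table above is easy to script as a smoke check. `is_port_open` is a hypothetical helper, not part of the repo; it is a minimal sketch using only the standard library.

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the development services from the table above
for name, port in [("FastAPI", 8000), ("PostgreSQL", 5432),
                   ("PostgreSQL (test)", 5433), ("MkDocs", 8001)]:
    status = "up" if is_port_open("localhost", port) else "down"
    print(f"{name:18} :{port}  {status}")
```

This is the same check `ss -tlnp` performs interactively in the troubleshooting section, just in a form CI can assert on.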

Docker Compose Services

# docker-compose.yml
services:
  db:          # PostgreSQL for development
  redis:       # Redis for cache/queue (coming soon)
  api:         # FastAPI application (optional)

Production Options

Option 1: Traditional VPS (systemd)

Best for: teams who want direct server access and are comfortable with Linux administration.

┌─────────────────────────────────────────────────────────────┐
│                         VPS (4GB+ RAM)                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Nginx     │  │  Uvicorn    │  │ PostgreSQL  │          │
│  │  (reverse   │  │  (4 workers)│  │  (local)    │          │
│  │   proxy)    │  │             │  │             │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│         │                │                │                  │
│         └────────────────┼────────────────┘                  │
│                          │                                   │
│  ┌─────────────┐  ┌─────────────┐                           │
│  │   Redis     │  │   Celery    │                           │
│  │  (local)    │  │  (workers)  │                           │
│  └─────────────┘  └─────────────┘                           │
└─────────────────────────────────────────────────────────────┘

Setup:

# On Ubuntu 22.04+ VPS

# 1. Install system packages
#    (Ubuntu 22.04 ships postgresql-14 by default; postgresql-15 requires the PGDG apt repository)
sudo apt update
sudo apt install -y nginx postgresql-15 redis-server python3.11 python3.11-venv

# 2. Create application user
sudo useradd -m -s /bin/bash wizamart
sudo su - wizamart

# 3. Clone and setup
git clone <repo> /home/wizamart/app
cd /home/wizamart/app
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
nano .env  # Edit with production values

# 5. Setup database
sudo -u postgres createuser wizamart_user
sudo -u postgres createdb wizamart_db -O wizamart_user
alembic upgrade head
python scripts/init_production.py

# 6. Create systemd service
sudo nano /etc/systemd/system/wizamart.service

Systemd Service:

# /etc/systemd/system/wizamart.service
[Unit]
Description=Wizamart API
After=network.target postgresql.service redis.service

[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000 --workers 4
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Celery Workers:

# /etc/systemd/system/wizamart-celery.service
[Unit]
Description=Wizamart Celery Worker
After=network.target redis.service

[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/celery -A app.celery worker --loglevel=info --concurrency=4
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Nginx Configuration:

# /etc/nginx/sites-available/wizamart
server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;  # legacy header; modern browsers ignore it

    # Static files (served directly by Nginx)
    location /static {
        alias /home/wizamart/app/static;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Uploaded files
    location /uploads {
        alias /home/wizamart/app/uploads;
        expires 7d;
    }

    # API and application
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for future real-time features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Troubleshooting Commands:

# Check service status
sudo systemctl status wizamart
sudo systemctl status wizamart-celery
sudo systemctl status postgresql
sudo systemctl status redis

# View logs
sudo journalctl -u wizamart -f
sudo journalctl -u wizamart-celery -f

# Connect to database directly
sudo -u postgres psql wizamart_db

# Check Redis
redis-cli ping
redis-cli monitor  # Watch commands in real-time

# Restart services
sudo systemctl restart wizamart
sudo systemctl restart wizamart-celery

Option 2: Docker Compose Production

Best for: Consistent environments, easy rollbacks, container familiarity.

# docker-compose.prod.yml
services:
  api:
    build: .
    restart: always
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  celery:
    build: .
    restart: always
    command: celery -A app.celery worker --loglevel=info --concurrency=4
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  celery-beat:
    build: .
    restart: always
    command: celery -A app.celery beat --loglevel=info
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - redis

  db:
    image: postgres:15
    restart: always
    environment:
      POSTGRES_DB: wizamart_db
      POSTGRES_USER: wizamart_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U wizamart_user -d wizamart_db"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./static:/app/static:ro
      - ./uploads:/app/uploads:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:

Troubleshooting Commands:

# View all containers
docker compose -f docker-compose.prod.yml ps

# View logs
docker compose -f docker-compose.prod.yml logs -f api
docker compose -f docker-compose.prod.yml logs -f celery

# Access container shell
docker compose -f docker-compose.prod.yml exec api bash
docker compose -f docker-compose.prod.yml exec db psql -U wizamart_user -d wizamart_db

# Restart specific service
docker compose -f docker-compose.prod.yml restart api

# View resource usage
docker stats

Option 3: Managed Services (Minimal Ops)

Best for: Small teams, focus on product not infrastructure.

| Component | Service | Cost (approx) |
|-----------|---------|---------------|
| App Hosting | Railway / Render / Fly.io | $5-25/mo |
| Database | Neon / Supabase / PlanetScale | $0-25/mo |
| Redis | Upstash / Redis Cloud | $0-10/mo |
| File Storage | Cloudflare R2 / AWS S3 | $0-5/mo |
| Email | Resend / SendGrid | $0-20/mo |

Example: Railway + Neon

# Deploy to Railway
railway login
railway init
railway up

# Configure environment
railway variables set DATABASE_URL="postgresql://..."
railway variables set REDIS_URL="redis://..."

Future High-End Architecture

Target Production Architecture

                            ┌─────────────────┐
                            │   CloudFlare    │
                            │   (CDN + WAF)   │
                            └────────┬────────┘
                                     │
                            ┌────────▼────────┐
                            │  Load Balancer  │
                            │  (HA Proxy/ALB) │
                            └────────┬────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐   ┌────────▼────────┐   ┌────────▼────────┐
     │   API Server 1  │   │   API Server 2  │   │   API Server N  │
     │   (Uvicorn)     │   │   (Uvicorn)     │   │   (Uvicorn)     │
     └────────┬────────┘   └────────┬────────┘   └────────┬────────┘
              │                      │                      │
              └──────────────────────┼──────────────────────┘
                                     │
         ┌───────────────────────────┼───────────────────────────┐
         │                           │                           │
┌────────▼────────┐        ┌────────▼────────┐        ┌────────▼────────┐
│   PostgreSQL    │        │     Redis       │        │   S3 / MinIO    │
│   (Primary)     │        │   (Cluster)     │        │   (Files)       │
│        │        │        │                 │        │                 │
│   ┌────▼────┐   │        │   ┌─────────┐   │        │                 │
│   │ Replica │   │        │   │ Sentinel│   │        │                 │
│   └─────────┘   │        │   └─────────┘   │        │                 │
└─────────────────┘        └─────────────────┘        └─────────────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐   ┌────────▼────────┐   ┌────────▼────────┐
     │ Celery Worker 1 │   │ Celery Worker 2 │   │ Celery Beat     │
     │ (General)       │   │ (Import Jobs)   │   │ (Scheduler)     │
     └─────────────────┘   └─────────────────┘   └─────────────────┘

                    ┌─────────────────────────────┐
                    │       Monitoring Stack      │
                    │  ┌─────────┐ ┌───────────┐  │
                    │  │Prometheus│ │  Grafana  │  │
                    │  └─────────┘ └───────────┘  │
                    │  ┌─────────┐ ┌───────────┐  │
                    │  │  Sentry │ │  Loki     │  │
                    │  └─────────┘ └───────────┘  │
                    └─────────────────────────────┘

Celery Task Queues

# app/celery.py (to be implemented)
from celery import Celery
from kombu import Queue

from app.core.config import settings  # assumed location of the settings object

celery_app = Celery(
    "wizamart",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend,
)

# task_queues expects kombu Queue objects, not plain dicts
celery_app.conf.task_queues = (
    Queue("default", routing_key="default"),
    Queue("imports", routing_key="imports"),
    Queue("emails", routing_key="emails"),
    Queue("reports", routing_key="reports"),
)

celery_app.conf.task_routes = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}

Background Tasks to Implement

| Task | Queue | Priority | Description |
|------|-------|----------|-------------|
| import_letzshop_products | imports | High | Marketplace product sync |
| import_letzshop_orders | imports | High | Order sync from Letzshop |
| send_order_confirmation | emails | High | Order emails |
| send_password_reset | emails | High | Auth emails |
| send_invoice_email | emails | Medium | Invoice delivery |
| generate_sales_report | reports | Low | Analytics reports |
| cleanup_expired_sessions | default | Low | Maintenance |
| sync_stripe_subscriptions | default | Medium | Billing sync |

Component Deep Dives

PostgreSQL Configuration

Production Settings (postgresql.conf):

# Memory (adjust based on server RAM)
shared_buffers = 256MB          # 25% of RAM for dedicated DB server
effective_cache_size = 768MB    # 75% of RAM
work_mem = 16MB
maintenance_work_mem = 128MB

# Connections
max_connections = 100

# Write-Ahead Log
wal_level = replica
max_wal_senders = 3

# Query Planning
random_page_cost = 1.1          # For SSD storage
effective_io_concurrency = 200  # For SSD storage

# Logging
log_min_duration_statement = 1000  # Log queries > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '

Backup Strategy:

#!/bin/bash
# Daily backup script (run from cron)
BACKUP_DIR=/backups/postgresql
DATE=$(date +%Y%m%d_%H%M%S)
pg_dump -U wizamart_user wizamart_db | gzip > "$BACKUP_DIR/wizamart_$DATE.sql.gz"

# Keep last 7 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete
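A backup that cannot be decompressed is no backup. A small sanity check can run after each dump; `verify_backup` is an illustrative helper, not part of the repo, and the heuristic (dump output starts with SQL comments/statements) is an assumption about pg_dump's plain format.

```python
import gzip
from pathlib import Path

def verify_backup(path: Path) -> bool:
    """Decompress the start of the dump and sanity-check it looks like SQL."""
    try:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            head = f.read(4096)
    except OSError:  # includes gzip.BadGzipFile for corrupt/truncated files
        return False
    # pg_dump plain-format output begins with SQL comments and SET statements
    return "--" in head or "CREATE" in head or "SET" in head
```

Pair this with an occasional full restore into a scratch database, which is the only test that proves the backup actually works.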

Redis Configuration

Use Cases:

| Use Case | Database | TTL | Description |
|----------|----------|-----|-------------|
| Session Cache | 0 | 24h | User sessions |
| API Rate Limiting | 0 | 1h | Request counters |
| Celery Broker | 1 | - | Task queue |
| Celery Results | 2 | 24h | Task results |
| Feature Flags | 3 | 5m | Feature gate cache |

Configuration (redis.conf):

maxmemory 256mb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec

Nginx Tuning

# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    # Buffers
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 50M;
    large_client_header_buffers 4 8k;  # 2 1k is too small once auth cookies/JWTs are in headers

    # Timeouts
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_timeout 15;
    send_timeout 10;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;
}

Troubleshooting Guide

Quick Diagnostics

# Check all services
systemctl status wizamart wizamart-celery postgresql redis nginx

# Check ports
ss -tlnp | grep -E '(8000|5432|6379|80|443)'

# Check disk space
df -h

# Check memory
free -h

# Check CPU/processes
htop

Database Issues

# Connect to database
sudo -u postgres psql wizamart_db

-- Check active connections
SELECT count(*) FROM pg_stat_activity;

-- Find slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Kill a stuck query (substitute the pid found above)
SELECT pg_terminate_backend(12345);

-- Check table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Analyze query performance
EXPLAIN ANALYZE SELECT ...;

Redis Issues

# Check connectivity
redis-cli ping

# Monitor real-time commands
redis-cli monitor

# Check memory usage
redis-cli info memory

# List all keys (careful in production!)
redis-cli --scan

# Check queue lengths (the Celery broker lives in Redis database 1)
redis-cli -n 1 llen celery

# Flush specific database
redis-cli -n 1 flushdb  # Flush Celery broker

Celery Issues

# Check worker status
celery -A app.celery inspect active
celery -A app.celery inspect reserved
celery -A app.celery inspect stats

# Purge all pending tasks
celery -A app.celery purge

# List registered tasks
celery -A app.celery inspect registered

Application Issues

# Check API health
curl -s http://localhost:8000/health | jq

# View recent logs
journalctl -u wizamart --since "10 minutes ago"

# Check for Python errors
journalctl -u wizamart | grep -i error | tail -20

# Test database connection
python -c "from app.core.database import engine; print(engine.connect())"

Common Problems & Solutions

| Problem | Diagnosis | Solution |
|---------|-----------|----------|
| 502 Bad Gateway | `systemctl status wizamart` | Restart app: `systemctl restart wizamart` |
| Database connection refused | `pg_isready` | Start PostgreSQL: `systemctl start postgresql` |
| High memory usage | `free -h`, `ps aux --sort=-%mem` | Restart app, check for memory leaks |
| Slow queries | PostgreSQL slow query log | Add indexes, optimize queries |
| Celery tasks stuck | `celery inspect active` | Restart workers, check Redis |
| Disk full | `df -h` | Clean logs, backups, temp files |

Decision Matrix

When to Use Each Option

| Scenario | Recommended | Reason |
|----------|-------------|--------|
| Solo developer, MVP | Managed (Railway) | Focus on product |
| Small team, budget conscious | Traditional VPS | Full control, low cost |
| Need direct DB access for debugging | Traditional VPS | Direct psql access |
| Familiar with Docker, want consistency | Docker Compose | Reproducible environments |
| High availability required | Docker + Orchestration | Easy scaling |
| Enterprise, compliance requirements | Kubernetes | Full orchestration |

Cost Comparison (Monthly)

| Setup | Low Traffic | Medium | High |
|-------|-------------|--------|------|
| Managed (Railway + Neon) | $10 | $50 | $200+ |
| VPS (Hetzner/DigitalOcean) | $5 | $20 | $80 |
| Docker on VPS | $5 | $20 | $80 |
| AWS/GCP Full Stack | $50 | $200 | $1000+ |

Migration Path

Phase 1: Current (Development)

  • PostgreSQL (Docker)
  • FastAPI + Uvicorn
  • Local file storage

Phase 2: Production MVP

  • PostgreSQL (managed or VPS)
  • FastAPI + Uvicorn (systemd or Docker)
  • Redis (session cache)
  • Celery (background jobs)
  • S3/MinIO (file storage)

Phase 3: Scale

  • Horizontal app scaling (multiple Uvicorn instances)
  • PostgreSQL read replicas
  • Redis cluster
  • CDN for static assets
  • Dedicated Celery workers per queue

Phase 4: High Availability

  • Multi-region deployment
  • Database failover
  • Container orchestration (Kubernetes)
  • Full monitoring stack

Next Steps

  1. Add Redis to docker-compose.yml - For session cache
  2. Implement Celery - Start with email and import tasks
  3. Configure S3/MinIO - For production file storage
  4. Set up Sentry - Error tracking
  5. Choose production deployment - VPS or Docker based on team preference

See also: