orion/docs/deployment/infrastructure.md
Samir Boulahtit 3614d448e4 chore: PostgreSQL migration compatibility and infrastructure improvements
Database & Migrations:
- Update all Alembic migrations for PostgreSQL compatibility
- Remove SQLite-specific syntax (AUTOINCREMENT, etc.)
- Add database utility helpers for PostgreSQL operations
- Fix services to use PostgreSQL-compatible queries

Documentation:
- Add comprehensive Docker deployment guide
- Add production deployment documentation
- Add infrastructure architecture documentation
- Update database setup guide for PostgreSQL-only
- Expand troubleshooting guide

Architecture & Validation:
- Add migration.yaml rules for SQL compatibility checking
- Enhance validate_architecture.py with migration validation
- Update architecture rules to validate Alembic migrations

Development:
- Fix duplicate install-all target in Makefile
- Add Celery/Redis validation to install.py script
- Add docker-compose.test.yml for CI testing
- Add squash_migrations.py utility script
- Update tests for PostgreSQL compatibility
- Improve test fixtures in conftest.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 17:52:28 +01:00


# Infrastructure Guide
This guide documents the complete infrastructure for the Wizamart platform, from development to high-end production.
**Philosophy:** We prioritize **debuggability and operational simplicity** over complexity. Every component should be directly accessible for troubleshooting.

---
## Table of Contents
- [Architecture Overview](#architecture-overview)
- [Current State](#current-state)
- [Development Environment](#development-environment)
- [Production Options](#production-options)
- [Future High-End Architecture](#future-high-end-architecture)
- [Component Deep Dives](#component-deep-dives)
- [Troubleshooting Guide](#troubleshooting-guide)
- [Decision Matrix](#decision-matrix)
---
## Architecture Overview
### System Components
```
┌───────────────────────────────────────────┐
│                  CLIENTS                  │
│  (Browsers, Mobile Apps, API Consumers)   │
└─────────────────────┬─────────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│           LOAD BALANCER / PROXY           │
│        (Nginx, Caddy, or Cloud LB)        │
│  - SSL termination                        │
│  - Static file serving                    │
│  - Rate limiting                          │
└─────────────────────┬─────────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│            APPLICATION SERVERS            │
│            (FastAPI + Uvicorn)            │
│  - API endpoints                          │
│  - HTML rendering (Jinja2)                │
│  - WebSocket connections                  │
└──────┬────────────────┬────────────────┬──┘
       │                │                │
       ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ PostgreSQL  │  │    Redis    │  │File Storage │
│(Primary DB) │  │(Cache/Queue)│  │ (S3/Local)  │
└─────────────┘  └─────────────┘  └─────────────┘
               ┌─────────────────┐
               │ Celery Workers  │
               │(Background Jobs)│
               └─────────────────┘
```
### Data Flow
1. **Request** → Nginx → Uvicorn → FastAPI → Service Layer → Database
2. **Background Job** → API creates task → Redis Queue → Celery Worker → Database
3. **Static Files** → Nginx serves directly (or CDN in production)
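The first flow (API → service layer → database) can be sketched in plain Python. All names here (`OrderService`, `OrderRepository`) are illustrative stand-ins, not actual Wizamart classes, and a dict stands in for PostgreSQL:

```python
# Illustrative layering only -- class and method names are hypothetical,
# not taken from the Wizamart codebase.

class OrderRepository:
    """Data-access layer: the only place that touches the database."""
    def __init__(self):
        self._rows = {1: {"id": 1, "status": "paid"}}  # stands in for PostgreSQL

    def get(self, order_id):
        return self._rows.get(order_id)

class OrderService:
    """Service layer: business rules live here, not in the endpoint."""
    def __init__(self, repo):
        self._repo = repo

    def order_status(self, order_id):
        order = self._repo.get(order_id)
        if order is None:
            raise KeyError(f"order {order_id} not found")
        return order["status"]

# A FastAPI endpoint would call the service and translate errors into
# HTTP responses; here we call it directly.
service = OrderService(OrderRepository())
print(service.order_status(1))  # -> paid
```

Keeping database access behind the repository is what lets the service layer be tested without a live PostgreSQL instance.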

---
## Current State
### What We Have Now
| Component | Technology | Status |
|-----------|------------|--------|
| Web Framework | FastAPI + Uvicorn | ✅ Production Ready |
| Database | PostgreSQL 15 | ✅ Production Ready |
| ORM | SQLAlchemy 2.0 | ✅ Production Ready |
| Migrations | Alembic | ✅ Production Ready |
| Templates | Jinja2 + Tailwind CSS | ✅ Production Ready |
| Authentication | JWT (PyJWT) | ✅ Production Ready |
| Email | SMTP/SendGrid/Mailgun/SES | ✅ Production Ready |
| Payments | Stripe | ✅ Production Ready |
| Background Jobs | - | ⏳ Planned (Celery) |
| Caching | - | ⏳ Planned (Redis) |
| File Storage | Local filesystem | ⏳ Needs S3 for prod |
### What We Need to Add
| Component | Priority | Reason |
|-----------|----------|--------|
| Redis | High | Session cache, Celery broker |
| Celery | High | Background jobs (imports, emails, reports) |
| S3/MinIO | Medium | Scalable file storage |
| Sentry | Medium | Error tracking |
| Prometheus/Grafana | Low | Metrics and dashboards |
---
## Development Environment
### Local Setup (Recommended)
```bash
# 1. Start PostgreSQL
make docker-up
# 2. Run migrations
make migrate-up
# 3. Initialize data
make init-prod
# 4. Start development server
make dev
# 5. (Optional) Run tests
make test
```
### Services Running Locally
| Service | Host | Port | Purpose |
|---------|------|------|---------|
| FastAPI | localhost | 8000 | Main application |
| PostgreSQL | localhost | 5432 | Development database |
| PostgreSQL (test) | localhost | 5433 | Test database |
| MkDocs | localhost | 8001 | Documentation |
### Docker Compose Services
```yaml
# docker-compose.yml
services:
  db:     # PostgreSQL for development
  redis:  # Redis for cache/queue (coming soon)
  api:    # FastAPI application (optional)
```
---
## Production Options
### Option 1: Traditional VPS (Recommended for Troubleshooting)
**Best for:** Teams who want direct server access, familiar with Linux administration.
```
┌─────────────────────────────────────────────────────┐
│                   VPS (4GB+ RAM)                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Nginx    │  │   Uvicorn   │  │ PostgreSQL  │  │
│  │  (reverse   │  │ (4 workers) │  │  (local)    │  │
│  │   proxy)    │  │             │  │             │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
│         │                │                │         │
│         └────────────────┼────────────────┘         │
│                          │                          │
│          ┌─────────────┐   ┌─────────────┐          │
│          │    Redis    │   │   Celery    │          │
│          │   (local)   │   │  (workers)  │          │
│          └─────────────┘   └─────────────┘          │
└─────────────────────────────────────────────────────┘
```
**Setup:**
```bash
# On Ubuntu 22.04+ VPS
# 1. Install system packages
sudo apt update
# Note: postgresql-15 may require the PGDG apt repository on Ubuntu 22.04
sudo apt install -y nginx postgresql-15 redis-server python3.11 python3.11-venv
# 2. Create application user
sudo useradd -m -s /bin/bash wizamart
sudo su - wizamart
# 3. Clone and setup
git clone <repo> /home/wizamart/app
cd /home/wizamart/app
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
nano .env # Edit with production values
# 5. Setup database
sudo -u postgres createuser wizamart_user
sudo -u postgres createdb wizamart_db -O wizamart_user
alembic upgrade head
python scripts/init_production.py
# 6. Create systemd service
sudo nano /etc/systemd/system/wizamart.service
```
**Systemd Service:**
```ini
# /etc/systemd/system/wizamart.service
[Unit]
Description=Wizamart API
After=network.target postgresql.service redis.service
[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000 --workers 4
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
```
**Celery Workers:**
```ini
# /etc/systemd/system/wizamart-celery.service
[Unit]
Description=Wizamart Celery Worker
After=network.target redis.service
[Service]
User=wizamart
Group=wizamart
WorkingDirectory=/home/wizamart/app
Environment="PATH=/home/wizamart/app/.venv/bin"
EnvironmentFile=/home/wizamart/app/.env
ExecStart=/home/wizamart/app/.venv/bin/celery -A app.celery worker --loglevel=info --concurrency=4
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
```
**Nginx Configuration:**
```nginx
# /etc/nginx/sites-available/wizamart
server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Static files (served directly by Nginx)
    location /static {
        alias /home/wizamart/app/static;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Uploaded files
    location /uploads {
        alias /home/wizamart/app/uploads;
        expires 7d;
    }

    # API and application
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for future real-time features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
**Troubleshooting Commands:**
```bash
# Check service status
sudo systemctl status wizamart
sudo systemctl status wizamart-celery
sudo systemctl status postgresql
sudo systemctl status redis
# View logs
sudo journalctl -u wizamart -f
sudo journalctl -u wizamart-celery -f
# Connect to database directly
sudo -u postgres psql wizamart_db
# Check Redis
redis-cli ping
redis-cli monitor # Watch commands in real-time
# Restart services
sudo systemctl restart wizamart
sudo systemctl restart wizamart-celery
```
---
### Option 2: Docker Compose Production
**Best for:** Consistent environments, easy rollbacks, container familiarity.
```yaml
# docker-compose.prod.yml
services:
  api:
    build: .
    restart: always
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  celery:
    build: .
    restart: always
    command: celery -A app.celery worker --loglevel=info --concurrency=4
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  celery-beat:
    build: .
    restart: always
    command: celery -A app.celery beat --loglevel=info
    environment:
      DATABASE_URL: postgresql://wizamart_user:${DB_PASSWORD}@db:5432/wizamart_db
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - redis

  db:
    image: postgres:15
    restart: always
    environment:
      POSTGRES_DB: wizamart_db
      POSTGRES_USER: wizamart_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U wizamart_user -d wizamart_db"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./static:/app/static:ro
      - ./uploads:/app/uploads:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:
```
**Troubleshooting Commands:**
```bash
# View all containers
docker compose -f docker-compose.prod.yml ps
# View logs
docker compose -f docker-compose.prod.yml logs -f api
docker compose -f docker-compose.prod.yml logs -f celery
# Access container shell
docker compose -f docker-compose.prod.yml exec api bash
docker compose -f docker-compose.prod.yml exec db psql -U wizamart_user -d wizamart_db
# Restart specific service
docker compose -f docker-compose.prod.yml restart api
# View resource usage
docker stats
```
---
### Option 3: Managed Services (Minimal Ops)
**Best for:** Small teams, focus on product not infrastructure.
| Component | Service | Cost (approx) |
|-----------|---------|---------------|
| App Hosting | Railway / Render / Fly.io | $5-25/mo |
| Database | Neon / Supabase / PlanetScale | $0-25/mo |
| Redis | Upstash / Redis Cloud | $0-10/mo |
| File Storage | Cloudflare R2 / AWS S3 | $0-5/mo |
| Email | Resend / SendGrid | $0-20/mo |
**Example: Railway + Neon**
```bash
# Deploy to Railway
railway login
railway init
railway up
# Configure environment
railway variables set DATABASE_URL="postgresql://..."
railway variables set REDIS_URL="redis://..."
```
---
## Future High-End Architecture
### Target Production Architecture
```
                       ┌─────────────────┐
                       │   CloudFlare    │
                       │   (CDN + WAF)   │
                       └────────┬────────┘
                                │
                       ┌────────▼────────┐
                       │  Load Balancer  │
                       │ (HA Proxy/ALB)  │
                       └────────┬────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│  API Server 1   │    │  API Server 2   │    │  API Server N   │
│    (Uvicorn)    │    │    (Uvicorn)    │    │    (Uvicorn)    │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│   PostgreSQL    │    │      Redis      │    │   S3 / MinIO    │
│    (Primary)    │    │    (Cluster)    │    │     (Files)     │
│                 │    │                 │    │                 │
│   ┌────▼────┐   │    │   ┌─────────┐   │    │                 │
│   │ Replica │   │    │   │ Sentinel│   │    │                 │
│   └─────────┘   │    │   └─────────┘   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
│ Celery Worker 1 │    │ Celery Worker 2 │    │   Celery Beat   │
│    (General)    │    │  (Import Jobs)  │    │   (Scheduler)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                ┌─────────────────────────────┐
                │      Monitoring Stack       │
                │  ┌──────────┐ ┌───────────┐ │
                │  │Prometheus│ │  Grafana  │ │
                │  └──────────┘ └───────────┘ │
                │  ┌──────────┐ ┌───────────┐ │
                │  │  Sentry  │ │   Loki    │ │
                │  └──────────┘ └───────────┘ │
                └─────────────────────────────┘
```
### Celery Task Queues
```python
# app/celery.py (to be implemented)
from celery import Celery

# NOTE: `settings` is assumed to come from the app's config module
# (e.g. app.core.config); adjust the import to the real path.
from app.core.config import settings

celery_app = Celery(
    "wizamart",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend,
)

celery_app.conf.task_queues = {
    "default": {"exchange": "default", "routing_key": "default"},
    "imports": {"exchange": "imports", "routing_key": "imports"},
    "emails": {"exchange": "emails", "routing_key": "emails"},
    "reports": {"exchange": "reports", "routing_key": "reports"},
}

celery_app.conf.task_routes = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}
```
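Celery resolves `task_routes` internally; the toy lookup below just mirrors that behavior to show why a task missing from the table (such as `cleanup_expired_sessions`) lands on the `default` queue:

```python
# Toy resolver illustrating how a task_routes table maps a task name to a
# queue; unrouted tasks fall back to "default", matching Celery's behavior
# with task_default_queue left at its default.
task_routes = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}

def queue_for(task_name: str) -> str:
    return task_routes.get(task_name, {}).get("queue", "default")

print(queue_for("app.tasks.send_email"))                # -> emails
print(queue_for("app.tasks.cleanup_expired_sessions"))  # -> default
```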
### Background Tasks to Implement
| Task | Queue | Priority | Description |
|------|-------|----------|-------------|
| `import_letzshop_products` | imports | High | Marketplace product sync |
| `import_letzshop_orders` | imports | High | Order sync from Letzshop |
| `send_order_confirmation` | emails | High | Order emails |
| `send_password_reset` | emails | High | Auth emails |
| `send_invoice_email` | emails | Medium | Invoice delivery |
| `generate_sales_report` | reports | Low | Analytics reports |
| `cleanup_expired_sessions` | default | Low | Maintenance |
| `sync_stripe_subscriptions` | default | Medium | Billing sync |
---
## Component Deep Dives
### PostgreSQL Configuration
**Production Settings (`postgresql.conf`):**
```ini
# Memory (adjust based on server RAM)
shared_buffers = 256MB # 25% of RAM for dedicated DB server
effective_cache_size = 768MB # 75% of RAM
work_mem = 16MB
maintenance_work_mem = 128MB
# Connections
max_connections = 100
# Write-Ahead Log
wal_level = replica
max_wal_senders = 3
# Query Planning
random_page_cost = 1.1 # For SSD storage
effective_io_concurrency = 200 # For SSD storage
# Logging
log_min_duration_statement = 1000 # Log queries > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
```
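The 25%/75% comments above can be turned into a tiny sizing helper for a new server. This encodes only the rule of thumb from the config, not an official PostgreSQL formula:

```python
# Rule-of-thumb calculator for the two memory settings above, assuming a
# dedicated database server. Percentages follow the comments in the config.
def pg_memory_settings(ram_mb: int) -> dict:
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # ~25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # ~75% of RAM
    }

print(pg_memory_settings(1024))
# -> {'shared_buffers': '256MB', 'effective_cache_size': '768MB'}
```

For 1 GB of RAM this reproduces the 256MB/768MB values in the sample config.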
**Backup Strategy:**
```bash
#!/bin/bash
# Daily backup script
BACKUP_DIR=/backups/postgresql
DATE=$(date +%Y%m%d_%H%M%S)
pg_dump -U wizamart_user wizamart_db | gzip > "$BACKUP_DIR/wizamart_$DATE.sql.gz"
# Keep last 7 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete
```
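If the retention step ever moves out of shell, a Python equivalent of `find ... -mtime +7 -delete` might look like this (hypothetical helper, not part of the codebase):

```python
# Delete backup archives older than keep_days, mirroring the find(1) call
# in the shell script above. Returns the names it removed.
import time
from pathlib import Path

def prune_backups(backup_dir: str, keep_days: int = 7) -> list[str]:
    cutoff = time.time() - keep_days * 86400  # seconds in keep_days
    removed = []
    for path in Path(backup_dir).glob("*.sql.gz"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```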
### Redis Configuration
**Use Cases:**
| Use Case | Database | TTL | Description |
|----------|----------|-----|-------------|
| Session Cache | 0 | 24h | User sessions |
| API Rate Limiting | 0 | 1h | Request counters |
| Celery Broker | 1 | - | Task queue |
| Celery Results | 2 | 24h | Task results |
| Feature Flags | 3 | 5m | Feature gate cache |
**Configuration (`redis.conf`):**
```ini
maxmemory 256mb
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
```
### Nginx Tuning
```nginx
# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    # Buffers
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 50M;
    large_client_header_buffers 2 1k;

    # Timeouts
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_timeout 15;
    send_timeout 10;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;
}
```
---
## Troubleshooting Guide
### Quick Diagnostics
```bash
# Check all services
systemctl status wizamart wizamart-celery postgresql redis nginx
# Check ports
ss -tlnp | grep -E '(8000|5432|6379|80|443)'
# Check disk space
df -h
# Check memory
free -h
# Check CPU/processes
htop
```
### Database Issues
```bash
# Connect to database
sudo -u postgres psql wizamart_db
# Check active connections
SELECT count(*) FROM pg_stat_activity;
# Find slow queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
# Kill a stuck query (use a pid from the query above)
SELECT pg_terminate_backend(<pid>);
# Check table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
# Analyze query performance
EXPLAIN ANALYZE SELECT ...;
```
### Redis Issues
```bash
# Check connectivity
redis-cli ping
# Monitor real-time commands
redis-cli monitor
# Check memory usage
redis-cli info memory
# List all keys (careful in production!)
redis-cli --scan
# Check queue lengths (the Celery broker lives on DB 1)
redis-cli -n 1 llen celery
# Flush specific database
redis-cli -n 1 flushdb # Flush Celery broker
```
### Celery Issues
```bash
# Check worker status
celery -A app.celery inspect active
celery -A app.celery inspect reserved
celery -A app.celery inspect stats
# Purge all pending tasks
celery -A app.celery purge
# List registered tasks
celery -A app.celery inspect registered
```
### Application Issues
```bash
# Check API health
curl -s http://localhost:8000/health | jq
# View recent logs
journalctl -u wizamart --since "10 minutes ago"
# Check for Python errors
journalctl -u wizamart | grep -i error | tail -20
# Test database connection
python -c "from app.core.database import engine; print(engine.connect())"
```
### Common Problems & Solutions
| Problem | Diagnosis | Solution |
|---------|-----------|----------|
| 502 Bad Gateway | `systemctl status wizamart` | Restart app: `systemctl restart wizamart` |
| Database connection refused | `pg_isready` | Start PostgreSQL: `systemctl start postgresql` |
| High memory usage | `free -h`, `ps aux --sort=-%mem` | Restart app, check for memory leaks |
| Slow queries | PostgreSQL slow query log | Add indexes, optimize queries |
| Celery tasks stuck | `celery inspect active` | Restart workers, check Redis |
| Disk full | `df -h` | Clean logs, backups, temp files |
---
## Decision Matrix
### When to Use Each Option
| Scenario | Recommended | Reason |
|----------|-------------|--------|
| Solo developer, MVP | Managed (Railway) | Focus on product |
| Small team, budget conscious | Traditional VPS | Full control, low cost |
| Need direct DB access for debugging | Traditional VPS | Direct psql access |
| Familiar with Docker, want consistency | Docker Compose | Reproducible environments |
| High availability required | Docker + Orchestration | Easy scaling |
| Enterprise, compliance requirements | Kubernetes | Full orchestration |
### Cost Comparison (Monthly)
| Setup | Low Traffic | Medium | High |
|-------|-------------|--------|------|
| Managed (Railway + Neon) | $10 | $50 | $200+ |
| VPS (Hetzner/DigitalOcean) | $5 | $20 | $80 |
| Docker on VPS | $5 | $20 | $80 |
| AWS/GCP Full Stack | $50 | $200 | $1000+ |
---
## Migration Path
### Phase 1: Current (Development)
- ✅ PostgreSQL (Docker)
- ✅ FastAPI + Uvicorn
- ✅ Local file storage
### Phase 2: Production MVP
- ✅ PostgreSQL (managed or VPS)
- ✅ FastAPI + Uvicorn (systemd or Docker)
- ⏳ Redis (session cache)
- ⏳ Celery (background jobs)
- ⏳ S3/MinIO (file storage)
### Phase 3: Scale
- Horizontal app scaling (multiple Uvicorn instances)
- PostgreSQL read replicas
- Redis cluster
- CDN for static assets
- Dedicated Celery workers per queue
### Phase 4: High Availability
- Multi-region deployment
- Database failover
- Container orchestration (Kubernetes)
- Full monitoring stack
---
## Next Steps
1. **Add Redis to docker-compose.yml** - For session cache
2. **Implement Celery** - Start with email and import tasks
3. **Configure S3/MinIO** - For production file storage
4. **Set up Sentry** - Error tracking
5. **Choose production deployment** - VPS or Docker based on team preference

See also:
- [Production Deployment Guide](production.md)
- [Docker Deployment](docker.md)
- [Environment Configuration](environment.md)