# Infrastructure Guide

This guide documents the complete infrastructure for the Orion platform, from development to high-end production.

**Philosophy:** We prioritize **debuggability and operational simplicity** over complexity. Every component should be directly accessible for troubleshooting.

---

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Current State](#current-state)
- [Development Environment](#development-environment)
- [Production Options](#production-options)
- [Future High-End Architecture](#future-high-end-architecture)
- [Component Deep Dives](#component-deep-dives)
- [Troubleshooting Guide](#troubleshooting-guide)
- [Decision Matrix](#decision-matrix)
- [Migration Path](#migration-path)
- [Enterprise Upgrade Checklist](#enterprise-upgrade-checklist)
- [Next Steps](#next-steps)

---
## Architecture Overview

### System Components

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                 CLIENTS                                 │
│                 (Browsers, Mobile Apps, API Consumers)                  │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          LOAD BALANCER / PROXY                          │
│                       (Nginx, Caddy, or Cloud LB)                       │
│                          - SSL termination                              │
│                          - Static file serving                          │
│                          - Rate limiting                                │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           APPLICATION SERVERS                           │
│                           (FastAPI + Uvicorn)                           │
│                          - API endpoints                                │
│                          - HTML rendering (Jinja2)                      │
│                          - WebSocket connections                        │
└─────────────────────────────────────────────────────────────────────────┘
          │                          │                          │
          ▼                          ▼                          ▼
┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
│    PostgreSQL    │       │      Redis       │       │   File Storage   │
│   (Primary DB)   │       │  (Cache/Queue)   │       │    (S3/Local)    │
└──────────────────┘       └──────────────────┘       └──────────────────┘
                                     │
                                     ▼
                           ┌──────────────────┐
                           │  Celery Workers  │
                           │ (Background Jobs)│
                           └──────────────────┘
```

### Data Flow

1. **Request** → Nginx → Uvicorn → FastAPI → Service Layer → Database
2. **Background Job** → API creates task → Redis Queue → Celery Worker → Database
3. **Static Files** → Nginx serves directly (or CDN in production)

---
## Current State

### What We Have Now

| Component | Technology | Dev Required | Prod Required | Status |
|-----------|------------|--------------|---------------|--------|
| Web Framework | FastAPI + Uvicorn | ✅ | ✅ | ✅ Production Ready |
| Database | PostgreSQL 15 | ✅ | ✅ | ✅ Production Ready |
| ORM | SQLAlchemy 2.0 | ✅ | ✅ | ✅ Production Ready |
| Migrations | Alembic | ✅ | ✅ | ✅ Production Ready |
| Templates | Jinja2 + Tailwind CSS | ✅ | ✅ | ✅ Production Ready |
| Authentication | JWT (PyJWT) | ✅ | ✅ | ✅ Production Ready |
| Email | SMTP/SendGrid/Mailgun/SES | ❌ | ✅ | ✅ Production Ready |
| Payments | Stripe | ❌ | ✅ | ✅ Production Ready |
| Task Queue | Celery 5.3 + Redis | ❌ | ✅ | ✅ Production Ready |
| Task Scheduler | Celery Beat | ❌ | ✅ | ✅ Production Ready |
| Task Monitoring | Flower | ❌ | ⚪ Optional | ✅ Production Ready |
| Caching | Redis 7 | ❌ | ✅ | ✅ Production Ready |
| File Storage | Local / Cloudflare R2 | Local | R2 | ✅ Production Ready |
| Error Tracking | Sentry | ❌ | ⚪ Recommended | ✅ Production Ready |
| CDN / WAF | Cloudflare | ❌ | ⚪ Recommended | ✅ Production Ready |

**Legend:** ✅ Required | ⚪ Optional/Recommended | ❌ Not needed

### Development vs Production

**Development** requires only:

- PostgreSQL (via Docker: `make docker-up`)
- Python 3.11+ with dependencies

**Production** adds:

- Redis (for Celery task queue)
- Celery workers (for background tasks)
- Reverse proxy (Nginx)
- SSL certificates

**Optional but recommended for Production:**

- Sentry (error tracking) - Set `SENTRY_DSN` to enable
- Cloudflare R2 (cloud storage) - Set `STORAGE_BACKEND=r2` to enable
- Cloudflare CDN (caching/DDoS) - Set `CLOUDFLARE_ENABLED=true` to enable
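
The three optional switches above are plain environment variables. As a rough illustration of how such flags might be interpreted at startup, here is a minimal sketch. The function name `optional_features`, the `local` fallback for `STORAGE_BACKEND`, and the rule that only the string `true` enables Cloudflare are assumptions made for this example; the application's real settings loader may differ.

```python
import os
from typing import Mapping


def optional_features(env: Mapping[str, str] = os.environ) -> dict:
    """Report which optional production integrations are switched on."""
    return {
        # Sentry counts as enabled as soon as a non-empty DSN is provided.
        "sentry": bool(env.get("SENTRY_DSN", "").strip()),
        # Assumption: anything other than "r2" means local file storage.
        "r2_storage": env.get("STORAGE_BACKEND", "local").strip().lower() == "r2",
        # Assumption: only "true" (case-insensitive) turns the flag on.
        "cloudflare": env.get("CLOUDFLARE_ENABLED", "false").strip().lower() == "true",
    }


if __name__ == "__main__":
    for name, enabled in optional_features().items():
        print(f"{name}: {'enabled' if enabled else 'disabled'}")
```

Passing a plain dictionary instead of `os.environ` makes the same function easy to exercise in tests.
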
### What We Need for Enterprise (Future Growth)

| Component | Priority | When Needed | Estimated Users |
|-----------|----------|-------------|-----------------|
| Load Balancer | Medium | Horizontal scaling | 1,000+ concurrent |
| Database Replica | Medium | Read-heavy workloads | 1,000+ concurrent |
| Redis Sentinel | Low | Cache redundancy | 5,000+ concurrent |
| Prometheus/Grafana | Low | Advanced metrics | Any (nice to have) |
| Kubernetes | Low | Multi-region/HA | 10,000+ concurrent |

---

## Development Environment

### Local Setup (Recommended)

```bash
# 1. Start PostgreSQL and Redis
make docker-up

# 2. Run migrations
make migrate-up

# 3. Initialize data
make init-prod

# 4. Start development server
make dev

# 5. (Optional) Start Celery worker for background tasks
make celery-dev  # Worker + Beat together

# 6. (Optional) Run tests
make test
```

### Services Running Locally

| Service | Host | Port | Purpose |
|---------|------|------|---------|
| FastAPI | localhost | 8000 | Main application |
| PostgreSQL | localhost | 5432 | Development database |
| PostgreSQL (test) | localhost | 5433 | Test database |
| Redis | localhost | 6380 | Cache and task broker |
| Celery Worker | - | - | Background task processing |
| Celery Beat | - | - | Scheduled task scheduler |
| Flower | localhost | 5555 | Task monitoring dashboard |
| MkDocs | localhost | 9991 | Documentation |

### Docker Compose Services

```yaml
# docker-compose.yml
services:
  db:             # PostgreSQL 15 for development
  redis:          # Redis 7 for cache/queue
  api:            # FastAPI application (profile: full)
  celery-worker:  # Background task processor (profile: full)
  celery-beat:    # Scheduled task scheduler (profile: full)
  flower:         # Task monitoring UI (profile: full)
```
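
To confirm that the local services are actually reachable, a small TCP check can be handy. The sketch below uses only the Python standard library and the ports from the "Services Running Locally" table; the names `LOCAL_SERVICES` and `is_listening` are made up for this example and are not part of the project.

```python
import socket

# Ports taken from the "Services Running Locally" table.
LOCAL_SERVICES = {
    "FastAPI": 8000,
    "PostgreSQL": 5432,
    "PostgreSQL (test)": 5433,
    "Redis": 6380,
    "Flower": 5555,
}


def is_listening(port: int, host: str = "localhost", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    for name, port in LOCAL_SERVICES.items():
        state = "up" if is_listening(port) else "down"
        print(f"{name:<18} localhost:{port:<5} {state}")
```

A port that accepts connections only shows that something is listening there; it does not prove the service behind it is healthy.
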
### Celery Commands

```bash
# Start worker only
make celery-worker

# Start scheduler only
make celery-beat

# Start worker + scheduler together (development)
make celery-dev

# Start Flower monitoring
make flower

# Check worker status
make celery-status

# Purge pending tasks
make celery-purge
```

---
## Production Options

### Option 1: Traditional VPS (Recommended for Troubleshooting)

**Best for:** Teams that want direct server access and are familiar with Linux administration.

```
┌─────────────────────────────────────────────────────────────┐
│                       VPS (4GB+ RAM)                        │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐   │
│   │    Nginx    │     │   Uvicorn   │     │ PostgreSQL  │   │
│   │  (reverse   │     │ (4 workers) │     │   (local)   │   │
│   │   proxy)    │     │             │     │             │   │
│   └─────────────┘     └─────────────┘     └─────────────┘   │
│          │                   │                   │          │
│          └───────────────────┼───────────────────┘          │
│                              │                              │
│             ┌─────────────┐     ┌─────────────┐             │
│             │    Redis    │     │   Celery    │             │
│             │   (local)   │     │  (workers)  │             │
│             └─────────────┘     └─────────────┘             │
└─────────────────────────────────────────────────────────────┘
```

**Setup:**

```bash
# On Ubuntu 22.04+ VPS

# 1. Install system packages
# Note: stock Ubuntu 22.04 ships PostgreSQL 14; postgresql-15 needs the
# PostgreSQL (PGDG) apt repository to be configured first.
sudo apt update
sudo apt install -y nginx postgresql-15 redis-server python3.11 python3.11-venv

# 2. Create application user
sudo useradd -m -s /bin/bash orion
sudo su - orion

# 3. Clone and setup
git clone <repo> /home/orion/app
cd /home/orion/app
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
nano .env  # Edit with production values

# 5. Setup database
sudo -u postgres createuser orion_user
sudo -u postgres createdb orion_db -O orion_user
alembic upgrade head
python scripts/seed/init_production.py

# 6. Create systemd service
sudo nano /etc/systemd/system/orion.service
```

**Systemd Service:**

```ini
# /etc/systemd/system/orion.service
[Unit]
Description=Orion API
After=network.target postgresql.service redis.service

[Service]
User=orion
Group=orion
WorkingDirectory=/home/orion/app
Environment="PATH=/home/orion/app/.venv/bin"
EnvironmentFile=/home/orion/app/.env
ExecStart=/home/orion/app/.venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000 --workers 4
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

**Celery Workers:**

```ini
# /etc/systemd/system/orion-celery.service
[Unit]
Description=Orion Celery Worker
After=network.target redis.service

[Service]
User=orion
Group=orion
WorkingDirectory=/home/orion/app
Environment="PATH=/home/orion/app/.venv/bin"
EnvironmentFile=/home/orion/app/.env
ExecStart=/home/orion/app/.venv/bin/celery -A app.celery worker --loglevel=info --concurrency=4
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```
**Nginx Configuration:**

```nginx
# /etc/nginx/sites-available/orion
server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Static files (served directly by Nginx)
    location /static {
        alias /home/orion/app/static;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # Uploaded files
    location /uploads {
        alias /home/orion/app/uploads;
        expires 7d;
    }

    # API and application
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for future real-time features)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

**Troubleshooting Commands:**

```bash
# Check service status
sudo systemctl status orion
sudo systemctl status orion-celery
sudo systemctl status postgresql
sudo systemctl status redis

# View logs
sudo journalctl -u orion -f
sudo journalctl -u orion-celery -f

# Connect to database directly
sudo -u postgres psql orion_db

# Check Redis
redis-cli ping
redis-cli monitor  # Watch commands in real-time

# Restart services
sudo systemctl restart orion
sudo systemctl restart orion-celery
```

---
### Option 2: Docker Compose Production

**Best for:** Consistent environments, easy rollbacks, container familiarity.

```yaml
# docker-compose.prod.yml
services:
  api:
    build: .
    restart: always
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      DATABASE_URL: postgresql://orion_user:${DB_PASSWORD}@db:5432/orion_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./uploads:/app/uploads
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  celery:
    build: .
    restart: always
    command: celery -A app.celery worker --loglevel=info --concurrency=4
    environment:
      DATABASE_URL: postgresql://orion_user:${DB_PASSWORD}@db:5432/orion_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - db
      - redis
    volumes:
      - ./logs:/app/logs

  celery-beat:
    build: .
    restart: always
    command: celery -A app.celery beat --loglevel=info
    environment:
      DATABASE_URL: postgresql://orion_user:${DB_PASSWORD}@db:5432/orion_db
      CELERY_BROKER_URL: redis://redis:6379/1
    depends_on:
      - redis

  db:
    image: postgres:15
    restart: always
    environment:
      POSTGRES_DB: orion_db
      POSTGRES_USER: orion_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U orion_user -d orion_db"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    restart: always
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./static:/app/static:ro
      - ./uploads:/app/uploads:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:
```
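
The `api` healthcheck above probes `/health` every 30 seconds and tolerates up to 3 failures. The same retry loop can be sketched in Python, for example as a post-deploy smoke test run from the host. This is only an approximation of what Docker does, and the helper names `http_ok` and `wait_until_healthy` are assumptions for this example. To try it, call `wait_until_healthy(lambda: http_ok("http://localhost:8000/health"))`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP 4xx/5xx, refused connections and timeouts all count as failures.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    retries: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `retries` times, pausing `interval` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(interval)  # no pause after the final failed attempt
    return False
```
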
**Troubleshooting Commands:**

```bash
# View all containers
docker compose -f docker-compose.prod.yml ps

# View logs
docker compose -f docker-compose.prod.yml logs -f api
docker compose -f docker-compose.prod.yml logs -f celery

# Access container shell
docker compose -f docker-compose.prod.yml exec api bash
docker compose -f docker-compose.prod.yml exec db psql -U orion_user -d orion_db

# Restart specific service
docker compose -f docker-compose.prod.yml restart api

# View resource usage
docker stats
```

---

### Option 3: Managed Services (Minimal Ops)

**Best for:** Small teams that want to focus on product rather than infrastructure.

| Component | Service | Cost (approx) |
|-----------|---------|---------------|
| App Hosting | Railway / Render / Fly.io | $5-25/mo |
| Database | Neon / Supabase / PlanetScale | $0-25/mo |
| Redis | Upstash / Redis Cloud | $0-10/mo |
| File Storage | Cloudflare R2 / AWS S3 | $0-5/mo |
| Email | Resend / SendGrid | $0-20/mo |

**Example: Railway + Neon**

```bash
# Deploy to Railway
railway login
railway init
railway up

# Configure environment
railway variables set DATABASE_URL="postgresql://..."
railway variables set REDIS_URL="redis://..."
```

---
## Future High-End Architecture

### Target Production Architecture

```
                            ┌─────────────────┐
                            │   Cloudflare    │
                            │   (CDN + WAF)   │
                            └────────┬────────┘
                                     │
                            ┌────────▼────────┐
                            │  Load Balancer  │
                            │  (HAProxy/ALB)  │
                            └────────┬────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
     │  API Server 1   │    │  API Server 2   │    │  API Server N   │
     │    (Uvicorn)    │    │    (Uvicorn)    │    │    (Uvicorn)    │
     └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
              │                      │                      │
              └──────────────────────┼──────────────────────┘
                                     │
         ┌───────────────────────────┼───────────────────────────┐
         │                           │                           │
┌────────▼────────┐         ┌────────▼────────┐         ┌────────▼────────┐
│   PostgreSQL    │         │      Redis      │         │   S3 / MinIO    │
│    (Primary)    │         │    (Cluster)    │         │     (Files)     │
│        │        │         │                 │         │                 │
│   ┌────▼────┐   │         │   ┌─────────┐   │         │                 │
│   │ Replica │   │         │   │ Sentinel│   │         │                 │
│   └─────────┘   │         │   └─────────┘   │         │                 │
└─────────────────┘         └─────────────────┘         └─────────────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
     ┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
     │ Celery Worker 1 │    │ Celery Worker 2 │    │   Celery Beat   │
     │    (General)    │    │  (Import Jobs)  │    │   (Scheduler)   │
     └─────────────────┘    └─────────────────┘    └─────────────────┘

                      ┌─────────────────────────────┐
                      │      Monitoring Stack       │
                      │ ┌──────────┐  ┌───────────┐ │
                      │ │Prometheus│  │  Grafana  │ │
                      │ └──────────┘  └───────────┘ │
                      │ ┌──────────┐  ┌───────────┐ │
                      │ │  Sentry  │  │   Loki    │ │
                      │ └──────────┘  └───────────┘ │
                      └─────────────────────────────┘
```

### Celery Task Queues

```python
# app/celery.py (to be implemented)
from celery import Celery

# `settings` is the project's configuration object; its import is not shown
# in this guide and must be added for the module to run.

celery_app = Celery(
    "orion",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend,
)

# One queue per workload so that each can get dedicated workers.
celery_app.conf.task_queues = {
    "default": {"exchange": "default", "routing_key": "default"},
    "imports": {"exchange": "imports", "routing_key": "imports"},
    "emails": {"exchange": "emails", "routing_key": "emails"},
    "reports": {"exchange": "reports", "routing_key": "reports"},
}

# Map task names to the queue they should be sent to.
celery_app.conf.task_routes = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}
```
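
To make the routing table above concrete, the following standalone sketch shows how a task name resolves to a queue. It is a simplified illustration in plain Python, not Celery's actual router, which also supports pattern matching and router functions. Note the assumption behind `DEFAULT_QUEUE`: Celery's built-in default queue is named `celery` unless `task_default_queue` is set, so the sketch assumes that setting has been pointed at `default`.

```python
# Mirrors the task_routes mapping shown above.
TASK_ROUTES = {
    "app.tasks.import_letzshop_products": {"queue": "imports"},
    "app.tasks.send_email": {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}

# Assumption: task_default_queue is configured as "default".
DEFAULT_QUEUE = "default"


def queue_for(task_name: str) -> str:
    """Return the queue a task would be sent to under exact-name routing."""
    route = TASK_ROUTES.get(task_name, {})
    return route.get("queue", DEFAULT_QUEUE)


if __name__ == "__main__":
    for name in ("app.tasks.send_email", "app.tasks.cleanup_expired_sessions"):
        print(f"{name} -> {queue_for(name)}")
```

With queues split this way, a dedicated worker can be limited to one workload through the worker's `-Q` option, for example `celery -A app.celery worker -Q imports`.
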
### Background Tasks to Implement

| Task | Queue | Priority | Description |
|------|-------|----------|-------------|
| `import_letzshop_products` | imports | High | Marketplace product sync |
| `import_letzshop_orders` | imports | High | Order sync from Letzshop |
| `send_order_confirmation` | emails | High | Order emails |
| `send_password_reset` | emails | High | Auth emails |
| `send_invoice_email` | emails | Medium | Invoice delivery |
| `generate_sales_report` | reports | Low | Analytics reports |
| `cleanup_expired_sessions` | default | Low | Maintenance |
| `sync_stripe_subscriptions` | default | Medium | Billing sync |

---

## Component Deep Dives

### PostgreSQL Configuration

**Production Settings (`postgresql.conf`):**

```ini
# Memory (adjust based on server RAM; the values below correspond to about 1 GB)
shared_buffers = 256MB                # ~25% of RAM for dedicated DB server
effective_cache_size = 768MB          # ~75% of RAM
work_mem = 16MB
maintenance_work_mem = 128MB

# Connections
max_connections = 100

# Write-Ahead Log
wal_level = replica
max_wal_senders = 3

# Query Planning
random_page_cost = 1.1                # For SSD storage
effective_io_concurrency = 200        # For SSD storage

# Logging
log_min_duration_statement = 1000     # Log queries > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
```
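
The memory comments above follow a simple rule of thumb: about 25% of RAM for `shared_buffers` and about 75% for `effective_cache_size`. As a worked example, the sketch below applies that rule to a given amount of RAM. The function name `postgres_memory_settings` is hypothetical, and the result is a starting point for tuning rather than a recommendation for any particular workload.

```python
def postgres_memory_settings(ram_mb: int) -> dict:
    """Apply the 25% / 75% rules of thumb to a server with ram_mb of RAM."""
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # ~25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # ~75% of RAM
    }


if __name__ == "__main__":
    # 1024 MB reproduces the values in the config above; 4096 MB matches a 4 GB VPS.
    for ram_mb in (1024, 4096):
        print(ram_mb, postgres_memory_settings(ram_mb))
```
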
**Backup Strategy:**

```bash
#!/bin/bash
# Daily backup script
set -euo pipefail

BACKUP_DIR=/backups/postgresql
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
pg_dump -U orion_user orion_db | gzip > "$BACKUP_DIR/orion_$DATE.sql.gz"

# Keep last 7 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete
```
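
The retention step can also be expressed in Python, which makes it easier to dry-run before deleting anything. The sketch below decides which backups are expired from the timestamp embedded in the `orion_YYYYmmdd_HHMMSS.sql.gz` file name. That differs from `find -mtime`, which looks at file modification time, so the two will not always agree. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional

PREFIX, SUFFIX = "orion_", ".sql.gz"


def backup_timestamp(filename: str) -> Optional[datetime]:
    """Parse the timestamp from a name like orion_20250131_020000.sql.gz."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        return None
    stamp = filename[len(PREFIX):-len(SUFFIX)]
    try:
        return datetime.strptime(stamp, "%Y%m%d_%H%M%S")
    except ValueError:
        return None  # name matches the pattern but carries no valid timestamp


def expired_backups(filenames: List[str], now: datetime, keep_days: int = 7) -> List[str]:
    """Return the backups whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        ts = backup_timestamp(name)
        if ts is not None and ts < cutoff:
            expired.append(name)
    return expired
```

Files that do not match the naming pattern are never reported as expired, so unrelated files in the backup directory are left alone.
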
### Redis Configuration

**Use Cases:**

| Use Case | Database | TTL | Description |
|----------|----------|-----|-------------|
| Session Cache | 0 | 24h | User sessions |
| API Rate Limiting | 0 | 1h | Request counters |
| Celery Broker | 1 | - | Task queue |
| Celery Results | 2 | 24h | Task results |
| Feature Flags | 3 | 5m | Feature gate cache |

**Configuration (`redis.conf`):**

```ini
maxmemory 256mb
# Note: allkeys-lru may evict any key under memory pressure, including
# Celery broker data when the broker shares this instance.
maxmemory-policy allkeys-lru
appendonly yes
appendfsync everysec
```
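
Each use case in the table maps to a numbered logical database, which is selected by the last path segment of a `redis://` URL. The sketch below builds such URLs from the table; the mapping keys and the `redis_url` helper are illustrative names, not part of the application.

```python
# Logical database numbers from the "Use Cases" table.
REDIS_DATABASES = {
    "cache": 0,           # session cache and API rate limiting
    "celery_broker": 1,
    "celery_results": 2,
    "feature_flags": 3,
}


def redis_url(use_case: str, host: str = "localhost", port: int = 6379) -> str:
    """Build a redis:// URL that selects the database for the given use case."""
    try:
        db = REDIS_DATABASES[use_case]
    except KeyError:
        raise ValueError(f"unknown Redis use case: {use_case!r}") from None
    return f"redis://{host}:{port}/{db}"


if __name__ == "__main__":
    for use_case in REDIS_DATABASES:
        print(f"{use_case:<15} {redis_url(use_case, host='redis')}")
```

With `host="redis"`, the `cache` and `celery_broker` entries produce the same values as `REDIS_URL` and `CELERY_BROKER_URL` in the Docker Compose file.
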
### Nginx Tuning

```nginx
# /etc/nginx/nginx.conf
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    # Buffers
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 50M;
    large_client_header_buffers 2 1k;

    # Timeouts
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_timeout 15;
    send_timeout 10;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;
}
```
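
As a rough capacity check for the settings above: the total number of connections Nginx can hold is `worker_processes × worker_connections`, and when Nginx acts as a reverse proxy each client request also occupies a connection to the upstream, so the usable client count is roughly half of that. The sketch below does the arithmetic; `nginx_capacity` is a hypothetical helper, and real limits also depend on file descriptor limits and on the application behind the proxy.

```python
def nginx_capacity(worker_connections: int, worker_processes: int, proxied: bool = True) -> int:
    """Estimate how many simultaneous clients the connection limits allow."""
    total = worker_connections * worker_processes
    # A proxied request uses two connections: client-to-Nginx and Nginx-to-upstream.
    return total // 2 if proxied else total


if __name__ == "__main__":
    # Assumption: 4 CPU cores, so `worker_processes auto` starts 4 workers.
    print(nginx_capacity(worker_connections=4096, worker_processes=4))
```
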
---

## Troubleshooting Guide

### Quick Diagnostics

```bash
# Check all services
systemctl status orion orion-celery postgresql redis nginx

# Check ports (anchored on ":" so that 80 does not also match 8000)
ss -tlnp | grep -E ':(8000|5432|6379|80|443)\b'

# Check disk space
df -h

# Check memory
free -h

# Check CPU/processes
htop
```
### Database Issues

```sql
-- Connect to the database first (from the shell):
--   sudo -u postgres psql orion_db

-- Check active connections
SELECT count(*) FROM pg_stat_activity;

-- Find long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Kill stuck query (replace 12345 with a pid returned by the query above)
SELECT pg_terminate_backend(12345);

-- Check table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- Analyze query performance
EXPLAIN ANALYZE SELECT ...;
```
### Redis Issues

```bash
# Check connectivity
redis-cli ping

# Monitor real-time commands
redis-cli monitor

# Check memory usage
redis-cli info memory

# List all keys (careful in production!)
redis-cli --scan

# Check queue length (the Celery broker lives in database 1;
# "celery" is Celery's default queue name)
redis-cli -n 1 llen celery

# Flush specific database
redis-cli -n 1 flushdb  # Flush Celery broker
```

### Celery Issues

```bash
# Check worker status
celery -A app.celery inspect active
celery -A app.celery inspect reserved
celery -A app.celery inspect stats

# Purge all pending tasks
celery -A app.celery purge

# List registered tasks
celery -A app.celery inspect registered
```
### Application Issues

```bash
# Check API health
curl -s http://localhost:8000/health | jq

# View recent logs
journalctl -u orion --since "10 minutes ago"

# Check for Python errors
journalctl -u orion | grep -i error | tail -20

# Test database connection
python -c "from app.core.database import engine; print(engine.connect())"
```

### Common Problems & Solutions

| Problem | Diagnosis | Solution |
|---------|-----------|----------|
| 502 Bad Gateway | `systemctl status orion` | Restart app: `systemctl restart orion` |
| Database connection refused | `pg_isready` | Start PostgreSQL: `systemctl start postgresql` |
| High memory usage | `free -h`, `ps aux --sort=-%mem` | Restart app, check for memory leaks |
| Slow queries | PostgreSQL slow query log | Add indexes, optimize queries |
| Celery tasks stuck | `celery -A app.celery inspect active` | Restart workers, check Redis |
| Disk full | `df -h` | Clean logs, backups, temp files |

---
## Decision Matrix

### When to Use Each Option

| Scenario | Recommended | Reason |
|----------|-------------|--------|
| Solo developer, MVP | Managed (Railway) | Focus on product |
| Small team, budget conscious | Traditional VPS | Full control, low cost |
| Need direct DB access for debugging | Traditional VPS | Direct psql access |
| Familiar with Docker, want consistency | Docker Compose | Reproducible environments |
| High availability required | Docker + Orchestration | Easy scaling |
| Enterprise, compliance requirements | Kubernetes | Full orchestration |

### Cost Comparison (Monthly)

| Setup | Low Traffic | Medium | High |
|-------|-------------|--------|------|
| Managed (Railway + Neon) | $10 | $50 | $200+ |
| VPS (Hetzner/DigitalOcean) | $5 | $20 | $80 |
| Docker on VPS | $5 | $20 | $80 |
| AWS/GCP Full Stack | $50 | $200 | $1000+ |

---
## Migration Path

### Phase 1: Development ✅ COMPLETE
- ✅ PostgreSQL 15 (Docker)
- ✅ FastAPI + Uvicorn
- ✅ Local file storage

### Phase 2: Production MVP ✅ COMPLETE
- ✅ PostgreSQL (managed or VPS)
- ✅ FastAPI + Uvicorn (systemd or Docker)
- ✅ Redis 7 (cache + task broker)
- ✅ Celery 5.3 (background jobs)
- ✅ Celery Beat (scheduled tasks)
- ✅ Flower (task monitoring)
- ✅ Cloudflare R2 (cloud file storage)
- ✅ Sentry (error tracking)
- ✅ Cloudflare CDN (caching + DDoS protection)

### Phase 3: Scale (1,000+ Users)
- ⏳ Load balancer (Nginx/HAProxy/ALB)
- ⏳ Horizontal app scaling (2-4 Uvicorn instances)
- ⏳ PostgreSQL read replica
- ⏳ Dedicated Celery workers per queue

### Phase 4: Enterprise (5,000+ Users)
- ⏳ Redis Sentinel/Cluster
- ⏳ Database connection pooling (PgBouncer)
- ⏳ Full monitoring stack (Prometheus/Grafana)
- ⏳ Log aggregation (Loki/ELK)

### Phase 5: High Availability (10,000+ Users)
- ⏳ Multi-region deployment
- ⏳ Database failover (streaming replication)
- ⏳ Container orchestration (Kubernetes)
- ⏳ Global CDN with edge caching

---
## Enterprise Upgrade Checklist

When you're ready to scale beyond 1,000 concurrent users:

### Infrastructure

- [ ] **Load Balancer** - Add Nginx/HAProxy in front of API servers
  - Enables horizontal scaling
  - Health checks and automatic failover
  - SSL termination at edge

- [ ] **Multiple API Servers** - Run 2-4 Uvicorn instances
  - Scale horizontally instead of vertically
  - Blue-green deployments possible

- [ ] **Database Read Replica** - Add PostgreSQL replica
  - Offload read queries from primary
  - Backup without impacting production

- [ ] **Connection Pooling** - Add PgBouncer
  - Reduce database connection overhead
  - Handle connection spikes

### Monitoring & Observability

- [ ] **Prometheus + Grafana** - Metrics dashboards
  - Request latency, error rates, saturation
  - Database connection pool metrics
  - Celery queue lengths

- [ ] **Log Aggregation** - Loki or ELK stack
  - Centralized logs from all services
  - Search and alerting

- [ ] **Alerting** - PagerDuty/OpsGenie integration
  - On-call rotation
  - Escalation policies

### Security

- [ ] **WAF Rules** - Cloudflare or AWS WAF
  - SQL injection protection
  - Rate limiting at edge
  - Bot protection

- [ ] **Secrets Management** - HashiCorp Vault
  - Rotate credentials automatically
  - Audit access to secrets

---
## Next Steps

**You're production-ready now!** Optional improvements:

1. **Enable Sentry** - Add `SENTRY_DSN` for error tracking (free tier)
2. **Enable R2** - Set `STORAGE_BACKEND=r2` for cloud storage (~$5/mo)
3. **Enable Cloudflare** - Proxy domain for CDN + DDoS protection (free tier)
4. **Add load balancer** - When you need horizontal scaling

See also:

- [Production Deployment Guide](production.md)
- [Cloudflare Setup Guide](cloudflare.md)
- [Docker Deployment](docker.md)
- [Environment Configuration](environment.md)
- [Background Tasks Architecture](../architecture/background-tasks.md)