orion/docs/deployment/docker.md
Samir Boulahtit 661547f6cf docs: update deployment docs for CI timeouts, build info, and prod safety
- hetzner-server-setup: runner timeout 3h, shutdown_timeout 300s,
  deploy.sh now writes .build-info and uses explicit -f flag
- gitea: document unit-only CI tests and xdist incompatibility
- docker: add build info section, document volume mount approach

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 14:00:35 +01:00


# Docker Deployment
This guide covers deploying Orion using Docker and Docker Compose.
**Best for:** Teams who want consistent environments and easy rollbacks.

---
## Development vs Production
| Aspect | Development | Production |
|--------|-------------|------------|
| Compose file | `docker-compose.yml` | `docker-compose.prod.yml` |
| App server | Hot reload enabled | Multiple workers |
| Database | Local volume | Persistent volume with backups |
| SSL | Not needed | Required (via Nginx) |
| Logging | Console | File + centralized |
---
## Development Setup
```bash
# Start all services
make docker-up
# Or manually
docker compose up -d
# View logs
docker compose logs -f
# Stop services
make docker-down
```
### Dev vs Prod Compose Architecture
The project uses a **compose override** pattern to separate dev and prod concerns:
- **`docker-compose.yml`** — Base config. Services like `db` and `redis` do **not** expose ports to the host (they communicate over internal Docker networks only).
- **`docker-compose.override.yml`** — Automatically loaded by `docker compose up` in dev. Exposes `db` (5432) and `redis` (6379) to `localhost` so the app can connect from outside Docker.
- **`scripts/deploy.sh`** — Uses `docker compose -f docker-compose.yml --profile full` which **explicitly skips** the override file.
> **CRITICAL — Production Safety Rule**
>
> The `-f docker-compose.yml` flag in `scripts/deploy.sh` is a deliberate security measure that prevents
> `docker-compose.override.yml` from being loaded in production. This flag **must never be removed**.
>
> When running **any** `docker compose` command directly on the production server, you **must** always
> pass `-f docker-compose.yml` explicitly to avoid accidentally loading the override file:
>
> ```bash
> # CORRECT — on production server
> docker compose -f docker-compose.yml --profile full up -d
> docker compose -f docker-compose.yml --profile full logs -f api
> docker compose -f docker-compose.yml --profile full exec db psql -U orion_user -d orion_db
>
> # WRONG — never run bare "docker compose" on production
> docker compose up -d # ← loads override, exposes ports!
> docker compose --profile full up -d # ← also loads override!
> ```
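One way to make the rule above hard to forget is a small shell wrapper that always pins the compose file and profile. This is a minimal sketch, not something shipped in the repository: the function name `prod_compose` and the `DRY_RUN` variable are hypothetical, and it assumes the command is run from the project root where `docker-compose.yml` lives.

```bash
# Hypothetical helper (not part of the Orion repo).
# Always passes "-f docker-compose.yml --profile full", so Compose never
# auto-loads docker-compose.override.yml.
prod_compose() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (useful for checking).
        echo "docker compose -f docker-compose.yml --profile full $*"
    else
        docker compose -f docker-compose.yml --profile full "$@"
    fi
}
```

With this in a shell profile on the production server, `prod_compose logs -f api` replaces the longer form, and `DRY_RUN=1 prod_compose up -d` shows what would be executed.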
### Current Services
| Service | Port | Profile | Purpose |
|---------|------|---------|---------|
| db | 5432 | (default) | PostgreSQL database |
| redis | 6379 | (default) | Cache and queue broker |
| api | 8000 | full | FastAPI application |
| celery-worker | — | full | Background task processing |
| celery-beat | — | full | Periodic task scheduler |
| flower | 5555 | full | Celery monitoring dashboard |
| prometheus | 9090 | full | Metrics storage (15-day retention) |
| grafana | 3001 | full | Dashboards (`https://grafana.wizard.lu`) |
| node-exporter | 9100 | full | Host CPU/RAM/disk metrics |
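| cadvisor | 8080 | full | Per-container resource metrics |

In local development, use `docker compose --profile full up -d` to start all services, or `docker compose up -d` for just `db` and `redis`. On the production server, always add `-f docker-compose.yml` as described in the safety rule above.

---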
## Production Deployment
### 1. Create Production Compose File
```yaml
# docker-compose.prod.yml
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      DATABASE_URL: postgresql://orion_user:${DB_PASSWORD}@db:5432/orion_db
      REDIS_URL: redis://redis:6379/0
      CELERY_BROKER_URL: redis://redis:6379/1
    env_file:
      - .env
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - uploads:/app/uploads
      - logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 1G

  celery:
    build: .
    restart: always
    command: celery -A app.core.celery_config worker --loglevel=info -Q default,long_running,scheduled
    environment:
      DATABASE_URL: postgresql://orion_user:${DB_PASSWORD}@db:5432/orion_db
      REDIS_URL: redis://redis:6379/0
    env_file:
      - .env
    depends_on:
      - db
      - redis
    volumes:
      - logs:/app/logs
    deploy:
      resources:
        limits:
          memory: 768M

  celery-beat:
    build: .
    restart: always
    command: celery -A app.celery beat --loglevel=info
    environment:
      CELERY_BROKER_URL: redis://redis:6379/1
    env_file:
      - .env
    depends_on:
      - redis
    deploy:
      resources:
        limits:
          memory: 256M

  db:
    image: postgres:15-alpine
    restart: always
    environment:
      POSTGRES_DB: orion_db
      POSTGRES_USER: orion_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U orion_user -d orion_db"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      resources:
        limits:
          memory: 512M

  redis:
    image: redis:7-alpine
    restart: always
    command: redis-server --maxmemory 100mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      resources:
        limits:
          memory: 300M

  nginx:
    image: nginx:alpine
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./static:/app/static:ro
      - uploads:/app/uploads:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - api
    deploy:
      resources:
        limits:
          memory: 128M

volumes:
  postgres_data:
  redis_data:
  uploads:
  logs:
```
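When sizing the server, it can help to add up the memory limits declared in the compose file above. The sketch below is plain shell arithmetic, assuming `1G` means 1024 MiB; the helper name `sum_mib` is hypothetical and only used for illustration.

```bash
# Sum memory limits given in MiB and print the total.
sum_mib() {
    total=0
    for mib in "$@"; do
        total=$((total + mib))
    done
    echo "$total"
}

# Limits from the compose file: api, celery, celery-beat, db, redis, nginx
sum_mib 1024 768 256 512 300 128   # prints 2988
```

The containers can therefore claim just under 3 GiB in total, before counting the host OS and the Docker daemon itself.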
### 2. Create Dockerfile
```dockerfile
# Dockerfile
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Tailwind CLI
RUN curl -sLO https://github.com/tailwindlabs/tailwindcss/releases/latest/download/tailwindcss-linux-x64 \
&& chmod +x tailwindcss-linux-x64 \
&& mv tailwindcss-linux-x64 /usr/local/bin/tailwindcss
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Build Tailwind CSS
RUN tailwindcss -i ./static/admin/css/tailwind.css -o ./static/admin/css/tailwind.output.css --minify \
&& tailwindcss -i ./static/store/css/tailwind.css -o ./static/store/css/tailwind.output.css --minify \
&& tailwindcss -i ./static/storefront/css/tailwind.css -o ./static/storefront/css/tailwind.output.css --minify \
&& tailwindcss -i ./static/public/css/tailwind.css -o ./static/public/css/tailwind.output.css --minify
# Create non-root user
RUN useradd -m -u 1000 orion && chown -R orion:orion /app
USER orion
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```
### 3. Nginx Configuration
```bash
mkdir -p nginx/conf.d
```
```nginx
# nginx/conf.d/orion.conf
upstream api {
    server api:8000;
}

server {
    listen 80;
    server_name yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;

    # Static files
    location /static {
        alias /app/static;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    location /uploads {
        alias /app/uploads;
        expires 7d;
    }

    location / {
        proxy_pass http://api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
### 4. Deploy
```bash
# Create .env file with production values
cp .env.example .env
nano .env
# Set database password
export DB_PASSWORD=$(openssl rand -hex 16)
echo "DB_PASSWORD=$DB_PASSWORD" >> .env
# Build and start
docker compose -f docker-compose.prod.yml build
docker compose -f docker-compose.prod.yml up -d
# Run migrations
docker compose -f docker-compose.prod.yml exec api alembic upgrade head
# Initialize data
docker compose -f docker-compose.prod.yml exec api python scripts/seed/init_production.py
```
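Note that re-running the password step above appends a second `DB_PASSWORD` line with a new random value, while the official Postgres image only applies `POSTGRES_PASSWORD` when it initializes an empty data directory. A possible guard is to append the key only when it is missing. The helper name `ensure_env_var` below is hypothetical, not part of the repository:

```bash
# Hypothetical helper: append KEY=VALUE to an env file only if KEY is not set yet.
# Usage: ensure_env_var KEY VALUE [FILE]   (FILE defaults to .env)
ensure_env_var() {
    key="$1"
    value="$2"
    file="${3:-.env}"
    # Keep the existing value so the password matches the initialized database.
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Example call from the deploy steps:
#   ensure_env_var DB_PASSWORD "$(openssl rand -hex 16)"
```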
---
## Build Info
The deploy script writes a `.build-info` JSON file (commit SHA + deploy timestamp) before rebuilding containers. This file is mounted as a read-only volume into the API container:
```yaml
# In docker-compose.yml
volumes:
- ./.build-info:/app/.build-info:ro
```
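The deploy-side step could look roughly like the sketch below. The real logic lives in `scripts/deploy.sh` and may differ: the function name `write_build_info` is hypothetical, and the JSON key names are an assumption that mirrors the `commit` and `deployed_at` fields of the `/health` endpoint. Writing the file before the containers start matters, because Docker typically creates a missing bind-mount source as a directory.

```bash
# Hypothetical sketch of writing .build-info (key names are assumed).
# Usage: write_build_info COMMIT DEPLOYED_AT [FILE]   (FILE defaults to .build-info)
write_build_info() {
    commit="$1"
    deployed_at="$2"
    out="${3:-.build-info}"
    printf '{"commit": "%s", "deployed_at": "%s"}\n' "$commit" "$deployed_at" > "$out"
}

# Example call from a deploy script:
#   write_build_info "$(git rev-parse --short HEAD)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```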
The app reads it via `app/core/build_info.py` and exposes it in:
- **`/health` endpoint** — `commit` and `deployed_at` fields
- **Admin sidebar** — version, commit, and deploy timestamp
In local development (where `.build-info` doesn't exist), the app falls back to `git rev-parse` for the commit SHA.
---
## Daily Operations
### View Logs
```bash
# All services
docker compose -f docker-compose.prod.yml logs -f
# Specific service
docker compose -f docker-compose.prod.yml logs -f api
docker compose -f docker-compose.prod.yml logs -f celery
# Last 100 lines
docker compose -f docker-compose.prod.yml logs --tail 100 api
```
### Access Container Shell
```bash
# API container
docker compose -f docker-compose.prod.yml exec api bash
# Database
docker compose -f docker-compose.prod.yml exec db psql -U orion_user -d orion_db
# Redis
docker compose -f docker-compose.prod.yml exec redis redis-cli
```
### Restart Services
```bash
# Single service
docker compose -f docker-compose.prod.yml restart api
# All services
docker compose -f docker-compose.prod.yml restart
```
### Deploy Updates
```bash
# Pull latest code
git pull origin main
# Rebuild and restart
docker compose -f docker-compose.prod.yml build api celery
docker compose -f docker-compose.prod.yml up -d api celery
# Run migrations if needed
docker compose -f docker-compose.prod.yml exec api alembic upgrade head
```
### Rollback
```bash
# View image history
docker images orion-api
# Tag current as backup
docker tag orion-api:latest orion-api:backup
# Rollback to previous (assumes an earlier build was tagged orion-api:previous)
docker compose -f docker-compose.prod.yml down api
docker tag orion-api:previous orion-api:latest
docker compose -f docker-compose.prod.yml up -d api
```
---
## Backups
Automated backup scripts handle daily pg_dump with rotation and optional Cloudflare R2 offsite sync:
```bash
# Run backup (Orion + Gitea databases)
bash scripts/backup.sh
# Run backup with R2 upload
bash scripts/backup.sh --upload
# Restore from backup
bash scripts/restore.sh orion ~/backups/orion/daily/orion_20260214_030000.sql.gz
bash scripts/restore.sh gitea ~/backups/gitea/daily/gitea_20260214_030000.sql.gz
```
Backups are stored in `~/backups/{orion,gitea}/{daily,weekly}/` with 7-day daily and 4-week weekly retention. A systemd timer runs the backup daily at 03:00.
See [Hetzner Server Setup — Step 17](hetzner-server-setup.md#step-17-backups) for full setup instructions.
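The retention policy described above can be approximated with `find`. This is only a sketch of the idea, not the code in `scripts/backup.sh`; the function name `prune_backups` is hypothetical, and it assumes backups use the `.sql.gz` suffix shown in the restore examples.

```bash
# Hypothetical rotation helper: delete *.sql.gz files older than DAYS days.
# Usage: prune_backups DIR DAYS
prune_backups() {
    dir="$1"
    days="$2"
    # -mtime +N matches files whose age, in whole days, is greater than N.
    find "$dir" -type f -name '*.sql.gz' -mtime +"$days" -exec rm -f {} +
}

# Matching the documented retention:
#   prune_backups ~/backups/orion/daily 7
#   prune_backups ~/backups/orion/weekly 28
```

Because `find` truncates file age to whole days, `-mtime +7` keeps anything younger than eight full days, which errs on the side of keeping one extra backup.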
### Manual Database Backup
```bash
# One-off backup (-T disables TTY allocation so the piped dump is not altered)
docker compose exec -T db pg_dump -U orion_user orion_db | gzip > backup_$(date +%Y%m%d).sql.gz
# Restore
gunzip -c backup_20240115.sql.gz | docker compose exec -T db psql -U orion_user -d orion_db
```
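If `pg_dump` fails partway through the pipeline above, `gzip` can still leave an empty or truncated file behind. A quick integrity check before relying on a backup might look like this sketch; the function name `verify_backup` is hypothetical:

```bash
# Hypothetical check: succeed only if the file is non-empty and a valid gzip stream.
# Usage: verify_backup FILE
verify_backup() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Example:
#   verify_backup backup_20240115.sql.gz && echo "backup looks intact"
```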
---
## Monitoring
The monitoring stack (Prometheus, Grafana, node-exporter, cAdvisor) runs under `profiles: [full]`. See [Hetzner Server Setup — Step 18](hetzner-server-setup.md#step-18-monitoring-observability) for full setup and [Observability Framework](../architecture/observability.md) for the application-level metrics architecture.
### Resource Usage
```bash
docker stats --no-stream
```
### Health Checks
```bash
# Check service health
docker compose --profile full ps
# API health
curl -s http://localhost:8000/health | jq
# Prometheus metrics
curl -s http://localhost:8000/metrics | head -5
# Prometheus targets
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health
```
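Right after a deploy the API may need a few seconds before `/health` responds, so a single `curl` can fail spuriously. A small retry loop is one way to handle that. This is a sketch under assumptions: the function name `retry` and the attempt and delay values are illustrative, not taken from the deploy script.

```bash
# Hypothetical helper: run COMMAND up to MAX_ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 on the first success, 1 otherwise.
# Usage: retry MAX_ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example: wait up to about 30 seconds for the API to report healthy
#   retry 10 3 curl -fsS http://localhost:8000/health
```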
---
## Troubleshooting
### Container Won't Start
```bash
# Check logs
docker compose -f docker-compose.prod.yml logs api
# Check container status
docker compose -f docker-compose.prod.yml ps -a
# Inspect container
docker inspect <container_id>
```
### Database Connection Issues
```bash
# Test from API container
docker compose -f docker-compose.prod.yml exec api python -c "
from app.core.database import engine
with engine.connect() as conn:
    print('Connected!')
"
```
### Out of Disk Space
```bash
# Check disk usage
docker system df
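# Clean up (destructive: removes all unused images and volumes not attached
# to a container, so keep the stack running and take a backup first)
docker system prune -a --volumes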
```
### Memory Issues
```bash
# Check memory usage
docker stats --no-stream
# Increase limits in docker-compose.prod.yml (YAML, not a shell command):
#   deploy:
#     resources:
#       limits:
#         memory: 2G
```
---
## Security
### Non-Root User
The application image runs as a non-root user: the Dockerfile creates an `orion` user (UID 1000) and switches to it with `USER orion`.
### Secret Management
```bash
# Use Docker secrets (Swarm mode)
echo "your-password" | docker secret create db_password -
# Or use environment files
# Never commit .env to git
```
### Network Isolation
```yaml
# Add to docker-compose.prod.yml
networks:
  frontend:
  backend:

services:
  nginx:
    networks:
      - frontend
  api:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend
  redis:
    networks:
      - backend
```
---
## Scaling
### Horizontal Scaling
```bash
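# Scale API containers. First remove the fixed host port mapping on api
# ("127.0.0.1:8000:8000"): several replicas cannot bind the same host port.
docker compose -f docker-compose.prod.yml up -d --scale api=3
# Update the nginx upstream in nginx/conf.d/orion.conf (nginx config, not a
# shell command; adjust the names to match `docker compose ps`):
#   upstream api {
#       server api_1:8000;
#       server api_2:8000;
#       server api_3:8000;
#   }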
```
### Moving to Kubernetes
When you outgrow Docker Compose, see our Kubernetes migration guide (coming soon).