docs: update observability and deployment docs to match production stack
Some checks failed
CI / ruff (push) Successful in 12s
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / pytest (push) Has been cancelled

Update observability.md with production container table, actual init code,
and correct env var names. Update docker.md with full 10-service table and
backup/monitoring cross-references. Add explicit AAAA records to DNS tables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:44:05 +01:00
parent 10aa75aa69
commit 677e5211f9
4 changed files with 115 additions and 47 deletions


@@ -36,11 +36,20 @@ make docker-down
 ### Current Services
-| Service | Port | Purpose |
-|---------|------|---------|
-| db | 5432 | PostgreSQL database |
-| redis | 6379 | Cache and queue broker |
-| api | 8000 | FastAPI application |
+| Service | Port | Profile | Purpose |
+|---------|------|---------|---------|
+| db | 5432 | (default) | PostgreSQL database |
+| redis | 6379 | (default) | Cache and queue broker |
+| api | 8000 | full | FastAPI application |
+| celery-worker | — | full | Background task processing |
+| celery-beat | — | full | Scheduled task scheduler |
+| flower | 5555 | full | Celery monitoring dashboard |
+| prometheus | 9090 | full | Metrics storage (15-day retention) |
+| grafana | 3001 | full | Dashboards (`https://grafana.wizard.lu`) |
+| node-exporter | 9100 | full | Host CPU/RAM/disk metrics |
+| cadvisor | 8080 | full | Per-container resource metrics |
+Use `docker compose --profile full up -d` to start all services, or `docker compose up -d` for just db + redis (local development).
 ---
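A quick way to confirm that every service in this table is up under the `full` profile is to compare the expected names against `docker compose ps`. The sketch below assumes the Compose service names match the table's Service column and a Compose v2 CLI that supports `ps --services --status`; `missing_services` and `EXPECTED_SERVICES` are hypothetical names, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report services from the "full" profile that are not running.
# Assumes Compose service names match the Service column of the docs table.
EXPECTED_SERVICES=(db redis api celery-worker celery-beat flower
                   prometheus grafana node-exporter cadvisor)

missing_services() {
  local running svc rc=0
  # One service name per line, restricted to containers in the "running" state.
  running=$(docker compose --profile full ps --services --status running)
  for svc in "${EXPECTED_SERVICES[@]}"; do
    # -x: whole-line match, -F: fixed string, so "celery-beat" never matches a prefix.
    if ! grep -qxF -- "$svc" <<<"$running"; then
      echo "$svc"
      rc=1
    fi
  done
  return "$rc"
}
```

Run `missing_services` on the server: it prints each expected service that is not running and exits non-zero if any are missing, which makes it usable from a cron job or a post-deploy smoke test.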
@@ -368,44 +377,60 @@ docker compose -f docker-compose.prod.yml up -d api
 ## Backups
 ### Database Backup
+Automated backup scripts handle daily pg_dump with rotation and optional Cloudflare R2 offsite sync:
 ```bash
-# Create backup
-docker compose -f docker-compose.prod.yml exec db pg_dump -U orion_user orion_db | gzip > backup_$(date +%Y%m%d).sql.gz
+# Run backup (Orion + Gitea databases)
+bash scripts/backup.sh
-# Restore backup
-gunzip -c backup_20240115.sql.gz | docker compose -f docker-compose.prod.yml exec -T db psql -U orion_user -d orion_db
+# Run backup with R2 upload
+bash scripts/backup.sh --upload
+# Restore from backup
+bash scripts/restore.sh orion ~/backups/orion/daily/orion_20260214_030000.sql.gz
+bash scripts/restore.sh gitea ~/backups/gitea/daily/gitea_20260214_030000.sql.gz
 ```
-### Volume Backup
+Backups are stored in `~/backups/{orion,gitea}/{daily,weekly}/` with 7-day daily and 4-week weekly retention. A systemd timer runs the backup daily at 03:00.
+See [Hetzner Server Setup — Step 17](hetzner-server-setup.md#step-17-backups) for full setup instructions.
+### Manual Database Backup
 ```bash
-# Backup all volumes
-docker run --rm \
-  -v orion_postgres_data:/data \
-  -v $(pwd)/backups:/backup \
-  alpine tar czf /backup/postgres_$(date +%Y%m%d).tar.gz /data
+# One-off backup
+docker compose exec db pg_dump -U orion_user orion_db | gzip > backup_$(date +%Y%m%d).sql.gz
+# Restore
+gunzip -c backup_20240115.sql.gz | docker compose exec -T db psql -U orion_user -d orion_db
 ```
 ---
 ## Monitoring
+The monitoring stack (Prometheus, Grafana, node-exporter, cAdvisor) runs under `profiles: [full]`. See [Hetzner Server Setup — Step 18](hetzner-server-setup.md#step-18-monitoring-observability) for full setup and [Observability Framework](../architecture/observability.md) for the application-level metrics architecture.
 ### Resource Usage
 ```bash
-docker stats
+docker stats --no-stream
 ```
 ### Health Checks
 ```bash
 # Check service health
-docker compose -f docker-compose.prod.yml ps
+docker compose --profile full ps
-# Test API health
+# API health
 curl -s http://localhost:8000/health | jq
+# Prometheus metrics
+curl -s http://localhost:8000/metrics | head -5
+# Prometheus targets
+curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health
 ```
 ---
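The 7-day daily and 4-week weekly retention amounts to keeping the newest N dumps per directory. One way to express that rotation in plain bash is sketched below; it is illustrative only and not taken from `scripts/backup.sh`. It assumes dump names carry a sortable timestamp such as `orion_20260214_030000.sql.gz` (so name order equals age order) and GNU `find`; `prune_backups` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical rotation helper (not the project's scripts/backup.sh):
# keep the newest $2 *.sql.gz dumps in directory $1 and delete the rest.
# Assumes timestamped names (e.g. orion_20260214_030000.sql.gz), so sorting
# by name lists the oldest dump first.
prune_backups() {
  local dir="$1" keep="$2" i
  local -a files
  # GNU find prints bare file names; sort puts them oldest-first.
  mapfile -t files < <(find "$dir" -maxdepth 1 -type f -name '*.sql.gz' -printf '%f\n' | sort)
  # Everything except the last $keep entries is surplus (negative means nothing to do).
  local excess=$(( ${#files[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    rm -f -- "$dir/${files[i]}"
  done
}
```

For example, `prune_backups ~/backups/orion/daily 7` and `prune_backups ~/backups/orion/weekly 4` would apply the stated retention. Sorting by name rather than by modification time keeps the result stable even if dumps are copied back from R2 with fresh timestamps.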

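The `grep health` pipeline shows each target's state but not which target is failing. The sketch below filters the Prometheus `/api/v1/targets` response down to active targets whose `health` is not `up`, using `python3` as the existing command does. `unhealthy_targets` is a hypothetical helper name; it relies on the `job` and `instance` labels Prometheus attaches to each target.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a Prometheus /api/v1/targets JSON response on stdin
# and print "job instance health" for every active target that is not "up".
unhealthy_targets() {
  python3 -c '
import json, sys

resp = json.load(sys.stdin)
for target in resp["data"]["activeTargets"]:
    if target.get("health") != "up":
        labels = target.get("labels", {})
        print(labels.get("job", "?"), labels.get("instance", "?"), target.get("health"))
'
}
```

Pipe the API into it, for example `curl -s http://localhost:9090/api/v1/targets | unhealthy_targets`; empty output means every active target is up.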

@@ -444,6 +444,8 @@ Before setting up Caddy, point your domain's DNS to the server.
 |---|---|---|---|
 | A | `@` | `91.99.65.229` | 300 |
 | A | `www` | `91.99.65.229` | 300 |
+| AAAA | `@` | `2a01:4f8:1c1a:b39c::1` | 300 |
+| AAAA | `www` | `2a01:4f8:1c1a:b39c::1` | 300 |
 ### rewardflow.lu (Loyalty+ Platform) — TODO
@@ -451,6 +453,8 @@ Before setting up Caddy, point your domain's DNS to the server.
 |---|---|---|---|
 | A | `@` | `91.99.65.229` | 300 |
 | A | `www` | `91.99.65.229` | 300 |
+| AAAA | `@` | `2a01:4f8:1c1a:b39c::1` | 300 |
+| AAAA | `www` | `2a01:4f8:1c1a:b39c::1` | 300 |
 ### IPv6 (AAAA) Records — TODO
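Once the A and AAAA records have propagated, they can be verified from any machine with `dig`. A minimal sketch, assuming `dig` (from the BIND/dnsutils tools) is installed; `check_dns` is a hypothetical helper, and the addresses are the ones from the tables above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that NAME has a TYPE record equal to EXPECTED.
# dig +short prints one answer per line; CNAME targets may precede the address.
check_dns() {
  local name="$1" type="$2" expected="$3" answers
  answers=$(dig +short "$name" "$type")
  # Whole-line, fixed-string match, so a longer address that merely
  # contains the expected one does not count.
  if grep -qxF -- "$expected" <<<"$answers"; then
    echo "OK   $type $name -> $expected"
  else
    echo "FAIL $type $name (expected $expected, got: ${answers:-nothing})" >&2
    return 1
  fi
}
```

For example, `check_dns rewardflow.lu AAAA 2a01:4f8:1c1a:b39c::1` and `check_dns www.rewardflow.lu A 91.99.65.229`; a non-zero exit status flags a record that is missing or points elsewhere.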


@@ -328,7 +328,10 @@ sudo systemctl restart orion orion-celery
 ## Backups
-### Database Backup Script
+!!! tip "Docker deployment"
+    For Docker-based deployments, use the automated backup scripts (`scripts/backup.sh` and `scripts/restore.sh`) with systemd timer. See [Hetzner Server Setup — Step 17](hetzner-server-setup.md#step-17-backups).
+### Database Backup Script (VPS without Docker)
 ```bash
 sudo nano /home/orion/backup.sh
@@ -373,6 +376,9 @@ sudo -u orion crontab -e
 ## Monitoring
+!!! tip "Docker deployment"
+    For Docker-based deployments, a full Prometheus + Grafana + node-exporter + cAdvisor stack is included in `docker-compose.yml`. See [Hetzner Server Setup — Step 18](hetzner-server-setup.md#step-18-monitoring-observability) and [Observability Framework](../architecture/observability.md).
 ### Basic Health Check
 ```bash