docs: update observability and deployment docs to match production stack
Some checks failed
Update observability.md with production container table, actual init code, and correct env var names. Update docker.md with full 10-service table and backup/monitoring cross-references. Add explicit AAAA records to DNS tables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -36,11 +36,20 @@ make docker-down

### Current Services

| Service | Port | Purpose |
|---------|------|---------|
| db | 5432 | PostgreSQL database |
| redis | 6379 | Cache and queue broker |
| api | 8000 | FastAPI application |

| Service | Port | Profile | Purpose |
|---------|------|---------|---------|
| db | 5432 | (default) | PostgreSQL database |
| redis | 6379 | (default) | Cache and queue broker |
| api | 8000 | full | FastAPI application |
| celery-worker | — | full | Background task processing |
| celery-beat | — | full | Scheduled task scheduler |
| flower | 5555 | full | Celery monitoring dashboard |
| prometheus | 9090 | full | Metrics storage (15-day retention) |
| grafana | 3001 | full | Dashboards (`https://grafana.wizard.lu`) |
| node-exporter | 9100 | full | Host CPU/RAM/disk metrics |
| cadvisor | 8080 | full | Per-container resource metrics |

Use `docker compose --profile full up -d` to start all services, or `docker compose up -d` for just db + redis (local development).

---
@@ -368,44 +377,60 @@ docker compose -f docker-compose.prod.yml up -d api

## Backups

### Database Backup

Automated backup scripts handle daily pg_dump with rotation and optional Cloudflare R2 offsite sync:

```bash
# Create backup
docker compose -f docker-compose.prod.yml exec db pg_dump -U orion_user orion_db | gzip > backup_$(date +%Y%m%d).sql.gz
# Run backup (Orion + Gitea databases)
bash scripts/backup.sh

# Restore backup
gunzip -c backup_20240115.sql.gz | docker compose -f docker-compose.prod.yml exec -T db psql -U orion_user -d orion_db
# Run backup with R2 upload
bash scripts/backup.sh --upload

# Restore from backup
bash scripts/restore.sh orion ~/backups/orion/daily/orion_20260214_030000.sql.gz
bash scripts/restore.sh gitea ~/backups/gitea/daily/gitea_20260214_030000.sql.gz
```

### Volume Backup

Backups are stored in `~/backups/{orion,gitea}/{daily,weekly}/` with 7-day daily and 4-week weekly retention. A systemd timer runs the backup daily at 03:00.

See [Hetzner Server Setup — Step 17](hetzner-server-setup.md#step-17-backups) for full setup instructions.
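
The retention policy stated above (7 days of daily backups, 4 weeks of weekly backups) could be enforced with an age-based prune along the following lines. This is only a sketch: the contents of `scripts/backup.sh` are not shown here, so the function name `prune_backups`, the `*.sql.gz` filter, and pruning by file modification time are all assumptions rather than the actual implementation.

```shell
# Hypothetical sketch of age-based rotation; the real scripts/backup.sh may differ.
# prune_backups ROOT
#   ROOT is the directory that contains orion/ and gitea/ (e.g. ~/backups).
# The function body runs in a subshell so the loop variable does not leak.
prune_backups() (
  root="$1"
  for app in orion gitea; do
    # -mtime +7: age in whole 24-hour periods is greater than 7
    if [ -d "$root/$app/daily" ]; then
      find "$root/$app/daily" -type f -name '*.sql.gz' -mtime +7 -delete
    fi
    # -mtime +28: older than 4 weeks, counted the same way
    if [ -d "$root/$app/weekly" ]; then
      find "$root/$app/weekly" -type f -name '*.sql.gz' -mtime +28 -delete
    fi
  done
)

# Example call (destructive, so it is left commented out):
# prune_backups "$HOME/backups"
```

Pruning by modification time keeps the sketch independent of the timestamp embedded in file names such as `orion_20260214_030000.sql.gz`; a script that parses those names instead would be an equally plausible design.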

### Manual Database Backup

```bash
# Backup all volumes
docker run --rm \
  -v orion_postgres_data:/data \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/postgres_$(date +%Y%m%d).tar.gz /data
# One-off backup
docker compose exec db pg_dump -U orion_user orion_db | gzip > backup_$(date +%Y%m%d).sql.gz

# Restore
gunzip -c backup_20240115.sql.gz | docker compose exec -T db psql -U orion_user -d orion_db
```

---

## Monitoring

The monitoring stack (Prometheus, Grafana, node-exporter, cAdvisor) runs under `profiles: [full]`. See [Hetzner Server Setup — Step 18](hetzner-server-setup.md#step-18-monitoring-observability) for full setup and [Observability Framework](../architecture/observability.md) for the application-level metrics architecture.

### Resource Usage

```bash
docker stats
docker stats --no-stream
```

### Health Checks

```bash
# Check service health
docker compose -f docker-compose.prod.yml ps
docker compose --profile full ps

# Test API health
# API health
curl -s http://localhost:8000/health | jq

# Prometheus metrics
curl -s http://localhost:8000/metrics | head -5

# Prometheus targets
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health
```
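
The `grep health` filter in the Prometheus targets check prints the matching lines without saying which target each one belongs to. A small helper that pairs each target's job label with its health might look like the sketch below. It assumes the standard `/api/v1/targets` response shape (`data.activeTargets[]`, each with `labels.job` and `health`); the function name `target_health` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: reads a /api/v1/targets JSON response on stdin and
# prints one "job health" line per active target.
target_health() {
  python3 -c '
import json, sys
for t in json.load(sys.stdin)["data"]["activeTargets"]:
    print(t["labels"].get("job", "?"), t["health"])
'
}

# Usage (assumes Prometheus is reachable on localhost:9090):
# curl -s http://localhost:9090/api/v1/targets | target_health
```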

---