4bce16fb73
feat(infra): add alerting, network segmentation, and ops docs (Steps 19-24)
...
CI / ruff (push) Successful in 11s
CI / pytest (push) Successful in 36m6s
CI / validate (push) Successful in 22s
CI / dependency-scanning (push) Successful in 28s
CI / docs (push) Successful in 37s
CI / deploy (push) Successful in 47s
- Prometheus alert rules (host, container, API, Celery, target-down)
- Alertmanager with email routing (critical 1h, warning 4h repeat)
- Docker network segmentation (frontend/backend/monitoring)
- Incident response runbook with 8 copy-paste runbooks
- Environment variables reference (55+ vars documented)
- Hetzner setup docs updated with Steps 19-24
- Launch readiness updated with Feb 2026 infrastructure status
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-15 22:06:54 +01:00
ef7187b508
feat: add automated backups and Prometheus/Grafana monitoring stack (Steps 17-18)
...
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / ruff (push) Successful in 7s
CI / validate (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has started running
Backups: pg_dump scripts with daily/weekly rotation and Cloudflare R2 offsite sync.
Monitoring: Prometheus, Grafana, node-exporter, cAdvisor in docker-compose; /metrics
endpoint activated via prometheus_client.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-14 22:40:08 +01:00