Commit Graph

4 Commits

Author SHA1 Message Date
35d1559162 feat(monitoring): add Redis exporter + Sentry docs to deployment guide
Some checks failed
CI / ruff (push) Successful in 10s
CI / pytest (push) Failing after 47m30s
CI / validate (push) Successful in 24s
CI / dependency-scanning (push) Successful in 29s
CI / docs (push) Has been skipped
CI / deploy (push) Has been skipped
- Add redis-exporter container to docker-compose (oliver006/redis_exporter, 32MB)
- Add Redis scrape target to Prometheus config
- Add 4 Redis alert rules: RedisDown, HighMemory, HighConnections, RejectedConnections
- Document Step 19b (Sentry Error Tracking) in Hetzner deployment guide
- Document Step 19c (Redis Monitoring) in Hetzner deployment guide
- Update resource budget and port reference tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:30:18 +01:00
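
The four Redis alerts named in this commit could be expressed roughly as follows. This is a minimal sketch, not the rules file from the repository: the metric names are ones exposed by oliver006/redis_exporter, while the thresholds, `for` durations, severities, and the group name `redis` are all assumptions chosen for illustration.

```python
import json

# (alert name, PromQL expression, "for" duration, severity)
# Alert names come from the commit message; every threshold below is assumed.
REDIS_RULES = [
    ("RedisDown", "redis_up == 0", "1m", "critical"),
    # Guarded so the rule stays quiet when maxmemory is unset (reported as 0).
    ("HighMemory",
     "redis_memory_used_bytes / redis_memory_max_bytes > 0.9"
     " and redis_memory_max_bytes > 0",
     "5m", "warning"),
    ("HighConnections", "redis_connected_clients > 100", "5m", "warning"),
    ("RejectedConnections",
     "increase(redis_rejected_connections_total[5m]) > 0",
     "0m", "critical"),
]


def build_rule_group(name="redis"):
    """Return a dict shaped like a Prometheus rules file with one group."""
    rules = [
        {"alert": alert, "expr": expr, "for": hold, "labels": {"severity": sev}}
        for alert, expr, hold, sev in REDIS_RULES
    ]
    return {"groups": [{"name": name, "rules": rules}]}


if __name__ == "__main__":
    # Printed as JSON for inspection; the real rules file would be YAML.
    print(json.dumps(build_rule_group(), indent=2))
```

The severity labels are what an Alertmanager route would typically match on, so they are kept to the two levels mentioned elsewhere in this history (critical and warning).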
f67510b706 docs: switch email provider recommendation from Mailgun to SendGrid
SendGrid handles both transactional emails and marketing campaigns
under one account. Updated the Alertmanager SMTP placeholders, the Hetzner
setup guide (Step 19.5), and the environment reference to recommend
SendGrid as the primary email provider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:21:49 +01:00
4bce16fb73 feat(infra): add alerting, network segmentation, and ops docs (Steps 19-24)
All checks were successful
CI / ruff (push) Successful in 11s
CI / pytest (push) Successful in 36m6s
CI / validate (push) Successful in 22s
CI / dependency-scanning (push) Successful in 28s
CI / docs (push) Successful in 37s
CI / deploy (push) Successful in 47s
- Prometheus alert rules (host, container, API, Celery, target-down)
- Alertmanager with email routing (repeat interval: 1h for critical, 4h for warning)
- Docker network segmentation (frontend/backend/monitoring)
- Incident response guide with 8 copy-paste runbooks
- Environment variables reference (55+ vars documented)
- Hetzner setup docs updated with Steps 19-24
- Launch readiness updated with Feb 2026 infrastructure status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:06:54 +01:00
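
The severity-based repeat intervals described in this commit (1h for critical, 4h for warning) might map onto an Alertmanager routing tree along these lines. This is a sketch under assumptions: the receiver name `email-ops` and the `group_by` choice are hypothetical, and `repeat_interval_for` is a simplified lookup written for this example, not Alertmanager's actual matching logic.

```python
import json


def build_route(default_receiver="email-ops"):
    """Return a dict shaped like the `route` section of an Alertmanager config."""
    return {
        "receiver": default_receiver,  # hypothetical receiver name
        "group_by": ["alertname"],     # assumed grouping
        "routes": [
            {
                "matchers": ['severity="critical"'],
                "receiver": default_receiver,
                "repeat_interval": "1h",
            },
            {
                "matchers": ['severity="warning"'],
                "receiver": default_receiver,
                "repeat_interval": "4h",
            },
        ],
    }


def repeat_interval_for(route, severity):
    """Find the repeat interval of the first child route matching a severity.

    Only exact `severity="..."` matchers are understood here.
    """
    wanted = f'severity="{severity}"'
    for child in route["routes"]:
        if wanted in child["matchers"]:
            return child["repeat_interval"]
    # No child matched: fall back to the parent's value, if it sets one.
    return route.get("repeat_interval")


if __name__ == "__main__":
    print(json.dumps({"route": build_route()}, indent=2))
```

Alerts with any other severity fall through to the parent route, which in this sketch sets no explicit repeat interval and would therefore use Alertmanager's default.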
ef7187b508 feat: add automated backups and Prometheus/Grafana monitoring stack (Steps 17-18)
Some checks failed
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / ruff (push) Successful in 7s
CI / validate (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has started running
Backups: pg_dump scripts with daily/weekly rotation and Cloudflare R2 offsite sync.
Monitoring: Prometheus, Grafana, node-exporter, cAdvisor in docker-compose; /metrics
endpoint activated via prometheus_client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:40:08 +01:00
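
The daily/weekly rotation mentioned in this commit can be illustrated with a small retention function. It is only a sketch of one common policy, not the repository's backup scripts: the retention counts (7 daily, 4 weekly), the choice of Sunday as the weekly day, and the function names are assumptions.

```python
from datetime import date, timedelta


def backups_to_keep(dates, today, daily_keep=7, weekly_keep=4, weekly_weekday=6):
    """Return the set of backup dates a daily/weekly rotation would retain.

    weekly_weekday follows date.weekday(): Monday is 0, Sunday is 6.
    """
    newest_first = sorted(set(dates), reverse=True)
    # Daily tier: every backup taken within the last `daily_keep` days.
    daily = {d for d in newest_first if 0 <= (today - d).days < daily_keep}
    # Weekly tier: the newest `weekly_keep` backups taken on the weekly day.
    weekly = [
        d for d in newest_first
        if d.weekday() == weekly_weekday and d <= today
    ][:weekly_keep]
    return daily | set(weekly)


def backups_to_delete(dates, today, **policy):
    """Return the backup dates that fall outside the retention policy, oldest first."""
    return sorted(set(dates) - backups_to_keep(dates, today, **policy))


if __name__ == "__main__":
    today = date(2026, 2, 14)
    history = [today - timedelta(days=i) for i in range(30)]
    for stale in backups_to_delete(history, today):
        print("would delete dump from", stale.isoformat())
```

Computing the keep set first and deleting everything else means a backup covered by both tiers is never removed by mistake.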