3 Commits

Author SHA1 Message Date
e44f5c0458 chore(alertmanager): untrack alertmanager.yml + ship .example template (post-SMTP migration)
All checks were successful
CI / ruff (push) Successful in 17s
CI / pytest (push) Successful in 2h48m4s
CI / validate (push) Successful in 36s
CI / docs (push) Successful in 56s
CI / deploy (push) Successful in 1m12s
CI / dependency-scanning (push) Successful in 37s
Yesterday's deploy debug surfaced a SendGrid API key pasted into the
tracked monitoring/alertmanager/alertmanager.yml on prod, with the
in-repo file literally captioning the field "TODO: Paste your SG.xxx
API key here" — actively encouraging the anti-pattern. Forensic
follow-up (bash history lines 290-357) confirmed it was a user-driven
nano edit that was never committed, just left as a long-running local
mod. Three problems collapsed into this finding:

  1. Real SMTP credential lived in a tracked git file on prod.
  2. The SendGrid → mail1.myservices.hosting SMTP migration never
     touched alertmanager — it still pointed at smtp.sendgrid.net.
  3. The alertmanager container has been Up 13 days with the
     pre-paste empty smtp_auth_password loaded from disk, so prod's
     email alerting has been silently failing.

Resolution shipped here:

- `git rm --cached monitoring/alertmanager/alertmanager.yml` so the
  prod-edited file on each host stops being a tracked file and the
  credential can't accidentally reach git again.
- Add `monitoring/alertmanager/alertmanager.yml` to .gitignore.
- Ship `monitoring/alertmanager/alertmanager.yml.example` as the
  template — pre-filled with the post-migration non-secret routing
  (`mail1.myservices.hosting:587`, `support@wizard.lu` auth,
  `alerts@wizard.lu` From for inbox clarity), only `smtp_auth_password`
  left as `CHANGEME`. Includes inline guidance for the From-vs-auth
  rule that some SMTP relays enforce.

Per-host steps (Hetzner): backup the prod-edited file → revert local
change → pull → copy the template over the old file → fill in the
password → SIGHUP alertmanager. Doc reference will follow in the next
commit (Hetzner deploy doc still needs an "alertmanager.yml lives
outside git" footnote).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 11:32:10 +02:00
f67510b706 docs: switch email provider recommendation from Mailgun to SendGrid
SendGrid handles both transactional emails and marketing campaigns
under one account. Updated alertmanager SMTP placeholders, hetzner
setup guide (Step 19.5), and environment reference to recommend
SendGrid as the primary email provider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:21:49 +01:00
4bce16fb73 feat(infra): add alerting, network segmentation, and ops docs (Steps 19-24)
All checks were successful
CI / ruff (push) Successful in 11s
CI / pytest (push) Successful in 36m6s
CI / validate (push) Successful in 22s
CI / dependency-scanning (push) Successful in 28s
CI / docs (push) Successful in 37s
CI / deploy (push) Successful in 47s
- Prometheus alert rules (host, container, API, Celery, target-down)
- Alertmanager with email routing (critical 1h, warning 4h repeat)
- Docker network segmentation (frontend/backend/monitoring)
- Incident response runbook with 8 copy-paste runbooks
- Environment variables reference (55+ vars documented)
- Hetzner setup docs updated with Steps 19-24
- Launch readiness updated with Feb 2026 infrastructure status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:06:54 +01:00