orion/monitoring/alertmanager
Samir Boulahtit e44f5c0458
chore(alertmanager): untrack alertmanager.yml + ship .example template (post-SMTP migration)
Yesterday's deploy debug surfaced a SendGrid API key pasted into the
tracked monitoring/alertmanager/alertmanager.yml on prod, with the
in-repo file literally captioning the field "TODO: Paste your SG.xxx
API key here" — actively encouraging the anti-pattern. Forensic
follow-up (bash history lines 290-357) confirmed it was a manual nano
edit that was never committed and was left in place as a long-lived
local modification. Three problems collapsed into this finding:

  1. Real SMTP credential lived in a tracked git file on prod.
  2. The SendGrid → mail1.myservices.hosting SMTP migration never
     touched alertmanager — it still pointed at smtp.sendgrid.net.
  3. The alertmanager container has been Up 13 days with the
     pre-paste empty smtp_auth_password loaded from disk, so prod's
     email alerting has been silently failing.

Resolution shipped here:

- `git rm --cached monitoring/alertmanager/alertmanager.yml` so the
  prod-edited file on each host stops being a tracked file and the
  credential can't accidentally reach git again.
- Add `monitoring/alertmanager/alertmanager.yml` to .gitignore.
- Ship `monitoring/alertmanager/alertmanager.yml.example` as the
  template — pre-filled with the post-migration non-secret routing
  (`mail1.myservices.hosting:587`, `support@wizard.lu` auth,
  `alerts@wizard.lu` From for inbox clarity), only `smtp_auth_password`
  left as `CHANGEME`. Includes inline guidance for the From-vs-auth
  rule that some SMTP relays enforce.
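
The repo-side change above can be sketched in a scratch repository, so it
is safe to run anywhere. This is an illustration, not the actual commit:
the temp repo, the demo identity, and the exact keys written into the
template are assumptions. Only the host, port, addresses and the
`CHANGEME` placeholder come from the list above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repo (assumption) so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "demo"

f=monitoring/alertmanager/alertmanager.yml
mkdir -p "$(dirname "$f")"

# Starting state: the config is tracked, as it was before this commit.
printf 'global:\n  smtp_auth_password: ""\n' > "$f"
git add "$f"
git -c commit.gpgsign=false commit -q -m "starting state: alertmanager.yml tracked"

# 1. Stop tracking the file; --cached leaves the copy on disk alone.
git rm --cached -q "$f"

# 2. Ignore it so it cannot be re-added by accident.
echo "$f" >> .gitignore

# 3. Ship a template with non-secret values only (guessed shape).
cat > "$f.example" <<'EOF'
global:
  smtp_smarthost: 'mail1.myservices.hosting:587'
  smtp_from: 'alerts@wizard.lu'
  smtp_auth_username: 'support@wizard.lu'
  smtp_auth_password: 'CHANGEME'
EOF

git add .gitignore "$f.example"
git -c commit.gpgsign=false commit -q -m "untrack alertmanager.yml, ship .example"

# The live file is still on disk here, but git no longer lists it.
git ls-files
```

Note that on a host where the file is still tracked and unmodified,
pulling this commit deletes it from the working tree, which is why the
per-host steps start with a backup.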

Per-host steps (Hetzner): back up the prod-edited file → revert local
change → pull → copy the template over the old file → fill in the
password → SIGHUP alertmanager. Doc reference will follow in the next
commit (Hetzner deploy doc still needs an "alertmanager.yml lives
outside git" footnote).
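
A minimal sketch of those per-host steps, printing the commands by
default instead of running them. The checkout path, backup location and
container name are hypothetical and must be adjusted per host. Sending
the signal with `docker kill` assumes alertmanager runs as a Docker
container under that name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# All three defaults below are assumptions, not taken from the commit.
REPO_DIR="${REPO_DIR:-/opt/orion}"
BACKUP="${BACKUP:-$HOME/alertmanager.yml.bak}"
CONTAINER="${CONTAINER:-alertmanager}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

CFG=monitoring/alertmanager/alertmanager.yml

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

per_host_steps() {
  # Back up the prod-edited file outside the checkout; it holds a secret.
  run cp -p "$REPO_DIR/$CFG" "$BACKUP"
  run chmod 600 "$BACKUP"
  # Revert the local change so the pull applies cleanly.
  run git -C "$REPO_DIR" checkout -- "$CFG"
  run git -C "$REPO_DIR" pull --ff-only
  # Copy the template over the old file.
  run cp "$REPO_DIR/$CFG.example" "$REPO_DIR/$CFG"
  # Fill in the password by hand (replace CHANGEME).
  run "${EDITOR:-nano}" "$REPO_DIR/$CFG"
  # Alertmanager reloads its configuration on SIGHUP.
  run docker kill --signal=HUP "$CONTAINER"
}

per_host_steps
```

Running it with `DRY_RUN=0` executes the steps for real, including
opening the editor before the reload signal is sent.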

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 11:32:10 +02:00