chore(alertmanager): untrack alertmanager.yml + ship .example template (post-SMTP migration)
All checks were successful
Yesterday's deploy debug surfaced a SendGrid API key pasted into the
tracked monitoring/alertmanager/alertmanager.yml on prod, with the
in-repo file literally captioning the field "TODO: Paste your SG.xxx
API key here" — actively encouraging the anti-pattern. Forensic
follow-up (bash history lines 290-357) confirmed it was a user-driven
nano edit that was never committed, just left as a long-running local
mod. Three problems collapsed into this finding:
1. A real SMTP credential lived in a tracked git file on prod.
2. The SendGrid → mail1.myservices.hosting SMTP migration never
   touched alertmanager; it still pointed at smtp.sendgrid.net.
3. The alertmanager container has been up for 13 days with the
   pre-paste, empty smtp_auth_password it loaded from disk at startup,
   so prod's email alerting has been silently failing.
Resolution shipped here:
- `git rm --cached monitoring/alertmanager/alertmanager.yml` so the
  prod-edited file on each host stops being tracked and the
  credential can't accidentally reach git again.
- Add `monitoring/alertmanager/alertmanager.yml` to .gitignore.
- Ship `monitoring/alertmanager/alertmanager.yml.example` as the
  template, pre-filled with the post-migration non-secret routing
  (`mail1.myservices.hosting:587`, `support@wizard.lu` auth,
  `alerts@wizard.lu` From for inbox clarity), with only
  `smtp_auth_password` left as `CHANGEME`. Includes inline guidance for
  the From-vs-auth rule that some SMTP relays enforce.
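The first two bullets can be sketched as one small helper. This is a minimal illustration, not the exact commands run for this commit; the function name `untrack_config` is hypothetical, and it assumes it is run from the repository root.

```shell
#!/bin/sh
# Sketch: stop tracking a file while leaving the working copy on disk.
# untrack_config is a hypothetical helper name, not part of this commit.
untrack_config() {
  file="$1"
  # Remove the path from the index only; the file on disk is untouched.
  git rm --cached --quiet -- "$file" || return 1
  # Add an exact-match ignore entry unless one is already present.
  if ! grep -qxF -- "$file" .gitignore 2>/dev/null; then
    printf '%s\n' "$file" >> .gitignore
  fi
}

# Example call (run from the repository root):
#   untrack_config monitoring/alertmanager/alertmanager.yml
#   git check-ignore -v monitoring/alertmanager/alertmanager.yml
```

`git check-ignore -v` prints the .gitignore rule that matches the path, which is a quick way to confirm the file can no longer be staged by accident.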
Per-host steps (Hetzner): back up the prod-edited file → revert the
local change → pull → copy the template over the old file → fill in the
password → SIGHUP alertmanager. A doc reference will follow in the next
commit (the Hetzner deploy doc still needs an "alertmanager.yml lives
outside git" footnote).
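The per-host sequence could look roughly like the sketch below. The helper names, the backup location, and the container name `alertmanager` are assumptions for illustration; only the file paths and the `CHANGEME` placeholder come from this commit. The guard at the end refuses to signal a reload while the placeholder is still in place.

```shell
#!/bin/sh
# Sketch of the per-host steps; helper names are hypothetical.
CFG=monitoring/alertmanager/alertmanager.yml

# Copy the prod-edited file somewhere outside the repository.
# $1 = live config, $2 = backup destination
backup_config() {
  cp -p -- "$1" "$2"
}

# Copy the in-repo template over the live config path.
# $1 = live config path; the template is "$1.example"
install_template() {
  cp -- "$1.example" "$1"
}

# Succeeds only when the file exists and the CHANGEME placeholder is gone.
password_is_set() {
  [ -f "$1" ] || return 1
  ! grep -Eq "^[[:space:]]*smtp_auth_password:[[:space:]]*'CHANGEME'" "$1"
}

# Hypothetical run on one host (container name is an assumption):
#   backup_config "$CFG" "$HOME/alertmanager.yml.bak"
#   git checkout -- "$CFG" && git pull --ff-only
#   install_template "$CFG"
#   "${EDITOR:-nano}" "$CFG"        # fill in smtp_auth_password
#   password_is_set "$CFG" && docker kill --signal=SIGHUP alertmanager
```

The backup still contains the old credential, so it belongs outside the working tree. Alertmanager reloads its configuration on SIGHUP, which is why no container restart appears in the sequence.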
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4
.gitignore
vendored
@@ -192,3 +192,7 @@ exports/
 
 # Security audit (needs revamping)
 scripts/security-audit/
+
+# Alertmanager config is per-host (contains SMTP credentials) — ship
+# alertmanager.yml.example as the template, real file lives outside git.
+monitoring/alertmanager/alertmanager.yml
@@ -1,58 +0,0 @@
-# Alertmanager Configuration for Orion Platform
-# Docs: https://prometheus.io/docs/alerting/latest/configuration/
-
-global:
-  resolve_timeout: 5m
-
-  # ─── SMTP Configuration (SendGrid) ──────────────────────────────────
-  # Sign up at sendgrid.com, create an API key, authenticate wizard.lu domain
-  # Username is literally the string "apikey", password is your SG.xxx API key
-  smtp_smarthost: 'smtp.sendgrid.net:587'  # SendGrid SMTP relay
-  smtp_from: 'alerts@wizard.lu'  # Must match authenticated domain
-  smtp_auth_username: 'apikey'  # Always "apikey" for SendGrid
-  smtp_auth_password: ''  # TODO: Paste your SG.xxx API key here
-  smtp_require_tls: true
-
-route:
-  # Group alerts by name and severity
-  group_by: ['alertname', 'severity']
-  group_wait: 30s
-  group_interval: 5m
-  repeat_interval: 4h
-  receiver: 'email-warnings'
-
-  routes:
-    # Critical alerts: repeat every 1 hour
-    - match:
-        severity: critical
-      receiver: 'email-critical'
-      repeat_interval: 1h
-
-    # Warning alerts: repeat every 4 hours
-    - match:
-        severity: warning
-      receiver: 'email-warnings'
-      repeat_interval: 4h
-
-receivers:
-  - name: 'email-critical'
-    email_configs:
-      - to: 'admin@wizard.lu'  # TODO: Replace with your alert recipient
-        send_resolved: true
-        headers:
-          Subject: '[CRITICAL] Orion: {{ .GroupLabels.alertname }}'
-
-  - name: 'email-warnings'
-    email_configs:
-      - to: 'admin@wizard.lu'  # TODO: Replace with your alert recipient
-        send_resolved: true
-        headers:
-          Subject: '[WARNING] Orion: {{ .GroupLabels.alertname }}'
-
-# Inhibition rules — suppress warnings when critical is already firing
-inhibit_rules:
-  - source_match:
-      severity: 'critical'
-    target_match:
-      severity: 'warning'
-    equal: ['alertname', 'instance']
71
monitoring/alertmanager/alertmanager.yml.example
Normal file
@@ -0,0 +1,71 @@
+# Alertmanager Configuration for Orion Platform — TEMPLATE
+# Docs: https://prometheus.io/docs/alerting/latest/configuration/
+#
+# This is the IN-REPO TEMPLATE. The real file on each host lives at
+# monitoring/alertmanager/alertmanager.yml (gitignored, never committed).
+# Copy this file to that path and fill in the CHANGEME values per
+# docs/deployment/hetzner-server-setup.md.
+
+global:
+  resolve_timeout: 5m
+
+  # ─── SMTP Configuration (mail1.myservices.hosting relay) ────────────
+  # Migrated from SendGrid to mail1.myservices.hosting on 2026-??-?? —
+  # same SMTP backend the app uses (see /admin/settings).
+  #
+  # smtp_from is set to alerts@wizard.lu for inbox routing clarity. Most
+  # SMTP relays allow the From: header to differ from the authenticated
+  # user, BUT some require them to match. If you see "550 sender not
+  # authorized" in the alertmanager logs after a reload, either:
+  #   1. Configure alerts@wizard.lu as a send-as alias on the support@
+  #      mailbox in your mail hosting control panel, or
+  #   2. Change smtp_from to 'support@wizard.lu' (less clear in inbox).
+  smtp_smarthost: 'mail1.myservices.hosting:587'
+  smtp_from: 'alerts@wizard.lu'
+  smtp_auth_username: 'support@wizard.lu'
+  smtp_auth_password: 'CHANGEME'  # The /admin/settings SMTP password. NEVER commit a real value.
+  smtp_require_tls: true
+
+route:
+  # Group alerts by name and severity
+  group_by: ['alertname', 'severity']
+  group_wait: 30s
+  group_interval: 5m
+  repeat_interval: 4h
+  receiver: 'email-warnings'
+
+  routes:
+    # Critical alerts: repeat every 1 hour
+    - match:
+        severity: critical
+      receiver: 'email-critical'
+      repeat_interval: 1h
+
+    # Warning alerts: repeat every 4 hours
+    - match:
+        severity: warning
+      receiver: 'email-warnings'
+      repeat_interval: 4h
+
+receivers:
+  - name: 'email-critical'
+    email_configs:
+      - to: 'admin@wizard.lu'  # Recipient mailbox for critical alerts
+        send_resolved: true
+        headers:
+          Subject: '[CRITICAL] Orion: {{ .GroupLabels.alertname }}'
+
+  - name: 'email-warnings'
+    email_configs:
+      - to: 'admin@wizard.lu'  # Recipient mailbox for warning alerts
+        send_resolved: true
+        headers:
+          Subject: '[WARNING] Orion: {{ .GroupLabels.alertname }}'
+
+# Inhibition rules — suppress warnings when critical is already firing
+inhibit_rules:
+  - source_match:
+      severity: 'critical'
+    target_match:
+      severity: 'warning'
+    equal: ['alertname', 'instance']