feat(infra): add alerting, network segmentation, and ops docs (Steps 19-24)
All checks were successful
- Prometheus alert rules (host, container, API, Celery, target-down)
- Alertmanager with email routing (critical 1h, warning 4h repeat)
- Docker network segmentation (frontend/backend/monitoring)
- Incident response runbook with 8 copy-paste runbooks
- Environment variables reference (55+ vars documented)
- Hetzner setup docs updated with Steps 19-24
- Launch readiness updated with Feb 2026 infrastructure status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
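
For orientation, the email routing named in the commit message (critical alerts repeating every 1h, warnings every 4h) could be expressed in an Alertmanager config roughly as follows. This is a minimal sketch, not the file from this commit: the receiver name, SMTP host, addresses, `group_by` labels, and the `severity` label values are all assumptions chosen for illustration.

```yaml
# Hypothetical alertmanager.yml sketch; every name and address is a placeholder.
global:
  smtp_smarthost: "smtp.example.com:587"   # assumed SMTP relay
  smtp_from: "alerts@example.com"          # assumed sender address

route:
  receiver: ops-email                      # fallback receiver for unmatched alerts
  group_by: ["alertname", "service"]
  routes:
    # Critical alerts are re-sent every hour while still firing.
    - matchers:
        - 'severity="critical"'
      receiver: ops-email
      repeat_interval: 1h
    # Warnings are re-sent every four hours while still firing.
    - matchers:
        - 'severity="warning"'
      receiver: ops-email
      repeat_interval: 4h

receivers:
  - name: ops-email
    email_configs:
      - to: "ops@example.com"              # assumed recipient
```

Routing on a `severity` label only works if the alert rules attach that label, so the label names here would need to match whatever the real rule file uses.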
@@ -5,6 +5,16 @@ global:
   scrape_interval: 15s
   evaluation_interval: 15s
 
+# ─── Alerting ────────────────────────────────────────────────────────────
+alerting:
+  alertmanagers:
+    - static_configs:
+        - targets: ["alertmanager:9093"]
+
+rule_files:
+  - /etc/prometheus/alert.rules.yml
+
+# ─── Scrape Configs ─────────────────────────────────────────────────────
 scrape_configs:
   # Orion API — /metrics endpoint (prometheus_client)
   - job_name: "orion-api"
@@ -34,3 +44,10 @@ scrape_configs:
       - targets: ["localhost:9090"]
         labels:
           service: "prometheus"
+
+  # Alertmanager
+  - job_name: "alertmanager"
+    static_configs:
+      - targets: ["alertmanager:9093"]
+        labels:
+          service: "alertmanager"
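
The `rule_files` entry in the diff points Prometheus at `/etc/prometheus/alert.rules.yml`, whose contents are not shown on this page. As a hedged illustration of the "target-down" category listed in the commit message, a rule in that file might look like the sketch below. The group name, alert name, `for` duration, and annotation text are assumptions, not the rules actually committed.

```yaml
# Hypothetical excerpt of alert.rules.yml; names and thresholds are illustrative only.
groups:
  - name: target-down
    rules:
      - alert: TargetDown
        # `up` is set to 0 by Prometheus when a scrape of the target fails.
        expr: up == 0
        # Wait two minutes before firing to ride out brief restarts (assumed value).
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }} ({{ $labels.instance }}) is down"
```

Because the diff also adds an `alertmanager` scrape job, a rule of this shape would cover Alertmanager itself along with the other scraped targets.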