feat(monitoring): add Redis exporter + Sentry docs to deployment guide
Some checks failed
CI / ruff (push) Successful in 10s
CI / pytest (push) Failing after 47m30s
CI / validate (push) Successful in 24s
CI / dependency-scanning (push) Successful in 29s
CI / docs (push) Has been skipped
CI / deploy (push) Has been skipped

- Add redis-exporter container to docker-compose (oliver006/redis_exporter, 32MB)
- Add Redis scrape target to Prometheus config
- Add 4 Redis alert rules: RedisDown, RedisHighMemoryUsage, RedisHighConnectionCount, RedisRejectedConnections
- Document Step 19b (Sentry Error Tracking) in Hetzner deployment guide
- Document Step 19c (Redis Monitoring) in Hetzner deployment guide
- Update resource budget and port reference tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:30:18 +01:00
parent ce822af883
commit 35d1559162
54 changed files with 664 additions and 343 deletions
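
The docker-compose change listed in the commit message is not part of the diff shown here. As a rough sketch only, a redis-exporter service could look like the fragment below. The image name and port 9121 come from this commit; the `redis` hostname, the Redis port, and the restart policy are assumptions, not the commit's actual values.

```yaml
# Hypothetical compose fragment; not the entry added by this commit.
services:
  redis-exporter:
    image: oliver006/redis_exporter      # image named in the commit message
    environment:
      # REDIS_ADDR tells the exporter which Redis instance to scrape.
      # The "redis" hostname and port 6379 are assumed here.
      REDIS_ADDR: "redis://redis:6379"
    expose:
      - "9121"                           # matches the Prometheus target redis-exporter:9121
    restart: unless-stopped              # assumed policy
```

Using `expose` rather than `ports` keeps the metrics endpoint reachable only from other containers on the same compose network, which is all the Prometheus scrape target needs.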


@@ -45,6 +45,13 @@ scrape_configs:
        labels:
          service: "prometheus"

  # Redis Exporter - Redis memory, connections, command stats
  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter:9121"]
        labels:
          service: "redis"

  # Alertmanager
  - job_name: "alertmanager"
    static_configs:


@@ -125,6 +125,47 @@ groups:
          summary: "Celery queue backlog exceeding 100 tasks"
          description: "Queue {{ $labels.queue }} has {{ $value | printf \"%.0f\" }} pending tasks for 10 minutes."

  # =========================================================================
  # REDIS ALERTS (redis-exporter)
  # =========================================================================
  - name: redis
    rules:
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis is down"
          description: "Redis exporter cannot connect to Redis for 1 minute. Background tasks (emails, Celery) are not processing."

      - alert: RedisHighMemoryUsage
        expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 80 and redis_memory_max_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 80%"
          description: "Redis is using {{ $value | printf \"%.1f\" }}% of its max memory. Consider increasing maxmemory (and the container mem_limit) or investigating the queue backlog."

      - alert: RedisHighConnectionCount
        expr: redis_connected_clients > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis has {{ $value }} connected clients"
          description: "Unusually high number of Redis connections. This may indicate a connection leak."

      - alert: RedisRejectedConnections
        expr: increase(redis_rejected_connections_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Redis is rejecting connections"
          description: "Redis rejected {{ $value | printf \"%.0f\" }} connections in the last 5 minutes. Clients cannot connect."

  # =========================================================================
  # PROMETHEUS SELF-MONITORING
  # =========================================================================
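
One way to sanity-check the RedisDown rule above is a `promtool test rules` unit test. The fragment below is a sketch, not part of this commit: both file names are hypothetical, and the `job` and `service` labels are assumed to match the scrape config in the first hunk.

```yaml
# redis_alerts_test.yml (hypothetical name)
# Run with: promtool test rules redis_alerts_test.yml
rule_files:
  - alerts.yml            # assumed name of the rules file changed in this commit

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # The exporter reports Redis as unreachable for the whole test window.
      - series: 'redis_up{job="redis", service="redis"}'
        values: '0 0 0 0 0'
    alert_rule_test:
      # With "for: 1m", the alert should be firing by the 3-minute mark.
      - eval_time: 3m
        alertname: RedisDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: redis
              service: redis
            exp_annotations:
              summary: "Redis is down"
              description: "Redis exporter cannot connect to Redis for 1 minute. Background tasks (emails, Celery) are not processing."
```

The same pattern should extend to the other three rules by supplying input series for the relevant metrics, for example a `redis_connected_clients` series held above 50 for more than five minutes.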