docs: update observability and deployment docs to match production stack

Update observability.md with production container table, actual init code,
and correct env var names. Update docker.md with full 10-service table and
backup/monitoring cross-references. Add explicit AAAA records to DNS tables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:44:05 +01:00
parent 10aa75aa69
commit 677e5211f9
4 changed files with 115 additions and 47 deletions


@@ -2,6 +2,38 @@
The Orion platform includes a comprehensive observability framework for monitoring application health, collecting metrics, and tracking errors. This is part of the Framework Layer - infrastructure that modules depend on.
## Production Stack
The full monitoring stack runs as Docker containers alongside the application:
| Container | Image | Port | Purpose |
|---|---|---|---|
| prometheus | `prom/prometheus` | 9090 (localhost) | Metrics storage, 15-day retention |
| grafana | `grafana/grafana` | 3001 (localhost) | Dashboards at `https://grafana.wizard.lu` |
| node-exporter | `prom/node-exporter` | 9100 (localhost) | Host CPU/RAM/disk metrics |
| cadvisor | `gcr.io/cadvisor/cadvisor` | 8080 (localhost) | Per-container resource metrics |
All monitoring containers run under `profiles: [full]` in `docker-compose.yml` with memory limits (256 + 192 + 64 + 128 = 640 MB total).
```
┌──────────────┐     scrape      ┌─────────────────┐
│  Prometheus  │◄────────────────│ Orion API       │ /metrics
│  :9090       │◄────────────────│ node-exporter   │ :9100
│              │◄────────────────│ cAdvisor        │ :8080
└──────┬───────┘                 └─────────────────┘
       │ query
┌──────▼───────┐
│  Grafana     │──── https://grafana.wizard.lu
│  :3001       │
└──────────────┘
```
Configuration files:
- `monitoring/prometheus.yml` — scrape targets (orion-api, node-exporter, cadvisor, self)
- `monitoring/grafana/provisioning/datasources/datasource.yml` — auto-provisions Prometheus
- `monitoring/grafana/provisioning/dashboards/dashboard.yml` — file-based dashboard provider
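One way to confirm that the scrape targets listed above are actually being collected is to ask Prometheus for its built-in `up` metric. The sketch below is illustrative only: it assumes Prometheus is reachable at `http://localhost:9090` (the port from the container table), and the helper names are hypothetical. The contents of `monitoring/prometheus.yml` are not shown here, so the job and instance labels you see will depend on that file.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus listens on localhost:9090, as in the container table.
PROMETHEUS_URL = "http://localhost:9090"


def build_up_query_url(base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the built-in `up` metric."""
    return f"{base_url}/api/v1/query?{urlencode({'query': 'up'})}"


def down_targets(payload: dict) -> list:
    """Return 'job/instance' for every target whose `up` value is not 1."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload!r}")
    down = []
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        # Each sample value is a [timestamp, "string value"] pair.
        _timestamp, value = sample["value"]
        if float(value) != 1.0:
            down.append(f"{labels.get('job', '?')}/{labels.get('instance', '?')}")
    return sorted(down)


def check_targets(base_url: str = PROMETHEUS_URL) -> list:
    """Query a live Prometheus server; needs network access to it."""
    with urlopen(build_up_query_url(base_url), timeout=5) as response:
        return down_targets(json.load(response))
```

Run on the host, `check_targets()` should return an empty list when every target is healthy; a non-empty result names the jobs Prometheus could not scrape.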
## Overview
```
@@ -326,47 +358,47 @@ init_observability(
### Application Lifespan
Observability is initialized in `app/core/lifespan.py` and the health router is mounted in `main.py`:
```python
# app/core/lifespan.py
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.core.config import settings
from app.core.observability import init_observability, shutdown_observability


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: configure metrics, Sentry, and dashboard links from settings
    init_observability(
        enable_metrics=settings.enable_metrics,
        sentry_dsn=settings.sentry_dsn,
        environment=settings.sentry_environment,
        flower_url=settings.flower_url,
        grafana_url=settings.grafana_url,
    )
    yield
    # Shutdown
    shutdown_observability()
```
```python
# main.py
from app.core.observability import health_router
app.include_router(health_router) # /metrics, /health/live, /health/ready, /health/tools
```
Note: `/health` is defined separately in `main.py` with a richer response (DB check, feature list, docs links). The `health_router` provides the Kubernetes-style probes and Prometheus endpoint.
### Environment Variables
| Variable | Config field | Description | Default |
|----------|-------------|-------------|---------|
| `ENABLE_METRICS` | `enable_metrics` | Enable Prometheus metrics collection | `False` |
| `GRAFANA_URL` | `grafana_url` | Grafana dashboard URL | `https://grafana.wizard.lu` |
| `GRAFANA_ADMIN_USER` | — | Grafana admin username (docker-compose only) | `admin` |
| `GRAFANA_ADMIN_PASSWORD` | — | Grafana admin password (docker-compose only) | `changeme` |
| `SENTRY_DSN` | `sentry_dsn` | Sentry DSN for error tracking | `None` (disabled) |
| `SENTRY_ENVIRONMENT` | `sentry_environment` | Environment name for Sentry | `development` |
| `FLOWER_URL` | `flower_url` | Flower dashboard URL | `http://localhost:5555` |
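As a rough illustration of how the variables in the table map onto config fields, the sketch below reads them from the environment using only the standard library. `ObservabilitySettings` and `load_settings` are hypothetical names, the accepted truthy strings for `ENABLE_METRICS` are an assumption, and the real `app.core.config.settings` object may be built differently (for example with a settings library). The two `GRAFANA_ADMIN_*` variables are omitted because the table marks them as docker-compose only.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these strings count as "true" for ENABLE_METRICS.
_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ObservabilitySettings:
    # Defaults mirror the "Default" column of the table.
    enable_metrics: bool = False
    grafana_url: str = "https://grafana.wizard.lu"
    sentry_dsn: Optional[str] = None
    sentry_environment: str = "development"
    flower_url: str = "http://localhost:5555"


def load_settings(env: Mapping[str, str] = os.environ) -> ObservabilitySettings:
    """Build settings from an environment mapping, falling back to defaults."""
    defaults = ObservabilitySettings()
    return ObservabilitySettings(
        enable_metrics=env.get("ENABLE_METRICS", "").strip().lower() in _TRUE_VALUES,
        grafana_url=env.get("GRAFANA_URL", defaults.grafana_url),
        # An empty SENTRY_DSN is treated as "disabled", like an unset one.
        sentry_dsn=env.get("SENTRY_DSN") or None,
        sentry_environment=env.get("SENTRY_ENVIRONMENT", defaults.sentry_environment),
        flower_url=env.get("FLOWER_URL", defaults.flower_url),
    )
```

Passing a plain dictionary instead of `os.environ` makes the mapping easy to exercise in tests, for example `load_settings({"ENABLE_METRICS": "true"})`.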
## Kubernetes Integration
@@ -424,6 +456,7 @@ spec:
## Related Documentation
- [Hetzner Server Setup — Step 18](../deployment/hetzner-server-setup.md#step-18-monitoring-observability) - Production monitoring deployment
- [Module System](module-system.md) - Module health check integration
- [Background Tasks](background-tasks.md) - Celery monitoring with Flower
- [Deployment](../deployment/index.md) - Production deployment with monitoring