docs: update observability and deployment docs to match production stack
Update observability.md with production container table, actual init code, and correct env var names. Update docker.md with full 10-service table and backup/monitoring cross-references. Add explicit AAAA records to DNS tables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -2,6 +2,38 @@
The Orion platform includes a comprehensive observability framework for monitoring application health, collecting metrics, and tracking errors. This is part of the Framework Layer - infrastructure that modules depend on.

## Production Stack

The full monitoring stack runs as Docker containers alongside the application:

| Container | Image | Port | Purpose |
|---|---|---|---|
| prometheus | `prom/prometheus` | 9090 (localhost) | Metrics storage, 15-day retention |
| grafana | `grafana/grafana` | 3001 (localhost) | Dashboards at `https://grafana.wizard.lu` |
| node-exporter | `prom/node-exporter` | 9100 (localhost) | Host CPU/RAM/disk metrics |
| cadvisor | `gcr.io/cadvisor/cadvisor` | 8080 (localhost) | Per-container resource metrics |

All monitoring containers run under `profiles: [full]` in `docker-compose.yml` with memory limits (256 + 192 + 64 + 128 = 640 MB total).
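
The memory budget above is easy to check by hand, but a small script makes it simple to re-verify when a limit changes. This is a minimal sketch: the pairing of each limit with a container is an assumption (the figures are taken in table order, while the source only states the sum), and `MEMORY_LIMITS_MB` and `total_memory_mb` are hypothetical names, not identifiers from the repository.

```python
# Assumed pairing: limits are matched to containers in table order.
# The source text only gives the four figures and their total.
MEMORY_LIMITS_MB = {
    "prometheus": 256,
    "grafana": 192,
    "node-exporter": 64,
    "cadvisor": 128,
}


def total_memory_mb(limits: dict) -> int:
    """Return the combined memory limit of all monitoring containers in MB."""
    return sum(limits.values())


# Example: total_memory_mb(MEMORY_LIMITS_MB) gives the budget for the whole stack.
```
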

```
┌──────────────┐     scrape      ┌─────────────────┐
│  Prometheus  │◄────────────────│    Orion API    │ /metrics
│    :9090     │◄────────────────│  node-exporter  │ :9100
│              │◄────────────────│    cAdvisor     │ :8080
└──────┬───────┘                 └─────────────────┘
       │ query
┌──────▼───────┐
│   Grafana    │──── https://grafana.wizard.lu
│    :3001     │
└──────────────┘
```

Configuration files:

- `monitoring/prometheus.yml` — scrape targets (orion-api, node-exporter, cadvisor, self)
- `monitoring/grafana/provisioning/datasources/datasource.yml` — auto-provisions Prometheus
- `monitoring/grafana/provisioning/dashboards/dashboard.yml` — file-based dashboard provider
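
Once Prometheus is running, the scrape targets configured in `monitoring/prometheus.yml` can be checked through the Prometheus HTTP API (`GET /api/v1/targets`). The sketch below is illustrative only. It assumes the job names match the target names in the list above (Prometheus's self-scrape is left out because its job name is not given), that the API is reachable on the localhost port from the container table, and that `fetch_targets` and `unhealthy_jobs` are hypothetical helper names.

```python
import json
from typing import Iterable
from urllib.request import urlopen

# Assumed job names, copied from the scrape-target list above; the real
# prometheus.yml may label them differently.
EXPECTED_JOBS = frozenset({"orion-api", "node-exporter", "cadvisor"})


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the current target list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_jobs(payload: dict, expected: Iterable[str] = EXPECTED_JOBS) -> set:
    """Return the expected jobs that are missing or not reporting health 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    up = {t["labels"]["job"] for t in active if t.get("health") == "up"}
    return set(expected) - up


# Example (requires a running Prometheus):
#   print(unhealthy_jobs(fetch_targets()))
```

An empty result means every expected job has at least one target reporting `up`.
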

## Overview

```
@@ -326,47 +358,47 @@ init_observability(

### Application Lifespan

Observability is initialized in `app/core/lifespan.py` and the health router is mounted in `main.py`:

```python
# main.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from app.core.observability import (
    init_observability,
    shutdown_observability,
    health_router,
    register_module_health_checks,
)
# app/core/lifespan.py
from app.core.config import settings
from app.core.observability import init_observability, shutdown_observability

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    init_observability(
        enable_metrics=True,
        sentry_dsn=settings.SENTRY_DSN,
        environment=settings.ENVIRONMENT,
        flower_url=settings.FLOWER_URL,
        grafana_url=settings.GRAFANA_URL,
        enable_metrics=settings.enable_metrics,
        sentry_dsn=settings.sentry_dsn,
        environment=settings.sentry_environment,
        flower_url=settings.flower_url,
        grafana_url=settings.grafana_url,
    )
    register_module_health_checks()

    yield

    # Shutdown
    shutdown_observability()

app = FastAPI(lifespan=lifespan)
app.include_router(health_router)
```

```python
# main.py
from app.core.observability import health_router
app.include_router(health_router)  # /metrics, /health/live, /health/ready, /health/tools
```

Note: `/health` is defined separately in `main.py` with a richer response (DB check, feature list, docs links). The `health_router` provides the Kubernetes-style probes and Prometheus endpoint.
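
To make the split between the two kinds of endpoints concrete, here is a small smoke-test sketch that requests each route named above and reports its HTTP status. The base URL in the example comment (`http://localhost:8000`) and the helper names `endpoint_urls` and `probe` are assumptions for illustration. The sketch looks only at status codes, since the response bodies are not described here.

```python
from typing import List, Optional
from urllib.error import HTTPError
from urllib.request import urlopen

# Routes named in the text: the health_router endpoints plus the separate /health.
ENDPOINTS = ("/metrics", "/health/live", "/health/ready", "/health/tools", "/health")


def endpoint_urls(base_url: str) -> List[str]:
    """Join the base URL with each known route, tolerating a trailing slash."""
    return [base_url.rstrip("/") + path for path in ENDPOINTS]


def probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        # 4xx/5xx responses still carry a status code worth reporting.
        return exc.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return None


# Example (assumed base URL):
#   for url in endpoint_urls("http://localhost:8000"):
#       print(url, probe(url))
```
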

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `SENTRY_DSN` | Sentry DSN for error tracking | None (disabled) |
| `ENVIRONMENT` | Environment name | "development" |
| `ENABLE_METRICS` | Enable Prometheus metrics | False |
| `FLOWER_URL` | Flower dashboard URL | None |
| `GRAFANA_URL` | Grafana dashboard URL | None |

| Variable | Config field | Description | Default |
|----------|-------------|-------------|---------|
| `ENABLE_METRICS` | `enable_metrics` | Enable Prometheus metrics collection | `False` |
| `GRAFANA_URL` | `grafana_url` | Grafana dashboard URL | `https://grafana.wizard.lu` |
| `GRAFANA_ADMIN_USER` | — | Grafana admin username (docker-compose only) | `admin` |
| `GRAFANA_ADMIN_PASSWORD` | — | Grafana admin password (docker-compose only) | `changeme` |
| `SENTRY_DSN` | `sentry_dsn` | Sentry DSN for error tracking | `None` (disabled) |
| `SENTRY_ENVIRONMENT` | `sentry_environment` | Environment name for Sentry | `development` |
| `FLOWER_URL` | `flower_url` | Flower dashboard URL | `http://localhost:5555` |
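
As a reading aid for the four-column table, the sketch below shows one way the environment variables could map onto the listed config fields using only the standard library. It is not the project's `settings` implementation, which this section does not show. `ObservabilitySettings`, `from_env`, and the truthy-string parsing rule are assumptions; the defaults are copied from the table, and the two docker-compose-only Grafana variables are omitted because they have no config field.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ObservabilitySettings:
    """Hypothetical container for the config fields listed in the table."""

    enable_metrics: bool = False
    grafana_url: str = "https://grafana.wizard.lu"
    sentry_dsn: Optional[str] = None
    sentry_environment: str = "development"
    flower_url: str = "http://localhost:5555"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ObservabilitySettings":
        """Build settings from an environment mapping, falling back to table defaults."""
        # Assumed rule: "1", "true", "yes", "on" (any case) enable metrics.
        raw_flag = env.get("ENABLE_METRICS", "").strip().lower()
        return cls(
            enable_metrics=raw_flag in {"1", "true", "yes", "on"},
            grafana_url=env.get("GRAFANA_URL", cls.grafana_url),
            # An unset or empty DSN leaves Sentry disabled.
            sentry_dsn=env.get("SENTRY_DSN") or None,
            sentry_environment=env.get("SENTRY_ENVIRONMENT", cls.sentry_environment),
            flower_url=env.get("FLOWER_URL", cls.flower_url),
        )
```

Passing a plain dictionary instead of `os.environ` keeps the mapping easy to exercise in tests.
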

## Kubernetes Integration

@@ -424,6 +456,7 @@ spec:

## Related Documentation

- [Hetzner Server Setup — Step 18](../deployment/hetzner-server-setup.md#step-18-monitoring-observability) - Production monitoring deployment
- [Module System](module-system.md) - Module health check integration
- [Background Tasks](background-tasks.md) - Celery monitoring with Flower
- [Deployment](../deployment/index.md) - Production deployment with monitoring