feat(infra): add alerting, network segmentation, and ops docs (Steps 19-24)
- Prometheus alert rules (host, container, API, Celery, target-down)
- Alertmanager with email routing (critical 1h, warning 4h repeat)
- Docker network segmentation (frontend/backend/monitoring)
- Incident response runbook with 8 copy-paste runbooks
- Environment variables reference (55+ vars documented)
- Hetzner setup docs updated with Steps 19-24
- Launch readiness updated with Feb 2026 infrastructure status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
docs/deployment/environment.md (new file, 377 lines)
@@ -0,0 +1,377 @@
|
||||
# Environment Variables Reference
|
||||
|
||||
All configuration for the Orion platform is managed through environment variables, loaded
|
||||
via Pydantic Settings from an `.env` file or the process environment. This page provides a
|
||||
complete reference for every variable recognised by `app/core/config.py`.
|
||||
|
||||
Variables are read at startup and exposed through the `settings` singleton. In most cases
|
||||
the defaults are tuned for local development; production deployments **must** override the
|
||||
security-sensitive values listed in the [Production Checklist](#production-checklist) at the
|
||||
bottom of this page.
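
Values set in the process environment take precedence over the `.env` file (standard
Pydantic Settings behaviour), which makes one-off overrides easy. A minimal sketch,
assuming the API is started with Uvicorn against an `app.main:app` entrypoint (adjust
to whatever the project's actual entrypoint is):

```bash
# Override two settings for a single run without editing .env
# (app.main:app is an assumed module path -- change it to match the project)
DEBUG=False LOG_LEVEL=WARNING uvicorn app.main:app --host 0.0.0.0 --port 8000
```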
|
||||
|
||||
---
|
||||
|
||||
## Core / Project
|
||||
|
||||
Metadata used in the OpenAPI schema and health endpoints.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `PROJECT_NAME` | Display name shown in API docs and health responses | `Orion - Multi-Store Marketplace Platform` | No |
|
||||
| `VERSION` | Semantic version reported by the platform | `2.2.0` | No |
|
||||
|
||||
---
|
||||
|
||||
## Database
|
||||
|
||||
!!! danger "Production requirement"
|
||||
You **must** set `DATABASE_URL` to a real PostgreSQL connection string in every
|
||||
non-development environment. The default value contains a placeholder password and
|
||||
should never be used in production.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `DATABASE_URL` | PostgreSQL connection string (`postgresql://user:pass@host:port/db`) | `postgresql://orion_user:secure_password@localhost:5432/orion_db` | **Yes** |
|
||||
|
||||
---
|
||||
|
||||
## Admin Initialisation
|
||||
|
||||
Used by `init_production.py` and the database seeder to create the initial platform
|
||||
administrator account.
|
||||
|
||||
!!! warning "Change the default password"
|
||||
The default `ADMIN_PASSWORD` is `admin123`. The production validation check will emit a
|
||||
warning if this value is left unchanged.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `ADMIN_EMAIL` | Email address for the initial admin account | `admin@orion.lu` | No |
|
||||
| `ADMIN_USERNAME` | Username for the initial admin account | `admin` | No |
|
||||
| `ADMIN_PASSWORD` | Password for the initial admin account | `admin123` | No (but **must change** in production) |
|
||||
| `ADMIN_FIRST_NAME` | First name of the admin user | `Platform` | No |
|
||||
| `ADMIN_LAST_NAME` | Last name of the admin user | `Administrator` | No |
|
||||
|
||||
---
|
||||
|
||||
## JWT Authentication
|
||||
|
||||
Controls JSON Web Token generation and expiry.
|
||||
|
||||
!!! danger "Production requirement"
|
||||
`JWT_SECRET_KEY` **must** be replaced with a strong random value. Generate one with:
|
||||
|
||||
```bash
|
||||
openssl rand -hex 32
|
||||
```
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `JWT_SECRET_KEY` | Secret used to sign and verify JWTs | `change-this-in-production` | **Yes** |
|
||||
| `JWT_EXPIRE_HOURS` | Hours component of the token lifetime | `24` | No |
|
||||
| `JWT_EXPIRE_MINUTES` | Minutes component of the token lifetime | `30` | No |
|
||||
|
||||
---
|
||||
|
||||
## API Server
|
||||
|
||||
Settings passed to Uvicorn when the application starts.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `API_HOST` | Bind address for the API server | `0.0.0.0` | No |
|
||||
| `API_PORT` | Port the API server listens on | `8000` | No |
|
||||
| `DEBUG` | Enable debug mode (extra logging, auto-reload) | `True` | No (set `False` in production) |
|
||||
|
||||
---
|
||||
|
||||
## Documentation
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `DOCUMENTATION_URL` | URL where the MkDocs site is served | `http://localhost:8001` | No |
|
||||
|
||||
---
|
||||
|
||||
## Security / Middleware
|
||||
|
||||
!!! warning "Restrict allowed hosts"
|
||||
The default `ALLOWED_HOSTS` value of `["*"]` accepts requests with any `Host` header.
|
||||
In production, restrict this to your actual domain names.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `ALLOWED_HOSTS` | JSON list of permitted `Host` header values | `["*"]` | No (but **restrict** in production) |
|
||||
| `RATE_LIMIT_ENABLED` | Enable request rate limiting | `True` | No |
|
||||
| `RATE_LIMIT_REQUESTS` | Maximum number of requests per window | `100` | No |
|
||||
| `RATE_LIMIT_WINDOW` | Rate limit window duration in seconds | `3600` | No |
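
To see the limiter in action, a quick sketch that fires more requests than the default
window allows, assuming the limiter covers the `/health` route and rejects excess
requests with an HTTP error status (commonly 429; the exact code depends on the
middleware). Note that this uses up the whole window for your client IP:

```bash
# Send 110 requests (default limit is 100 per 3600 s) and tally the status codes
for i in $(seq 1 110); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/health
done | sort | uniq -c
```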
|
||||
|
||||
---
|
||||
|
||||
## Logging
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `LOG_LEVEL` | Python log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`) | `INFO` | No |
|
||||
| `LOG_FILE` | Path to a log file; `None` means stdout only | `None` | No |
|
||||
|
||||
---
|
||||
|
||||
## Platform Domain
|
||||
|
||||
Controls the base domain for store subdomains and custom-domain features.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `PLATFORM_DOMAIN` | Root domain under which store subdomains are created | `wizard.lu` | No |
|
||||
| `ALLOW_CUSTOM_DOMAINS` | Allow stores to use their own domain names | `True` | No |
|
||||
| `REQUIRE_DOMAIN_VERIFICATION` | Require DNS verification before activating a custom domain | `True` | No |
|
||||
| `SSL_PROVIDER` | SSL certificate provider (`letsencrypt`, `cloudflare`, `manual`) | `letsencrypt` | No |
|
||||
| `AUTO_PROVISION_SSL` | Automatically provision SSL certificates for custom domains | `False` | No |
|
||||
| `DNS_VERIFICATION_PREFIX` | TXT record prefix used for domain ownership verification | `_orion-verify` | No |
|
||||
| `DNS_VERIFICATION_TTL` | TTL in seconds for DNS verification records | `3600` | No |
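
As a sketch of what verification looks like from the outside, assuming the TXT record is
published at `<DNS_VERIFICATION_PREFIX>.<custom-domain>` (the exact record name and
expected value come from the application's domain-verification flow, and
`shop.example.com` is a hypothetical store domain):

```bash
# Check that the ownership TXT record is visible in public DNS
dig _orion-verify.shop.example.com TXT +short
```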
|
||||
|
||||
---
|
||||
|
||||
## Platform Limits
|
||||
|
||||
Guard-rails for multi-tenant resource usage.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `MAX_STORES_PER_USER` | Maximum number of stores a single user can own | `5` | No |
|
||||
| `MAX_TEAM_MEMBERS_PER_STORE` | Maximum team members allowed per store | `50` | No |
|
||||
| `INVITATION_EXPIRY_DAYS` | Days before a team invitation link expires | `7` | No |
|
||||
|
||||
---
|
||||
|
||||
## Stripe Billing
|
||||
|
||||
!!! info "Required for payments"
|
||||
All three Stripe keys must be set to enable subscription billing and payment
|
||||
processing. Obtain them from the [Stripe Dashboard](https://dashboard.stripe.com/apikeys).
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `STRIPE_SECRET_KEY` | Stripe secret API key | `""` (empty) | Yes (for payments) |
|
||||
| `STRIPE_PUBLISHABLE_KEY` | Stripe publishable API key | `""` (empty) | Yes (for payments) |
|
||||
| `STRIPE_WEBHOOK_SECRET` | Stripe webhook signing secret | `""` (empty) | Yes (for payments) |
|
||||
| `STRIPE_TRIAL_DAYS` | Length of the free trial period in days | `30` | No |
|
||||
|
||||
---
|
||||
|
||||
## Email Configuration
|
||||
|
||||
Orion supports multiple email providers. Set `EMAIL_PROVIDER` to choose one, then
|
||||
configure the matching provider-specific variables below.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `EMAIL_PROVIDER` | Email transport backend (`smtp`, `sendgrid`, `mailgun`, `ses`) | `smtp` | No |
|
||||
| `EMAIL_FROM_ADDRESS` | Sender address for outgoing emails | `noreply@orion.lu` | No |
|
||||
| `EMAIL_FROM_NAME` | Sender display name | `Orion` | No |
|
||||
| `EMAIL_REPLY_TO` | Optional reply-to address | `""` (empty) | No |
|
||||
| `EMAIL_ENABLED` | Master switch to enable/disable all outgoing email | `True` | No |
|
||||
| `EMAIL_DEBUG` | Log emails to console instead of sending (development only) | `False` | No |
|
||||
|
||||
### SMTP Settings
|
||||
|
||||
Used when `EMAIL_PROVIDER=smtp`.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `SMTP_HOST` | SMTP server hostname | `localhost` | No |
|
||||
| `SMTP_PORT` | SMTP server port | `587` | No |
|
||||
| `SMTP_USER` | SMTP authentication username | `""` (empty) | No |
|
||||
| `SMTP_PASSWORD` | SMTP authentication password | `""` (empty) | No |
|
||||
| `SMTP_USE_TLS` | Use STARTTLS (port 587) | `True` | No |
|
||||
| `SMTP_USE_SSL` | Use implicit SSL (port 465) | `False` | No |
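
Before pointing the platform at an SMTP server, it can help to confirm the server
actually speaks TLS on the configured port. A sketch using `openssl`, with
`smtp.example.com` as a placeholder host:

```bash
# STARTTLS on port 587 (matches SMTP_USE_TLS=True)
openssl s_client -starttls smtp -connect smtp.example.com:587 -crlf -quiet </dev/null

# Implicit SSL on port 465 (matches SMTP_USE_SSL=True)
openssl s_client -connect smtp.example.com:465 -crlf -quiet </dev/null
```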
|
||||
|
||||
### SendGrid Settings
|
||||
|
||||
Used when `EMAIL_PROVIDER=sendgrid`.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `SENDGRID_API_KEY` | SendGrid API key | `""` (empty) | Yes (if using SendGrid) |
|
||||
|
||||
### Mailgun Settings
|
||||
|
||||
Used when `EMAIL_PROVIDER=mailgun`.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `MAILGUN_API_KEY` | Mailgun API key | `""` (empty) | Yes (if using Mailgun) |
|
||||
| `MAILGUN_DOMAIN` | Mailgun sending domain | `""` (empty) | Yes (if using Mailgun) |
|
||||
|
||||
### Amazon SES Settings
|
||||
|
||||
Used when `EMAIL_PROVIDER=ses`.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `AWS_ACCESS_KEY_ID` | AWS access key for SES | `""` (empty) | Yes (if using SES) |
|
||||
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for SES | `""` (empty) | Yes (if using SES) |
|
||||
| `AWS_REGION` | AWS region for the SES endpoint | `eu-west-1` | No |
|
||||
|
||||
---
|
||||
|
||||
## Storefront Defaults
|
||||
|
||||
Default locale and currency applied to new storefronts. Individual stores can override
|
||||
these through the admin interface or the `AdminSetting` database table.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `DEFAULT_STOREFRONT_LOCALE` | Locale code for currency and number formatting | `fr-LU` | No |
|
||||
| `DEFAULT_CURRENCY` | ISO 4217 currency code | `EUR` | No |
|
||||
|
||||
---
|
||||
|
||||
## Seed Data
|
||||
|
||||
Controls the volume of demo data generated by the database seeder.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `SEED_DEMO_STORES` | Number of demo stores to create | `3` | No |
|
||||
| `SEED_CUSTOMERS_PER_STORE` | Number of demo customers per store | `15` | No |
|
||||
| `SEED_PRODUCTS_PER_STORE` | Number of demo products per store | `20` | No |
|
||||
| `SEED_ORDERS_PER_STORE` | Number of demo orders per store | `10` | No |
|
||||
|
||||
---
|
||||
|
||||
## Celery / Redis
|
||||
|
||||
Background task processing. When `USE_CELERY` is `False`, tasks fall back to FastAPI's
|
||||
built-in `BackgroundTasks`.
|
||||
|
||||
!!! tip "Enable Celery in production"
|
||||
Set `USE_CELERY=True` and ensure a Redis instance is reachable at `REDIS_URL` for
|
||||
reliable background task processing.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `REDIS_URL` | Redis connection string used as Celery broker and result backend | `redis://localhost:6379/0` | No |
|
||||
| `USE_CELERY` | Use Celery for background tasks instead of FastAPI BackgroundTasks | `False` | No (set `True` in production) |
|
||||
| `FLOWER_URL` | URL of the Flower monitoring dashboard | `http://localhost:5555` | No |
|
||||
| `FLOWER_PASSWORD` | Password for Flower authentication | `changeme` | No (but **change** in production) |
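
A quick sanity check that the broker and workers are reachable, assuming `redis-cli` is
available on the host and that `app.core.celery_app` is the Celery application path used
elsewhere in these docs:

```bash
# Broker reachable at the configured URL
redis-cli -u "$REDIS_URL" ping        # expect: PONG

# Workers responding (run where the application code is importable,
# e.g. inside the api container)
celery -A app.core.celery_app inspect ping
```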
|
||||
|
||||
---
|
||||
|
||||
## Sentry
|
||||
|
||||
Error tracking and performance monitoring via [Sentry](https://sentry.io).
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `SENTRY_DSN` | Sentry Data Source Name; `None` disables Sentry | `None` | No |
|
||||
| `SENTRY_ENVIRONMENT` | Environment tag sent with events (`development`, `staging`, `production`) | `development` | No |
|
||||
| `SENTRY_TRACES_SAMPLE_RATE` | Fraction of transactions sampled for performance monitoring (0.0--1.0) | `0.1` | No |
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
Prometheus metrics and Grafana dashboard integration.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `ENABLE_METRICS` | Expose a `/metrics` endpoint for Prometheus scraping | `False` | No (set `True` in production) |
|
||||
| `GRAFANA_URL` | URL of the Grafana instance | `https://grafana.wizard.lu` | No |
|
||||
| `GRAFANA_ADMIN_USER` | Grafana admin username | `admin` | No |
|
||||
| `GRAFANA_ADMIN_PASSWORD` | Grafana admin password | `""` (empty) | No |
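
With `ENABLE_METRICS=True`, the API exposes Prometheus-format metrics on `/metrics`. A
quick check, assuming the default `API_PORT` of 8000 (the production stack publishes the
API on 8001 instead):

```bash
curl -s http://localhost:8000/metrics | head -20
```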
|
||||
|
||||
---
|
||||
|
||||
## Cloudflare R2 Storage
|
||||
|
||||
Object storage for media uploads. When `STORAGE_BACKEND` is `local`, files are stored on
|
||||
the server filesystem.
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `STORAGE_BACKEND` | Storage backend to use (`local` or `r2`) | `local` | No |
|
||||
| `R2_ACCOUNT_ID` | Cloudflare account ID | `None` | Yes (if using R2) |
|
||||
| `R2_ACCESS_KEY_ID` | R2 API access key | `None` | Yes (if using R2) |
|
||||
| `R2_SECRET_ACCESS_KEY` | R2 API secret key | `None` | Yes (if using R2) |
|
||||
| `R2_BUCKET_NAME` | R2 bucket name | `orion-media` | No |
|
||||
| `R2_PUBLIC_URL` | Custom public URL for media access (e.g. `https://media.yoursite.com`) | `None` | No |
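
Before switching `STORAGE_BACKEND` to `r2`, it is worth confirming the credentials and
bucket with the AWS CLI pointed at the R2 endpoint. A sketch, assuming an `aws` profile
named `r2` configured with the keys above (the backup scripts in this guide use the same
convention):

```bash
aws s3 ls "s3://orion-media" \
  --endpoint-url "https://<R2_ACCOUNT_ID>.r2.cloudflarestorage.com" \
  --profile r2
```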
|
||||
|
||||
---
|
||||
|
||||
## Cloudflare CDN / Proxy
|
||||
|
||||
| Variable | Description | Default | Required |
|
||||
|---|---|---|---|
|
||||
| `CLOUDFLARE_ENABLED` | Set to `True` when the application sits behind Cloudflare proxy (adjusts trusted-proxy headers) | `False` | No (set `True` when proxied) |
|
||||
|
||||
---
|
||||
|
||||
## Production Checklist
|
||||
|
||||
Before deploying to production, ensure the following variables are set correctly. Items
|
||||
marked **critical** will trigger a startup warning if left at their default values.
|
||||
|
||||
!!! danger "Critical -- must change"
|
||||
- [x] `DATABASE_URL` -- point to a production PostgreSQL instance
|
||||
- [x] `JWT_SECRET_KEY` -- generate with `openssl rand -hex 32`
|
||||
- [x] `ADMIN_PASSWORD` -- choose a strong, unique password
|
||||
- [x] `DEBUG` -- set to `False`
|
||||
- [x] `ALLOWED_HOSTS` -- restrict to your domain(s)
|
||||
|
||||
!!! warning "Strongly recommended"
|
||||
- [x] `USE_CELERY` -- set to `True` with a production Redis instance
|
||||
- [x] `FLOWER_PASSWORD` -- change from the default `changeme`
|
||||
- [x] `ENABLE_METRICS` -- set to `True` for observability
|
||||
- [x] `SENTRY_DSN` -- configure for error tracking
|
||||
- [x] `SENTRY_ENVIRONMENT` -- set to `production`
|
||||
- [x] `STORAGE_BACKEND` -- set to `r2` for scalable media storage
|
||||
- [x] `CLOUDFLARE_ENABLED` -- set to `True` if behind Cloudflare proxy
|
||||
|
||||
!!! info "Required for specific features"
|
||||
- [x] **Payments:** `STRIPE_SECRET_KEY`, `STRIPE_PUBLISHABLE_KEY`, `STRIPE_WEBHOOK_SECRET`
|
||||
- [x] **Email (SendGrid):** `SENDGRID_API_KEY`
|
||||
- [x] **Email (Mailgun):** `MAILGUN_API_KEY`, `MAILGUN_DOMAIN`
|
||||
- [x] **Email (SES):** `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
|
||||
- [x] **R2 Storage:** `R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`
|
||||
|
||||
### Example `.env` file (production)
|
||||
|
||||
```bash
|
||||
# Core
|
||||
DATABASE_URL=postgresql://orion:STRONG_PASSWORD@db.internal:5432/orion
|
||||
JWT_SECRET_KEY=a13f...your-random-hex-here...9c2b
|
||||
DEBUG=False
|
||||
ALLOWED_HOSTS=["wizard.lu","*.wizard.lu"]
|
||||
|
||||
# Admin
|
||||
ADMIN_PASSWORD=your-strong-admin-password
|
||||
|
||||
# Celery / Redis
|
||||
REDIS_URL=redis://redis.internal:6379/0
|
||||
USE_CELERY=True
|
||||
FLOWER_PASSWORD=a-secure-flower-password
|
||||
|
||||
# Stripe
|
||||
STRIPE_SECRET_KEY=sk_live_...
|
||||
STRIPE_PUBLISHABLE_KEY=pk_live_...
|
||||
STRIPE_WEBHOOK_SECRET=whsec_...
|
||||
|
||||
# Email (example: SendGrid)
|
||||
EMAIL_PROVIDER=sendgrid
|
||||
SENDGRID_API_KEY=SG....
|
||||
|
||||
# R2 Storage
|
||||
STORAGE_BACKEND=r2
|
||||
R2_ACCOUNT_ID=your-account-id
|
||||
R2_ACCESS_KEY_ID=your-access-key
|
||||
R2_SECRET_ACCESS_KEY=your-secret-key
|
||||
R2_PUBLIC_URL=https://media.wizard.lu
|
||||
|
||||
# Monitoring
|
||||
ENABLE_METRICS=True
|
||||
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0
|
||||
SENTRY_ENVIRONMENT=production
|
||||
CLOUDFLARE_ENABLED=True
|
||||
```
|
||||
|
||||
@@ -90,6 +90,18 @@ Complete step-by-step guide for deploying Orion on a Hetzner Cloud VPS.
|
||||
|
||||
**Steps 1–18 fully complete.** All infrastructure operational.
|
||||
|
||||
!!! success "Progress — 2026-02-15 (continued)"
|
||||
**Completed (Steps 19–24):**
|
||||
|
||||
- **Step 19: Prometheus Alerting** — alert rules (host, container, API, Celery, targets) + Alertmanager with email routing
|
||||
- **Step 20: Security Hardening** — Docker network segmentation (frontend/backend/monitoring), fail2ban config, unattended-upgrades
|
||||
- **Step 21: Cloudflare Domain Proxy** — origin certificates, WAF, bot protection, rate limiting (documented, user deploys)
|
||||
- **Step 22: Incident Response** — 8 runbooks with copy-paste commands, severity levels, decision tree
|
||||
- **Step 23: Environment Reference** — all 55+ env vars documented with defaults and production requirements
|
||||
- **Step 24: Documentation Updates** — hetzner docs, launch readiness, mkdocs nav updated
|
||||
|
||||
**Steps 1–24 fully complete.** Enterprise infrastructure hardening done.
|
||||
|
||||
|
||||
## Installed Software Versions
|
||||
|
||||
@@ -1106,6 +1118,372 @@ docker stats --no-stream
|
||||
|
||||
---
|
||||
|
||||
## Step 19: Prometheus Alerting
|
||||
|
||||
Alert rules and Alertmanager for email notifications when things go wrong.
|
||||
|
||||
### 19.1 Architecture
|
||||
|
||||
```
|
||||
┌──────────────┐ evaluates ┌───────────────────┐
|
||||
│ Prometheus │─────────────►│ alert.rules.yml │
|
||||
│ :9090 │ │ (host, container, │
|
||||
│ │ │ API, Celery) │
|
||||
└──────┬───────┘ └───────────────────┘
|
||||
│ fires alerts
|
||||
┌──────▼───────┐
|
||||
│ Alertmanager │──── email ──► admin@wizard.lu
|
||||
│ :9093 │
|
||||
└──────────────┘
|
||||
```
|
||||
|
||||
### 19.2 Alert Rules
|
||||
|
||||
Alert rules are defined in `monitoring/prometheus/alert.rules.yml`:
|
||||
|
||||
| Group | Alert | Condition | Severity |
|
||||
|---|---|---|---|
|
||||
| Host | HostHighCpuUsage | CPU >80% for 5m | warning |
|
||||
| Host | HostHighMemoryUsage | Memory >85% for 5m | warning |
|
||||
| Host | HostHighDiskUsage | Disk >80% | warning |
|
||||
| Host | HostDiskFullPrediction | Disk full within 4h | critical |
|
||||
| Containers | ContainerHighRestartCount | >3 restarts/hour | critical |
|
||||
| Containers | ContainerOomKilled | Any OOM kill | critical |
|
||||
| Containers | ContainerHighCpu | >80% CPU for 5m | warning |
|
||||
| API | ApiHighErrorRate | 5xx rate >1% for 5m | critical |
|
||||
| API | ApiHighLatency | P95 >2s for 5m | warning |
|
||||
| API | ApiHealthCheckDown | Health check failing 1m | critical |
|
||||
| Celery | CeleryQueueBacklog | >100 tasks for 10m | warning |
|
||||
| Prometheus | TargetDown | Any target down 2m | critical |
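
To confirm the groups loaded and to eyeball the raw signal behind a rule, a sketch using
the Prometheus HTTP API (the CPU expression below is the common node-exporter form; the
exact expression in `alert.rules.yml` may differ):

```bash
# Rule group names as loaded by Prometheus
curl -s http://localhost:9090/api/v1/rules | python3 -m json.tool | grep '"name"'

# Instant query for current CPU usage across all cores
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
```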
|
||||
|
||||
### 19.3 Alertmanager Configuration
|
||||
|
||||
Alertmanager config is in `monitoring/alertmanager/alertmanager.yml`:
|
||||
|
||||
- **Critical alerts**: repeat every 1 hour
|
||||
- **Warning alerts**: repeat every 4 hours
|
||||
- Groups by `alertname` + `severity`, 30s wait, 5m interval
|
||||
- Inhibition: warnings suppressed when critical is already firing for same alert
|
||||
|
||||
!!! warning "Configure SMTP before deploying"
|
||||
Edit `monitoring/alertmanager/alertmanager.yml` and fill in the SMTP settings (host, username, password, recipient email). Alertmanager will start but won't send emails until SMTP is configured.
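
Once SMTP is filled in, the email route can be exercised end to end with a synthetic
alert. A sketch using `amtool`, which ships inside the `prom/alertmanager` image (the
labels here are arbitrary; pick ones that match your routing tree):

```bash
cd ~/apps/orion
docker compose --profile full exec alertmanager \
  amtool alert add TestAlert severity=warning \
  --annotation=summary="Manual test alert" \
  --alertmanager.url=http://localhost:9093
```

The test alert clears on its own once Alertmanager's resolve timeout expires.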
|
||||
|
||||
### 19.4 Docker Compose Changes
|
||||
|
||||
The `docker-compose.yml` includes:
|
||||
|
||||
- `alertmanager` service: `prom/alertmanager:latest`, profiles: [full], port 127.0.0.1:9093, mem_limit: 32m
|
||||
- `prometheus` volumes: mounts `alert.rules.yml` as read-only
|
||||
- `prometheus.yml`: `alerting:` section pointing to alertmanager:9093, `rule_files:` for alert rules, new scrape job for alertmanager
|
||||
|
||||
### 19.5 Deploy
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full up -d
|
||||
```
|
||||
|
||||
### 19.6 Verification
|
||||
|
||||
```bash
|
||||
# Alertmanager healthy
|
||||
curl -s http://localhost:9093/-/healthy
|
||||
|
||||
# Alert rules loaded
|
||||
curl -s http://localhost:9090/api/v1/rules | python3 -m json.tool | head -20
|
||||
|
||||
# Active alerts (should be empty if all is well)
|
||||
curl -s http://localhost:9090/api/v1/alerts | python3 -m json.tool
|
||||
|
||||
# Alertmanager target in Prometheus
|
||||
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep alertmanager
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 20: Security Hardening
|
||||
|
||||
Docker network segmentation, fail2ban configuration, and automatic security updates.
|
||||
|
||||
### 20.1 Docker Network Segmentation
|
||||
|
||||
Three isolated networks replace the default flat network:
|
||||
|
||||
| Network | Purpose | Services |
|
||||
|---|---|---|
|
||||
| `orion_frontend` | External-facing | api |
|
||||
| `orion_backend` | Database + workers | db, redis, api, celery-worker, celery-beat, flower |
|
||||
| `orion_monitoring` | Metrics collection | api, prometheus, grafana, node-exporter, cadvisor, alertmanager |
|
||||
|
||||
The `api` service is on all three networks because it needs to:
|
||||
|
||||
- Serve HTTP traffic (frontend)
|
||||
- Connect to database and Redis (backend)
|
||||
- Expose `/metrics` to Prometheus (monitoring)
|
||||
|
||||
This is already configured in the updated `docker-compose.yml`. After deploying, verify:
|
||||
|
||||
```bash
|
||||
docker network ls | grep orion
|
||||
# Expected: orion_frontend, orion_backend, orion_monitoring
|
||||
```
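
To see exactly which containers sit on which network, a short sketch using
`docker network inspect`:

```bash
for net in orion_frontend orion_backend orion_monitoring; do
  echo "== $net =="
  docker network inspect "$net" --format '{{range .Containers}}{{.Name}} {{end}}'
  echo
done
```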
|
||||
|
||||
### 20.2 fail2ban Configuration
|
||||
|
||||
fail2ban is already installed (Step 3) but needs jail configuration.
|
||||
|
||||
**SSH jail** — create `/etc/fail2ban/jail.local`:
|
||||
|
||||
```ini
|
||||
[sshd]
|
||||
enabled = true
|
||||
port = ssh
|
||||
filter = sshd
|
||||
logpath = /var/log/auth.log
|
||||
maxretry = 3
|
||||
bantime = 86400
|
||||
findtime = 600
|
||||
```
|
||||
|
||||
**Caddy auth filter** — create `/etc/fail2ban/filter.d/caddy-auth.conf`:
|
||||
|
||||
```ini
|
||||
[Definition]
|
||||
failregex = ^.*"remote_ip":"<HOST>".*"status":(401|403).*$
|
||||
ignoreregex =
|
||||
```
|
||||
|
||||
**Caddy jail** — create `/etc/fail2ban/jail.d/caddy.conf`:
|
||||
|
||||
```ini
|
||||
[caddy-auth]
|
||||
enabled = true
|
||||
port = http,https
|
||||
filter = caddy-auth
|
||||
logpath = /var/log/caddy/access.log
|
||||
maxretry = 10
|
||||
bantime = 3600
|
||||
findtime = 600
|
||||
```
|
||||
|
||||
!!! note "Caddy access logging"
|
||||
For the Caddy jail to work, enable access logging in your Caddyfile by adding `log` directives that write to `/var/log/caddy/access.log` in JSON format. See [Caddy logging docs](https://caddyserver.com/docs/caddyfile/directives/log).
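
Before enabling the jail, the filter can be dry-run against the real log (this assumes
Caddy is already writing JSON access logs to the path above):

```bash
# Shows how many lines the failregex would match, without banning anyone
sudo fail2ban-regex /var/log/caddy/access.log /etc/fail2ban/filter.d/caddy-auth.conf
```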
|
||||
|
||||
Restart fail2ban:
|
||||
|
||||
```bash
|
||||
sudo systemctl restart fail2ban
|
||||
sudo fail2ban-client status
|
||||
sudo fail2ban-client status sshd
|
||||
```
|
||||
|
||||
### 20.3 Unattended Security Upgrades
|
||||
|
||||
Install and enable automatic security updates:
|
||||
|
||||
```bash
|
||||
sudo apt install -y unattended-upgrades apt-listchanges
|
||||
sudo dpkg-reconfigure -plow unattended-upgrades
|
||||
```
|
||||
|
||||
This enables security-only updates with automatic reboot disabled (safe default). Verify:
|
||||
|
||||
```bash
|
||||
sudo unattended-upgrades --dry-run 2>&1 | head -10
|
||||
cat /etc/apt/apt.conf.d/20auto-upgrades
|
||||
```
|
||||
|
||||
Expected `20auto-upgrades` content:
|
||||
|
||||
```
|
||||
APT::Periodic::Update-Package-Lists "1";
|
||||
APT::Periodic::Unattended-Upgrade "1";
|
||||
```
|
||||
|
||||
### 20.4 Verification
|
||||
|
||||
```bash
|
||||
# fail2ban jails active
|
||||
sudo fail2ban-client status sshd
|
||||
|
||||
# Docker networks exist
|
||||
docker network ls | grep orion
|
||||
|
||||
# Unattended upgrades configured
|
||||
sudo unattended-upgrades --dry-run 2>&1 | head
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 21: Cloudflare Domain Proxy
|
||||
|
||||
Move DNS to Cloudflare for WAF, DDoS protection, and CDN. This step involves DNS propagation — do it during a maintenance window.
|
||||
|
||||
!!! warning "DNS changes affect all services"
|
||||
Moving nameservers involves propagation delay (minutes to hours). Plan for brief interruption. Do this step last, after Steps 19–20 are verified.
|
||||
|
||||
### 21.1 Pre-Migration: Record Email DNS
|
||||
|
||||
Before changing nameservers, document all email-related DNS records:
|
||||
|
||||
```bash
|
||||
# Run for each domain (wizard.lu, omsflow.lu, rewardflow.lu)
|
||||
dig wizard.lu MX +short
|
||||
dig wizard.lu TXT +short
|
||||
dig _dmarc.wizard.lu TXT +short
|
||||
dig default._domainkey.wizard.lu TXT +short # DKIM selector may vary
|
||||
```
|
||||
|
||||
Save the output — you'll need to verify these exist after Cloudflare import.
|
||||
|
||||
### 21.2 Add Domains to Cloudflare
|
||||
|
||||
1. Log in to [Cloudflare Dashboard](https://dash.cloudflare.com)
|
||||
2. **Add a site** for each domain: `wizard.lu`, `omsflow.lu`, `rewardflow.lu`
|
||||
3. Cloudflare auto-scans and imports existing DNS records
|
||||
4. **Verify MX/SPF/DKIM/DMARC records are present** before changing NS
|
||||
5. Email records must stay as **DNS-only (grey cloud)** — never proxy MX records
|
||||
|
||||
### 21.3 Change Nameservers
|
||||
|
||||
At your domain registrar, update NS records to Cloudflare's assigned nameservers. Cloudflare will show which NS to use (e.g., `ns1.cloudflare.com`, `ns2.cloudflare.com`).
|
||||
|
||||
### 21.4 Generate Origin Certificates
|
||||
|
||||
Cloudflare Origin Certificates (free, 15-year validity) avoid ACME challenge issues when traffic is proxied:
|
||||
|
||||
1. In Cloudflare: **SSL/TLS** > **Origin Server** > **Create Certificate**
|
||||
2. Generate for `*.wizard.lu, wizard.lu` (repeat for each domain)
|
||||
3. Download the certificate and private key
|
||||
|
||||
Install on the server:
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /etc/caddy/certs/{wizard.lu,omsflow.lu,rewardflow.lu}
|
||||
# Copy cert.pem and key.pem to each directory
|
||||
sudo chown -R caddy:caddy /etc/caddy/certs/
|
||||
sudo chmod 600 /etc/caddy/certs/*/key.pem
|
||||
```
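
One way to write the PEM files is to paste them into a `tee` heredoc over SSH (repeat
per domain, then re-run the `chown`/`chmod` above):

```bash
sudo tee /etc/caddy/certs/wizard.lu/cert.pem > /dev/null <<'EOF'
-----BEGIN CERTIFICATE-----
...paste the origin certificate here...
-----END CERTIFICATE-----
EOF

sudo tee /etc/caddy/certs/wizard.lu/key.pem > /dev/null <<'EOF'
-----BEGIN PRIVATE KEY-----
...paste the private key here...
-----END PRIVATE KEY-----
EOF
```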
|
||||
|
||||
### 21.5 Update Caddyfile
|
||||
|
||||
For Cloudflare-proxied domains, use explicit TLS with origin certs. Keep auto-HTTPS for `git.wizard.lu` (DNS-only, grey cloud):
|
||||
|
||||
```caddy
|
||||
# ─── Cloudflare-proxied domains (origin certs) ──────────
|
||||
wizard.lu {
|
||||
tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
|
||||
reverse_proxy localhost:8001
|
||||
}
|
||||
|
||||
omsflow.lu {
|
||||
tls /etc/caddy/certs/omsflow.lu/cert.pem /etc/caddy/certs/omsflow.lu/key.pem
|
||||
reverse_proxy localhost:8001
|
||||
}
|
||||
|
||||
rewardflow.lu {
|
||||
tls /etc/caddy/certs/rewardflow.lu/cert.pem /etc/caddy/certs/rewardflow.lu/key.pem
|
||||
reverse_proxy localhost:8001
|
||||
}
|
||||
|
||||
api.wizard.lu {
|
||||
tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
|
||||
reverse_proxy localhost:8001
|
||||
}
|
||||
|
||||
flower.wizard.lu {
|
||||
tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
|
||||
reverse_proxy localhost:5555
|
||||
}
|
||||
|
||||
grafana.wizard.lu {
|
||||
tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
|
||||
reverse_proxy localhost:3001
|
||||
}
|
||||
|
||||
# ─── DNS-only domain (auto-HTTPS via Let's Encrypt) ─────
|
||||
git.wizard.lu {
|
||||
reverse_proxy localhost:3000
|
||||
}
|
||||
```
|
||||
|
||||
Restart Caddy:
|
||||
|
||||
```bash
|
||||
sudo systemctl restart caddy
|
||||
sudo systemctl status caddy
|
||||
```
|
||||
|
||||
### 21.6 Cloudflare Settings (per domain)
|
||||
|
||||
| Setting | Value |
|
||||
|---|---|
|
||||
| SSL mode | Full (Strict) |
|
||||
| Always Use HTTPS | On |
|
||||
| WAF Managed Rules | On |
|
||||
| Bot Fight Mode | On |
|
||||
| Rate Limiting | 100 req/min on `/api/*` |
|
||||
|
||||
### 21.7 Production Environment
|
||||
|
||||
Add to `~/apps/orion/.env`:
|
||||
|
||||
```bash
|
||||
CLOUDFLARE_ENABLED=true
|
||||
```
|
||||
|
||||
### 21.8 Verification
|
||||
|
||||
```bash
|
||||
# CF proxy active (look for cf-ray header)
|
||||
curl -I https://wizard.lu | grep cf-ray
|
||||
|
||||
# DNS resolves to Cloudflare IPs (not 91.99.65.229)
|
||||
dig wizard.lu +short
|
||||
|
||||
# All domains responding
|
||||
curl -I https://omsflow.lu
|
||||
curl -I https://rewardflow.lu
|
||||
curl -I https://api.wizard.lu/health
|
||||
|
||||
# git.wizard.lu still on Let's Encrypt (not CF)
|
||||
curl -I https://git.wizard.lu
|
||||
```
|
||||
|
||||
!!! info "`git.wizard.lu` stays DNS-only"
|
||||
The Gitea instance uses SSH on port 2222 for git operations. Cloudflare proxy only supports HTTP/HTTPS, so `git.wizard.lu` must remain as DNS-only (grey cloud) with Let's Encrypt auto-SSL via Caddy.
|
||||
|
||||
---
|
||||
|
||||
## Step 22: Incident Response Runbook
|
||||
|
||||
A comprehensive incident response runbook is available at [Incident Response](incident-response.md). It includes:
|
||||
|
||||
- **Severity levels**: SEV-1 (platform down, <15min), SEV-2 (feature broken, <1h), SEV-3 (minor, <4h)
|
||||
- **Quick diagnosis decision tree**: SSH → Docker → containers → Caddy → DNS
|
||||
- **8 runbooks** with copy-paste commands for common incidents
|
||||
- **Post-incident report template**
|
||||
- **Monitoring URLs** quick reference
|
||||
|
||||
---
|
||||
|
||||
## Step 23: Environment Reference
|
||||
|
||||
A complete environment variables reference is available at [Environment Variables](environment.md). It documents all 55+ configuration variables from `app/core/config.py`, grouped by category with defaults and production requirements.
|
||||
|
||||
---
|
||||
|
||||
## Step 24: Documentation Updates
|
||||
|
||||
This document has been updated with Steps 19–24. Additional documentation changes:
|
||||
|
||||
- `docs/deployment/incident-response.md` — new incident response runbook
|
||||
- `docs/deployment/environment.md` — complete env var reference (was empty)
|
||||
- `docs/deployment/launch-readiness.md` — updated with Feb 2026 infrastructure status
|
||||
- `mkdocs.yml` — incident-response.md added to nav
|
||||
|
||||
---
|
||||
|
||||
## Domain & Port Reference
|
||||
|
||||
| Service | Internal Port | External Port | Domain (via Caddy) |
|
||||
@@ -1122,6 +1500,7 @@ docker stats --no-stream
|
||||
| Grafana | 3000 | 3001 (localhost) | `grafana.wizard.lu` |
|
||||
| Node Exporter | 9100 | 9100 (localhost) | (internal only) |
|
||||
| cAdvisor | 8080 | 8080 (localhost) | (internal only) |
|
||||
| Alertmanager | 9093 | 9093 (localhost) | (internal only) |
|
||||
| Caddy | — | 80, 443 | (reverse proxy) |
|
||||
|
||||
!!! note "Single backend, multiple domains"
|
||||
|
||||
docs/deployment/incident-response.md (new file, 793 lines)
@@ -0,0 +1,793 @@
|
||||
# Incident Response Runbook
|
||||
|
||||
Operational runbook for diagnosing and resolving production incidents on the Orion platform.
|
||||
|
||||
!!! info "Server Details"
|
||||
- **Server**: Hetzner Cloud CAX11 (4 GB RAM, ARM64)
|
||||
- **IP**: `91.99.65.229`
|
||||
- **App path**: `~/apps/orion`
|
||||
- **Docker profile**: `--profile full`
|
||||
- **Reverse proxy**: Caddy 2.10.2 (systemd, not containerized)
|
||||
- **Domains**: wizard.lu, omsflow.lu, rewardflow.lu
|
||||
|
||||
---
|
||||
|
||||
## Severity Levels
|
||||
|
||||
| Level | Definition | Examples | Response Time | Notification |
|
||||
|-------|-----------|----------|---------------|--------------|
|
||||
| **SEV-1** | Platform down, all users affected | API unreachable, database down, server unresponsive | **< 15 min** | Immediate page |
|
||||
| **SEV-2** | Feature broken, subset of users affected | Celery not processing tasks, one platform domain down, SSL expired | **< 1 hour** | Slack / email alert |
|
||||
| **SEV-3** | Minor issue, no user impact or minimal degradation | High memory warning, slow queries, disk usage above 70% | **< 4 hours** | Grafana alert, next business day |
|
||||
|
||||
!!! warning "Escalation"
|
||||
If a SEV-2 is not resolved within 2 hours, escalate to SEV-1. If a SEV-3 trends toward impacting users, escalate to SEV-2.
|
||||
|
||||
---
|
||||
|
||||
## Quick Diagnosis Decision Tree
|
||||
|
||||
Follow these steps in order when responding to any incident.
|
||||
|
||||
### Step 1: Can you reach the server?
|
||||
|
||||
```bash
|
||||
ssh samir@91.99.65.229
|
||||
```
|
||||
|
||||
- **Yes** -- proceed to Step 2.
|
||||
- **No** -- check your local network. Try from a different connection. If still unreachable, check [Hetzner Status](https://status.hetzner.com/) and open a support ticket. As a last resort, use the Hetzner Cloud Console rescue mode.
|
||||
|
||||
### Step 2: Is Docker running?
|
||||
|
||||
```bash
|
||||
sudo systemctl status docker
|
||||
```
|
||||
|
||||
- **Yes** -- proceed to Step 3.
|
||||
- **No** -- start Docker:
|
||||
|
||||
```bash
|
||||
sudo systemctl start docker
|
||||
```
|
||||
|
||||
### Step 3: Are the containers running?
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion && docker compose --profile full ps
|
||||
```
|
||||
|
||||
Check for containers in `Restarting`, `Exited`, or missing entirely. Healthy output shows all containers as `Up (healthy)` or `Up`.
|
||||
|
||||
- **All healthy** -- proceed to Step 4.
|
||||
- **Some down** -- go to the relevant runbook below (API, Database, Celery, etc.).
|
||||
- **All down** -- go to [Runbook 7: Full Stack Restart](#7-full-stack-restart-after-reboot).
|
||||
|
||||
### Step 4: Is Caddy running?
|
||||
|
||||
```bash
|
||||
sudo systemctl status caddy
|
||||
```
|
||||
|
||||
- **Yes** -- proceed to Step 5.
|
||||
- **No** -- go to [Runbook 4: Caddy / SSL / Domain Issues](#4-caddy-ssl-domain-issues).
|
||||
|
||||
### Step 5: Are domains resolving?
|
||||
|
||||
```bash
|
||||
dig wizard.lu +short
|
||||
dig api.wizard.lu +short
|
||||
dig omsflow.lu +short
|
||||
dig rewardflow.lu +short
|
||||
```
|
||||
|
||||
All should return `91.99.65.229`. If not, check DNS records at your registrar.
|
||||
|
||||
### Step 6: Is the API responding?
|
||||
|
||||
```bash
|
||||
curl -s http://localhost:8001/health | python3 -m json.tool
|
||||
curl -s https://api.wizard.lu/health
|
||||
```
|
||||
|
||||
- **Both work** -- issue may be intermittent. Check Grafana for recent anomalies.
|
||||
- **localhost works, external fails** -- Caddy or DNS issue. Go to [Runbook 4](#4-caddy-ssl-domain-issues).
|
||||
- **Neither works** -- API is down. Go to [Runbook 1](#1-api-container-down-crash-looping).
|
||||
|
||||
---
|
||||
|
||||
## Runbooks
|
||||
|
||||
### 1. API Container Down / Crash-Looping
|
||||
|
||||
!!! danger "SEV-1"
|
||||
API unavailability affects all users on all platforms.
|
||||
|
||||
**Symptoms**: `api` container shows `Restarting` or `Exited` in `docker compose ps`. External URLs return 502.
|
||||
|
||||
**Diagnose**:
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
|
||||
# Check container status
|
||||
docker compose --profile full ps api
|
||||
|
||||
# View recent logs (last 100 lines)
|
||||
docker compose --profile full logs --tail=100 api
|
||||
|
||||
# Look for Python exceptions
|
||||
docker compose --profile full logs api 2>&1 | grep -i "error\|exception\|traceback" | tail -20
|
||||
```
|
||||
|
||||
**Common causes and fixes**:
|
||||
|
||||
=== "Import / syntax error in code"
|
||||
|
||||
The log will show a Python traceback on startup. This usually means a bad deploy.
|
||||
|
||||
```bash
|
||||
# Roll back to previous commit
|
||||
cd ~/apps/orion
|
||||
git log --oneline -5
|
||||
git checkout <previous-good-commit>
|
||||
docker compose --profile full up -d --build api
|
||||
```
|
||||
|
||||
=== "Database connection refused"
|
||||
|
||||
The API cannot reach PostgreSQL. See [Runbook 2](#2-database-issues).
|
||||
|
||||
=== "Port conflict"
|
||||
|
||||
Another process is using port 8001.
|
||||
|
||||
```bash
|
||||
sudo ss -tlnp | grep 8001
|
||||
# Kill the conflicting process, then restart
|
||||
docker compose --profile full restart api
|
||||
```
|
||||
|
||||
=== "Out of memory"
|
||||
|
||||
The container was OOM-killed. See [Runbook 3](#3-high-memory-oom).
|
||||
|
||||
**Recovery**:
|
||||
|
||||
```bash
|
||||
# Restart the API container
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full restart api
|
||||
|
||||
# Wait 10 seconds, then verify
|
||||
sleep 10
|
||||
docker compose --profile full ps api
|
||||
curl -s http://localhost:8001/health
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Database Issues
|
||||
|
||||
!!! danger "SEV-1"
|
||||
Database unavailability brings down the entire platform.
|
||||
|
||||
**Symptoms**: API logs show `connection refused`, `could not connect to server`, or `OperationalError`. Health check fails with database errors.
|
||||
|
||||
**Diagnose**:
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
|
||||
# Check PostgreSQL container
|
||||
docker compose --profile full ps db
|
||||
docker compose --profile full logs --tail=50 db
|
||||
|
||||
# Test connection from inside the network
|
||||
docker compose --profile full exec db pg_isready -U orion_user -d orion_db
|
||||
|
||||
# Check disk space (PostgreSQL needs space for WAL)
|
||||
df -h
|
||||
docker system df
|
||||
```
|
||||
|
||||
**Common causes and fixes**:
|
||||
|
||||
=== "Container stopped"
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full up -d db
|
||||
sleep 5
|
||||
docker compose --profile full exec db pg_isready -U orion_user -d orion_db
|
||||
# Once healthy, restart the API
|
||||
docker compose --profile full restart api celery-worker celery-beat
|
||||
```
|
||||
|
||||
=== "Too many connections"
|
||||
|
||||
```bash
|
||||
# Check active connections
|
||||
docker compose --profile full exec db \
|
||||
psql -U orion_user -d orion_db -c \
|
||||
"SELECT count(*) FROM pg_stat_activity;"
|
||||
|
||||
# Kill idle connections
|
||||
docker compose --profile full exec db \
|
||||
psql -U orion_user -d orion_db -c \
|
||||
"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '10 minutes';"
|
||||
```
|
||||
|
||||
=== "Disk full (WAL or data)"
|
||||
|
||||
See [Runbook 6: Disk Full](#6-disk-full).
|
||||
|
||||
=== "Data corruption (last resort)"
|
||||
|
||||
If PostgreSQL refuses to start with corruption errors:
|
||||
|
||||
```bash
|
||||
# Stop everything
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full down
|
||||
|
||||
# Restore from backup (see Runbook 8)
|
||||
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/<latest>.sql.gz
|
||||
```
|
||||
|
||||
**Check for slow queries**:
|
||||
|
||||
```bash
|
||||
docker compose --profile full exec db \
|
||||
psql -U orion_user -d orion_db -c \
|
||||
"SELECT pid, now() - query_start AS duration, left(query, 80)
|
||||
FROM pg_stat_activity
|
||||
WHERE state != 'idle'
|
||||
ORDER BY duration DESC
|
||||
LIMIT 10;"
|
||||
```
|
||||
|
||||
**Kill a stuck query**:
|
||||
|
||||
```bash
|
||||
docker compose --profile full exec db \
|
||||
psql -U orion_user -d orion_db -c \
|
||||
"SELECT pg_terminate_backend(<PID>);"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. High Memory / OOM
|
||||
|
||||
!!! warning "SEV-2 (can escalate to SEV-1 if OOM killer fires)"
|
||||
The server has 4 GB RAM. Normal usage is ~2.4 GB. Above 3.2 GB is critical.
|
||||
|
||||
**Symptoms**: Containers restarting unexpectedly. `dmesg` shows OOM killer. Grafana memory graphs spiking.
|
||||
|
||||
**Diagnose**:
|
||||
|
||||
```bash
|
||||
# System memory
|
||||
free -h
|
||||
|
||||
# Per-container memory usage
|
||||
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
|
||||
|
||||
# Check for OOM kills
|
||||
sudo dmesg | grep -i "oom\|killed" | tail -10
|
||||
|
||||
# Top processes by memory
|
||||
ps aux --sort=-%mem | head -15
|
||||
```
|
||||
|
||||
**Immediate relief**:
|
||||
|
||||
```bash
|
||||
# Clear Docker build cache
|
||||
docker builder prune -f
|
||||
|
||||
# Remove unused images
|
||||
docker image prune -f
|
||||
|
||||
# Remove stopped containers
|
||||
docker container prune -f
|
||||
|
||||
# Nuclear option: remove all unused Docker data
|
||||
docker system prune -f
|
||||
```
|
||||
|
||||
**If a specific container is the culprit**:
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
|
||||
# Restart the offending container
|
||||
docker compose --profile full restart <container-name>
|
||||
|
||||
# If the API is leaking memory, a restart is the fastest fix
|
||||
docker compose --profile full restart api
|
||||
```
|
||||
|
||||
**If CI jobs are running** (they add ~550 MB temporarily):
|
||||
|
||||
```bash
|
||||
# Check if a Gitea Actions runner job is active
|
||||
sudo systemctl status gitea-runner
|
||||
# Wait for the job to finish, or stop the runner temporarily
|
||||
sudo systemctl stop gitea-runner
|
||||
```
|
||||
|
||||
!!! tip "Long-term fix"
|
||||
If OOM events become frequent, upgrade to CAX21 (8 GB RAM, ~7.50 EUR/mo) via **Hetzner Cloud Console > Server > Rescale**. The upgrade takes about 2 minutes and preserves all data.
|
||||
|
||||
---
|
||||
|
||||
### 4. Caddy / SSL / Domain Issues
|
||||
|
||||
!!! warning "SEV-2"
|
||||
Caddy handles TLS termination and routing for all domains. If Caddy is down, all external access is lost even though the API may be running fine internally.
|
||||
|
||||
**Symptoms**: Sites return connection refused on port 443. SSL certificate errors in the browser. Specific domain not working.
|
||||
|
||||
**Diagnose**:
|
||||
|
||||
```bash
|
||||
# Check Caddy status
|
||||
sudo systemctl status caddy
|
||||
|
||||
# View Caddy logs
|
||||
sudo journalctl -u caddy --since "30 minutes ago" --no-pager
|
||||
|
||||
# Test internal API directly (bypasses Caddy)
|
||||
curl -s http://localhost:8001/health
|
||||
|
||||
# Test SSL certificates
|
||||
curl -vI https://wizard.lu 2>&1 | grep -E "SSL|subject|expire"
|
||||
curl -vI https://api.wizard.lu 2>&1 | grep -E "SSL|subject|expire"
|
||||
```
|
||||
|
||||
**Common causes and fixes**:
|
||||
|
||||
=== "Caddy stopped"
|
||||
|
||||
```bash
|
||||
sudo systemctl start caddy
|
||||
sudo systemctl status caddy
|
||||
```
|
||||
|
||||
=== "Caddyfile syntax error"
|
||||
|
||||
```bash
|
||||
# Validate configuration
|
||||
sudo caddy validate --config /etc/caddy/Caddyfile
|
||||
|
||||
# If invalid, check recent changes
|
||||
sudo nano /etc/caddy/Caddyfile
|
||||
|
||||
# After fixing, reload (not restart, preserves certificates)
|
||||
sudo systemctl reload caddy
|
||||
```
|
||||
|
||||
=== "SSL certificate issue"
|
||||
|
||||
Caddy auto-renews certificates. If renewal fails, it is usually a port 80 or DNS issue.
|
||||
|
||||
```bash
|
||||
# Ensure port 80 is open (needed for ACME HTTP challenge)
|
||||
sudo ufw status | grep 80
|
||||
|
||||
# Check Caddy certificate storage
|
||||
sudo ls -la /var/lib/caddy/.local/share/caddy/certificates/
|
||||
|
||||
# Force certificate renewal by restarting Caddy
|
||||
sudo systemctl restart caddy
|
||||
```
|
||||
|
||||
=== "DNS not pointing to server"
|
||||
|
||||
```bash
|
||||
dig wizard.lu +short
|
||||
# Should return 91.99.65.229
|
||||
|
||||
# If wrong, update DNS at registrar and wait for propagation
|
||||
# Temporary: test by adding to /etc/hosts on your local machine
|
||||
```
|
||||
|
||||
**Caddyfile reference** (at `/etc/caddy/Caddyfile`):
|
||||
|
||||
```bash
|
||||
sudo cat /etc/caddy/Caddyfile
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 5. Celery Worker Issues
|
||||
|
||||
!!! warning "SEV-2"
|
||||
Celery processes background tasks (imports, emails, scheduled jobs). If down, no background work happens, but the platform remains browsable.
|
||||
|
||||
**Symptoms**: Background tasks not executing. Flower shows no active workers. Emails not being sent.
|
||||
|
||||
**Diagnose**:
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
|
||||
# Check worker and beat containers
|
||||
docker compose --profile full ps celery-worker celery-beat
|
||||
|
||||
# View worker logs
|
||||
docker compose --profile full logs --tail=50 celery-worker
|
||||
docker compose --profile full logs --tail=50 celery-beat
|
||||
|
||||
# Check Redis (the broker)
|
||||
docker compose --profile full exec redis redis-cli ping
|
||||
docker compose --profile full exec redis redis-cli llen celery
|
||||
|
||||
# Check Flower for worker status
|
||||
curl -s http://localhost:5555/api/workers | python3 -m json.tool
|
||||
```
|
||||
|
||||
**Common causes and fixes**:
|
||||
|
||||
=== "Worker crashed / import error"
|
||||
|
||||
```bash
|
||||
# Check for Python errors in worker logs
|
||||
docker compose --profile full logs celery-worker 2>&1 | grep -i "error\|exception" | tail -10
|
||||
|
||||
# Restart worker
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full restart celery-worker celery-beat
|
||||
```
|
||||
|
||||
=== "Redis down"
|
||||
|
||||
```bash
|
||||
# Check Redis container
|
||||
docker compose --profile full ps redis
|
||||
docker compose --profile full logs --tail=20 redis
|
||||
|
||||
# Restart Redis, then workers
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full restart redis
|
||||
sleep 5
|
||||
docker compose --profile full restart celery-worker celery-beat
|
||||
```
|
||||
|
||||
=== "Task queue backed up"
|
||||
|
||||
```bash
|
||||
# Check queue length
|
||||
docker compose --profile full exec redis redis-cli llen celery
|
||||
|
||||
# If queue is extremely large and tasks are stale, purge
|
||||
docker compose --profile full exec api \
|
||||
celery -A app.core.celery_app purge -f
|
||||
|
||||
# Restart worker to pick up fresh
|
||||
docker compose --profile full restart celery-worker
|
||||
```
|
||||
|
||||
=== "Beat scheduler out of sync"
|
||||
|
||||
```bash
|
||||
# Remove the beat schedule file and restart
|
||||
docker compose --profile full exec celery-beat rm -f /app/celerybeat-schedule
|
||||
docker compose --profile full restart celery-beat
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 6. Disk Full
|
||||
|
||||
!!! warning "SEV-2 (becomes SEV-1 if PostgreSQL cannot write WAL)"
|
||||
The server has 37 GB disk. Docker images, logs, and database WAL can fill it quickly.
|
||||
|
||||
**Symptoms**: Write errors in logs. PostgreSQL panics. Docker cannot pull images. `No space left on device` errors.
|
||||
|
||||
**Diagnose**:
|
||||
|
||||
```bash
|
||||
# Overall disk usage
|
||||
df -h /
|
||||
|
||||
# Docker disk usage breakdown
|
||||
docker system df
|
||||
|
||||
# Largest directories
|
||||
sudo du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10
|
||||
du -sh ~/backups/* 2>/dev/null
|
||||
du -sh ~/apps/orion/logs/* 2>/dev/null
|
||||
```
|
||||
|
||||
**Immediate cleanup**:
|
||||
|
||||
```bash
|
||||
# 1. Remove old Docker images and build cache (safe, usually frees 2-5 GB)
|
||||
docker system prune -af --volumes
|
||||
|
||||
# 2. Truncate application logs
|
||||
cd ~/apps/orion
|
||||
truncate -s 0 logs/*.log 2>/dev/null
|
||||
|
||||
# 3. Remove old backups beyond retention policy
|
||||
find ~/backups -name "*.sql.gz" -mtime +14 -delete
|
||||
|
||||
# 4. Clean systemd journal logs (keep last 3 days)
|
||||
sudo journalctl --vacuum-time=3d
|
||||
|
||||
# 5. Clean apt cache
|
||||
sudo apt clean
|
||||
```
|
||||
|
||||
**After freeing space**:
|
||||
|
||||
```bash
|
||||
# Verify space recovered
|
||||
df -h /
|
||||
|
||||
# Restart any containers that failed due to disk full
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full up -d
|
||||
```
|
||||
|
||||
!!! tip "Prevention"
|
||||
Set up a Grafana alert for disk usage > 70%. The node-exporter dashboard (ID 1860) includes disk usage panels. If the server persistently runs low on disk, upgrade to CAX21 (80 GB disk).
|
||||
|
||||
---
|
||||
|
||||
### 7. Full Stack Restart (After Reboot)
|
||||
|
||||
!!! info "SEV-2"
|
||||
After a server reboot (planned or unplanned), all services need to come back up in the correct order.
|
||||
|
||||
**When to use**: After a Hetzner maintenance reboot, manual reboot, or kernel upgrade.
|
||||
|
||||
**Step-by-step recovery**:
|
||||
|
||||
```bash
|
||||
# 1. Verify Docker is running
|
||||
sudo systemctl status docker
|
||||
# If not: sudo systemctl start docker
|
||||
|
||||
# 2. Start Gitea (needed for CI, not for the app itself)
|
||||
cd ~/gitea && docker compose up -d
|
||||
sleep 5
|
||||
|
||||
# 3. Start the Orion stack (db and redis start first due to depends_on)
|
||||
cd ~/apps/orion
|
||||
docker compose --profile full up -d
|
||||
sleep 15
|
||||
|
||||
# 4. Verify all containers are healthy
|
||||
docker compose --profile full ps
|
||||
|
||||
# 5. Verify API health
|
||||
curl -s http://localhost:8001/health | python3 -m json.tool
|
||||
|
||||
# 6. Start Caddy (should auto-start, but verify)
|
||||
sudo systemctl status caddy
|
||||
# If not running: sudo systemctl start caddy
|
||||
|
||||
# 7. Start the Gitea Actions runner
|
||||
sudo systemctl status gitea-runner
|
||||
# If not running: sudo systemctl start gitea-runner
|
||||
|
||||
# 8. Verify external access
|
||||
curl -s https://api.wizard.lu/health
|
||||
curl -I https://wizard.lu
|
||||
curl -I https://omsflow.lu
|
||||
curl -I https://rewardflow.lu
|
||||
|
||||
# 9. Verify monitoring
|
||||
curl -I https://grafana.wizard.lu
|
||||
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -c '"health":"up"'
|
||||
|
||||
# 10. Verify backups timer is active
|
||||
systemctl list-timers orion-backup.timer
|
||||
```
|
||||
|
||||
!!! note "Boot order"
|
||||
Docker containers with `restart: always` will auto-start after Docker starts. Caddy and the Gitea runner are systemd services with `WantedBy=multi-user.target` and also auto-start. In practice, you mainly need to verify rather than manually start.
|
||||
|
||||
---
|
||||
|
||||
### 8. Restore from Backup (Disaster Recovery)
|
||||
|
||||
!!! danger "SEV-1"
|
||||
Use this runbook when the database is corrupted or data is lost and you need to restore from a backup.
|
||||
|
||||
**Prerequisites**: Identify the backup to restore from.
|
||||
|
||||
```bash
|
||||
# List available local backups
|
||||
ls -lh ~/backups/orion/daily/
|
||||
ls -lh ~/backups/orion/weekly/
|
||||
|
||||
# If local backups are gone, download from R2
|
||||
source ~/apps/orion/.env
|
||||
aws s3 ls s3://orion-backups/orion/daily/ \
|
||||
--endpoint-url "https://${R2_ACCOUNT_ID:-$(grep R2_ACCOUNT_ID ~/apps/orion/.env | cut -d= -f2)}.r2.cloudflarestorage.com" \
|
||||
--profile r2
|
||||
```
|
||||
|
||||
**Download from R2 (if local backups unavailable)**:
|
||||
|
||||
```bash
|
||||
aws s3 sync s3://orion-backups/ ~/backups/ \
|
||||
--endpoint-url "https://<ACCOUNT_ID>.r2.cloudflarestorage.com" \
|
||||
--profile r2
|
||||
```
|
||||
|
||||
**Restore using the restore script**:
|
||||
|
||||
```bash
|
||||
# Restore Orion database
|
||||
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/<backup-file>.sql.gz
|
||||
```
|
||||
|
||||
The restore script will (see the sketch after this list):
|
||||
|
||||
1. Stop application containers (API, Celery) while keeping the database running
|
||||
2. Drop and recreate the `orion_db` database
|
||||
3. Restore from the `.sql.gz` backup file
|
||||
4. Run `alembic upgrade heads` to apply any pending migrations
|
||||
5. Restart all containers
|
||||
|
||||
**Verify after restore**:
|
||||
|
||||
```bash
|
||||
cd ~/apps/orion
|
||||
|
||||
# Check API health
|
||||
curl -s http://localhost:8001/health | python3 -m json.tool
|
||||
|
||||
# Verify data integrity (check row counts of key tables)
|
||||
docker compose --profile full exec db \
|
||||
psql -U orion_user -d orion_db -c \
|
||||
"SELECT 'platforms' AS tbl, count(*) FROM platforms
|
||||
UNION ALL SELECT 'users', count(*) FROM users
|
||||
UNION ALL SELECT 'stores', count(*) FROM stores;"
|
||||
|
||||
# Verify external access
|
||||
curl -s https://api.wizard.lu/health
|
||||
```
|
||||
|
||||
**Restore Gitea (if needed)**:
|
||||
|
||||
```bash
|
||||
bash ~/apps/orion/scripts/restore.sh gitea ~/backups/gitea/daily/<backup-file>.sql.gz
|
||||
```
|
||||
|
||||
**Full server rebuild from Hetzner snapshot** (worst case):
|
||||
|
||||
1. Go to **Hetzner Cloud Console > Servers > Snapshots**
|
||||
2. Select the most recent snapshot and click **Rebuild from snapshot**
|
||||
3. After rebuild, SSH in and verify all services per [Runbook 7](#7-full-stack-restart-after-reboot)
|
||||
|
||||
---
|
||||
|
||||
## Post-Incident Report Template
|
||||
|
||||
After resolving any SEV-1 or SEV-2 incident, create a post-incident report. Save reports in a shared location for the team.
|
||||
|
||||
```markdown
|
||||
# Post-Incident Report: [Brief Title]
|
||||
|
||||
**Date**: YYYY-MM-DD
|
||||
**Severity**: SEV-1 / SEV-2
|
||||
**Duration**: HH:MM (from detection to resolution)
|
||||
**Author**: [Name]
|
||||
|
||||
## Incident Summary
|
||||
|
||||
[1-2 sentence description of what happened and the user impact.]
|
||||
|
||||
## Timeline (UTC)
|
||||
|
||||
| Time | Event |
|
||||
|-------|--------------------------------------------|
|
||||
| HH:MM | Alert triggered / issue reported |
|
||||
| HH:MM | Responder acknowledged |
|
||||
| HH:MM | Root cause identified |
|
||||
| HH:MM | Fix applied |
|
||||
| HH:MM | Service fully restored |
|
||||
|
||||
## Root Cause
|
||||
|
||||
[What caused the incident. Be specific -- e.g., "OOM killer terminated the API
|
||||
container because a Celery import task loaded 50k products into memory at once."]
|
||||
|
||||
## Resolution
|
||||
|
||||
[What was done to fix it. Include exact commands if relevant.]
|
||||
|
||||
## Impact
|
||||
|
||||
- **Users affected**: [number or scope]
|
||||
- **Data lost**: [none / describe]
|
||||
- **Downtime**: [duration]
|
||||
|
||||
## Action Items
|
||||
|
||||
| Action | Owner | Due Date | Status |
|
||||
|--------|-------|----------|--------|
|
||||
| [Preventive measure 1] | [Name] | YYYY-MM-DD | [ ] Open |
|
||||
| [Preventive measure 2] | [Name] | YYYY-MM-DD | [ ] Open |
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
[What went well, what could be improved in the response process.]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Useful Monitoring URLs
|
||||
|
||||
| Service | URL | Purpose |
|
||||
|---------|-----|---------|
|
||||
| **Grafana** | [grafana.wizard.lu](https://grafana.wizard.lu) | Dashboards for host metrics, container metrics |
|
||||
| **Prometheus** | `http://localhost:9090` (SSH tunnel) | Raw metrics queries, target health |
|
||||
| **Prometheus Targets** | `http://localhost:9090/targets` | Check which scrape targets are up/down |
|
||||
| **API Health** | [api.wizard.lu/health](https://api.wizard.lu/health) | Application health check (DB, Redis) |
|
||||
| **API Liveness** | [api.wizard.lu/health/live](https://api.wizard.lu/health/live) | Basic liveness probe |
|
||||
| **API Readiness** | [api.wizard.lu/health/ready](https://api.wizard.lu/health/ready) | Readiness probe (includes dependencies) |
|
||||
| **API Metrics** | [api.wizard.lu/metrics](https://api.wizard.lu/metrics) | Prometheus-format application metrics |
|
||||
| **Flower** | [flower.wizard.lu](https://flower.wizard.lu) | Celery task monitoring, worker status |
|
||||
| **Gitea** | [git.wizard.lu](https://git.wizard.lu) | Git repository and CI pipeline status |
|
||||
| **Main Platform** | [wizard.lu](https://wizard.lu) | Main storefront |
|
||||
| **OMS Platform** | [omsflow.lu](https://omsflow.lu) | OMS storefront |
|
||||
| **Loyalty+ Platform** | [rewardflow.lu](https://rewardflow.lu) | Loyalty+ storefront |
|
||||
| **Hetzner Console** | [console.hetzner.cloud](https://console.hetzner.cloud) | Server management, snapshots, rescue mode |
|
||||
| **Hetzner Status** | [status.hetzner.com](https://status.hetzner.com) | Hetzner infrastructure status |
|
||||
|
||||
!!! tip "SSH tunnel for internal services"
|
||||
Prometheus and other internal services are not exposed externally. To access them from your local machine:
|
||||
|
||||
```bash
|
||||
# Prometheus (localhost:9090 on server → localhost:9090 on your machine)
|
||||
ssh -L 9090:localhost:9090 samir@91.99.65.229
|
||||
|
||||
# Then open http://localhost:9090 in your browser
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference: Essential Commands
|
||||
|
||||
```bash
|
||||
# SSH into the server
|
||||
ssh samir@91.99.65.229
|
||||
|
||||
# Container status
|
||||
cd ~/apps/orion && docker compose --profile full ps
|
||||
|
||||
# Container resource usage
|
||||
docker stats --no-stream
|
||||
|
||||
# Follow all logs
|
||||
cd ~/apps/orion && docker compose --profile full logs -f
|
||||
|
||||
# Restart a single service
|
||||
cd ~/apps/orion && docker compose --profile full restart <service>
|
||||
|
||||
# Full stack rebuild
|
||||
cd ~/apps/orion && docker compose --profile full up -d --build
|
||||
|
||||
# Caddy status / logs
|
||||
sudo systemctl status caddy
|
||||
sudo journalctl -u caddy -f
|
||||
|
||||
# System resources
|
||||
free -h && df -h / && uptime
|
||||
|
||||
# Manual deploy
|
||||
cd ~/apps/orion && bash scripts/deploy.sh
|
||||
|
||||
# Manual backup
|
||||
bash ~/apps/orion/scripts/backup.sh --upload
|
||||
|
||||
# Run migrations
|
||||
cd ~/apps/orion && docker compose --profile full exec -e PYTHONPATH=/app api python -m alembic upgrade heads
|
||||
```
|
||||
docs/deployment/launch-readiness.md
@@ -2,7 +2,7 @@
|
||||
|
||||
This document tracks the launch readiness status of the complete platform including Store Dashboard, Shop/Storefront, and Admin features.
|
||||
|
||||
**Last Updated:** 2026-01-08
|
||||
**Last Updated:** 2026-02-15
|
||||
**Overall Status:** 95% Feature Complete - LAUNCH READY
|
||||
|
||||
---
|
||||
@@ -104,7 +104,7 @@ Previous blockers (password reset, search, order emails) have been resolved. Onl
|
||||
|-----------|--------|-----|
|
||||
| Email System | 20% | Password reset, tier change notifications |
|
||||
| Payment Verification | Missing | Stripe payment intent verification |
|
||||
| Monitoring | 50% | Framework ready, alerting TODO |
|
||||
| Monitoring | Ready | Prometheus + Grafana + Alertmanager with 12 alert rules |
|
||||
|
||||
---
|
||||
|
||||
@@ -192,6 +192,24 @@ Previous blockers (password reset, search, order emails) have been resolved. Onl
|
||||
|
||||
---
|
||||
|
||||
## February 2026 Infrastructure Hardening
|
||||
|
||||
| Component | Status | Details |
|
||||
|-----------|--------|---------|
|
||||
| Hetzner VPS | Running | CAX11 (4 GB RAM, ARM64), Ubuntu 24.04 |
|
||||
| Docker stack | 11 containers | API, DB, Redis, Celery x2, Flower, Prometheus, Grafana, node-exporter, cAdvisor, Alertmanager |
|
||||
| Monitoring | Complete | Prometheus (5 targets), Grafana dashboards, 12 alert rules |
|
||||
| Alerting | Complete | Alertmanager with email routing (critical 1h, warning 4h) |
|
||||
| Backups | Complete | Daily pg_dump, R2 offsite, Hetzner snapshots |
|
||||
| Network security | Complete | 3 Docker networks (frontend/backend/monitoring), fail2ban, unattended-upgrades |
|
||||
| Reverse proxy | Complete | Caddy with auto-SSL for all domains |
|
||||
| CI/CD | Complete | Gitea Actions, auto-deploy on push to master |
|
||||
| Cloudflare proxy | Documented | Origin certs + WAF ready, deploy when needed |
|
||||
| Incident response | Complete | 8 runbooks, severity levels, decision tree |
|
||||
| Environment docs | Complete | 55+ env vars documented with defaults |
|
||||
|
||||
---
|
||||
|
||||
## Validation Status
|
||||
|
||||
All code validators pass:
|
||||
@@ -228,10 +246,13 @@ Performance Validator: PASSED (with skips)
|
||||
|
||||
### Infrastructure
|
||||
- [ ] Production Stripe keys
|
||||
- [ ] SSL certificates
|
||||
- [ ] Database backups configured
|
||||
- [ ] Monitoring/alerting setup
|
||||
- [x] SSL certificates (Caddy auto-SSL via Let's Encrypt)
|
||||
- [x] Database backups configured (daily pg_dump + R2 offsite + Hetzner snapshots)
|
||||
- [x] Monitoring/alerting setup (Prometheus + Grafana + Alertmanager)
|
||||
- [ ] Error tracking (Sentry)
|
||||
- [x] Docker network segmentation (frontend/backend/monitoring)
|
||||
- [x] fail2ban + unattended-upgrades
|
||||
- [ ] Cloudflare proxy (WAF, DDoS protection)
|
||||
|
||||
### Pre-Launch Testing
|
||||
- [ ] End-to-end order flow
|
||||
|
||||