# Hetzner Cloud Server Setup

Complete step-by-step guide for deploying Orion on a Hetzner Cloud VPS.

!!! info "Server Details"
    - **Provider**: Hetzner Cloud
    - **OS**: Ubuntu 24.04.3 LTS (upgraded to 24.04.4 after updates)
    - **Architecture**: aarch64 (ARM64)
    - **IP**: `91.99.65.229`
    - **IPv6**: `2a01:4f8:1c1a:b39c::1`
    - **Disk**: 37 GB
    - **RAM**: 4 GB
    - **Auth**: SSH key (configured via Hetzner Console)
    - **Setup date**: 2026-02-11

!!! success "Progress — 2026-02-12"
    **Completed (Steps 1–16):**

    - Non-root user `samir` with SSH key
    - Server hardened (UFW firewall, SSH root login disabled, fail2ban)
    - Docker 29.2.1 & Docker Compose 5.0.2 installed
    - Gitea running at `https://git.wizard.lu` (user: `sboulahtit`, repo: `orion`)
    - Repository cloned to `~/apps/orion`
    - Production `.env` configured with generated secrets
    - Full Docker stack deployed (API, PostgreSQL, Redis, Celery worker/beat, Flower)
    - Database migrated (76 tables) and seeded (admin, platforms, CMS, email templates)
    - API verified at `https://api.wizard.lu/health`
    - DNS A records configured and propagated for `wizard.lu` and subdomains
    - Caddy 2.10.2 reverse proxy with auto-SSL (Let's Encrypt)
    - Temporary firewall rules removed (ports 3000, 8001)
    - Gitea Actions runner v0.2.13 registered and running as systemd service
    - SSH key added to Gitea for local push via SSH
    - Git remote updated: `ssh://git@git.wizard.lu:2222/sboulahtit/orion.git`
    - ProxyHeadersMiddleware added for correct HTTPS behind Caddy
    - Fixed TierLimitExceededException import and Pydantic @field_validator bugs
    - `wizard.lu` serving frontend with CSS over HTTPS (mixed content fixed)
    - `/merchants` and `/admin` redirect fix (CMS catch-all was intercepting)

!!! success "Progress — 2026-02-13"
    **Completed:**

    - CI fully green: ruff (lint), pytest, architecture, docs all passing
    - Pinned ruff==0.8.4 in requirements-dev.txt (CI/local version mismatch was root cause of recurring I001 errors)
    - Pre-commit hooks configured and installed (ruff auto-fix, architecture validation, trailing whitespace, end-of-file)
    - AAAA (IPv6) records added for all wizard.lu domains
    - mkdocs build clean (zero warnings) — all 32 orphan pages added to nav
    - Pre-commit documented in `docs/development/code-quality.md`
    - **Step 16: Continuous deployment** — auto-deploy on push to master via `scripts/deploy.sh` + Gitea Actions

    **Next steps:**

    - [x] Step 17: Backups
    - [x] Step 18: Monitoring & observability

    **Deferred (not urgent, do when all platforms ready):**

    - [x] ~~DNS A + AAAA records for platform domains (`omsflow.lu`, `rewardflow.lu`)~~
    - [x] ~~Uncomment platform domains in Caddyfile after DNS propagation~~

!!! success "Progress — 2026-02-14"
    **Completed:**

    - **Wizamart → Orion rename** — 1,086 occurrences replaced across 184 files (database identifiers, email addresses, domains, config, templates, docs, seed data)
    - Template renamed: `homepage-wizamart.html` → `homepage-orion.html`
    - **Production DB rebuilt from scratch** with Orion naming (`orion_db`, `orion_user`)
    - Platform domains configured in seed data: wizard.lu (main), omsflow.lu, rewardflow.lu (loyalty)
    - Docker volume explicitly named `orion_postgres_data`
    - `.dockerignore` added — prevents `.env` from being baked into Docker images
    - `env_file: .env` added to `docker-compose.yml` — containers load host env vars properly
    - `CapacitySnapshot` model import fixed (moved from billing to monitoring in `alembic/env.py`)
    - All services verified healthy at `https://api.wizard.lu/health`
    - **Step 17: Backups** — automated pg_dump scripts (daily + weekly rotation), R2 offsite upload, restore helper
    - **Step 18: Monitoring** — Prometheus, Grafana, node-exporter, cAdvisor added to docker-compose; `/metrics` endpoint activated via `prometheus_client`

!!! success "Progress — 2026-02-15"
    **Completed:**

    - **Step 17 server-side**: Hetzner backups enabled (5 of 7 daily images, last 6.22 GB)
    - **Step 18 server-side**: Full monitoring stack deployed — Prometheus (4/4 targets up), Grafana at `https://grafana.wizard.lu` with Node Exporter Full (#1860) and Docker/cAdvisor (#193) dashboards
    - **Domain rename**: `oms.lu` → `omsflow.lu`, `loyalty.lu` → `rewardflow.lu` across entire codebase (19 + 13 files)
    - **Platform domains live**: all three platforms serving HTTPS via Caddy with auto-SSL
        - `https://wizard.lu` (main)
        - `https://omsflow.lu` (OMS)
        - `https://rewardflow.lu` (Loyalty+)
    - Platform `domain` column updated in production DB
    - RAM usage ~2.4 GB on 4 GB server (stable, CI jobs add ~550 MB temporarily)
    - **Systemd backup timer** (`orion-backup.timer`) — daily at 03:00 UTC, tested manually
    - **Cloudflare R2 offsite backups** — `orion-backups` bucket, `awscli` configured with `--profile r2`, `--upload` flag added to systemd timer
    - `python3-pip` and `awscli` installed on server (pip user install, PATH added to `.bashrc` and systemd service)

    **Steps 1–18 fully complete.** All infrastructure operational.

!!! success "Progress — 2026-02-15 (continued)"
    **Completed (Steps 19–24):**

    - **Step 19: Prometheus Alerting** — alert rules (host, container, API, Celery, targets) + Alertmanager with email routing
    - **Step 20: Security Hardening** — Docker network segmentation (frontend/backend/monitoring), fail2ban config, unattended-upgrades
    - **Step 21: Cloudflare Domain Proxy** — origin certificates, WAF, bot protection, rate limiting (documented, user deploys)
    - **Step 22: Incident Response** — 8 runbooks with copy-paste commands, severity levels, decision tree
    - **Step 23: Environment Reference** — all 55+ env vars documented with defaults and production requirements
    - **Step 24: Documentation Updates** — hetzner docs, launch readiness, mkdocs nav updated

    **Steps 1–24 fully complete.** Enterprise infrastructure hardening done.

!!! success "Progress — 2026-02-24"
    **Completed:**

    - **Step 25: Google Wallet Integration** — Google Cloud project "Orion" created, Wallet API enabled, service account configured
        - Google Pay Merchant ID: `BCR2DN5TW2CNXDAG`
        - Google Wallet Issuer ID: `3388000000023089598`
        - Service account: `wallet-service@orion-488322.iam.gserviceaccount.com` (admin role in Pay & Wallet Console)
        - Service account JSON key generated
    - Dependencies added to `requirements.txt`: `google-auth>=2.0.0`, `PyJWT>=2.0.0` (commit `d36783a`)
    - Loyalty env vars added to `.env.example` and `docs/deployment/environment.md`
    - `LOYALTY_GOOGLE_ISSUER_ID` and `LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON` added to `app/core/config.py` Settings class
    - **End-to-end integration wired:**
        - Enrollment auto-creates Google Wallet class + object (`card_service` → `wallet_service.create_wallet_objects`)
        - Stamp/points operations auto-sync to Google Wallet (`stamp_service`/`points_service` → `wallet_service.sync_card_to_wallets`)
        - Storefront API returns wallet URLs (`GET /loyalty/card`, `POST /loyalty/enroll`)
        - "Add to Google Wallet" button wired in storefront dashboard and enrollment success page (Alpine.js conditional rendering)
    - Google Wallet is a platform-wide config (env vars only) — merchants don't need to configure anything

    **Next steps:**

    - [ ] Upload service account JSON to Hetzner server
    - [ ] Set `LOYALTY_GOOGLE_ISSUER_ID` and `LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON` in production `.env`
    - [ ] Restart app and test end-to-end: enroll → add pass → stamp → verify pass updates
    - [ ] Submit for Google production approval when ready
    - [ ] Apple Wallet setup (APNs push, certificates, pass images)

!!! success "Progress — 2026-02-16"
    **Completed:**

    - **Step 21: Cloudflare Domain Proxy** — all three domains active on Cloudflare (Full setup):
        - `wizard.lu` — DNS records configured (6 A + 6 AAAA), old CNAME records removed, NS switched at Netim, SSL/TLS set to Full (Strict), Always Use HTTPS enabled, AI crawlers blocked
        - `omsflow.lu` — DNS records configured (2 A + 2 AAAA), NS switched at Netim, SSL/TLS Full (Strict) + Always Use HTTPS
        - `rewardflow.lu` — DNS records configured (2 A + 2 AAAA), NS switched at Netim, SSL/TLS Full (Strict) + Always Use HTTPS
        - `git.wizard.lu` stays DNS-only (grey cloud) for SSH access on port 2222
        - DNSSEC disabled at registrar (will re-enable via Cloudflare later)
        - Registrar: Netim (`netim.com`)
        - Origin certificates generated (non-wildcard, specific subdomains) and installed on server
        - Caddyfile updated: origin certs for proxied domains, `tls { issuer acme }` for `git.wizard.lu`
        - Access logging enabled for fail2ban (`/var/log/caddy/access.log`)
        - All domains verified working: `wizard.lu`, `omsflow.lu`, `rewardflow.lu`, `api.wizard.lu`, `git.wizard.lu`
    - **Step 19: SendGrid SMTP** — fully configured and tested:
        - SendGrid account created (free trial, 60-day limit)
        - `wizard.lu` domain authenticated (5 CNAME + 1 TXT in Cloudflare DNS)
        - Link branding enabled
        - API key `orion-production` created
        - Alertmanager SMTP configured (`alerts@wizard.lu` → SendGrid)
        - App email configured (`EMAIL_PROVIDER=sendgrid`, `noreply@wizard.lu`)
        - Test alert sent and received successfully
    - **Cloudflare security** — configured on all three domains:
        - Bot Fight Mode enabled
        - DDoS protection active (default)
        - Rate limiting: 100 req/10s on `/api/` paths, block for 10s

    **Steps 1–24 fully deployed and operational.**

!!! success "Progress — 2026-02-17"
    **Launch readiness — fully deployed and verified (44/44 checks pass):**

    - **Memory limits** on all 6 app containers (db: 512m, redis: 128m, api: 512m, celery-worker: 512m, celery-beat: 256m, flower: 256m) — beat/flower bumped from 128m after OOM kills
    - **Flower port** restricted to localhost only (`127.0.0.1:5555:5555`) — access via Caddy reverse proxy
    - **Flower password** changed from default
    - **Infrastructure health checks** — `/health/ready` now checks PostgreSQL (`SELECT 1`) and Redis (`ping`) with individual check details and latency
    - **fail2ban Caddy auth jail** deployed — bans IPs after 10 failed auth attempts
    - **Unattended upgrades** verified active
    - **Scaling guide** — practical playbook at `docs/deployment/scaling-guide.md`
    - **Server verification script** — `scripts/verify-server.sh` (44/44 PASS, 0 FAIL, 0 WARN)

    **Server is launch-ready for first client (24 stores).**

## Installed Software Versions

| Software | Version |
|---|---|
| Ubuntu | 24.04.4 LTS |
| Kernel | 6.8.0-100-generic (aarch64) |
| Docker | 29.2.1 |
| Docker Compose | 5.0.2 |
| PostgreSQL | 15 (container) |
| Redis | 7-alpine (container) |
| Python | 3.11-slim (container) |
| Gitea | latest (container) |
| Caddy | 2.10.2 |
| act_runner | 0.2.13 |

---

## Step 1: Initial Server Access

```bash
ssh root@91.99.65.229
```

## Step 2: Create Non-Root User

Create a dedicated user with sudo privileges and copy the SSH key:

```bash
# Create user
adduser samir
usermod -aG sudo samir

# Copy SSH keys to new user
rsync --archive --chown=samir:samir ~/.ssh /home/samir
```

Verify by connecting as the new user (from a **new terminal**):

```bash
ssh samir@91.99.65.229
```

## Step 3: System Update & Essential Packages

```bash
sudo apt update && sudo apt upgrade -y

sudo apt install -y \
  curl \
  git \
  wget \
  ufw \
  fail2ban \
  htop \
  unzip \
  make
```

Reboot if a kernel upgrade is pending:

```bash
sudo reboot
```

## Step 4: Firewall Configuration (UFW)

```bash
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
```

Verify:

```bash
sudo ufw status
```

Expected output:

```
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere
```

## Step 5: Harden SSH

!!! warning "Before doing this step"
    Make sure you can SSH as `samir` from another terminal first! If you lock yourself out, you'll need to use Hetzner's console rescue mode.

```bash
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh   # Note: Ubuntu 24.04 uses 'ssh' not 'sshd'
```

## Step 6: Install Docker & Docker Compose

```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker samir
```

Log out and back in for the group change:

```bash
exit
# Then: ssh samir@91.99.65.229
```

Verify:

```bash
docker --version
docker compose version
```

## Step 7: Gitea (Self-Hosted Git)

Create the Gitea directory and compose file:

```bash
mkdir -p ~/gitea && cd ~/gitea
```

Create `docker-compose.yml` with `nano ~/gitea/docker-compose.yml`:

```yaml
services:
  gitea:
    image: gitea/gitea:latest
    container_name: gitea
    restart: always
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=postgres
      - GITEA__database__HOST=gitea-db:5432
      - GITEA__database__NAME=gitea
      - GITEA__database__USER=gitea
      - GITEA__database__PASSWD=
      - GITEA__server__ROOT_URL=http://91.99.65.229:3000/
      - GITEA__server__SSH_DOMAIN=91.99.65.229
      - GITEA__server__DOMAIN=91.99.65.229
      - GITEA__actions__ENABLED=true
    volumes:
      - gitea-data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "2222:22"
    depends_on:
      gitea-db:
        condition: service_healthy

  gitea-db:
    image: postgres:15
    container_name: gitea-db
    restart: always
    environment:
      POSTGRES_DB: gitea
      POSTGRES_USER: gitea
      POSTGRES_PASSWORD:
    volumes:
      - gitea-db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U gitea"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  gitea-data:
  gitea-db-data:
```

Generate the database password with `openssl rand -hex 16` and set it as the value of both `GITEA__database__PASSWD` and `POSTGRES_PASSWORD`, which are left empty above. The two values must be identical.

Open the firewall for Gitea and start:

```bash
sudo ufw allow 3000/tcp
docker compose up -d
docker compose ps
```

Visit `http://91.99.65.229:3000` and complete the setup wizard. Create an admin account (e.g. `sboulahtit`).

Then create a repository (e.g. `orion`).

## Step 8: Push Repository to Gitea

### Add SSH Key to Gitea

Before pushing via SSH, add your local machine's public key to Gitea:

1. Copy your public key:

    ```bash
    cat ~/.ssh/id_ed25519.pub
    # Or if using RSA: cat ~/.ssh/id_rsa.pub
    ```

2. In the Gitea web UI: click your avatar → **Settings** → **SSH / GPG Keys** → **Add Key** → paste the key.

3. Add the Gitea SSH host to known hosts:

    ```bash
    ssh-keyscan -p 2222 git.wizard.lu >> ~/.ssh/known_hosts
    ```

### Add Remote and Push

From your **local machine**:

```bash
cd /home/samir/Documents/PycharmProjects/letzshop-product-import
git remote add gitea ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
git push gitea master
```

!!! note "Remote URL updated"
    The remote was initially set to `http://91.99.65.229:3000/...` during setup.
    After Caddy was configured, it was updated to use the domain with SSH: `ssh://git@git.wizard.lu:2222/sboulahtit/orion.git`

    To update an existing remote:

    ```bash
    git remote set-url gitea ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
    ```

## Step 9: Clone Repository on Server

```bash
mkdir -p ~/apps
cd ~/apps
git clone http://localhost:3000/sboulahtit/orion.git
cd orion
```

## Step 10: Configure Production Environment

```bash
cp .env.example .env
nano .env
```

### Critical Production Values

Generate secrets:

```bash
openssl rand -hex 32   # For JWT_SECRET_KEY
openssl rand -hex 16   # For database password
```

| Variable | How to Generate / What to Set |
|---|---|
| `DEBUG` | `False` |
| `DATABASE_URL` | `postgresql://orion_user:YOUR_DB_PW@db:5432/orion_db` |
| `JWT_SECRET_KEY` | Output of `openssl rand -hex 32` |
| `ADMIN_PASSWORD` | Strong password |
| `USE_CELERY` | `true` |
| `REDIS_URL` | `redis://redis:6379/0` |
| `STRIPE_SECRET_KEY` | Your Stripe secret key (configure later) |
| `STRIPE_PUBLISHABLE_KEY` | Your Stripe publishable key (configure later) |
| `STRIPE_WEBHOOK_SECRET` | Your Stripe webhook secret (configure later) |
| `STORAGE_BACKEND` | `r2` (if using Cloudflare R2, configure later) |

Also update the PostgreSQL password in `docker-compose.yml` (lines 9 and 40) to match.

## Step 11: Deploy with Docker Compose

```bash
cd ~/apps/orion

# Create directories with correct permissions for the container user
mkdir -p logs uploads exports
sudo chown -R 1000:1000 logs uploads exports

# Start infrastructure first
docker compose up -d db redis

# Wait for health checks to pass
docker compose ps

# Build and start the full stack
docker compose --profile full up -d --build
```

Verify all services are running:

```bash
docker compose --profile full ps
```

Expected: `api` (healthy), `db` (healthy), `redis` (healthy), `celery-worker` (healthy), `celery-beat` (running), `flower` (running).

## Step 12: Initialize Database

!!! note "PYTHONPATH required"
    The seed scripts need `PYTHONPATH=/app` set explicitly when running inside the container.

```bash
# Run migrations (use 'heads' for multi-branch Alembic)
docker compose --profile full exec -e PYTHONPATH=/app api python -m alembic upgrade heads

# Seed production data
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/init_production.py
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/init_log_settings.py
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/create_default_content_pages.py
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/seed_email_templates.py
```

### Seeded Data Summary

| Data | Count |
|---|---|
| Admin users | 1 (`admin@wizard.lu`) |
| Platforms | 3 (OMS, Main, Loyalty+) |
| Admin settings | 15 |
| Subscription tiers | 4 (Essential, Professional, Business, Enterprise) |
| Log settings | 6 |
| CMS pages | 8 (About, Contact, FAQ, Shipping, Returns, Privacy, Terms, Homepage) |
| Email templates | 17 (4 languages: en, fr, de, lb) |

---

## Step 13: DNS Configuration

Before setting up Caddy, point your domain's DNS to the server.
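
Once the records listed in the tables that follow exist, a small helper can confirm that each hostname resolves to the server. This is a minimal sketch: `check_a_record` is a hypothetical helper name (not part of the Orion repository), and it assumes `dig` is installed on the machine running the check.

```bash
# Sketch only: compare a resolved address against the server's IPv4 address.
# check_a_record is a hypothetical helper, not part of the Orion repo.
EXPECTED_IP="91.99.65.229"

check_a_record() {
    # $1 = hostname, $2 = address the resolver returned (may be empty)
    if [ "$2" = "$EXPECTED_IP" ]; then
        echo "OK $1 -> $2"
    else
        echo "MISMATCH $1 -> ${2:-no answer}"
        return 1
    fi
}

# Usage (needs network access and dig):
# for host in wizard.lu www.wizard.lu api.wizard.lu git.wizard.lu flower.wizard.lu; do
#     check_a_record "$host" "$(dig +short "$host" A | tail -n 1)"
# done
```

Keeping the comparison separate from the lookup means the same helper works with any resolver command; `tail -n 1` picks the final address when a CNAME appears earlier in the short output.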
### wizard.lu (Main Platform) — Completed

| Type | Name | Value | TTL |
|---|---|---|---|
| A | `@` | `91.99.65.229` | 300 |
| A | `www` | `91.99.65.229` | 300 |
| A | `api` | `91.99.65.229` | 300 |
| A | `git` | `91.99.65.229` | 300 |
| A | `flower` | `91.99.65.229` | 300 |

### omsflow.lu (OMS Platform) — Completed

| Type | Name | Value | TTL |
|---|---|---|---|
| A | `@` | `91.99.65.229` | 300 |
| A | `www` | `91.99.65.229` | 300 |
| AAAA | `@` | `2a01:4f8:1c1a:b39c::1` | 300 |
| AAAA | `www` | `2a01:4f8:1c1a:b39c::1` | 300 |

### rewardflow.lu (Loyalty+ Platform) — Completed

| Type | Name | Value | TTL |
|---|---|---|---|
| A | `@` | `91.99.65.229` | 300 |
| A | `www` | `91.99.65.229` | 300 |
| AAAA | `@` | `2a01:4f8:1c1a:b39c::1` | 300 |
| AAAA | `www` | `2a01:4f8:1c1a:b39c::1` | 300 |

### IPv6 (AAAA) Records — Completed

AAAA records are in place for all domains. The `omsflow.lu` and `rewardflow.lu` tables above already list theirs; the `wizard.lu` AAAA records are shown in the table below.

To verify your IPv6 address:

```bash
ip -6 addr show eth0 | grep 'scope global'
```

It should match the value in the Hetzner Cloud Console (Networking tab). Then create AAAA records mirroring each A record above, e.g.:

| Type | Name (wizard.lu) | Value | TTL |
|---|---|---|---|
| AAAA | `@` | `2a01:4f8:1c1a:b39c::1` | 300 |
| AAAA | `www` | `2a01:4f8:1c1a:b39c::1` | 300 |
| AAAA | `api` | `2a01:4f8:1c1a:b39c::1` | 300 |
| AAAA | `git` | `2a01:4f8:1c1a:b39c::1` | 300 |
| AAAA | `flower` | `2a01:4f8:1c1a:b39c::1` | 300 |

Repeat for `omsflow.lu` and `rewardflow.lu`.

!!! tip "DNS propagation"
    Set TTL to 300 (5 minutes) initially. DNS changes can take up to 24 hours to propagate globally, but usually complete within 30 minutes. Verify with: `dig api.wizard.lu +short`

## Step 14: Reverse Proxy with Caddy

Install Caddy:

```bash
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' \
  | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
  | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy
```

### Caddyfile Configuration

Edit `/etc/caddy/Caddyfile`:

```caddy
# ─── Platform 1: Main (wizard.lu) ───────────────────────────
wizard.lu {
    reverse_proxy localhost:8001
}

www.wizard.lu {
    redir https://wizard.lu{uri} permanent
}

# ─── Platform 2: OMS (omsflow.lu) ───────────────────────────────
omsflow.lu {
    reverse_proxy localhost:8001
}

www.omsflow.lu {
    redir https://omsflow.lu{uri} permanent
}

# ─── Platform 3: Loyalty+ (rewardflow.lu) ──────────────────
rewardflow.lu {
    reverse_proxy localhost:8001
}

www.rewardflow.lu {
    redir https://rewardflow.lu{uri} permanent
}

# ─── Services ───────────────────────────────────────────────
api.wizard.lu {
    reverse_proxy localhost:8001
}

git.wizard.lu {
    reverse_proxy localhost:3000
}

flower.wizard.lu {
    reverse_proxy localhost:5555
}
```

!!! info "How multi-platform routing works"
    All platform domains (`wizard.lu`, `omsflow.lu`, `rewardflow.lu`) point to the **same FastAPI backend** on port 8001. The `PlatformContextMiddleware` reads the `Host` header to detect which platform the request is for. Caddy preserves the Host header by default, so no extra configuration is needed.

    The `domain` column in the `platforms` database table must match:

    | Platform | code | domain |
    |---|---|---|
    | Main | `main` | `wizard.lu` |
    | OMS | `oms` | `omsflow.lu` |
    | Loyalty+ | `loyalty` | `rewardflow.lu` |

Start Caddy:

```bash
sudo systemctl restart caddy
```

Caddy automatically provisions Let's Encrypt SSL certificates for all configured domains.
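
To see which certificate a domain is actually serving (for example, to confirm the issuer and expiry date after provisioning), the standard `openssl` client can be wrapped in a one-line helper. This is a sketch under assumptions: `show_cert` is a hypothetical name, and the check needs outbound access to port 443 on the target host.

```bash
# Sketch only: print issuer and expiry date of the certificate served on port 443.
# show_cert is a hypothetical helper name.
show_cert() {
    # $1 = hostname; -servername sends SNI so the matching site block answers
    echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
        | openssl x509 -noout -issuer -enddate
}

# Usage (needs network access):
# show_cert wizard.lu
# show_cert api.wizard.lu
```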
Verify:

```bash
curl -I https://wizard.lu
curl -I https://api.wizard.lu/health
curl -I https://git.wizard.lu
```

After Caddy is working, remove the temporary firewall rules:

```bash
sudo ufw delete allow 3000/tcp
sudo ufw delete allow 8001/tcp
```

Update Gitea's configuration to use its new domain. In `~/gitea/docker-compose.yml`, change:

```yaml
- GITEA__server__ROOT_URL=https://git.wizard.lu/
- GITEA__server__SSH_DOMAIN=git.wizard.lu
- GITEA__server__DOMAIN=git.wizard.lu
```

Then restart Gitea:

```bash
cd ~/gitea && docker compose up -d gitea
```

### Future: Multi-Tenant Store Routing

Stores on each platform use two routing modes:

- **Standard (subdomain)**: `acme.omsflow.lu` — included in the base subscription
- **Premium (custom domain)**: `acme.lu` — available with premium subscription tiers

Both modes are handled by the `StoreContextMiddleware` which reads the `Host` header, so Caddy just needs to forward requests and preserve the header.

#### Wildcard Subdomains (for store subdomains)

When stores start using subdomains like `acme.omsflow.lu`, add wildcard blocks:

```caddy
*.omsflow.lu {
    reverse_proxy localhost:8001
}

*.rewardflow.lu {
    reverse_proxy localhost:8001
}

*.wizard.lu {
    reverse_proxy localhost:8001
}
```

!!! warning "Wildcard SSL requires DNS challenge"
    Let's Encrypt cannot issue wildcard certificates via HTTP challenge. Wildcard certs require a **DNS challenge**, which means installing a Caddy DNS provider plugin (e.g. `caddy-dns/cloudflare`) and configuring API credentials for your DNS provider. See [Caddy DNS challenge docs](https://caddyserver.com/docs/automatic-https#dns-challenge).

#### Custom Store Domains (for premium stores)

When premium stores bring their own domains (e.g. `acme.lu`), use Caddy's **on-demand TLS**:

```caddy
https:// {
    tls {
        on_demand
    }
    reverse_proxy localhost:8001
}
```

On-demand TLS auto-provisions SSL certificates when a new domain connects. Add an `ask` endpoint to validate that the domain is registered in the `store_domains` table, preventing abuse. In Caddy 2 the `ask` URL is set in the global options block at the top of the Caddyfile, not inside the site's `tls` block:

```caddy
{
    on_demand_tls {
        ask http://localhost:8001/api/v1/internal/verify-domain
    }
}
```

!!! note "Not needed yet"
    Wildcard subdomains and custom domains are future work. The current Caddyfile handles all platform root domains and service subdomains.

## Step 15: Gitea Actions Runner

!!! warning "ARM64 architecture"
    This server is ARM64. Download the `arm64` binary, not `amd64`.

Download and install:

```bash
mkdir -p ~/gitea-runner && cd ~/gitea-runner

# Download act_runner v0.2.13 (ARM64)
wget https://gitea.com/gitea/act_runner/releases/download/v0.2.13/act_runner-0.2.13-linux-arm64
chmod +x act_runner-0.2.13-linux-arm64
ln -s act_runner-0.2.13-linux-arm64 act_runner
```

Register the runner (get token from **Site Administration > Actions > Runners > Create new Runner**):

```bash
./act_runner register \
  --instance https://git.wizard.lu \
  --token YOUR_RUNNER_TOKEN
```

Accept the default runner name and labels when prompted.

Create a systemd service for persistent operation:

```bash
sudo nano /etc/systemd/system/gitea-runner.service
```

```ini
[Unit]
Description=Gitea Actions Runner
After=network.target

[Service]
Type=simple
User=samir
WorkingDirectory=/home/samir/gitea-runner
ExecStart=/home/samir/gitea-runner/act_runner daemon
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now gitea-runner
sudo systemctl status gitea-runner
```

Verify the runner shows as **Online** in Gitea: **Site Administration > Actions > Runners**.

## Step 16: Continuous Deployment

Automate deployment on every successful push to master. The Gitea Actions runner and the app both run on the same server, so the deploy job SSHes from the CI Docker container to `172.17.0.1` (Docker bridge gateway — see note in 16.2).
```
push to master
  ├── ruff ──────┐
  ├── pytest ────┤
  └── validate ──┤
                 └── deploy (SSH → scripts/deploy.sh)
                      ├── git stash / pull / pop
                      ├── docker compose up -d --build
                      ├── alembic upgrade heads
                      └── health check (retries)
```

### 16.1 Generate Deploy SSH Key (on server)

```bash
ssh-keygen -t ed25519 -C "gitea-deploy@wizard.lu" -f ~/.ssh/deploy_ed25519 -N ""
cat ~/.ssh/deploy_ed25519.pub >> ~/.ssh/authorized_keys
```

### 16.2 Add Gitea Secrets

In **Repository Settings > Actions > Secrets**, add:

| Secret | Value |
|---|---|
| `DEPLOY_SSH_KEY` | Contents of `~/.ssh/deploy_ed25519` (private key) |
| `DEPLOY_HOST` | `172.17.0.1` (Docker bridge gateway — **not** `127.0.0.1`) |
| `DEPLOY_USER` | `samir` |
| `DEPLOY_PATH` | `/home/samir/apps/orion` |

!!! important "Why `172.17.0.1` and not `127.0.0.1`?"
    CI jobs run inside Docker containers where `127.0.0.1` is the container, not the host. `172.17.0.1` is the Docker bridge gateway that routes to the host. Ensure the firewall allows SSH from the Docker bridge network: `sudo ufw allow from 172.17.0.0/16 to any port 22`. When Gitea and Orion are on separate servers, replace with the Orion server's IP.

### 16.3 Deploy Script

The deploy script lives at `scripts/deploy.sh` in the repository. It:

1. Stashes local changes (preserves `.env`)
2. Pulls latest code (`--ff-only`)
3. Pops stash to restore local changes
4. Rebuilds and restarts Docker containers (`docker compose --profile full up -d --build`)
5. Runs database migrations (`alembic upgrade heads`)
6. Health checks `http://localhost:8001/health` with 12 retries (60s total)

Exit codes: `0` success, `1` git pull failed, `2` docker compose failed, `3` migration failed, `4` health check failed.
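
The health-check step (item 6) can be pictured as a small retry helper. This is an illustration under stated assumptions, not the contents of `scripts/deploy.sh`: `retry_until_ok` is a hypothetical name, and the 5-second delay is inferred from 12 tries within roughly 60 seconds.

```bash
# Sketch only: retry a command until it succeeds or the attempts run out.
# retry_until_ok is a hypothetical helper, not copied from scripts/deploy.sh.
retry_until_ok() {
    _attempts="$1"   # maximum number of tries
    _delay="$2"      # seconds to wait between tries
    shift 2          # the remaining arguments are the command to run
    _i=1
    while [ "$_i" -le "$_attempts" ]; do
        if "$@"; then
            echo "healthy after ${_i} attempt(s)"
            return 0
        fi
        echo "attempt ${_i}/${_attempts} failed" >&2
        # Wait before the next try, but not after the final one
        if [ "$_i" -lt "$_attempts" ]; then
            sleep "$_delay"
        fi
        _i=$((_i + 1))
    done
    return 1
}

# Example call matching the numbers above (12 tries, assumed 5 s apart),
# mapping a failure to exit code 4:
# retry_until_ok 12 5 curl -fsS -o /dev/null http://localhost:8001/health || exit 4
```

Passing the check as a command rather than hard-coding `curl` keeps the loop reusable for other probes, such as waiting for the database container to accept connections.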
### 16.4 CI Workflow

The deploy job in `.gitea/workflows/ci.yml` runs only on master push, after `ruff`, `pytest`, and `validate` pass:

```yaml
deploy:
  runs-on: ubuntu-latest
  if: github.event_name == 'push' && github.ref == 'refs/heads/master'
  needs: [ruff, pytest, validate]
  steps:
    - name: Deploy to production
      uses: appleboy/ssh-action@v1
      with:
        host: ${{ secrets.DEPLOY_HOST }}
        username: ${{ secrets.DEPLOY_USER }}
        key: ${{ secrets.DEPLOY_SSH_KEY }}
        port: 22
        command_timeout: 10m
        script: cd ${{ secrets.DEPLOY_PATH }} && bash scripts/deploy.sh
```

### 16.5 Manual Fallback

If CI is down, deploy manually:

```bash
cd ~/apps/orion && bash scripts/deploy.sh
```

### 16.6 Verify

```bash
# All app containers running
cd ~/apps/orion && docker compose --profile full ps

# API health (via Caddy with SSL)
curl https://api.wizard.lu/health

# Main platform
curl -I https://wizard.lu

# Gitea
curl -I https://git.wizard.lu

# Flower
curl -I https://flower.wizard.lu

# Gitea runner status
sudo systemctl status gitea-runner
```

## Step 17: Backups

Three layers of backup protection: Hetzner server backups, automated PostgreSQL dumps with local rotation, and offsite sync to Cloudflare R2.

### 17.1 Enable Hetzner Server Backups

In the Hetzner Cloud Console:

1. Go to **Servers** > select your server > **Backups**
2. Click **Enable backups** (~20% of server cost, ~1.20 EUR/mo for CAX11)
3. Hetzner takes automatic daily backups and keeps the 7 most recent images

This covers full-disk recovery (OS, Docker volumes, config files) but is coarse-grained. Database-level backups (below) give finer restore granularity.

### 17.2 Cloudflare R2 Setup (Offsite Backup Storage)

R2 provides S3-compatible object storage with a generous free tier (10 GB storage, 10 million reads/month).

**Create Cloudflare account and R2 bucket:**

1. Sign up at [cloudflare.com](https://dash.cloudflare.com/sign-up) (free account)
2. Go to **R2 Object Storage** > **Create bucket**
3. Name: `orion-backups`, region: automatic
4. Go to **R2** > **Manage R2 API Tokens** > **Create API token**
    - Permissions: Object Read & Write
    - Specify bucket: `orion-backups`
5. Note the **Account ID**, **Access Key ID**, and **Secret Access Key**

**Install and configure AWS CLI on the server:**

```bash
# awscli is not available via apt on Ubuntu 24.04; install via pip
sudo apt install -y python3-pip
pip3 install awscli --break-system-packages

# Add ~/.local/bin to PATH (pip installs binaries there)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

aws configure --profile r2
# Access Key ID:
# Secret Access Key:
# Default region name: auto
# Default output format: json
```

**Test connectivity** (the endpoint hostname is your Account ID followed by `.r2.cloudflarestorage.com`):

```bash
aws s3 ls --endpoint-url https://.r2.cloudflarestorage.com --profile r2
```

Add the R2 backup bucket name to your production `.env`:

```bash
R2_BACKUP_BUCKET=orion-backups
```

### 17.3 Backup Script

The backup script at `scripts/backup.sh` handles:

- `pg_dump` of Orion DB (via `docker exec orion-db-1`)
- `pg_dump` of Gitea DB (via `docker exec gitea-db`)
- On Sundays: copies daily backup to `weekly/` subdirectory
- Rotation: keeps 7 daily, 4 weekly backups
- Optional `--upload` flag: syncs to Cloudflare R2

```bash
# Create backup directories
mkdir -p ~/backups/{orion,gitea}/{daily,weekly}

# Run a manual backup
bash ~/apps/orion/scripts/backup.sh

# Run with R2 upload
bash ~/apps/orion/scripts/backup.sh --upload

# Verify backup integrity
ls -lh ~/backups/orion/daily/
gunzip -t ~/backups/orion/daily/*.sql.gz
```

### 17.4 Systemd Timer (Daily at 03:00)

Create the service unit:

```bash
sudo nano /etc/systemd/system/orion-backup.service
```

```ini
[Unit]
Description=Orion database backup
After=docker.service

[Service]
Type=oneshot
User=samir
Environment="PATH=/home/samir/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/bin/bash /home/samir/apps/orion/scripts/backup.sh --upload
StandardOutput=journal
StandardError=journal
```

Create the timer:

```bash
sudo nano /etc/systemd/system/orion-backup.timer
```

```ini
[Unit]
Description=Run Orion backup daily at 03:00

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now orion-backup.timer

# Verify timer is active
systemctl list-timers orion-backup.timer

# Test manually
sudo systemctl start orion-backup.service
journalctl -u orion-backup.service --no-pager
```

### 17.5 Restore Procedure

The restore script at `scripts/restore.sh` handles the full restore cycle:

```bash
# Restore Orion database
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/orion_20260214_030000.sql.gz

# Restore Gitea database
bash ~/apps/orion/scripts/restore.sh gitea ~/backups/gitea/daily/gitea_20260214_030000.sql.gz
```

The script will:

1. Stop app containers (keep DB running)
2. Drop and recreate the database
3. Restore from the `.sql.gz` backup
4. Run Alembic migrations (Orion only)
5. Restart all containers

To restore from R2 (if local backups are lost):

```bash
# Download from R2
aws s3 sync s3://orion-backups/ ~/backups/ \
  --endpoint-url https://.r2.cloudflarestorage.com \
  --profile r2

# Then restore as usual
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/.sql.gz
```

### 17.6 Verification

```bash
# Backup files exist
ls -lh ~/backups/orion/daily/
ls -lh ~/backups/gitea/daily/

# Backup integrity
gunzip -t ~/backups/orion/daily/*.sql.gz

# Timer is scheduled
systemctl list-timers orion-backup.timer

# R2 sync (if configured)
aws s3 ls s3://orion-backups/ --endpoint-url https://.r2.cloudflarestorage.com --profile r2 --recursive
```

---

## Step 18: Monitoring & Observability

Prometheus + Grafana monitoring stack with host and container metrics.
### Architecture

```
┌──────────────┐     scrape      ┌─────────────────┐
│  Prometheus  │◄────────────────│  Orion API      │ /metrics
│  :9090       │◄────────────────│  node-exporter  │ :9100
│              │◄────────────────│  cAdvisor       │ :8080
└──────┬───────┘                 └─────────────────┘
       │ query
┌──────▼───────┐
│  Grafana     │──── https://grafana.wizard.lu
│  :3001       │
└──────────────┘
```

### Resource Budget (4 GB Server)

| Container | RAM Limit | Purpose |
|---|---|---|
| prometheus | 256 MB | Metrics storage (15-day retention, 2 GB max) |
| grafana | 192 MB | Dashboards (SQLite backend) |
| node-exporter | 64 MB | Host CPU/RAM/disk metrics |
| cadvisor | 128 MB | Per-container resource metrics |
| **Total new** | **640 MB** | |

Existing stack ~1.8 GB + 640 MB new = ~2.4 GB. Leaves ~1.6 GB for OS. If too tight, live-upgrade to CAX21 (8 GB/80 GB, ~7.50 EUR/mo) via **Cloud Console > Server > Rescale** (~2 min restart).

### 18.1 DNS Record

Add A and AAAA records for `grafana.wizard.lu`:

| Type | Name | Value | TTL |
|---|---|---|---|
| A | `grafana` | `91.99.65.229` | 300 |
| AAAA | `grafana` | `2a01:4f8:1c1a:b39c::1` | 300 |

### 18.2 Caddy Configuration

Add to `/etc/caddy/Caddyfile`:

```caddy
grafana.wizard.lu {
    reverse_proxy localhost:3001
}
```

Reload Caddy:

```bash
sudo systemctl reload caddy
```

### 18.3 Production Environment

Add to `~/apps/orion/.env`:

```bash
ENABLE_METRICS=true
GRAFANA_URL=https://grafana.wizard.lu
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=<strong-password>
```

### 18.4 Deploy

```bash
cd ~/apps/orion
docker compose --profile full up -d --build
```

Verify all containers are running:

```bash
docker compose --profile full ps
docker stats --no-stream
```

### 18.5 Grafana First Login

1. Open `https://grafana.wizard.lu`
2. Log in with `admin` and the password set in `GRAFANA_ADMIN_PASSWORD`
3. Change the default password when prompted

**Import community dashboards:**

- **Node Exporter Full**: Dashboards > Import > ID `1860` > Select Prometheus datasource
- **Docker / cAdvisor**: Dashboards > Import > ID `193` > Select Prometheus datasource

### 18.6 Verification

```bash
# Prometheus metrics from Orion API
curl -s https://api.wizard.lu/metrics | head -5

# Health endpoints
curl -s https://api.wizard.lu/health/live
curl -s https://api.wizard.lu/health/ready

# Prometheus targets (all should be "up")
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health

# Grafana accessible
curl -I https://grafana.wizard.lu

# RAM usage within limits
docker stats --no-stream
```

---

## Step 19: Prometheus Alerting

Alert rules and Alertmanager for email notifications when things go wrong.

### 19.1 Architecture

```
┌──────────────┐  evaluates   ┌───────────────────┐
│  Prometheus  │─────────────►│  alert.rules.yml  │
│  :9090       │              │ (host, container, │
│              │              │  API, Celery)     │
└──────┬───────┘              └───────────────────┘
       │ fires alerts
┌──────▼───────┐
│ Alertmanager │──── email ──► admin@wizard.lu
│  :9093       │
└──────────────┘
```

### 19.2 Alert Rules

Alert rules are defined in `monitoring/prometheus/alert.rules.yml`:

| Group | Alert | Condition | Severity |
|---|---|---|---|
| Host | HostHighCpuUsage | CPU >80% for 5m | warning |
| Host | HostHighMemoryUsage | Memory >85% for 5m | warning |
| Host | HostHighDiskUsage | Disk >80% | warning |
| Host | HostDiskFullPrediction | Disk full within 4h | critical |
| Containers | ContainerHighRestartCount | >3 restarts/hour | critical |
| Containers | ContainerOomKilled | Any OOM kill | critical |
| Containers | ContainerHighCpu | >80% CPU for 5m | warning |
| API | ApiHighErrorRate | 5xx rate >1% for 5m | critical |
| API | ApiHighLatency | P95 >2s for 5m | warning |
| API | ApiHealthCheckDown | Health check failing 1m | critical |
| Celery | CeleryQueueBacklog | >100 tasks for 10m | warning |
| Prometheus | TargetDown | Any target down 2m | critical |

### 19.3 Alertmanager Configuration

Alertmanager config is in `monitoring/alertmanager/alertmanager.yml`:

- **Critical alerts**: repeat every 1 hour
- **Warning alerts**: repeat every 4 hours
- Groups by `alertname` + `severity`, 30s wait, 5m interval
- Inhibition: warnings suppressed when critical is already firing for same alert

!!! warning "Configure SMTP before deploying"
    Edit `monitoring/alertmanager/alertmanager.yml` and fill in the SMTP settings (host, username, password, recipient email). Alertmanager will start but won't send emails until SMTP is configured.

### 19.4 Docker Compose Changes

The `docker-compose.yml` includes:

- `alertmanager` service: `prom/alertmanager:latest`, profiles: [full], port 127.0.0.1:9093, mem_limit: 32m
- `prometheus` volumes: mounts `alert.rules.yml` as read-only
- `prometheus.yml`: `alerting:` section pointing to alertmanager:9093, `rule_files:` for alert rules, new scrape job for alertmanager

### 19.5 Alertmanager SMTP Setup (SendGrid)

Alertmanager needs SMTP to send email notifications. SendGrid handles both transactional emails and marketing campaigns under one account, so you can set it up once and use it for everything.

**Free trial**: 100 emails/day for 60 days. Covers alerting + transactional emails through launch. After 60 days, upgrade to a paid plan (Essentials starts at ~$20/mo for 50K emails/mo).

**1. Create SendGrid account:**

1. Sign up at [sendgrid.com](https://sendgrid.com/) (free plan)
2. Complete **Sender Authentication**: go to **Settings** > **Sender Authentication** > **Domain Authentication**
3. Authenticate your sending domain (`wizard.lu`); SendGrid provides CNAME records to add to DNS
4. Create an API key: **Settings** > **API Keys** > **Create API Key** (Full Access)
5. Save the API key; you'll need it for both Alertmanager and the app's `EMAIL_PROVIDER`

!!! info "SendGrid SMTP credentials"
    SendGrid uses a single credential pattern for SMTP:

    - **Server**: `smtp.sendgrid.net`
    - **Port**: `587` (STARTTLS)
    - **Username**: literally the string `apikey` (not your email)
    - **Password**: your API key (starts with `SG.`)

**2. Update alertmanager config on the server:**

```bash
nano ~/apps/orion/monitoring/alertmanager/alertmanager.yml
```

Replace the SMTP placeholders:

```yaml
global:
  smtp_smarthost: 'smtp.sendgrid.net:587'
  smtp_from: 'alerts@wizard.lu'
  smtp_auth_username: 'apikey'
  smtp_auth_password: 'SG.your-sendgrid-api-key-here'
  smtp_require_tls: true
```

Update the `to:` addresses in both receivers to your actual email.

**3. Update app email config** in `~/apps/orion/.env`:

```bash
# SendGrid for all application emails (password reset, order confirmation, etc.)
EMAIL_PROVIDER=sendgrid
SENDGRID_API_KEY=SG.your-sendgrid-api-key-here
EMAIL_FROM_ADDRESS=noreply@wizard.lu
EMAIL_FROM_NAME=Orion
```

**4. Restart services:**

```bash
cd ~/apps/orion
docker compose --profile full restart alertmanager api
curl -s http://localhost:9093/-/healthy  # Should return OK
```

**5. Test by triggering a test alert (optional):**

```bash
# Send a test alert to alertmanager (v2 API)
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Test alert - please ignore"},"startsAt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","endsAt":"'$(date -u -d '+5 minutes' +%Y-%m-%dT%H:%M:%SZ)'"}]'
```

Check your inbox within 30 seconds. Then verify the alert resolved:

```bash
curl -s http://localhost:9093/api/v2/alerts | python3 -m json.tool
```

!!! tip "Alternative SMTP providers"
    Any SMTP service works if you prefer a different provider:

    - **Amazon SES**: `email-smtp.eu-west-1.amazonaws.com:587` (cheapest at scale, $0.10/1K emails)
    - **Mailgun**: `smtp.mailgun.org:587` (transactional only, no built-in marketing)
    - **Gmail**: `smtp.gmail.com:587` with an App Password (not recommended for production)

### 19.6 Deploy

```bash
cd ~/apps/orion
docker compose --profile full up -d
```

### 19.7 Verification

```bash
# Alertmanager healthy
curl -s http://localhost:9093/-/healthy

# Alert rules loaded
curl -s http://localhost:9090/api/v1/rules | python3 -m json.tool | head -20

# Active alerts (should be empty if all is well)
curl -s http://localhost:9090/api/v1/alerts | python3 -m json.tool

# Alertmanager target in Prometheus
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep alertmanager
```

### 19.8 Multi-Domain Email Strategy

SendGrid supports multiple authenticated domains on a single account. This enables sending emails from client domains (e.g., `orders@acme.lu`) without clients needing their own SendGrid plan.

**Current setup:**

- `wizard.lu` authenticated, used for platform emails (`alerts@`, `noreply@`)

**Future: client domain onboarding**

When a client wants emails sent from their domain (e.g., `acme.lu`):

1. In SendGrid: **Settings** > **Sender Authentication** > **Authenticate a Domain** → add `acme.lu`
2. SendGrid provides CNAME + TXT records
3. Client adds the DNS records to their domain
4. Verify in SendGrid

This is the professional approach: emails come from the client's domain with proper SPF/DKIM, not from `wizard.lu`. Build an admin flow to automate this as part of store onboarding.

!!! note "Volume planning"
    The free trial allows 100 emails/day. Once clients start sending marketing campaigns, upgrade to a paid SendGrid plan based on total volume across all client domains.
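The client onboarding flow in 19.8 ends with a manual verification in SendGrid. Before that step it can help to confirm that the client has actually published the records. The following is a minimal sketch and not part of the Orion codebase: the function name `check_cname` and the `DIG` override are hypothetical, and the record names must be the exact ones SendGrid displays for the domain.

```bash
# Hypothetical helper (not part of Orion): succeed only if NAME has a CNAME
# record pointing at EXPECTED. DIG may be set to another resolver command;
# it defaults to the standard dig utility.
check_cname() {
  local name="$1" expected="$2" actual
  actual="$("${DIG:-dig}" +short CNAME "$name" | head -n 1)"
  # dig prints fully qualified names with a trailing dot; strip it on both sides
  actual="${actual%.}"
  expected="${expected%.}"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $name -> $actual"
  else
    echo "FAIL $name -> '$actual' (expected '$expected')" >&2
    return 1
  fi
}
```

Usage would look like `check_cname <record-name-from-sendgrid> <target-from-sendgrid>`, run once per CNAME record that SendGrid lists. The comparison is case-sensitive, which is a simplification: DNS names are not.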
---

## Step 20: Security Hardening

Docker network segmentation, fail2ban configuration, and automatic security updates.

### 20.1 Docker Network Segmentation

Three isolated networks replace the default flat network:

| Network | Purpose | Services |
|---|---|---|
| `orion_frontend` | External-facing | api |
| `orion_backend` | Database + workers | db, redis, api, celery-worker, celery-beat, flower |
| `orion_monitoring` | Metrics collection | api, prometheus, grafana, node-exporter, cadvisor, alertmanager |

The `api` service is on all three networks because it needs to:

- Serve HTTP traffic (frontend)
- Connect to database and Redis (backend)
- Expose `/metrics` to Prometheus (monitoring)

This is already configured in the updated `docker-compose.yml`. After deploying, verify:

```bash
docker network ls | grep orion
# Expected: orion_frontend, orion_backend, orion_monitoring
```

### 20.2 fail2ban Configuration

fail2ban is already installed (Step 3) but needs jail configuration. All commands below are copy-pasteable.

**SSH jail**: bans IPs after 3 failed SSH attempts for 24 hours:

```bash
sudo tee /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 86400
findtime = 600
EOF
```

**Caddy access logging**: fail2ban needs a log file to watch.
Add a global `log` directive to your Caddyfile:

```bash
sudo nano /etc/caddy/Caddyfile
```

Add this block at the **very top** of the Caddyfile, before any site blocks:

```caddy
{
    log {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
}
```

Create the log directory and restart Caddy:

```bash
sudo mkdir -p /var/log/caddy
sudo chown caddy:caddy /var/log/caddy
sudo systemctl restart caddy
sudo systemctl status caddy

# Verify logging works (make a request, then check)
curl -s https://wizard.lu > /dev/null
sudo tail -1 /var/log/caddy/access.log | python3 -m json.tool | head -5
```

**Caddy auth filter**: matches 401/403 responses in Caddy's JSON logs. The `<HOST>` tag is required; fail2ban uses it to extract the client IP from each matching line:

```bash
sudo tee /etc/fail2ban/filter.d/caddy-auth.conf << 'EOF'
[Definition]
failregex = ^.*"remote_ip":"<HOST>".*"status":(401|403).*$
ignoreregex =
EOF
```

**Caddy jail**: bans IPs after 10 failed auth attempts for 1 hour:

```bash
sudo tee /etc/fail2ban/jail.d/caddy.conf << 'EOF'
[caddy-auth]
enabled = true
port = http,https
filter = caddy-auth
logpath = /var/log/caddy/access.log
maxretry = 10
bantime = 3600
findtime = 600
EOF
```

**Restart and verify:**

```bash
sudo systemctl restart fail2ban

# Both jails should be listed
sudo fail2ban-client status

# SSH jail details
sudo fail2ban-client status sshd

# Caddy jail details (will show 0 bans initially)
sudo fail2ban-client status caddy-auth
```

### 20.3 Unattended Security Upgrades

Install and enable automatic security updates:

```bash
sudo apt install -y unattended-upgrades apt-listchanges
sudo dpkg-reconfigure -plow unattended-upgrades
```

This enables security-only updates with automatic reboot disabled (safe default).
Verify:

```bash
sudo unattended-upgrades --dry-run 2>&1 | head -10
cat /etc/apt/apt.conf.d/20auto-upgrades
```

Expected `20auto-upgrades` content:

```
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```

### 20.4 Clean Up Legacy Docker Network

After deploying with network segmentation, the old default network may remain:

```bash
# Check if orion_default still exists
docker network ls | grep orion_default

# Remove it (safe: no containers should be using it)
docker network rm orion_default 2>/dev/null || echo "Already removed"
```

### 20.5 Verification

```bash
# fail2ban jails active (should show sshd and caddy-auth)
sudo fail2ban-client status

# SSH jail details
sudo fail2ban-client status sshd

# Docker networks (should show 3: frontend, backend, monitoring)
docker network ls | grep orion

# Unattended upgrades configured
sudo unattended-upgrades --dry-run 2>&1 | head

# Caddy access log being written
sudo tail -1 /var/log/caddy/access.log
```

---

## Step 21: Cloudflare Domain Proxy

Move DNS to Cloudflare for WAF, DDoS protection, and CDN. This step involves DNS propagation, so do it during a maintenance window.

!!! warning "DNS changes affect all services"
    Moving nameservers involves propagation delay (minutes to hours). Plan for brief interruption. Do this step last, after Steps 19–20 are verified.

### 21.1 Pre-Migration: Record Email DNS

Before changing nameservers, document all email-related DNS records:

```bash
# Run for each domain (wizard.lu, omsflow.lu, rewardflow.lu)
dig wizard.lu MX +short
dig wizard.lu TXT +short
dig _dmarc.wizard.lu TXT +short
dig default._domainkey.wizard.lu TXT +short  # DKIM selector may vary
```

Save the output; you'll need to verify these exist after Cloudflare import.

### 21.2 Add Domains to Cloudflare

1. Log in to [Cloudflare Dashboard](https://dash.cloudflare.com)
2. **Add a site** for each domain: `wizard.lu`, `omsflow.lu`, `rewardflow.lu`
3. Select **Free** plan → choose **Full setup** (nameserver-based, not CNAME/partial)
4. Block AI crawlers on all pages
5. Cloudflare auto-scans and imports existing DNS records. **Review carefully**:
    - Delete any stale CNAME records (leftover from partial setup)
    - Add missing A/AAAA records manually (Cloudflare scan may miss some)
    - Verify MX/SPF/DKIM/DMARC records are present before changing NS
    - Email records (MX, TXT) must stay as **DNS-only (grey cloud)**; never proxy MX records
6. Set proxy status:
    - **Orange cloud (proxied)**: `@`, `www`, `api`, `flower`, `grafana` (gets WAF + CDN)
    - **Grey cloud (DNS only)**: `git` (needs direct access for SSH on port 2222)

### 21.3 Change Nameservers

At your domain registrar (Netim), update NS records to Cloudflare's assigned nameservers. Cloudflare shows the exact pair during activation (e.g., `name1.ns.cloudflare.com`, `name2.ns.cloudflare.com`).

Disable DNSSEC at the registrar before switching NS, and re-enable it later via Cloudflare.

### 21.4 Generate Origin Certificates

Cloudflare Origin Certificates (free, 15-year validity) avoid ACME challenge issues when traffic is proxied:

1. In Cloudflare: **SSL/TLS** > **Origin Server** > **Create Certificate**
2. Generate for each domain with **specific subdomains** (not wildcards):
    - `wizard.lu`: `wizard.lu, api.wizard.lu, www.wizard.lu, flower.wizard.lu, grafana.wizard.lu`
    - `omsflow.lu`: `omsflow.lu, www.omsflow.lu`
    - `rewardflow.lu`: `rewardflow.lu, www.rewardflow.lu`
3. Download the certificate and private key (private key is shown only once)

!!! warning "Do NOT use wildcard origin certs for wizard.lu"
    A `*.wizard.lu` wildcard cert will match `git.wizard.lu`, which needs a Let's Encrypt cert (DNS-only, not proxied through Cloudflare). Use specific subdomains instead.
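To make the wildcard warning concrete, here is a small sketch of the usual certificate name matching rule, where a leading `*.` covers exactly one DNS label. The function name `cert_name_matches` is hypothetical and exists only for illustration; it is not how Caddy or Cloudflare implement matching.

```bash
# Hypothetical illustration: return 0 if HOST is covered by certificate
# name PATTERN, where a leading "*." stands for exactly one DNS label.
cert_name_matches() {
  local pattern="$1" host="$2" suffix label
  case "$pattern" in
    '*.'*)
      suffix="${pattern#\*}"              # "*.wizard.lu" -> ".wizard.lu"
      case "$host" in
        *"$suffix") label="${host%"$suffix"}" ;;
        *) return 1 ;;
      esac
      # The wildcard must stand for one non-empty label without dots
      case "$label" in
        ''|*.*) return 1 ;;
        *) return 0 ;;
      esac
      ;;
    *)
      [ "$host" = "$pattern" ]
      ;;
  esac
}

for host in git.wizard.lu api.wizard.lu wizard.lu; do
  if cert_name_matches '*.wizard.lu' "$host"; then
    echo "$host: covered by *.wizard.lu"
  else
    echo "$host: not covered"
  fi
done
```

The loop reports `git.wizard.lu` and `api.wizard.lu` as covered and the bare `wizard.lu` as not covered, which is why the wildcard would also apply to the DNS-only Gitea host.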
Install on the server:

```bash
sudo mkdir -p /etc/caddy/certs/{wizard.lu,omsflow.lu,rewardflow.lu}

# For each domain, create cert.pem and key.pem:
sudo nano /etc/caddy/certs/wizard.lu/cert.pem   # paste certificate
sudo nano /etc/caddy/certs/wizard.lu/key.pem    # paste private key
# Repeat for omsflow.lu and rewardflow.lu

sudo chown -R caddy:caddy /etc/caddy/certs/
sudo chmod 600 /etc/caddy/certs/*/key.pem
```

### 21.5 Update Caddyfile

For Cloudflare-proxied domains, use explicit TLS with origin certs. Keep auto-HTTPS for `git.wizard.lu` (DNS-only, grey cloud):

```caddy
{
    log {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
}

# ─── Platform 1: Main (wizard.lu) ───────────────────────────
wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:8001
}

www.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    redir https://wizard.lu{uri} permanent
}

# ─── Platform 2: OMS (omsflow.lu) ───────────────────────────
omsflow.lu {
    tls /etc/caddy/certs/omsflow.lu/cert.pem /etc/caddy/certs/omsflow.lu/key.pem
    reverse_proxy localhost:8001
}

www.omsflow.lu {
    tls /etc/caddy/certs/omsflow.lu/cert.pem /etc/caddy/certs/omsflow.lu/key.pem
    redir https://omsflow.lu{uri} permanent
}

# ─── Platform 3: Loyalty+ (rewardflow.lu) ──────────────────
rewardflow.lu {
    tls /etc/caddy/certs/rewardflow.lu/cert.pem /etc/caddy/certs/rewardflow.lu/key.pem
    reverse_proxy localhost:8001
}

www.rewardflow.lu {
    tls /etc/caddy/certs/rewardflow.lu/cert.pem /etc/caddy/certs/rewardflow.lu/key.pem
    redir https://rewardflow.lu{uri} permanent
}

# ─── Services (wizard.lu origin cert) ───────────────────────
api.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:8001
}

flower.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:5555
}

grafana.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:3001
}

# ─── DNS-only domain (Let's Encrypt, not proxied by Cloudflare) ─
git.wizard.lu {
    tls {
        issuer acme
    }
    reverse_proxy localhost:3000
}
```

Restart Caddy:

```bash
sudo systemctl restart caddy
sudo systemctl status caddy
```

### 21.6 Cloudflare Settings (per domain)

Configure these in the Cloudflare dashboard for each domain (`wizard.lu`, `omsflow.lu`, `rewardflow.lu`):

| Setting | Location | Value |
|---|---|---|
| SSL mode | SSL/TLS > Overview | Full (Strict) |
| Always Use HTTPS | SSL/TLS > Edge Certificates | On |
| Bot Fight Mode | Security > Settings | On |
| DDoS protection | Security > Security rules > DDoS | Active (enabled by default) |
| AI crawlers | Security (during setup) | Blocked on all pages |

**Rate limiting rule** (Security > Security rules > Create rule):

| Field | Value |
|---|---|
| Match | URI Path contains `/api/` |
| Characteristics | IP |
| Rate | 100 requests per 10 seconds |
| Action | Block |
| Duration | 10 seconds |

### 21.7 Production Environment

Add to `~/apps/orion/.env`:

```bash
CLOUDFLARE_ENABLED=true
```

### 21.8 Verification

```bash
# CF proxy active (look for cf-ray header)
curl -I https://wizard.lu | grep cf-ray

# DNS resolves to Cloudflare IPs (not 91.99.65.229)
dig wizard.lu +short

# All domains responding
curl -I https://omsflow.lu
curl -I https://rewardflow.lu
curl -I https://api.wizard.lu/health

# git.wizard.lu still on Let's Encrypt (not CF)
curl -I https://git.wizard.lu
```

!!! info "`git.wizard.lu` stays DNS-only"
    The Gitea instance uses SSH on port 2222 for git operations. Cloudflare proxy only supports HTTP/HTTPS, so `git.wizard.lu` must remain as DNS-only (grey cloud) with Let's Encrypt auto-SSL via Caddy.

---

## Step 22: Incident Response Runbook

A comprehensive incident response runbook is available at [Incident Response](incident-response.md).
It includes:

- **Severity levels**: SEV-1 (platform down, <15min), SEV-2 (feature broken, <1h), SEV-3 (minor, <4h)
- **Quick diagnosis decision tree**: SSH → Docker → containers → Caddy → DNS
- **8 runbooks** with copy-paste commands for common incidents
- **Post-incident report template**
- **Monitoring URLs** quick reference

---

## Step 23: Environment Reference

A complete environment variables reference is available at [Environment Variables](environment.md). It documents all 55+ configuration variables from `app/core/config.py`, grouped by category with defaults and production requirements.

---

## Step 24: Documentation Updates

This document has been updated with Steps 19–24. Additional documentation changes:

- `docs/deployment/incident-response.md`: new incident response runbook
- `docs/deployment/environment.md`: complete env var reference (was empty)
- `docs/deployment/launch-readiness.md`: updated with Feb 2026 infrastructure status
- `mkdocs.yml`: incident-response.md added to nav

---

## Step 25: Google Wallet Integration

Enable loyalty card passes in Google Wallet so customers can add their loyalty card to their Android phone.

### Prerequisites

- Google account (personal Gmail is fine)
- Loyalty module deployed and working

### 25.1 Google Pay & Wallet Console

Register as a Google Wallet Issuer:

1. Go to [pay.google.com/business/console](https://pay.google.com/business/console)
2. Enter your business name (e.g., "Letzshop" or your company name); this is for Google's review, and customers don't see it on passes
3. Note your **Issuer ID** from the Google Wallet API section

!!! info "Issuer ID"
    The Issuer ID is a long numeric string (e.g., `3388000000023089598`). You'll find it under Google Wallet API → Manage in the Pay & Wallet Console.

### 25.2 Google Cloud Project

1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Create a new project (e.g., "Orion")
3. Enable the **Google Wallet API**:
    - Navigate to "APIs & Services" → "Library"
    - Search for "Google Wallet API" and enable it

### 25.3 Service Account

Create a service account for API access:

1. Go to "APIs & Services" → "Credentials" → "Create Credentials"
2. Select **Google Wallet API** as the API
3. Select **Application data** (not user data, since your backend calls the API directly)
4. Name the service account (e.g., `wallet-service`)
5. Click "Done"

Download the JSON key:

1. Go to "IAM & Admin" → "Service Accounts"
2. Click on the service account you created
3. Go to **Keys** tab → **Add Key** → **Create new key** → **JSON**
4. Save the downloaded `.json` file securely

### 25.4 Link Service Account to Issuer

1. Go back to [pay.google.com/business/console](https://pay.google.com/business/console)
2. In the **left sidebar**, click **Users** (not inside the Wallet API section)
3. Invite the service account email (e.g., `wallet-service@orion-488322.iam.gserviceaccount.com`)
4. Assign **Admin** role
5. Verify it appears in the users list

!!! warning "Common mistake"
    The "Users" link is in the **left sidebar** of the Pay & Wallet Console, not inside the "Google Wallet API" → "Manage" section. The Manage page has "Setup test accounts" which is a different feature.

### 25.5 Deploy to Server

Upload the service account JSON key to the Hetzner server:

```bash
# From your local machine
scp /path/to/orion-488322-xxxxx.json samir@91.99.65.229:~/apps/orion/google-wallet-sa.json
```

Add the environment variables to the production `.env`:

```bash
ssh samir@91.99.65.229
cd ~/apps/orion
nano .env
```

Add:

```bash
# Google Wallet (Loyalty Module)
LOYALTY_GOOGLE_ISSUER_ID=3388000000023089598
LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON=/app/google-wallet-sa.json
```

!!! note "Docker path"
    The path must be relative to the Docker container's filesystem. If the file is in `~/apps/orion/`, it maps to `/app/` inside the container (check your `docker-compose.yml` volumes).
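Before restarting the stack, a quick sanity check of the uploaded key can catch a truncated or wrong file early. This is a rough sketch, not part of the Orion scripts: the function name `check_sa_key` is hypothetical, and the check is purely textual (it greps for the `type`, `client_email`, and `private_key` fields that Google service account key files contain) rather than a real JSON validation.

```bash
# Hypothetical helper: succeed only if FILE looks like a Google service
# account key. This is a crude text check, not a JSON parser.
check_sa_key() {
  local file="$1" pattern
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  for pattern in \
    '"type"[[:space:]]*:[[:space:]]*"service_account"' \
    '"client_email"[[:space:]]*:' \
    '"private_key"[[:space:]]*:'
  do
    if ! grep -Eq "$pattern" "$file"; then
      echo "$file: no match for $pattern" >&2
      return 1
    fi
  done
  echo "$file looks like a service account key"
}
```

On the server this could be run as `check_sa_key ~/apps/orion/google-wallet-sa.json`. It says nothing about whether the key is still valid on Google's side or whether the container can read the file.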
Mount the JSON file in `docker-compose.yml` if not already covered by the app volume:

```yaml
services:
  api:
    volumes:
      - ./google-wallet-sa.json:/app/google-wallet-sa.json:ro
```

Restart the application:

```bash
docker compose --profile full up -d --build
```

### 25.6 Platform-Level Configuration

Google Wallet is a **platform-wide setting**: all merchants on the platform share the same Issuer ID and service account. Merchants don't need to configure anything; wallet integration activates automatically when the env vars are set.

The two required env vars:

```bash
# In production .env
LOYALTY_GOOGLE_ISSUER_ID=3388000000023089598
LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON=/app/google-wallet-sa.json
```

When both are set, every loyalty program on the platform automatically gets Google Wallet support: enrollment creates wallet passes, stamp/points operations sync to passes, and the storefront shows "Add to Google Wallet" buttons.

### 25.7 Verify Configuration

Check the API health and wallet service status:

```bash
# Check the app logs for wallet service initialization
docker compose --profile full logs api | grep -i "wallet\|loyalty"

# Test via API: enroll a customer and check the response for wallet URLs
curl -s https://api.wizard.lu/health | python3 -m json.tool
```

### 25.8 Testing Google Wallet Passes

Google provides a **demo mode**: passes work in test without full production approval:

1. Console admins and developers (your Google account) can always test passes
2. For additional testers, add their Gmail addresses in Pay & Wallet Console → Google Wallet API → Manage → **Setup test accounts**
3. Use `walletobjects.sandbox` scope for initial testing (the code uses `wallet_object.issuer` which covers both)

**End-to-end test flow:**

1. Create a loyalty program via the store panel and set the Google Wallet Issuer ID in Settings → Digital Wallet
2. Enroll a customer (via store or storefront self-enrollment)
    - The system automatically creates a Google Wallet `LoyaltyClass` (for the program) and `LoyaltyObject` (for the card)
3. Open the storefront loyalty dashboard; the "Add to Google Wallet" button appears
4. Click the button (or open the URL on an Android device); the pass is added to Google Wallet
5. Add a stamp or points; the pass in Google Wallet auto-updates (no push needed, Google syncs)

### 25.9 Local Development Setup

You can test the full Google Wallet integration from your local machine:

```bash
# In your local .env
LOYALTY_GOOGLE_ISSUER_ID=3388000000023089598
LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON=/path/to/orion-488322-xxxxx.json
```

The `GoogleWalletService` calls Google's REST API directly over HTTPS, so no special network configuration is needed. The same service account JSON works on both local and server environments.

**Local testing checklist:**

- [x] Service account JSON downloaded and path set in env
- [x] `LOYALTY_GOOGLE_ISSUER_ID` set in env
- [ ] Start the app locally: `python3 -m uvicorn main:app --reload`
- [ ] Enroll a customer → check logs for "Created Google Wallet class" and "Created Google Wallet object"
- [ ] Open storefront dashboard → "Add to Google Wallet" button should appear
- [ ] Open the wallet URL on Android → pass added to Google Wallet
- [ ] Add stamps → check logs for "Updated Google Wallet object", verify pass updates

### 25.10 How It Works (Architecture)

The integration is fully automatic: no manual API calls are needed after initial setup.
```
┌─────────────┐     ┌──────────────┐     ┌─────────────────────┐
│  Merchant   │────▶│  Orion API   │────▶│  Google Wallet API  │
│ sets issuer │     │              │     │                     │
│ ID in UI    │     │              │     │                     │
└─────────────┘     └──────────────┘     └─────────────────────┘

┌─────────────┐     ┌──────────────┐     ┌─────────────────────┐
│  Customer   │────▶│  Orion API   │────▶│  Google Wallet API  │
│  enrolls    │     │              │     │                     │
│             │     │create_class +│     │ POST /loyaltyClass  │
│             │     │create_object │     │ POST /loyaltyObject │
│             │◀────│ save_url     │     │                     │
│             │     └──────────────┘     └─────────────────────┘
│  taps "Add  │
│  to Wallet" │────▶ Google Wallet app adds pass automatically
└─────────────┘

┌─────────────┐     ┌──────────────┐     ┌─────────────────────┐
│ Staff adds  │────▶│  Orion API   │────▶│  Google Wallet API  │
│ stamp/pts   │     │              │     │                     │
│             │     │update_object │     │ PATCH /loyaltyObject│
└─────────────┘     └──────────────┘     └─────────────────────┘
                                          Pass auto-updates on customer's phone
```

**Automatic triggers:**

| Event | Wallet Action | Service Call |
|-------|---------------|--------------|
| Customer enrolls | Create class (if first) + create object | `wallet_service.create_wallet_objects()` |
| Stamp added/redeemed/voided | Update object with new balance | `wallet_service.sync_card_to_wallets()` |
| Points earned/redeemed/voided/adjusted | Update object with new balance | `wallet_service.sync_card_to_wallets()` |
| Customer opens dashboard | Generate save URL (JWT, 1h expiry) | `wallet_service.get_add_to_wallet_urls()` |

No push notifications needed: Google syncs object changes automatically.

### 25.11 Next Steps

After Google Wallet is verified working:

1. **Submit for Google production approval**: required before non-test users can add passes
2. **Apple Wallet**: separate setup requiring Apple Developer account, APNs certificates, and pass signing certificates (see [Loyalty Module docs](../modules/loyalty.md#apple-wallet))

---

## Domain & Port Reference

| Service | Internal Port | External Port | Domain (via Caddy) |
|---|---|---|---|
| Orion API | 8000 | 8001 | `api.wizard.lu` |
| Main Platform | 8000 | 8001 | `wizard.lu` |
| OMS Platform | 8000 | 8001 | `omsflow.lu` |
| Loyalty+ Platform | 8000 | 8001 | `rewardflow.lu` |
| PostgreSQL | 5432 | 5432 | (internal only) |
| Redis | 6379 | 6380 | (internal only) |
| Flower | 5555 | 5555 | `flower.wizard.lu` |
| Gitea | 3000 | 3000 | `git.wizard.lu` |
| Prometheus | 9090 | 9090 (localhost) | (internal only) |
| Grafana | 3000 | 3001 (localhost) | `grafana.wizard.lu` |
| Node Exporter | 9100 | 9100 (localhost) | (internal only) |
| cAdvisor | 8080 | 8080 (localhost) | (internal only) |
| Alertmanager | 9093 | 9093 (localhost) | (internal only) |
| Caddy | n/a | 80, 443 | (reverse proxy) |

!!! note "Single backend, multiple domains"
    All platform domains route to the same FastAPI backend. The `PlatformContextMiddleware` identifies the platform from the `Host` header. See [Multi-Platform Architecture](../architecture/multi-platform-cms.md) for details.
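Because platform selection depends only on the `Host` header, the routing can be exercised on the server without going through Caddy or DNS. The sketch below is an assumption-laden illustration: `platform_status`, `BACKEND`, and `CURL` are hypothetical names, the backend address `http://localhost:8001` is taken from the port table above, and requesting `/` is just one plausible path to probe.

```bash
# Hypothetical helper: print the HTTP status code the backend returns when
# it is addressed as a given platform domain. BACKEND and CURL are
# overridable; the defaults follow the port table above.
platform_status() {
  local host="$1" backend="${BACKEND:-http://localhost:8001}"
  "${CURL:-curl}" -s -o /dev/null -w '%{http_code}\n' \
    -H "Host: $host" "$backend/"
}
```

For example, `for d in wizard.lu omsflow.lu rewardflow.lu; do printf '%s ' "$d"; platform_status "$d"; done` prints one status code per platform. Which codes count as healthy depends on the application's routes, so treat the output as a starting point for debugging rather than a pass/fail check.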
## Directory Structure on Server

```
~/
├── apps/
│   └── orion/                          # Orion application
│       ├── .env                        # Production environment
│       ├── docker-compose.yml          # App stack (API, DB, Redis, Celery, monitoring)
│       ├── monitoring/                 # Prometheus + Grafana config
│       ├── logs/                       # Application logs
│       ├── uploads/                    # User uploads
│       └── exports/                    # Export files
├── backups/
│   ├── orion/
│   │   ├── daily/                      # 7-day retention
│   │   └── weekly/                     # 4-week retention
│   └── gitea/
│       ├── daily/
│       └── weekly/
├── gitea/
│   └── docker-compose.yml              # Gitea + PostgreSQL
└── gitea-runner/                       # CI/CD runner (act_runner v0.2.13)
    ├── act_runner                      # symlink → act_runner-0.2.13-linux-arm64
    ├── act_runner-0.2.13-linux-arm64
    └── .runner                         # registration config
```

## Troubleshooting

### Permission denied on logs

The Docker container runs as `appuser` (UID 1000). Host-mounted volumes need matching ownership:

```bash
sudo chown -R 1000:1000 logs uploads exports
```

### Celery workers restarting

Check logs for import errors:

```bash
docker compose --profile full logs celery-worker --tail 30
```

Common cause: stale task module references in `app/core/celery_config.py`.

### SSH service name on Ubuntu 24.04

Ubuntu 24.04 uses `ssh` not `sshd`:

```bash
sudo systemctl restart ssh    # correct
sudo systemctl restart sshd   # will fail
```

### git pull fails with local changes

If `docker-compose.yml` was edited on the server (e.g. passwords), stash before pulling:

```bash
git stash
git pull
git stash pop
```

## Maintenance

### Deploy updates

Deployments happen automatically when pushing to master (see [Step 16](#step-16-continuous-deployment)). For manual deploys:

```bash
cd ~/apps/orion && bash scripts/deploy.sh
```

The script handles stashing local changes, pulling, rebuilding containers, running migrations, and health checks.
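After a manual deploy, containers can take a few seconds to become ready, so a single `curl` may fail even when the deploy is fine. The following retry loop is a minimal sketch of that kind of health check; it is not the code in `scripts/deploy.sh`, and the names `wait_healthy` and `CURL` are hypothetical. Only the health URL in the usage note comes from this guide.

```bash
# Hypothetical helper: poll URL until it answers with a non-error HTTP
# status, or give up after ATTEMPTS tries spaced DELAY seconds apart.
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if "${CURL:-curl}" -fsS -o /dev/null --max-time 5 "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # No need to sleep after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

A typical call after a deploy would be `wait_healthy https://api.wizard.lu/health 10 3`, which waits up to roughly half a minute before reporting a failure.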
### View logs

```bash
# Follow all logs in real-time
docker compose --profile full logs -f

# Follow a specific service
docker compose --profile full logs -f api
docker compose --profile full logs -f celery-worker
docker compose --profile full logs -f celery-beat
docker compose --profile full logs -f flower

# View last N lines (useful for debugging crashes)
docker compose --profile full logs --tail=50 api
docker compose --profile full logs --tail=100 celery-worker

# Filter logs for errors
docker compose --profile full logs api | grep -i "error\|exception\|failed"
```

### Check container status

```bash
# Overview of all containers (health, uptime, ports)
docker compose --profile full ps

# Watch for containers stuck in "Restarting", which indicates a crash loop
# Healthy containers show: Up Xs (healthy)
```

### Restart services

```bash
# Restart a single service
docker compose --profile full restart api

# Restart everything
docker compose --profile full restart

# Full rebuild (after code changes)
docker compose --profile full up -d --build
```

### Quick access URLs

After Caddy is configured:

| Service | URL |
|---|---|
| Main Platform | `https://wizard.lu` |
| API Swagger docs | `https://api.wizard.lu/docs` |
| API ReDoc | `https://api.wizard.lu/redoc` |
| Admin panel | `https://wizard.lu/admin/login` |
| Health check | `https://api.wizard.lu/health` |
| Prometheus metrics | `https://api.wizard.lu/metrics` |
| Gitea | `https://git.wizard.lu` |
| Flower | `https://flower.wizard.lu` |
| Grafana | `https://grafana.wizard.lu` |
| OMS Platform | `https://omsflow.lu` |
| Loyalty+ Platform | `https://rewardflow.lu` |

Direct IP access (temporary, until firewall rules are removed):

| Service | URL |
|---|---|
| API | `http://91.99.65.229:8001/docs` |
| Gitea | `http://91.99.65.229:3000` |
| Flower | `http://91.99.65.229:5555` |