orion/docs/deployment/hetzner-server-setup.md
Samir Boulahtit 05c53e1865
docs(deployment): add verified full reset procedure to Hetzner guide
Document the complete nuclear reset sequence (tested end-to-end):
stop → build → infra up → schema reset → migrations → seeds → start.
Update seeded data counts to match current output (30 CMS pages,
12 tiers, 3 admins, 28 email templates). Switch from exec to run --rm
for seed commands so they work before services are started.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 23:21:52 +01:00

Hetzner Cloud Server Setup

Complete step-by-step guide for deploying Orion on a Hetzner Cloud VPS.

!!! info "Server Details"

- Provider: Hetzner Cloud
- OS: Ubuntu 24.04.3 LTS (upgraded to 24.04.4 after updates)
- Architecture: aarch64 (ARM64)
- IP: 91.99.65.229
- IPv6: 2a01:4f8:1c1a:b39c::1
- Disk: 37 GB
- RAM: 4 GB
- Auth: SSH key (configured via Hetzner Console)
- Setup date: 2026-02-11

!!! success "Progress — 2026-02-12" Completed (Steps 1-16):

- Non-root user `samir` with SSH key
- Server hardened (UFW firewall, SSH root login disabled, fail2ban)
- Docker 29.2.1 & Docker Compose 5.0.2 installed
- Gitea running at `https://git.wizard.lu` (user: `sboulahtit`, repo: `orion`)
- Repository cloned to `~/apps/orion`
- Production `.env` configured with generated secrets
- Full Docker stack deployed (API, PostgreSQL, Redis, Celery worker/beat, Flower)
- Database migrated (76 tables) and seeded (admin, platforms, CMS, email templates)
- API verified at `https://api.wizard.lu/health`
- DNS A records configured and propagated for `wizard.lu` and subdomains
- Caddy 2.10.2 reverse proxy with auto-SSL (Let's Encrypt)
- Temporary firewall rules removed (ports 3000, 8001)
- Gitea Actions runner v0.2.13 registered and running as systemd service
- SSH key added to Gitea for local push via SSH
- Git remote updated: `ssh://git@git.wizard.lu:2222/sboulahtit/orion.git`
- ProxyHeadersMiddleware added for correct HTTPS behind Caddy
- Fixed TierLimitExceededException import and Pydantic @field_validator bugs
- `wizard.lu` serving frontend with CSS over HTTPS (mixed content fixed)
- `/merchants` and `/admin` redirect fix (CMS catch-all was intercepting)

!!! success "Progress — 2026-02-13" Completed:

- CI fully green: ruff (lint), pytest, architecture, docs all passing
- Pinned ruff==0.8.4 in requirements-dev.txt (CI/local version mismatch was root cause of recurring I001 errors)
- Pre-commit hooks configured and installed (ruff auto-fix, architecture validation, trailing whitespace, end-of-file)
- AAAA (IPv6) records added for all wizard.lu domains
- mkdocs build clean (zero warnings) — all 32 orphan pages added to nav
- Pre-commit documented in `docs/development/code-quality.md`
- **Step 16: Continuous deployment** — auto-deploy on push to master via `scripts/deploy.sh` + Gitea Actions

**Next steps:**

- [x] Step 17: Backups
- [x] Step 18: Monitoring & observability

**Deferred (not urgent, do when all platforms ready):**

- [x] ~~DNS A + AAAA records for platform domains (`omsflow.lu`, `rewardflow.lu`)~~
- [x] ~~Uncomment platform domains in Caddyfile after DNS propagation~~

!!! success "Progress — 2026-02-14" Completed:

- **Wizamart → Orion rename** — 1,086 occurrences replaced across 184 files (database identifiers, email addresses, domains, config, templates, docs, seed data)
- Template renamed: `homepage-wizamart.html` → `homepage-orion.html`
- **Production DB rebuilt from scratch** with Orion naming (`orion_db`, `orion_user`)
- Platform domains configured in seed data: wizard.lu (main), omsflow.lu, rewardflow.lu (loyalty)
- Docker volume explicitly named `orion_postgres_data`
- `.dockerignore` added — prevents `.env` from being baked into Docker images
- `env_file: .env` added to `docker-compose.yml` — containers load host env vars properly
- `CapacitySnapshot` model import fixed (moved from billing to monitoring in `alembic/env.py`)
- All services verified healthy at `https://api.wizard.lu/health`
- **Step 17: Backups** — automated pg_dump scripts (daily + weekly rotation), R2 offsite upload, restore helper
- **Step 18: Monitoring** — Prometheus, Grafana, node-exporter, cAdvisor added to docker-compose; `/metrics` endpoint activated via `prometheus_client`

!!! success "Progress — 2026-02-15" Completed:

- **Step 17 server-side**: Hetzner backups enabled (5 of 7 daily images, last 6.22 GB)
- **Step 18 server-side**: Full monitoring stack deployed — Prometheus (4/4 targets up), Grafana at `https://grafana.wizard.lu` with Node Exporter Full (#1860) and Docker/cAdvisor (#193) dashboards
- **Domain rename**: `oms.lu` → `omsflow.lu`, `loyalty.lu` → `rewardflow.lu` across entire codebase (19 + 13 files)
- **Platform domains live**: all three platforms serving HTTPS via Caddy with auto-SSL
    - `https://wizard.lu` (main)
    - `https://omsflow.lu` (OMS)
    - `https://rewardflow.lu` (Loyalty+)
- Platform `domain` column updated in production DB
- RAM usage ~2.4 GB on 4 GB server (stable, CI jobs add ~550 MB temporarily)
- **Systemd backup timer** (`orion-backup.timer`) — daily at 03:00 UTC, tested manually
- **Cloudflare R2 offsite backups** — `orion-backups` bucket, `awscli` configured with `--profile r2`, `--upload` flag added to systemd timer
- `python3-pip` and `awscli` installed on server (pip user install, PATH added to `.bashrc` and systemd service)

**Steps 1-18 fully complete.** All infrastructure operational.

!!! success "Progress — 2026-02-15 (continued)" Completed (Steps 19-24):

- **Step 19: Prometheus Alerting** — alert rules (host, container, API, Celery, targets) + Alertmanager with email routing
- **Step 20: Security Hardening** — Docker network segmentation (frontend/backend/monitoring), fail2ban config, unattended-upgrades
- **Step 21: Cloudflare Domain Proxy** — origin certificates, WAF, bot protection, rate limiting (documented, user deploys)
- **Step 22: Incident Response** — 8 runbooks with copy-paste commands, severity levels, decision tree
- **Step 23: Environment Reference** — all 55+ env vars documented with defaults and production requirements
- **Step 24: Documentation Updates** — hetzner docs, launch readiness, mkdocs nav updated

**Steps 1-24 fully complete.** Enterprise infrastructure hardening done.

!!! success "Progress — 2026-02-24" Completed:

- **Step 25: Google Wallet Integration** — Google Cloud project "Orion" created, Wallet API enabled, service account configured
    - Google Pay Merchant ID: `BCR2DN5TW2CNXDAG`
    - Google Wallet Issuer ID: `3388000000023089598`
    - Service account: `wallet-service@orion-488322.iam.gserviceaccount.com` (admin role in Pay & Wallet Console)
    - Service account JSON key generated
    - Dependencies added to `requirements.txt`: `google-auth>=2.0.0`, `PyJWT>=2.0.0` (commit `d36783a`)
    - Loyalty env vars added to `.env.example` and `docs/deployment/environment.md`
    - `LOYALTY_GOOGLE_ISSUER_ID` and `LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON` added to `app/core/config.py` Settings class
    - **End-to-end integration wired:**
        - Enrollment auto-creates Google Wallet class + object (`card_service` → `wallet_service.create_wallet_objects`)
        - Stamp/points operations auto-sync to Google Wallet (`stamp_service`/`points_service` → `wallet_service.sync_card_to_wallets`)
        - Storefront API returns wallet URLs (`GET /loyalty/card`, `POST /loyalty/enroll`)
        - "Add to Google Wallet" button wired in storefront dashboard and enrollment success page (Alpine.js conditional rendering)
        - Google Wallet is a platform-wide config (env vars only) — merchants don't need to configure anything

**Next steps:**

- [ ] Upload service account JSON to Hetzner server
- [ ] Set `LOYALTY_GOOGLE_ISSUER_ID` and `LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON` in production `.env`
- [ ] Restart app and test end-to-end: enroll → add pass → stamp → verify pass updates
- [ ] Submit for Google production approval when ready
- [ ] Apple Wallet setup (APNs push, certificates, pass images)

!!! success "Progress — 2026-02-16" Completed:

- **Step 21: Cloudflare Domain Proxy** — all three domains active on Cloudflare (Full setup):
    - `wizard.lu` — DNS records configured (6 A + 6 AAAA), old CNAME records removed, NS switched at Netim, SSL/TLS set to Full (Strict), Always Use HTTPS enabled, AI crawlers blocked
    - `omsflow.lu` — DNS records configured (2 A + 2 AAAA), NS switched at Netim, SSL/TLS Full (Strict) + Always Use HTTPS
    - `rewardflow.lu` — DNS records configured (2 A + 2 AAAA), NS switched at Netim, SSL/TLS Full (Strict) + Always Use HTTPS
    - `git.wizard.lu` stays DNS-only (grey cloud) for SSH access on port 2222
    - DNSSEC disabled at registrar (will re-enable via Cloudflare later)
    - Registrar: Netim (`netim.com`)
    - Origin certificates generated (non-wildcard, specific subdomains) and installed on server
    - Caddyfile updated: origin certs for proxied domains, `tls { issuer acme }` for `git.wizard.lu`
    - Access logging enabled for fail2ban (`/var/log/caddy/access.log`)
    - All domains verified working: `wizard.lu`, `omsflow.lu`, `rewardflow.lu`, `api.wizard.lu`, `git.wizard.lu`
- **Step 19: SendGrid SMTP** — fully configured and tested:
    - SendGrid account created (free trial, 60-day limit)
    - `wizard.lu` domain authenticated (5 CNAME + 1 TXT in Cloudflare DNS)
    - Link branding enabled
    - API key `orion-production` created
    - Alertmanager SMTP configured (`alerts@wizard.lu` → SendGrid)
    - App email configured (`EMAIL_PROVIDER=sendgrid`, `noreply@wizard.lu`)
    - Test alert sent and received successfully

- **Cloudflare security** — configured on all three domains:
    - Bot Fight Mode enabled
    - DDoS protection active (default)
    - Rate limiting: 100 req/10s on `/api/` paths, block for 10s

**Steps 1-24 fully deployed and operational.**

!!! success "Progress — 2026-02-17" Launch readiness — fully deployed and verified (44/44 checks pass):

- **Memory limits** on all 6 app containers (db: 256m, redis: 128m, api: 512m, celery-worker: 768m, celery-beat: 128m, flower: 192m) — rebalanced after celery-worker OOM kills (concurrency reduced from 4 to 2)
- **Flower port** restricted to localhost only (`127.0.0.1:5555:5555`) — access via Caddy reverse proxy
- **Flower password** changed from default
- **Infrastructure health checks** — `/health/ready` now checks PostgreSQL (`SELECT 1`) and Redis (`ping`) with individual check details and latency
- **fail2ban Caddy auth jail** deployed — bans IPs after 10 failed auth attempts
- **Unattended upgrades** verified active
- **Scaling guide** — practical playbook at `docs/deployment/scaling-guide.md`
- **Server verification script** — `scripts/verify-server.sh` (44/44 PASS, 0 FAIL, 0 WARN)

**Server is launch-ready for first client (24 stores).**

Installed Software Versions

| Software | Version |
|---|---|
| Ubuntu | 24.04.4 LTS |
| Kernel | 6.8.0-100-generic (aarch64) |
| Docker | 29.2.1 |
| Docker Compose | 5.0.2 |
| PostgreSQL | 15 (container) |
| Redis | 7-alpine (container) |
| Python | 3.11-slim (container) |
| Gitea | latest (container) |
| Caddy | 2.10.2 |
| act_runner | 0.2.13 |

Step 1: Initial Server Access

ssh root@91.99.65.229

Step 2: Create Non-Root User

Create a dedicated user with sudo privileges and copy the SSH key:

# Create user
adduser samir
usermod -aG sudo samir

# Copy SSH keys to new user
rsync --archive --chown=samir:samir ~/.ssh /home/samir

Verify by connecting as the new user (from a new terminal):

ssh samir@91.99.65.229

Step 3: System Update & Essential Packages

sudo apt update && sudo apt upgrade -y

sudo apt install -y \
    curl \
    git \
    wget \
    ufw \
    fail2ban \
    htop \
    unzip \
    make

Reboot if a kernel upgrade is pending:

sudo reboot
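Ubuntu marks a pending reboot by creating `/var/run/reboot-required`. The following is a minimal sketch of a check that avoids rebooting when nothing requires it. The function name `reboot_pending` is illustrative, and the optional path argument is an assumption added only so the check can be exercised without a real flag file.

```bash
# Succeeds (exit 0) when the reboot flag file exists.
# The optional argument overrides the flag path; the default is Ubuntu's location.
reboot_pending() {
    [ -f "${1:-/var/run/reboot-required}" ]
}

# Usage on the server:
#   reboot_pending && sudo reboot
```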

Step 4: Firewall Configuration (UFW)

sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

Verify:

sudo ufw status

Expected output:

Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere

Step 5: Harden SSH

!!! warning "Before doing this step" Make sure you can SSH as samir from another terminal first! If you lock yourself out, you'll need to use Hetzner's console rescue mode.

sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh    # Note: Ubuntu 24.04 uses 'ssh' not 'sshd'
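Editing `sshd_config` does not guarantee the settings are in effect: on some Ubuntu cloud images, drop-in files under `/etc/ssh/sshd_config.d/` can take precedence over the main file. A hedged sketch of a check against the effective configuration follows. It reads the output of `sshd -T`, which prints the resolved settings with lowercase keywords; the function name `check_ssh_hardening` is illustrative.

```bash
# Reads `sshd -T` output on stdin and succeeds only if root login and
# password authentication are both reported as "no".
check_ssh_hardening() {
    awk '
        $1 == "permitrootlogin"        && $2 == "no" { root = 1 }
        $1 == "passwordauthentication" && $2 == "no" { pass = 1 }
        END { exit !(root && pass) }
    '
}

# Usage on the server (sshd -T needs root to read the host keys):
#   sudo sshd -T | check_ssh_hardening && echo "SSH hardening is in effect"
```

If the check fails after the `sed` edits above, look for an overriding line with `grep -r PasswordAuthentication /etc/ssh/sshd_config.d/`.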

Step 6: Install Docker & Docker Compose

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker samir

Log out and back in for the group change:

exit
# Then: ssh samir@91.99.65.229

Verify:

docker --version
docker compose version

Step 7: Gitea (Self-Hosted Git)

Create the Gitea directory and compose file:

mkdir -p ~/gitea && cd ~/gitea

Create docker-compose.yml with nano ~/gitea/docker-compose.yml:

services:
  gitea:
    image: gitea/gitea:latest
    container_name: gitea
    restart: always
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=postgres
      - GITEA__database__HOST=gitea-db:5432
      - GITEA__database__NAME=gitea
      - GITEA__database__USER=gitea
      - GITEA__database__PASSWD=<GENERATED_PASSWORD>
      - GITEA__server__ROOT_URL=http://91.99.65.229:3000/
      - GITEA__server__SSH_DOMAIN=91.99.65.229
      - GITEA__server__DOMAIN=91.99.65.229
      - GITEA__actions__ENABLED=true
    volumes:
      - gitea-data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "2222:22"
    depends_on:
      gitea-db:
        condition: service_healthy

  gitea-db:
    image: postgres:15
    container_name: gitea-db
    restart: always
    environment:
      POSTGRES_DB: gitea
      POSTGRES_USER: gitea
      POSTGRES_PASSWORD: <GENERATED_PASSWORD>
    volumes:
      - gitea-db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U gitea"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  gitea-data:
  gitea-db-data:

Generate the database password with openssl rand -hex 16 and replace <GENERATED_PASSWORD> in both places.
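Replacing the placeholder by hand in two places is easy to get wrong, so the substitution can be scripted. This is a sketch only: the function name `set_gitea_db_password` is hypothetical, and it assumes the password contains no `/` or `&` characters (true for hex output from `openssl rand -hex`).

```bash
# Replace every <GENERATED_PASSWORD> placeholder in file $1 with password $2.
# Writes to a temporary file first so a failed sed never truncates the original.
# mktemp creates the file with owner-only permissions, which the result keeps.
set_gitea_db_password() {
    file=$1
    password=$2
    tmp=$(mktemp) || return 1
    sed "s/<GENERATED_PASSWORD>/$password/g" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage:
#   set_gitea_db_password ~/gitea/docker-compose.yml "$(openssl rand -hex 16)"
```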

Open the firewall for Gitea and start:

sudo ufw allow 3000/tcp
docker compose up -d
docker compose ps

Visit http://91.99.65.229:3000 and complete the setup wizard. Create an admin account (e.g. sboulahtit).

Then create a repository (e.g. orion).

Step 8: Push Repository to Gitea

Add SSH Key to Gitea

Before pushing via SSH, add your local machine's public key to Gitea:

  1. Copy your public key:

    cat ~/.ssh/id_ed25519.pub
    # Or if using RSA: cat ~/.ssh/id_rsa.pub
    
  2. In the Gitea web UI: click your avatar → Settings → SSH / GPG Keys → Add Key → paste the key.

  3. Add the Gitea SSH host to known hosts:

    ssh-keyscan -p 2222 git.wizard.lu >> ~/.ssh/known_hosts
    

Add Remote and Push

From your local machine:

cd /home/samir/Documents/PycharmProjects/letzshop-product-import
git remote add gitea ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
git push gitea master

!!! note "Remote URL updated" The remote was initially set to http://91.99.65.229:3000/... during setup. After Caddy was configured, it was updated to use the domain with SSH: ssh://git@git.wizard.lu:2222/sboulahtit/orion.git

To update an existing remote:
```bash
git remote set-url gitea ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
```

Step 9: Clone Repository on Server

mkdir -p ~/apps
cd ~/apps
git clone http://localhost:3000/sboulahtit/orion.git
cd orion

Step 10: Configure Production Environment

cp .env.example .env
nano .env

Critical Production Values

Generate secrets:

openssl rand -hex 32   # For JWT_SECRET_KEY
openssl rand -hex 16   # For database password

| Variable | How to Generate / What to Set |
|---|---|
| `DEBUG` | `False` |
| `DATABASE_URL` | `postgresql://orion_user:YOUR_DB_PW@db:5432/orion_db` |
| `JWT_SECRET_KEY` | Output of `openssl rand -hex 32` |
| `ADMIN_PASSWORD` | Strong password |
| `USE_CELERY` | `true` |
| `REDIS_URL` | `redis://redis:6379/0` |
| `STRIPE_SECRET_KEY` | Your Stripe secret key (configure later) |
| `STRIPE_PUBLISHABLE_KEY` | Your Stripe publishable key (configure later) |
| `STRIPE_WEBHOOK_SECRET` | Your Stripe webhook secret (configure later) |
| `STORAGE_BACKEND` | `r2` (if using Cloudflare R2, configure later) |

Also update the PostgreSQL password in docker-compose.yml (lines 9 and 40) to match.

Step 11: Deploy with Docker Compose

cd ~/apps/orion

# Create directories with correct permissions for the container user
mkdir -p logs uploads exports
sudo chown -R 1000:1000 logs uploads exports

# Start infrastructure first
docker compose up -d db redis

# Wait for health checks to pass
docker compose ps

# Build and start the full stack
docker compose --profile full up -d --build

Verify all services are running:

docker compose --profile full ps

Expected: api (healthy), db (healthy), redis (healthy), celery-worker (healthy), celery-beat (running), flower (running).

Step 12: Initialize Database

!!! note "PYTHONPATH required" The seed scripts need PYTHONPATH=/app set explicitly when running inside the container. Use run --rm (not exec) if the api service is not yet running.

First-time initialization

# Run migrations (use 'heads' for multi-branch Alembic)
docker compose --profile full run --rm -e PYTHONPATH=/app api alembic upgrade heads

# Seed production data (order matters)
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/init_production.py
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/init_log_settings.py
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/create_default_content_pages.py
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/seed_email_templates.py

Full reset procedure (nuclear — deletes all data)

Use this to reset the database from scratch. Stop workers first to avoid task conflicts.

# 1. Stop everything
docker compose --profile full down

# 2. Rebuild ALL images (picks up latest code)
docker compose --profile full build

# 3. Start infrastructure only
docker compose up -d db redis

# 4. Wait for healthy
docker compose exec db pg_isready -U orion_user -d orion_db
docker compose exec redis redis-cli ping

# 5. Drop and recreate schema
docker compose --profile full run --rm -e PYTHONPATH=/app api python -c "
from app.core.config import settings
from sqlalchemy import create_engine, text
e = create_engine(settings.database_url)
c = e.connect()
c.execute(text('DROP SCHEMA IF EXISTS public CASCADE'))
c.execute(text('CREATE SCHEMA public'))
c.commit()
c.close()
print('Schema reset complete')
"

# 6. Run migrations
docker compose --profile full run --rm -e PYTHONPATH=/app api alembic upgrade heads

# 7. Seed in order
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/init_production.py
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/init_log_settings.py
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/create_default_content_pages.py
docker compose --profile full run --rm -e PYTHONPATH=/app api python scripts/seed/seed_email_templates.py

# 8. Start all services
docker compose --profile full up -d

# 9. Verify
docker compose --profile full ps
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
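Step 4 of the reset procedure runs `pg_isready` and `redis-cli ping` once each, so it reports the current state rather than waiting. If the sequence is scripted, a small retry helper can block until both services respond. The sketch below is an assumption about how that could look; `wait_for` is an illustrative name, not part of the repository.

```bash
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command to run.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Only sleep when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage in step 4 (-T disables TTY allocation so it also works from scripts):
#   wait_for 30 2 docker compose exec -T db pg_isready -U orion_user -d orion_db
#   wait_for 30 2 docker compose exec -T redis redis-cli ping
```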

Seeded Data Summary

| Data | Count |
|---|---|
| Admin users | 3 (super admin + OMS admin + Loyalty admin) |
| Platforms | 3 (OMS, Wizard, Loyalty) |
| Platform modules | 57 |
| Admin settings | 15 |
| Subscription tiers | 12 (4 per platform: Essential, Professional, Business, Enterprise) |
| Log settings | 6 |
| CMS pages | 30 (platform homepages + marketing pages + store defaults) |
| Email templates | 28 (4 languages: en, fr, de, lb) |

Step 13: DNS Configuration

Before setting up Caddy, point your domain's DNS to the server.

wizard.lu (Main Platform) — Completed

| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | 91.99.65.229 | 300 |
| A | www | 91.99.65.229 | 300 |
| A | api | 91.99.65.229 | 300 |
| A | git | 91.99.65.229 | 300 |
| A | flower | 91.99.65.229 | 300 |

omsflow.lu (OMS Platform) — Completed

| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | 91.99.65.229 | 300 |
| A | www | 91.99.65.229 | 300 |
| AAAA | @ | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | www | 2a01:4f8:1c1a:b39c::1 | 300 |

rewardflow.lu (Loyalty+ Platform) — Completed

| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | 91.99.65.229 | 300 |
| A | www | 91.99.65.229 | 300 |
| AAAA | @ | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | www | 2a01:4f8:1c1a:b39c::1 | 300 |

IPv6 (AAAA) Records — Completed

The omsflow.lu and rewardflow.lu tables above already include their AAAA records; the wizard.lu AAAA records are listed at the end of this section. To verify your IPv6 address:

ip -6 addr show eth0 | grep 'scope global'

It should match the value in the Hetzner Cloud Console (Networking tab). Then create AAAA records mirroring each A record above, e.g.:

| Type | Name (wizard.lu) | Value | TTL |
|---|---|---|---|
| AAAA | @ | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | www | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | api | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | git | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | flower | 2a01:4f8:1c1a:b39c::1 | 300 |

Repeat for omsflow.lu and rewardflow.lu.

!!! tip "DNS propagation" Set TTL to 300 (5 minutes) initially. DNS changes can take up to 24 hours to propagate globally, but usually complete within 30 minutes. Verify with: dig api.wizard.lu +short
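Checking each record by hand gets tedious with this many subdomains. A minimal sketch of a loop around `dig +short` follows; the function name `check_dns` is illustrative. It only makes sense while the records are DNS-only: once a domain is proxied through Cloudflare, the A record resolves to Cloudflare addresses rather than the server IP.

```bash
# Compare the A record of each host against the expected IPv4 address.
# $1 = expected IP, remaining arguments = hostnames.
# Returns non-zero if any host does not resolve to the expected address yet.
check_dns() {
    expected=$1
    shift
    failed=0
    for host in "$@"; do
        if dig +short A "$host" | grep -qxF "$expected"; then
            echo "ok      $host"
        else
            echo "pending $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_dns 91.99.65.229 wizard.lu www.wizard.lu api.wizard.lu git.wizard.lu flower.wizard.lu
```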

Step 14: Reverse Proxy with Caddy

Install Caddy:

sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' \
    | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
    | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy

Caddyfile Configuration

Edit /etc/caddy/Caddyfile:

# ─── Platform 1: Main (wizard.lu) ───────────────────────────
wizard.lu {
    reverse_proxy localhost:8001
}

www.wizard.lu {
    redir https://wizard.lu{uri} permanent
}

# ─── Platform 2: OMS (omsflow.lu) ───────────────────────────────
omsflow.lu {
    reverse_proxy localhost:8001
}

www.omsflow.lu {
    redir https://omsflow.lu{uri} permanent
}

# ─── Platform 3: Loyalty+ (rewardflow.lu) ──────────────────
rewardflow.lu {
    reverse_proxy localhost:8001
}

www.rewardflow.lu {
    redir https://rewardflow.lu{uri} permanent
}

# ─── Services ───────────────────────────────────────────────
api.wizard.lu {
    reverse_proxy localhost:8001
}

git.wizard.lu {
    reverse_proxy localhost:3000
}

flower.wizard.lu {
    reverse_proxy localhost:5555
}

!!! info "How multi-platform routing works" All platform domains (wizard.lu, omsflow.lu, rewardflow.lu) point to the same FastAPI backend on port 8001. The PlatformContextMiddleware reads the Host header to detect which platform the request is for. Caddy preserves the Host header by default, so no extra configuration is needed.

The `domain` column in the `platforms` database table must match:

| Platform | code | domain |
|---|---|---|
| Main | `main` | `wizard.lu` |
| OMS | `oms` | `omsflow.lu` |
| Loyalty+ | `loyalty` | `rewardflow.lu` |
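Because routing depends only on the Host header, it can be exercised without DNS or Caddy by sending requests straight to the backend port with an explicit header. The following is a sketch under that assumption; `probe_platforms` is an illustrative name, and the status code returned for `/` depends on the application (a redirect is as plausible as a 200).

```bash
# Print the HTTP status the backend returns for each platform domain.
# Talks to the API port directly and sets the Host header explicitly.
probe_platforms() {
    for domain in "$@"; do
        code=$(curl -s -o /dev/null -w '%{http_code}' \
            -H "Host: $domain" http://localhost:8001/)
        echo "$domain $code"
    done
}

# Usage on the server:
#   probe_platforms wizard.lu omsflow.lu rewardflow.lu
```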

Start Caddy:

sudo systemctl restart caddy

Caddy automatically provisions Let's Encrypt SSL certificates for all configured domains.

Verify:

curl -I https://wizard.lu
curl -I https://api.wizard.lu/health
curl -I https://git.wizard.lu

After Caddy is working, remove the temporary firewall rules:

sudo ufw delete allow 3000/tcp
sudo ufw delete allow 8001/tcp

Update Gitea's configuration to use its new domain. In ~/gitea/docker-compose.yml, change:

- GITEA__server__ROOT_URL=https://git.wizard.lu/
- GITEA__server__SSH_DOMAIN=git.wizard.lu
- GITEA__server__DOMAIN=git.wizard.lu

Then restart Gitea:

cd ~/gitea && docker compose up -d gitea

Future: Multi-Tenant Store Routing

Stores on each platform use two routing modes:

  • Standard (subdomain): acme.omsflow.lu — included in the base subscription
  • Premium (custom domain): acme.lu — available with premium subscription tiers

Both modes are handled by the StoreContextMiddleware which reads the Host header, so Caddy just needs to forward requests and preserve the header.

Wildcard Subdomains (for store subdomains)

When stores start using subdomains like acme.omsflow.lu, add wildcard blocks:

*.omsflow.lu {
    reverse_proxy localhost:8001
}

*.rewardflow.lu {
    reverse_proxy localhost:8001
}

*.wizard.lu {
    reverse_proxy localhost:8001
}

!!! warning "Wildcard SSL requires DNS challenge" Let's Encrypt cannot issue wildcard certificates via HTTP challenge. Wildcard certs require a DNS challenge, which means installing a Caddy DNS provider plugin (e.g. caddy-dns/cloudflare) and configuring API credentials for your DNS provider. See Caddy DNS challenge docs.

Custom Store Domains (for premium stores)

When premium stores bring their own domains (e.g. acme.lu), use Caddy's on-demand TLS:

https:// {
    tls {
        on_demand
    }
    reverse_proxy localhost:8001
}

On-demand TLS auto-provisions SSL certificates when a new domain connects. To prevent abuse, configure an ask endpoint that validates the domain against the store_domains table. In Caddy 2 the ask URL belongs in the global options block at the top of the Caddyfile, not inside the site's tls block:

{
    on_demand_tls {
        ask http://localhost:8001/api/v1/internal/verify-domain
    }
}
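Caddy calls the ask endpoint with the requested hostname in a `domain` query parameter and issues a certificate only when the response status is 2xx. Once the endpoint exists, it can be probed the same way before enabling on-demand TLS. This is a sketch: `domain_allowed` is an illustrative name, and the endpoint path is the one proposed above, which is future work and may not be implemented yet.

```bash
# Succeeds when the ask endpoint approves the domain (any 2xx response),
# mirroring the request Caddy makes before issuing a certificate.
domain_allowed() {
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        "http://localhost:8001/api/v1/internal/verify-domain?domain=$1")
    case $code in
        2??) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage:
#   domain_allowed acme.lu && echo "certificate would be issued"
```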

!!! note "Not needed yet" Wildcard subdomains and custom domains are future work. The current Caddyfile handles all platform root domains and service subdomains.

Step 15: Gitea Actions Runner

!!! warning "ARM64 architecture" This server is ARM64. Download the arm64 binary, not amd64.

Download and install:

mkdir -p ~/gitea-runner && cd ~/gitea-runner

# Download act_runner v0.2.13 (ARM64)
wget https://gitea.com/gitea/act_runner/releases/download/v0.2.13/act_runner-0.2.13-linux-arm64
chmod +x act_runner-0.2.13-linux-arm64
ln -s act_runner-0.2.13-linux-arm64 act_runner

Register the runner (get token from Site Administration > Actions > Runners > Create new Runner):

./act_runner register \
    --instance https://git.wizard.lu \
    --token YOUR_RUNNER_TOKEN

Accept the default runner name and labels when prompted.

Create a systemd service for persistent operation:

sudo nano /etc/systemd/system/gitea-runner.service

[Unit]
Description=Gitea Actions Runner
After=network.target

[Service]
Type=simple
User=samir
WorkingDirectory=/home/samir/gitea-runner
ExecStart=/home/samir/gitea-runner/act_runner daemon
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable --now gitea-runner
sudo systemctl status gitea-runner

Verify the runner shows as Online in Gitea: Site Administration > Actions > Runners.

Step 16: Continuous Deployment

Automate deployment on every successful push to master. The Gitea Actions runner and the app both run on the same server, so the deploy job SSHes from the CI Docker container to 172.17.0.1 (Docker bridge gateway — see note in 16.2).

push to master
  ├── ruff ──────┐
  ├── pytest ────┤
  └── validate ──┤
                 └── deploy (SSH → scripts/deploy.sh)
                          ├── git stash / pull / pop
                          ├── docker compose up -d --build
                          ├── alembic upgrade heads
                          └── health check (retries)

16.1 Generate Deploy SSH Key (on server)

ssh-keygen -t ed25519 -C "gitea-deploy@wizard.lu" -f ~/.ssh/deploy_ed25519 -N ""
cat ~/.ssh/deploy_ed25519.pub >> ~/.ssh/authorized_keys

16.2 Add Gitea Secrets

In Repository Settings > Actions > Secrets, add:

| Secret | Value |
|---|---|
| `DEPLOY_SSH_KEY` | Contents of `~/.ssh/deploy_ed25519` (private key) |
| `DEPLOY_HOST` | `172.17.0.1` (Docker bridge gateway, not `127.0.0.1`) |
| `DEPLOY_USER` | `samir` |
| `DEPLOY_PATH` | `/home/samir/apps/orion` |

!!! important "Why 172.17.0.1 and not 127.0.0.1?" CI jobs run inside Docker containers where 127.0.0.1 is the container, not the host. 172.17.0.1 is the Docker bridge gateway that routes to the host. Ensure the firewall allows SSH from the Docker bridge network: sudo ufw allow from 172.17.0.0/16 to any port 22. When Gitea and Orion are on separate servers, replace with the Orion server's IP.

16.3 Deploy Script

The deploy script lives at scripts/deploy.sh in the repository. It:

  1. Stashes local changes (preserves .env)
  2. Pulls latest code (--ff-only)
  3. Pops stash to restore local changes
  4. Rebuilds and restarts Docker containers (docker compose --profile full up -d --build)
  5. Runs database migrations (alembic upgrade heads)
  6. Health checks http://localhost:8001/health with 12 retries (60s total)

Exit codes: 0 success, 1 git pull failed, 2 docker compose failed, 3 migration failed, 4 health check failed.
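When running the script by hand, the exit code identifies the stage that failed. A small helper can translate it, as sketched here; `explain_deploy_exit` is a hypothetical name and is not part of `scripts/deploy.sh`.

```bash
# Map a deploy.sh exit code to the stage it represents.
explain_deploy_exit() {
    case $1 in
        0) echo "success" ;;
        1) echo "git pull failed" ;;
        2) echo "docker compose failed" ;;
        3) echo "migration failed" ;;
        4) echo "health check failed" ;;
        *) echo "unknown exit code $1" ;;
    esac
}

# Usage:
#   bash scripts/deploy.sh; explain_deploy_exit $?
```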

16.4 CI Workflow

The deploy job in .gitea/workflows/ci.yml runs only on master push, after ruff, pytest, and validate pass:

deploy:
  runs-on: ubuntu-latest
  if: github.event_name == 'push' && github.ref == 'refs/heads/master'
  needs: [ruff, pytest, validate]
  steps:
    - name: Deploy to production
      uses: appleboy/ssh-action@v1
      with:
        host: ${{ secrets.DEPLOY_HOST }}
        username: ${{ secrets.DEPLOY_USER }}
        key: ${{ secrets.DEPLOY_SSH_KEY }}
        port: 22
        command_timeout: 10m
        script: cd ${{ secrets.DEPLOY_PATH }} && bash scripts/deploy.sh

16.5 Manual Fallback

If CI is down, deploy manually:

cd ~/apps/orion && bash scripts/deploy.sh

16.6 Verify

# All app containers running
cd ~/apps/orion && docker compose --profile full ps

# API health (via Caddy with SSL)
curl https://api.wizard.lu/health

# Main platform
curl -I https://wizard.lu

# Gitea
curl -I https://git.wizard.lu

# Flower
curl -I https://flower.wizard.lu

# Gitea runner status
sudo systemctl status gitea-runner

Step 17: Backups

Three layers of backup protection: Hetzner server snapshots, automated PostgreSQL dumps with local rotation, and offsite sync to Cloudflare R2.

17.1 Enable Hetzner Server Backups

In the Hetzner Cloud Console:

  1. Go to Servers > select your server > Backups
  2. Click Enable backups (~20% of server cost, ~1.20 EUR/mo for CAX11)
  3. Hetzner takes automatic daily backups and keeps the 7 most recent

This covers full-disk recovery (OS, Docker volumes, config files) but is coarse-grained. Database-level backups (below) give finer restore granularity.

17.2 Cloudflare R2 Setup (Offsite Backup Storage)

R2 provides S3-compatible object storage with a generous free tier (10 GB storage, 10 million reads/month).

Create Cloudflare account and R2 bucket:

  1. Sign up at cloudflare.com (free account)
  2. Go to R2 Object Storage > Create bucket
  3. Name: orion-backups, region: automatic
  4. Go to R2 > Manage R2 API Tokens > Create API token
    • Permissions: Object Read & Write
    • Specify bucket: orion-backups
  5. Note the Account ID, Access Key ID, and Secret Access Key

Install and configure AWS CLI on the server:

# awscli is not available via apt on Ubuntu 24.04; install via pip
sudo apt install -y python3-pip
pip3 install awscli --break-system-packages

# Add ~/.local/bin to PATH (pip installs binaries there)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

aws configure --profile r2
# Access Key ID: <from step 5>
# Secret Access Key: <from step 5>
# Default region name: auto
# Default output format: json

Test connectivity:

aws s3 ls --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com --profile r2
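Every R2 command needs the same endpoint and profile, so a thin wrapper saves retyping them. The sketch below assumes an environment variable named `R2_ACCOUNT_ID` holding the Cloudflare Account ID; both that variable and the function name `r2` are illustrative, not something the backup scripts require.

```bash
# Run an aws command against the R2 endpoint using the r2 profile.
# Aborts with a message if R2_ACCOUNT_ID is not set.
r2() {
    aws --endpoint-url "https://${R2_ACCOUNT_ID:?set R2_ACCOUNT_ID first}.r2.cloudflarestorage.com" \
        --profile r2 "$@"
}

# Usage:
#   export R2_ACCOUNT_ID=<ACCOUNT_ID>
#   r2 s3 ls
#   r2 s3 ls s3://orion-backups/
```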

Add the R2 backup bucket name to your production .env:

R2_BACKUP_BUCKET=orion-backups

17.3 Backup Script

The backup script at scripts/backup.sh handles:

  • pg_dump of Orion DB (via docker exec orion-db-1)
  • pg_dump of Gitea DB (via docker exec gitea-db)
  • On Sundays: copies daily backup to weekly/ subdirectory
  • Rotation: keeps 7 daily, 4 weekly backups
  • Optional --upload flag: syncs to Cloudflare R2

# Create backup directories
mkdir -p ~/backups/{orion,gitea}/{daily,weekly}

# Run a manual backup
bash ~/apps/orion/scripts/backup.sh

# Run with R2 upload
bash ~/apps/orion/scripts/backup.sh --upload

# Verify backup integrity
ls -lh ~/backups/orion/daily/
gunzip -t ~/backups/orion/daily/*.sql.gz
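
The rotation policy listed above (keep 7 daily and 4 weekly backups) can be sketched as follows. This is an illustration in Python, not the contents of scripts/backup.sh; the helper name `files_to_prune` is hypothetical, and it assumes filenames embed a sortable timestamp as in orion_20260214_030000.sql.gz.

```python
from pathlib import PurePath


def files_to_prune(filenames, keep):
    """Return the backups that fall outside the newest `keep` files.

    Assumes names such as orion_YYYYMMDD_HHMMSS.sql.gz, so sorting by
    file name is equivalent to sorting by creation time.
    """
    ordered = sorted(filenames, key=lambda name: PurePath(name).name, reverse=True)
    return ordered[keep:]


# Ten daily backups; with keep=7 the three oldest are candidates for deletion.
daily = [f"orion_202602{day:02d}_030000.sql.gz" for day in range(1, 11)]
stale = files_to_prune(daily, keep=7)
print(stale)
```

The same function would cover the weekly/ directory with `keep=4`.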

17.4 Systemd Timer (Daily at 03:00)

Create the service unit:

sudo nano /etc/systemd/system/orion-backup.service

Add the following contents:

[Unit]
Description=Orion database backup
After=docker.service

[Service]
Type=oneshot
User=samir
Environment="PATH=/home/samir/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/bin/bash /home/samir/apps/orion/scripts/backup.sh --upload
StandardOutput=journal
StandardError=journal

Create the timer:

sudo nano /etc/systemd/system/orion-backup.timer

Add the following contents:

[Unit]
Description=Run Orion backup daily at 03:00

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable --now orion-backup.timer

# Verify timer is active
systemctl list-timers orion-backup.timer

# Test manually
sudo systemctl start orion-backup.service
journalctl -u orion-backup.service --no-pager

17.5 Restore Procedure

The restore script at scripts/restore.sh handles the full restore cycle:

# Restore Orion database
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/orion_20260214_030000.sql.gz

# Restore Gitea database
bash ~/apps/orion/scripts/restore.sh gitea ~/backups/gitea/daily/gitea_20260214_030000.sql.gz

The script will:

  1. Stop app containers (keep DB running)
  2. Drop and recreate the database
  3. Restore from the .sql.gz backup
  4. Run Alembic migrations (Orion only)
  5. Restart all containers

To restore from R2 (if local backups are lost):

# Download from R2
aws s3 sync s3://orion-backups/ ~/backups/ \
    --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com \
    --profile r2

# Then restore as usual
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/<latest>.sql.gz

17.6 Verification

# Backup files exist
ls -lh ~/backups/orion/daily/
ls -lh ~/backups/gitea/daily/

# Backup integrity
gunzip -t ~/backups/orion/daily/*.sql.gz

# Timer is scheduled
systemctl list-timers orion-backup.timer

# R2 sync (if configured)
aws s3 ls s3://orion-backups/ --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com --profile r2 --recursive
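
If the integrity check needs to run from a script rather than a shell, the `gunzip -t` step could be approximated in Python as below. This is a minimal sketch using only the standard library; the function name `gzip_is_intact` is hypothetical and it is not part of the Orion scripts.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Decompress the whole file and report whether it reads cleanly."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so the trailing CRC and length are verified.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or CRC, EOFError a truncated file.
        return False
    return True
```

Like `gunzip -t`, this only proves the archive is readable; it does not prove the SQL inside restores cleanly, which is what the restore procedure in 17.5 is for.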

Step 18: Monitoring & Observability

Prometheus + Grafana monitoring stack with host and container metrics.

Architecture

┌──────────────┐     scrape      ┌─────────────────┐
│  Prometheus  │◄────────────────│  Orion API       │ /metrics
│  :9090       │◄────────────────│  node-exporter   │ :9100
│              │◄────────────────│  cAdvisor        │ :8080
└──────┬───────┘                 └─────────────────┘
       │ query
┌──────▼───────┐
│   Grafana    │──── https://grafana.wizard.lu
│   :3001      │
└──────────────┘

Resource Budget (4 GB Server)

| Container | RAM Limit | Purpose |
|---|---|---|
| prometheus | 256 MB | Metrics storage (15-day retention, 2 GB max) |
| grafana | 192 MB | Dashboards (SQLite backend) |
| node-exporter | 64 MB | Host CPU/RAM/disk metrics |
| cadvisor | 128 MB | Per-container resource metrics |
| redis-exporter | 32 MB | Redis memory, connections, command stats |
| **Total new** | **672 MB** | |

Existing stack ~1.8 GB + 672 MB new = ~2.5 GB, which leaves ~1.6 GB for the OS. If that proves too tight, live-upgrade to CAX21 (8 GB/80 GB, ~7.50 EUR/mo) via Cloud Console > Server > Rescale (~2 min restart).
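
The budget arithmetic can be re-checked with a few lines of Python whenever a limit changes. This sketch assumes 4 GB means 4096 MB and takes the existing stack as 1800 MB (the approximate ~1.8 GB figure quoted above); both values are assumptions for illustration, not measurements.

```python
# Memory limits of the new monitoring containers, in MB.
LIMITS_MB = {
    "prometheus": 256,
    "grafana": 192,
    "node-exporter": 64,
    "cadvisor": 128,
    "redis-exporter": 32,
}

SERVER_RAM_MB = 4 * 1024    # assumption: 4 GB server = 4096 MB
EXISTING_STACK_MB = 1800    # assumption: the ~1.8 GB estimate for the app stack

monitoring_mb = sum(LIMITS_MB.values())
headroom_mb = SERVER_RAM_MB - EXISTING_STACK_MB - monitoring_mb

print(f"monitoring: {monitoring_mb} MB")
print(f"headroom:   {headroom_mb} MB (~{headroom_mb / 1024:.1f} GB)")
```

The limits are ceilings rather than typical usage, so actual headroom is normally larger; `docker stats --no-stream` shows the real figures.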

18.1 DNS Record

Add A and AAAA records for grafana.wizard.lu:

| Type | Name | Value | TTL |
|---|---|---|---|
| A | grafana | 91.99.65.229 | 300 |
| AAAA | grafana | 2a01:4f8:1c1a:b39c::1 | 300 |

18.2 Caddy Configuration

Add to /etc/caddy/Caddyfile:

grafana.wizard.lu {
    reverse_proxy localhost:3001
}

Reload Caddy:

sudo systemctl reload caddy

18.3 Production Environment

Add to ~/apps/orion/.env:

ENABLE_METRICS=true
GRAFANA_URL=https://grafana.wizard.lu
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=<strong-password>

18.4 Deploy

cd ~/apps/orion
docker compose --profile full up -d --build

Verify all containers are running:

docker compose --profile full ps
docker stats --no-stream

18.5 Grafana First Login

  1. Open https://grafana.wizard.lu
  2. Log in with admin / <password from .env>
  3. Change the default password when prompted

Import community dashboards:

  • Node Exporter Full: Dashboards > Import > ID 1860 > Select Prometheus datasource
  • Docker / cAdvisor: Dashboards > Import > ID 193 > Select Prometheus datasource

18.6 Verification

# Prometheus metrics from Orion API
curl -s https://api.wizard.lu/metrics | head -5

# Health endpoints
curl -s https://api.wizard.lu/health/live
curl -s https://api.wizard.lu/health/ready

# Prometheus targets (all should be "up")
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health

# Grafana accessible
curl -I https://grafana.wizard.lu

# RAM usage within limits
docker stats --no-stream
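
The `grep health` check above prints every target's health line. To list only the failing targets, the JSON from the Prometheus targets endpoint can be filtered as in this sketch. The helper names are hypothetical, and the job names and scrape URLs in the sample are illustrative rather than taken from the Orion configuration.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape URL, health) for every target that is not "up"."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", "?"), t.get("scrapeUrl", "?"), t.get("health"))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets document from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Offline sample shaped like a Prometheus response (values are made up).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "orion-api"}, "scrapeUrl": "http://api:8000/metrics", "health": "up"},
            {"labels": {"job": "node-exporter"}, "scrapeUrl": "http://node-exporter:9100/metrics", "health": "down"},
        ]
    },
}

for job, url, health in unhealthy_targets(sample):
    print(f"{job}: {health} ({url})")
```

On the server, `unhealthy_targets(fetch_targets())` would run the same filter against the live Prometheus instance; an empty list means every target is up.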

Step 19: Prometheus Alerting

Alert rules and Alertmanager for email notifications when things go wrong.

19.1 Architecture

┌──────────────┐  evaluates   ┌───────────────────┐
│  Prometheus  │─────────────►│  alert.rules.yml  │
│  :9090       │              │  (host, container, │
│              │              │   API, Celery)     │
└──────┬───────┘              └───────────────────┘
       │ fires alerts
┌──────▼───────┐
│ Alertmanager │──── email ──► admin@wizard.lu
│ :9093        │
└──────────────┘

19.2 Alert Rules

Alert rules are defined in monitoring/prometheus/alert.rules.yml:

| Group | Alert | Condition | Severity |
|---|---|---|---|
| Host | HostHighCpuUsage | CPU >80% for 5m | warning |
| Host | HostHighMemoryUsage | Memory >85% for 5m | warning |
| Host | HostHighDiskUsage | Disk >80% | warning |
| Host | HostDiskFullPrediction | Disk full within 4h | critical |
| Containers | ContainerHighRestartCount | >3 restarts/hour | critical |
| Containers | ContainerOomKilled | Any OOM kill | critical |
| Containers | ContainerHighCpu | >80% CPU for 5m | warning |
| API | ApiHighErrorRate | 5xx rate >1% for 5m | critical |
| API | ApiHighLatency | P95 >2s for 5m | warning |
| API | ApiHealthCheckDown | Health check failing 1m | critical |
| Celery | CeleryQueueBacklog | >100 tasks for 10m | warning |
| Prometheus | TargetDown | Any target down 2m | critical |

19.3 Alertmanager Configuration

Alertmanager config is in monitoring/alertmanager/alertmanager.yml:

  • Critical alerts: repeat every 1 hour
  • Warning alerts: repeat every 4 hours
  • Groups by alertname + severity, 30s wait, 5m interval
  • Inhibition: warnings suppressed when critical is already firing for same alert

!!! warning "Configure SMTP before deploying" Edit monitoring/alertmanager/alertmanager.yml and fill in the SMTP settings (host, username, password, recipient email). Alertmanager will start but won't send emails until SMTP is configured.

19.4 Docker Compose Changes

The docker-compose.yml includes:

  • alertmanager service: prom/alertmanager:latest, profiles: [full], port 127.0.0.1:9093, mem_limit: 32m
  • prometheus volumes: mounts alert.rules.yml as read-only
  • prometheus.yml: alerting: section pointing to alertmanager:9093, rule_files: for alert rules, new scrape job for alertmanager

19.5 Alertmanager SMTP Setup (SendGrid)

Alertmanager needs SMTP to send email notifications. SendGrid handles both transactional emails and marketing campaigns under one account — set it up once and use it for everything.

Free trial: 100 emails/day for 60 days. Covers alerting + transactional emails through launch. After 60 days, upgrade to a paid plan (Essentials starts at ~$20/mo for 50K emails/mo).

1. Create SendGrid account:

  1. Sign up at sendgrid.com (free plan)
  2. Complete Sender Authentication: go to Settings > Sender Authentication > Domain Authentication
  3. Authenticate your sending domain (wizard.lu) — SendGrid provides CNAME records to add to DNS
  4. Create an API key: Settings > API Keys > Create API Key (Full Access)
  5. Save the API key — you'll need it for both Alertmanager and the app's EMAIL_PROVIDER

!!! info "SendGrid SMTP credentials" SendGrid uses a single credential pattern for SMTP:

- **Server**: `smtp.sendgrid.net`
- **Port**: `587` (STARTTLS)
- **Username**: literally the string `apikey` (not your email)
- **Password**: your API key (starts with `SG.`)

2. Update alertmanager config on the server:

nano ~/apps/orion/monitoring/alertmanager/alertmanager.yml

Replace the SMTP placeholders:

global:
  smtp_smarthost: 'smtp.sendgrid.net:587'
  smtp_from: 'alerts@wizard.lu'
  smtp_auth_username: 'apikey'
  smtp_auth_password: 'SG.your-sendgrid-api-key-here'
  smtp_require_tls: true

Update the to: addresses in both receivers to your actual email.

3. Update app email config in ~/apps/orion/.env:

# SendGrid for all application emails (password reset, order confirmation, etc.)
EMAIL_PROVIDER=sendgrid
SENDGRID_API_KEY=SG.your-sendgrid-api-key-here
EMAIL_FROM_ADDRESS=noreply@wizard.lu
EMAIL_FROM_NAME=Orion

4. Restart services:

cd ~/apps/orion
docker compose --profile full restart alertmanager api
curl -s http://localhost:9093/-/healthy  # Should return OK

5. Test by triggering a test alert (optional):

# Send a test alert to alertmanager (v2 API)
curl -X POST http://localhost:9093/api/v2/alerts -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Test alert - please ignore"},"startsAt":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","endsAt":"'$(date -u -d '+5 minutes' +%Y-%m-%dT%H:%M:%SZ)'"}]'

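Because of the 30-second group wait, the email should arrive roughly 30 seconds after the request. Then list the alerts Alertmanager currently holds; the test alert drops out once its 5-minute endsAt has passed: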

curl -s http://localhost:9093/api/v2/alerts | python3 -m json.tool
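
The JSON body in the curl command above is hard to read on one line. The sketch below builds the same payload in Python so the structure is visible; the function name `build_test_alert` is hypothetical, and the fixed timestamp is only there to make the printed output reproducible.

```python
import json
from datetime import datetime, timedelta, timezone


def build_test_alert(now=None, minutes=5):
    """Build the body for POST /api/v2/alerts: a list with one alert object."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    return [
        {
            "labels": {"alertname": "TestAlert", "severity": "warning"},
            "annotations": {"summary": "Test alert - please ignore"},
            "startsAt": now.strftime(fmt),
            "endsAt": (now + timedelta(minutes=minutes)).strftime(fmt),
        }
    ]


fixed = datetime(2026, 2, 14, 3, 0, 0, tzinfo=timezone.utc)
print(json.dumps(build_test_alert(fixed), indent=2))
```

The labels decide routing: `severity: warning` sends the alert to the warning receiver, so changing it to `critical` would exercise the other receiver.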

!!! tip "Alternative SMTP providers" Any SMTP service works if you prefer a different provider:

- **Amazon SES**: `email-smtp.eu-west-1.amazonaws.com:587` — cheapest at scale ($0.10/1K emails)
- **Mailgun**: `smtp.mailgun.org:587` — transactional only, no built-in marketing
- **Gmail**: `smtp.gmail.com:587` with an App Password (not recommended for production)

19.6 Deploy

cd ~/apps/orion
docker compose --profile full up -d

19.7 Verification

# Alertmanager healthy
curl -s http://localhost:9093/-/healthy

# Alert rules loaded
curl -s http://localhost:9090/api/v1/rules | python3 -m json.tool | head -20

# Active alerts (should be empty if all is well)
curl -s http://localhost:9090/api/v1/alerts | python3 -m json.tool

# Alertmanager target in Prometheus
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep alertmanager

19.8 Multi-Domain Email Strategy

SendGrid supports multiple authenticated domains on a single account. This enables sending emails from client domains (e.g., orders@acme.lu) without clients needing their own SendGrid plan.

Current setup:

  • wizard.lu authenticated — used for platform emails (alerts@, noreply@)

Future: client domain onboarding

When a client wants emails sent from their domain (e.g., acme.lu):

  1. In SendGrid: Settings > Sender Authentication > Authenticate a Domain → add acme.lu
  2. SendGrid provides CNAME + TXT records
  3. Client adds the DNS records to their domain
  4. Verify in SendGrid

This is the professional approach — emails come from the client's domain with proper SPF/DKIM, not from wizard.lu. Build an admin flow to automate this as part of store onboarding.

!!! note "Volume planning" The free trial allows 100 emails/day. Once clients start sending marketing campaigns, upgrade to a paid SendGrid plan based on total volume across all client domains.


Step 19b: Sentry Error Tracking

Application-level error tracking with Sentry. While Prometheus monitors infrastructure metrics (CPU, memory, HTTP error rates), Sentry captures individual exceptions with full stack traces, request context, and breadcrumbs — making it possible to debug production errors without SSH access.

!!! info "How Sentry fits into the monitoring stack"

    ┌──────────────────────────────────────────────────────────────┐
    │                     Observability Stack                      │
    ├──────────────────┬──────────────────┬────────────────────────┤
    │ Prometheus       │ Grafana          │ Sentry                 │
    │ Infrastructure   │ Dashboards       │ Application errors     │
    │ metrics & alerts │ & visualization  │ & performance traces   │
    ├──────────────────┴──────────────────┴────────────────────────┤
    │ Prometheus: "API 5xx rate is 3%"                             │
    │ Sentry: "TypeError in /api/v1/orders/checkout line 42        │
    │          request_id=abc123, user_id=7, store=acme"           │
    └──────────────────────────────────────────────────────────────┘

What's Already Wired

The codebase already initializes Sentry in two places — you just need to provide the DSN:

| Component | File | Integrations |
|---|---|---|
| FastAPI (API server) | main.py:42-58 | FastApiIntegration, SqlalchemyIntegration |
| Celery (background workers) | app/core/celery_config.py:31-39 | CeleryIntegration |

Both read from the same SENTRY_DSN environment variable. When unset, Sentry is silently skipped.

19b.1 Create Sentry Project

  1. Sign up at sentry.io (free Developer plan: 5K errors/month, 1 user)
  2. Create a new project:
    • Platform: Python → FastAPI
    • Project name: orion (or rewardflow)
    • Team: default
  3. Copy the DSN from the project settings — it looks like:
    https://abc123def456@o123456.ingest.de.sentry.io/7891011
    

!!! tip "Sentry pricing"

    | Plan | Errors/month | Cost | Notes |
    |---|---|---|---|
    | Developer (free) | 5,000 | $0 | 1 user, 30-day retention |
    | Team | 50,000 | $26/mo | Unlimited users, 90-day retention |
    | Business | 50,000 | $80/mo | SSO, audit logs, 90-day retention |

The free plan is sufficient for launch. Upgrade to Team if you exceed 5K errors/month or need multiple team members.

19b.2 Configure Environment

Add to ~/apps/orion/.env on the server:

# Sentry Error Tracking
SENTRY_DSN=https://your-key@o123456.ingest.de.sentry.io/your-project-id
SENTRY_ENVIRONMENT=production
SENTRY_TRACES_SAMPLE_RATE=0.1

| Variable | Default | Description |
|---|---|---|
| SENTRY_DSN | None (disabled) | Project DSN from Sentry dashboard |
| SENTRY_ENVIRONMENT | development | Tags errors by environment (production, staging) |
| SENTRY_TRACES_SAMPLE_RATE | 0.1 | Fraction of requests traced for performance (0.1 = 10%) |

!!! warning "Traces sample rate" 0.1 (10%) is a good starting point. At high traffic, lower to 0.01 (1%) to stay within the free plan's span limits. For initial launch with low traffic, you can temporarily set 1.0 (100%) for full visibility.
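
How these three variables and their defaults combine can be sketched as below. This is not the code in main.py or celery_config.py; `sentry_settings` is a hypothetical helper that only mirrors the behaviour described here (no DSN means Sentry is skipped) and returns keyword arguments of the kind `sentry_sdk.init` accepts.

```python
def sentry_settings(env):
    """Build Sentry init keyword arguments from env vars, or None when disabled."""
    dsn = env.get("SENTRY_DSN", "").strip()
    if not dsn:
        return None  # no DSN configured: skip Sentry entirely

    rate = float(env.get("SENTRY_TRACES_SAMPLE_RATE", "0.1"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"SENTRY_TRACES_SAMPLE_RATE must be in [0, 1], got {rate}")

    return {
        "dsn": dsn,
        "environment": env.get("SENTRY_ENVIRONMENT", "development"),
        "traces_sample_rate": rate,
    }


production = {
    "SENTRY_DSN": "https://your-key@o123456.ingest.de.sentry.io/your-project-id",
    "SENTRY_ENVIRONMENT": "production",
    "SENTRY_TRACES_SAMPLE_RATE": "0.1",
}
print(sentry_settings(production))
print(sentry_settings({}))
```

In an application this would typically be called with `os.environ`; validating the sample rate up front turns a typo such as `10` (meant as 10%) into an immediate error.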

19b.3 Deploy

Restart the API and Celery containers to pick up the new env vars:

cd ~/apps/orion
docker compose --profile full restart api celery-worker celery-beat

Check the API logs to confirm Sentry initialized:

docker compose --profile full logs api --tail 20 | grep -i sentry

You should see:

Sentry initialized for environment: production

19b.4 Verify

1. Trigger a test error by hitting the API with a request that will fail:

curl -s https://api.wizard.lu/api/v1/nonexistent-endpoint-sentry-test

2. Check Sentry dashboard:

  • Go to sentry.io → your project → Issues
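  • A 404 Not Found or similar error may appear within seconds. By default Sentry's FastAPI integration reports only responses with 5xx status codes as issues, so an empty Issues list after this request does not by itself mean the DSN is wrong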
  • Click into it to see the full stack trace, request headers, and breadcrumbs

3. Verify Celery integration — check that the Celery worker also reports to Sentry:

docker compose --profile full logs celery-worker --tail 10 | grep -i sentry

19b.5 Sentry Features to Configure

After verifying the basic setup, configure these in the Sentry web UI:

Alerts (Sentry → Alerts → Create Alert):

| Alert | Condition | Action |
|---|---|---|
| New issue spike | >10 events in 1 hour | Email notification |
| First seen error | Any new issue | Email notification |
| Unresolved high-volume | >50 events in 24h | Email notification |

Release tracking — Sentry automatically tags errors with the release version via release=f"orion@{settings.version}" in main.py. This lets you see which deploy introduced a bug.

Source maps (optional, post-launch) — if you want JS errors from the admin frontend, add the Sentry browser SDK to your base template. Not needed for launch since most errors will be server-side.

19b.6 What Sentry Captures

With the current integration, Sentry automatically captures:

| Data | Source | Example |
|---|---|---|
| Python exceptions | FastAPI + Celery | TypeError, ValidationError, unhandled 500s |
| Request context | FastApiIntegration | URL, method, headers, query params, user IP |
| DB query breadcrumbs | SqlalchemyIntegration | SQL queries leading up to the error |
| Celery task failures | CeleryIntegration | Task name, args, retry count, worker hostname |
| User info | send_default_pii=True | User email and IP (if authenticated) |
| Performance traces | traces_sample_rate | End-to-end request timing, DB query duration |

!!! note "Privacy" send_default_pii=True is set in both main.py and celery_config.py. This sends user emails and IP addresses to Sentry for debugging context. If GDPR compliance requires stricter data handling, set this to False and configure Sentry's Data Scrubbing rules.


Step 19c: Redis Monitoring (Redis Exporter)

Add direct Redis monitoring to Prometheus. Without this, Redis can die silently — Celery tasks stop processing and emails stop sending, but no alert fires.

Why Not Just cAdvisor?

cAdvisor tells you "the Redis container is running." The Redis exporter tells you "Redis is running, responding to commands, using 45MB memory, has 3 clients connected, and command latency is 0.2ms." It also catches scenarios where the container is running but Redis itself is unhealthy (maxmemory reached, connection limit hit).

Resource Impact

| Container | RAM | CPU | Image Size |
|---|---|---|---|
| redis-exporter | ~5 MB | negligible | ~10 MB |

19c.1 Docker Compose

The redis-exporter service has been added to docker-compose.yml:

redis-exporter:
  image: oliver006/redis_exporter:latest
  restart: always
  profiles:
    - full
  ports:
    - "127.0.0.1:9121:9121"
  environment:
    REDIS_ADDR: redis://redis:6379
  depends_on:
    redis:
      condition: service_healthy
  mem_limit: 32m
  networks:
    - backend
    - monitoring

It joins both backend (to reach Redis) and monitoring (so Prometheus can scrape it).

19c.2 Prometheus Scrape Target

Added to monitoring/prometheus.yml:

- job_name: "redis"
  static_configs:
    - targets: ["redis-exporter:9121"]
      labels:
        service: "redis"

19c.3 Alert Rules

Four Redis-specific alerts added to monitoring/prometheus/alert.rules.yml:

| Alert | Condition | Severity | What It Means |
|---|---|---|---|
| RedisDown | redis_up == 0 for 1m | critical | Redis is unreachable; all background tasks stalled |
| RedisHighMemoryUsage | >80% of maxmemory for 5m | warning | Queue backlog or memory leak |
| RedisHighConnectionCount | >50 clients for 5m | warning | Possible connection leak |
| RedisRejectedConnections | Any rejected in 5m | critical | Redis is refusing new connections |

19c.4 Deploy

cd ~/apps/orion
git pull
docker compose --profile full up -d

Verify the exporter is running and Prometheus can scrape it:

# Exporter health
curl -s http://localhost:9121/health

# Redis metrics flowing
curl -s http://localhost:9121/metrics | grep redis_up

# Prometheus target status (should show "redis" as UP)
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -A2 '"redis"'

19c.5 Grafana Dashboard

Import the community Redis dashboard:

  1. Open https://grafana.wizard.lu
  2. Dashboards → Import → ID 763 → Select Prometheus datasource
  3. You'll see: memory usage, connected clients, commands/sec, hit rate, key count

19c.6 Verification

# Redis is being monitored
curl -s http://localhost:9121/metrics | grep redis_up
# redis_up 1

# Memory usage
curl -s http://localhost:9121/metrics | grep redis_memory_used_bytes
# redis_memory_used_bytes 1.234e+07  (≈12 MB)

# Connected clients
curl -s http://localhost:9121/metrics | grep redis_connected_clients
# redis_connected_clients 4  (API + celery-worker + celery-beat + flower)

# Alert rules loaded
curl -s http://localhost:9090/api/v1/rules | python3 -m json.tool | grep -i redis
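
The three `grep` checks above can be folded into one pass over the exporter output. The sketch below parses the Prometheus text format for a few unlabelled metrics; `parse_metrics` is a hypothetical helper, and the sample text simply reuses the example values shown in the comments above.

```python
def parse_metrics(text, wanted):
    """Extract unlabelled samples for the given metric names from Prometheus text output."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blanks, "# HELP" / "# TYPE" comments, and malformed lines.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if parts[0] in wanted:
            values[parts[0]] = float(parts[1])
    return values


# Illustrative excerpt of what curl http://localhost:9121/metrics returns.
sample = """\
# TYPE redis_up gauge
redis_up 1
redis_memory_used_bytes 1.234e+07
redis_connected_clients 4
"""

wanted = {"redis_up", "redis_memory_used_bytes", "redis_connected_clients"}
metrics = parse_metrics(sample, wanted)
print(metrics)
```

Metrics that carry labels (for example per-database key counts) have a `{...}` suffix in their name and are deliberately ignored by this simple parser.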

Step 20: Security Hardening

Docker network segmentation, fail2ban configuration, and automatic security updates.

20.1 Docker Network Segmentation

Three isolated networks replace the default flat network:

| Network | Purpose | Services |
|---|---|---|
| orion_frontend | External-facing | api |
| orion_backend | Database + workers | db, redis, api, celery-worker, celery-beat, flower, redis-exporter |
| orion_monitoring | Metrics collection | api, prometheus, grafana, node-exporter, cadvisor, alertmanager, redis-exporter |

The api service is on all three networks because it needs to:

  • Serve HTTP traffic (frontend)
  • Connect to database and Redis (backend)
  • Expose /metrics to Prometheus (monitoring)

This is already configured in the updated docker-compose.yml. After deploying, verify:

docker network ls | grep orion
# Expected: orion_frontend, orion_backend, orion_monitoring

20.2 fail2ban Configuration

fail2ban is already installed (Step 3) but needs jail configuration. All commands below are copy-pasteable.

SSH jail — bans IPs after 3 failed SSH attempts for 24 hours:

sudo tee /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 86400
findtime = 600
EOF

Caddy access logging — fail2ban needs a log file to watch. Add a global log directive to your Caddyfile:

sudo nano /etc/caddy/Caddyfile

Add this block at the very top of the Caddyfile, before any site blocks:

{
    log {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
}

Create the log directory and restart Caddy:

sudo mkdir -p /var/log/caddy
sudo chown caddy:caddy /var/log/caddy
sudo systemctl restart caddy
sudo systemctl status caddy

# Verify logging works (make a request, then check)
curl -s https://wizard.lu > /dev/null
sudo tail -1 /var/log/caddy/access.log | python3 -m json.tool | head -5

Caddy auth filter — matches 401/403 responses in Caddy's JSON logs:

sudo tee /etc/fail2ban/filter.d/caddy-auth.conf << 'EOF'
[Definition]
failregex = ^.*"remote_ip":"<HOST>".*"status":(401|403).*$
ignoreregex =
EOF
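
Before relying on the filter, it helps to confirm that the pattern matches a denied request and ignores a successful one. The sketch below does that in Python with a simplified stand-in for fail2ban's `<HOST>` placeholder; the sample log lines are abbreviated, and the URIs in them are made up for illustration.

```python
import re

# fail2ban expands <HOST> into its own address pattern; this simplified
# IPv4/IPv6 character class is an assumption used only for illustration.
HOST = r"(?P<host>[0-9a-fA-F:.]+)"
FAILREGEX = re.compile(r'^.*"remote_ip":"' + HOST + r'".*"status":(401|403).*$')


def failed_client(log_line):
    """Return the client IP if the line counts as an auth failure, else None."""
    match = FAILREGEX.match(log_line)
    return match.group("host") if match else None


# Abbreviated samples; real Caddy JSON entries carry many more fields.
denied = (
    '{"level":"info","logger":"http.log.access",'
    '"request":{"remote_ip":"203.0.113.7","method":"POST","uri":"/login"},'
    '"status":401}'
)
allowed = (
    '{"level":"info","logger":"http.log.access",'
    '"request":{"remote_ip":"203.0.113.7","method":"GET","uri":"/"},'
    '"status":200}'
)

print(failed_client(denied))
print(failed_client(allowed))
```

On the server itself, the `fail2ban-regex` tool that ships with fail2ban can run the real filter file against /var/log/caddy/access.log, which also exercises the genuine `<HOST>` expansion.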

Caddy jail — bans IPs after 10 failed auth attempts for 1 hour:

sudo tee /etc/fail2ban/jail.d/caddy.conf << 'EOF'
[caddy-auth]
enabled = true
port = http,https
filter = caddy-auth
logpath = /var/log/caddy/access.log
maxretry = 10
bantime = 3600
findtime = 600
EOF

Restart and verify:

sudo systemctl restart fail2ban

# Both jails should be listed
sudo fail2ban-client status

# SSH jail details
sudo fail2ban-client status sshd

# Caddy jail details (will show 0 bans initially)
sudo fail2ban-client status caddy-auth

20.3 Unattended Security Upgrades

Install and enable automatic security updates:

sudo apt install -y unattended-upgrades apt-listchanges
sudo dpkg-reconfigure -plow unattended-upgrades

This enables security-only updates with automatic reboot disabled (safe default). Verify:

sudo unattended-upgrades --dry-run 2>&1 | head -10
cat /etc/apt/apt.conf.d/20auto-upgrades

Expected 20auto-upgrades content:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

20.4 Clean Up Legacy Docker Network

After deploying with network segmentation, the old default network may remain:

# Check if orion_default still exists
docker network ls | grep orion_default

# Remove it (safe — no containers should be using it)
docker network rm orion_default 2>/dev/null || echo "Already removed"

20.5 Verification

# fail2ban jails active (should show sshd and caddy-auth)
sudo fail2ban-client status

# SSH jail details
sudo fail2ban-client status sshd

# Docker networks (should show 3: frontend, backend, monitoring)
docker network ls | grep orion

# Unattended upgrades configured
sudo unattended-upgrades --dry-run 2>&1 | head

# Caddy access log being written
sudo tail -1 /var/log/caddy/access.log

Step 21: Cloudflare Domain Proxy

Move DNS to Cloudflare for WAF, DDoS protection, and CDN. This step involves DNS propagation — do it during a maintenance window.

!!! warning "DNS changes affect all services" Moving nameservers involves propagation delay (minutes to hours). Plan for brief interruption. Do this step last, after Steps 19–20 are verified.

21.1 Pre-Migration: Record Email DNS

Before changing nameservers, document all email-related DNS records:

# Run for each domain (wizard.lu, omsflow.lu, rewardflow.lu)
dig wizard.lu MX +short
dig wizard.lu TXT +short
dig _dmarc.wizard.lu TXT +short
dig default._domainkey.wizard.lu TXT +short  # DKIM selector may vary

Save the output — you'll need to verify these exist after Cloudflare import.

21.2 Add Domains to Cloudflare

  1. Log in to Cloudflare Dashboard
  2. Add a site for each domain: wizard.lu, omsflow.lu, rewardflow.lu
  3. Select Free plan → choose Full setup (nameserver-based, not CNAME/partial)
  4. Block AI crawlers on all pages
  5. Cloudflare auto-scans and imports existing DNS records — review carefully:
    • Delete any stale CNAME records (leftover from partial setup)
    • Add missing A/AAAA records manually (Cloudflare scan may miss some)
    • Verify MX/SPF/DKIM/DMARC records are present before changing NS
    • Email records (MX, TXT) must stay as DNS-only (grey cloud) — never proxy MX records
  6. Set proxy status:
    • Orange cloud (proxied): @, www, api, flower, grafana — gets WAF + CDN
    • Grey cloud (DNS only): git — needs direct access for SSH on port 2222

21.3 Change Nameservers

At your domain registrar (Netim), update NS records to Cloudflare's assigned nameservers. Cloudflare shows the exact pair during activation (e.g., name1.ns.cloudflare.com, name2.ns.cloudflare.com).

Disable DNSSEC at the registrar before switching NS — re-enable later via Cloudflare.

21.4 Generate Origin Certificates

Cloudflare Origin Certificates (free, 15-year validity) avoid ACME challenge issues when traffic is proxied:

  1. In Cloudflare: SSL/TLS > Origin Server > Create Certificate
  2. Generate for each domain with specific subdomains (not wildcards):
    • wizard.lu: wizard.lu, api.wizard.lu, www.wizard.lu, flower.wizard.lu, grafana.wizard.lu
    • omsflow.lu: omsflow.lu, www.omsflow.lu
    • rewardflow.lu: rewardflow.lu, www.rewardflow.lu
  3. Download the certificate and private key (private key is shown only once)

!!! warning "Do NOT use wildcard origin certs for wizard.lu" A *.wizard.lu wildcard cert will match git.wizard.lu, which needs a Let's Encrypt cert (DNS-only, not proxied through Cloudflare). Use specific subdomains instead.

Install on the server:

sudo mkdir -p /etc/caddy/certs/{wizard.lu,omsflow.lu,rewardflow.lu}

# For each domain, create cert.pem and key.pem:
sudo nano /etc/caddy/certs/wizard.lu/cert.pem    # paste certificate
sudo nano /etc/caddy/certs/wizard.lu/key.pem      # paste private key
# Repeat for omsflow.lu and rewardflow.lu

sudo chown -R caddy:caddy /etc/caddy/certs/
sudo chmod 600 /etc/caddy/certs/*/key.pem

21.5 Update Caddyfile

For Cloudflare-proxied domains, use explicit TLS with origin certs. Keep auto-HTTPS for git.wizard.lu (DNS-only, grey cloud):

{
    log {
        output file /var/log/caddy/access.log {
            roll_size 100MiB
            roll_keep 5
        }
        format json
    }
}

# ─── Platform 1: Main (wizard.lu) ───────────────────────────
wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:8001
}

www.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    redir https://wizard.lu{uri} permanent
}

# ─── Platform 2: OMS (omsflow.lu) ───────────────────────────
omsflow.lu {
    tls /etc/caddy/certs/omsflow.lu/cert.pem /etc/caddy/certs/omsflow.lu/key.pem
    reverse_proxy localhost:8001
}

www.omsflow.lu {
    tls /etc/caddy/certs/omsflow.lu/cert.pem /etc/caddy/certs/omsflow.lu/key.pem
    redir https://omsflow.lu{uri} permanent
}

# ─── Platform 3: Loyalty+ (rewardflow.lu) ──────────────────
rewardflow.lu {
    tls /etc/caddy/certs/rewardflow.lu/cert.pem /etc/caddy/certs/rewardflow.lu/key.pem
    reverse_proxy localhost:8001
}

www.rewardflow.lu {
    tls /etc/caddy/certs/rewardflow.lu/cert.pem /etc/caddy/certs/rewardflow.lu/key.pem
    redir https://rewardflow.lu{uri} permanent
}

# ─── Services (wizard.lu origin cert) ───────────────────────
api.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:8001
}

flower.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:5555
}

grafana.wizard.lu {
    tls /etc/caddy/certs/wizard.lu/cert.pem /etc/caddy/certs/wizard.lu/key.pem
    reverse_proxy localhost:3001
}

# ─── DNS-only domain (Let's Encrypt, not proxied by Cloudflare) ─
git.wizard.lu {
    tls {
        issuer acme
    }
    reverse_proxy localhost:3000
}

Restart Caddy:

sudo systemctl restart caddy
sudo systemctl status caddy

21.6 Cloudflare Settings (per domain)

Configure these in the Cloudflare dashboard for each domain (wizard.lu, omsflow.lu, rewardflow.lu):

| Setting | Location | Value |
|---|---|---|
| SSL mode | SSL/TLS > Overview | Full (Strict) |
| Always Use HTTPS | SSL/TLS > Edge Certificates | On |
| Bot Fight Mode | Security > Settings | On |
| DDoS protection | Security > Security rules > DDoS | Active (enabled by default) |
| AI crawlers | Security (during setup) | Blocked on all pages |

Rate limiting rule (Security > Security rules > Create rule):

| Field | Value |
|---|---|
| Match | URI Path contains /api/ |
| Characteristics | IP |
| Rate | 100 requests per 10 seconds |
| Action | Block |
| Duration | 10 seconds |
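
To make the effect of this rule concrete, the sketch below models it as a simple fixed-window counter. It is an illustration of what "100 requests per 10 seconds, block for 10 seconds" means for a single client IP and makes no claim about how Cloudflare implements rate limiting internally; the class name and method are hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Count requests per client in fixed windows and block once a limit is exceeded."""

    def __init__(self, limit=100, period=10, block_for=10):
        self.limit = limit
        self.period = period
        self.block_for = block_for
        self.counts = defaultdict(int)  # (ip, window index) -> requests seen
        self.blocked_until = {}         # ip -> timestamp at which the block ends

    def allow(self, ip, path, now):
        if "/api/" not in path:
            return True                 # the rule only matches API paths
        if now < self.blocked_until.get(ip, 0):
            return False                # still inside the block duration
        window = int(now // self.period)
        self.counts[(ip, window)] += 1
        if self.counts[(ip, window)] > self.limit:
            self.blocked_until[ip] = now + self.block_for
            return False
        return True


limiter = FixedWindowLimiter()
results = [limiter.allow("198.51.100.9", "/api/v1/orders", now=0.0) for _ in range(101)]
print(results.count(True), results.count(False))
```

The sketch never discards old counters, which is fine for a demonstration but would need cleanup in any long-running process.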

21.7 Production Environment

Add to ~/apps/orion/.env:

CLOUDFLARE_ENABLED=true

21.8 Verification

# CF proxy active (look for cf-ray header)
curl -sI https://wizard.lu | grep -i cf-ray

# DNS resolves to Cloudflare IPs (not 91.99.65.229)
dig wizard.lu +short

# All domains responding
curl -I https://omsflow.lu
curl -I https://rewardflow.lu
curl -I https://api.wizard.lu/health

# git.wizard.lu still on Let's Encrypt (not CF)
curl -I https://git.wizard.lu

!!! info "git.wizard.lu stays DNS-only" The Gitea instance uses SSH on port 2222 for git operations. Cloudflare proxy only supports HTTP/HTTPS, so git.wizard.lu must remain as DNS-only (grey cloud) with Let's Encrypt auto-SSL via Caddy.
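
The "resolves to Cloudflare IPs" check above can be automated by testing the addresses returned by `dig` against Cloudflare's published ranges. In this sketch the IPv4 list is a snapshot copied at the time of writing and should be treated as an assumption; refresh it from cloudflare.com/ips before relying on it. The function name is hypothetical.

```python
import ipaddress

# Snapshot of Cloudflare's published IPv4 ranges (assumption: may be outdated).
CLOUDFLARE_V4 = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20", "103.21.244.0/22", "103.22.200.0/22",
        "103.31.4.0/22", "141.101.64.0/18", "108.162.192.0/18",
        "190.93.240.0/20", "188.114.96.0/20", "197.234.240.0/22",
        "198.41.128.0/17", "162.158.0.0/15", "104.16.0.0/13",
        "104.24.0.0/14", "172.64.0.0/13", "131.0.72.0/22",
    )
]


def is_cloudflare(ip):
    """Return True if the IPv4 address falls inside one of the listed ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in CLOUDFLARE_V4)


# The origin server's own address must not show up for proxied records.
print(is_cloudflare("91.99.65.229"))
print(is_cloudflare("104.16.0.1"))
```

For proxied records every address from `dig wizard.lu +short` should pass this check, while `git.wizard.lu` (DNS only) should still resolve to the origin address.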


Step 22: Incident Response Runbook

A comprehensive incident response runbook is available at Incident Response. It includes:

  • Severity levels: SEV-1 (platform down, <15min), SEV-2 (feature broken, <1h), SEV-3 (minor, <4h)
  • Quick diagnosis decision tree: SSH → Docker → containers → Caddy → DNS
  • 8 runbooks with copy-paste commands for common incidents
  • Post-incident report template
  • Monitoring URLs quick reference

Step 23: Environment Reference

A complete environment variables reference is available at Environment Variables. It documents all 55+ configuration variables from app/core/config.py, grouped by category with defaults and production requirements.


Step 24: Documentation Updates

This document has been updated with Steps 19–24. Additional documentation changes:

  • docs/deployment/incident-response.md — new incident response runbook
  • docs/deployment/environment.md — complete env var reference (was empty)
  • docs/deployment/launch-readiness.md — updated with Feb 2026 infrastructure status
  • mkdocs.yml — incident-response.md added to nav

Step 25: Google Wallet Integration

Enable loyalty card passes in Google Wallet so customers can add their loyalty card to their Android phone.

Prerequisites

  • Google account (personal Gmail is fine)
  • Loyalty module deployed and working

25.1 Google Pay & Wallet Console

Register as a Google Wallet Issuer:

  1. Go to pay.google.com/business/console
  2. Enter your business name (e.g., "Letzshop" or your company name) — this is for Google's review, customers don't see it on passes
  3. Note your Issuer ID from the Google Wallet API section

!!! info "Issuer ID" The Issuer ID is a long numeric string (e.g., 3388000000023089598). You'll find it under Google Wallet API → Manage in the Pay & Wallet Console.

25.2 Google Cloud Project

  1. Go to console.cloud.google.com
  2. Create a new project (e.g., "Orion")
  3. Enable the Google Wallet API:
    • Navigate to "APIs & Services" → "Library"
    • Search for "Google Wallet API" and enable it

25.3 Service Account

Create a service account for API access:

  1. Go to "APIs & Services" → "Credentials" → "Create Credentials"
  2. Select Google Wallet API as the API
  3. Select Application data (not user data — your backend calls the API directly)
  4. Name the service account (e.g., wallet-service)
  5. Click "Done"

Download the JSON key:

  1. Go to "IAM & Admin" → "Service Accounts"
  2. Click on the service account you created
  3. Go to the Keys tab → Add Key → Create new key → JSON
  4. Save the downloaded .json file securely
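
Before uploading the key anywhere, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of the Orion codebase: the helper name `check_service_account_key` is hypothetical, and the field list reflects the keys Google normally includes in a service account JSON file.

```python
import json
from pathlib import Path

# Fields a Google service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty list means OK)."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # Covers missing files, permission errors and invalid JSON.
        return [f"cannot read key file: {exc}"]
    if not isinstance(data, dict):
        return ["key file is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A key of another kind (e.g. an OAuth client secret) has a different "type".
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

Calling `check_service_account_key("/path/to/orion-488322-xxxxx.json")` should return an empty list for a valid key. The `client_email` value in the file is the address to invite in the Pay & Wallet Console.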
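
25.4 Add Service Account to the Pay & Wallet Console

Grant the service account access to your issuer account:

  1. Go back to pay.google.com/business/console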
  2. In the left sidebar, click Users (not inside the Wallet API section)
  3. Invite the service account email (e.g., wallet-service@orion-488322.iam.gserviceaccount.com)
  4. Assign Admin role
  5. Verify it appears in the users list

!!! warning "Common mistake"
    The "Users" link is in the left sidebar of the Pay & Wallet Console, not inside the "Google Wallet API" → "Manage" section. The Manage page has "Setup test accounts", which is a different feature.

25.5 Deploy to Server

Upload the service account JSON key to the Hetzner server:

# From your local machine
scp /path/to/orion-488322-xxxxx.json samir@91.99.65.229:~/apps/orion/google-wallet-sa.json

Add the environment variables to the production .env:

ssh samir@91.99.65.229
cd ~/apps/orion
nano .env

Add:

# Google Wallet (Loyalty Module)
LOYALTY_GOOGLE_ISSUER_ID=3388000000023089598
LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON=/app/google-wallet-sa.json

!!! note "Docker path"
    The path must be the file's location inside the Docker container, not on the host. If the file is in ~/apps/orion/ on the host, it maps to /app/ inside the container (check the volumes in your docker-compose.yml).

Mount the JSON file in docker-compose.yml if not already covered by the app volume:

services:
  api:
    volumes:
      - ./google-wallet-sa.json:/app/google-wallet-sa.json:ro

Restart the application:

docker compose --profile full up -d --build

25.6 Platform-Level Configuration

Google Wallet is a platform-wide setting — all merchants on the platform share the same Issuer ID and service account. Merchants don't need to configure anything; wallet integration activates automatically when the env vars are set.

The two required env vars:

# In production .env
LOYALTY_GOOGLE_ISSUER_ID=3388000000023089598
LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON=/app/google-wallet-sa.json

When both are set, every loyalty program on the platform automatically gets Google Wallet support: enrollment creates wallet passes, stamp/points operations sync to passes, and the storefront shows "Add to Google Wallet" buttons.
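
The "both variables set" condition can be checked before restarting the stack. This is a minimal sketch under stated assumptions: `parse_env` and `wallet_config_problems` are hypothetical helpers, not functions from app/core/config.py, and the parser only handles plain KEY=VALUE lines.

```python
REQUIRED_VARS = ("LOYALTY_GOOGLE_ISSUER_ID", "LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON")


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def wallet_config_problems(env):
    """Return a list of problems; an empty list means both vars look usable."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    issuer = env.get("LOYALTY_GOOGLE_ISSUER_ID", "")
    # The Issuer ID is described above as a long numeric string.
    if issuer and not issuer.isdigit():
        problems.append("LOYALTY_GOOGLE_ISSUER_ID should be numeric")
    return problems
```

For example, `wallet_config_problems(parse_env(open(".env").read()))` run from ~/apps/orion lists anything that would keep wallet support from activating.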

25.7 Verify Configuration

Check the API health and wallet service status:

# Check the app logs for wallet service initialization
docker compose --profile full logs api | grep -i "wallet\|loyalty"

# Confirm the API is healthy, then enroll a customer (see 25.8) and check the response for wallet URLs
curl -s https://api.wizard.lu/health | python3 -m json.tool

25.8 Testing Google Wallet Passes

Google provides a demo mode — passes work in test without full production approval:

  1. Console admins and developers (your Google account) can always test passes
  2. For additional testers, add their Gmail addresses in Pay & Wallet Console → Google Wallet API → Manage → Setup test accounts
  3. Use walletobjects.sandbox scope for initial testing (the code uses wallet_object.issuer which covers both)

End-to-end test flow:

  1. Create a loyalty program via the store panel and set the Google Wallet Issuer ID in Settings → Digital Wallet
  2. Enroll a customer (via store or storefront self-enrollment)
    • The system automatically creates a Google Wallet LoyaltyClass (for the program) and LoyaltyObject (for the card)
  3. Open the storefront loyalty dashboard — the "Add to Google Wallet" button appears
  4. Click the button (or open the URL on an Android device) — the pass is added to Google Wallet
  5. Add a stamp or points — the pass in Google Wallet auto-updates (no push needed, Google syncs)

25.9 Local Development Setup

You can test the full Google Wallet integration from your local machine:

# In your local .env
LOYALTY_GOOGLE_ISSUER_ID=3388000000023089598
LOYALTY_GOOGLE_SERVICE_ACCOUNT_JSON=/path/to/orion-488322-xxxxx.json

The GoogleWalletService calls Google's REST API directly over HTTPS — no special network configuration needed. The same service account JSON works on both local and server environments.

Local testing checklist:

  • Service account JSON downloaded and path set in env
  • LOYALTY_GOOGLE_ISSUER_ID set in env
  • Start the app locally: python3 -m uvicorn main:app --reload
  • Enroll a customer → check logs for "Created Google Wallet class" and "Created Google Wallet object"
  • Open storefront dashboard → "Add to Google Wallet" button should appear
  • Open the wallet URL on Android → pass added to Google Wallet
  • Add stamps → check logs for "Updated Google Wallet object", verify pass updates
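
The log checks in the list above can be ticked off in one pass over captured log output. This is a minimal sketch: the three messages are the ones quoted in the checklist, while the helper name `wallet_log_checklist` is hypothetical.

```python
# Log messages quoted in the local testing checklist.
EXPECTED_MESSAGES = (
    "Created Google Wallet class",
    "Created Google Wallet object",
    "Updated Google Wallet object",
)


def wallet_log_checklist(log_text):
    """Map each expected log message to whether it appears in log_text."""
    return {message: message in log_text for message in EXPECTED_MESSAGES}
```

Feed it the uvicorn output saved to a file, or the output of the `docker compose --profile full logs api` command on the server. Any entry that is still `False` points at the step that has not happened yet.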

25.10 How It Works (Architecture)

The integration is fully automatic — no manual API calls needed after initial setup.

┌─────────────┐     ┌──────────────┐     ┌─────────────────────┐
│  Merchant    │────▶│  Orion API   │────▶│  Google Wallet API  │
│  sets issuer │     │              │     │                     │
│  ID in UI    │     │              │     │                     │
└─────────────┘     └──────────────┘     └─────────────────────┘

┌─────────────┐     ┌──────────────┐     ┌─────────────────────┐
│  Customer    │────▶│  Orion API   │────▶│  Google Wallet API  │
│  enrolls     │     │              │     │                     │
│              │     │create_class +│     │ POST /loyaltyClass  │
│              │     │create_object │     │ POST /loyaltyObject │
│              │◀────│ save_url     │     │                     │
│              │     └──────────────┘     └─────────────────────┘
│  taps "Add   │
│  to Wallet"  │────▶ Google Wallet app adds pass automatically
└─────────────┘

┌─────────────┐     ┌──────────────┐     ┌─────────────────────┐
│  Staff adds  │────▶│  Orion API   │────▶│  Google Wallet API  │
│  stamp/pts   │     │              │     │                     │
│              │     │update_object │     │ PATCH /loyaltyObject│
└─────────────┘     └──────────────┘     └─────────────────────┘
                                          Pass auto-updates on
                                          customer's phone
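
To make the PATCH step in the diagram concrete, here is a minimal sketch of what such a request could look like. It is not the GoogleWalletService implementation: `build_points_patch` and the `card_suffix` naming are hypothetical, while the endpoint and the `loyaltyPoints.balance.int` field follow Google's public Wallet REST API, where object IDs take the form `issuerId.identifier`.

```python
WALLET_API_BASE = "https://walletobjects.googleapis.com/walletobjects/v1"


def build_points_patch(issuer_id, card_suffix, balance, label="Points"):
    """Return (url, json_body) for a PATCH that sets a new loyalty balance."""
    # Google Wallet object IDs are "<issuer id>.<identifier chosen by the issuer>".
    object_id = f"{issuer_id}.{card_suffix}"
    url = f"{WALLET_API_BASE}/loyaltyObject/{object_id}"
    # Only the fields present in the body are changed by a PATCH.
    body = {"loyaltyPoints": {"label": label, "balance": {"int": balance}}}
    return url, body
```

The request itself must be sent with an OAuth access token obtained from the service account key; that part is omitted here.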

Automatic triggers:

| Event | Wallet Action | Service Call |
|---|---|---|
| Customer enrolls | Create class (if first) + create object | `wallet_service.create_wallet_objects()` |
| Stamp added/redeemed/voided | Update object with new balance | `wallet_service.sync_card_to_wallets()` |
| Points earned/redeemed/voided/adjusted | Update object with new balance | `wallet_service.sync_card_to_wallets()` |
| Customer opens dashboard | Generate save URL (JWT, 1h expiry) | `wallet_service.get_add_to_wallet_urls()` |

No push notifications needed — Google syncs object changes automatically.
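
The save URL from the last table row wraps a signed JWT. The sketch below shows the claims such a token typically carries. It is an illustration, not the code behind `get_add_to_wallet_urls()`: the function names are hypothetical, the `exp` claim is an assumption derived from the "1h expiry" noted in the table, and signing (RS256 with the service account's private key) is left out because the Python standard library has no RSA signing.

```python
import time


def build_save_claims(service_account_email, object_id, origins=(), now=None, ttl_seconds=3600):
    """Build the (unsigned) claims for a Google Wallet "save" JWT."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": service_account_email,   # client_email from the JSON key
        "aud": "google",
        "typ": "savetowallet",
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,  # assumption: 1 hour, per the table above
        "origins": list(origins),        # sites allowed to embed the save button
        "payload": {"loyaltyObjects": [{"id": object_id}]},
    }


def save_url(signed_jwt):
    """Wrap an already signed JWT in the Google Wallet save link."""
    return f"https://pay.google.com/gp/v/save/{signed_jwt}"
```

Referencing the object by ID alone works because the object was already created through the REST API at enrollment time.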

25.11 Next Steps

After Google Wallet is verified working:

  1. Submit for Google production approval — required before non-test users can add passes
  2. Apple Wallet — separate setup requiring Apple Developer account, APNs certificates, and pass signing certificates (see Loyalty Module docs)

Domain & Port Reference

| Service | Internal Port | External Port | Domain (via Caddy) |
|---|---|---|---|
| Orion API | 8000 | 8001 | api.wizard.lu |
| Main Platform | 8000 | 8001 | wizard.lu |
| OMS Platform | 8000 | 8001 | omsflow.lu |
| Loyalty+ Platform | 8000 | 8001 | rewardflow.lu |
| PostgreSQL | 5432 | 5432 | (internal only) |
| Redis | 6379 | 6380 | (internal only) |
| Flower | 5555 | 5555 | flower.wizard.lu |
| Gitea | 3000 | 3000 | git.wizard.lu |
| Prometheus | 9090 | 9090 (localhost) | (internal only) |
| Grafana | 3000 | 3001 (localhost) | grafana.wizard.lu |
| Node Exporter | 9100 | 9100 (localhost) | (internal only) |
| cAdvisor | 8080 | 8080 (localhost) | (internal only) |
| Redis Exporter | 9121 | 9121 (localhost) | (internal only) |
| Alertmanager | 9093 | 9093 (localhost) | (internal only) |
| Caddy | - | 80, 443 | (reverse proxy) |

!!! note "Single backend, multiple domains"
    All platform domains route to the same FastAPI backend. The PlatformContextMiddleware identifies the platform from the Host header. See Multi-Platform Architecture for details.
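
As an illustration of Host-based routing, the lookup can be as small as a dictionary keyed by domain. This sketch is not the PlatformContextMiddleware itself: the platform codes (`main`, `oms`, `loyalty`) and the function name are hypothetical, and only the domains come from the table above.

```python
# Domain -> platform code. The codes are placeholders for illustration.
PLATFORM_BY_HOST = {
    "wizard.lu": "main",
    "omsflow.lu": "oms",
    "rewardflow.lu": "loyalty",
}


def platform_from_host(host_header):
    """Resolve a platform code from a Host header value, or None if unknown."""
    host = host_header.strip().lower()
    # Drop an optional port ("wizard.lu:443" -> "wizard.lu").
    host = host.split(":", 1)[0]
    return PLATFORM_BY_HOST.get(host)
```

Because Caddy forwards the original Host header to port 8001, one backend can serve every domain and still tell the platforms apart.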

Directory Structure on Server

~/
├── apps/
│   └── orion/                   # Orion application
│       ├── .env                 # Production environment
│       ├── docker-compose.yml   # App stack (API, DB, Redis, Celery, monitoring)
│       ├── monitoring/          # Prometheus + Grafana config
│       ├── logs/                # Application logs
│       ├── uploads/             # User uploads
│       └── exports/             # Export files
├── backups/
│   ├── orion/
│   │   ├── daily/              # 7-day retention
│   │   └── weekly/             # 4-week retention
│   └── gitea/
│       ├── daily/
│       └── weekly/
├── gitea/
│   └── docker-compose.yml       # Gitea + PostgreSQL
└── gitea-runner/                # CI/CD runner (act_runner v0.2.13)
    ├── act_runner               # symlink → act_runner-0.2.13-linux-arm64
    ├── act_runner-0.2.13-linux-arm64
    └── .runner                  # registration config

Troubleshooting

Permission denied on logs

The Docker container runs as appuser (UID 1000). Host-mounted volumes need matching ownership:

sudo chown -R 1000:1000 logs uploads exports
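
To see which paths still have the wrong owner, before or after running chown, a short scan is enough. This is a minimal sketch with a hypothetical helper name; it assumes a POSIX host and the UID 1000 mentioned above.

```python
import os


def paths_not_owned_by(root, uid=1000):
    """List root and everything below it whose owner UID differs from uid."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # lstat inspects symlinks themselves instead of following them.
    return [path for path in candidates if os.lstat(path).st_uid != uid]
```

Running it for each of `logs`, `uploads` and `exports` from ~/apps/orion should return empty lists once ownership is correct.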

Celery workers restarting

Check logs for import errors:

docker compose --profile full logs celery-worker --tail 30

Common cause: stale task module references in app/core/celery_config.py.

SSH service name on Ubuntu 24.04

On Ubuntu 24.04 the systemd service is named ssh, not sshd:

sudo systemctl restart ssh    # correct
sudo systemctl restart sshd   # will fail

git pull fails with local changes

If docker-compose.yml was edited on the server (e.g. passwords), stash before pulling:

git stash
git pull
git stash pop

Maintenance

Deploy updates

Deployments happen automatically when pushing to master (see Step 16). For manual deploys:

cd ~/apps/orion && bash scripts/deploy.sh

The script handles stashing local changes, pulling, rebuilding containers, running migrations, and health checks.

View logs

# Follow all logs in real-time
docker compose --profile full logs -f

# Follow a specific service
docker compose --profile full logs -f api
docker compose --profile full logs -f celery-worker
docker compose --profile full logs -f celery-beat
docker compose --profile full logs -f flower

# View last N lines (useful for debugging crashes)
docker compose --profile full logs --tail=50 api
docker compose --profile full logs --tail=100 celery-worker

# Filter logs for errors
docker compose --profile full logs api | grep -i "error\|exception\|failed"

Check container status

# Overview of all containers (health, uptime, ports)
docker compose --profile full ps

# Watch for containers stuck in "Restarting" — indicates a crash loop
# Healthy containers show: Up Xs (healthy)

Restart services

# Restart a single service
docker compose --profile full restart api

# Restart everything
docker compose --profile full restart

# Full rebuild (after code changes)
docker compose --profile full up -d --build

Quick access URLs

After Caddy is configured:

| Service | URL |
|---|---|
| Main Platform | https://wizard.lu |
| API Swagger docs | https://api.wizard.lu/docs |
| API ReDoc | https://api.wizard.lu/redoc |
| Admin panel | https://wizard.lu/admin/login |
| Health check | https://api.wizard.lu/health |
| Prometheus metrics | https://api.wizard.lu/metrics |
| Gitea | https://git.wizard.lu |
| Flower | https://flower.wizard.lu |
| Grafana | https://grafana.wizard.lu |
| OMS Platform | https://omsflow.lu |
| Loyalty+ Platform | https://rewardflow.lu |

Direct IP access (temporary, until firewall rules are removed):

| Service | URL |
|---|---|
| API | http://91.99.65.229:8001/docs |
| Gitea | http://91.99.65.229:3000 |
| Flower | http://91.99.65.229:5555 |