Hetzner Cloud Server Setup
Complete step-by-step guide for deploying Orion on a Hetzner Cloud VPS.
!!! info "Server Details"
- Provider: Hetzner Cloud
- OS: Ubuntu 24.04.3 LTS (upgraded to 24.04.4 after updates)
- Architecture: aarch64 (ARM64)
- IP: 91.99.65.229
- IPv6: 2a01:4f8:1c1a:b39c::1
- Disk: 37 GB
- RAM: 4 GB
- Auth: SSH key (configured via Hetzner Console)
- Setup date: 2026-02-11
!!! success "Progress — 2026-02-12" Completed (Steps 1–16):
- Non-root user `samir` with SSH key
- Server hardened (UFW firewall, SSH root login disabled, fail2ban)
- Docker 29.2.1 & Docker Compose 5.0.2 installed
- Gitea running at `https://git.wizard.lu` (user: `sboulahtit`, repo: `orion`)
- Repository cloned to `~/apps/orion`
- Production `.env` configured with generated secrets
- Full Docker stack deployed (API, PostgreSQL, Redis, Celery worker/beat, Flower)
- Database migrated (76 tables) and seeded (admin, platforms, CMS, email templates)
- API verified at `https://api.wizard.lu/health`
- DNS A records configured and propagated for `wizard.lu` and subdomains
- Caddy 2.10.2 reverse proxy with auto-SSL (Let's Encrypt)
- Temporary firewall rules removed (ports 3000, 8001)
- Gitea Actions runner v0.2.13 registered and running as systemd service
- SSH key added to Gitea for local push via SSH
- Git remote updated: `ssh://git@git.wizard.lu:2222/sboulahtit/orion.git`
- ProxyHeadersMiddleware added for correct HTTPS behind Caddy
- Fixed TierLimitExceededException import and Pydantic @field_validator bugs
- `wizard.lu` serving frontend with CSS over HTTPS (mixed content fixed)
- `/merchants` and `/admin` redirect fix (CMS catch-all was intercepting)
!!! success "Progress — 2026-02-13" Completed:
- CI fully green: ruff (lint), pytest, architecture, docs all passing
- Pinned ruff==0.8.4 in requirements-dev.txt (CI/local version mismatch was root cause of recurring I001 errors)
- Pre-commit hooks configured and installed (ruff auto-fix, architecture validation, trailing whitespace, end-of-file)
- AAAA (IPv6) records added for all wizard.lu domains
- mkdocs build clean (zero warnings) — all 32 orphan pages added to nav
- Pre-commit documented in `docs/development/code-quality.md`
- **Step 16: Continuous deployment** — auto-deploy on push to master via `scripts/deploy.sh` + Gitea Actions
**Next steps:**
- [x] Step 17: Backups
- [x] Step 18: Monitoring & observability
**Deferred (not urgent, do when all platforms ready):**
- [x] ~~DNS A + AAAA records for platform domains (`omsflow.lu`, `rewardflow.lu`)~~
- [x] ~~Uncomment platform domains in Caddyfile after DNS propagation~~
!!! success "Progress — 2026-02-14" Completed:
- **Wizamart → Orion rename** — 1,086 occurrences replaced across 184 files (database identifiers, email addresses, domains, config, templates, docs, seed data)
- Template renamed: `homepage-wizamart.html` → `homepage-orion.html`
- **Production DB rebuilt from scratch** with Orion naming (`orion_db`, `orion_user`)
- Platform domains configured in seed data: wizard.lu (main), omsflow.lu, rewardflow.lu (loyalty)
- Docker volume explicitly named `orion_postgres_data`
- `.dockerignore` added — prevents `.env` from being baked into Docker images
- `env_file: .env` added to `docker-compose.yml` — containers load host env vars properly
- `CapacitySnapshot` model import fixed (moved from billing to monitoring in `alembic/env.py`)
- All services verified healthy at `https://api.wizard.lu/health`
- **Step 17: Backups** — automated pg_dump scripts (daily + weekly rotation), R2 offsite upload, restore helper
- **Step 18: Monitoring** — Prometheus, Grafana, node-exporter, cAdvisor added to docker-compose; `/metrics` endpoint activated via `prometheus_client`
!!! success "Progress — 2026-02-15" Completed:
- **Step 17 server-side**: Hetzner backups enabled (5 of 7 daily images, last 6.22 GB)
- **Step 18 server-side**: Full monitoring stack deployed — Prometheus (4/4 targets up), Grafana at `https://grafana.wizard.lu` with Node Exporter Full (#1860) and Docker/cAdvisor (#193) dashboards
- **Domain rename**: `oms.lu` → `omsflow.lu`, `loyalty.lu` → `rewardflow.lu` across entire codebase (19 + 13 files)
- **Platform domains live**: all three platforms serving HTTPS via Caddy with auto-SSL
- `https://wizard.lu` (main)
- `https://omsflow.lu` (OMS)
- `https://rewardflow.lu` (Loyalty+)
- Platform `domain` column updated in production DB
- RAM usage ~2.4 GB on 4 GB server (stable, CI jobs add ~550 MB temporarily)
- **Systemd backup timer** (`orion-backup.timer`) — daily at 03:00 UTC, tested manually
- **Cloudflare R2 offsite backups** — `orion-backups` bucket, `awscli` configured with `--profile r2`, `--upload` flag added to systemd timer
- `python3-pip` and `awscli` installed on server (pip user install, PATH added to `.bashrc` and systemd service)
**Steps 1–18 fully complete.** All infrastructure operational.
Installed Software Versions
| Software | Version |
|---|---|
| Ubuntu | 24.04.4 LTS |
| Kernel | 6.8.0-100-generic (aarch64) |
| Docker | 29.2.1 |
| Docker Compose | 5.0.2 |
| PostgreSQL | 15 (container) |
| Redis | 7-alpine (container) |
| Python | 3.11-slim (container) |
| Gitea | latest (container) |
| Caddy | 2.10.2 |
| act_runner | 0.2.13 |
Step 1: Initial Server Access
ssh root@91.99.65.229
Step 2: Create Non-Root User
Create a dedicated user with sudo privileges and copy the SSH key:
# Create user
adduser samir
usermod -aG sudo samir
# Copy SSH keys to new user
rsync --archive --chown=samir:samir ~/.ssh /home/samir
Verify by connecting as the new user (from a new terminal):
ssh samir@91.99.65.229
Step 3: System Update & Essential Packages
sudo apt update && sudo apt upgrade -y
sudo apt install -y \
curl \
git \
wget \
ufw \
fail2ban \
htop \
unzip \
make
Reboot if a kernel upgrade is pending:
sudo reboot
Step 4: Firewall Configuration (UFW)
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
Verify:
sudo ufw status
Expected output:
Status: active
To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere
Step 5: Harden SSH
!!! warning "Before doing this step"
Make sure you can SSH as samir from another terminal first!
If you lock yourself out, you'll need to use Hetzner's console rescue mode.
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh # Note: Ubuntu 24.04 uses 'ssh' not 'sshd'
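On Ubuntu cloud images, a drop-in file under `/etc/ssh/sshd_config.d/` may override values edited in the main `sshd_config`, so it is worth checking the effective configuration rather than the file. A minimal sketch, assuming the output of `sshd -T` is piped in; `check_sshd_hardening` is a hypothetical helper name, not part of the Orion repository:

```bash
# Reads `sshd -T` output on stdin and checks the two hardened settings.
# check_sshd_hardening is a hypothetical helper, not part of the Orion repo.
check_sshd_hardening() {
  local cfg want status=0
  cfg=$(cat)
  for want in "permitrootlogin no" "passwordauthentication no"; do
    # -x matches whole lines only, -i ignores case
    if printf '%s\n' "$cfg" | grep -qix "$want"; then
      echo "OK   $want"
    else
      echo "FAIL $want"
      status=1
    fi
  done
  return "$status"
}

# Usage on the server (sshd -T prints the effective configuration):
#   sudo sshd -T | check_sshd_hardening
```

If either line reports `FAIL`, look for an overriding file with `grep -ri passwordauthentication /etc/ssh/sshd_config.d/`.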
Step 6: Install Docker & Docker Compose
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker samir
Log out and back in for the group change:
exit
# Then: ssh samir@91.99.65.229
Verify:
docker --version
docker compose version
Step 7: Gitea (Self-Hosted Git)
Create the Gitea directory and compose file:
mkdir -p ~/gitea && cd ~/gitea
Create docker-compose.yml with nano ~/gitea/docker-compose.yml:
services:
  gitea:
    image: gitea/gitea:latest
    container_name: gitea
    restart: always
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - GITEA__database__DB_TYPE=postgres
      - GITEA__database__HOST=gitea-db:5432
      - GITEA__database__NAME=gitea
      - GITEA__database__USER=gitea
      - GITEA__database__PASSWD=<GENERATED_PASSWORD>
      - GITEA__server__ROOT_URL=http://91.99.65.229:3000/
      - GITEA__server__SSH_DOMAIN=91.99.65.229
      - GITEA__server__DOMAIN=91.99.65.229
      - GITEA__actions__ENABLED=true
    volumes:
      - gitea-data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "2222:22"
    depends_on:
      gitea-db:
        condition: service_healthy
  gitea-db:
    image: postgres:15
    container_name: gitea-db
    restart: always
    environment:
      POSTGRES_DB: gitea
      POSTGRES_USER: gitea
      POSTGRES_PASSWORD: <GENERATED_PASSWORD>
    volumes:
      - gitea-db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U gitea"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  gitea-data:
  gitea-db-data:
Generate the database password with openssl rand -hex 16 and replace <GENERATED_PASSWORD> in both places.
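The manual replacement can also be scripted so that both placeholders receive the same value. This is a sketch under the assumption that GNU `sed` is available (as on Ubuntu); `fill_generated_password` is a hypothetical helper name:

```bash
# Generates one 32-character hex secret and writes it into every
# <GENERATED_PASSWORD> placeholder of the given file.
# fill_generated_password is a hypothetical helper name.
fill_generated_password() {
  local file=$1 secret
  secret=$(openssl rand -hex 16) || return 1
  # Hex output contains no characters that are special to sed.
  sed -i "s/<GENERATED_PASSWORD>/${secret}/g" "$file"
}

# Usage:
#   fill_generated_password ~/gitea/docker-compose.yml
#   grep -c '<GENERATED_PASSWORD>' ~/gitea/docker-compose.yml   # prints 0 when done
```

Using a single generated value matters here because the Gitea container and the `gitea-db` container must agree on the password.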
Open the firewall for Gitea and start:
sudo ufw allow 3000/tcp
docker compose up -d
docker compose ps
Visit http://91.99.65.229:3000 and complete the setup wizard. Create an admin account (e.g. sboulahtit).
Then create a repository (e.g. orion).
Step 8: Push Repository to Gitea
Add SSH Key to Gitea
Before pushing via SSH, add your local machine's public key to Gitea:
- Copy your public key:
cat ~/.ssh/id_ed25519.pub
# Or if using RSA: cat ~/.ssh/id_rsa.pub
- In the Gitea web UI: click your avatar → Settings → SSH / GPG Keys → Add Key → paste the key.
- Add the Gitea SSH host to known hosts:
ssh-keyscan -p 2222 git.wizard.lu >> ~/.ssh/known_hosts
Add Remote and Push
From your local machine:
cd /home/samir/Documents/PycharmProjects/letzshop-product-import
git remote add gitea ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
git push gitea master
!!! note "Remote URL updated"
The remote was initially set to http://91.99.65.229:3000/... during setup.
After Caddy was configured, it was updated to use the domain with SSH:
ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
To update an existing remote:
```bash
git remote set-url gitea ssh://git@git.wizard.lu:2222/sboulahtit/orion.git
```
Step 9: Clone Repository on Server
mkdir -p ~/apps
cd ~/apps
git clone http://localhost:3000/sboulahtit/orion.git
cd orion
Step 10: Configure Production Environment
cp .env.example .env
nano .env
Critical Production Values
Generate secrets:
openssl rand -hex 32 # For JWT_SECRET_KEY
openssl rand -hex 16 # For database password
| Variable | How to Generate / What to Set |
|---|---|
| `DEBUG` | `False` |
| `DATABASE_URL` | `postgresql://orion_user:YOUR_DB_PW@db:5432/orion_db` |
| `JWT_SECRET_KEY` | Output of `openssl rand -hex 32` |
| `ADMIN_PASSWORD` | Strong password |
| `USE_CELERY` | `true` |
| `REDIS_URL` | `redis://redis:6379/0` |
| `STRIPE_SECRET_KEY` | Your Stripe secret key (configure later) |
| `STRIPE_PUBLISHABLE_KEY` | Your Stripe publishable key (configure later) |
| `STRIPE_WEBHOOK_SECRET` | Your Stripe webhook secret (configure later) |
| `STORAGE_BACKEND` | `r2` (if using Cloudflare R2, configure later) |
Also update the PostgreSQL password in docker-compose.yml (lines 9 and 40) to match.
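Before starting the stack, a quick check can catch values that were left empty or still contain the `YOUR_DB_PW` placeholder. The sketch below is illustrative only: `check_env` is a hypothetical helper, and the list of required keys is an assumption drawn from the table above rather than from the real `.env.example`.

```bash
# Reports required variables that are missing, empty, or still placeholders.
# check_env and its key list are illustrative; adjust to the real .env.example.
check_env() {
  local file=$1 key value status=0
  for key in DATABASE_URL JWT_SECRET_KEY ADMIN_PASSWORD REDIS_URL; do
    # Take the last assignment of the key, if any, and strip the "KEY=" prefix.
    value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
    case "$value" in
      ''|*YOUR_DB_PW*) echo "unset or placeholder: $key"; status=1 ;;
    esac
  done
  if ! grep -qx 'DEBUG=False' "$file"; then
    echo "DEBUG is not False"
    status=1
  fi
  return "$status"
}

# Usage:
#   check_env ~/apps/orion/.env && echo ".env looks complete"
```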
Step 11: Deploy with Docker Compose
cd ~/apps/orion
# Create directories with correct permissions for the container user
mkdir -p logs uploads exports
sudo chown -R 1000:1000 logs uploads exports
# Start infrastructure first
docker compose up -d db redis
# Wait for health checks to pass
docker compose ps
# Build and start the full stack
docker compose --profile full up -d --build
Verify all services are running:
docker compose --profile full ps
Expected: api (healthy), db (healthy), redis (healthy), celery-worker (healthy), celery-beat (running), flower (running).
Step 12: Initialize Database
!!! note "PYTHONPATH required"
The seed scripts need PYTHONPATH=/app set explicitly when running inside the container.
# Run migrations (use 'heads' for multi-branch Alembic)
docker compose --profile full exec -e PYTHONPATH=/app api python -m alembic upgrade heads
# Seed production data
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/init_production.py
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/init_log_settings.py
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/create_default_content_pages.py
docker compose --profile full exec -e PYTHONPATH=/app api python scripts/seed/seed_email_templates.py
Seeded Data Summary
| Data | Count |
|---|---|
| Admin users | 1 (admin@wizard.lu) |
| Platforms | 3 (OMS, Main, Loyalty+) |
| Admin settings | 15 |
| Subscription tiers | 4 (Essential, Professional, Business, Enterprise) |
| Log settings | 6 |
| CMS pages | 8 (About, Contact, FAQ, Shipping, Returns, Privacy, Terms, Homepage) |
| Email templates | 17 (4 languages: en, fr, de, lb) |
Step 13: DNS Configuration
Before setting up Caddy, point your domain's DNS to the server.
wizard.lu (Main Platform) — Completed
| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | 91.99.65.229 | 300 |
| A | www | 91.99.65.229 | 300 |
| A | api | 91.99.65.229 | 300 |
| A | git | 91.99.65.229 | 300 |
| A | flower | 91.99.65.229 | 300 |
omsflow.lu (OMS Platform) — Completed
| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | 91.99.65.229 | 300 |
| A | www | 91.99.65.229 | 300 |
| AAAA | @ | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | www | 2a01:4f8:1c1a:b39c::1 | 300 |
rewardflow.lu (Loyalty+ Platform) — Completed
| Type | Name | Value | TTL |
|---|---|---|---|
| A | @ | 91.99.65.229 | 300 |
| A | www | 91.99.65.229 | 300 |
| AAAA | @ | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | www | 2a01:4f8:1c1a:b39c::1 | 300 |
IPv6 (AAAA) Records — TODO
Optional but recommended. Add AAAA records for all domains above, pointing to the server's IPv6 address. Verify your IPv6 address first:
ip -6 addr show eth0 | grep 'scope global'
It should match the value in the Hetzner Cloud Console (Networking tab). Then create AAAA records mirroring each A record above, e.g.:
| Type | Name (wizard.lu) | Value | TTL |
|---|---|---|---|
| AAAA | @ | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | www | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | api | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | git | 2a01:4f8:1c1a:b39c::1 | 300 |
| AAAA | flower | 2a01:4f8:1c1a:b39c::1 | 300 |
Repeat for omsflow.lu and rewardflow.lu.
!!! tip "DNS propagation"
Set TTL to 300 (5 minutes) initially. DNS changes can take up to 24 hours to propagate globally, but usually complete within 30 minutes. Verify with: dig api.wizard.lu +short
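To check several names in one pass instead of running `dig` once per subdomain, something like the following can be used. It is a sketch: `check_dns` and `resolve` are hypothetical helper names, and only A records are compared.

```bash
EXPECTED_IP="91.99.65.229"

# Thin wrapper so the lookup can be swapped out (for example in tests).
resolve() { dig +short "$1" A; }

# Prints one line per name and returns non-zero if any name does not
# resolve to EXPECTED_IP. check_dns is a hypothetical helper.
check_dns() {
  local name status=0
  for name in "$@"; do
    if resolve "$name" | grep -qxF "$EXPECTED_IP"; then
      echo "ok      $name"
    else
      echo "pending $name"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_dns wizard.lu www.wizard.lu api.wizard.lu git.wizard.lu flower.wizard.lu
```

A `pending` line only means the local resolver has not seen the record yet; it does not distinguish a missing record from one that is still propagating.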
Step 14: Reverse Proxy with Caddy
Install Caddy:
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' \
| sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \
| sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy
Caddyfile Configuration
Edit /etc/caddy/Caddyfile:
# ─── Platform 1: Main (wizard.lu) ───────────────────────────
wizard.lu {
    reverse_proxy localhost:8001
}
www.wizard.lu {
    redir https://wizard.lu{uri} permanent
}
# ─── Platform 2: OMS (omsflow.lu) ───────────────────────────────
omsflow.lu {
    reverse_proxy localhost:8001
}
www.omsflow.lu {
    redir https://omsflow.lu{uri} permanent
}
# ─── Platform 3: Loyalty+ (rewardflow.lu) ──────────────────
rewardflow.lu {
    reverse_proxy localhost:8001
}
www.rewardflow.lu {
    redir https://rewardflow.lu{uri} permanent
}
# ─── Services ───────────────────────────────────────────────
api.wizard.lu {
    reverse_proxy localhost:8001
}
git.wizard.lu {
    reverse_proxy localhost:3000
}
flower.wizard.lu {
    reverse_proxy localhost:5555
}
!!! info "How multi-platform routing works"
All platform domains (wizard.lu, omsflow.lu, rewardflow.lu) point to the same FastAPI backend on port 8001. The PlatformContextMiddleware reads the Host header to detect which platform the request is for. Caddy preserves the Host header by default, so no extra configuration is needed.
The `domain` column in the `platforms` database table must match:
| Platform | code | domain |
|---|---|---|
| Main | `main` | `wizard.lu` |
| OMS | `oms` | `omsflow.lu` |
| Loyalty+ | `loyalty` | `rewardflow.lu` |
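The lookup described above can be pictured as a simple mapping from the `Host` header to a platform code. The sketch below only mirrors the table; `platform_for_host` is a hypothetical illustration and not the actual `PlatformContextMiddleware`, whose handling of hosts such as `api.wizard.lu` is not covered here.

```bash
# Maps a Host header value to a platform code, mirroring the table above.
# platform_for_host is a hypothetical illustration, not the real middleware.
platform_for_host() {
  local host=${1%%:*}   # drop an optional :port suffix
  case "$host" in
    wizard.lu)     echo main ;;
    omsflow.lu)    echo oms ;;
    rewardflow.lu) echo loyalty ;;
    *)             return 1 ;;   # unknown host: no platform matched
  esac
}

# Usage:
#   platform_for_host omsflow.lu
```

The point of the sketch is that a mismatch between the `domain` column and the name Caddy forwards leaves the request without a platform.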
Start Caddy:
sudo systemctl restart caddy
Caddy automatically provisions Let's Encrypt SSL certificates for all configured domains.
Verify:
curl -I https://wizard.lu
curl -I https://api.wizard.lu/health
curl -I https://git.wizard.lu
After Caddy is working, remove the temporary firewall rules:
sudo ufw delete allow 3000/tcp
sudo ufw delete allow 8001/tcp
Update Gitea's configuration to use its new domain. In ~/gitea/docker-compose.yml, change:
- GITEA__server__ROOT_URL=https://git.wizard.lu/
- GITEA__server__SSH_DOMAIN=git.wizard.lu
- GITEA__server__DOMAIN=git.wizard.lu
Then restart Gitea:
cd ~/gitea && docker compose up -d gitea
Future: Multi-Tenant Store Routing
Stores on each platform use two routing modes:
- Standard (subdomain): `acme.omsflow.lu`, included in the base subscription
- Premium (custom domain): `acme.lu`, available with premium subscription tiers
Both modes are handled by the StoreContextMiddleware which reads the Host header, so Caddy just needs to forward requests and preserve the header.
Wildcard Subdomains (for store subdomains)
When stores start using subdomains like acme.omsflow.lu, add wildcard blocks:
*.omsflow.lu {
    reverse_proxy localhost:8001
}
*.rewardflow.lu {
    reverse_proxy localhost:8001
}
*.wizard.lu {
    reverse_proxy localhost:8001
}
!!! warning "Wildcard SSL requires DNS challenge"
Let's Encrypt cannot issue wildcard certificates via HTTP challenge. Wildcard certs require a DNS challenge, which means installing a Caddy DNS provider plugin (e.g. caddy-dns/cloudflare) and configuring API credentials for your DNS provider. See Caddy DNS challenge docs.
Custom Store Domains (for premium stores)
When premium stores bring their own domains (e.g. acme.lu), use Caddy's on-demand TLS:
https:// {
    tls {
        on_demand
    }
    reverse_proxy localhost:8001
}
On-demand TLS auto-provisions SSL certificates when a new domain connects. Add an ask endpoint to validate that the domain is registered in the store_domains table, preventing abuse:
tls {
    on_demand
    ask http://localhost:8001/api/v1/internal/verify-domain
}
!!! note "Not needed yet"
Wildcard subdomains and custom domains are future work. The current Caddyfile handles all platform root domains and service subdomains.
Step 15: Gitea Actions Runner
!!! warning "ARM64 architecture"
This server is ARM64. Download the arm64 binary, not amd64.
Download and install:
mkdir -p ~/gitea-runner && cd ~/gitea-runner
# Download act_runner v0.2.13 (ARM64)
wget https://gitea.com/gitea/act_runner/releases/download/v0.2.13/act_runner-0.2.13-linux-arm64
chmod +x act_runner-0.2.13-linux-arm64
ln -s act_runner-0.2.13-linux-arm64 act_runner
Register the runner (get token from Site Administration > Actions > Runners > Create new Runner):
./act_runner register \
--instance https://git.wizard.lu \
--token YOUR_RUNNER_TOKEN
Accept the default runner name and labels when prompted.
Create a systemd service for persistent operation:
sudo nano /etc/systemd/system/gitea-runner.service
[Unit]
Description=Gitea Actions Runner
After=network.target
[Service]
Type=simple
User=samir
WorkingDirectory=/home/samir/gitea-runner
ExecStart=/home/samir/gitea-runner/act_runner daemon
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable --now gitea-runner
sudo systemctl status gitea-runner
Verify the runner shows as Online in Gitea: Site Administration > Actions > Runners.
Step 16: Continuous Deployment
Automate deployment on every successful push to master. The Gitea Actions runner and the app both run on the same server, so the deploy job SSHes from the CI Docker container to 172.17.0.1 (Docker bridge gateway — see note in 16.2).
push to master
  ├── ruff ──────┐
  ├── pytest ────┤
  └── validate ──┤
                 └── deploy (SSH → scripts/deploy.sh)
                      ├── git stash / pull / pop
                      ├── docker compose up -d --build
                      ├── alembic upgrade heads
                      └── health check (retries)
16.1 Generate Deploy SSH Key (on server)
ssh-keygen -t ed25519 -C "gitea-deploy@wizard.lu" -f ~/.ssh/deploy_ed25519 -N ""
cat ~/.ssh/deploy_ed25519.pub >> ~/.ssh/authorized_keys
16.2 Add Gitea Secrets
In Repository Settings > Actions > Secrets, add:
| Secret | Value |
|---|---|
| `DEPLOY_SSH_KEY` | Contents of `~/.ssh/deploy_ed25519` (private key) |
| `DEPLOY_HOST` | `172.17.0.1` (Docker bridge gateway, not `127.0.0.1`) |
| `DEPLOY_USER` | `samir` |
| `DEPLOY_PATH` | `/home/samir/apps/orion` |
!!! important "Why 172.17.0.1 and not 127.0.0.1?"
CI jobs run inside Docker containers where 127.0.0.1 is the container, not the host. 172.17.0.1 is the Docker bridge gateway that routes to the host. Ensure the firewall allows SSH from the Docker bridge network: sudo ufw allow from 172.17.0.0/16 to any port 22. When Gitea and Orion are on separate servers, replace with the Orion server's IP.
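Whether `172.17.0.1` is the right value for a given runner can be confirmed from inside a job container, where the default route normally points at the bridge gateway. This is a sketch that assumes `ip route` is available in the job image; `default_gateway` is a hypothetical helper name, and job containers attached to a custom Docker network may report a different address.

```bash
# Extracts the default gateway from `ip route` output read on stdin.
# default_gateway is a hypothetical helper name.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage inside a CI job container:
#   ip route | default_gateway
```

If the printed address differs from `172.17.0.1`, both the `DEPLOY_HOST` secret and the UFW rule for port 22 would need to match that network instead.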
16.3 Deploy Script
The deploy script lives at scripts/deploy.sh in the repository. It:
- Stashes local changes (preserves `.env`)
- Pulls latest code (`--ff-only`)
- Pops stash to restore local changes
- Rebuilds and restarts Docker containers (`docker compose --profile full up -d --build`)
- Runs database migrations (`alembic upgrade heads`)
- Health checks `http://localhost:8001/health` with 12 retries (60s total)
Exit codes: 0 success, 1 git pull failed, 2 docker compose failed, 3 migration failed, 4 health check failed.
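The health-check step with retries could look roughly like the following. This is a sketch and not the contents of `scripts/deploy.sh`: `probe` and `wait_healthy` are hypothetical names, and the 5-second interval is an assumption chosen so that 12 attempts add up to the documented 60 seconds.

```bash
# One probe attempt; curl -f fails on HTTP errors (status 400 or above).
probe() { curl -fsS -o /dev/null --max-time 5 "$1"; }

# Retries probe up to $2 times, sleeping $3 seconds after each failure.
wait_healthy() {
  local url=$1 retries=${2:-12} delay=${3:-5} attempt
  for attempt in $(seq 1 "$retries"); do
    if probe "$url"; then
      echo "healthy after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "health check failed after $retries attempts" >&2
  return 4   # matches the documented "health check failed" exit code
}

# Usage:
#   wait_healthy http://localhost:8001/health 12 5 || exit 4
```

Distinct exit codes per stage let the CI log show which step failed without reading the full output.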
16.4 CI Workflow
The deploy job in .gitea/workflows/ci.yml runs only on master push, after ruff, pytest, and validate pass:
deploy:
  runs-on: ubuntu-latest
  if: github.event_name == 'push' && github.ref == 'refs/heads/master'
  needs: [ruff, pytest, validate]
  steps:
    - name: Deploy to production
      uses: appleboy/ssh-action@v1
      with:
        host: ${{ secrets.DEPLOY_HOST }}
        username: ${{ secrets.DEPLOY_USER }}
        key: ${{ secrets.DEPLOY_SSH_KEY }}
        port: 22
        command_timeout: 10m
        script: cd ${{ secrets.DEPLOY_PATH }} && bash scripts/deploy.sh
16.5 Manual Fallback
If CI is down, deploy manually:
cd ~/apps/orion && bash scripts/deploy.sh
16.6 Verify
# All app containers running
cd ~/apps/orion && docker compose --profile full ps
# API health (via Caddy with SSL)
curl https://api.wizard.lu/health
# Main platform
curl -I https://wizard.lu
# Gitea
curl -I https://git.wizard.lu
# Flower
curl -I https://flower.wizard.lu
# Gitea runner status
sudo systemctl status gitea-runner
Step 17: Backups
Three layers of backup protection: Hetzner server snapshots, automated PostgreSQL dumps with local rotation, and offsite sync to Cloudflare R2.
17.1 Enable Hetzner Server Backups
In the Hetzner Cloud Console:
- Go to Servers > select your server > Backups
- Click Enable backups (~20% of server cost, ~1.20 EUR/mo for CAX11)
- Hetzner takes automatic daily backups and retains the 7 most recent
This covers full-disk recovery (OS, Docker volumes, config files) but is coarse-grained. Database-level backups (below) give finer restore granularity.
17.2 Cloudflare R2 Setup (Offsite Backup Storage)
R2 provides S3-compatible object storage with a generous free tier (10 GB storage, 10 million reads/month).
Create Cloudflare account and R2 bucket:
1. Sign up at cloudflare.com (free account)
2. Go to R2 Object Storage > Create bucket
3. Name: `orion-backups`, region: automatic
4. Go to R2 > Manage R2 API Tokens > Create API token
    - Permissions: Object Read & Write
    - Specify bucket: `orion-backups`
5. Note the Account ID, Access Key ID, and Secret Access Key
Install and configure AWS CLI on the server:
# awscli is not available via apt on Ubuntu 24.04; install via pip
sudo apt install -y python3-pip
pip3 install awscli --break-system-packages
# Add ~/.local/bin to PATH (pip installs binaries there)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
aws configure --profile r2
# Access Key ID: <from step 5>
# Secret Access Key: <from step 5>
# Default region name: auto
# Default output format: json
Test connectivity:
aws s3 ls --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com --profile r2
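Because every R2 command needs the same `--endpoint-url` and `--profile` arguments, a small wrapper can reduce typing mistakes. This is a sketch: `r2`, `r2_endpoint` and the `R2_ACCOUNT_ID` variable are hypothetical names introduced here, not something the backup script is known to use.

```bash
# R2_ACCOUNT_ID is a hypothetical variable holding the Cloudflare account ID.
r2_endpoint() {
  printf 'https://%s.r2.cloudflarestorage.com\n' "${R2_ACCOUNT_ID:?set R2_ACCOUNT_ID first}"
}

# Runs any aws subcommand against R2 using the r2 profile.
r2() {
  local endpoint
  endpoint=$(r2_endpoint) || return 1
  aws --profile r2 --endpoint-url "$endpoint" "$@"
}

# Usage:
#   export R2_ACCOUNT_ID=<ACCOUNT_ID>
#   r2 s3 ls
#   r2 s3 ls s3://orion-backups/ --recursive
```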
Add the R2 backup bucket name to your production .env:
R2_BACKUP_BUCKET=orion-backups
17.3 Backup Script
The backup script at scripts/backup.sh handles:
- `pg_dump` of Orion DB (via `docker exec orion-db-1`)
- `pg_dump` of Gitea DB (via `docker exec gitea-db`)
- On Sundays: copies daily backup to `weekly/` subdirectory
- Rotation: keeps 7 daily, 4 weekly backups
- Optional `--upload` flag: syncs to Cloudflare R2
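The rotation rule (keep the newest 7 daily and 4 weekly dumps) can be sketched as below. This is not the code in `scripts/backup.sh`; `rotate_backups` is a hypothetical name, and the sketch assumes dump names carry a sortable timestamp, as in `orion_20260214_030000.sql.gz`.

```bash
# Keeps the newest $2 dumps in directory $1 and deletes the rest.
# Relies on names such as orion_20260214_030000.sql.gz sorting by date.
rotate_backups() {
  local dir=$1 keep=$2
  find "$dir" -maxdepth 1 -type f -name '*.sql.gz' \
    | sort -r \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r old; do
        rm -f -- "$old"
      done
}

# Usage:
#   rotate_backups ~/backups/orion/daily 7
#   rotate_backups ~/backups/orion/weekly 4
```

Sorting by file name rather than modification time keeps the result stable if files are copied or restored, which resets their timestamps.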
# Create backup directories
mkdir -p ~/backups/{orion,gitea}/{daily,weekly}
# Run a manual backup
bash ~/apps/orion/scripts/backup.sh
# Run with R2 upload
bash ~/apps/orion/scripts/backup.sh --upload
# Verify backup integrity
ls -lh ~/backups/orion/daily/
gunzip -t ~/backups/orion/daily/*.sql.gz
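For a per-file report, and to treat an empty directory as a failure, the integrity check can be wrapped in a loop. This is a sketch with a hypothetical helper name, `verify_backups`; note that a valid gzip stream shows the archive is intact, not that the dump inside is complete.

```bash
# Tests every .sql.gz in the directory and returns non-zero if any is
# corrupt or if the directory holds no dumps at all.
verify_backups() {
  local dir=$1 file status=0 found=0
  for file in "$dir"/*.sql.gz; do
    [ -e "$file" ] || continue   # glob did not match anything
    found=1
    if gzip -t "$file" 2>/dev/null; then
      echo "ok      $file"
    else
      echo "corrupt $file"
      status=1
    fi
  done
  [ "$found" -eq 1 ] || { echo "no dumps in $dir"; return 1; }
  return "$status"
}

# Usage:
#   verify_backups ~/backups/orion/daily
```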
17.4 Systemd Timer (Daily at 03:00)
Create the service unit:
sudo nano /etc/systemd/system/orion-backup.service
[Unit]
Description=Orion database backup
After=docker.service
[Service]
Type=oneshot
User=samir
Environment="PATH=/home/samir/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/bin/bash /home/samir/apps/orion/scripts/backup.sh --upload
StandardOutput=journal
StandardError=journal
Create the timer:
sudo nano /etc/systemd/system/orion-backup.timer
[Unit]
Description=Run Orion backup daily at 03:00
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
[Install]
WantedBy=timers.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable --now orion-backup.timer
# Verify timer is active
systemctl list-timers orion-backup.timer
# Test manually
sudo systemctl start orion-backup.service
journalctl -u orion-backup.service --no-pager
17.5 Restore Procedure
The restore script at scripts/restore.sh handles the full restore cycle:
# Restore Orion database
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/orion_20260214_030000.sql.gz
# Restore Gitea database
bash ~/apps/orion/scripts/restore.sh gitea ~/backups/gitea/daily/gitea_20260214_030000.sql.gz
The script will:
- Stop app containers (keep DB running)
- Drop and recreate the database
- Restore from the `.sql.gz` backup
- Run Alembic migrations (Orion only)
- Restart all containers
To restore from R2 (if local backups are lost):
# Download from R2
aws s3 sync s3://orion-backups/ ~/backups/ \
--endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com \
--profile r2
# Then restore as usual
bash ~/apps/orion/scripts/restore.sh orion ~/backups/orion/daily/<latest>.sql.gz
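To fill in the latest dump automatically, the newest file can be picked by name. A minimal sketch, assuming file names embed a sortable timestamp as in the examples above; `latest_backup` is a hypothetical helper name:

```bash
# Prints the newest dump in a directory, relying on the timestamp in
# names such as orion_20260214_030000.sql.gz. latest_backup is hypothetical.
latest_backup() {
  find "$1" -maxdepth 1 -type f -name '*.sql.gz' | sort | tail -n 1
}

# Usage:
#   bash ~/apps/orion/scripts/restore.sh orion "$(latest_backup ~/backups/orion/daily)"
```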
17.6 Verification
# Backup files exist
ls -lh ~/backups/orion/daily/
ls -lh ~/backups/gitea/daily/
# Backup integrity
gunzip -t ~/backups/orion/daily/*.sql.gz
# Timer is scheduled
systemctl list-timers orion-backup.timer
# R2 sync (if configured)
aws s3 ls s3://orion-backups/ --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com --profile r2 --recursive
Step 18: Monitoring & Observability
Prometheus + Grafana monitoring stack with host and container metrics.
Architecture
┌──────────────┐     scrape      ┌─────────────────┐
│  Prometheus  │◄────────────────│ Orion API       │ /metrics
│  :9090       │◄────────────────│ node-exporter   │ :9100
│              │◄────────────────│ cAdvisor        │ :8080
└──────┬───────┘                 └─────────────────┘
       │ query
┌──────▼───────┐
│  Grafana     │──── https://grafana.wizard.lu
│  :3001       │
└──────────────┘
Resource Budget (4 GB Server)
| Container | RAM Limit | Purpose |
|---|---|---|
| prometheus | 256 MB | Metrics storage (15-day retention, 2 GB max) |
| grafana | 192 MB | Dashboards (SQLite backend) |
| node-exporter | 64 MB | Host CPU/RAM/disk metrics |
| cadvisor | 128 MB | Per-container resource metrics |
| Total new | 640 MB | |
Existing stack ~1.8 GB + 640 MB new = ~2.4 GB. Leaves ~1.6 GB for OS. If too tight, live-upgrade to CAX21 (8 GB/80 GB, ~7.50 EUR/mo) via Cloud Console > Server > Rescale (~2 min restart).
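The budget arithmetic can be re-run whenever a limit changes. The sketch below uses the limits from the table; treating "~1.8 GB" as 1800 MB and "4 GB" as 4096 MB are approximations made here for illustration.

```bash
# RAM limits in MB, taken from the resource budget table.
prometheus=256 grafana=192 node_exporter=64 cadvisor=128
existing=1800     # "~1.8 GB" existing stack, approximated in MB
total_ram=4096    # 4 GB server

monitoring=$((prometheus + grafana + node_exporter + cadvisor))
used=$((existing + monitoring))
headroom=$((total_ram - used))

echo "monitoring=${monitoring}MB used=${used}MB headroom=${headroom}MB"
# prints: monitoring=640MB used=2440MB headroom=1656MB
```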
18.1 DNS Record
Add A and AAAA records for grafana.wizard.lu:
| Type | Name | Value | TTL |
|---|---|---|---|
| A | grafana | 91.99.65.229 | 300 |
| AAAA | grafana | 2a01:4f8:1c1a:b39c::1 | 300 |
18.2 Caddy Configuration
Add to /etc/caddy/Caddyfile:
grafana.wizard.lu {
    reverse_proxy localhost:3001
}
Reload Caddy:
sudo systemctl reload caddy
18.3 Production Environment
Add to ~/apps/orion/.env:
ENABLE_METRICS=true
GRAFANA_URL=https://grafana.wizard.lu
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=<strong-password>
18.4 Deploy
cd ~/apps/orion
docker compose --profile full up -d --build
Verify all containers are running:
docker compose --profile full ps
docker stats --no-stream
18.5 Grafana First Login
- Open `https://grafana.wizard.lu`
- Login with `admin` / `<password from .env>`
- Change the default password when prompted
Import community dashboards:
- Node Exporter Full: Dashboards > Import > ID `1860` > Select Prometheus datasource
- Docker / cAdvisor: Dashboards > Import > ID `193` > Select Prometheus datasource
18.6 Verification
# Prometheus metrics from Orion API
curl -s https://api.wizard.lu/metrics | head -5
# Health endpoints
curl -s https://api.wizard.lu/health/live
curl -s https://api.wizard.lu/health/ready
# Prometheus targets (all should be "up")
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health
# Grafana accessible
curl -I https://grafana.wizard.lu
# RAM usage within limits
docker stats --no-stream
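The target check can be reduced to a single pass/fail line. This is a sketch that assumes the raw, compact JSON returned by `/api/v1/targets` (before any pretty-printing); `targets_summary` is a hypothetical helper name.

```bash
# Counts target health states in the raw JSON from /api/v1/targets (stdin).
# Returns zero only when at least one target exists and all are "up".
targets_summary() {
  local json up total
  json=$(cat)
  up=$(printf '%s' "$json" | grep -o '"health":"up"' | wc -l)
  total=$(printf '%s' "$json" | grep -o '"health":"[a-z]*"' | wc -l)
  up=$((up)) total=$((total))   # normalise whitespace from wc
  echo "$up of $total targets up"
  [ "$total" -gt 0 ] && [ "$up" -eq "$total" ]
}

# Usage:
#   curl -s http://localhost:9090/api/v1/targets | targets_summary
```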
Domain & Port Reference
| Service | Internal Port | External Port | Domain (via Caddy) |
|---|---|---|---|
| Orion API | 8000 | 8001 | api.wizard.lu |
| Main Platform | 8000 | 8001 | wizard.lu |
| OMS Platform | 8000 | 8001 | omsflow.lu |
| Loyalty+ Platform | 8000 | 8001 | rewardflow.lu |
| PostgreSQL | 5432 | 5432 | (internal only) |
| Redis | 6379 | 6380 | (internal only) |
| Flower | 5555 | 5555 | flower.wizard.lu |
| Gitea | 3000 | 3000 | git.wizard.lu |
| Prometheus | 9090 | 9090 (localhost) | (internal only) |
| Grafana | 3000 | 3001 (localhost) | grafana.wizard.lu |
| Node Exporter | 9100 | 9100 (localhost) | (internal only) |
| cAdvisor | 8080 | 8080 (localhost) | (internal only) |
| Caddy | — | 80, 443 | (reverse proxy) |
!!! note "Single backend, multiple domains"
All platform domains route to the same FastAPI backend. The PlatformContextMiddleware identifies the platform from the Host header. See Multi-Platform Architecture for details.
Directory Structure on Server
~/
├── apps/
│ └── orion/ # Orion application
│ ├── .env # Production environment
│ ├── docker-compose.yml # App stack (API, DB, Redis, Celery, monitoring)
│ ├── monitoring/ # Prometheus + Grafana config
│ ├── logs/ # Application logs
│ ├── uploads/ # User uploads
│ └── exports/ # Export files
├── backups/
│ ├── orion/
│ │ ├── daily/ # 7-day retention
│ │ └── weekly/ # 4-week retention
│ └── gitea/
│ ├── daily/
│ └── weekly/
├── gitea/
│ └── docker-compose.yml # Gitea + PostgreSQL
└── gitea-runner/ # CI/CD runner (act_runner v0.2.13)
├── act_runner # symlink → act_runner-0.2.13-linux-arm64
├── act_runner-0.2.13-linux-arm64
└── .runner # registration config
Troubleshooting
Permission denied on logs
The Docker container runs as appuser (UID 1000). Host-mounted volumes need matching ownership:
sudo chown -R 1000:1000 logs uploads exports
Celery workers restarting
Check logs for import errors:
docker compose --profile full logs celery-worker --tail 30
Common cause: stale task module references in app/core/celery_config.py.
SSH service name on Ubuntu 24.04
Ubuntu 24.04 uses ssh not sshd:
sudo systemctl restart ssh # correct
sudo systemctl restart sshd # will fail
git pull fails with local changes
If docker-compose.yml was edited on the server (e.g. passwords), stash before pulling:
git stash
git pull
git stash pop
Maintenance
Deploy updates
Deployments happen automatically when pushing to master (see Step 16). For manual deploys:
cd ~/apps/orion && bash scripts/deploy.sh
The script handles stashing local changes, pulling, rebuilding containers, running migrations, and health checks.
View logs
# Follow all logs in real-time
docker compose --profile full logs -f
# Follow a specific service
docker compose --profile full logs -f api
docker compose --profile full logs -f celery-worker
docker compose --profile full logs -f celery-beat
docker compose --profile full logs -f flower
# View last N lines (useful for debugging crashes)
docker compose --profile full logs --tail=50 api
docker compose --profile full logs --tail=100 celery-worker
# Filter logs for errors
docker compose --profile full logs api | grep -i "error\|exception\|failed"
Check container status
# Overview of all containers (health, uptime, ports)
docker compose --profile full ps
# Watch for containers stuck in "Restarting" — indicates a crash loop
# Healthy containers show: Up Xs (healthy)
Restart services
# Restart a single service
docker compose --profile full restart api
# Restart everything
docker compose --profile full restart
# Full rebuild (after code changes)
docker compose --profile full up -d --build
Quick access URLs
After Caddy is configured:
| Service | URL |
|---|---|
| Main Platform | https://wizard.lu |
| API Swagger docs | https://api.wizard.lu/docs |
| API ReDoc | https://api.wizard.lu/redoc |
| Admin panel | https://wizard.lu/admin/login |
| Health check | https://api.wizard.lu/health |
| Prometheus metrics | https://api.wizard.lu/metrics |
| Gitea | https://git.wizard.lu |
| Flower | https://flower.wizard.lu |
| Grafana | https://grafana.wizard.lu |
| OMS Platform | https://omsflow.lu |
| Loyalty+ Platform | https://rewardflow.lu |
Direct IP access (temporary, until firewall rules are removed):
| Service | URL |
|---|---|
| API | http://91.99.65.229:8001/docs |
| Gitea | http://91.99.65.229:3000 |
| Flower | http://91.99.65.229:5555 |