Document two ways to take CI/Gitea load off the production box, since the
HostHighCpuUsage floods are caused by act_runner running ruff/pytest/validate
on the prod server (not by Gitea hosting, which is light):
- 2a "Offloading CI to a Separate Server" — move just the act_runner to a
cheap x86 box (no data migration, no DNS, no downtime). Includes the smaller
build-burst caveat (deploy still builds on prod) + the registry-pull path.
- 2c "Migrating Gitea to a Separate Server" — full separation runbook:
pg_dump + data-volume tar/restore, DNS cutover, Caddy/SSL, rollback. Notes
the box becomes stateful/critical (backups + hardening).
mkdocs --strict clean; arch validation 0 new findings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
deploy.sh already pruned old images but never build cache — the larger half
of disk creep from CI rebuilds (root fs hit 83% on prod). Add
`docker builder prune --filter until=168h` alongside the existing image prune
so cleanup happens every deploy, version-controlled, no host cron.
Docs (hetzner-server-setup.md, Maintenance section):
- New "Rescaling / Upgrading the Server" — when/why, same-arch (Arm/CAX) +
CPU-RAM-only vs irreversible disk-expand constraints, poweroff→rescale→
power-on→verify steps, and the Arm-capacity-unavailable-in-DC caveat.
- New "Disk Maintenance (Docker Pruning)" — emergency manual prune + the
automated deploy.sh approach.
- Fixed stale Resource Budget: cadvisor 128→192 MB (matches compose),
total 672→736 MB, and "live-upgrade" wording (rescale needs a power-off).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hetzner Cloud silently blocks outbound TCP 25 and 465 on every Cloud
Server. The block sits upstream of the VM — UFW and iptables look
completely clean — so it presents as a generic "connection times out"
that's easy to misdiagnose as a credential or DNS issue. Spent ~5 hours
on 2026-05-30 working through swaks/tcpdump/auth-backend hypotheses
before finding Hetzner's own docs that mention the policy.
Two doc additions:
- Step 4 (Firewall Configuration) gets a warning admonition right after
the UFW status check. Explains the upstream nature of the block,
gives the symptom signature (nc to 587 succeeds, nc to 465 silently
times out), and includes the auto-approved unblock ticket template
with sample text.
- Step 19.5 (Alertmanager SMTP) gets a "live prod uses
mail1.myservices.hosting:465" callout reflecting the reality that
the SendGrid setup documented in that section is no longer how this
prod env is wired. The callout captures the actual smarthost config
(with smtp_auth_password kept gitignored, only .example ships in
repo), the two prerequisites (Hetzner unblock + implicit-TLS-aware
smarthost port), and the redacted swaks verification command. The
rest of §19.5 stays as a reference for greenfield deploys that
prefer SendGrid.
Saves the next person from repeating the same hours-long detour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manual deploys had been using a bare `git pull && docker compose up -d
--build api` sequence, which works for the container itself but silently
skipped writing `.build-info`. The stale `.build-info` left
`?v=<commit-sha>` pointing at the previous deploy's SHA on every shared
JS/CSS URL — so browsers happily kept cached pre-fix assets even after
a successful rebuild. Bit us today: ~5 hours of "is this even deployed?"
debugging on the loyalty-dashboard redirect-flicker fix.
deploy.sh wasn't a substitute because it's a CI/CD script: stashes
working tree, runs alembic, restarts every service in the full profile
(db, redis, api, celery-worker, celery-beat, flower), 60s health budget.
Heavy and disruptive for an api-only hotfix.
New scripts/deploy-api-only.sh fills the gap with the narrow path:
- Refuses if working tree is dirty (no silent stash → no pop conflicts).
- git pull --ff-only.
- Writes .build-info (the critical missing step).
- docker compose -f docker-compose.yml --profile full up -d --build api
(only the api service — db/redis/celery untouched).
- Tight 30s health budget since DB doesn't need to come back up.
- Exit codes 0/1/2/3 for clean automation.
docs/deployment/hetzner-server-setup.md §16.5 split into 16.5a
(code-only — points at the new script as the default) and 16.5b
(full deploy fallback — kept the existing deploy.sh path for migrations
/ Dockerfile / docker-compose / requirements changes). §12 footnote on
.build-info refreshed to mention both scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/proposals/loyalty-go-live-readiness.md — record the
2026-05-16 session: 7 bugs found during Test 1 round 1
(TimestampMixin, CardDetailResponse, storefront i18n triple,
Makefile, meta_keywords), 6 fixed and deployed with commit
hashes, B1-F still pending repro on the clean DB.
- docs/deployment/hetzner-server-setup.md — fix section 12
step 8 to call seed_email_templates_core.py +
seed_email_templates_loyalty.py instead of the non-existent
seed_email_templates.py, and add seed_demo.py at the end.
Append a note about post-reset state (.build-info + admin
settings).
- mkdocs.yml — sweep 14 previously-unlinked proposals and 2
module docs (loyalty/production-readiness.md,
prospecting/batch-scanning.md) into the nav so the strict
build no longer warns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- hetzner-server-setup: runner timeout 3h, shutdown_timeout 300s,
deploy.sh now writes .build-info and uses explicit -f flag
- gitea: document unit-only CI tests and xdist incompatibility
- docker: add build info section, document volume mount approach
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- deploy.sh writes .build-info with commit SHA and timestamp after git pull
- /health endpoint now returns version, commit, and deployed_at fields
- Admin sidebar footer shows version and commit SHA
- Hetzner docs updated: runner --config flag, swap, and runner timeout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security:
- Fix TOCTOU race conditions: move balance/limit checks after row lock in redeem_points, add_stamp, redeem_stamps
- Add PIN ownership verification to update/delete/unlock store routes
- Gate adjust_points endpoint to merchant_owner role only
Data integrity:
- Track total_points_voided in void_points
- Add order_reference idempotency guard in earn_points
Correctness:
- Fix LoyaltyProgramAlreadyExistsException to use merchant_id parameter
- Add StorefrontProgramResponse excluding wallet IDs from public API
- Add bounds (±100000) to PointsAdjustRequest.points_delta
Audit & config:
- Add CARD_REACTIVATED transaction type with audit record
- Improve admin audit logging with actor identity and old values
- Use merchant-specific PIN lockout settings with global fallback
- Guard MerchantLoyaltySettings creation with get_or_create pattern
Tests: 27 new tests (265 total) covering all 12 items — unit and integration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update origin cert config: wildcards for omsflow.lu, rewardflow.lu, hostwizard.lu
- Add wildcard Caddy blocks to production Caddyfile example
- Replace "Future" section with actionable runbooks:
- Add a Store Subdomain (self-service, no infra changes)
- Add a Custom Store Domain (Cloudflare + Caddy + DB)
- Add a New Platform Domain (full setup)
- Document wizard.lu exception (no wildcard due to git.wizard.lu DNS-only)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
StoreContextMiddleware was treating platform domains (e.g. rewardflow.lu)
as custom store domains, causing store lookup to fail before reaching
path-based detection (/storefront/FASHIONHUB/...). Now skips custom
domain detection when the host matches the platform's own domain.
Also fixes menu tests to use loyalty-program instead of loyalty-overview,
and adds LOYALTY_DEFAULT_LOGO_URL and LOYALTY_GOOGLE_WALLET_ORIGINS to
Hetzner deployment docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix platform-grouped merchant sidebar menu with core items at root level
- Add merchant store management (detail page, create store, team page)
- Fix store settings 500 error by removing dead stripe/API tab
- Move onboarding translations to module-owned locale files
- Fix onboarding banner i18n with server-side rendering + context inheritance
- Refactor login language selectors to use languageSelector() function (LANG-002)
- Move HTTPException handling to global exception handler in merchant routes (API-003)
- Add language selector to all login pages and portal headers
- Fix customer module: drop order stats from customer model, add to orders module
- Fix admin menu config visibility for super admin platform context
- Fix storefront auth and layout issues
- Add missing i18n translations for onboarding steps (en/fr/de/lb)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ASCII diagram showing all services, Docker networks, port bindings,
and traffic flow from Cloudflare through Caddy to each container.
Clearly marks which ports are internet-exposed, localhost-only,
or Docker-internal-only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Gitea docker-compose in Step 7 to bind port 3000 to 127.0.0.1
- Add REDIS_PASSWORD to Step 10 critical production values
- Replace misleading UFW rules with Docker port binding instructions
- Add warning about Docker bypassing UFW in Step 14
- Add initial setup note for temporary Gitea port exposure
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Redis password via REDIS_PASSWORD env var (--requirepass flag)
- Update all REDIS_URL and REDIS_ADDR references to include password
- Restrict /metrics endpoint to localhost and Docker internal networks (403 for external requests)
- Document Gitea port 3000 localhost binding fix (must be applied manually on server)
- Add REDIS_PASSWORD to .env.example
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker bypasses UFW iptables, so bare port mappings like "5432:5432"
exposed the database to the public internet. Removed port mappings for
PostgreSQL and Redis (they only need Docker-internal networking), and
bound the API port to 127.0.0.1 since only Caddy needs to reach it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the complete nuclear reset sequence (tested end-to-end):
stop → build → infra up → schema reset → migrations → seeds → start.
Update seeded data counts to match current output (30 CMS pages,
12 tiers, 3 admins, 28 email templates). Switch from exec to run --rm
for seed commands so they work before services are started.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Connect the fully-implemented Google Wallet service to the loyalty module:
- Create wallet class/object on customer enrollment
- Sync wallet passes on stamp and points operations
- Expose wallet URLs in storefront API responses
- Add conditional "Add to Google Wallet" buttons on dashboard and enroll-success pages
- Use platform-wide env var config (not per-merchant DB column)
- Add Google service account patterns to .gitignore
- Add LOYALTY_GOOGLE_* fields to app Settings
- Update deployment docs and add local testing guide
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Step 25 in Hetzner docs with full Google Cloud/Wallet Console setup,
service account configuration, local testing, and architecture diagrams.
Loyalty module env vars added to environment.md and .env.example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mark all server-side tasks as complete (fail2ban, Flower password,
unattended-upgrades, verification script)
- Correct memory limits: celery-beat and flower bumped to 256m after OOM
- Update scaling guide memory budget to match actual limits
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Captures all server-side work completed on 2026-02-16:
- Cloudflare Full setup for wizard.lu, omsflow.lu, rewardflow.lu (NS, SSL, origin certs)
- SendGrid SMTP configured for Alertmanager and app transactional emails
- Caddyfile updated with origin certs and tls issuer acme for git.wizard.lu
- Alertmanager v2 API for test alerts, multi-domain email strategy documented
- Cloudflare security: bot protection, DDoS, rate limiting on /api/ paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SendGrid handles both transactional emails and marketing campaigns
under one account. Updated alertmanager SMTP placeholders, hetzner
setup guide (Step 19.5), and environment reference to recommend
SendGrid as the primary email provider.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Delete .gitlab-ci.yml (replaced by .gitea/workflows/ci.yml)
- Delete docs/deployment/gitlab.md (superseded by gitea.md)
- Update audit rules to reference .gitea/workflows/*.yml
- Update validate_audit.py to check Gitea CI paths
- Clean up GitLab references in gitea.md, mkdocs.yml, .dockerignore
- Mark IPv6 AAAA records as completed in hetzner docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mark Steps 1-18 as fully complete (R2 offsite backups operational)
- Fix awscli install instructions: pip3 instead of apt (Ubuntu 24.04)
- Add Environment PATH to systemd service for ~/.local/bin/aws
- Add --upload flag to systemd ExecStart now that R2 is configured
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All three platforms live with auto-SSL (wizard.lu, omsflow.lu, rewardflow.lu).
Monitoring stack deployed with Grafana dashboards. Hetzner backups active.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update observability.md with production container table, actual init code,
and correct env var names. Update docker.md with full 10-service table and
backup/monitoring cross-references. Add explicit AAAA records to DNS tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clean up 28 backward compatibility instances identified in the codebase.
The app is not live, so all shims are replaced with the target architecture:
- Remove legacy Inventory.location column (use bin_location exclusively)
- Remove dashboard _extract_metric_value helper (use flat metrics dict)
- Remove legacy stat field duplicates (total_stores, total_imports, etc.)
- Remove 13 re-export shims and class aliases across modules
- Remove module-enabling JSON fallback (use PlatformModule junction table)
- Remove menu_to_legacy_format() conversion (return dataclasses directly)
- Remove title/description from MarketplaceProductBase schema
- Clean billing convenience method docstrings
- Clean test fixtures and backward-compat comments
- Add PlatformModule seeding to init_production.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backups: pg_dump scripts with daily/weekly rotation and Cloudflare R2 offsite sync.
Monitoring: Prometheus, Grafana, node-exporter, cAdvisor in docker-compose; /metrics
endpoint activated via prometheus_client.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove upload-artifact step (unsupported on Gitea GHES)
- Replace architecture+audit jobs with unified validate job running validate_all.py
- Update docs: DEPLOY_HOST must be 172.17.0.1 (Docker bridge), not 127.0.0.1
- Add ufw rule for Docker bridge network SSH access
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix `ContentPage.store_id is None` (Python identity check, always
False) → use `.is_(None)` for proper SQLAlchemy NULL filtering
- Create pages for ALL platforms instead of only OMS
- Merge create_platform_pages.py into create_default_content_pages.py
(5 overlapping pages, only platform_homepage was unique)
- Delete redundant create_platform_pages.py
- Update Makefile, install.py, and docs references
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all ~1,086 occurrences of Wizamart/wizamart/WIZAMART/WizaMart
with Orion/orion/ORION across 184 files. This includes database
identifiers, email addresses, domain references, R2 bucket names,
DNS prefixes, encryption salt, Celery app name, config defaults,
Docker configs, CI configs, documentation, seed data, and templates.
Renames homepage-wizamart.html template to homepage-orion.html.
Fixes duplicate file_pattern key in api.yaml architecture rule.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deploy job SSHes to production after ruff/pytest/architecture pass,
running scripts/deploy.sh (stash, pull, docker rebuild, migrate, health check).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Steps 16-18 outlined: continuous deployment, backups, monitoring.
Deferred multi-platform DNS/Caddy until all platforms ready.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add one-liner deploy command, log viewing/filtering, container status
checks, and update remaining tasks list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete step-by-step guide documenting the server setup performed on 2026-02-11:
- Server hardening (non-root user, UFW, SSH lockdown, fail2ban)
- Docker & Docker Compose installation
- Gitea self-hosted git with PostgreSQL
- Wizamart deployment (API, DB, Redis, Celery, Flower)
- Database migration and production seeding
- Troubleshooting section for issues encountered during setup
- DNS and Caddy reverse proxy instructions (TODO for next session)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>