Files
orion/docs/proposals/loyalty-go-live-readiness.md
Samir Boulahtit 1425b48239
All checks were successful
CI / ruff (push) Successful in 17s
CI / pytest (push) Successful in 2h41m22s
CI / validate (push) Successful in 32s
CI / dependency-scanning (push) Successful in 36s
CI / docs (push) Successful in 58s
CI / deploy (push) Successful in 1m12s
docs(loyalty): record session pause + next-session resume sequence
Add a short "Next session" subsection to the 2026-05-16 update
spelling out the three steps to pick up Test 1 round 2 (SMTP
re-apply post-reset, program sanity check, live log tail + fresh
enrollment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:50:23 +02:00

12 KiB
Raw Blame History

Loyalty Go-Live Readiness — 2026-05-10

Snapshot of where the loyalty platform stands the night of 2026-05-10. The canonical sequenced plan is still app/modules/loyalty/docs/production-launch-plan.md; this doc records the current state ( / / 🟡) and what surfaced during the prod readiness pass.

TL;DR

The technical pre-launch checklist is green. The remaining gate is a human one — walking the 8 user-journey E2E tests on prod with a real test customer and confirming nothing surprises us. After that, flip the loyalty platform live for FASHIONHUB's stores and start the Google Wallet production-access review in parallel (13 day Google review, non-blocking).

2026-05-16 update — Test 1 round 1: 7 bugs found, 6 fixed, 1 pending

First attempt at the customer-facing journey on FASHIONHUB's fallback subdomain (fashionhub.rewardflow.lu) surfaced more than expected. A critical timestamp bug was masquerading as a re-enrollment confusion, which sent us briefly down the wrong investigation path. The clean-slate reset described below cleared the bad data so the remaining gates can be verified on a known-good baseline.

Six bugs fixed and deployed to prod (5 commits):

Bug Layer Fix
TimestampMixin evaluated datetime.now(UTC) once at module import — every row stamped at process-start time models/database/base.py Pass _utc_now callable as default / onupdate. Critical: affected every created_at / updated_at on every table that uses the mixin since the last app restart.
Admin/store/merchant card detail page showed "-" for phone + birthday even when both were captured during enrollment app/modules/loyalty/schemas/card.py + 3 endpoints Added customer_phone + customer_birthday to CardDetailResponse and populated from customer.phone / customer.birth_date. Data was persisting all along — purely a serialization gap.
Storefront <html lang="en"> hardcoded made <input type="date"> show in mm/dd/yyyy on the FR storefront app/templates/storefront/base.html Dynamic lang="{{ current_language|default('en') }}" so the browser respects the FR locale.
Storefront nav "Home" rendered as English literal across all locales despite nav.home existing in every locale file app/templates/storefront/base.html Use {{ _('nav.home') }} on both desktop and mobile nav.
Store.description (the per-store tagline) was single-language only — FASHIONHUB's "Trendy clothing and accessories" rendered in EN on the FR storefront footer Store model + migration tenancy_005 + template + seed_demo.py Added description_translations JSON column with the same shape used by CMS / Platform / Subscription. Added get_translated_description(lang) getter with FR/DE → DEFAULT_LANGUAGE → description fallback. Seeded FR/DE/LB/EN for Fashion Group's two stores so they render correctly out of the box.
make init-prod and make db-reset referenced scripts/seed/seed_email_templates.py, which doesn't exist (the real seeders are _core.py + _loyalty.py) — db-reset would silently bomb mid-way Makefile Call both real scripts in both targets.
scripts/seed/create_default_content_pages.py still passed meta_keywords to ContentPage, but the column was dropped in migration cms_003 — fresh seeding failed on the first platform scripts/seed/create_default_content_pages.py Drop the meta_keywords kwarg.

One bug still open:

  • B1-F — welcome email not received. The original investigation was confounded by the timestamp bug (customer looked like it was from May 12 when it was actually fresh, making the re-enrollment hypothesis seem plausible). Needs fresh repro on the clean DB: enroll with a new email, tail api + celery-worker logs live, check email_logs for a row. If still no email, then there's a real bug in the dispatch path — notification_service.send_enrollment_confirmation is called from card_service.enroll_customer:636 and wraps the call in a try/except that only logs warnings (card_service.py:631-645), so a silent failure in _resolve_context or the Celery enqueue would be invisible from the user's perspective.

Two product decisions pending (from the same session, not yet implemented):

  • B1-E — QR code in welcome email. Scoped: pass wallet_save_url into the loyalty_enrollment template, generate QR server-side (Python qrcode), update the HTML body in all 4 locales in scripts/seed/seed_email_templates_loyalty.py:294-299, reseed. Blocked on B1-F (no point adding a QR to an email that doesn't send).
  • C1-C backfill scope. Other stores (WizaTech, BookWorld, LuxWeb, WizaMart etc.) still only have a single-language description. Fashion Group was seeded; rest can be done by hand via admin UI as merchants come online, or batch-updated later. No code work needed.

Prod data reset

Wiped and reseeded — used the corrected sequence from deployment/hetzner-server-setup.md section 12. Two doc gaps found and patched in the same pass:

  • Reset procedure called scripts/seed/seed_email_templates.py (doesn't exist) — now calls both real scripts
  • Reset procedure was missing seed_demo.py at the end of step 8 — now included

After reset, admin credentials are back to the defaults from init_production.py (admin / Ollama@8044, etc.); platform admin SMTP overrides in /admin/settings need to be re-applied (port 587, STARTTLS, support@wizard.lu).

Status board delta

  • Step 1 (email templates seeded) — re-seeded post-reset, still
  • Step 3 (migrations) — now at tenancy_005, still
  • Step 6 (web user-journey E2E tests) — Test 1 round 2 pending on clean DB; the bugs found in round 1 are no longer blockers

Next session

Session paused 2026-05-16 evening. To resume Test 1 round 2:

  1. Re-apply SMTP overrides under /admin/settings (port 587, STARTTLS, support@wizard.lu) — the reset wiped them.
  2. Confirm /admin/loyalty/programs shows the Fashion Group program (should be seeded by seed_demo.py).
  3. Tail api and celery-worker logs live, then enroll at https://fashionhub.rewardflow.lu/loyalty/join with a fresh email. The point of the live tail is to catch where B1-F actually dies — at dispatch, at SMTP, or somewhere else.

Status board

# Pre-launch step State Notes
1 Seed loyalty email templates on prod 20 rows (5 templates × 4 locales) all is_active=true
2 Google Wallet config on Hetzner Wallet config validator green: credentials valid, issuer 3388000000023089598, origin https://rewardflow.lu, default logo reachable
3 Database migrations All four module heads current incl. loyalty_011 (acting-device audit) on prod
4 FR/DE/LB translations for analytics i18n keys 🟡 8 keys still EN-only. Cosmetic, doesn't block soft launch
5 messaging.manage_templates permission for store owners 🟡 Only matters if merchants self-edit templates. Admin can edit centrally. Defer
6 8 web user-journey E2E tests The remaining gate — user does this with a real test customer
6b 6 Android terminal E2E tests Pairing, PIN, daily flows, offline queue, auto-lock, device revoke — gated on user obtaining a tablet
7 Google Wallet real-device pass test Already confirmed earlier — cards register, points/redeem visible on personal Google Wallet
8 Go live Gated by #6. Cleanup test data + enable platform feature flags for FASHIONHUB
9 Google Wallet production access Post-launch, 13 day Google review. App-side change is zero; same issuer + service account, passes become public-visible once approved

What got sorted tonight

SMTP wired to a self-hosted mail server

Started here:

  • prod .env had EMAIL_PROVIDER=sendgrid + a SendGrid API key
  • SendGrid free trial (60 days) had expired
  • SMTP_* env vars were placeholders pointing at smtp.example.com

Discovered that /admin/settings lets you store SMTP config in the DB (table admin_settings, category email) and those values win over .env. User had already configured:

  • email_provider=smtp
  • smtp_host=mail1.myservices.hosting
  • smtp_port=465 ← problematic
  • smtp_user=support@wizard.lu / encrypted password
  • smtp_use_ssl=true, smtp_use_tls=false

Diagnosis from the prod container:

Check Result
DNS resolves mail1.myservices.hosting 185.26.107.245
TCP mail1.myservices.hosting:465 timed out
TCP mail1.myservices.hosting:587 open

Either Hetzner blocks 465 outbound for this VPS or the provider firewalls Hetzner's IP range on 465. Either way, port 587 (submission

  • STARTTLS) is the modern path and works.

Fix: changed /admin/settings to port 587, SSL off, TLS on. Test email landed in inbox immediately, sender header Support Wizard <support@wizard.lu> — proving the DB override was being used.

Cosmetic bug found and fixed

The test email's body claimed the configuration that would have been used if .env were authoritative — i.e. it said Provider: sendgrid and From: noreply@wizard.lu even though the actual send went via SMTP from support@wizard.lu. Two places in the code:

  1. app/modules/core/routes/api/admin_settings.py::send_test_email — body template hardcoded app_settings.email_provider and app_settings.email_from_address
  2. app/modules/messaging/services/email_service.py — the "template not found" EmailLog branch recorded settings.email_provider / settings.email_from_address instead of the effective config

Both now read from get_effective_email_config(db) / self._platform_config, so the test email page and audit logs reflect what was actually used.

Commit: f2d1bdcd on master, deployed via Gitea Actions.

What the user does next

In priority order:

  1. Tonight or tomorrow — review email copy. Open /admin/email-templates and skim the 5 loyalty templates (EN locale). loyalty_enrollment, loyalty_welcome_bonus and loyalty_reward_ready are the customer-visible ones — adjust subject lines + body copy if anything reads off-brand.
  2. Walk the 8 web user-journey E2E tests — checklist at the bottom of app/modules/loyalty/docs/user-journeys.md. Use a personal email as the test customer. 2b. Once a tablet is on hand: walk the 6 Android terminal tests — same doc, "Android Terminal Tests" section (Tests 914). Covers pairing (QR + manual), offline PIN bcrypt verify, daily flows (stamp/earn/redeem/enroll), offline queue drain, idle auto-lock, and device revocation cutoff.
  3. Flip live for FASHIONHUB — clean any test data, double-check Celery (docker compose ps | grep celery), enable loyalty feature on FASHIONHUB's stores via the admin UI.
  4. In parallel, file Google Wallet production accesspay.google.com/business/console → Wallet API → Manage → Request production access. Use sample pass screenshots from FASHIONHUB. Google reviews the Issuer, not individual merchants — once approved all merchants on the platform are covered.

Open follow-ups (non-blocking)

These can wait but are worth tracking:

  • FR/DE/LB translations for the 8 analytics i18n keys (store.analytics.revenue_title, store.analytics.cohort_title, etc.). EN shows through; cosmetic only.
  • messaging.manage_templates permission discovery for merchant_owner role — needed if/when merchants self-edit templates. Admin can edit centrally for v1.
  • Failed-PIN-attempt reporting from Android tablet → server lockout counter — tablet bcrypts locally and silently fails; a stolen tablet's brute-forcer doesn't trip server-side lockout. Add a tiny POST /pins/{id}/record-failed-attempt endpoint plus a call from the PinViewModel's failure branch.
  • Splash screen + per-action success animation for the Android tablet — Phase F polish that was intentionally deferred.

Reference