orion/docs/proposals/loyalty-go-live-readiness.md
Samir Boulahtit 7cf2420bba
docs(loyalty): record B1-F resolution + 6 follow-ups for next session
End-of-day 2026-05-17 update to the go-live readiness doc.

Welcome email B1-F is now fully resolved end-to-end (enrollment ->
celery dispatch -> email_logs status=sent -> emails landing). The
issue was a chain of four nested bugs each masking the next — the
doc lists all four with commit hashes (44c42909, 3e650ff8, 2a216101,
5b21908b) plus the three earlier same-session fixes for the SMTP
password eye toggle, the JS error on /admin/loyalty/programs, and
the 422 on ProgramCreate.

Also captured:
- Audit finding: prospecting/tasks/__init__.py has the same bug as
  bug #3 (scan_tasks.py exists but not imported).
- Six queued follow-ups for next session: two Test 1 storefront nits
  (date format on FR, "Continuer mes achats" CTA), Test 2 cross-
  store re-enrollment, Hetzner doc check, a concrete unit-test list
  that would have caught each of the four B1-F bugs, the prospecting
  __init__.py fix, and a wider audit of every module's email path
  to find any other silently-broken @shared_task registration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 23:08:27 +02:00


Loyalty Go-Live Readiness — 2026-05-10

Snapshot of where the loyalty platform stands the night of 2026-05-10. The canonical sequenced plan is still app/modules/loyalty/docs/production-launch-plan.md; this doc records the current state ( / / 🟡) and what surfaced during the prod readiness pass.

TL;DR

The technical pre-launch checklist is green. The remaining gate is a human one: walking the 8 user-journey E2E tests on prod with a real test customer and confirming nothing surprises us. After that, flip the loyalty platform live for FASHIONHUB's stores and start the Google Wallet production-access review in parallel (1-3 day Google review, non-blocking).

2026-05-16 update — Test 1 round 1: 7 bugs found, 6 fixed, 1 pending

First attempt at the customer-facing journey on FASHIONHUB's fallback subdomain (fashionhub.rewardflow.lu) surfaced more than expected. A critical timestamp bug was masquerading as a re-enrollment confusion, which sent us briefly down the wrong investigation path. The clean-slate reset described below cleared the bad data so the remaining gates can be verified on a known-good baseline.

Six bugs fixed and deployed to prod (5 commits):

| Bug | Layer | Fix |
|---|---|---|
| `TimestampMixin` evaluated `datetime.now(UTC)` once at module import, so every row was stamped with the process-start time | `models/database/base.py` | Pass the `_utc_now` callable as `default` / `onupdate`. Critical: affected every `created_at` / `updated_at` on every table that uses the mixin since the last app restart. |
| Admin/store/merchant card detail page showed "-" for phone + birthday even when both were captured during enrollment | `app/modules/loyalty/schemas/card.py` + 3 endpoints | Added `customer_phone` + `customer_birthday` to `CardDetailResponse`, populated from `customer.phone` / `customer.birth_date`. The data was persisting all along; this was purely a serialization gap. |
| Hardcoded storefront `<html lang="en">` made `<input type="date">` show mm/dd/yyyy on the FR storefront | `app/templates/storefront/base.html` | Dynamic `lang="{{ current_language\|default('en') }}"` so the browser respects the FR locale. |
| Storefront nav "Home" rendered as an English literal across all locales despite `nav.home` existing in every locale file | `app/templates/storefront/base.html` | Use `{{ _('nav.home') }}` on both desktop and mobile nav. |
| `Store.description` (the per-store tagline) was single-language only: FASHIONHUB's "Trendy clothing and accessories" rendered in EN on the FR storefront footer | `Store` model + migration `tenancy_005` + template + `seed_demo.py` | Added a `description_translations` JSON column with the same shape used by CMS / Platform / Subscription, plus a `get_translated_description(lang)` getter that falls back from the requested language (e.g. FR/DE) → `DEFAULT_LANGUAGE` → `description`. Seeded FR/DE/LB/EN for Fashion Group's two stores so they render correctly out of the box. |
| `make init-prod` and `make db-reset` referenced `scripts/seed/seed_email_templates.py`, which doesn't exist (the real seeders are `_core.py` + `_loyalty.py`), so `db-reset` would silently fail mid-way | `Makefile` | Call both real scripts in both targets. |
| `scripts/seed/create_default_content_pages.py` still passed `meta_keywords` to `ContentPage`, but the column was dropped in migration `cms_003`, so fresh seeding failed on the first platform | `scripts/seed/create_default_content_pages.py` | Drop the `meta_keywords` kwarg. |
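The TimestampMixin bug is the classic evaluated-once default. The real fix is in SQLAlchemy (passing `_utc_now` itself, not `_utc_now()`, as `default` / `onupdate`); the stdlib-only analog below uses dataclasses to show the same failure and fix without a database. `BuggyRow` and `FixedRow` are illustrative names, not project code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    """Return the current UTC time. Evaluated fresh on every call."""
    return datetime.now(timezone.utc)


@dataclass
class BuggyRow:
    # BUG: datetime.now(...) runs once, when the class body executes
    # (i.e. at import time), so every instance shares that one timestamp.
    created_at: datetime = datetime.now(timezone.utc)


@dataclass
class FixedRow:
    # FIX: hand over the callable; dataclasses invoke it once per instance,
    # much as SQLAlchemy invokes a callable column default once per INSERT.
    created_at: datetime = field(default_factory=_utc_now)


if __name__ == "__main__":
    import time

    first = BuggyRow()
    time.sleep(0.05)
    second = BuggyRow()
    # Prints True despite the 50 ms gap: the default was fixed at import.
    print(first.created_at == second.created_at)
    # Prints True: FixedRow gets a timestamp taken at construction time.
    print(FixedRow().created_at > first.created_at)
```

In SQLAlchemy terms the equivalent is something like `Column(DateTime(timezone=True), default=_utc_now, onupdate=_utc_now)`; the exact column definition in base.py isn't shown in this doc, so treat that line as illustrative.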

One bug still open:

  • B1-F: welcome email not received. The original investigation was confounded by the timestamp bug (the customer looked like it had been created on May 12 when it was actually fresh, which made the re-enrollment hypothesis seem plausible). Needs a fresh repro on the clean DB: enroll with a new email, tail api + celery-worker logs live, and check email_logs for a row. If there is still no email, there's a real bug in the dispatch path. notification_service.send_enrollment_confirmation is called from card_service.enroll_customer (card_service.py:636), which wraps the call in a try/except that only logs warnings (card_service.py:631-645), so a silent failure in _resolve_context or the Celery enqueue would be invisible from the user's perspective.

Two product decisions pending (from the same session, not yet implemented):

  • B1-E — QR code in welcome email. Scoped: pass wallet_save_url into the loyalty_enrollment template, generate QR server-side (Python qrcode), update the HTML body in all 4 locales in scripts/seed/seed_email_templates_loyalty.py:294-299, reseed. Blocked on B1-F (no point adding a QR to an email that doesn't send).
  • C1-C backfill scope. Other stores (WizaTech, BookWorld, LuxWeb, WizaMart etc.) still only have a single-language description. Fashion Group was seeded; rest can be done by hand via admin UI as merchants come online, or batch-updated later. No code work needed.
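Stores that haven't been backfilled still render a tagline because of the fallback chain described for get_translated_description(lang) in the Store.description fix above. A standalone sketch of that lookup order, assuming description_translations is a {lang_code: text} dict and treating an empty string as missing. The real method lives on the Store model and reads DEFAULT_LANGUAGE from config; here it is passed in explicitly, and this free-function form is hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping


def get_translated_description(
    description: str | None,
    translations: Mapping[str, str] | None,
    lang: str,
    default_language: str,
) -> str | None:
    """Resolve a store tagline: requested lang -> default language -> plain description.

    Hypothetical free-function version of Store.get_translated_description(lang);
    assumes description_translations is a {lang_code: text} JSON object.
    """
    translations = translations or {}
    for code in (lang, default_language):
        text = translations.get(code)
        if text:  # missing key or empty string: keep falling back
            return text
    # Stores not yet backfilled (no translations at all) end up here.
    return description
```

This is why the backfill needs no code work: an un-backfilled store simply keeps showing its single-language description in every locale.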

Prod data reset

Wiped and reseeded — used the corrected sequence from deployment/hetzner-server-setup.md section 12. Two doc gaps found and patched in the same pass:

  • Reset procedure called scripts/seed/seed_email_templates.py (doesn't exist) — now calls both real scripts
  • Reset procedure was missing seed_demo.py at the end of step 8 — now included

After reset, admin credentials are back to the defaults from init_production.py (admin / Ollama@8044, etc.); platform admin SMTP overrides in /admin/settings need to be re-applied (port 587, STARTTLS, support@wizard.lu).

Status board delta

  • Step 1 (email templates seeded) — re-seeded post-reset, still
  • Step 3 (migrations) — now at tenancy_005, still
  • Step 6 (web user-journey E2E tests) — Test 1 round 2 pending on clean DB; the bugs found in round 1 are no longer blockers

Next session

Session paused 2026-05-16 evening. To resume Test 1 round 2:

  1. Re-apply SMTP overrides under /admin/settings (port 587, STARTTLS, support@wizard.lu) — the reset wiped them.
  2. Confirm /admin/loyalty/programs shows the Fashion Group program (should be seeded by seed_demo.py).
  3. Tail api and celery-worker logs live, then enroll at https://fashionhub.rewardflow.lu/loyalty/join with a fresh email. The point of the live tail is to catch where B1-F actually dies — at dispatch, at SMTP, or somewhere else.

2026-05-17 update — B1-F resolved (chain of 4 nested bugs)

End-to-end enrollment → Celery dispatch → email_logs status=sent → real emails arriving in inbox. Verified with the FR locale: enrollment ("Bienvenue chez Fashion Group S.A. Loyalty !") and welcome-bonus ("Vous avez gagné 50 points bonus !") both send within ~4s of submit.

The "no welcome email" symptom hid four layered bugs; each silently masked the next, which is why early diagnostics looked clean:

| # | Bug | Fix |
|---|---|---|
| 1 | `@shared_task` defaulted to `amqp://localhost//` because `celery_app.set_default()` was never called AND the api process never imported `celery_config`. `.delay()` raised `kombu.OperationalError: Connection refused`. | `44c42909`: `set_default()` + early import in `main.py` (with `# isort: split` so ruff doesn't reorder it). |
| 2 | The `on_failure` log handler crashed on the reserved `LogRecord` attribute name `args`; the resulting `KeyError` masked every real task exception. | `3e650ff8`: rename to `task_args` / `task_kwargs`. |
| 3 | `loyalty.send_notification_email` wasn't in the worker's task registry because `notifications.py` wasn't imported by `loyalty/tasks/__init__.py`. The worker received the message, couldn't find the task, and ACKed silently. | `2a216101`: add the import + `__all__` entry. |
| 4 | The Celery worker process never imported all models, so the first DB query failed with `InvalidRequestError: expression 'ContentPage' failed to locate a name`. | `5b21908b`: `_preload_all_module_models()` walks the registry and force-imports each module's models package at `celery_config` load. |
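Bug #2 reproduces with the standard library alone: `Logger.makeRecord` raises `KeyError` when an `extra=` key collides with an attribute every `LogRecord` already has (`args`, `msg`, `name`, ...) or with `message` / `asctime`. The sketch below shows the crash and one possible shape for the `extra=` sanitiser proposed under follow-up 4. `sanitize_extra` and the `task_` prefix are assumptions, picked to match the `task_args` naming from 3e650ff8.

```python
import logging

# Every attribute a LogRecord carries by default (built from a throwaway
# record so it tracks the running Python version), plus the two extra
# names logging also refuses in extra=.
RESERVED_LOGRECORD_ATTRS = frozenset(
    logging.LogRecord("probe", logging.INFO, "probe.py", 0, "", None, None).__dict__
) | {"message", "asctime"}


def sanitize_extra(extra: dict, prefix: str = "task_") -> dict:
    """Rename keys that would collide with LogRecord attributes (args -> task_args)."""
    return {
        (prefix + key if key in RESERVED_LOGRECORD_ATTRS else key): value
        for key, value in extra.items()
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("demo.task_base")

    try:
        # Same shape as the original on_failure handler: "args" is reserved.
        log.error("task failed", extra={"args": (1, 2), "kwargs": {}})
    except KeyError as exc:
        print("logging itself crashed:", exc)

    # Sanitised: "args" becomes "task_args"; "kwargs" is not reserved and is kept.
    log.error("task failed", extra=sanitize_extra({"args": (1, 2), "kwargs": {}}))
```

Renaming only the colliding keys leaves harmless ones such as `kwargs` untouched; the shipped fix renamed both for symmetry, which works just as well.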

Three earlier same-session commits also shipped: SMTP password eye toggle (64a178f4), JS error on /admin/loyalty/programs (8d6830fc), 422 on ProgramCreate (120532e6).

Audit finding

app/modules/prospecting/tasks/__init__.py has the same shape as bug #3 above — scan_tasks.py exists but isn't imported. Not blocking anything today (no prospecting Celery dispatch is wired up yet), but should be fixed alongside the unit-test pass below.
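A generic way to spot this shape without starting Celery: import a tasks package and list the sibling modules its `__init__.py` never pulled in. Because `@shared_task` only registers a task when its module is actually imported, anything reported here is a candidate silent-registration gap. `unimported_submodules` is a hypothetical helper, and it has to run in a fresh interpreter: a submodule imported earlier by anything else is already in `sys.modules` and will not be reported.

```python
import importlib
import pkgutil
import sys


def unimported_submodules(package_name: str) -> list[str]:
    """Import a package, then list direct submodules its __init__ did not import.

    Run in a fresh interpreter; earlier imports elsewhere hide gaps.
    """
    package = importlib.import_module(package_name)
    missing = []
    # iter_modules only looks one level down: direct children of the package.
    for info in pkgutil.iter_modules(package.__path__):
        full_name = f"{package_name}.{info.name}"
        if full_name not in sys.modules:
            missing.append(full_name)
    return sorted(missing)
```

Pointed at app.modules.prospecting.tasks, this should list scan_tasks until follow-up 5 lands; looping it over every module's tasks package is one cheap input to the audit in follow-up 6.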

Follow-ups (queued for next session)

  1. Two Test 1 nits — date format mm/dd/yyyy on FR storefront enrollment form (verify the <html lang> deploy actually landed; if it did, the user's browser doesn't honor lang for <input type="date"> and we need a JS date-picker swap); "Continuer mes achats" CTA on enroll-success.html:118 is wrong for loyalty-only storefronts with no catalog.
  2. Test 2 — cross-store re-enrollment at FASHIONOUTLET with the email from Test 1.
  3. Hetzner doc check — verify whether docs/deployment/hetzner-server-setup.md needs any new step from tonight's fixes. Most likely no (the fixes are in-code, not deployment), but worth a glance.
  4. Unit tests — none of the four B1-F bugs were caught by the existing suite. Add at minimum:
    • Assert celery_app.conf.broker_url is redis://... after importing main (catches future set_default() ordering regressions).
    • Assert loyalty.send_notification_email is in celery_app.tasks after importing app.modules.loyalty.tasks (catches future missing imports in task package __init__.py).
    • Assert configure_mappers() succeeds after importing app.core.celery_config (catches future missing-models regressions in celery).
    • Either assert task_base.on_failure doesn't crash on a synthetic failure, or standardize an extra= sanitiser that strips reserved LogRecord attribute names.
  5. Fix prospecting/tasks/__init__.py — add the missing import.
  6. Audit every other module's email path — are billing's trial-expiration emails really dispatched via Celery? Messaging's password-reset emails? If yes, same silent-failure risk exists until a real send hits prod. Add an integration test that triggers a representative email from every module and asserts an email_logs row appears within N seconds.
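For the integration test in item 6, the core piece is a bounded poll: trigger the send, then wait up to N seconds for the email_logs row. A minimal sketch of that helper; `wait_until` is hypothetical, and the query used with it stands in for whatever session and model access the test suite already has.

```python
import time
from collections.abc import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.25,
) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse.

    Returns True as soon as the predicate holds, False on timeout. Uses a
    monotonic clock so wall-clock jumps can't stretch or cut the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

A test would then read roughly `assert wait_until(lambda: count_email_logs(template="loyalty_enrollment", to=email) > 0, timeout=30)`, where `count_email_logs` is a hypothetical query helper. Returning a bool rather than raising keeps the failure message in the test's own assert.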

Status board

| # | Pre-launch step | State | Notes |
|---|---|---|---|
| 1 | Seed loyalty email templates on prod |  | 20 rows (5 templates × 4 locales), all `is_active=true` |
| 2 | Google Wallet config on Hetzner |  | Wallet config validator green: credentials valid, issuer 3388000000023089598, origin https://rewardflow.lu, default logo reachable |
| 3 | Database migrations |  | All four module heads current, incl. `loyalty_011` (acting-device audit), on prod |
| 4 | FR/DE/LB translations for analytics i18n keys | 🟡 | 8 keys still EN-only. Cosmetic, doesn't block soft launch |
| 5 | `messaging.manage_templates` permission for store owners | 🟡 | Only matters if merchants self-edit templates. Admin can edit centrally. Defer |
| 6 | 8 web user-journey E2E tests |  | The remaining gate: user does this with a real test customer |
| 6b | 6 Android terminal E2E tests |  | Pairing, PIN, daily flows, offline queue, auto-lock, device revoke; gated on user obtaining a tablet |
| 7 | Google Wallet real-device pass test |  | Already confirmed earlier: cards register, points/redeem visible on personal Google Wallet |
| 8 | Go live |  | Gated by #6. Clean up test data + enable platform feature flags for FASHIONHUB |
| 9 | Google Wallet production access |  | Post-launch, 1-3 day Google review. App-side change is zero; same issuer + service account, passes become publicly visible once approved |

What got sorted tonight

SMTP wired to a self-hosted mail server

Started here:

  • prod .env had EMAIL_PROVIDER=sendgrid + a SendGrid API key
  • SendGrid free trial (60 days) had expired
  • SMTP_* env vars were placeholders pointing at smtp.example.com

Discovered that /admin/settings lets you store SMTP config in the DB (table admin_settings, category email) and those values win over .env. User had already configured:

  • email_provider=smtp
  • smtp_host=mail1.myservices.hosting
  • smtp_port=465 ← problematic
  • smtp_user=support@wizard.lu / encrypted password
  • smtp_use_ssl=true, smtp_use_tls=false

Diagnosis from the prod container:

| Check | Result |
|---|---|
| DNS resolves `mail1.myservices.hosting` | `185.26.107.245` |
| TCP `mail1.myservices.hosting:465` | timed out |
| TCP `mail1.myservices.hosting:587` | open |

Either Hetzner blocks outbound 465 for this VPS, or the provider firewalls Hetzner's IP range on 465. Either way, port 587 (submission + STARTTLS) is the standard alternative and works from this host.
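The TCP rows in the diagnosis table can be reproduced from inside the container with only the standard library. `tcp_port_open` is a hypothetical helper, not something shipped in the repo.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves DNS, then tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass), refusals
        # and DNS failures (socket.gaierror is also an OSError subclass).
        return False


if __name__ == "__main__":
    for port in (465, 587):
        state = "open" if tcp_port_open("mail1.myservices.hosting", port) else "closed/filtered"
        print(f"mail1.myservices.hosting:{port} {state}")
```

A timeout, as on 465 here, usually means a firewall is silently dropping packets; an immediate "connection refused" would instead mean the host answered but nothing listens on that port. That distinction is why the result points at filtering rather than a misconfigured mail server.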

Fix: changed /admin/settings to port 587, SSL off, TLS on. Test email landed in inbox immediately, sender header Support Wizard <support@wizard.lu> — proving the DB override was being used.
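In stdlib terms, the settings change swaps implicit TLS on 465 (smtp_use_ssl=true, roughly `smtplib.SMTP_SSL`) for a plain connection on 587 upgraded with STARTTLS (smtp_use_tls=true). A minimal sketch of the 587 path; `send_via_submission` is hypothetical and is not the project's email_service, and `smtp_factory` exists only so the flow can be exercised without a server.

```python
import smtplib
import ssl
from email.message import EmailMessage


def send_via_submission(
    msg: EmailMessage,
    host: str,
    user: str,
    password: str,
    port: int = 587,
    timeout: float = 30.0,
    smtp_factory=smtplib.SMTP,
) -> None:
    """Send msg over SMTP submission: plain TCP connect, then STARTTLS, then AUTH."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtp_factory(host, port, timeout=timeout) as smtp:
        # Upgrade to TLS before any credentials go over the wire.
        smtp.starttls(context=context)
        smtp.login(user, password)  # smtplib re-sends EHLO after STARTTLS itself
        smtp.send_message(msg)
    # Leaving the with-block sends QUIT and closes the connection.
```

For the old 465 settings the equivalent would be `smtplib.SMTP_SSL(host, 465, context=context)` with no `starttls()` call, which is exactly the connection that timed out from this VPS.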

Cosmetic bug found and fixed

The test email's body claimed the configuration that would have been used if .env were authoritative — i.e. it said Provider: sendgrid and From: noreply@wizard.lu even though the actual send went via SMTP from support@wizard.lu. Two places in the code:

  1. app/modules/core/routes/api/admin_settings.py::send_test_email — body template hardcoded app_settings.email_provider and app_settings.email_from_address
  2. app/modules/messaging/services/email_service.py — the "template not found" EmailLog branch recorded settings.email_provider / settings.email_from_address instead of the effective config

Both now read from get_effective_email_config(db) / self._platform_config, so the test email page and audit logs reflect what was actually used.
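The precedence rule behind that fix (admin_settings values win over .env) is simple enough to sketch. This illustrates the rule only and is not the real get_effective_email_config(db): the dict-in/dict-out shape, the `merge_email_config` name, and treating None or an empty string as "not set in the DB" are all assumptions.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def merge_email_config(
    env_settings: Mapping[str, Any],
    db_overrides: Mapping[str, Any],
) -> dict[str, Any]:
    """Return the config a send will actually use: DB overrides beat .env values.

    Hypothetical helper. A DB value of None or "" is treated as "not overridden"
    so a blank admin field doesn't wipe out the .env value.
    """
    effective = dict(env_settings)  # copy; never mutate the caller's settings
    for key, value in db_overrides.items():
        if value is None or value == "":
            continue
        effective[key] = value
    return effective
```

Rendering the test-email body and the EmailLog provider/from fields from this merged view, rather than from the env-backed settings object, is what makes them report the configuration that was actually used.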

Commit: f2d1bdcd on master, deployed via Gitea Actions.

What the user does next

In priority order:

  1. Tonight or tomorrow — review email copy. Open /admin/email-templates and skim the 5 loyalty templates (EN locale). loyalty_enrollment, loyalty_welcome_bonus and loyalty_reward_ready are the customer-visible ones — adjust subject lines + body copy if anything reads off-brand.
  2. Walk the 8 web user-journey E2E tests: checklist at the bottom of app/modules/loyalty/docs/user-journeys.md. Use a personal email as the test customer.
  2b. Once a tablet is on hand, walk the 6 Android terminal tests: same doc, "Android Terminal Tests" section (Tests 9-14). Covers pairing (QR + manual), offline PIN bcrypt verify, daily flows (stamp/earn/redeem/enroll), offline queue drain, idle auto-lock, and device revocation cutoff.
  3. Flip live for FASHIONHUB — clean any test data, double-check Celery (docker compose ps | grep celery), enable loyalty feature on FASHIONHUB's stores via the admin UI.
  4. In parallel, request Google Wallet production access: pay.google.com/business/console → Wallet API → Manage → Request production access. Use sample pass screenshots from FASHIONHUB. Google reviews the Issuer, not individual merchants, so once approved all merchants on the platform are covered.

Open follow-ups (non-blocking)

These can wait but are worth tracking:

  • FR/DE/LB translations for the 8 analytics i18n keys (store.analytics.revenue_title, store.analytics.cohort_title, etc.). EN shows through; cosmetic only.
  • messaging.manage_templates permission discovery for merchant_owner role — needed if/when merchants self-edit templates. Admin can edit centrally for v1.
  • Failed-PIN-attempt reporting from Android tablet → server lockout counter — tablet bcrypts locally and silently fails; a stolen tablet's brute-forcer doesn't trip server-side lockout. Add a tiny POST /pins/{id}/record-failed-attempt endpoint plus a call from the PinViewModel's failure branch.
  • Splash screen + per-action success animation for the Android tablet — Phase F polish that was intentionally deferred.
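A sketch of the server-side half of the failed-PIN-attempt follow-up: the counter that a POST /pins/{id}/record-failed-attempt endpoint could feed. The thresholds (5 attempts, 15-minute lockout), the in-memory storage and the `PinLockout` name are all assumptions; a real implementation would persist the counter alongside the PIN row so it survives restarts and is shared across API workers.

```python
import time
from collections.abc import Callable


class PinLockout:
    """Count failed PIN attempts per PIN id and lock after too many.

    Hypothetical and in-memory only; clock is injectable so tests can move time.
    """

    def __init__(
        self,
        max_attempts: int = 5,
        lockout_seconds: float = 15 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_attempts = max_attempts
        self.lockout_seconds = lockout_seconds
        self._clock = clock
        self._failures: dict[str, int] = {}
        self._locked_until: dict[str, float] = {}

    def is_locked(self, pin_id: str) -> bool:
        return self._clock() < self._locked_until.get(pin_id, float("-inf"))

    def record_failed_attempt(self, pin_id: str) -> bool:
        """Register one failure reported by a tablet; return True if now locked."""
        if self.is_locked(pin_id):
            return True
        count = self._failures.get(pin_id, 0) + 1
        if count >= self.max_attempts:
            self._locked_until[pin_id] = self._clock() + self.lockout_seconds
            self._failures.pop(pin_id, None)  # start fresh once the lockout ends
            return True
        self._failures[pin_id] = count
        return False

    def record_success(self, pin_id: str) -> None:
        """A correct PIN clears the failure count."""
        self._failures.pop(pin_id, None)
```

The tablet would keep verifying against its local bcrypt hash; the only change is that its failure branch also reports to the server, so repeated guesses on a stolen tablet count toward the server-side lockout.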

Reference