diff --git a/docs/proposals/loyalty-go-live-readiness.md b/docs/proposals/loyalty-go-live-readiness.md
index bda1934b..b2ce4614 100644
--- a/docs/proposals/loyalty-go-live-readiness.md
+++ b/docs/proposals/loyalty-go-live-readiness.md
@@ -429,6 +429,210 @@ depth is cheap.
 prospecting `tasks/__init__.py` missing import, other-module email audit.
+## 2026-05-30 update — Test 5 widget i18n + cache-bust sweep + 401 storefront redirect + critical prod-readiness findings
+
+### Test 5 — customer dashboard surfaced 2 i18n defects
+
+After Test 5.1 (customer login) succeeded, `/account/dashboard` showed
+two issues on FR locale: the Loyalty Rewards card was hardcoded English
+("Loyalty Rewards" / "View your points & rewards" / "Points Balance")
+and the Account Summary section had a raw `customers.customer_number`
+key.
+
+Root cause for the card: `StorefrontDashboardCard` is populated by
+widget providers (loyalty, orders), and the widget contract had no
+language threading. Root cause for the raw key: the customers-module
+locale JSON has a redundant top-level `"customers"` wrapper, so the
+real resolvable path is `customers.customers.customer_number` (the
+same double-prefix pattern as `loyalty.loyalty.wallet.apple`).
+
+Fix in `5f359283`: added `language` field to `WidgetContext`, customer
+dashboard route passes `request.state.language`, loyalty and orders
+widget providers translate server-side via the new `widget.*` namespace
+in their locale files (4 locales each). Fixed the 8 single-prefix
+references to use the actual double-prefix path.
+
+### Cache-busting audit — FE-024 had two real gaps
+
+User flagged that `?v=` was missing from many assets. Audit
+traced it to two problems in the FE-024 architecture rule:
+
+1. The anti-pattern only matched `url_for('_static', ...)` mount
+   names — missed the bare `'static'` mount which is what every persona
+   `base.html` uses for shared JS / CSS / Tailwind output.
+2. `base.html` files were in the rule's exception list — exactly the
+   files where most shared includes live.
+
+Fix in `3ce94683`: swept 5 persona `base.html` files + 15 standalone
+templates (login, register, forgot/reset password, error pages,
+onboarding, invitation-accept, admin module-info/config, etc.) — 53
+references for `.js`/`.css` files converted from raw `url_for('static',
+...)` to `static_v(request, 'static', ...)`. Then tightened the FE-024
+rule to add an anti-pattern for the bare `'static'` mount and dropped
+`base.html` from the exception list (kept `partials/`). Validator
+baseline unchanged at 126 warnings, 0 FE-024 hits.
+
+### 401 → /account/login redirect on customer storefront
+
+User saw the loyalty dashboard render the "Rejoignez notre programme"
+CTA even though they were enrolled. Diagnosis: the page route accepts
+the customer cookie; JS then calls `/api/v1/storefront/loyalty/card`
+which requires the Bearer token from `localStorage.customer_token`. The
+stored token was stale, server returned 401, JS swallowed it, the
+template's `x-show="!loading && !card"` branch fired with the join
+CTA.
+
+Fix in `a0ae6388`: added `redirectIfCustomerAreaUnauthorized()` helper
+to apiClient. On a `/account/*` page (and not on `/account/login`) it
+sets `window.location.href = '/account/login?next='`.
+Called from all three apiClient 401 handlers (request, requestFormData,
+getBlob). Customer login now honours `?next=` (alongside the legacy
+`?return=`). Also fixed `getToken()` and `clearTokens()` path detection
+to recognise `/account/*` and `/api/v1/storefront/*` (was hardcoded to
+`/shop/*` from before the migration to `/storefront`). Customer JWT
+TTL is 30 minutes (`JWT_EXPIRE_MINUTES` env var,
+`middleware/auth.py:75`).
+
+Followed up with `856db328` — removed the dead `/shop/` predicates
+entirely. Pure dead-code cleanup, no behaviour change.
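Reviewer sketch (not part of the commit): the redirect helper described in this section could look roughly like the TypeScript below. Only the name `redirectIfCustomerAreaUnauthorized` comes from the text; `LOGIN_PATH`, `LocationLike`, `customerLoginRedirectTarget`, and passing the location in as an argument are hypothetical choices made so the logic can be exercised outside a browser. The value carried by `next` is elided in the text, so sending the current path plus query string is an assumption.

```typescript
// Illustrative sketch only. Everything except the helper name
// `redirectIfCustomerAreaUnauthorized` is an assumption, not the project's code.

const LOGIN_PATH = "/account/login";

// Minimal shape of `window.location` that the helper needs (assumed, for testability).
interface LocationLike {
  pathname: string;
  search: string;
  href: string;
}

// Returns the login URL to navigate to, or null when no redirect applies.
// Assumption: `next` carries the current path plus query string.
function customerLoginRedirectTarget(pathname: string, search: string): string | null {
  const inCustomerArea = pathname.startsWith("/account/");
  const onLoginPage = pathname === LOGIN_PATH;
  if (!inCustomerArea || onLoginPage) {
    return null;
  }
  return `${LOGIN_PATH}?next=${encodeURIComponent(pathname + search)}`;
}

// Called from a 401 handler: starts navigation and reports whether it did.
function redirectIfCustomerAreaUnauthorized(loc: LocationLike): boolean {
  const target = customerLoginRedirectTarget(loc.pathname, loc.search);
  if (target === null) {
    return false;
  }
  loc.href = target; // on a real `window.location`, this triggers navigation
  return true;
}
```

In a browser the call would presumably be `redirectIfCustomerAreaUnauthorized(window.location)` inside each 401 handler. Returning a boolean lets the handler decide what to do when no redirect applies, for example on `/account/login` itself or outside `/account/*`.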
+
+### Loyalty redirect flicker — two-stage fix
+
+User repro'd by deleting `localStorage.customer_token` and F5'ing
+`/account/loyalty` — saw the "Rejoignez..." CTA flash for ~half a
+second before the redirect landed. Stage 1 (`b04b36a2`): flipped
+`loading: false` → `loading: true` initial state in `loyalty-dashboard.js`
+and `loyalty-history.js` so the template's `x-show="loading"` spinner
+covers the in-flight window. NOT enough on its own — the API throw
+triggered the caller's `.finally(() => loading = false)` *before* the
+browser actually navigated, so Alpine re-rendered with the wrong
+state mid-redirect. Stage 2 (`6564f138`): in all three apiClient 401
+handlers, return a never-resolving `new Promise(() => {})` instead of
+throwing when the redirect helper returns true. Caller's `await` never
+returns, `.finally` never fires, spinner stays up until navigation.
+
+### Login JS i18n sweep
+
+`bbb481aa` translated the "Welcome back to your shopping experience"
+branding subtitle on `/account/login`. `c9fe7171` translated the three
+remaining hardcoded Alpine toasts in the same template:
+post-registration banner, post-login success toast, login-failure
+fallback. Two new `auth.*` keys × 4 locales; the third reuses the
+existing `auth.invalid_credentials`.
+
+### `.build-info` stale → new `scripts/deploy-api-only.sh`
+
+User repeatedly redeployed and refreshed but every redirect repro still
+flickered. Eventually noticed in the browser console:
+`loadCard https://.../js/loyalty-dashboard.js?v=acbe2eff:50` — the
+`?v=` was yesterday's commit hash. Browser was serving cached pre-fix
+JS because the cache-bust query never bumped.
+
+Root cause: `?v=` is computed by `templates_config._asset_version()`
+from `app/core/build_info.py`, which reads `.build-info`. That file is
+bind-mounted from the host and is only written by `scripts/deploy.sh`
+(lines 42–45). The manual `git pull && docker compose up --build api`
+sequence everyone had been using never touched it, so `?v=` stayed
+pinned at the last `deploy.sh` run's commit — even though every
+intervening rebuild was correctly putting new code into the image.
+Five hours of "is this even deployed?" debugging chased to root.
+
+`deploy.sh` itself wasn't a substitute because it's a CI/CD script —
+stashes the working tree, runs alembic, restarts every service in the
+`full` profile (db, redis, api, celery-worker, celery-beat, flower),
+60s health budget. Heavy and disruptive for an api-only hotfix; the
+narrower manual pattern is correct, it was just missing the
+`.build-info` write.
+
+Built `scripts/deploy-api-only.sh` (`c13e8e29`) to fill the gap:
+refuses if working tree is dirty, `git pull --ff-only`, writes
+`.build-info`, `docker compose -f docker-compose.yml --profile full
+up -d --build api` (api only — db/redis/celery untouched), tight 30s
+health budget. Hetzner doc §16.5 split into 16.5a (code-only fix,
+default to the new script) and 16.5b (full `deploy.sh` fallback for
+migrations / Dockerfile / requirements changes).
+
+### 🔴 Critical prod-readiness findings — SG credential in git + alertmanager misconfigured post-SMTP-migration
+
+The new dirty-tree gate blocked the deploy because
+`monitoring/alertmanager/alertmanager.yml` has local modifications on
+prod. Diff inspection:
+
+```diff
+- smtp_auth_password: '' # TODO: Paste your SG.xxx API key here
++ smtp_auth_password: 'SG.xxxxxxxxx' # TODO: Paste your SG.xxx API key here
+```
+
+Three production-readiness problems surfaced in one finding:
+
+1. **A SendGrid API key is pasted into a tracked git file on prod**, and
+   the in-repo template literally says "Paste your SG.xxx API key here"
+   next to the empty value — actively encouraging the anti-pattern.
+2. **The `alertmanager` container has been Up 13 days**, started
+   *before* the credential was pasted (mtime 2026-05-29 01:09 UTC).
+   So the running alertmanager process is still using the old empty
+   `smtp_auth_password` from the file at container-start time. Any
+   alert that needs to send email today silently fails — alerting has
+   been broken for at least 13 days, probably longer.
+3. **The SMTP migration earlier this year never touched
+   `alertmanager.yml`.** That migration only updated the app's
+   notification settings in the `email_settings` DB table; alertmanager
+   reads its own config from disk and was never updated. So even with
+   a properly-loaded credential, the config still points at SendGrid
+   instead of `mail1.myservices.hosting`.
+
+User decided to defer today's loyalty deploy and tackle the
+alertmanager work as the first thing tomorrow — production-readiness
+gate outranks incremental Test 5 progress, and fixing the root
+cause (credential out of git + correct SMTP smarthost + alertmanager
+reload) means the deploy will run clean without `--skip-worktree`
+gymnastics.
+
+### Status board delta
+
+- Step 6 (web user-journey E2E tests) — Tests 1 ✅, 2 ✅, 3 ✅, 4 ✅,
+  5.0 ✅, **5.1 in progress** (login + dashboard work, blocked on
+  prod deploy of today's fixes which are queued on `gitea/master` but
+  not yet served because of the unrelated alertmanager dirty-tree
+  blocker).
+- New step surfaced — **alerting infrastructure is silently broken
+  in production** (13+ days). Should be tracked as a go-live blocker;
+  prod is currently flying blind on alerting.
+
+### Carry over for next session
+
+User explicitly chose tomorrow's order: prod-readiness items 1+2 BEFORE
+continuing Test 5.
+
+1. **Trace the SG credential paste origin** — user claims sole-developer
+   status but doesn't remember pasting. Grep shell history, check file
+   ownership, find when the credential was introduced. Understand the
+   path so it doesn't happen again.
+2. **Update `alertmanager.yml`** for the SendGrid → SMTP migration that
+   never landed: `smtp_smarthost: 'mail1.myservices.hosting:587'`,
+   `smtp_auth_username: 'support@wizard.lu'`, the SMTP password from
+   `/admin/settings`. Then SIGHUP alertmanager to hot-reload
+   (`docker compose -f docker-compose.yml --profile full kill -s SIGHUP
+   alertmanager`). Verify with a synthetic alert that email delivery
+   actually works.
+3. **Move credential out of git** — `git rm --cached
+   monitoring/alertmanager/alertmanager.yml`, add to `.gitignore`,
+   ship `monitoring/alertmanager/alertmanager.yml.example` as the
+   template (with empty placeholder + comment pointing at the deploy
+   doc for the real values). Closes the recurrence path.
+4. **Deploy today's queued loyalty fixes** — with `alertmanager.yml`
+   gitignored, the working tree on prod is clean and `bash
+   scripts/deploy-api-only.sh` should run without the `--skip-worktree`
+   dance. Then verify `?v=c13e8e29` (or later) on rendered assets.
+5. **Re-run the loyalty redirect repro** to confirm the flicker is
+   gone now that today's JS actually reaches the browser.
+6. **Continue Test 5** from 5.1 → 5.2 (/account/loyalty, 168 pts) →
+   5.3 (/account/loyalty/history).
+7. **Standing backlog** (lower priority): DE/LB email template quality
+   sweep, transaction categories permissions audit, routing pass,
+   Hetzner doc check, B1-F unit tests, `prospecting/tasks/__init__.py`,
+   other-module email audit.
+
 ## Status board
 | # | Pre-launch step | State | Notes |