docs(loyalty): record 2026-05-30 widget i18n + cache-bust + 401 redirect + alertmanager finding

Nine code commits landed today (5f359283–c13e8e29) covering Test 5
widget/customer-module i18n, a 53-reference cache-bust sweep across 20
templates with FE-024 rule tightening, the customer-storefront
401-to-/account/login redirect, the loyalty redirect-flicker fix, the
login JS i18n sweep, and a new scripts/deploy-api-only.sh script +
Hetzner §16.5 split.
None of them are on prod yet: the deploy surfaced that the new
dirty-tree gate is (correctly) blocking on
monitoring/alertmanager/alertmanager.yml, which has a SendGrid API key
pasted into a tracked file. Knock-on finding: alertmanager has been
running with a stale, empty SMTP password for 13+ days, AND the file
still references SendGrid instead of the post-migration smarthost, so
prod's alerting is silently broken. User opted to fix the
prod-readiness items first thing tomorrow before resuming Test 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 23:20:07 +02:00
parent c13e8e29b5
commit cff0b3f911


@@ -429,6 +429,210 @@ depth is cheap.
prospecting `tasks/__init__.py` missing import, other-module email
audit.
## 2026-05-30 update — Test 5 widget i18n + cache-bust sweep + 401 storefront redirect + critical prod-readiness findings
### Test 5 — customer dashboard surfaced 2 i18n defects
After Test 5.1 (customer login) succeeded, `/account/dashboard` showed
two issues in the FR locale: the Loyalty Rewards card was hardcoded English
("Loyalty Rewards" / "View your points & rewards" / "Points Balance")
and the Account Summary section had a raw `customers.customer_number`
key.
Root cause for the card: `StorefrontDashboardCard` is populated by
widget providers (loyalty, orders), and the widget contract had no
language threading. Root cause for the raw key: the customers-module
locale JSON has a redundant top-level `"customers"` wrapper, so the
real resolvable path is `customers.customers.customer_number` (the
same double-prefix pattern as `loyalty.loyalty.wallet.apple`).
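A minimal sketch of why the single-prefix key stays raw, assuming the i18n layer resolves dotted keys by walking the module's nested locale data and returns the key itself when a segment is missing. The `resolve_key` helper and the sample dict are hypothetical, not the project's translator:

```python
from typing import Any


def resolve_key(locale: dict[str, Any], key: str) -> str:
    """Walk a dotted key through nested locale data.

    Hypothetical fallback: return the raw key when any segment is missing,
    which matches what the dashboard rendered for `customers.customer_number`.
    """
    node: Any = locale
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return key
        node = node[part]
    return node if isinstance(node, str) else key


# Hypothetical FR data: the module namespace `customers` wraps the JSON file,
# and the file itself adds a redundant top-level "customers" object.
fr_locale = {
    "customers": {  # module namespace
        "customers": {  # redundant wrapper inside the locale JSON
            "customer_number": "Numéro client",
        },
    },
}

print(resolve_key(fr_locale, "customers.customer_number"))
print(resolve_key(fr_locale, "customers.customers.customer_number"))
```

The first lookup stops at the wrapper object and falls back to the raw key; only the double-prefix path reaches the string.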
Fix in `5f359283`: added a `language` field to `WidgetContext`; the customer
dashboard route passes `request.state.language`, and the loyalty and orders
widget providers translate server-side via the new `widget.*` namespace
in their locale files (4 locales each). Fixed the 8 single-prefix
references to use the actual double-prefix path.
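The contract change can be sketched as follows, assuming a dataclass-style `WidgetContext` and an in-memory string table. `DashboardCard`, `translate`, `loyalty_widget`, the `customer_id` field and the FR strings are placeholders, not the module's actual code:

```python
from dataclasses import dataclass

# Placeholder strings for the loyalty provider's `widget.*` keys; the real
# text lives in the module's four locale JSON files.
LOYALTY_WIDGET_STRINGS: dict[str, dict[str, str]] = {
    "en": {
        "widget.title": "Loyalty Rewards",
        "widget.subtitle": "View your points & rewards",
    },
    "fr": {
        "widget.title": "Programme de fidélité",
        "widget.subtitle": "Consultez vos points et récompenses",
    },
}


@dataclass(frozen=True)
class WidgetContext:
    customer_id: int  # hypothetical pre-existing field
    language: str = "en"  # the new field, filled from request.state.language


@dataclass(frozen=True)
class DashboardCard:
    title: str
    subtitle: str


def translate(key: str, language: str, default: str = "en") -> str:
    """Look up a widget key, falling back to the default locale."""
    table = LOYALTY_WIDGET_STRINGS.get(language, LOYALTY_WIDGET_STRINGS[default])
    return table.get(key, LOYALTY_WIDGET_STRINGS[default][key])


def loyalty_widget(ctx: WidgetContext) -> DashboardCard:
    """The provider returns already-translated text, so the card template stays dumb."""
    return DashboardCard(
        title=translate("widget.title", ctx.language),
        subtitle=translate("widget.subtitle", ctx.language),
    )


print(loyalty_widget(WidgetContext(customer_id=1, language="fr")).title)
```

Translating in the provider keeps `StorefrontDashboardCard` unaware of locales; the only contract change is that every provider now receives the language.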
### Cache-busting audit — FE-024 had two real gaps
User flagged that `?v=<commit-sha>` was missing from many assets. Audit
traced it to two problems in the FE-024 architecture rule:
1. The anti-pattern only matched `url_for('<module>_static', ...)` mount
names — missed the bare `'static'` mount which is what every persona
`base.html` uses for shared JS / CSS / Tailwind output.
2. `base.html` files were in the rule's exception list — exactly the
files where most shared includes live.
Fix in `3ce94683`: swept 5 persona `base.html` files + 15 standalone
templates (login, register, forgot/reset password, error pages,
onboarding, invitation-accept, admin module-info/config, etc.),
converting 53 `.js`/`.css` references from raw `url_for('static', ...)`
to `static_v(request, 'static', ...)`. Then tightened the FE-024
rule to add an anti-pattern for the bare `'static'` mount and dropped
`base.html` from the exception list (kept `partials/`). Validator
baseline unchanged at 126 warnings, 0 FE-024 hits.
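A rough approximation of the tightened check, assuming the rule is a per-line regex with a path-based exception list. The pattern, `EXEMPT_DIRS` and `fe024_hits` are illustrative; the validator's real rule definition may differ:

```python
import re

# Flags a raw url_for() on either a `<module>_static` mount or the bare
# `static` mount when it points at a .js or .css file.
RAW_STATIC_URL = re.compile(
    r"url_for\(\s*['\"](?:\w+_static|static)['\"]\s*,[^)]*?\.(?:js|css)['\"]"
)

# base.html is no longer exempt; only partials/ keeps its exception.
EXEMPT_DIRS = ("partials/",)


def fe024_hits(template_path: str, source: str) -> list[int]:
    """Return 1-based line numbers that violate the sketched rule."""
    if any(part in template_path for part in EXEMPT_DIRS):
        return []
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if RAW_STATIC_URL.search(line)
    ]
```

A `static_v(request, 'static', ...)` call contains no `url_for(`, so converted references pass, while the bare-mount form that previously slipped through is now caught.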
### 401 → /account/login redirect on customer storefront
User saw the loyalty dashboard render the "Rejoignez notre programme"
CTA even though they were enrolled. Diagnosis: the page route accepts
the customer cookie; JS then calls `/api/v1/storefront/loyalty/card`
which requires the Bearer token from `localStorage.customer_token`. The
stored token was stale, so the server returned 401; the JS swallowed the
error, and the template's `x-show="!loading && !card"` branch rendered
the join CTA.
Fix in `a0ae6388`: added `redirectIfCustomerAreaUnauthorized()` helper
to apiClient. On a `/account/*` page (and not on `/account/login`) it
sets `window.location.href = '/account/login?next=<encoded-path>'`.
Called from all three apiClient 401 handlers (request, requestFormData,
getBlob). Customer login now honours `?next=` (alongside the legacy
`?return=`). Also fixed `getToken()` and `clearTokens()` path detection
to recognise `/account/*` and `/api/v1/storefront/*` (was hardcoded to
`/shop/*` from before the migration to `/storefront`). Customer JWT
TTL is 30 minutes (`JWT_EXPIRE_MINUTES` env var,
`middleware/auth.py:75`).
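The path logic is small enough to model outside the browser. A minimal sketch in Python (the real helper is JavaScript in apiClient), assuming `next` is encoded the way `encodeURIComponent` would encode it. The function names, the `/account/dashboard` default and the same-site check in `post_login_target` are illustrative assumptions:

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit

LOGIN_PATH = "/account/login"


def customer_login_redirect(pathname: str, search: str = "") -> Optional[str]:
    """Return the login URL to navigate to after a 401, or None.

    Only /account/* pages redirect, and never the login page itself,
    otherwise a 401 on the login page would loop.
    """
    if not pathname.startswith("/account/") or pathname.startswith(LOGIN_PATH):
        return None
    # safe="" also encodes "/", matching encodeURIComponent.
    return f"{LOGIN_PATH}?next={quote(pathname + search, safe='')}"


def post_login_target(url: str, default: str = "/account/dashboard") -> str:
    """Pick the post-login destination: `next` first, then legacy `return`.

    Hypothetical guard: accept only same-site absolute paths.
    """
    query = parse_qs(urlsplit(url).query)
    for param in ("next", "return"):
        values = query.get(param)
        if values and values[0].startswith("/") and not values[0].startswith("//"):
            return values[0]
    return default
```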
Followed up with `856db328` — removed the dead `/shop/` predicates
entirely. Pure dead-code cleanup, no behaviour change.
### Loyalty redirect flicker — two-stage fix
User repro'd by deleting `localStorage.customer_token` and F5'ing
`/account/loyalty` — saw the "Rejoignez..." CTA flash for ~half a
second before the redirect landed. Stage 1 (`b04b36a2`): flipped
`loading: false` → `loading: true` initial state in `loyalty-dashboard.js`
and `loyalty-history.js` so the template's `x-show="loading"` spinner
covers the in-flight window. NOT enough on its own — the API throw
triggered the caller's `.finally(() => loading = false)` *before* the
browser actually navigated, so Alpine re-rendered with the wrong
state mid-redirect. Stage 2 (`6564f138`): in all three apiClient 401
handlers, return a never-resolving `new Promise(() => {})` instead of
throwing when the redirect helper returns true. Caller's `await` never
returns, `.finally` never fires, spinner stays up until navigation.
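The same control-flow trick can be illustrated with asyncio, where an unresolved future plays the role of `new Promise(() => {})`. This is an analogy, not the apiClient code: `ViewState`, the two fake API calls and `spinner_after` are hypothetical, and task cancellation stands in for the browser navigating away:

```python
import asyncio
import contextlib
from dataclasses import dataclass


@dataclass
class ViewState:
    loading: bool = True  # Stage 1: start in the spinner state


async def api_call_that_throws() -> dict:
    raise PermissionError("401")  # pre-fix behaviour: throw after scheduling the redirect


async def api_call_never_settles() -> dict:
    # Stage 2 analogue: a future nobody resolves, so the caller stays suspended.
    return await asyncio.get_running_loop().create_future()


async def load_card(state: ViewState, api_call) -> None:
    try:
        await api_call()
    except PermissionError:
        pass  # the page JS swallowed the error
    finally:
        state.loading = False  # analogue of `.finally(() => loading = false)`


async def spinner_after(api_call) -> bool:
    """Is the spinner still up shortly after the 401?"""
    state = ViewState()
    task = asyncio.create_task(load_card(state, api_call))
    await asyncio.sleep(0.05)
    still_loading = state.loading
    task.cancel()  # stands in for the page being torn down by navigation
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return still_loading


print(asyncio.run(spinner_after(api_call_that_throws)))
print(asyncio.run(spinner_after(api_call_never_settles)))
```

The throwing version clears `loading` immediately, so the CTA branch renders. The never-settling version leaves the spinner up until the page goes away.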
### Login JS i18n sweep
`bbb481aa` translated the "Welcome back to your shopping experience"
branding subtitle on `/account/login`. `c9fe7171` translated the three
remaining hardcoded Alpine toasts in the same template:
post-registration banner, post-login success toast, login-failure
fallback. Two new `auth.*` keys × 4 locales; the third reuses the
existing `auth.invalid_credentials`.
### `.build-info` stale → new `scripts/deploy-api-only.sh`
User repeatedly redeployed and refreshed but every redirect repro still
flickered. Eventually noticed in the browser console:
`loadCard https://.../js/loyalty-dashboard.js?v=acbe2eff:50` — the
`?v=` was yesterday's commit hash. Browser was serving cached pre-fix
JS because the cache-bust query never bumped.
Root cause: `?v=` is computed by `templates_config._asset_version()`
from `app/core/build_info.py`, which reads `.build-info`. That file is
bind-mounted from the host and is only written by `scripts/deploy.sh`
(line 4245). The manual `git pull && docker compose up --build api`
sequence everyone had been using never touched it, so `?v=` stayed
pinned at the last `deploy.sh` run's commit — even though every
intervening rebuild was correctly putting new code into the image.
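A minimal sketch of the version lookup, assuming `.build-info` is a JSON file with a `commit` field. The real format read by `app/core/build_info.py` is not shown here, and `asset_version` and `versioned` are hypothetical stand-ins for `_asset_version()` and `static_v`:

```python
import json
from pathlib import Path


def asset_version(build_info: Path, fallback: str = "dev") -> str:
    """Short commit hash used as the `?v=` cache-bust token.

    Assumes a JSON `.build-info` with a `commit` field. The token changes
    only when something rewrites this file, not when the image is rebuilt.
    """
    try:
        data = json.loads(build_info.read_text(encoding="utf-8"))
        return str(data["commit"])[:8]
    except (OSError, ValueError, KeyError, TypeError):
        return fallback


def versioned(url: str, version: str) -> str:
    """Append v=<version>, keeping any query string already present."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}v={version}"
```

Because the bind-mounted file sits outside the image, `docker compose up --build api` can ship new JS while every rendered URL still carries the old hash.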
This stale `?v=` pin was the root cause behind five hours of "is this even deployed?" debugging.
`deploy.sh` itself wasn't a substitute because it's a CI/CD script: it
stashes the working tree, runs alembic, restarts every service in the
`full` profile (db, redis, api, celery-worker, celery-beat, flower), and
allows a 60s health budget. That's heavy and disruptive for an api-only
hotfix; the narrower manual pattern is correct, it was just missing the
`.build-info` write.
Built `scripts/deploy-api-only.sh` (`c13e8e29`) to fill the gap:
refuses to run if the working tree is dirty, runs `git pull --ff-only`,
writes `.build-info`, runs `docker compose -f docker-compose.yml
--profile full up -d --build api` (api only; db/redis/celery untouched),
and waits on a tight 30s health budget. Hetzner doc §16.5 split into 16.5a (code-only fix,
default to the new script) and 16.5b (full `deploy.sh` fallback for
migrations / Dockerfile / requirements changes).
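The script's control flow, sketched in Python for readability (the shipped script is Bash). The git and `docker compose` invocations come from the description above. The health URL, the JSON `.build-info` layout and the function names are assumptions:

```python
import json
import subprocess
import time
import urllib.request
from pathlib import Path

COMPOSE_UP = [
    "docker", "compose", "-f", "docker-compose.yml", "--profile", "full",
    "up", "-d", "--build", "api",  # api only: db/redis/celery untouched
]
HEALTH_URL = "http://localhost:8000/health"  # hypothetical endpoint


def is_dirty(porcelain: str) -> bool:
    """`git status --porcelain` prints nothing when the tree is clean."""
    return bool(porcelain.strip())


def wait_healthy(url: str, budget_s: float = 30.0) -> bool:
    """Poll until the endpoint answers 200 or the budget runs out."""
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # URLError/HTTPError are OSError subclasses
            pass
        time.sleep(1)
    return False


def deploy_api_only(repo: Path) -> None:
    def run(*cmd: str) -> str:
        return subprocess.run(
            cmd, cwd=repo, check=True, capture_output=True, text=True
        ).stdout

    if is_dirty(run("git", "status", "--porcelain")):
        raise SystemExit("refusing to deploy: working tree is dirty")
    run("git", "pull", "--ff-only")
    commit = run("git", "rev-parse", "HEAD").strip()
    # The step the manual sequence was missing: bump the cache-bust source.
    (repo / ".build-info").write_text(json.dumps({"commit": commit}) + "\n", encoding="utf-8")
    run(*COMPOSE_UP)
    if not wait_healthy(HEALTH_URL):
        raise SystemExit("api did not become healthy within 30s")


if __name__ == "__main__":
    deploy_api_only(Path.cwd())
```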
### 🔴 Critical prod-readiness findings — SG credential in git + alertmanager misconfigured post-SMTP-migration
The new dirty-tree gate blocked the deploy because
`monitoring/alertmanager/alertmanager.yml` has local modifications on
prod. Diff inspection:
```diff
- smtp_auth_password: '' # TODO: Paste your SG.xxx API key here
+ smtp_auth_password: 'SG.xxxxxxxxx' # TODO: Paste your SG.xxx API key here
```
Three production-readiness problems surfaced in one finding:
1. **A SendGrid API key is pasted into a tracked git file on prod**, and
the in-repo template literally says "Paste your SG.xxx API key here"
next to the empty value — actively encouraging the anti-pattern.
2. **The `alertmanager` container has been Up 13 days**, started
*before* the credential was pasted (mtime 2026-05-29 01:09 UTC).
So the running alertmanager process is still using the old empty
`smtp_auth_password` from the file at container-start time. Any
alert that needs to send email today silently fails — alerting has
been broken for at least 13 days, probably longer.
3. **The SMTP migration earlier this year never touched
`alertmanager.yml`.** That migration only updated the app's
notification settings in the `email_settings` DB table; alertmanager
reads its own config from disk and was never updated. So even with
a properly loaded credential, the config still points at SendGrid
instead of `mail1.myservices.hosting`.
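Finding 2 comes down to a timestamp comparison. A minimal sketch, assuming the container start time comes from `docker inspect -f '{{.State.StartedAt}}' <container>` (an RFC 3339 UTC string) and the file time from `stat`. The container start used in the test is a hypothetical value roughly 13 days before the recorded mtime:

```python
from datetime import datetime, timezone
from pathlib import Path


def file_mtime_utc(path: Path) -> datetime:
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def parse_docker_started_at(raw: str) -> datetime:
    """Parse StartedAt such as '2026-05-16T08:12:03.123456789Z'.

    Assumes Docker's UTC 'Z' form; keeping only whole seconds sidesteps
    fromisoformat's nanosecond handling on older Pythons.
    """
    return datetime.strptime(raw.strip()[:19], "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )


def running_config_is_stale(config_mtime: datetime, started_at: datetime) -> bool:
    """True when the file changed after the container started.

    A SIGHUP reload does not change StartedAt, so this only covers the
    no-reload case described in finding 2.
    """
    return config_mtime > started_at
```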
User decided to defer today's loyalty deploy and tackle the
alertmanager work as the first thing tomorrow — production-readiness
gate ranks over incremental Test 5 progress, and fixing the root
cause (credential out of git + correct SMTP smarthost + alertmanager
reload) means the deploy will run clean without `--skip-worktree`
gymnastics.
### Status board delta
- Step 6 (web user-journey E2E tests) — Tests 1 ✅, 2 ✅, 3 ✅, 4 ✅,
5.0 ✅, **5.1 in progress** (login + dashboard work, blocked on
prod deploy of today's fixes which are queued on `gitea/master` but
not yet served because of the unrelated alertmanager dirty-tree
blocker).
- New step surfaced — **alerting infrastructure is silently broken
in production** (13+ days). Should be tracked as a go-live blocker;
prod is currently flying blind on alerting.
### Carry over for next session
User explicitly chose tomorrow's order: prod-readiness items 1+2 BEFORE
continuing Test 5.
1. **Trace the SG credential paste origin** — user claims sole-developer
status but doesn't remember pasting. Grep shell history, check file
ownership, find when the credential was introduced. Understand the
path so it doesn't happen again.
2. **Update `alertmanager.yml`** for the SendGrid → SMTP migration that
never landed: `smtp_smarthost: 'mail1.myservices.hosting:587'`,
`smtp_auth_username: 'support@wizard.lu'`, the SMTP password from
`/admin/settings`. Then SIGHUP alertmanager to hot-reload
(`docker compose -f docker-compose.yml --profile full kill -s SIGHUP
alertmanager`). Verify with a synthetic alert that email delivery
actually works.
3. **Move credential out of git** — `git rm --cached
monitoring/alertmanager/alertmanager.yml`, add to `.gitignore`,
ship `monitoring/alertmanager/alertmanager.yml.example` as the
template (with empty placeholder + comment pointing at the deploy
doc for the real values). Closes the recurrence path.
4. **Deploy today's queued loyalty fixes** — with `alertmanager.yml`
gitignored, the working tree on prod is clean and `bash
scripts/deploy-api-only.sh` should run without the `--skip-worktree`
dance. Then verify `?v=c13e8e29` (or later) on rendered assets.
5. **Re-run the loyalty redirect repro** to confirm the flicker is
gone now that today's JS actually reaches the browser.
6. **Continue Test 5** from 5.1 → 5.2 (/account/loyalty, 168 pts) →
5.3 (/account/loyalty/history).
7. **Standing backlog** (lower priority): DE/LB email template quality
sweep, transaction categories permissions audit, routing pass,
Hetzner doc check, B1-F unit tests, `prospecting/tasks/__init__.py`,
other-module email audit.
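For the synthetic-alert check in item 2, a minimal sketch that posts one short-lived alert to Alertmanager's v2 HTTP API (`POST /api/v2/alerts`). The `localhost:9093` address (Alertmanager's default port) is an assumption about this deployment. So are the label values, which only produce an email if they match a route whose receiver has an email config:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def synthetic_alert(now: datetime, minutes: int = 5) -> list[dict]:
    """One test alert in the v2 API's postable-alert shape."""
    return [{
        "labels": {"alertname": "SyntheticEmailTest", "severity": "critical"},
        "annotations": {"summary": "Synthetic alert to verify SMTP delivery"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(base_url: str, alerts: list[dict]) -> int:
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    print(post_alerts("http://localhost:9093", synthetic_alert(datetime.now(timezone.utc))))
```

A 200 only means Alertmanager accepted the alert. Delivery is confirmed by the email actually arriving, or by an SMTP error in the alertmanager logs.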
## Status board
| # | Pre-launch step | State | Notes |