docs(loyalty): record 2026-05-30 widget i18n + cache-bust + 401 redirect + alertmanager finding
All checks were successful
Nine code commits shipped today (5f359283→c13e8e29) covering Test 5 widget/customer-module i18n, a 53-reference cache-bust sweep with FE-024 rule tightening, the customer-storefront 401-to-/account/login redirect, the loyalty redirect-flicker fix, the login JS i18n sweep, and a new scripts/deploy-api-only.sh script + Hetzner §16.5 split.

None of them are on prod yet: the deploy surfaced that the new dirty-tree gate is correctly blocking on monitoring/alertmanager/alertmanager.yml, which holds a SendGrid API key pasted into a tracked file. Knock-on finding: alertmanager has been running with a stale, empty SMTP config for 13+ days, and the file still references SendGrid instead of the post-migration smarthost, so prod's alerting is silently broken. User opted to fix the prod-readiness items first thing tomorrow before resuming Test 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -429,6 +429,210 @@ depth is cheap.
prospecting `tasks/__init__.py` missing import, other-module email
audit.

## 2026-05-30 update: Test 5 widget i18n + cache-bust sweep + 401 storefront redirect + critical prod-readiness findings

### Test 5: customer dashboard surfaced 2 i18n defects

After Test 5.1 (customer login) succeeded, `/account/dashboard` showed
two issues on the FR locale: the Loyalty Rewards card was hardcoded
English ("Loyalty Rewards" / "View your points & rewards" /
"Points Balance"), and the Account Summary section displayed the raw,
untranslated `customers.customer_number` key.

Root cause for the card: `StorefrontDashboardCard` is populated by
widget providers (loyalty, orders), and the widget contract had no
language threading. Root cause for the raw key: the customers-module
locale JSON has a redundant top-level `"customers"` wrapper, so the
real resolvable path is `customers.customers.customer_number` (the
same double-prefix pattern as `loyalty.loyalty.wallet.apple`).

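The double-prefix behaviour can be illustrated with a minimal sketch of a dotted-key lookup. The catalog layout, the `translate` helper and the French string below are assumptions made for illustration; they are not the project's actual i18n code.

```python
from typing import Any

# Hypothetical stand-in for the customers-module locale data: the module
# namespace "customers" wraps a file whose JSON also starts with "customers".
CATALOG: dict = {
    "customers": {                  # namespace (module name)
        "customers": {              # redundant top-level wrapper in the file
            "customer_number": "Numéro de client",  # assumed translation
        }
    }
}


def translate(key: str, catalog: dict = CATALOG) -> str:
    """Resolve a dotted key; return the raw key when any segment is missing."""
    node: Any = catalog
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return key  # unresolved keys surface verbatim in the page
        node = node[part]
    return node if isinstance(node, str) else key


single = translate("customers.customer_number")
double = translate("customers.customers.customer_number")
```

Under these assumptions the single-prefix key falls through and is rendered verbatim, which matches the raw key seen on the dashboard, while the double-prefix key resolves.
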
Fix in `5f359283`: added a `language` field to `WidgetContext`; the
customer dashboard route passes `request.state.language`, and the
loyalty and orders widget providers translate server-side via the new
`widget.*` namespace in their locale files (4 locales each). The same
commit fixed the 8 single-prefix references to use the actual
double-prefix path.

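A rough sketch of what threading the language through the widget contract might look like. Only the `language` field on `WidgetContext` comes from the notes; the other field, the string table and the provider function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WidgetContext:
    customer_id: int        # assumed field, for illustration only
    language: str = "en"    # the field added by the fix


# Hypothetical widget.* strings; the real ones live in per-locale JSON files.
WIDGET_STRINGS = {
    "en": {"widget.title": "Loyalty Rewards"},
    "fr": {"widget.title": "Récompenses fidélité"},
}


def loyalty_card_title(ctx: WidgetContext) -> str:
    """Translate server-side, falling back to English for unknown locales."""
    strings = WIDGET_STRINGS.get(ctx.language, WIDGET_STRINGS["en"])
    return strings["widget.title"]


# The route would build the context from request.state.language.
title_fr = loyalty_card_title(WidgetContext(customer_id=1, language="fr"))
```

The point of the design is that the provider never guesses the locale: whatever the route resolved for the request is what the card is rendered in.
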
### Cache-busting audit: FE-024 had two real gaps

User flagged that `?v=<commit-sha>` was missing from many assets. The
audit traced it to two problems in the FE-024 architecture rule:

1. The anti-pattern only matched `url_for('<module>_static', ...)` mount
   names, so it missed the bare `'static'` mount, which is what every
   persona `base.html` uses for shared JS / CSS / Tailwind output.
2. `base.html` files were in the rule's exception list, and those are
   exactly the files where most shared includes live.

Fix in `3ce94683`: swept 5 persona `base.html` files + 15 standalone
templates (login, register, forgot/reset password, error pages,
onboarding, invitation-accept, admin module-info/config, etc.),
converting 53 references to `.js`/`.css` files from raw
`url_for('static', ...)` to `static_v(request, 'static', ...)`. Then
tightened the FE-024 rule to add an anti-pattern for the bare `'static'`
mount and dropped `base.html` from the exception list (kept
`partials/`). Validator baseline unchanged at 126 warnings, 0 FE-024
hits.

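The tightened rule can be approximated with a small checker. This is a sketch only: the regular expression, the function name and the sample template are assumptions, and the real FE-024 validator may be structured quite differently.

```python
import re

# Flags raw url_for() calls on either the bare 'static' mount or a
# '<module>_static' mount when they point at a .js or .css file.
RAW_STATIC = re.compile(
    r"""url_for\(\s*['"](?:\w+_)?static['"]\s*,[^)]*\.(?:js|css)['"]"""
)

SAMPLE = """<script src="{{ url_for('static', path='shared/js/api-client.js') }}"></script>
<link href="{{ static_v(request, 'static', path='css/app.css') }}" rel="stylesheet">
<img src="{{ url_for('static', path='img/logo.png') }}">
<script src="{{ url_for('loyalty_static', path='js/loyalty-dashboard.js') }}"></script>
"""


def find_fe024_violations(template_text: str, template_path: str) -> list:
    """Return 1-based line numbers of un-versioned JS/CSS references."""
    if "partials/" in template_path:   # the one exception that was kept
        return []
    return [
        number
        for number, line in enumerate(template_text.splitlines(), start=1)
        if RAW_STATIC.search(line)
    ]


violations = find_fe024_violations(SAMPLE, "storefront/base.html")
```

In this sketch `base.html` is no longer exempt, image references are ignored because only `.js`/`.css` need the version query, and `static_v(...)` calls pass.
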
### 401 → /account/login redirect on customer storefront

User saw the loyalty dashboard render the "Rejoignez notre programme"
CTA even though they were enrolled. Diagnosis: the page route accepts
the customer cookie; JS then calls `/api/v1/storefront/loyalty/card`,
which requires the Bearer token from `localStorage.customer_token`. The
stored token was stale, the server returned 401, JS swallowed the
error, and the template's `x-show="!loading && !card"` branch fired
with the join CTA.

Fix in `a0ae6388`: added a `redirectIfCustomerAreaUnauthorized()` helper
to apiClient. On a `/account/*` page (other than `/account/login`) it
sets `window.location.href = '/account/login?next=<encoded-path>'`.
It is called from all three apiClient 401 handlers (request,
requestFormData, getBlob). Customer login now honours `?next=`
(alongside the legacy `?return=`). Also fixed `getToken()` and
`clearTokens()` path detection to recognise `/account/*` and
`/api/v1/storefront/*` (it was hardcoded to `/shop/*` from before the
migration to `/storefront`). Customer JWT TTL is 30 minutes
(`JWT_EXPIRE_MINUTES` env var, `middleware/auth.py:75`).

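The redirect decision itself is small enough to model. The real helper is JavaScript inside apiClient; the Python function below is only a sketch of the rule as described, with a hypothetical name, and it assumes that only the pathname is encoded into `next`.

```python
from typing import Optional
from urllib.parse import quote

LOGIN_PATH = "/account/login"


def customer_login_redirect(pathname: str) -> Optional[str]:
    """Return the login URL to navigate to after a 401, or None.

    Models the described rule: redirect only from /account/* pages, and
    never from the login page itself (that would loop).
    """
    if not pathname.startswith("/account/"):
        return None
    if pathname.startswith(LOGIN_PATH):
        return None
    # safe="" also encodes "/", roughly like encodeURIComponent in the browser.
    return f"{LOGIN_PATH}?next={quote(pathname, safe='')}"


target = customer_login_redirect("/account/loyalty")
```

Returning `None` for non-account pages keeps the helper inert everywhere else, so the three 401 handlers can call it unconditionally.
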
Followed up with `856db328`, which removed the dead `/shop/` predicates
entirely. Pure dead-code cleanup, no behaviour change.

### Loyalty redirect flicker: two-stage fix

User repro'd by deleting `localStorage.customer_token` and F5'ing
`/account/loyalty`: the "Rejoignez..." CTA flashed for about half a
second before the redirect landed.

Stage 1 (`b04b36a2`): flipped the initial state from `loading: false`
to `loading: true` in `loyalty-dashboard.js` and `loyalty-history.js`
so the template's `x-show="loading"` spinner covers the in-flight
window. This was not enough on its own: the API throw triggered the
caller's `.finally(() => loading = false)` *before* the browser
actually navigated, so Alpine re-rendered with the wrong state
mid-redirect.

Stage 2 (`6564f138`): in all three apiClient 401 handlers, return a
never-resolving `new Promise(() => {})` instead of throwing when the
redirect helper returns true. The caller's `await` never returns,
`.finally` never fires, and the spinner stays up until navigation.

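The Stage 2 idea can be shown by analogy in Python's asyncio, where awaiting a future that never completes parks the caller so its `finally` block does not run. This is an analogy under assumptions, not the apiClient code: all names are hypothetical, and unlike a browser navigation (which simply discards the page), the sketch has to cancel the parked task at the end to shut down cleanly.

```python
import asyncio


async def request(hang_on_redirect: bool) -> dict:
    """Model of a 401 handler that has already started a redirect."""
    if hang_on_redirect:
        await asyncio.Future()      # never completes, like new Promise(() => {})
    raise PermissionError("401")    # Stage 1 behaviour: throw to the caller


async def load_card(state: dict, hang_on_redirect: bool) -> None:
    try:
        await request(hang_on_redirect)
    except PermissionError:
        pass                        # the caller swallows the error
    finally:
        state["loading"] = False    # this is what un-hid the join CTA


async def spinner_still_up(hang_on_redirect: bool) -> bool:
    state = {"loading": True}
    task = asyncio.create_task(load_card(state, hang_on_redirect))
    await asyncio.sleep(0.05)       # the window before navigation lands
    still_loading = state["loading"]
    task.cancel()                   # clean-up for the sketch only
    try:
        await task
    except asyncio.CancelledError:
        pass
    return still_loading


stage1 = asyncio.run(spinner_still_up(hang_on_redirect=False))
stage2 = asyncio.run(spinner_still_up(hang_on_redirect=True))
```

With the throw, `loading` is already `False` inside the window; with the parked await it stays `True`, which is the state the spinner needs until the page is replaced.
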
### Login JS i18n sweep

`bbb481aa` translated the "Welcome back to your shopping experience"
branding subtitle on `/account/login`. `c9fe7171` translated the three
remaining hardcoded Alpine toasts in the same template: the
post-registration banner, the post-login success toast, and the
login-failure fallback. Two new `auth.*` keys × 4 locales; the third
reuses the existing `auth.invalid_credentials`.

### `.build-info` stale → new `scripts/deploy-api-only.sh`

User repeatedly redeployed and refreshed, but every redirect repro
still flickered. Eventually noticed in the browser console:
`loadCard https://.../js/loyalty-dashboard.js?v=acbe2eff:50`. The `?v=`
was yesterday's commit hash. The browser was serving cached pre-fix JS
because the cache-bust query never bumped.

Root cause: `?v=` is computed by `templates_config._asset_version()`
from `app/core/build_info.py`, which reads `.build-info`. That file is
bind-mounted from the host and is only written by `scripts/deploy.sh`
(lines 42–45). The manual `git pull && docker compose up --build api`
sequence everyone had been using never touched it, so `?v=` stayed
pinned at the last `deploy.sh` run's commit, even though every
intervening rebuild was correctly putting new code into the image.
Five hours of "is this even deployed?" debugging chased to root.

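A minimal sketch of why the query string stayed pinned. It assumes, without confirmation from the notes, that `.build-info` carries the commit hash on its first line; the function names and URL layout are hypothetical and are not the real `_asset_version()` or `static_v` implementations.

```python
def version_from_build_info(text: str, default: str = "dev") -> str:
    """Derive the cache-bust token from the contents of .build-info.

    Assumption: the first non-empty line is a commit hash.
    """
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line[:8]
    return default


def versioned_url(path: str, version: str) -> str:
    """Append the cache-bust query the way a static_v-style helper might."""
    return f"/static/{path}?v={version}"


# The image was rebuilt with new code, but the bind-mounted file on the host
# still held the hash written by the last deploy.sh run.
stale_file = "acbe2eff\n"
url = versioned_url("js/loyalty-dashboard.js", version_from_build_info(stale_file))
```

Because the token depends only on the file's contents, rebuilding the image changes the served JS but not the URL, so the browser keeps its cached copy.
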
`deploy.sh` itself wasn't a substitute because it's a CI/CD script: it
stashes the working tree, runs alembic, restarts every service in the
`full` profile (db, redis, api, celery-worker, celery-beat, flower),
and allows a 60s health budget. That is heavy and disruptive for an
api-only hotfix; the narrower manual pattern is correct, it was just
missing the `.build-info` write.

Built `scripts/deploy-api-only.sh` (`c13e8e29`) to fill the gap: it
refuses to run if the working tree is dirty, runs `git pull --ff-only`,
writes `.build-info`, runs `docker compose -f docker-compose.yml
--profile full up -d --build api` (api only; db/redis/celery
untouched), and applies a tight 30s health budget. Hetzner doc §16.5 is
split into 16.5a (code-only fix, defaults to the new script) and 16.5b
(full `deploy.sh` fallback for migrations / Dockerfile / requirements
changes).

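The order of operations in the new script can be sketched as a dry-run planner. The real script is shell; the function names here are hypothetical, and the sketch assumes the dirty-tree gate is based on `git status --porcelain` output, which the notes do not confirm.

```python
def is_dirty(porcelain_output: str) -> bool:
    """True when `git status --porcelain` listed at least one entry."""
    return any(line.strip() for line in porcelain_output.splitlines())


def plan_api_only_deploy(porcelain_output: str) -> list:
    """Return the ordered steps, or refuse when the tree is dirty."""
    if is_dirty(porcelain_output):
        raise RuntimeError("working tree is dirty; refusing to deploy")
    return [
        "git pull --ff-only",
        "write .build-info",
        "docker compose -f docker-compose.yml --profile full up -d --build api",
        "wait for api health (30s budget)",
    ]


clean_plan = plan_api_only_deploy("")
prod_status = " M monitoring/alertmanager/alertmanager.yml\n"
```

Feeding `prod_status` to the planner raises, which mirrors the gate that stopped today's deploy on the modified alertmanager config.
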
### 🔴 Critical prod-readiness findings: SG credential in git + alertmanager misconfigured post-SMTP-migration

The new dirty-tree gate blocked the deploy because
`monitoring/alertmanager/alertmanager.yml` has local modifications on
prod. Diff inspection:

```diff
- smtp_auth_password: '' # TODO: Paste your SG.xxx API key here
+ smtp_auth_password: 'SG.xxxxxxxxx' # TODO: Paste your SG.xxx API key here
```

Three production-readiness problems surfaced in one finding:

1. **A SendGrid API key is pasted into a tracked git file on prod**, and
   the in-repo template literally says "Paste your SG.xxx API key here"
   next to the empty value, actively encouraging the anti-pattern.
2. **The `alertmanager` container has been Up 13 days**, so it started
   *before* the credential was pasted (mtime 2026-05-29 01:09 UTC).
   The running alertmanager process is therefore still using the old
   empty `smtp_auth_password` from the file at container-start time.
   Any alert that needs to send email today silently fails: alerting
   has been broken for at least 13 days, probably longer.
3. **The SMTP migration earlier this year never touched
   `alertmanager.yml`.** That migration only updated the app's
   notification settings in the `email_settings` DB table; alertmanager
   reads its own config from disk and was never updated. So even with
   a properly loaded credential, the config still points at SendGrid
   instead of `mail1.myservices.hosting`.

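The reasoning behind findings 2 and 3 can be written down as a small check. This is a sketch under assumptions: the observation time and the SendGrid smarthost value in the example are illustrative, the function name is hypothetical, and the staleness test only holds if the process has not been reloaded since it started.

```python
from datetime import datetime, timedelta, timezone

EXPECTED_SMARTHOST = "mail1.myservices.hosting:587"  # post-migration target


def alerting_problems(config: dict, started_at: datetime,
                      config_mtime: datetime) -> list:
    """List reasons why alert email is unlikely to be delivered."""
    problems = []
    if config_mtime > started_at:
        problems.append("config changed after start; process holds the old file")
    if config.get("smtp_smarthost") != EXPECTED_SMARTHOST:
        problems.append("smtp_smarthost is not the post-migration smarthost")
    if not config.get("smtp_auth_password"):
        problems.append("smtp_auth_password is empty")
    return problems


observed_at = datetime(2026, 5, 30, 12, 0, tzinfo=timezone.utc)  # assumed
started_at = observed_at - timedelta(days=13)                    # "Up 13 days"
config_mtime = datetime(2026, 5, 29, 1, 9, tzinfo=timezone.utc)  # from the notes

on_disk = {
    "smtp_smarthost": "smtp.sendgrid.net:587",   # assumed current value
    "smtp_auth_password": "SG.xxxxxxxxx",
}
found = alerting_problems(on_disk, started_at, config_mtime)
```

Note that the on-disk password is non-empty, so the third check passes for the file even though the running process still has the empty value; that gap is exactly what the first check reports.
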
User decided to defer today's loyalty deploy and tackle the
alertmanager work first thing tomorrow: the production-readiness gate
ranks over incremental Test 5 progress, and fixing the root cause
(credential out of git + correct SMTP smarthost + alertmanager reload)
means the deploy will run clean without `--skip-worktree` gymnastics.

### Status board delta

- Step 6 (web user-journey E2E tests): Tests 1 ✅, 2 ✅, 3 ✅, 4 ✅,
  5.0 ✅, **5.1 in progress** (login + dashboard work; blocked on the
  prod deploy of today's fixes, which are queued on `gitea/master` but
  not yet served because of the unrelated alertmanager dirty-tree
  blocker).
- New step surfaced: **alerting infrastructure is silently broken
  in production** (13+ days). Should be tracked as a go-live blocker;
  prod is currently flying blind on alerting.

### Carry over for next session

User explicitly chose tomorrow's order: prod-readiness items 1+2 BEFORE
continuing Test 5.

1. **Trace the SG credential paste origin.** User claims sole-developer
   status but doesn't remember pasting. Grep shell history, check file
   ownership, find when the credential was introduced. Understand the
   path so it doesn't happen again.
2. **Update `alertmanager.yml`** for the SendGrid → SMTP migration that
   never landed: `smtp_smarthost: 'mail1.myservices.hosting:587'`,
   `smtp_auth_username: 'support@wizard.lu'`, the SMTP password from
   `/admin/settings`. Then SIGHUP alertmanager to hot-reload
   (`docker compose -f docker-compose.yml --profile full kill -s SIGHUP
   alertmanager`). Verify with a synthetic alert that email delivery
   actually works.
3. **Move credential out of git.** `git rm --cached
   monitoring/alertmanager/alertmanager.yml`, add it to `.gitignore`,
   and ship `monitoring/alertmanager/alertmanager.yml.example` as the
   template (with an empty placeholder + a comment pointing at the
   deploy doc for the real values). Closes the recurrence path.
4. **Deploy today's queued loyalty fixes.** With `alertmanager.yml`
   gitignored, the working tree on prod is clean and `bash
   scripts/deploy-api-only.sh` should run without the `--skip-worktree`
   dance. Then verify `?v=c13e8e29` (or later) on rendered assets.
5. **Re-run the loyalty redirect repro** to confirm the flicker is
   gone now that today's JS actually reaches the browser.
6. **Continue Test 5** from 5.1 → 5.2 (/account/loyalty, 168 pts) →
   5.3 (/account/loyalty/history).
7. **Standing backlog** (lower priority): DE/LB email template quality
   sweep, transaction categories permissions audit, routing pass,
   Hetzner doc check, B1-F unit tests, `prospecting/tasks/__init__.py`,
   other-module email audit.

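One way to back up item 3 is a small scan that flags SendGrid-style keys before a file is committed. This is a sketch, not an existing tool in the repo: the function name is hypothetical, and the pattern is deliberately loose (an `SG.` prefix followed by a run of at least eight key-like characters) because the exact key format is assumed rather than taken from the notes.

```python
import re

# Loose, assumed pattern: "SG." followed by 8+ URL-safe characters.
SENDGRID_KEY = re.compile(r"\bSG\.[A-Za-z0-9_-]{8,}")


def lines_with_inline_keys(text: str) -> list:
    """Return 1-based line numbers that appear to contain a SendGrid key."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SENDGRID_KEY.search(line)
    ]


template = "smtp_auth_password: '' # TODO: Paste your SG.xxx API key here\n"
on_prod = "smtp_auth_password: 'SG.xxxxxxxxx' # TODO: Paste your SG.xxx API key here\n"

clean_hits = lines_with_inline_keys(template)
prod_hits = lines_with_inline_keys(on_prod)
```

The short `SG.xxx` in the TODO comment is below the length threshold, so the empty template passes while the pasted value is flagged.
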
## Status board

| # | Pre-launch step | State | Notes |