docs(loyalty): record 2026-05-30 widget i18n + cache-bust + 401 redirect + alertmanager finding
All checks were successful
Nine code commits shipped today (5f359283→c13e8e29) covering Test 5 widget/customer-module i18n, a 53-reference cache-bust sweep with FE-024 rule tightening, the customer-storefront 401-to-/account/login redirect, the loyalty redirect-flicker fix, the login JS i18n sweep, and a new scripts/deploy-api-only.sh script plus the Hetzner §16.5 split. None of them is on prod yet: the deploy surfaced that the new dirty-tree gate is (correctly) blocking on monitoring/alertmanager/alertmanager.yml, which holds a SendGrid API key pasted into a tracked file. Knock-on finding: alertmanager has been running with a stale, empty SMTP config for 13+ days, AND the file still references SendGrid instead of the post-migration smarthost, so prod's alerting is silently broken. User opted to fix the prod-readiness items first thing tomorrow before resuming Test 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prospecting `tasks/__init__.py` missing import, other-module email
audit.

## 2026-05-30 update — Test 5 widget i18n + cache-bust sweep + 401 storefront redirect + critical prod-readiness findings

### Test 5 — customer dashboard surfaced 2 i18n defects

After Test 5.1 (customer login) succeeded, `/account/dashboard` showed
two issues in the FR locale: the Loyalty Rewards card was hardcoded in
English ("Loyalty Rewards" / "View your points & rewards" / "Points
Balance"), and the Account Summary section rendered the raw
`customers.customer_number` key.
|
|
Root cause for the card: `StorefrontDashboardCard` is populated by
widget providers (loyalty, orders), and the widget contract had no way
to thread the request language through to them. Root cause for the raw
key: the customers-module locale JSON has a redundant top-level
`"customers"` wrapper, so the real resolvable path is
`customers.customers.customer_number` (the same double-prefix pattern
as `loyalty.loyalty.wallet.apple`).
|
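A minimal sketch of why the single-prefix key came out raw, assuming the translator maps the first key segment to a module's locale file, walks the remaining segments through the nested JSON, and echoes the key back when a segment is missing. `resolve_key`, `CATALOGS` and the FR string are illustrative, not the project's actual translator or locale data:

```python
from typing import Any

# Illustrative FR locale data for the customers module, including the
# redundant top-level "customers" wrapper described above.
CATALOGS: dict[str, Any] = {
    "customers": {  # module namespace (first key segment)
        "customers": {  # the wrapper inside the JSON file itself
            "customer_number": "Numéro client",
        },
    },
}


def resolve_key(catalogs: dict[str, Any], key: str) -> str:
    """Walk a dotted key through nested dicts; return the key unchanged if it does not resolve."""
    node: Any = catalogs
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return key  # unresolved: the template renders the raw key
        node = node[part]
    return node if isinstance(node, str) else key


print(resolve_key(CATALOGS, "customers.customer_number"))            # -> customers.customer_number
print(resolve_key(CATALOGS, "customers.customers.customer_number"))  # -> Numéro client
```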
|
Fix in `5f359283`: added a `language` field to `WidgetContext`; the
customer dashboard route passes `request.state.language`, and the
loyalty and orders widget providers translate server-side via the new
`widget.*` namespace in their locale files (4 locales each). Also fixed
the 8 single-prefix references to use the actual double-prefix path.
|
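The shape of the `5f359283` change can be sketched as follows, assuming a provider receives the context object and looks up `widget.*` keys in a per-language table with an English fallback. Apart from `WidgetContext` and its new `language` field, every name here is hypothetical (`customer_id`, `DashboardCardSketch`, `LOYALTY_WIDGET_STRINGS`, `translate`, `loyalty_widget`), and the French strings are illustrative rather than the real locale values:

```python
from dataclasses import dataclass

# Illustrative per-language strings under the `widget.*` namespace
# (only two of the four locales shown).
LOYALTY_WIDGET_STRINGS: dict[str, dict[str, str]] = {
    "en": {
        "widget.title": "Loyalty Rewards",
        "widget.subtitle": "View your points & rewards",
    },
    "fr": {
        "widget.title": "Programme de fidélité",
        "widget.subtitle": "Consultez vos points et récompenses",
    },
}


@dataclass(frozen=True)
class WidgetContext:
    customer_id: int      # hypothetical existing field
    language: str = "en"  # new: threaded from request.state.language


@dataclass(frozen=True)
class DashboardCardSketch:
    title: str
    subtitle: str


def translate(table: dict[str, dict[str, str]], language: str, key: str,
              default_language: str = "en") -> str:
    """Look up key in the requested language, then the default language, then echo the key."""
    value = table.get(language, {}).get(key)
    if value is None:
        value = table.get(default_language, {}).get(key, key)
    return value


def loyalty_widget(ctx: WidgetContext) -> DashboardCardSketch:
    """The provider translates server-side, so the card arrives already localised."""
    return DashboardCardSketch(
        title=translate(LOYALTY_WIDGET_STRINGS, ctx.language, "widget.title"),
        subtitle=translate(LOYALTY_WIDGET_STRINGS, ctx.language, "widget.subtitle"),
    )
```

The route builds one context per request and hands the same object to every provider, so a new widget gets the language without any route change.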
|
### Cache-busting audit — FE-024 had two real gaps

User flagged that `?v=<commit-sha>` was missing from many assets. The
audit traced it to two gaps in the FE-024 architecture rule:

1. The anti-pattern only matched `url_for('<module>_static', ...)` mount
   names, so it missed the bare `'static'` mount that every persona
   `base.html` uses for shared JS / CSS / Tailwind output.
2. `base.html` files were on the rule's exception list, yet they are
   exactly the files where most shared includes live.
|
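The tightened rule can be approximated with a single regular expression. This is a hypothetical reconstruction, not FE-024's actual pattern: it assumes templates call `url_for(<mount>, path='...')`, flags raw `.js`/`.css` references on both the bare `'static'` mount and any `'<module>_static'` mount, ignores `static_v(...)` calls, and exempts only `partials/` (`RAW_STATIC_ASSET`, `EXEMPT_DIRS` and `fe024_hits` are illustrative names):

```python
import re

# Hypothetical FE-024-style anti-pattern: a raw url_for() on the bare 'static'
# mount or on any '<module>_static' mount, pointing at a .js or .css file.
RAW_STATIC_ASSET = re.compile(
    r"""url_for\(\s*['"](?:\w+_)?static['"][^)]*\.(?:js|css)['"]"""
)

# base.html is no longer exempt; only partials/ keeps its exception.
EXEMPT_DIRS = ("partials/",)


def fe024_hits(template_path: str, source: str) -> list[int]:
    """Return 1-based line numbers that still use a raw url_for() for JS/CSS."""
    if any(exempt in template_path for exempt in EXEMPT_DIRS):
        return []
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if RAW_STATIC_ASSET.search(line)
    ]
```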
|
Fix in `3ce94683`: swept 5 persona `base.html` files plus 15 standalone
templates (login, register, forgot/reset password, error pages,
onboarding, invitation-accept, admin module-info/config, etc.),
converting 53 `.js`/`.css` references from raw `url_for('static', ...)`
to `static_v(request, 'static', ...)`. Then tightened the FE-024 rule:
added an anti-pattern for the bare `'static'` mount and dropped
`base.html` from the exception list (kept `partials/`). The validator
baseline is unchanged at 126 warnings, with 0 FE-024 hits.
|
|
### 401 → /account/login redirect on customer storefront

User saw the loyalty dashboard render the "Rejoignez notre programme"
CTA even though they were enrolled. Diagnosis: the page route
authenticates via the customer cookie, but the JS then calls
`/api/v1/storefront/loyalty/card`, which requires the Bearer token from
`localStorage.customer_token`. The stored token was stale, so the
server returned 401; the JS swallowed the error, and the template's
`x-show="!loading && !card"` branch fired with the join CTA.
|
|
Fix in `a0ae6388`: added a `redirectIfCustomerAreaUnauthorized()`
helper to apiClient. On an `/account/*` page (other than
`/account/login` itself) it sets
`window.location.href = '/account/login?next=<encoded-path>'`. It is
called from all three apiClient 401 handlers (`request`,
`requestFormData`, `getBlob`). Customer login now honours `?next=`
(alongside the legacy `?return=`). Also fixed the path detection in
`getToken()` and `clearTokens()` to recognise `/account/*` and
`/api/v1/storefront/*` (it was hardcoded to `/shop/*` from before the
migration to `/storefront`). Customer JWT TTL is 30 minutes
(`JWT_EXPIRE_MINUTES` env var, `middleware/auth.py:75`).
|
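A Python transliteration of the redirect round trip, as a sketch under two assumptions: the JS helper only needs the current pathname, and the login page prefers `?next=` and falls back to the legacy `?return=`. The function names and the `/account/dashboard` default are illustrative, and the same-site check on the target is an added assumption; this entry does not say whether the real login validates it:

```python
from typing import Optional
from urllib.parse import parse_qs, quote

LOGIN_PATH = "/account/login"


def login_redirect_for(pathname: str) -> Optional[str]:
    """Mirror of the helper's decision: only /account/* pages (not the login page) redirect."""
    if not pathname.startswith("/account/") or pathname.rstrip("/") == LOGIN_PATH:
        return None
    # Same encoding as JS encodeURIComponent for a plain path: "/" becomes %2F.
    return f"{LOGIN_PATH}?next={quote(pathname, safe='')}"


def post_login_target(query: str, default: str = "/account/dashboard") -> str:
    """Prefer ?next=, fall back to the legacy ?return=, else the default."""
    params = parse_qs(query.lstrip("?"))
    for name in ("next", "return"):
        candidate = params.get(name, [""])[0]
        # Assumption: only same-site absolute paths are accepted ("//host" is protocol-relative).
        if candidate.startswith("/") and not candidate.startswith("//"):
            return candidate
    return default
```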
|
Followed up with `856db328`, which removes the dead `/shop/` predicates
entirely. Pure dead-code cleanup, no behaviour change.
|
|
### Loyalty redirect flicker — two-stage fix

User reproduced it by deleting `localStorage.customer_token` and
pressing F5 on `/account/loyalty`: the "Rejoignez..." CTA flashed for
about half a second before the redirect landed.

Stage 1 (`b04b36a2`): flipped the initial state from `loading: false`
to `loading: true` in `loyalty-dashboard.js` and `loyalty-history.js`,
so the template's `x-show="loading"` spinner covers the in-flight
window. That was NOT enough on its own: the API throw triggered the
caller's `.finally(() => loading = false)` *before* the browser actually
navigated, so Alpine re-rendered with the wrong state mid-redirect.

Stage 2 (`6564f138`): in all three apiClient 401 handlers, return a
never-resolving `new Promise(() => {})` instead of throwing when the
redirect helper returns true. The caller's `await` never returns,
`.finally` never fires, and the spinner stays up until navigation.
|
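The stage-2 trick translated into Python's asyncio, to show the mechanism rather than the real apiClient code: a future that nothing ever resolves plays the role of `new Promise(() => {})`, and cancelling the task stands in for the browser tearing the page down on navigation. All names are hypothetical; `never_settle_on_401=False` models stage 1 alone (redirect started, handler still throws):

```python
import asyncio


class Unauthorized(Exception):
    """Stands in for the error the 401 handler throws in stage 1."""


async def api_request(status: int, never_settle_on_401: bool) -> dict:
    if status == 401:
        # (the redirect helper would set window.location.href here)
        if never_settle_on_401:
            # Analogue of `return new Promise(() => {})`: a future nobody will
            # ever resolve, so the awaiting caller stays suspended.
            await asyncio.get_running_loop().create_future()
        raise Unauthorized()
    return {"points": 168}


async def load_card(state: dict, never_settle_on_401: bool) -> None:
    state["loading"] = True  # stage 1: start in the loading state
    try:
        state["card"] = await api_request(401, never_settle_on_401)
    except Unauthorized:
        state["card"] = None  # loading false + no card = the "join" CTA
    finally:
        state["loading"] = False  # like .finally(): runs only once the await settles


async def snapshot_mid_redirect(never_settle_on_401: bool) -> dict:
    state: dict = {"loading": False, "card": None}
    task = asyncio.create_task(load_card(state, never_settle_on_401))
    await asyncio.sleep(0.01)  # the request has failed, navigation not yet done
    snapshot = dict(state)
    task.cancel()  # navigation tears the page down
    try:
        await task
    except asyncio.CancelledError:
        pass
    return snapshot
```

With the never-settling future the snapshot still shows the spinner state; without it, `finally` has already cleared `loading` and the CTA branch would render.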
|
### Login JS i18n sweep

`bbb481aa` translated the "Welcome back to your shopping experience"
branding subtitle on `/account/login`. `c9fe7171` translated the three
remaining hardcoded Alpine toasts in the same template: the
post-registration banner, the post-login success toast and the
login-failure fallback. That needed two new `auth.*` keys × 4 locales;
the third toast reuses the existing `auth.invalid_credentials`.
|
|
### `.build-info` stale → new `scripts/deploy-api-only.sh`

User repeatedly redeployed and refreshed, but every redirect repro still
flickered. Eventually the browser console showed
`loadCard https://.../js/loyalty-dashboard.js?v=acbe2eff:50`: the `?v=`
was yesterday's commit hash. The browser was serving cached pre-fix JS
because the cache-bust query never changed.
|
|
Root cause: `?v=` is computed by `templates_config._asset_version()`
from `app/core/build_info.py`, which reads `.build-info`. That file is
bind-mounted from the host and is only written by `scripts/deploy.sh`
(lines 42–45). The manual `git pull && docker compose up --build api`
sequence everyone had been using never touched it, so `?v=` stayed
pinned at the commit of the last `deploy.sh` run, even though every
intervening rebuild was correctly putting new code into the image.
That closed out five hours of "is this even deployed?" debugging.
|
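A sketch of why rebuilds alone never moved `?v=`, assuming `.build-info` holds a `commit=<sha>` line and the version is its first 8 characters (the real file format and `_asset_version()` body are not shown here; `asset_version_from_text`, `asset_version` and `with_version` are hypothetical). The version comes from the bind-mounted file, not from the image, so nothing changes until something rewrites that file:

```python
from pathlib import Path


def asset_version_from_text(text: str, fallback: str = "dev") -> str:
    """Return the short commit hash from .build-info content (assumed `commit=<sha>` format)."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "commit" and value.strip():
            return value.strip()[:8]
    return fallback


def asset_version(build_info: Path, fallback: str = "dev") -> str:
    """Read the bind-mounted .build-info; rebuilding the image does not change this file."""
    try:
        return asset_version_from_text(build_info.read_text(encoding="utf-8"), fallback)
    except FileNotFoundError:
        return fallback


def with_version(url: str, version: str) -> str:
    """Append the cache-busting query parameter (?v=<sha>, or &v= if a query exists)."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}v={version}"
```

Under this model, `git pull && docker compose up --build api` swaps the code in the image but leaves the version string untouched, which matches the stale `?v=acbe2eff` seen in the console.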
|
`deploy.sh` itself wasn't a substitute because it's a CI/CD script: it
stashes the working tree, runs alembic, restarts every service in the
`full` profile (db, redis, api, celery-worker, celery-beat, flower) and
allows a 60s health budget. That is heavy and disruptive for an
api-only hotfix. The narrower manual pattern is correct; it was just
missing the `.build-info` write.
|
|
Built `scripts/deploy-api-only.sh` (`c13e8e29`) to fill the gap. It
refuses to run if the working tree is dirty, does `git pull --ff-only`,
writes `.build-info`, runs
`docker compose -f docker-compose.yml --profile full up -d --build api`
(api only; db/redis/celery untouched), and enforces a tight 30s health
budget. Hetzner doc §16.5 is now split into 16.5a (code-only fix,
defaulting to the new script) and 16.5b (full `deploy.sh` fallback for
migrations / Dockerfile / requirements changes).
|
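The dirty-tree gate, rendered in Python to show the logic rather than the script itself. It assumes the gate is based on `git status --porcelain` (v1 format, where untracked `??` files also count as dirty); the actual check in `deploy-api-only.sh` may differ, and `dirty_paths` / `require_clean_worktree` are hypothetical names:

```python
import subprocess


def dirty_paths(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` (v1) output into the paths it reports."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status columns and the separating space
        if " -> " in path:  # renames/copies report "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.append(path)  # note: paths with special characters arrive quoted
    return paths


def require_clean_worktree(repo: str = ".") -> None:
    """Refuse to continue if git reports any modified, staged or untracked file."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    dirty = dirty_paths(result.stdout)
    if dirty:
        raise SystemExit("refusing to deploy, working tree is dirty:\n  " + "\n  ".join(dirty))
```

On prod today this would list `monitoring/alertmanager/alertmanager.yml` (status ` M`). Ignored files are not reported by `git status --porcelain` unless `--ignored` is passed, so once that file is untracked and gitignored the gate passes.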
|
### 🔴 Critical prod-readiness findings — SG credential in git + alertmanager misconfigured post-SMTP-migration

The new dirty-tree gate blocked the deploy because
`monitoring/alertmanager/alertmanager.yml` has local modifications on
prod. Diff inspection:
|
|
```diff
- smtp_auth_password: '' # TODO: Paste your SG.xxx API key here
+ smtp_auth_password: 'SG.xxxxxxxxx' # TODO: Paste your SG.xxx API key here
```
|
|
Three production-readiness problems surfaced in one finding:

1. **A SendGrid API key is pasted into a tracked git file on prod**, and
   the in-repo template literally says "Paste your SG.xxx API key here"
   next to the empty value, actively encouraging the anti-pattern.
2. **The `alertmanager` container has been Up 13 days**, so it started
   *before* the credential was pasted (file mtime 2026-05-29 01:09 UTC).
   The running alertmanager process is therefore still using the empty
   `smtp_auth_password` it read at container start. Any alert that needs
   to send email today silently fails: email alerting has been broken
   for at least 13 days, probably longer.
3. **The SMTP migration earlier this year never touched
   `alertmanager.yml`.** That migration only updated the app's
   notification settings in the `email_settings` DB table; alertmanager
   reads its own config from disk, and that file was never updated. So
   even with a properly loaded credential, the config still points at
   SendGrid instead of `mail1.myservices.hosting`.
|
|
User decided to defer today's loyalty deploy and tackle the
alertmanager work first thing tomorrow. The production-readiness gate
ranks above incremental Test 5 progress, and fixing the root cause
(credential out of git + correct SMTP smarthost + alertmanager reload)
means the deploy will run clean without `--skip-worktree` gymnastics.
|
|
### Status board delta

- Step 6 (web user-journey E2E tests): Tests 1 ✅, 2 ✅, 3 ✅, 4 ✅,
  5.0 ✅, **5.1 in progress** (login + dashboard work; blocked on the
  prod deploy of today's fixes, which are queued on `gitea/master` but
  not yet served because of the unrelated alertmanager dirty-tree
  blocker).
- New step surfaced: **alerting infrastructure is silently broken in
  production** (13+ days). It should be tracked as a go-live blocker;
  prod is currently flying blind on alerting.
|
|
### Carry over for next session

User explicitly chose tomorrow's order: prod-readiness items 1+2 BEFORE
continuing Test 5.
|
|
1. **Trace the SG credential paste origin**: the user claims
   sole-developer status but doesn't remember pasting it. Grep shell
   history, check file ownership, and find when the credential was
   introduced. Understand the path so it doesn't happen again.
2. **Update `alertmanager.yml`** for the SendGrid → SMTP migration that
   never landed: `smtp_smarthost: 'mail1.myservices.hosting:587'`,
   `smtp_auth_username: 'support@wizard.lu'`, and the SMTP password from
   `/admin/settings`. Then SIGHUP alertmanager to hot-reload
   (`docker compose -f docker-compose.yml --profile full kill -s SIGHUP alertmanager`).
   Verify with a synthetic alert that email delivery actually works.
3. **Move the credential out of git**:
   `git rm --cached monitoring/alertmanager/alertmanager.yml`, add the
   file to `.gitignore`, and ship
   `monitoring/alertmanager/alertmanager.yml.example` as the template
   (with an empty placeholder + a comment pointing at the deploy doc for
   the real values). This closes the recurrence path.
4. **Deploy today's queued loyalty fixes**: with `alertmanager.yml`
   gitignored, the working tree on prod is clean and
   `bash scripts/deploy-api-only.sh` should run without the
   `--skip-worktree` dance. Then verify `?v=c13e8e29` (or later) on
   rendered assets.
5. **Re-run the loyalty redirect repro** to confirm the flicker is gone
   now that today's JS actually reaches the browser.
6. **Continue Test 5** from 5.1 → 5.2 (`/account/loyalty`, 168 pts) →
   5.3 (`/account/loyalty/history`).
7. **Standing backlog** (lower priority): DE/LB email template quality
   sweep, transaction categories permissions audit, routing pass,
   Hetzner doc check, B1-F unit tests, `prospecting/tasks/__init__.py`
   missing import, other-module email audit.
|
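Item 2's synthetic-alert check could be scripted against Alertmanager's v2 HTTP API (`POST /api/v2/alerts`, which accepts a JSON array of alerts). This is a sketch under assumptions: Alertmanager is reachable at `http://localhost:9093` from wherever it runs, and the `alertname`/`severity` labels are placeholders that must match a route leading to the email receiver in `alertmanager.yml`. `synthetic_alert` and `post_alerts` are hypothetical helpers:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def synthetic_alert(now: datetime, minutes: int = 5) -> list[dict]:
    """Build a one-alert payload for POST /api/v2/alerts (labels are placeholders)."""
    return [{
        "labels": {"alertname": "SyntheticEmailTest", "severity": "warning"},
        "annotations": {"summary": "Synthetic alert to verify SMTP delivery"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alerts: list[dict], base_url: str = "http://localhost:9093") -> int:
    """POST the alerts to Alertmanager's v2 API and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

After the SIGHUP reload, `post_alerts(synthetic_alert(datetime.now(timezone.utc)))` should return HTTP 200, and the email should arrive once the matching route's `group_wait` has elapsed.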
|
## Status board

| # | Pre-launch step | State | Notes |