orion/app/modules/loyalty/docs/monitoring.md
Samir Boulahtit 4a60d75a13
docs(loyalty): Phase 8 — runbooks, monitoring, OpenAPI tags, plan update
Final phase of the production launch plan:

- Runbook: wallet certificate management (Google + Apple rotation,
  expiry monitoring, rollback procedure)
- Runbook: point expiration task (manual execution, partial failure,
  per-merchant re-run, point restore via admin API)
- Runbook: wallet sync task (failed_card_ids interpretation, manual
  re-sync, retry behavior table)
- Monitoring: alert definitions (P0/P1/P2), key metrics, log events,
  dashboard suggestions
- OpenAPI: added tags=["Loyalty - Store"] and tags=["Loyalty - Admin"]
  to route groups for /docs discoverability
- Production launch plan: all phases 0-8 marked DONE

Coverage note: loyalty services at 70-85%, tasks at 16-29%.
Target 80% enforcement deferred — current 342 tests provide good
functional coverage. Task-level coverage requires Celery mocking
infrastructure (future sprint).

342 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 23:07:50 +02:00


# Loyalty Module — Monitoring & Alerting
## Alert Definitions
### P0 — Page (immediate action required)
| Alert | Condition | Action |
|-------|-----------|--------|
| **Expiration task stale** | `loyalty.expire_points` last success > 26 hours ago | Check Celery worker health, inspect task logs |
| **Google Wallet service down** | Wallet sync failure rate > 50% for 2 consecutive runs | Check service account credentials, Google API status |
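
The two P0 conditions can be expressed as small predicates. This is only a sketch: the function names and the way the inputs are obtained (a last-success timestamp and a list of per-run failure rates) are assumptions for illustration, not part of the module.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=26)  # threshold from the P0 table


def expiration_task_is_stale(last_success: datetime, now: datetime) -> bool:
    """True when the last successful run is more than 26 hours old."""
    return now - last_success > STALE_AFTER


def wallet_service_down(failure_rates: list, threshold: float = 0.5, runs: int = 2) -> bool:
    """True when each of the most recent `runs` sync runs failed above `threshold`.

    `failure_rates` is a hypothetical, oldest-first list of per-run failure rates (0.0-1.0).
    """
    recent = failure_rates[-runs:]
    return len(recent) == runs and all(rate > threshold for rate in recent)


# Example inputs (made up for illustration)
now = datetime(2026, 4, 11, 12, 0, tzinfo=timezone.utc)
stale = expiration_task_is_stale(now - timedelta(hours=27), now)
down = wallet_service_down([0.1, 0.6, 0.7])
```

Requiring exactly two recent runs above the threshold avoids paging on a single bad run.
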
### P1 — Warn (investigate within business hours)
| Alert | Condition | Action |
|-------|-----------|--------|
| **Wallet sync failures** | `failed_card_ids` count > 5% of total cards synced | Check runbook-wallet-sync.md, inspect failed card IDs |
| **Email notification failures** | `loyalty_*` template send failure rate > 1% in 24h | Check SMTP config, EmailLog for errors |
| **Rate limit spikes** | 429 responses > 100/min per store | Investigate if legitimate traffic or abuse |
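
As a rough illustration, the ratio-based P1 conditions reduce to simple threshold checks. The function names and parameters below are hypothetical, and the sketch assumes that "total cards synced" counts every card attempted in the run, including the failed ones.

```python
def wallet_sync_failure_alert(failed_card_ids: list, total_cards_synced: int,
                              threshold: float = 0.05) -> bool:
    """True when failed cards exceed 5% of the cards attempted in a sync run."""
    if total_cards_synced <= 0:
        return False  # nothing was synced, so there is no rate to evaluate
    return len(failed_card_ids) / total_cards_synced > threshold


def email_failure_alert(sent_count: int, failed_count: int,
                        threshold: float = 0.01) -> bool:
    """True when more than 1% of send attempts in the window failed."""
    attempts = sent_count + failed_count
    if attempts == 0:
        return False
    return failed_count / attempts > threshold
```
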
### P2 — Info (review in next sprint)
| Alert | Condition | Action |
|-------|-----------|--------|
| **High churn** | At-risk cards > 20% of active cards | Review re-engagement strategy (future marketing module) |
| **Low enrollment** | < 5 new cards in 7 days (per merchant with active program) | Check enrollment page accessibility, QR code placement |
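
The P2 checks could be computed along these lines. This is a sketch under stated assumptions: the input shapes (a list of `(merchant_id, enrolled_on)` pairs and a set of merchants with an active program) and the function names are invented for the example, and "in 7 days" is read as the 7 days ending today.

```python
from datetime import date, timedelta


def high_churn(at_risk_cards: int, active_cards: int, threshold: float = 0.20) -> bool:
    """True when at-risk cards exceed 20% of active cards."""
    return active_cards > 0 and at_risk_cards / active_cards > threshold


def low_enrollment_merchants(enrollments, active_merchants, today: date,
                             min_cards: int = 5, window_days: int = 7) -> list:
    """Return merchants with an active program and fewer than `min_cards` new cards in the window."""
    cutoff = today - timedelta(days=window_days)
    counts = {merchant_id: 0 for merchant_id in active_merchants}
    for merchant_id, enrolled_on in enrollments:
        # Ignore merchants without an active program and enrollments outside the window
        if merchant_id in counts and enrolled_on > cutoff:
            counts[merchant_id] += 1
    return sorted(m for m, n in counts.items() if n < min_cards)
```

Starting every active merchant at zero ensures that merchants with no enrollments at all are still flagged.
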
## Key Metrics to Track
### Operational
- Celery task success/failure counts for `loyalty.expire_points` and `loyalty.sync_wallet_passes`
- EmailLog status distribution for `loyalty_*` template codes (sent/failed/bounced)
- Rate limiter 429 response count per store per hour
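
For the rate limiter metric, one possible aggregation buckets 429 responses by store and hour. The input format here, `(store_id, status_code, timestamp)` tuples, is an assumption made for the example; in practice the data would come from access logs or an existing metrics pipeline.

```python
from collections import Counter
from datetime import datetime


def count_429_per_store_hour(responses) -> Counter:
    """Count 429 responses keyed by (store_id, hour bucket)."""
    buckets = Counter()
    for store_id, status_code, timestamp in responses:
        if status_code == 429:
            # Truncate the timestamp to the start of its hour
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            buckets[(store_id, hour)] += 1
    return buckets
```
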
### Business
- Daily new enrollments (total + per merchant)
- Points redeemed vs. issued (health indicator: redemption rate, i.e. redeemed ÷ issued, should be > 0.3)
- Stamp completion rate (% of cards reaching stamps_target)
- Cohort retention at month 3 (target: > 40%)
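
A minimal sketch of two of these business metrics follows. It assumes that redemption rate means points redeemed divided by points issued over the same period, and that a card counts as complete when its stamp count is at least `stamps_target`. The function names are illustrative only.

```python
from typing import Optional


def redemption_rate(points_issued: int, points_redeemed: int) -> Optional[float]:
    """Redeemed / issued for a period; None when nothing was issued."""
    if points_issued <= 0:
        return None
    return points_redeemed / points_issued


def stamp_completion_rate(stamp_counts: list, stamps_target: int) -> float:
    """Percentage of cards whose stamp count has reached the target."""
    if not stamp_counts:
        return 0.0
    completed = sum(1 for count in stamp_counts if count >= stamps_target)
    return 100.0 * completed / len(stamp_counts)
```

For example, 350 points redeemed against 1,000 issued gives a rate of 0.35, which is above the 0.3 health threshold.
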
## Observability Integration
The loyalty module logs through Python's standard `logging` module, using loggers under the `app.modules.loyalty.*` namespace. Key log events:

| Logger | Level | Event |
|--------|-------|-------|
| `card_service` | INFO | Enrollment, deactivation, GDPR anonymization |
| `stamp_service` | INFO | Stamp add/redeem/void with card and store context |
| `points_service` | INFO | Points earn/redeem/void/adjust |
| `notification_service` | INFO | Email queued (template_code + recipient) |
| `point_expiration` | INFO | Chunk processed (cards + points count) |
| `wallet_sync` | WARNING | Per-card sync failure with retry count |
| `wallet_sync` | ERROR | Card sync exhausted all retries |
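
Because these loggers share the `app.modules.loyalty` prefix, a single handler attached to that parent logger receives records from all of them through standard logger propagation. The sketch below counts WARNING and ERROR records per logger, which could feed the wallet sync alerts. `FailureCounter` is a hypothetical helper, not something the module provides.

```python
import logging


class FailureCounter(logging.Handler):
    """Counts WARNING-and-above records per (logger name, level name)."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.counts = {}

    def emit(self, record: logging.LogRecord) -> None:
        key = (record.name, record.levelname)
        self.counts[key] = self.counts.get(key, 0) + 1


# Attach once at startup; child loggers propagate their records to this parent
counter = FailureCounter()
logging.getLogger("app.modules.loyalty").addHandler(counter)
```

A production setup would more likely export these counts to the existing metrics backend than keep them in memory.
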
## Dashboard Suggestions
If using Grafana or a similar dashboarding tool, consider these panels:

1. **Enrollment funnel**: Page views → Form starts → Submissions → Success (track drop-off)
2. **Transaction volume**: Stamps + Points per hour, grouped by store
3. **Wallet adoption**: % of cards with Google/Apple Wallet passes
4. **Email delivery**: Sent → Delivered → Opened → Clicked per template
5. **Task health**: Celery task execution time + success rate over 24h