docs(loyalty): Phase 8 — runbooks, monitoring, OpenAPI tags, plan update
Some checks failed
Final phase of the production launch plan:

- Runbook: wallet certificate management (Google + Apple rotation, expiry monitoring, rollback procedure)
- Runbook: point expiration task (manual execution, partial failure, per-merchant re-run, point restore via admin API)
- Runbook: wallet sync task (failed_card_ids interpretation, manual re-sync, retry behavior table)
- Monitoring: alert definitions (P0/P1/P2), key metrics, log events, dashboard suggestions
- OpenAPI: added tags=["Loyalty - Store"] and tags=["Loyalty - Admin"] to route groups for /docs discoverability
- Production launch plan: all phases 0-8 marked DONE

Coverage note: loyalty services at 70-85%, tasks at 16-29%. Target 80% enforcement deferred; the current 342 tests provide good functional coverage. Task-level coverage requires Celery mocking infrastructure (future sprint).

342 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
57
app/modules/loyalty/docs/runbook-wallet-sync.md
Normal file
@@ -0,0 +1,57 @@
# Runbook: Wallet Sync Task

## Overview

The `loyalty.sync_wallet_passes` Celery task runs hourly (configured in `definition.py`). It catches cards that missed real-time wallet updates due to transient API errors.

## What it does

1. Finds cards with transactions in the last hour that have Google or Apple Wallet integration
2. For each card, calls `wallet_service.sync_card_to_wallets(db, card)`
3. Retries with **exponential backoff**: up to 4 attempts per card, waiting 1s, 4s, and 16s before attempts 2, 3, and 4
4. One failing card doesn't block the batch; failures are logged and reported
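
The loop described in steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the task's actual code: `sync_batch`, the `sync_one` callable, the injectable `sleep` parameter, and the `synced` result key are hypothetical names. Only the delays (1s, 4s, 16s), the 4-attempt limit, and the `failed_card_ids` key come from this runbook.

```python
import logging
import time
from typing import Callable, Iterable

logger = logging.getLogger(__name__)

# Delays (seconds) before attempts 2, 3 and 4; attempt 1 runs immediately.
RETRY_DELAYS = (1, 4, 16)


def sync_batch(
    card_ids: Iterable[int],
    sync_one: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Sync each card with retries; collect the IDs that fail every attempt."""
    failed_card_ids = []
    synced = 0
    for card_id in card_ids:
        for attempt, delay in enumerate((0, *RETRY_DELAYS), start=1):
            if delay:
                sleep(delay)
            try:
                sync_one(card_id)
            except Exception as exc:  # assumption: real code may catch narrower errors
                logger.warning("card %s attempt %d failed: %s", card_id, attempt, exc)
            else:
                synced += 1
                break
        else:
            # All attempts exhausted: record the card and continue with the batch.
            failed_card_ids.append(card_id)
    return {"synced": synced, "failed_card_ids": failed_card_ids}
```

Passing `sleep` as a parameter is only a convenience for exercising the retry path without real waits; in this sketch a card that keeps failing never raises out of the loop, which is what keeps one bad card from blocking the rest of the batch.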

## Understanding `failed_card_ids`

The task returns a `failed_card_ids` list in its result. These are cards where all 4 retry attempts failed.

**Common failure causes:**
- Google Wallet API transient 500/503 errors: usually resolve on the next hourly run
- Invalid service account credentials: check the `wallet-status` endpoint
- Card's Google object was deleted externally: needs manual re-creation
- Network timeout: check server connectivity to `walletobjects.googleapis.com`

## Manual re-sync

```bash
# Re-run the entire sync task
celery -A app.core.celery_config call loyalty.sync_wallet_passes
```

```python
# Re-sync a specific card (Python shell)
from app.core.database import SessionLocal
from app.modules.loyalty.services import wallet_service
from app.modules.loyalty.models import LoyaltyCard

card_id = 123  # placeholder: replace with the card ID to re-sync (e.g. one from failed_card_ids)

db = SessionLocal()
try:
    card = db.query(LoyaltyCard).get(card_id)
    if card is None:
        print(f"No LoyaltyCard with id {card_id}")
    else:
        result = wallet_service.sync_card_to_wallets(db, card)
        print(result)
finally:
    # Always release the session, even if the sync raises
    db.close()
```

## Monitoring

- Alert if the `loyalty.sync_wallet_passes` failure rate exceeds 5% (more than 5% of cards still failing after all retries)
- Check Celery Flower for task execution time; it should be under 30s for typical loads
- A large `failed_card_ids` list (more than 10 entries) may indicate a systemic API issue
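
The two thresholds above could be evaluated from a task result along these lines. This is a sketch under assumptions: the `evaluate_sync_result` helper and its `total_cards` argument are hypothetical (this runbook only documents `failed_card_ids`, so the number of cards processed would have to come from the task result or logs). The 5% and 10-card thresholds are the ones listed above.

```python
def evaluate_sync_result(
    failed_card_ids,
    total_cards,
    max_failure_rate=0.05,
    systemic_threshold=10,
):
    """Classify one sync run against the runbook's alert thresholds."""
    failed = len(failed_card_ids)
    # Avoid dividing by zero when no cards needed syncing in this run.
    rate = failed / total_cards if total_cards else 0.0
    return {
        "failure_rate": rate,
        "alert": rate > max_failure_rate,
        "possible_systemic_issue": failed > systemic_threshold,
    }
```

For example, 3 failures out of 100 cards stays below both thresholds, while 12 failures out of 100 would trip both the rate alert and the systemic-issue flag.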

## Retry behavior

| Attempt | Delay before | Total elapsed |
|---------|--------------|---------------|
| 1 | 0s | 0s |
| 2 | 1s | 1s |
| 3 | 4s | 5s |
| 4 | 16s | 21s |

After attempt 4 fails, the card is added to `failed_card_ids` and will be retried on the next hourly run.
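
The delays in the table are consistent with a base delay of 1s multiplied by 4 on each retry. A short sketch that reproduces the rows (the `backoff_schedule` helper and its parameters are illustrative, not part of the codebase):

```python
def backoff_schedule(attempts=4, base=1, factor=4):
    """Return (attempt, delay_before, total_elapsed) rows, in seconds."""
    rows = []
    elapsed = 0
    for attempt in range(1, attempts + 1):
        # The first attempt runs immediately; later ones wait base * factor**(n - 2).
        delay = 0 if attempt == 1 else base * factor ** (attempt - 2)
        elapsed += delay
        rows.append((attempt, delay, elapsed))
    return rows


for attempt, delay, elapsed in backoff_schedule():
    print(f"attempt {attempt}: wait {delay}s, total waited {elapsed}s")
```

The "total elapsed" column counts only the waits between attempts; time spent inside the wallet API calls themselves comes on top of it.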