Commit Graph

23 Commits

Author SHA1 Message Date
d96e0ea1b4 ops: rebalance container memory limits (node-exporter 32m, cadvisor 192m)
Some checks failed
CI / ruff (push) Successful in 12s
CI / validate (push) Successful in 25s
CI / dependency-scanning (push) Successful in 29s
CI / pytest (push) Failing after 3h10m28s
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
node-exporter only uses ~20MB so 32m is safe. cadvisor was at 98% of
128m and crashing during CI runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 23:31:55 +01:00
8136739233 feat(docker): add healthchecks for celery-beat and node-exporter
All checks were successful
CI / pytest (push) Successful in 48m59s
CI / deploy (push) Successful in 49s
CI / ruff (push) Successful in 11s
CI / validate (push) Successful in 25s
CI / dependency-scanning (push) Successful in 29s
CI / docs (push) Successful in 41s
celery-beat: check that the schedule file was modified within the last
120 seconds (confirms beat is ticking).
node-exporter: check /metrics endpoint for build_info metric.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 23:56:56 +01:00
2ca313c3c7 fix(docker): increase celery-beat memory limit to 256m
Some checks failed
CI / ruff (push) Successful in 11s
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has been cancelled
128m was causing OOM kills (exit code 137) as the codebase grew.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 23:45:03 +01:00
31ced5f759 fix(docker): fix flower and redis-exporter healthchecks
Some checks failed
CI / ruff (push) Successful in 10s
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has started running
Flower: use /healthcheck endpoint (auth-exempt) instead of root URL.
Redis-exporter: switch to alpine image (has wget) and verify redis_up
in /metrics instead of non-existent /health endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 23:17:47 +01:00
b68d542258 fix(security): harden Redis auth, restrict /metrics, document Gitea port fix
Some checks failed
CI / ruff (push) Successful in 10s
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / pytest (push) Has been cancelled
- Add Redis password via REDIS_PASSWORD env var (--requirepass flag)
- Update all REDIS_URL and REDIS_ADDR references to include password
- Restrict /metrics endpoint to localhost and Docker internal networks (403 for external requests)
- Document Gitea port 3000 localhost binding fix (must be applied manually on server)
- Add REDIS_PASSWORD to .env.example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 23:15:15 +01:00
a7392de9f6 fix(security): close exposed PostgreSQL and Redis ports (BSI/CERT-Bund report)
Some checks failed
CI / ruff (push) Successful in 12s
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has been cancelled
Docker bypasses UFW iptables, so bare port mappings like "5432:5432"
exposed the database to the public internet. Removed port mappings for
PostgreSQL and Redis (they only need Docker-internal networking), and
bound the API port to 127.0.0.1 since only Caddy needs to reach it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 22:31:07 +01:00
6dec1e3ca6 fix(ops): add missing env_file to celery-beat and quiet Stripe log spam
Some checks failed
CI / pytest (push) Has been cancelled
CI / ruff (push) Successful in 11s
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
celery-beat was missing env_file and DATABASE_URL, so it had no access
to app config (Stripe keys, etc.). Also downgrade "Stripe API key not
configured" from warning to debug to stop log spam when Stripe is not
yet set up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:33:54 +01:00
f631322b4e fix(ops): rebalance container memory limits to prevent celery OOM kills
Some checks failed
CI / ruff (push) Successful in 10s
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has been cancelled
Celery worker was OOM-killed (41 restarts) at 512MB with 4 concurrent
workers. Reduce concurrency to 2, increase worker limit to 768MB, and
reclaim memory from over-provisioned services (db 512→256, beat 256→128,
flower 256→192). Total allocation stays within 4GB server budget.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:15:35 +01:00
e61e02fb39 fix(redis): configure maxmemory and eviction policy to prevent OOM
Some checks failed
CI / ruff (push) Successful in 11s
CI / pytest (push) Failing after 47m48s
CI / validate (push) Successful in 24s
CI / dependency-scanning (push) Successful in 31s
CI / docs (push) Has been skipped
CI / deploy (push) Has been skipped
Redis had no maxmemory set, causing the Prometheus alert expression
(used/max) to evaluate to +Inf. Set maxmemory to 100mb with allkeys-lru
eviction policy, and guard the alert expression against division by zero.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:57:38 +01:00
35d1559162 feat(monitoring): add Redis exporter + Sentry docs to deployment guide
Some checks failed
CI / ruff (push) Successful in 10s
CI / pytest (push) Failing after 47m30s
CI / validate (push) Successful in 24s
CI / dependency-scanning (push) Successful in 29s
CI / docs (push) Has been skipped
CI / deploy (push) Has been skipped
- Add redis-exporter container to docker-compose (oliver006/redis_exporter, 32MB)
- Add Redis scrape target to Prometheus config
- Add 4 Redis alert rules: RedisDown, HighMemory, HighConnections, RejectedConnections
- Document Step 19b (Sentry Error Tracking) in Hetzner deployment guide
- Document Step 19c (Redis Monitoring) in Hetzner deployment guide
- Update resource budget and port reference tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:30:18 +01:00
64082ca877 feat: first client onboarding — fix env, add loyalty admin, dev infra-check
- Fix .env: wizamart→orion/wizard.lu, Redis port→6380
- Fix .env.example: orion.lu→wizard.lu domain references
- Add create_loyalty_admin() to init_production.py (platform-scoped admin for rewardflow.lu)
- Add `make infra-check` target running verify-server.sh
- Split verify-server.sh into dev/prod modes (auto-detected from DEBUG flag)
- Dev checks: .env config, PostgreSQL, Redis, health endpoint, migrations
- Remove stale init.sql volume mount from docker-compose.yml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 15:40:07 +01:00
44568893fd fix: verify-server.sh exit on first check, bump flower/beat to 256m
Some checks failed
CI / ruff (push) Successful in 10s
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / validate (push) Has been cancelled
CI / pytest (push) Has been cancelled
- Remove set -e so script continues through all checks
- Use POSIX arithmetic to avoid exit code 1 from (( ))
- Bump flower and celery-beat mem_limit from 128m to 256m (OOM killed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 10:56:24 +01:00
10fdf91dfa feat(infra): add launch readiness quick wins
Some checks failed
CI / ruff (push) Successful in 12s
CI / validate (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has been cancelled
- Add mem_limit to all 6 app containers (db: 512m, redis: 128m,
  api: 512m, celery-worker: 512m, celery-beat: 128m, flower: 128m)
- Restrict Flower port to localhost (127.0.0.1:5555:5555)
- Add PostgreSQL and Redis health checks to /health/ready endpoint
  with individual check details (name, status, latency)
- Add scaling guide with metrics, thresholds, Hetzner pricing
- Add server verification script (12 infrastructure checks)
- Update hetzner-server-setup.md with progress and pending tasks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 10:24:20 +01:00
4bce16fb73 feat(infra): add alerting, network segmentation, and ops docs (Steps 19-24)
All checks were successful
CI / ruff (push) Successful in 11s
CI / pytest (push) Successful in 36m6s
CI / validate (push) Successful in 22s
CI / dependency-scanning (push) Successful in 28s
CI / docs (push) Successful in 37s
CI / deploy (push) Successful in 47s
- Prometheus alert rules (host, container, API, Celery, target-down)
- Alertmanager with email routing (critical 1h, warning 4h repeat)
- Docker network segmentation (frontend/backend/monitoring)
- Incident response runbook with 8 copy-paste runbooks
- Environment variables reference (55+ vars documented)
- Hetzner setup docs updated with Steps 19-24
- Launch readiness updated with Feb 2026 infrastructure status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:06:54 +01:00
ef7187b508 feat: add automated backups and Prometheus/Grafana monitoring stack (Steps 17-18)
Some checks failed
CI / dependency-scanning (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / ruff (push) Successful in 7s
CI / validate (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has started running
Backups: pg_dump scripts with daily/weekly rotation and Cloudflare R2 offsite sync.
Monitoring: Prometheus, Grafana, node-exporter, cAdvisor in docker-compose; /metrics
endpoint activated via prometheus_client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:40:08 +01:00
688896d856 fix: add .dockerignore and env_file to docker-compose
Some checks failed
CI / ruff (push) Successful in 9s
CI / architecture (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / audit (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has been cancelled
Prevents .env from being baked into Docker image (was overriding
config defaults). Adds env_file directive so containers load host
.env properly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 20:01:21 +01:00
ba130d4171 chore: set explicit Docker volume name orion_postgres_data
Some checks failed
CI / ruff (push) Successful in 9s
CI / dependency-scanning (push) Has been cancelled
CI / audit (push) Has been cancelled
CI / docs (push) Has been cancelled
CI / architecture (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / pytest (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 19:21:26 +01:00
e9253fbd84 refactor: rename Wizamart to Orion across entire codebase
Replace all ~1,086 occurrences of Wizamart/wizamart/WIZAMART/WizaMart
with Orion/orion/ORION across 184 files. This includes database
identifiers, email addresses, domain references, R2 bucket names,
DNS prefixes, encryption salt, Celery app name, config defaults,
Docker configs, CI configs, documentation, seed data, and templates.

Renames homepage-wizamart.html template to homepage-orion.html.
Fixes duplicate file_pattern key in api.yaml architecture rule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 16:46:56 +01:00
6af9458ad4 fix(docker): add proper healthchecks for Celery worker, beat, and flower
Some checks failed
CI / ruff (push) Has been cancelled
CI / pytest (push) Has been cancelled
CI / architecture (push) Has been cancelled
CI / dependency-scanning (push) Has been cancelled
CI / audit (push) Has been cancelled
CI / docs (push) Has been cancelled
2026-02-11 23:10:29 +01:00
2792414395 feat: add Celery/Redis task queue with feature flag support
Migrate background tasks from FastAPI BackgroundTasks to Celery with Redis
for persistent task queuing, retries, and scheduled jobs.

Key changes:
- Add Celery configuration with Redis broker/backend
- Create task dispatcher with USE_CELERY feature flag for gradual rollout
- Add Celery task wrappers for all background operations:
  - Marketplace imports
  - Letzshop historical imports
  - Product exports
  - Code quality scans
  - Test runs
  - Subscription scheduled tasks (via Celery Beat)
- Add celery_task_id column to job tables for Flower integration
- Add Flower dashboard link to admin background tasks page
- Update docker-compose.yml with worker, beat, and flower services
- Add Makefile targets: celery-worker, celery-beat, celery-dev, flower

When USE_CELERY=false (default), system falls back to FastAPI BackgroundTasks
for development without Redis dependency.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 17:35:16 +01:00
756e12fe7b removing references to letzshop, replacing by wizamart 2025-11-23 20:19:00 +01:00
af23f5b88f replacing letzshop by wizamart 2025-11-13 15:23:47 +01:00
9dd177bddc Initial commit 2025-09-05 17:27:39 +02:00