fix(ops): rebalance container memory limits to prevent celery OOM kills

The Celery worker container was OOM-killed repeatedly (41 restarts) at a
512MB limit with worker_concurrency=4. Reduce concurrency to 2, raise the
worker limit to 768MB, and reclaim memory from over-provisioned services
(db 512MB→256MB, beat 256MB→128MB, flower 256MB→192MB). Total allocation
stays within the 4GB server budget.
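
The rebalancing arithmetic can be checked with a short sketch. The service
labels below are the shorthand used in this message, not necessarily the
compose service names, and only the four limits changed in this commit are
included; all other containers are assumed unchanged.

```python
# Memory limits in MB as (before, after), taken from the diff in this commit.
# Labels are shorthand from the commit message (assumed, not the real
# compose service names).
limits_mb = {
    "db": (512, 256),
    "worker": (512, 768),
    "beat": (256, 128),
    "flower": (256, 192),
}

before_total = sum(before for before, _ in limits_mb.values())
after_total = sum(after for _, after in limits_mb.values())
net_change_mb = after_total - before_total

print(f"before: {before_total}MB, after: {after_total}MB, net: {net_change_mb:+d}MB")
```

Under these numbers the four services together use less memory than before,
so the worker increase is fully covered by the reductions elsewhere.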

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:15:35 +01:00
parent e61e02fb39
commit f631322b4e
2 changed files with 5 additions and 5 deletions


@@ -91,7 +91,7 @@ celery_app.conf.update(
     task_soft_time_limit=25 * 60,  # 25 minutes soft limit
     # Worker settings
     worker_prefetch_multiplier=1,  # Disable prefetching for long tasks
-    worker_concurrency=4,  # Number of concurrent workers
+    worker_concurrency=2,  # Keep low on 4GB servers to avoid OOM
     # Result backend
     result_expires=86400,  # Results expire after 24 hours
     # Retry policy


@@ -11,7 +11,7 @@ services:
       - postgres_data:/var/lib/postgresql/data
     ports:
       - "5432:5432"
-    mem_limit: 512m
+    mem_limit: 256m
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U orion_user -d orion_db"]
       interval: 30s
@@ -86,7 +86,7 @@ services:
     volumes:
       - ./logs:/app/logs
       - ./exports:/app/exports
-    mem_limit: 512m
+    mem_limit: 768m
     healthcheck:
       test: ["CMD-SHELL", "celery -A app.core.celery_config inspect ping --timeout 10 || exit 1"]
       interval: 30s
@@ -107,7 +107,7 @@ services:
     depends_on:
       redis:
         condition: service_healthy
-    mem_limit: 256m
+    mem_limit: 128m
     healthcheck:
       disable: true
     networks:
@@ -128,7 +128,7 @@ services:
     depends_on:
       redis:
         condition: service_healthy
-    mem_limit: 256m
+    mem_limit: 192m
     healthcheck:
       test: ["CMD-SHELL", "curl -f http://localhost:5555/ || exit 1"]
       interval: 30s
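
A rough way to see why the two worker changes help together is to divide the
container limit by the concurrency. This is only a sketch: the helper name
`per_child_budget_mb` is hypothetical, and the even split ignores the parent
worker process and any memory shared between children, so the real headroom
per task is somewhat lower.

```python
def per_child_budget_mb(limit_mb: int, concurrency: int) -> float:
    """Approximate memory available to each worker child process.

    Assumes the container limit is split evenly across children and
    ignores the parent process and shared pages (a simplification).
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return limit_mb / concurrency


# Values from this commit: 512MB with 4 children before, 768MB with 2 after.
before = per_child_budget_mb(512, 4)
after = per_child_budget_mb(768, 2)
print(f"per-child budget: {before:.0f}MB -> {after:.0f}MB")
```

By this estimate each child gets about three times the memory it had before,
at the cost of running half as many tasks in parallel.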