From 661547f6cf5c5db39bd052d1484eb18744d384c1 Mon Sep 17 00:00:00 2001
From: Samir Boulahtit
Date: Mon, 23 Mar 2026 14:00:35 +0100
Subject: [PATCH] docs: update deployment docs for CI timeouts, build info, and prod safety

- hetzner-server-setup: runner timeout 3h, shutdown_timeout 300s, deploy.sh now writes .build-info and uses explicit -f flag
- gitea: document unit-only CI tests and xdist incompatibility
- docker: add build info section, document volume mount approach

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/deployment/docker.md               | 19 +++++++++++++++++++
 docs/deployment/gitea.md                | 10 +++++++---
 docs/deployment/hetzner-server-setup.md | 19 ++++++++++++-------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/docs/deployment/docker.md b/docs/deployment/docker.md
index 2e7cce85..491cdc59 100644
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -331,6 +331,25 @@ docker compose -f docker-compose.prod.yml exec api python scripts/seed/init_prod
 
 ---
 
+## Build Info
+
+The deploy script writes a `.build-info` JSON file (commit SHA + deploy timestamp) before rebuilding containers. This file is mounted as a read-only volume into the API container:
+
+```yaml
+# In docker-compose.yml
+volumes:
+  - ./.build-info:/app/.build-info:ro
+```
+
+The app reads it via `app/core/build_info.py` and exposes it in:
+
+- **`/health` endpoint** — `commit` and `deployed_at` fields
+- **Admin sidebar** — version, commit, and deploy timestamp
+
+In local development (where `.build-info` doesn't exist), the app falls back to `git rev-parse` for the commit SHA.
+
+---
+
 ## Daily Operations
 
 ### View Logs
diff --git a/docs/deployment/gitea.md b/docs/deployment/gitea.md
index a7cc92c5..626d6e97 100644
--- a/docs/deployment/gitea.md
+++ b/docs/deployment/gitea.md
@@ -252,9 +252,13 @@ The `scripts/deploy.sh` script handles the full deploy lifecycle:
 1. Stash local changes (preserves `.env` and other server-side edits)
 2. Pull latest code (`--ff-only`)
 3. Pop stash to restore local changes
-4. Rebuild and restart Docker containers (`docker compose --profile full up -d --build`)
-5. Run database migrations (`alembic upgrade heads`)
-6. Health check `http://localhost:8001/health` with retries
+4. Write `.build-info` (commit SHA + deploy timestamp)
+5. Rebuild and restart Docker containers (`docker compose -f docker-compose.yml --profile full up -d --build`)
+6. Run database migrations (`alembic upgrade heads`)
+7. Health check `http://localhost:8001/health` with retries
+
+!!! note "CI test configuration"
+    Only unit tests run in CI (`-m "unit"` with `timeout-minutes: 150`). Integration tests are run locally via `make test`. The CAX11 runner (2 vCPU ARM, 4GB) takes ~2.5h for 2,484 unit tests. `pytest-xdist` parallel execution is not compatible with the shared database session test fixtures.
 
 See [Hetzner Server Setup — Step 16](hetzner-server-setup.md#step-16-continuous-deployment) for the full setup guide including SSH key generation and Gitea secrets configuration.
 
diff --git a/docs/deployment/hetzner-server-setup.md b/docs/deployment/hetzner-server-setup.md
index 402c1cc0..1f06334b 100644
--- a/docs/deployment/hetzner-server-setup.md
+++ b/docs/deployment/hetzner-server-setup.md
@@ -1081,7 +1081,8 @@ Generate a config file to override defaults (notably the 3h job timeout which ca
 ```bash
 cd ~/gitea-runner
 ./act_runner generate-config > config.yaml
-sed -i 's/timeout: 3h/timeout: 1h/' config.yaml
+sed -i 's/timeout: 3h/timeout: 3h/' config.yaml
+sed -i 's/shutdown_timeout: 0s/shutdown_timeout: 300s/' config.yaml
 sudo systemctl restart gitea-runner
 ```
 
@@ -1089,12 +1090,12 @@ Key settings in `config.yaml`:
 
 | Setting | Default | Recommended | Why |
 |---|---|---|---|
-| `runner.timeout` | 3h | 1h | Prevents silent failures — tests take ~25min, so 1h is generous |
-| `runner.shutdown_timeout` | 0s | 0s | OK as-is |
+| `runner.timeout` | 3h | 3h | 2,484 unit tests take ~2.5h on the CAX11 (2 vCPU ARM). Keep the default |
+| `runner.shutdown_timeout` | 0s | 300s | Wait for running jobs to finish on restart — `0s` kills jobs immediately |
 | `runner.fetch_timeout` | 5s | 5s | OK as-is |
 
 !!! tip "CI also has per-job and per-test timeouts"
-    The `.gitea/workflows/ci.yml` sets `timeout-minutes: 45` on the pytest job and `--timeout=120` per individual test. These work together with the runner timeout to catch different failure modes.
+    The `.gitea/workflows/ci.yml` sets `timeout-minutes: 150` on the pytest job and `--timeout=120` per individual test. These work together with the runner timeout to catch different failure modes.
 
 ### 15.2 Swap for CI Stability
 
@@ -1160,9 +1161,13 @@ The deploy script lives at `scripts/deploy.sh` in the repository. It:
 1. Stashes local changes (preserves `.env`)
 2. Pulls latest code (`--ff-only`)
 3. Pops stash to restore local changes
-4. Rebuilds and restarts Docker containers (`docker compose --profile full up -d --build`)
-5. Runs database migrations (`alembic upgrade heads`)
-6. Health checks `http://localhost:8001/health` with 12 retries (60s total)
+4. Writes `.build-info` (commit SHA + deploy timestamp)
+5. Rebuilds and restarts Docker containers (`docker compose -f docker-compose.yml --profile full up -d --build`)
+6. Runs database migrations (`alembic upgrade heads`)
+7. Health checks `http://localhost:8001/health` with 12 retries (60s total)
+
+!!! warning "Always use `-f docker-compose.yml` on the production server"
+    The explicit `-f` flag prevents `docker-compose.override.yml` (which exposes db/redis ports for local dev) from being loaded. This flag must never be removed from `deploy.sh`, and any manual `docker compose` commands on the server must also include it. See [Docker Deployment — Dev vs Prod](docker.md#dev-vs-prod-compose-architecture) for details.
 
 Exit codes: `0` success, `1` git pull failed, `2` docker compose failed, `3` migration failed, `4` health check failed.
 
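
Reviewer note on the Build Info section added to `docs/deployment/docker.md`: the patch describes the read path (`.build-info` first, `git rev-parse` as the local-development fallback) but does not show `app/core/build_info.py` itself. The following is a minimal sketch of how such a loader might look, not the project's actual implementation. The function name `load_build_info`, the default path, and the JSON key names `commit` and `deployed_at` (borrowed from the `/health` field names) are all assumptions for illustration.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional

# Assumed location: the compose volume mounts the file at /app/.build-info,
# so a path relative to the app's working directory is used here.
BUILD_INFO_PATH = Path(".build-info")


def _git_commit() -> Optional[str]:
    """Return the short commit SHA from git, or None if git cannot provide one."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # git is not installed, this is not a repository, or the call timed out.
        return None
    return result.stdout.strip() or None


def load_build_info(path: Path = BUILD_INFO_PATH) -> dict:
    """Read build info from the deploy-written file, falling back to git.

    The key names "commit" and "deployed_at" are hypothetical; they mirror the
    /health fields named in the docs, not a confirmed file schema.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("build info must be a JSON object")
        return {
            "commit": data.get("commit"),
            "deployed_at": data.get("deployed_at"),
        }
    except (OSError, ValueError):
        # Missing or malformed file: typical for local development.
        return {"commit": _git_commit(), "deployed_at": None}
```

A `/health` handler could then merge the result of `load_build_info()` into its response. Treating a malformed file the same as a missing one keeps the health endpoint from failing on a partially written `.build-info`, though whether the real module does this is not stated in the patch.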