docs: update deployment docs for CI timeouts, build info, and prod safety

- hetzner-server-setup: runner timeout 3h, shutdown_timeout 300s, deploy.sh now writes .build-info and uses explicit -f flag
- gitea: document unit-only CI tests and xdist incompatibility
- docker: add build info section, document volume mount approach

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 14:00:35 +01:00
parent 3015a490f9
commit 661547f6cf
3 changed files with 38 additions and 10 deletions
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -331,6 +331,25 @@ docker compose -f docker-compose.prod.yml exec api python scripts/seed/init_prod

 ---

+## Build Info
+
+The deploy script writes a `.build-info` JSON file (commit SHA + deploy timestamp) before rebuilding containers. This file is mounted as a read-only volume into the API container:
+
+```yaml
+# In docker-compose.yml
+volumes:
+  - ./.build-info:/app/.build-info:ro
+```
+
+The app reads it via `app/core/build_info.py` and exposes it in:
+
+- **`/health` endpoint** — `commit` and `deployed_at` fields
+- **Admin sidebar** — version, commit, and deploy timestamp
+
+In local development (where `.build-info` doesn't exist), the app falls back to `git rev-parse` for the commit SHA.
+
+---
+
 ## Daily Operations

 ### View Logs
--- a/docs/deployment/gitea.md
+++ b/docs/deployment/gitea.md
@@ -252,9 +252,13 @@ The `scripts/deploy.sh` script handles the full deploy lifecycle:
 1. Stash local changes (preserves `.env` and other server-side edits)
 2. Pull latest code (`--ff-only`)
 3. Pop stash to restore local changes
-4. Rebuild and restart Docker containers (`docker compose --profile full up -d --build`)
-5. Run database migrations (`alembic upgrade heads`)
-6. Health check `http://localhost:8001/health` with retries
+4. Write `.build-info` (commit SHA + deploy timestamp)
+5. Rebuild and restart Docker containers (`docker compose -f docker-compose.yml --profile full up -d --build`)
+6. Run database migrations (`alembic upgrade heads`)
+7. Health check `http://localhost:8001/health` with retries
+
+!!! note "CI test configuration"
+    Only unit tests run in CI (`-m "unit"` with `timeout-minutes: 150`). Integration tests run locally via `make test`. The CAX11 runner (2 vCPU ARM, 4 GB RAM) takes ~2.5h to run the 2,484 unit tests. `pytest-xdist` parallel execution is incompatible with the test fixtures that share a database session.

 See [Hetzner Server Setup — Step 16](hetzner-server-setup.md#step-16-continuous-deployment) for the full setup guide including SSH key generation and Gitea secrets configuration.

--- a/docs/deployment/hetzner-server-setup.md
+++ b/docs/deployment/hetzner-server-setup.md
@@ -1081,7 +1081,8 @@ Generate a config file to override defaults (notably the 3h job timeout which ca
 ```bash
 cd ~/gitea-runner
 ./act_runner generate-config > config.yaml
-sed -i 's/timeout: 3h/timeout: 1h/' config.yaml
+sed -i 's/timeout: 3h/timeout: 3h/' config.yaml
+sed -i 's/shutdown_timeout: 0s/shutdown_timeout: 300s/' config.yaml
 sudo systemctl restart gitea-runner
 ```

@@ -1089,12 +1090,12 @@ Key settings in `config.yaml`:

 | Setting | Default | Recommended | Why |
 |---|---|---|---|
-| `runner.timeout` | 3h | 1h | Prevents silent failures — tests take ~25min, so 1h is generous |
-| `runner.shutdown_timeout` | 0s | 0s | OK as-is |
+| `runner.timeout` | 3h | 3h | 2,484 unit tests take ~2.5h on the CAX11 (2 vCPU ARM). Keep the default |
+| `runner.shutdown_timeout` | 0s | 300s | Wait for running jobs to finish on restart — `0s` kills jobs immediately |
 | `runner.fetch_timeout` | 5s | 5s | OK as-is |

 !!! tip "CI also has per-job and per-test timeouts"
-    The `.gitea/workflows/ci.yml` sets `timeout-minutes: 45` on the pytest job and `--timeout=120` per individual test. These work together with the runner timeout to catch different failure modes.
+    The `.gitea/workflows/ci.yml` sets `timeout-minutes: 150` on the pytest job and `--timeout=120` per individual test. These work together with the runner timeout to catch different failure modes.

 ### 15.2 Swap for CI Stability

@@ -1160,9 +1161,13 @@ The deploy script lives at `scripts/deploy.sh` in the repository. It:
 1. Stashes local changes (preserves `.env`)
 2. Pulls latest code (`--ff-only`)
 3. Pops stash to restore local changes
-4. Rebuilds and restarts Docker containers (`docker compose --profile full up -d --build`)
-5. Runs database migrations (`alembic upgrade heads`)
-6. Health checks `http://localhost:8001/health` with 12 retries (60s total)
+4. Writes `.build-info` (commit SHA + deploy timestamp)
+5. Rebuilds and restarts Docker containers (`docker compose -f docker-compose.yml --profile full up -d --build`)
+6. Runs database migrations (`alembic upgrade heads`)
+7. Health checks `http://localhost:8001/health` with 12 retries (60s total)
+
+!!! warning "Always use `-f docker-compose.yml` on the production server"
+    The explicit `-f` flag prevents `docker-compose.override.yml` (which exposes db/redis ports for local dev) from being loaded. This flag must never be removed from `deploy.sh`, and any manual `docker compose` commands on the server must also include it. See [Docker Deployment — Dev vs Prod](docker.md#dev-vs-prod-compose-architecture) for details.

 Exit codes: `0` success, `1` git pull failed, `2` docker compose failed, `3` migration failed, `4` health check failed.
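
Step 4 above and the Build Info section in `docker.md` describe a small round trip: the deploy writes `.build-info`, and the app reads it, falling back to `git rev-parse` when the file is absent. The following is a minimal sketch of that round trip, not the actual contents of `scripts/deploy.sh` or `app/core/build_info.py`. The function names, the `"unknown"` placeholder, and the assumption that the JSON keys match the `commit` and `deployed_at` fields of `/health` are all illustrative.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def write_build_info(path: Path, commit: str) -> dict:
    """Write the commit SHA and a UTC deploy timestamp as JSON (hypothetical helper)."""
    info = {
        # Key names are assumed to mirror the /health fields.
        "commit": commit,
        "deployed_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    path.write_text(json.dumps(info))
    return info


def read_build_info(path: Path) -> dict:
    """Return build info from the file, or fall back to git when it is unusable."""
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        # OSError covers a missing file as well as the path being a directory.
        pass
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the working directory is not a repository.
        commit = "unknown"
    return {"commit": commit, "deployed_at": None}
```

In this sketch the fallback leaves `deployed_at` as `None`, since a local development checkout has no deploy timestamp to report.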