docs(ops): record 2026-06-06 Gitea+CI migration execution + runbook lessons

Add the "Executed: 2026-06-06" record to the 2c runbook (new box
gitea-ci-fsn1-1, Falkenstein CX22, IPs, outcome) and fold the real-world
lessons into the steps: pin the Gitea image version (not latest),
ON_ERROR_STOP + count check on DB restore, the old-runner-survives-in-
migrated-DB gotcha (delete from action_runner + stop prod service), generate
runner token as the git user, expected volume-already-exists warning, and the
root-vs-sudo note.

Held local (not pushed) — pushing stacks a 2nd ~3h CI run behind the in-flight one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:01:54 +02:00
parent c93346f8ff
commit 223650a52b


@@ -3314,6 +3314,11 @@ host `2222`) and `gitea-db` (`postgres:15`). Data lives in two named volumes:
/ `POSTGRES_PASSWORD` — copy them from the current file; do not regenerate, or
the restored DB won't authenticate). Keep `ROOT_URL`/`DOMAIN`/`SSH_DOMAIN`
as `git.wizard.lu`.
**Pin the Gitea image to the running version, not `latest`.** Check the version first
with `docker exec gitea gitea --version`, then set it explicitly, e.g. `image: gitea/gitea:1.25.4`.
If the new box pulls a newer `latest`, Gitea runs schema migrations on first start
against your freshly restored data, before you have had a chance to verify it.
(`postgres:15` is already pinned to its major version, which is sufficient.)
2. **Announce downtime / stop writes** on the old Gitea.
3. **Dump the data on the old box:**
@@ -3327,11 +3332,16 @@ host `2222`) and `gitea-db` (`postgres:15`). Data lives in two named volumes:
4. **Transfer** `/tmp/gitea-db.sql` + `/tmp/gitea-data.tgz` to the new box
(`scp`/`rsync`).
5. **Restore the DB** on the new box (the `postgres:15` container auto-creates an
empty `gitea` DB on first start; restore into it. Add `-v ON_ERROR_STOP=1` so psql
aborts with a non-zero exit status on the first error, instead of printing the
error and carrying on):
```bash
docker compose up -d gitea-db # wait until healthy
docker exec -i gitea-db psql -U gitea -d gitea -v ON_ERROR_STOP=1 < gitea-db.sql
# sanity-check counts match the source:
docker exec gitea-db psql -U gitea -d gitea -t \
-c "SELECT 'repos',count(*) FROM repository UNION ALL SELECT 'secrets',count(*) FROM secret;"
```
6. **Restore the data volume** on the new box:
@@ -3360,6 +3370,23 @@ host `2222`) and `gitea-db` (`postgres:15`). Data lives in two named volumes:
11. **No remote/runner URL changes needed** — the hostname `git.wizard.lu`
stays the same (only the IP moved), so your `gitea` git remote and the
runner's `--instance https://git.wizard.lu` keep working after DNS flips.
Install the new runner per [2a](#offloading-ci-to-a-separate-server-2a-recommended).
⚠️ **Critical gotcha — the OLD runner registration travels in the migrated
DB.** Because the DB is copied wholesale, the old prod runner still exists in
`action_runner` and — since `git.wizard.lu` now resolves to the new box — it
can re-authenticate and grab jobs. You must BOTH (a) remove its registration
from the migrated DB and (b) stop its process on prod, or CI may still run on
the old box:
```bash
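# (a) on the NEW box: list registered runners to find the old one's name,
#     then drop its stale registration:
docker exec gitea-db psql -U gitea -d gitea -c "SELECT id, name FROM action_runner;"
docker exec gitea-db psql -U gitea -d gitea \
-c "DELETE FROM action_runner WHERE name='<old-runner-name>';"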
# (b) on PROD — stop the orphaned runner process:
sudo systemctl disable --now gitea-runner.service
```
(Generate the new runner's token with `docker exec -u git gitea gitea actions
generate-runner-token` — Gitea refuses to run that as root.)
12. **Decommission Gitea on prod** (keep volumes + backups for a rollback
window):
@@ -3374,6 +3401,35 @@ host `2222`) and `gitea-db` (`postgres:15`). Data lives in two named volumes:
works, a push triggers CI, and repos/actions history are intact. (See the
"Backup coverage & rollback" callout above if anything needs reverting.)
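The version pin from step 1 and the count comparison from step 5 are easy to get subtly wrong by eye, so they can be wrapped in two small helpers. This is a minimal sketch, not part of Gitea: the function names are hypothetical. It assumes `gitea --version` prints a line beginning `Gitea version X.Y.Z`. It also assumes the step 5 query was run with identical `psql` flags (e.g. `-A -t`) on both boxes, so the two outputs are directly comparable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for steps 1 and 5 of the migration; not part of Gitea.
set -euo pipefail

# Step 1: turn `gitea --version` output into a pinned image reference.
# Assumption: the output starts like "Gitea version 1.25.4 built with ...".
gitea_image_from_version() {
  local version
  version=$(printf '%s\n' "$1" | awk '/^Gitea version / {print $3; exit}')
  if [ -z "$version" ]; then
    echo "could not parse a version from: $1" >&2
    return 1
  fi
  printf 'gitea/gitea:%s\n' "$version"
}

# Step 5: compare the count query's output from the old and the new DB.
# Both arguments must come from the same query run with the same psql flags.
counts_match() {
  local old_counts=$1 new_counts=$2
  if [ "$old_counts" = "$new_counts" ]; then
    echo "counts match"
    return 0
  fi
  echo "count mismatch (< old box, > new box):" >&2
  diff <(printf '%s\n' "$old_counts") <(printf '%s\n' "$new_counts") >&2 || true
  return 1
}

# Usage sketch:
#   gitea_image_from_version "$(docker exec gitea gitea --version)"   # on the old box
#   # run the step 5 query with -A -t on each box, save as old.txt / new.txt, then:
#   counts_match "$(cat old.txt)" "$(cat new.txt)"
```

Comparing the raw query output as strings keeps the check strict: a differing count or a missing row both make it fail.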
#### Executed: 2026-06-06 (production run)
This migration was carried out on **2026-06-06**, moving Gitea + the CI runner
off the prod box (`91.99.65.229`, Nuremberg), which had been hit by CPU spikes
whenever CI ran on it, to a dedicated box.
- **New box:** `gitea-ci-fsn1-1`, Falkenstein (`fsn1`), CX22 (2 vCPU / 4 GB x86,
Ubuntu 24.04, Hetzner backups on). IPv4 `167.233.28.95`, IPv6
`2a01:4f8:c015:b6cb::1`. ~5.29 EUR/mo.
- **Outcome:** Gitea `1.25.4` + runner `gitea-ci-fsn1-1` (act_runner v0.2.13) now
run on the new box; `git.wizard.lu` serves from it with a fresh Let's Encrypt
cert; CI runs off-prod (prod CPU stayed at its ~1.4 baseline during a CI run,
no burst). DB restore counts matched source exactly (1 repo, 2 users, 4
secrets). The git-SSH host key travelled in `gitea-data` → no host-key-changed
warnings on push.
- **Real-world notes / deviations from the generic steps above (now folded in):**
- Pinned `gitea/gitea:1.25.4` (step 1) — prod was on 1.25.4; avoid `latest`.
- Restore = `pg_dump` (plain SQL) + `gitea-data` volume tar; a `gitea dump`
archive was taken first as the one-shot safety net and pulled to the laptop.
- Had to delete the **old runner** from the migrated DB and stop its prod
service (step 11 gotcha); otherwise it stayed eligible to pick up jobs.
- On the new box, the `samir` user's sudo needs a password (not NOPASSWD), so
automated/admin commands were run as `root` over key-only SSH;
`PermitRootLogin prohibit-password` was kept during the migration (tighten
to `no` + give `samir` a sudo password afterward if desired).
- The `docker compose` warning *"volume gitea_gitea-data already exists but
was not created by Docker Compose"* is expected — the volume is pre-created
when you restore into it before first `up`. Harmless.
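To confirm what sshd actually enforces before and after tightening `PermitRootLogin`, query the effective configuration with `sshd -T` (run as root) rather than reading `sshd_config`. Ubuntu's default config also pulls in drop-ins from `/etc/ssh/sshd_config.d/`, and those can override the main file. A minimal sketch; the helper name is hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the effective PermitRootLogin value from `sshd -T` output read on stdin.
# sshd -T prints every keyword in lowercase, e.g. "permitrootlogin prohibit-password".
# Exits non-zero if the keyword is missing from the input.
root_login_setting() {
  awk '$1 == "permitrootlogin" { print $2; found = 1 } END { exit !found }'
}

# Usage on the new box, as root:
#   sshd -T | root_login_setting    # "prohibit-password" now, "no" after tightening
```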
### View logs
```bash