minimal-examples/runner-not-picking-up-jobs



runner-not-picking-up-jobs

Minimal reproduction for a Forgejo runner bug where the runner stops picking up new jobs after completing the first host-mode job.

The Bug

When using the host executor (labels like deploy:host), the runner picks up the first job after startup, but never picks up subsequent jobs. The runner process stays alive and shows as online in the Forgejo UI. Restarting the runner container fixes it — until the next job runs.

Likely cause: the runner process runs as PID 1 inside its container and doesn't reap zombie child processes after host-mode jobs complete. Zombie processes accumulate and eventually exhaust the process table, blocking new job spawns.
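One way to check this hypothesis is to count defunct processes inside the runner container after a job finishes. The helper below is a hypothetical sketch, not part of this repository: count_zombies is an invented name, the container name in the usage comment is a guess based on Compose naming, and it assumes the runner image ships a ps that accepts -eo.

```shell
# count_zombies: read `ps -eo pid,ppid,stat,comm` output on stdin and print
# how many rows have a process state starting with Z (zombie/defunct).
# The first line is skipped because it is the ps header.
count_zombies() {
  awk 'NR > 1 && $3 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Usage against a live container (container name is an assumption):
#   docker exec runner-runner-deploy-1 ps -eo pid,ppid,stat,comm | count_zombies
```

If the count grows after each host-mode job and the parent PID of the defunct entries is 1, that would support the explanation above; a count that stays at zero would point elsewhere.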

Workaround under test: init: true in docker-compose.yml (runs Docker's built-in tini init as PID 1, which reaps zombies). Status: unconfirmed; this reproduction exists to verify it.
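Before a test run it helps to know whether the workaround is currently present in the compose file. The following is a crude sketch under stated assumptions: has_init_true is a hypothetical helper, the YAML layout in the comment is a guess at what runner/docker-compose.yml looks like, and the check matches any indented init: true line without verifying which service it belongs to.

```shell
# Assumed shape of the relevant part of runner/docker-compose.yml:
#
#   services:
#     runner-deploy:
#       init: true
#
# has_init_true FILE: succeed if FILE contains an indented `init: true` line
# (optionally followed by a comment). It does not check the service name.
has_init_true() {
  grep -Eq '^[[:space:]]+init:[[:space:]]*true[[:space:]]*(#.*)?$' "$1"
}

# Usage:
#   if has_init_true runner/docker-compose.yml; then
#     echo "workaround present"
#   else
#     echo "workaround absent"
#   fi
```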

How to reproduce

1. Start Forgejo

cd forgejo
docker compose up -d
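The container may need a few seconds before the web UI answers. A small retry loop can wait for it; this is a sketch, with wait_until being a hypothetical helper and the /api/healthz path an assumption about the Forgejo version in use (any URL that returns success once the server is up would do).

```shell
# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds after each failed attempt.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (endpoint is an assumption):
#   wait_until 30 2 curl -fsS -o /dev/null http://localhost:3000/api/healthz
```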

Open http://localhost:3000 and complete initial setup:

Create an admin user
Go to Site Administration → Actions → enable Actions

2. Create the repository

In the Forgejo UI, create a new repo named runner-not-picking-up-jobs (any owner). Push this repo's code to it:

git remote add local ssh://git@localhost/<owner>/runner-not-picking-up-jobs.git
git push -u local main

3. Set up runners

cd runner
export CI_SECRET=$(openssl rand -hex 20)
export DEPLOY_SECRET=$(openssl rand -hex 20)
export FORGEJO_CONTAINER=forgejo-forgejo-1   # adjust if needed
FORGEJO_URL=http://localhost:3000 bash setup.sh
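openssl rand -hex 20 prints 20 random bytes as 40 hexadecimal characters. If the secrets are supplied some other way, a quick format check can catch a malformed value before setup.sh runs. This is a sketch: is_runner_secret is a hypothetical helper, and the 40-character hex requirement is an assumption about what setup.sh and Forgejo's runner registration expect.

```shell
# is_runner_secret VALUE: succeed only if VALUE is exactly 40 hex characters.
is_runner_secret() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;  # contains a non-hex character
  esac
  [ "${#1}" -eq 40 ]
}

# Usage:
#   is_runner_secret "$CI_SECRET"     || echo "CI_SECRET looks malformed" >&2
#   is_runner_secret "$DEPLOY_SECRET" || echo "DEPLOY_SECRET looks malformed" >&2
```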

4. Trigger the workflow

Go to the repo → Actions → test.yml → Run workflow (workflow_dispatch).

Expected: both job-1 and job-2 complete successfully.

Bug behaviour: job-1 completes, job-2 waits forever with "Waiting for a runner with the following labels: docker-deploy".

5. Test the workaround

To reproduce without the init: true workaround:

Edit runner/docker-compose.yml, remove init: true from runner-deploy
Restart: docker compose up -d --build
Trigger the workflow again — the bug should now appear

To confirm the fix:

Re-add init: true
Restart and trigger again — both jobs should complete
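After each restart it is worth confirming which configuration the running container actually has, since an edited compose file has no effect until the container is recreated. The sketch below interprets the value that docker inspect reports for the container's init setting. describe_init is a hypothetical helper and the container name is a guess based on Compose naming; anything other than the literal string true is treated as disabled.

```shell
# describe_init VALUE: VALUE is the output of
#   docker inspect --format '{{.HostConfig.Init}}' CONTAINER
# Only the literal string "true" counts as enabled.
describe_init() {
  case "$1" in
    true) echo "init enabled" ;;
    *)    echo "init disabled" ;;
  esac
}

# Usage (container name is an assumption):
#   describe_init "$(docker inspect --format '{{.HostConfig.Init}}' runner-runner-deploy-1)"
```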