
DKP-2859 start: bound docker readiness wait and dump logs on timeout#36

Merged: ebriney merged 2 commits into main from fix-docker-ready-timeout on Jun 9, 2026

@mat007 mat007 commented Jun 9, 2026

Member

Closes #3.

What this PR does

Bounds the "wait for Docker" loop so a daemon that never comes up fails fast instead of hanging until the job is cancelled, and documents the runner sizing that avoids the hang in the first place.

Notes for the reviewer

The loop ran until docker ps succeeded. It turns out that when the daemon is wedged, that call blocks forever rather than returning non-zero, so the loop never iterates: we never sleep, never retry, never give up. The step just sits there until the job timeout kills it (~34 min in this run). You can tell because "docker not ready…" is never printed, not even once.

So each docker ps now runs under timeout 60, which makes a call against a hung daemon return, and the loop gives up at a 600s deadline. On timeout I dump the desktop logs so a failed start leaves a trace instead of nothing.

A step/action-level timeout isn't supported for composite actions (see the discussion in #3), hence doing it inside the loop. And the per-call timeout 60 matters: an outer deadline check alone wouldn't fire, since docker ps itself blocks on a wedged daemon.
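To make the two-level timeout concrete, here is a minimal sketch of such a loop. It is not the code from start/action.yml: the function name wait_ready and its arguments are hypothetical, and it assumes bash plus the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): wait_ready DEADLINE PER_CALL INTERVAL CMD...
# Retries CMD until it succeeds, giving up once DEADLINE seconds have elapsed.
wait_ready() {
  local deadline=$1 per_call=$2 interval=$3
  shift 3
  local start=$SECONDS   # bash counter of seconds since the shell started
  # timeout(1) kills the probe after PER_CALL seconds and exits non-zero (124),
  # so a probe that blocks forever still lets the loop body run.
  until timeout "$per_call" "$@" >/dev/null 2>&1; do
    # The deadline check is only reached because the probe above is bounded.
    if (( SECONDS - start >= deadline )); then
      echo "not ready within ${deadline}s" >&2
      return 1
    fi
    echo "not ready, sleeping ${interval}s and trying again"
    sleep "$interval"
  done
}

# Usage with the values from this PR (commented out so the sketch runs without Docker):
#   wait_ready 600 60 10 docker ps || exit 1
```

Since the deadline is only checked between probes, the worst case is roughly the deadline plus one per-call timeout and one sleep, which seems an acceptable trade for keeping the loop simple.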

Paths checked against the pinata paths package on Linux:

  • ~/.docker/desktop/log/host/ holds the backend log (the monitor tees the backend output to host/monitor.log).
  • ~/.docker/desktop/log/vm/ holds the VM console (console.log, written straight to file by the qemu engine), which is really the one that'll tell us why the VM didn't come up.
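As an illustration of dumping those two directories on timeout, a minimal sketch follows. The function name dump_desktop_logs and the per-file banner are assumptions, not the action's actual code; only the host/ and vm/ paths come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print every *.log under host/ and vm/.
# The root defaults to the log directory named in this PR.
dump_desktop_logs() {
  local root=${1:-"$HOME/.docker/desktop/log"}
  local f
  for f in "$root"/host/*.log "$root"/vm/*.log; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    echo "===== $f ====="
    cat "$f"
  done
  # Never fail the caller just because logs are missing.
  return 0
}

# Usage on the timeout path (commented out so the sketch has no side effects):
#   dump_desktop_logs; exit 1
```

Returning 0 regardless keeps the log dump from masking the real failure, which the caller reports with its own exit code.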

While debugging a hang with this, the console showed the VM and containerd starting fine but dockerd stalling at startup on a small 2 vCPU / 7 GB runner. So I also added a "Choosing a runner" section to the README recommending 4 vCPU / 16 GB, and calling out that ubuntu-latest is smaller on private/internal repos.
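A workflow could also surface an undersized runner early. The sketch below is purely illustrative and not part of this PR: the function name runner_big_enough is hypothetical, and the 15 GiB default is an assumption that allows for a nominal 16 GB machine reporting slightly less than 16 GiB of usable memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): runner_big_enough CPUS MEM_KB [MIN_CPUS] [MIN_MEM_GIB]
# Returns 0 when the runner meets the thresholds, 1 (with a warning) otherwise.
runner_big_enough() {
  local cpus=$1 mem_kb=$2 min_cpus=${3:-4} min_mem_gib=${4:-15}
  # Integer GiB, rounded down; the 15 GiB default is an assumed slack below 16 GB.
  local mem_gib=$(( mem_kb / 1024 / 1024 ))
  if (( cpus < min_cpus || mem_gib < min_mem_gib )); then
    echo "warning: runner has ${cpus} vCPU / ~${mem_gib} GiB; 4 vCPU / 16 GB is recommended" >&2
    return 1
  fi
}

# Usage on a Linux runner (commented out so the sketch has no side effects):
#   runner_big_enough "$(nproc)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" || true
```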

This makes the failure visible and fast; it doesn't fix whatever is keeping the VM from booting (looks like nested KVM on the runner, but that's another story).

@mat007 mat007 changed the title start: bound docker readiness wait and dump logs on timeout DKP-2859 start: bound docker readiness wait and dump logs on timeout Jun 9, 2026
mat007 added 2 commits June 9, 2026 15:57
Run each `docker ps` under `timeout 60` so a hung daemon returns instead of
blocking forever, and give up at a 600s deadline. On timeout, dump the host
and VM logs (host/*.log and vm/console.log) so a failed start leaves a trace
instead of hanging until the job is cancelled.

Small runners (the 2 vCPU / 7 GB ubuntu-latest on private/internal
repos) starve dockerd at startup and the action hangs. Document the
requirement and the public-vs-private ubuntu-latest gotcha.
@mat007 mat007 force-pushed the fix-docker-ready-timeout branch from 3c16fd0 to 6116eeb Compare June 9, 2026 13:57
@mat007 mat007 requested a review from a team June 9, 2026 13:59
@ebriney ebriney merged commit 6a8d449 into main Jun 9, 2026
4 checks passed
Comment thread start/action.yml
cat ~/.docker/desktop/log/vm/*.log 2>/dev/null || true
exit 1
fi
echo "docker not ready, sleep 10 s and try again"

"within 600s" vs. "sleep 10 s": I prefer it without the space. Either way, better to stay consistent across the two occurrences.

Member Author

Ah sorry, I’ll open a follow-up.

@mat007 mat007 Jun 9, 2026

Member Author

Good catch. Opened #38 to make the spacing consistent in a follow-up.

Development

Successfully merging this pull request may close these issues.

feature request: timeouts

3 participants