ci: add retry/backoff to GHCR docker pull in integration-test workflow (PLT-753) #3622
amir-deris wants to merge 1 commit into
PR Summary: Low Risk. Overview: The Load prebuilt seid and pull Docker images step defines a … Reviewed by Cursor Bugbot for commit 1774498.
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3622      +/-   ##
==========================================
- Coverage   58.99%   58.12%   -0.88%
==========================================
  Files        2224     2150      -74
  Lines      182708   174161    -8547
==========================================
- Hits       107782   101223    -6559
+ Misses      65235    63947    -1288
+ Partials     9691     8991     -700

Flags with carried forward coverage won't be shown.
Problem
The Integration Test matrix jobs intermittently fail at the "Load prebuilt seid and pull Docker images" step with transient GHCR errors, before any test runs:
Root cause
PR #3582 switched image distribution from a 1 GB artifact download to a GHCR docker pull. That step used a bare docker pull with no retry wrapper. The Docker client automatically retries only layer blob downloads; it does NOT retry the initial auth-token fetch or the manifest HEAD request, which is exactly where the failures occur. When ~40 matrix jobs start simultaneously and hammer ghcr.io/token, a briefly slow auth response times out, docker pull exits with status 1, and with no retry loop the whole step (and therefore the job) fails.
Fix
Wrap the pulls in a retry-with-backoff loop so the token/manifest request is also retried (5 attempts, linear backoff 5/10/15/20s):
Tagging logic is unchanged.
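The retry loop described above can be sketched as a small shell function. This is a minimal illustration, not the PR's actual step: the function name retry_with_backoff and the RETRY_ATTEMPTS / RETRY_BASE_DELAY variables are hypothetical, and the code assumes it runs in a bash `run:` step (it also works under POSIX sh). It implements the stated schedule: 5 attempts, sleeping 5, 10, 15, then 20 s after each failure.

```bash
#!/usr/bin/env bash
# Sketch only: retry_with_backoff, RETRY_ATTEMPTS and RETRY_BASE_DELAY are
# hypothetical names, not taken from the PR's workflow file.

# retry_with_backoff CMD [ARGS...]
# Runs CMD up to RETRY_ATTEMPTS times (default 5). After the Nth failed
# attempt it sleeps N * RETRY_BASE_DELAY seconds (default 5), giving the
# linear schedule 5, 10, 15, 20 s between the five attempts.
retry_with_backoff() {
  local max="${RETRY_ATTEMPTS:-5}"
  local base="${RETRY_BASE_DELAY:-5}"
  local attempt=1
  local delay
  # Running CMD as the `until` condition means a failed attempt does not
  # abort the script under `bash -e`, the default for Actions run steps.
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    delay=$((attempt * base))
    echo "attempt ${attempt}/${max} failed; retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

In the workflow step, each pull would be wrapped as retry_with_backoff docker pull "$IMAGE", where $IMAGE is a placeholder for the GHCR image references the step pulls; the existing tagging commands would follow unchanged. Because the whole docker pull invocation is retried, the token fetch and manifest request get retried too, not just the layer downloads. With the defaults, a pull that never succeeds spends at most 5 + 10 + 15 + 20 = 50 s sleeping before the step fails. Since all ~40 matrix jobs would follow the same schedule, adding random jitter to each delay is a common refinement if retries still collide.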
References