ci: add retry/backoff to GHCR docker pull in integration-test workflow (PLT-753) #3622

Open
amir-deris wants to merge 1 commit into main from amir/plt-753-add-retry-for-ci-ghcr-auth

amir-deris (Contributor) commented Jun 22, 2026

Problem

The Integration Test matrix jobs intermittently fail at the "Load prebuilt seid and pull Docker images" step with transient GHCR errors, before any test runs:

Get "https://ghcr.io/token?...&scope=...rpcnode:pull...":
  context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Head "https://ghcr.io/v2/.../localnode/manifests/...":
  net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Root cause

PR #3582 switched image distribution from a 1 GB artifact download to GHCR docker pull. That pull step used a bare docker pull with no retry wrapper. The docker client only retries layer blob downloads automatically — it does NOT retry the initial auth-token fetch / manifest HEAD request, which is exactly where the failures occur. When ~40 matrix jobs start simultaneously and hammer ghcr.io/token, a briefly-slow auth response times out, docker pull exits 1, and with no retry loop the whole step (and job) fails.

Fix

Wrap the pulls in a retry-with-backoff loop so the token/manifest request is also retried (5 attempts, linear backoff 5/10/15/20s):

pull_with_retry() {
  local ref="$1" attempt
  for attempt in 1 2 3 4 5; do
    if docker pull "$ref"; then return 0; fi
    # Back off only when another attempt follows; the 5th failure falls through.
    if [ "$attempt" -lt 5 ]; then
      echo "docker pull $ref failed (attempt $attempt), retrying in $((attempt*5))s..."
      sleep $((attempt*5))
    fi
  done
  echo "docker pull $ref failed after 5 attempts"; return 1
}
pull_with_retry "${GHCR_LOCALNODE}:${{ github.run_id }}"
pull_with_retry "${GHCR_RPCNODE}:${{ github.run_id }}"

Tagging logic is unchanged.
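
The retry behaviour can be exercised locally without touching GHCR. The following is a minimal sketch and not part of this PR: it shadows `docker` and `sleep` with hypothetical shell functions so the loop runs instantly, and it uses a copy of the helper's retry loop in which the sleep is skipped after the final attempt. The names `FAILS_BEFORE_SUCCESS`, `CALLS`, `SLEPT`, and the image reference `example.invalid/localnode:123` are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical local harness: stubs replace the real docker CLI and sleep.

FAILS_BEFORE_SUCCESS=2   # assumed scenario: first two pulls fail, third succeeds
CALLS=0                  # how many times the docker stub was invoked
SLEPT=""                 # space-separated record of requested sleep durations

docker() {               # stub: shadows the docker binary inside this script only
  CALLS=$((CALLS + 1))
  [ "$CALLS" -gt "$FAILS_BEFORE_SUCCESS" ]
}

sleep() {                # stub: record the requested delay instead of waiting
  SLEPT="${SLEPT:+$SLEPT }$1"
}

# Copy of the retry loop, with no sleep after the final attempt.
pull_with_retry() {
  local ref="$1" attempt
  for attempt in 1 2 3 4 5; do
    if docker pull "$ref"; then return 0; fi
    if [ "$attempt" -lt 5 ]; then
      echo "docker pull $ref failed (attempt $attempt), retrying in $((attempt*5))s..."
      sleep $((attempt*5))
    fi
  done
  echo "docker pull $ref failed after 5 attempts"; return 1
}

pull_with_retry "example.invalid/localnode:123"
STATUS=$?
echo "status=$STATUS calls=$CALLS slept=$SLEPT"
```

Setting `FAILS_BEFORE_SUCCESS=5` in the same harness should drive the loop to exhaustion and a non-zero status, which is the case the workflow step relies on to fail the job.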

amir-deris self-assigned this Jun 22, 2026
amir-deris changed the title from "Added retry for ghcr pull" to "ci: add retry/backoff to GHCR docker pull in integration-test workflow (PLT-753)" Jun 22, 2026

cursor Bot commented Jun 22, 2026

PR Summary

Low Risk
CI workflow-only change; no application, auth, or runtime behavior is modified.

Overview
Integration matrix jobs now retry pulling the run-scoped localnode and rpcnode images from GHCR instead of failing on the first transient registry/network error.

The Load prebuilt seid and pull Docker images step defines a pull_with_retry helper that attempts each docker pull up to five times with increasing waits (5s per attempt index), then fails the step if all attempts fail. Tagging to sei-chain/localnode and sei-chain/rpcnode is unchanged.
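
As a rough guide to how much delay the retries can add to a job, the waits between attempts can be summed. This is an illustrative calculation only, following the 5/10/15/20s schedule stated in the PR description; `MAX_ATTEMPTS` and `STEP_SECONDS` are assumed names that do not appear in the workflow, and the time spent inside each failing `docker pull` is not included.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; MAX_ATTEMPTS and STEP_SECONDS are assumed names.
MAX_ATTEMPTS=5
STEP_SECONDS=5

TOTAL=0
SCHEDULE=""
# A wait happens only between attempts, so there are MAX_ATTEMPTS-1 of them.
for ((attempt = 1; attempt < MAX_ATTEMPTS; attempt++)); do
  delay=$((attempt * STEP_SECONDS))
  SCHEDULE="${SCHEDULE:+$SCHEDULE/}$delay"
  TOTAL=$((TOTAL + delay))
done
echo "waits=${SCHEDULE}s total=${TOTAL}s"
```

Under these assumptions each image adds at most 50 seconds of backoff, so with two images pulled in sequence the step's worst-case added wait is on the order of 100 seconds plus the duration of the failed pulls themselves.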

Reviewed by Cursor Bugbot for commit 1774498. Bugbot is set up for automated code reviews on this repo.

github-actions Bot commented Jun 22, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jun 22, 2026, 9:36 PM |

amir-deris requested review from bdchatham and masih Jun 22, 2026, 21:38

codecov Bot commented Jun 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.12%. Comparing base (daba2e7) to head (1774498).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3622      +/-   ##
==========================================
- Coverage   58.99%   58.12%   -0.88%     
==========================================
  Files        2224     2150      -74     
  Lines      182708   174161    -8547     
==========================================
- Hits       107782   101223    -6559     
+ Misses      65235    63947    -1288     
+ Partials     9691     8991     -700     

| Flag | Coverage Δ |
| --- | --- |
| sei-db | 70.41% <ø> (ø) |
| sei-db-state-db | ? |

Flags with carried forward coverage won't be shown.
see 74 files with indirect coverage changes
