
SDCICD-1844: Add ephemeral ROSA deploy script for hypershift-pr-check #3272

Open
minlei98 wants to merge 1 commit into openshift:main from minlei98:SDCICD-1844

Conversation

minlei98 commented Jun 25, 2026


Provide bonfire deploy/release helpers and ephemeral-cluster config so Prow can provision a ROSA HCP cluster and run hypershift tests without cluster reserve or osde2e destroy.

Summary by CodeRabbit

  • New Features

    • Added a new ephemeral ROSA cluster preset for test runs.
    • Added a deploy/release script to provision and clean up ephemeral ROSA environments.
    • Automatically captures the cluster ID and makes it available to downstream steps.
  • Bug Fixes

    • Improved cleanup handling so release steps safely skip when no ephemeral namespace is set.
    • Added more robust cluster detection and validation during deploys.
  • Chores

    • Updated test settings for ephemeral runs, including longer ad hoc timeout and cluster health check behavior.

@openshift-merge-bot


There are test jobs defined for this repository which are not configured to run automatically. Comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger jobs most likely to be impacted by the proposed changes. Comment /pipeline required to trigger all required & necessary jobs.

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Jun 25, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2026


@minlei98: This pull request references SDCICD-1844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.


@coderabbitai

coderabbitai Bot commented Jun 25, 2026


Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@openshift-ci

openshift-ci Bot commented Jun 25, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: minlei98
Once this PR has been reviewed and has the lgtm label, please assign yiqinzhang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/ephemeral-rosa-deploy.sh`:
- Around line 72-85: Persist the ephemeral namespace so the `release` entrypoint
can reuse it later instead of starting with an empty `EPHEMERAL_NS` and exiting
early. Update the deploy path in `scripts/ephemeral-rosa-deploy.sh` alongside
the existing `cluster-id` write so it also saves `EPHEMERAL_NS` to
`${SHARED_DIR}`, and have the `release` flow read that value back before
attempting `bonfire namespace release`. Keep the fix localized around the
`deploy`/`release` handling and the `EPHEMERAL_NS`/`CLUSTER_ID` persistence
logic.
- Around line 26-28: Create the bonfire virtualenv in a unique temporary
directory instead of hardcoding /tmp/bonfire-venv, because the current setup can
fail on existing stale state or concurrent runs. Update the
ephemeral-rosa-deploy.sh setup around the venv creation and activation steps to
use a mktemp-generated directory, store it in a variable, source its
bin/activate path, and add cleanup on exit so the temporary venv is removed
automatically.
- Around line 14-15: Guard the shell variable setup in the ephemeral deploy
script so `SHARED_DIR` does not eagerly expand `ARTIFACT_DIR` when both are
unset; update the initialization around the `EPHEMERAL_CREDS_DIR` and
`SHARED_DIR` assignments to either lazily derive `SHARED_DIR` or explicitly
require `ARTIFACT_DIR`. Also make the `deploy`/`release` flow persist
`EPHEMERAL_NS` across invocations by writing it out during deployment and
reading it back during release so cleanup can find the prior namespace.
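For the second item above (the hardcoded /tmp/bonfire-venv), the mktemp-plus-trap pattern could be sketched roughly as follows. The helper name `setup_bonfire_venv_dir` and the `BONFIRE_VENV` variable are hypothetical and do not come from the script. The venv creation and activation steps are left as comments because they depend on the Python available in the CI image.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: create a unique directory for the bonfire virtualenv
# and register cleanup so it is removed when the shell exits.
setup_bonfire_venv_dir() {
  BONFIRE_VENV="$(mktemp -d "${TMPDIR:-/tmp}/bonfire-venv.XXXXXX")"
  trap 'rm -rf "${BONFIRE_VENV}"' EXIT
}

setup_bonfire_venv_dir
echo "venv directory: ${BONFIRE_VENV}"

# In the real script the next steps would be roughly:
#   python3 -m venv "${BONFIRE_VENV}"
#   . "${BONFIRE_VENV}/bin/activate"
```

Because every run gets its own directory, stale state from an earlier job and concurrent runs on the same host no longer collide. One caveat with this sketch: a shell has a single EXIT trap, so if the script already registers one, the cleanup would need to be merged into it.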

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and 8918dce.

📒 Files selected for processing (2)
  • configs/ephemeral-cluster.yaml
  • scripts/ephemeral-rosa-deploy.sh

Comment on lines +14 to +15
EPHEMERAL_CREDS_DIR="${EPHEMERAL_CREDS_DIR:-/usr/local/ci-secrets/ephemeral-cluster}"
SHARED_DIR="${SHARED_DIR:-${ARTIFACT_DIR}/shared}"

@coderabbitai coderabbitai Bot Jun 29, 2026


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

git ls-files scripts/ephemeral-rosa-deploy.sh
echo '---'
nl -ba scripts/ephemeral-rosa-deploy.sh | sed -n '1,220p'

Repository: openshift/osde2e

Length of output: 231


🏁 Script executed:

#!/bin/bash
set -euo pipefail

nl -ba scripts/ephemeral-rosa-deploy.sh | sed -n '1,140p'

Repository: openshift/osde2e

Length of output: 194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,140p' scripts/ephemeral-rosa-deploy.sh | cat -n

Repository: openshift/osde2e

Length of output: 4543


Guard ARTIFACT_DIR initialization and persist the namespace for cleanup.
SHARED_DIR="${SHARED_DIR:-${ARTIFACT_DIR}/shared}" expands ARTIFACT_DIR at source time, so set -u aborts when both variables are unset. Initialize it lazily or require ARTIFACT_DIR explicitly. The standalone deploy|release path also loses EPHEMERAL_NS between invocations, so release cannot tear down a prior deployment unless the namespace is persisted.
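The two options named here (derive the value with a safe default, or require `ARTIFACT_DIR` explicitly) can be sketched as below. The function names are hypothetical, and the `/tmp` fallback is an assumption made for illustration rather than something the script is known to want.

```shell
#!/usr/bin/env bash
set -eu

# Option A: derive SHARED_DIR with a nested default so an unset ARTIFACT_DIR
# never trips `set -u`. The /tmp fallback is an assumption for this sketch.
resolve_shared_dir() {
  printf '%s\n' "${SHARED_DIR:-${ARTIFACT_DIR:-/tmp}/shared}"
}

# Option B: fail fast with a clear message when ARTIFACT_DIR is missing.
require_artifact_dir() {
  : "${ARTIFACT_DIR:?ARTIFACT_DIR must be set}"
}

# Demo in subshells so the caller's environment is left untouched.
(unset SHARED_DIR ARTIFACT_DIR; resolve_shared_dir)                   # /tmp/shared
(unset SHARED_DIR; ARTIFACT_DIR=/logs/artifacts; resolve_shared_dir)  # /logs/artifacts/shared
(unset ARTIFACT_DIR; require_artifact_dir) 2>/dev/null \
  || echo "ARTIFACT_DIR is required"
```

Option A keeps local runs working without extra setup, while Option B surfaces a misconfigured CI step immediately instead of silently writing to a fallback location.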



Suggested change
-EPHEMERAL_CREDS_DIR="${EPHEMERAL_CREDS_DIR:-/usr/local/ci-secrets/ephemeral-cluster}"
-SHARED_DIR="${SHARED_DIR:-${ARTIFACT_DIR}/shared}"
+EPHEMERAL_CREDS_DIR="${EPHEMERAL_CREDS_DIR:-/usr/local/ci-secrets/ephemeral-cluster}"
+SHARED_DIR="${SHARED_DIR:-${ARTIFACT_DIR:-/tmp}/shared}"


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== outline =="
ast-grep outline scripts/ephemeral-rosa-deploy.sh --view expanded || true

echo
echo "== full file =="
nl -ba scripts/ephemeral-rosa-deploy.sh | sed -n '1,220p'

Length of output: 636


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== scripts/ephemeral-rosa-deploy.sh =="
cat -n scripts/ephemeral-rosa-deploy.sh | sed -n '1,220p'

Length of output: 4701


@YiqinZhang This addresses the first issue on Line 15: it avoids the set -u failure when ARTIFACT_DIR is unset.

One gap remains: the standalone release path is still not recoverable. EPHEMERAL_NS is discovered during deploy on Line 72, but ephemeral_rosa_release() on Line 89 only uses the in-memory variable. So ./scripts/ephemeral-rosa-deploy.sh release cannot clean up a previous deploy invocation.

A minimal production-grade follow-up is to persist and reload the namespace, e.g.:

EPHEMERAL_NS_FILE="${SHARED_DIR}/ephemeral-ns"

# after EPHEMERAL_NS is resolved in deploy
echo -n "${EPHEMERAL_NS}" > "${EPHEMERAL_NS_FILE}"

# at the start of release
if [[ -z "${EPHEMERAL_NS}" && -f "${EPHEMERAL_NS_FILE}" ]]; then
  EPHEMERAL_NS="$(cat "${EPHEMERAL_NS_FILE}")"
fi

After successful release, the file should be removed as well.
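Putting the three steps together (persist on deploy, reload on release, remove after a successful release), a self-contained sketch might look like this. The function names are hypothetical, `SHARED_DIR` is replaced by a temporary directory so the example runs anywhere, and the namespace value is made up.

```shell
#!/usr/bin/env bash
set -eu

# Stand-in for the CI-provided SHARED_DIR so the sketch runs anywhere.
SHARED_DIR="$(mktemp -d)"
trap 'rm -rf "${SHARED_DIR}"' EXIT
EPHEMERAL_NS_FILE="${SHARED_DIR}/ephemeral-ns"
EPHEMERAL_NS=""

# Called at the end of deploy, once the namespace is known.
save_ephemeral_ns() {
  printf '%s' "$1" > "${EPHEMERAL_NS_FILE}"
}

# Called at the start of release: keep the in-memory value if present,
# otherwise fall back to the persisted one.
load_ephemeral_ns() {
  if [ -z "${EPHEMERAL_NS}" ] && [ -f "${EPHEMERAL_NS_FILE}" ]; then
    EPHEMERAL_NS="$(cat "${EPHEMERAL_NS_FILE}")"
  fi
}

# Called after a successful release so a later run does not see stale state.
clear_ephemeral_ns() {
  rm -f "${EPHEMERAL_NS_FILE}"
  EPHEMERAL_NS=""
}

save_ephemeral_ns "ephemeral-abc123"   # hypothetical namespace name
load_ephemeral_ns
echo "would release: ${EPHEMERAL_NS}"
clear_ephemeral_ns
```

Clearing the file only after the release succeeds means a failed release can be retried from a fresh invocation.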

Comment thread scripts/ephemeral-rosa-deploy.sh
Comment on lines +72 to +85
EPHEMERAL_NS=$(grep -oE "namespace 'ephemeral-[^']+'" "${deploy_log}" | tail -1 | tr -d "'" | awk '{print $2}')
if [[ -z "${EPHEMERAL_NS}" ]]; then
  EPHEMERAL_NS=$(bonfire namespace list --mine 2>/dev/null | awk '/ephemeral-/ {print $1; exit}')
fi
test -n "${EPHEMERAL_NS}"
bonfire namespace describe "${EPHEMERAL_NS}" | grep -q 'ROSA Cluster configuration detected'

local cluster_id
cluster_id=$(_ephemeral_rosa_get_cluster_id_from_capi "${EPHEMERAL_NS}")
test -n "${cluster_id}"
echo -n "${cluster_id}" > "${SHARED_DIR}/cluster-id"
echo "ephemeral namespace=${EPHEMERAL_NS} cluster_id=${cluster_id}"
export EPHEMERAL_NS
export CLUSTER_ID="${cluster_id}"
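To make the first line of this hunk easier to follow, here is the same grep/tail/tr/awk pipeline wrapped in a hypothetical function and run against made-up log lines. The log wording is an assumption; real bonfire output may differ.

```shell
#!/usr/bin/env bash
set -eu

# Pull the last ephemeral namespace name out of a deploy log.
extract_ephemeral_ns() {
  grep -oE "namespace 'ephemeral-[^']+'" "$1" | tail -1 | tr -d "'" | awk '{print $2}'
}

sample_log="$(mktemp)"
trap 'rm -f "${sample_log}"' EXIT

# Hypothetical log lines for illustration only.
printf '%s\n' \
  "reserving namespace 'ephemeral-old111'" \
  "using namespace 'ephemeral-abc123' for deploy" > "${sample_log}"

extract_ephemeral_ns "${sample_log}"   # prints ephemeral-abc123
```

`grep -o` keeps only the matching fragment, `tail -1` picks the most recent match, `tr` strips the quotes, and `awk` prints the second word. When nothing matches, the function prints nothing, which is what lets the `bonfire namespace list --mine` fallback take over.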


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Persist the namespace for the release entrypoint.

deploy writes only ${SHARED_DIR}/cluster-id. A later ./scripts/ephemeral-rosa-deploy.sh release starts with an empty EPHEMERAL_NS, hits Line 89, and exits without releasing the bonfire namespace. That leaks the reservation in the exact multi-step Prow flow this helper is meant to support.

Suggested fix
   test -n "${EPHEMERAL_NS}"
   bonfire namespace describe "${EPHEMERAL_NS}" | grep -q 'ROSA Cluster configuration detected'
@@
   cluster_id=$(_ephemeral_rosa_get_cluster_id_from_capi "${EPHEMERAL_NS}")
   test -n "${cluster_id}"
+  echo -n "${EPHEMERAL_NS}" > "${SHARED_DIR}/ephemeral-namespace"
   echo -n "${cluster_id}" > "${SHARED_DIR}/cluster-id"
@@
 ephemeral_rosa_release() {
+  if [[ -z "${EPHEMERAL_NS}" && -f "${SHARED_DIR}/ephemeral-namespace" ]]; then
+    EPHEMERAL_NS="$(cat "${SHARED_DIR}/ephemeral-namespace")"
+  fi
   if [[ -z "${EPHEMERAL_NS}" ]]; then
     return 0
   fi
@@
   EPHEMERAL_NS=""
+  rm -f "${SHARED_DIR}/ephemeral-namespace"
   export EPHEMERAL_NS
 }

Also applies to: 88-96


@YiqinZhang


/retest

export KUBECONFIG="${EPH_KUBECONFIG}"
set +x
oc login --token="${OC_LOGIN_TOKEN}" --server="${OC_LOGIN_SERVER}" --insecure-skip-tls-verify=true >/dev/null
set -x 2>/dev/null || true


Suggested change
-set -x 2>/dev/null || true
+[[ $- == *x* ]] && _was_tracing=true || _was_tracing=false
+set +x
+oc login --token="${OC_LOGIN_TOKEN}" --server="${OC_LOGIN_SERVER}" --insecure-skip-tls-verify=true >/dev/null
+$_was_tracing && set -x
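The idea behind this suggestion (remember whether xtrace was on, turn it off around the secret-bearing command, then restore the previous state) can be sketched as a pair of helpers. The names `xtrace_pause`, `xtrace_resume`, and `_XTRACE_WAS_ON` are hypothetical, and the `printf` stands in for the `oc login` call.

```shell
#!/usr/bin/env bash
set -eu

# Record whether xtrace is currently enabled, then switch it off.
xtrace_pause() {
  case "$-" in
    *x*) _XTRACE_WAS_ON=true ;;
    *)   _XTRACE_WAS_ON=false ;;
  esac
  set +x
}

# Re-enable xtrace only if it was on before xtrace_pause ran.
xtrace_resume() {
  if [ "${_XTRACE_WAS_ON:-false}" = true ]; then
    set -x
  fi
}

xtrace_pause
SECRET_TOKEN="not-a-real-token"                   # placeholder value
printf 'token length: %s\n' "${#SECRET_TOKEN}"    # stands in for oc login
xtrace_resume
```

The helpers take no arguments on purpose: a wrapper that received the command line as arguments would have those arguments traced at the call site before tracing could be turned off. Using `if` in `xtrace_resume` also keeps its exit status at zero when tracing was off, which matters under `set -e`.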

Provide bonfire deploy/release helpers and ephemeral-cluster config so
Prow can provision a ROSA HCP cluster and run hypershift tests without
cluster reserve or osde2e destroy.

Merge dependency: merge this PR to openshift/osde2e main before the
companion openshift/release PR (SDCICD-1844). The release job sources
scripts/ephemeral-rosa-deploy.sh and configs/ephemeral-cluster.yaml from
this repository at runtime.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-ci

openshift-ci Bot commented Jun 30, 2026


@minlei98: all tests passed!

