gather-extra: collect loaded kernel modules from all nodes #80798
sdodson wants to merge 2 commits into
Add lsmod collection to the per-node gather loop so that loaded kernel modules are captured from both worker and control plane hosts on every CI run. The goal is to build a broad understanding of which modules are loaded by default across supported platforms.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
rh-pre-commit.version: 2.4.0
rh-pre-commit.check-secrets: ENABLED
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here. The pull request process is described here.
|
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID:
📒 Files selected for processing (1)
Walkthrough
Four lines are added to the per-node parallel collection loop in
Changes
Per-node lsmod collection
Estimated code review effort
🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci-operator/step-registry/gather/extra/gather-extra-commands.sh`:
- Line 151: The queue command on line 151 contains unquoted variable expansions
that can cause word-splitting and globbing issues. Quote the variable expansions
${ARTIFACT_DIR} and $i (which appears twice in the command) by wrapping them in
double quotes to ensure they are treated as single arguments with safe
boundaries, particularly important when these variables might contain spaces or
special characters.
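The quoting concern can be illustrated with a small sketch. The `queue` helper below is a simplified stand-in, since the real helper in `gather-extra-commands.sh` is not shown in this thread, and `collect_lsmod` is a hypothetical wrapper. The `chroot /host lsmod` arguments are likewise an assumption about how the command is invoked.

```shell
#!/usr/bin/env bash
# Simplified stand-in for the script's queue helper (assumption: the real
# helper may also throttle background jobs; this one only redirects output).
queue() {
  local target="$1"
  shift
  mkdir -p "$(dirname "$target")"
  # Run the remaining arguments as a command in the background.
  "$@" > "$target" 2>&1 &
}

# Hypothetical wrapper: collect lsmod output for one node.
collect_lsmod() {
  local node="$1"
  # Quoted expansions stay single arguments even if a value contains
  # spaces or glob characters.
  queue "${ARTIFACT_DIR}/nodes/${node}/lsmod" \
    oc debug "node/${node}" -- chroot /host lsmod
}
```

With the expansions quoted, an `ARTIFACT_DIR` or node name containing whitespace still yields exactly one target path and one `node/...` argument.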
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d30574b9-8167-4a00-bcde-833b48f9b44d
📒 Files selected for processing (1)
ci-operator/step-registry/gather/extra/gather-extra-commands.sh
|
/pj-rehearse |
|
@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-ipv6 periodic-ci-openshift-release-main-ci-5.0-e2e-azure-ovn-upgrade periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-s2s-libvirt-ppc64le |
|
@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse abort |
|
@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Switch from oc debug (which spawns a new pod) to exec into the already-running machine-config-daemon pod on each node. The MCD approach is faster (~1s vs ~2s per node), doesn't require scheduling a new pod, and works reliably on both worker and control plane nodes.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
rh-pre-commit.version: 2.4.0
rh-pre-commit.check-secrets: ENABLED
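A minimal sketch of the exec-based approach follows; it is not the PR's actual code. The namespace, label selector, container name, and `/rootfs` host mount are assumptions about a typical machine-config-daemon deployment, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed namespace of the machine-config-daemon pods.
MCD_NAMESPACE="openshift-machine-config-operator"

# Print the machine-config-daemon pod (as pod/<name>) running on a node.
# The label selector is an assumption, not taken from the PR.
mcd_pod_for_node() {
  local node="$1"
  oc -n "${MCD_NAMESPACE}" get pods \
    -l k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=${node}" \
    -o name
}

# Run lsmod against the host through the already-running MCD pod.
# Assumes the host filesystem is mounted at /rootfs in that container.
lsmod_via_mcd() {
  local node="$1" pod
  pod="$(mcd_pod_for_node "${node}")" || return 1
  if [ -z "${pod}" ]; then
    echo "no machine-config-daemon pod found on ${node}" >&2
    return 1
  fi
  oc -n "${MCD_NAMESPACE}" exec "${pod}" -c machine-config-daemon -- \
    chroot /rootfs lsmod
}
```

Because no new pod has to be scheduled, a call such as `lsmod_via_mcd "$i"` only depends on the daemon pod already running on that node.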
|
/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-ipv6 periodic-ci-openshift-release-main-ci-5.0-e2e-azure-ovn-upgrade periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-s2s-libvirt-ppc64le periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-libvirt-s390x |
|
@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[REHEARSALNOTIFIER]
A total of 42065 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.

Interacting with pj-rehearse
Comment:
Once you are satisfied with the results of the rehearsals, comment:
|
@sdodson: The following tests failed, say
Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary
- Adds `lsmod` output collection to the per-node gather loop in the `gather-extra` step
- Writes the output to `${ARTIFACT_DIR}/nodes/<node>/lsmod` alongside existing per-node artifacts

The goal is to build a broad, cross-run understanding of which kernel modules are loaded by default across supported platforms (AWS, GCP, Azure, bare metal, etc.), which can inform debugging and platform-specific investigations.

Test plan
- `gather-extra` step produces `nodes/<node>/lsmod` files in its artifacts for both worker and control plane nodes

Summary by CodeRabbit
This PR enhances the OpenShift CI infrastructure's `gather-extra` step by adding collection of kernel module information from cluster nodes. Specifically, it modifies the gather extra commands script to capture `lsmod` output from both worker and control plane nodes during CI test runs.

The change integrates a new per-node data collection task that invokes `oc debug node/$i` to retrieve the list of loaded kernel modules and stores the output in `${ARTIFACT_DIR}/nodes/<node>/lsmod`. This collection runs in parallel with the collection of existing per-node artifacts (such as heap dumps and audit logs), maintaining the efficiency of the gather phase.

This enhancement applies across all supported OpenShift CI platforms (AWS, GCP, Azure, and bare metal environments). The collected kernel module data builds a baseline dataset that helps with platform-specific debugging and understanding default kernel configurations across different infrastructure providers during continuous integration testing.
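One way the collected files could feed the cross-node baseline is a small summary pass over the artifact directory. This is only a sketch: `summarize_lsmod` is a hypothetical helper that is not part of the PR, and it assumes each `nodes/<node>/lsmod` file holds plain `lsmod` output with its usual one-line header.

```shell
#!/usr/bin/env bash
# Print "<node count> <module>" for every module found under
# <artifact dir>/nodes/*/lsmod, most widely loaded modules first.
summarize_lsmod() {
  local artifact_dir="$1" f
  for f in "${artifact_dir}"/nodes/*/lsmod; do
    [ -f "$f" ] || continue
    # Skip the "Module Size Used by" header; emit each module once per node.
    awk 'NR > 1 && NF > 0 { print $1 }' "$f" | sort -u
  done | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}
```

Running `summarize_lsmod "${ARTIFACT_DIR}"` after a gather would list each module with the number of nodes that had it loaded, which makes modules present on only some nodes easy to spot.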