gather-extra: collect loaded kernel modules from all nodes#80798

Open
sdodson wants to merge 2 commits into
openshift:mainfrom
sdodson:gather-extra-lsmod

Conversation

@sdodson

@sdodson sdodson commented Jun 19, 2026

Member

Summary

  • Adds lsmod output collection to the per-node gather loop in the gather-extra step
  • Captures loaded kernel modules from both worker and control plane hosts on every CI run
  • Output is saved to ${ARTIFACT_DIR}/nodes/<node>/lsmod alongside existing per-node artifacts

The goal is to build a broad, cross-run understanding of which kernel modules are loaded by default across supported platforms (AWS, GCP, Azure, bare metal, etc.), which can inform debugging and platform-specific investigations.
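
One way the collected files might be compared across nodes is sketched below. It is a minimal illustration, not part of this PR: the function name `module_node_counts` is hypothetical, and it assumes the standard `lsmod` layout (one header line, then the module name in the first column) plus the `nodes/<node>/lsmod` artifact layout described above.

```shell
#!/bin/bash
# Hypothetical helper: for each module, print how many nodes have it loaded.
# Assumes standard lsmod output: one header line, module name in column 1.
module_node_counts() {
  local artifact_dir="$1" f
  for f in "${artifact_dir}"/nodes/*/lsmod; do
    # Skip the literal glob pattern when no lsmod files exist.
    [ -f "${f}" ] || continue
    # Drop the "Module Size Used by" header and keep unique names per node.
    awk 'NR > 1 { print $1 }' "${f}" | sort -u
  done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling `module_node_counts "${ARTIFACT_DIR}"` prints one `module count` pair per line, sorted by module name; modules whose count is lower than the number of nodes are the ones that differ between hosts.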

Test plan

  • Verify that a CI job using the gather-extra step produces nodes/<node>/lsmod files in its artifacts for both worker and control plane nodes
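
The check in the test plan could be scripted roughly as follows. This is a sketch under assumptions: the job's artifacts have been downloaded locally, the directory passed in contains `nodes/`, and the function name `check_lsmod_artifacts` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: fail if any node directory lacks a non-empty lsmod file.
# Assumes the layout <artifact_dir>/nodes/<node>/lsmod described in the summary.
check_lsmod_artifacts() {
  local artifact_dir="$1" found=0 missing=0 node_dir
  for node_dir in "${artifact_dir}"/nodes/*/; do
    # Skip the literal glob pattern when no node directories exist.
    [ -d "${node_dir}" ] || continue
    found=$((found + 1))
    if [ ! -s "${node_dir}lsmod" ]; then
      echo "missing or empty: ${node_dir}lsmod" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only if at least one node was seen and none lacked the file.
  [ "${found}" -gt 0 ] && [ "${missing}" -eq 0 ]
}
```

Running `check_lsmod_artifacts ./artifacts/gather-extra/artifacts` (the path is a placeholder) returns non-zero and lists the offending nodes on stderr when a file is missing.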

Summary by CodeRabbit

This PR enhances the OpenShift CI infrastructure's gather-extra step by adding collection of kernel module information from cluster nodes. Specifically, it modifies the gather extra commands script to capture lsmod output from both worker and control plane nodes during CI test runs.

The change integrates a new per-node data collection task that invokes oc debug node/$i to retrieve the list of loaded kernel modules and stores the output in ${ARTIFACT_DIR}/nodes/<node>/lsmod. The task runs in parallel with the collection of existing per-node artifacts (such as heap dumps and audit logs), so the gather phase stays efficient.

This enhancement applies across all supported OpenShift CI platforms (AWS, GCP, Azure, and bare metal environments). The collected kernel module data builds a baseline dataset that helps with platform-specific debugging and understanding default kernel configurations across different infrastructure providers during continuous integration testing.

Add lsmod collection to the per-node gather loop so that loaded kernel
modules are captured from both worker and control plane hosts on every
CI run. The goal is to build a broad understanding of which modules are
loaded by default across supported platforms.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

rh-pre-commit.version: 2.4.0
rh-pre-commit.check-secrets: ENABLED
@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jun 19, 2026
@openshift-ci openshift-ci Bot requested review from sosiouxme and stbenjam June 19, 2026 17:25
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 513342b8-2c49-42f8-a136-523216501c5e

📥 Commits

Reviewing files that changed from the base of the PR and between 3e49d04 and 41df324.

📒 Files selected for processing (1)
  • ci-operator/step-registry/gather/extra/gather-extra-commands.sh

Walkthrough

Four lines are added to the per-node parallel collection loop in gather-extra-commands.sh. The script locates the machine-config-daemon pod scheduled on each node and queues an oc exec command to run lsmod inside the pod's host chroot, writing the output to ${ARTIFACT_DIR}/nodes/$i/lsmod.
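
The approach described in this walkthrough can be sketched as below. The diff itself is not reproduced in this thread, so everything specific here is an assumption rather than the PR's actual code: the namespace `openshift-machine-config-operator`, the label `k8s-app=machine-config-daemon`, the container name, the `/rootfs` host mount, and the function names `mcd_pod_for_node` and `collect_lsmod`. The sketch also runs the command directly instead of queuing it.

```shell
#!/bin/bash
# Sketch only: namespace, label, container name, and /rootfs mount point are
# assumptions about a typical cluster, not taken from the PR diff.
MCO_NAMESPACE="openshift-machine-config-operator"

# Print the name of the machine-config-daemon pod scheduled on node $1.
mcd_pod_for_node() {
  oc --namespace "${MCO_NAMESPACE}" get pods \
    --selector k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=$1" \
    --output jsonpath='{.items[0].metadata.name}'
}

# Run lsmod against the host filesystem via that pod and save the output
# under ${ARTIFACT_DIR}/nodes/<node>/lsmod.
collect_lsmod() {
  local node="$1" pod
  pod="$(mcd_pod_for_node "${node}")" || return 1
  [ -n "${pod}" ] || return 1
  mkdir -p "${ARTIFACT_DIR}/nodes/${node}"
  oc --namespace "${MCO_NAMESPACE}" exec "${pod}" \
    --container machine-config-daemon -- chroot /rootfs lsmod \
    > "${ARTIFACT_DIR}/nodes/${node}/lsmod"
}
```

With `ARTIFACT_DIR` set, `collect_lsmod <node>` would be called once per node inside the existing per-node loop.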

Changes

Per-node lsmod collection

| Layer / File(s) | Summary |
|---|---|
| lsmod capture in per-node gather loop (ci-operator/step-registry/gather/extra/gather-extra-commands.sh) | Adds queued oc exec command that locates the machine-config-daemon pod on each node and captures lsmod output into ${ARTIFACT_DIR}/nodes/$i/lsmod during the parallel per-node artifact collection phase. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding collection of loaded kernel modules from all nodes via lsmod output in the gather-extra step. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only gather-extra-commands.sh, a bash shell script with no Ginkgo tests. Check not applicable. |
| Test Structure And Quality | ✅ Passed | PR contains no Ginkgo test code; it only modifies bash shell scripts and documentation files. The custom check for Ginkgo test quality is not applicable. |
| Microshift Test Compatibility | ✅ Passed | This PR modifies a bash script (gather-extra-commands.sh) that gathers CI diagnostics, not adding any Ginkgo e2e tests. The MicroShift test compatibility check only applies when new tests are added. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. Changes are only to a bash script that collects diagnostic artifacts via lsmod, which is infrastructure code not subject to SNO compatibility checks. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This PR modifies a CI diagnostic script to collect kernel module info, not deployment manifests, operator code, or controllers. No scheduling constraints are introduced, so the topology-aware check... |
| Ote Binary Stdout Contract | ✅ Passed | Custom check is not applicable: PR modifies a bash CI script, not an OTE binary. The check targets Go process-level code structures (main(), TestMain(), BeforeSuite(), etc.) that don't exist in she... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR modifies only a shell script (gather-extra-commands.sh) to collect kernel module info; no Ginkgo e2e tests are added, so IPv6/disconnected network check does not apply. |
| No-Weak-Crypto | ✅ Passed | The PR adds kernel module info collection (lsmod) to a CI script. No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure secret comp... |
| Container-Privileges | ✅ Passed | PR modifies only a shell script to add diagnostics via 'oc exec' into existing pods; no new Kubernetes manifests or container security configurations introduced. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The PR adds lsmod collection to gather kernel module information. lsmod output contains only public kernel module names, memory usage, and reference counts; no passwords, tokens, API keys, PII, or c... |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci-operator/step-registry/gather/extra/gather-extra-commands.sh`:
- Line 151: The queue command on line 151 contains unquoted variable expansions
that can cause word-splitting and globbing issues. Quote the variable expansions
${ARTIFACT_DIR} and $i (which appears twice in the command) by wrapping them in
double quotes to ensure they are treated as single arguments with safe
boundaries, particularly important when these variables might contain spaces or
special characters.
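
The word-splitting concern in this finding can be demonstrated with a small self-contained script. The values below are hypothetical (a real `ARTIFACT_DIR` in CI may never contain a space); `count_args` is an illustrative helper, not something from gather-extra-commands.sh.

```shell
#!/bin/bash
# Count how many arguments a command actually receives.
count_args() { echo "$#"; }

# Hypothetical values; the space in the path is what exposes the problem.
ARTIFACT_DIR="/tmp/ci artifacts"
i="node-1"

# Unquoted: the shell splits the expanded path at the space.
unquoted="$(count_args ${ARTIFACT_DIR}/nodes/$i/lsmod)"
# Quoted: the path stays a single argument.
quoted="$(count_args "${ARTIFACT_DIR}/nodes/${i}/lsmod")"

echo "unquoted=${unquoted} quoted=${quoted}"   # prints "unquoted=2 quoted=1"
```

The unquoted form hands the command two separate arguments, which is the failure mode the review comment warns about.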
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d30574b9-8167-4a00-bcde-833b48f9b44d

📥 Commits

Reviewing files that changed from the base of the PR and between 52c74a1 and 3e49d04.

📒 Files selected for processing (1)
  • ci-operator/step-registry/gather/extra/gather-extra-commands.sh

Comment thread ci-operator/step-registry/gather/extra/gather-extra-commands.sh Outdated
@sdodson

sdodson commented Jun 19, 2026

Member Author

/pj-rehearse

@openshift-merge-bot

Contributor

@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@sdodson

sdodson commented Jun 19, 2026

Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-ipv6 periodic-ci-openshift-release-main-ci-5.0-e2e-azure-ovn-upgrade periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-s2s-libvirt-ppc64le
periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-libvirt-s390x

@openshift-merge-bot

Contributor

@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@sdodson

sdodson commented Jun 19, 2026

Member Author

/pj-rehearse abort

@openshift-merge-bot

Contributor

@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Switch from oc debug (which spawns a new pod) to exec into the
already-running machine-config-daemon pod on each node. The MCD
approach is faster (~1s vs ~2s per node), doesn't require scheduling
a new pod, and works reliably on both worker and control plane nodes.
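
A rough way to reproduce a timing comparison like this one is sketched below. It is an illustration only: `elapsed_ms` is a hypothetical helper, it relies on GNU `date` supporting `%N`, and the `oc` invocations in the comments use placeholder node and pod names plus an assumed namespace, container name, and host mount paths.

```shell
#!/bin/bash
# Hypothetical helper: print the wall-clock milliseconds a command takes.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
  local start end
  start="$(date +%s%N)"
  "$@" > /dev/null 2>&1
  end="$(date +%s%N)"
  echo "$(( (end - start) / 1000000 ))"
}

# Possible usage against a live cluster (names in <> are placeholders):
#   elapsed_ms oc debug node/<node> -- chroot /host lsmod
#   elapsed_ms oc --namespace openshift-machine-config-operator exec <mcd-pod> \
#     --container machine-config-daemon -- chroot /rootfs lsmod
```

A single measurement is noisy, so repeating each command a few times per node gives a fairer comparison.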

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

rh-pre-commit.version: 2.4.0
rh-pre-commit.check-secrets: ENABLED
@sdodson

sdodson commented Jun 19, 2026

Member Author

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-ipv6 periodic-ci-openshift-release-main-ci-5.0-e2e-azure-ovn-upgrade periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node periodic-ci-openshift-release-main-nightly-5.0-e2e-vsphere-ovn periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-s2s-libvirt-ppc64le periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-libvirt-s390x

@openshift-merge-bot

Contributor

@sdodson: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot

Contributor

[REHEARSALNOTIFIER]
@sdodson: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| pull-ci-netobserv-netobserv-operator-main-e2e-operator | netobserv/netobserv-operator | presubmit | Registry content changed |
| pull-ci-netobserv-netobserv-operator-main-e2etest | netobserv/netobserv-operator | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.7-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.7-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.6-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.6-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.5-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.5-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.4-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.4-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.3-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.3-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.2-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.2-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.1-e2e-aws-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.1-e2e-aws-console-olm | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.2-e2e-aws | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.1-e2e-aws | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.7-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.6-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.5-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.4-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.3-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.2-e2e-aws-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.1-e2e-aws-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |

A total of 42065 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci

openshift-ci Bot commented Jun 20, 2026

Contributor

@sdodson: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/rehearse/periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-s2s-libvirt-ppc64le | 41df324 | link | unknown | /pj-rehearse periodic-ci-openshift-multiarch-main-nightly-5.0-ocp-e2e-ovn-remote-s2s-libvirt-ppc64le |
| ci/rehearse/periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node | 41df324 | link | unknown | /pj-rehearse periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node |
| ci/rehearse/periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-ipv6 | 41df324 | link | unknown | /pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.


1 participant