fix(ci): don't kill ancestor processes in handle_dangling_processes #21300

Draft
davdhacs wants to merge 2 commits into master from fix-dangling-process-kill

@davdhacs

Contributor

Description

handle_dangling_processes in ci_exit_trap walks all processes and kills anything that isn't
self ($$), a direct child, entrypoint, or defunct. When dispatch.sh runs inside a ci-operator
step script (as in the LP interop jobs), the step script is an ancestor — not $$, not a child,
and its command doesn't match entrypoint. So it gets killed, causing the Prow entrypoint wrapper
to report signal: terminated / exit code 255.

This is why the periodic-ci-stackrox-stackrox-master-ocp-4.22-lpMainline-lp-ocp-compat-cr--acs--tests-aws
job has never passed — even when all tests succeed.

The fix builds the full ancestor PID chain ($$ → parent → grandparent → ... → init) and skips
all of them. This replaces the previous pid == $$ check which only protected the current process.
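
The ancestor walk described above can be sketched as follows. This is a minimal illustration rather than the code in the PR: the helper names `parent_pid_of` and `ancestor_chain` are hypothetical, and it assumes a Linux `/proc/<pid>/status` file with a `PPid:` line, falling back to a POSIX `ps` when `/proc` is not readable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the parent PID of "$1", or nothing if unknown.
parent_pid_of() {
    local pid="$1" key value
    if [[ -r "/proc/${pid}/status" ]]; then
        # /proc/<pid>/status contains a line of the form "PPid:<TAB><n>".
        while read -r key value; do
            if [[ "${key}" == "PPid:" ]]; then
                echo "${value}"
                return 0
            fi
        done < "/proc/${pid}/status"
    else
        # Fallback where /proc is unavailable; assumes a POSIX ps.
        ps -o ppid= -p "${pid}" 2>/dev/null | tr -d ' '
    fi
}

# Hypothetical helper: print "self parent grandparent ..." as a
# space-delimited string, starting from PID "$1".
ancestor_chain() {
    local pid="$1" chain=""
    # Stop at PID 0 (the parent of init) or when the lookup fails.
    while [[ -n "${pid}" && "${pid}" -gt 0 ]]; do
        chain="${chain:+${chain} }${pid}"
        pid="$(parent_pid_of "${pid}")"
    done
    echo "${chain}"
}

ancestors="$(ancestor_chain "$$")"
echo "protected PIDs: ${ancestors}"
```

A kill loop would then skip any PID found in `ancestors` instead of comparing against `$$` alone.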

Related: openshift/release#80796 (fixes the JUnit Secret size issue in the same job)

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

No new tests. The LP interop rehearsal on the openshift/release PR will validate both
fixes together.

How I validated my change

Analyzed process trees from multiple CI runs:

  • dispatch.sh (PID 32) runs handle_dangling_processes which finds the step script (PID 29, ppid=21)
  • PID 29 doesn't match $$ (32), isn't a child of 32, and doesn't match entrypoint|defunct
  • So it gets killed → signal: terminated → exit 255
  • With this fix, PID 29 would be in the ancestor set (32 → 29 via ppid chain) and skipped
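
The walk-through above can be replayed against a simulated process table. Only PID 32 (dispatch.sh) and PID 29 (ppid 21) are taken from the analysis; the intermediate entrypoint PID 30, the command names, and the leftover PID 40 are made-up placeholders, and both rules are simplified approximations of the real filter.

```shell
#!/usr/bin/env bash
# Simulated "pid ppid command" table. PIDs 32 and 29 (ppid 21) come from the
# PR analysis; every other value here is a hypothetical placeholder.
table="32 30 dispatch.sh
30 29 entrypoint
29 21 step-script
21 1 step-parent
40 21 leftover-helper"

# Look up the parent of "$1" in the simulated table.
sim_parent() {
    local want="$1" pid ppid _
    while read -r pid ppid _; do
        if [[ "${pid}" == "${want}" ]]; then
            echo "${ppid}"
            return 0
        fi
    done <<< "${table}"
}

# Collect self plus every ancestor as a space-delimited string.
sim_ancestors() {
    local pid="$1" chain=""
    while [[ -n "${pid}" && "${pid}" -gt 0 ]]; do
        chain="${chain:+${chain} }${pid}"
        pid="$(sim_parent "${pid}")"
    done
    echo "${chain}"
}

self=32
protected="$(sim_ancestors "${self}")"
old_victims=""
new_victims=""
while read -r pid ppid cmd; do
    # Both rules spare direct children of self and entrypoint/defunct commands.
    [[ "${ppid}" == "${self}" ]] && continue
    [[ "${cmd}" =~ entrypoint|defunct ]] && continue
    # Old rule (simplified): additionally spare only self.
    if [[ "${pid}" != "${self}" ]]; then
        old_victims="${old_victims:+${old_victims} }${pid}"
    fi
    # New rule (simplified): additionally spare self and every ancestor.
    if [[ " ${protected} " != *" ${pid} "* ]]; then
        new_victims="${new_victims:+${new_victims} }${pid}"
    fi
done <<< "${table}"

echo "protected: ${protected}"
echo "old rule would kill: ${old_victims}"
echo "new rule would kill: ${new_victims}"
```

In this simulation the old rule selects the step script (PID 29) for termination, while the new rule leaves only the unrelated leftover process.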

handle_dangling_processes walks all processes and kills anything that
isn't self, a direct child, entrypoint, or defunct. When dispatch.sh
runs inside a ci-operator step script, the step script is an ancestor
(grandparent via entrypoint). Killing it causes the Prow entrypoint
to report "signal: terminated" / exit 255, failing the job even when
all tests pass.

Build the full ancestor chain ($$, parent, grandparent, ...) and skip
all of them. This replaces the previous pid == $$ check which only
protected the current process, not its parents.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 05e10cd0-0925-43b6-af89-f4e4902ae340

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

🚀 Build Images Ready

Images are ready for commit 9c41746. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.12.x-258-g9c41746dbe

Replace `local -A` (bash 4+ associative array) with a simple
space-delimited string for the ancestor PID set. This avoids
potential compatibility issues in minimal build containers where
bash features or /proc behavior may differ.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
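
The second commit's switch from `local -A` to a space-delimited string could look roughly like the sketch below. The helper name `pid_in_set` and the sample PIDs are illustrative assumptions, not taken from the PR; the point is that a `case` pattern with padded spaces gives exact-match membership without bash 4 associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PID "$2" is in the space-delimited set "$1".
pid_in_set() {
    local pid_set="$1" pid="$2"
    # Pad both sides with spaces so that 3 does not match inside 32.
    case " ${pid_set} " in
        *" ${pid} "*) return 0 ;;
        *) return 1 ;;
    esac
}

ancestor_pids="32 29 21 1"   # illustrative values only
for candidate in 29 2 32 3; do
    if pid_in_set "${ancestor_pids}" "${candidate}"; then
        echo "skip ${candidate} (ancestor)"
    else
        echo "${candidate} is not an ancestor"
    fi
done
```

The lookup is linear in the length of the string, which should be negligible for an ancestor chain that is only a handful of PIDs long.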