fix(ci): don't kill ancestor processes in handle_dangling_processes #21300
davdhacs wants to merge 2 commits
handle_dangling_processes walks all processes and kills anything that isn't self, a direct child, entrypoint, or defunct. When dispatch.sh runs inside a ci-operator step script, the step script is an ancestor (grandparent via entrypoint). Killing it causes the Prow entrypoint to report "signal: terminated" / exit 255, failing the job even when all tests pass.

Build the full ancestor chain ($$, parent, grandparent, ...) and skip all of them. This replaces the previous pid == $$ check, which only protected the current process, not its parents.

Partially generated by AI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🚀 Build Images Ready

Images are ready for commit 9c41746. To use with deploy scripts: `export MAIN_IMAGE_TAG=4.12.x-258-g9c41746dbe`
Replace `local -A` (bash 4+ associative array) with a simple space-delimited string for the ancestor PID set. This avoids potential compatibility issues in minimal build containers where bash features or /proc behavior may differ.

Partially generated by AI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
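A minimal sketch of how a space-delimited string can stand in for an associative array as a PID set. The helper name `in_pid_set` is hypothetical and not taken from the PR; the sample PIDs are illustrative. Padding both the list and the candidate with spaces keeps `2` from matching inside `29` or `21`.

```shell
# Hypothetical helper (not the PR's actual code).
# Return 0 if PID $2 appears in the space-delimited list $1, else 1.
in_pid_set() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative ancestor list.
ancestors="32 29 21 1"

if in_pid_set "$ancestors" 29; then
    echo "29 is an ancestor: skip"
fi
if ! in_pid_set "$ancestors" 2; then
    echo "2 is not an ancestor"
fi
```

Because it relies only on `case` pattern matching, this form should also work in shells without bash 4 associative arrays.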
Description
`handle_dangling_processes` in `ci_exit_trap` walks all processes and kills anything that isn't self (`$$`), a direct child, entrypoint, or defunct. When `dispatch.sh` runs inside a ci-operator step script (as in the LP interop jobs), the step script is an ancestor: not `$$`, not a child, and its command doesn't match `entrypoint`. So it gets killed, causing the Prow entrypoint wrapper to report `signal: terminated` / exit code 255.

This is why the `periodic-ci-stackrox-stackrox-master-ocp-4.22-lpMainline-lp-ocp-compat-cr--acs--tests-aws` job has never passed, even when all tests succeed.

The fix builds the full ancestor PID chain (`$$` → parent → grandparent → ... → init) and skips all of them. This replaces the previous `pid == $$` check, which only protected the current process.

Related: openshift/release#80796 (fixes the JUnit Secret size issue in the same job)
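The ancestor walk described above could look roughly like the following. This is a sketch under stated assumptions, not the PR's implementation: the function names `parent_of` and `ancestor_chain` are hypothetical, and the parent lookup assumes a Linux `/proc` filesystem with a `PPid:` line in `/proc/<pid>/status`.

```shell
# Hypothetical helpers (not the PR's actual code). Assumes Linux /proc.

# Print the parent PID of $1, read from the "PPid:" line of /proc/<pid>/status.
# Returns 1 if the status file is unreadable or has no PPid line.
parent_of() {
    local pid="$1" key value rest
    [ -r "/proc/${pid}/status" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/${pid}/status"
    return 1
}

# Print the ancestor chain of $1 (default: $$) as a space-delimited string:
# the PID itself, then its parent, grandparent, and so on.
# The walk stops when the parent PID is 0 or cannot be read.
ancestor_chain() {
    local pid="${1:-$$}" chain=""
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        chain="${chain} ${pid}"
        pid="$(parent_of "$pid")" || break
    done
    echo "${chain# }"
}

echo "ancestors of $$: $(ancestor_chain)"
```

Inside a container the chain normally ends at PID 1, whose parent is reported as 0. If `/proc` is unavailable, the sketch degrades to protecting only the starting PID, which matches the old `pid == $$` behavior.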
User-facing documentation
Testing and quality
Automated testing
No new tests. The LP interop rehearsal on the openshift/release PR will validate both
fixes together.
How I validated my change
Analyzed process trees from multiple CI runs:
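- The exit trap runs `handle_dangling_processes`, which finds the step script (PID 29, ppid=21)
- The step script is not `$$` (32), isn't a child of 32, and doesn't match `entrypoint|defunct`
- The step script is killed, and the entrypoint reports `signal: terminated` → exit 255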
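The analysis above can be replayed as a small decision function. This is a simplified model of the skip rules described in this PR, not the real `handle_dangling_processes`: the function name `should_kill`, the command strings, and the exact ancestor list `32 29 21` are assumptions for illustration (the real chain may contain intermediate PIDs).

```shell
# Hypothetical model of the skip rules (not the PR's actual code).
# Arguments: candidate pid, candidate ppid, candidate command,
#            space-delimited protected PIDs, PID of the current process.
# Returns 0 if the candidate would be killed, 1 if it would be skipped.
should_kill() {
    local pid="$1" ppid="$2" cmd="$3" protected="$4" self="$5"
    # Skip protected PIDs: only self in the old check, all ancestors in the new one.
    case " $protected " in
        *" $pid "*) return 1 ;;
    esac
    # Skip direct children of the current process.
    if [ "$ppid" = "$self" ]; then
        return 1
    fi
    # Skip the entrypoint and defunct processes.
    case "$cmd" in
        *entrypoint*|*defunct*) return 1 ;;
    esac
    return 0
}

# Step script: PID 29, ppid 21; current process: PID 32.
# "step-script.sh" is a placeholder command string.
if should_kill 29 21 "step-script.sh" "32" 32; then
    echo "old check (self only): step script 29 is killed"
fi
if ! should_kill 29 21 "step-script.sh" "32 29 21" 32; then
    echo "new check (ancestor chain): step script 29 is skipped"
fi
```

With only `32` protected, PID 29 passes none of the skip rules and is killed; once 29 is in the protected list, it is skipped before the other rules are consulted.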