Skip to content

refactor(central): decouple compliance persistence from watcher#21225

Draft
guzalv wants to merge 1 commit into
masterfrom
refactor/decouple-compliance-persistence-from-watcher
Draft

refactor(central): decouple compliance persistence from watcher#21225
guzalv wants to merge 1 commit into
masterfrom
refactor/decouple-compliance-persistence-from-watcher

Conversation

@guzalv

@guzalv guzalv commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Description

The compliance scan and result pipelines are coupled to the report manager's watcher system in a way that makes data persistence conditional on watcher success. Both pipelines call HandleScan/HandleResult before UpsertScan/UpsertResult. If the watcher returns any error, the pipeline short-circuits and the data is never persisted.

The watcher exists solely for report generation tracking, but it acts as a gatekeeper for whether compliance data exists in the database at all.

Shortcomings of this coupling

What this PR changes

Swaps the order: persist first, then notify the watcher. Watcher error semantics are preserved (still propagated to the caller). This is the minimal structural change that breaks the coupling.

Order swap is safe because:

  • Neither HandleScan nor HandleResult mutate the scan/result objects
  • ComplianceCheckResult CRs are updated in-place across scan cycles (same K8s UID). The DedupingQueue replaces events with the same UID in-place, ensuring the latest version is processed last within FIFO ordering
  • DeleteOldResults provides timestamp-based cleanup of stale data on each successful scan cycle
  • HandleScan does not depend on the scan being in the scan datastore. HandleResult depends on the SCAN (not the result) being in the scan datastore, which is unaffected by the result pipeline's order change.

Follow-up PRs enabled by this change

  • Make watcher errors non-fatal. Log HandleScan/HandleResult errors instead of returning them. Events succeed regardless of watcher state. This completes the decoupling.
  • Fix timeout deletion (undo the PR ROX-34779: skip DeleteOldResults on scan watcher timeout #20752 trade-off). With persistence decoupled, call DeleteOldResults on timeout with includeCurrentResults=false (strictly < comparison) instead of skipping it entirely. Deletes truly old results while preserving current ones. Any aggressively-deleted results would be re-persisted on the next sensor sync since persistence is now unconditional.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • All pipeline unit tests pass (scan and result pipelines)
  • Report manager unit tests pass (no regressions)
  • New tests verify ordering with gomock.InOrder (UpsertScan/UpsertResult before HandleScan/HandleResult)
  • New tests verify error scenarios: watcher error after successful persistence, and upsert error short-circuiting before watcher notification
  • Pre-commit lint/style checks pass

Cluster verification

Deployed the patched central to an OCP 4.22 cluster (ga-ocp4-cron) with sensor on a second cluster (ga-ocp4-cron-2) running Compliance Operator v1.9.0.

After triggering CIS compliance scans, 240 check results were persisted to the database despite the watcher returning "this scan is not handled by any known scan configuration" errors for all events. Central logs confirm the new behavior:

pipeline.go:91: Error: unable to handle the check result in the report manager: this scan is not handled by any known scan configuration
worker_queue.go:96: Error: Unretryable error found while handling sensor message *central.SensorEvent_ComplianceOperatorResultV2 permanently

Before this change, those 240 results would NOT have been persisted — the watcher error would have short-circuited before UpsertResult was called. Now UpsertResult runs first (data persisted), then HandleResult fails (logged, error propagated to worker queue).

The compliance scan and result pipelines called HandleScan/HandleResult
before UpsertScan/UpsertResult, making data persistence conditional on
watcher success. The watcher exists for report generation tracking but
acted as a gatekeeper for whether compliance data exists at all.

Swap the order: persist first, then notify. Watcher errors are still
propagated (error semantics unchanged). This ensures valid compliance
data from clusters is always written to the database, regardless of
watcher-internal state (e.g. no matching scan config yet, watcher
already stopped, scan already handled).

Follow-up PRs will make watcher errors non-fatal (completing the
decoupling) and fix the timeout deletion trade-off from PR #20752.

Partially-generated with AI assistance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci

openshift-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: e164433b-7a2d-4175-bbd6-fb37a0a404f0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/decouple-compliance-persistence-from-watcher

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🚀 Build Images Ready

Images are ready for commit f4f23fb. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.12.x-208-gf4f23fb3da

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant