Skip to content

Fix Xygeni SAST/Secrets deduplication: key on uniqueHash, not issueId#15061

Open
lmrb-1968 wants to merge 2 commits into
DefectDojo:bugfixfrom
xygeni:fix/xygeni_sast_wrong_dedup
Open

Fix Xygeni SAST/Secrets deduplication: key on uniqueHash, not issueId#15061
lmrb-1968 wants to merge 2 commits into
DefectDojo:bugfixfrom
xygeni:fix/xygeni_sast_wrong_dedup

Conversation

@lmrb-1968

Copy link
Copy Markdown
Contributor

⚠️ Pre-Approval check ⚠️

This is a bug fix for an existing parser (the Xygeni parser added in #14769, refined in #15003), which falls under the "Acceptable examples" in CONTRIBUTING (bug fix for an existing parser), so no separate pre-approval issue is filed.

Description

A user reported a corner case in the dedup keys introduced in #15003: two SAST findings for the same detector on the very same line, differing only in their code, were incorrectly deduplicated into one. #15003 keyed unique_id_from_tool on Xygeni's issueId, which encodes the file path and line but not the code, so the two findings share an issueId even though their uniqueHash differs.

uniqueHash is Xygeni's location-independent identity for a finding:

  • SAST: MD5(kind + detector + filepath + normalized code) — the line is deliberately excluded.
  • SCA: CVE#component:version.
  • Secrets: hashes the secret value + type + detector + file + key.

Keying unique_id_from_tool on uniqueHash keeps genuinely distinct findings apart (different code on the same line) and keeps a finding's identity stable when code shifts to a different line (no reopen churn).

This also improves vuln_id_from_tool, which is a non-unique grouping label only (never a dedup key in our configuration): it now uses the detector for SAST/Secrets and the user-friendly vulnerability id (userId: CVE / GHSA / OSV) for SCA, matching how comparable parsers populate the field (Semgrep check_id, SARIF ruleId; Grype/Snyk vulnerability id).

Scan type unique_id_from_tool (prev → new) vuln_id_from_tool (prev → new) Algorithm
SAST issueIduniqueHash uniqueHashdetector unique_id_from_tool
SCA uniqueHash (unchanged) issueIduserId unique_id_from_tool_or_hash_codeunique_id_from_tool
Secrets issueIduniqueHash uniqueHashdetector unique_id_from_tool

All three scan types now deduplicate on unique_id_from_tool keyed on uniqueHash. SCA's unique_id_from_tool value is unchanged, so SCA findings are not affected by the algorithm switch; its now-unused hash-code settings entries are removed.

Repeated secrets in one file

The same secret value can be leaked on several lines of one file. Because uniqueHash excludes the line, those occurrences share one identity. Rather than splitting them with a synthetic per-line/sequence id (which would reopen findings whenever the lines shift), the Secrets parser aggregates them into a single finding and lists every line where the secret appears in the description.

Test results

Updated unittests/tools/test_xygeni_parser.py:

  • SAST/Secrets/SCA assertions updated to the new unique_id_from_tool (uniqueHash) and vuln_id_from_tool (detector / userId) values.
  • test_sast_same_code_repeated_in_same_file_shares_unique_id — identical code on 4 lines now shares one unique_id_from_tool, so dedup folds the occurrences into one finding.
  • test_secrets_repeated_in_same_file_aggregate_into_one_finding — a secret repeated on lines 9 and 29 now yields one finding whose description lists both lines.

Ruff (0.15.15) passes on all changed files.

Documentation

  • docs/content/supported_tools/parsers/file/xygeni.md — Deduplication section rewritten for the uniqueHash mapping.
  • docs/content/releases/os_upgrading/3.0.200.md — upgrade note describing the one-time reimport effect (existing SAST/Secrets findings re-key from issueId to uniqueHash), superseding the 3.0.100 note. This is intentionally a one-time hit rather than a versioned (V2) parser; the unique_id_from_tool mapping is now considered final.

Checklist

  • Make sure to rebase your PR against the very latest dev. (Branched off the latest bugfix, the correct base for a bug fix.)
  • Bugfixes should be submitted against the bugfix branch.
  • Give a meaningful name to your PR.
  • Your code is Ruff compliant (see ruff.toml).
  • Your code is python 3.13 compliant.
  • If this is a new feature and not a bug fix, you've included the proper documentation. (Not a new feature; docs updated anyway.)
  • Model changes must include the necessary migrations in the dojo/db_migrations folder. (No model changes.)
  • Add applicable tests to the unit tests.
  • Add the proper label to categorize your PR. (bugfix)

@github-actions github-actions Bot added settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR docs unittests parser labels Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs parser settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant