Fix Xygeni SAST/Secrets deduplication: key on uniqueHash, not issueId#15061
Open
lmrb-1968 wants to merge 2 commits into
Open
Fix Xygeni SAST/Secrets deduplication: key on uniqueHash, not issueId#15061lmrb-1968 wants to merge 2 commits into
lmrb-1968 wants to merge 2 commits into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a bug fix for an existing parser (the Xygeni parser added in #14769, refined in #15003), which falls under the "Acceptable examples" in CONTRIBUTING (bug fix for an existing parser), so no separate pre-approval issue is filed.
Description
A user reported a corner case in the dedup keys introduced in #15003: two SAST findings for the same detector on the very same line, differing only in their
code, were incorrectly deduplicated into one. #15003 keyedunique_id_from_toolon Xygeni'sissueId, which encodes the file path and line but not the code, so the two findings share anissueIdeven though theiruniqueHashdiffers.uniqueHashis Xygeni's location-independent identity for a finding:MD5(kind + detector + filepath + normalized code)— the line is deliberately excluded.CVE#component:version.Keying
unique_id_from_toolonuniqueHashkeeps genuinely distinct findings apart (different code on the same line) and keeps a finding's identity stable when code shifts to a different line (no reopen churn).This also improves
vuln_id_from_tool, which is a non-unique grouping label only (never a dedup key in our configuration): it now uses thedetectorfor SAST/Secrets and the user-friendly vulnerability id (userId: CVE / GHSA / OSV) for SCA, matching how comparable parsers populate the field (Semgrepcheck_id, SARIFruleId; Grype/Snyk vulnerability id).unique_id_from_tool(prev → new)vuln_id_from_tool(prev → new)issueId→uniqueHashuniqueHash→detectorunique_id_from_tooluniqueHash(unchanged)issueId→userIdunique_id_from_tool_or_hash_code→unique_id_from_toolissueId→uniqueHashuniqueHash→detectorunique_id_from_toolAll three scan types now deduplicate on
unique_id_from_toolkeyed onuniqueHash. SCA'sunique_id_from_toolvalue is unchanged, so SCA findings are not affected by the algorithm switch; its now-unused hash-code settings entries are removed.Repeated secrets in one file
The same secret value can be leaked on several lines of one file. Because
uniqueHashexcludes the line, those occurrences share one identity. Rather than splitting them with a synthetic per-line/sequence id (which would reopen findings whenever the lines shift), the Secrets parser aggregates them into a single finding and lists every line where the secret appears in the description.Test results
Updated
unittests/tools/test_xygeni_parser.py:unique_id_from_tool(uniqueHash) andvuln_id_from_tool(detector/userId) values.test_sast_same_code_repeated_in_same_file_shares_unique_id— identical code on 4 lines now shares oneunique_id_from_tool, so dedup folds the occurrences into one finding.test_secrets_repeated_in_same_file_aggregate_into_one_finding— a secret repeated on lines 9 and 29 now yields one finding whose description lists both lines.Ruff (0.15.15) passes on all changed files.
Documentation
docs/content/supported_tools/parsers/file/xygeni.md— Deduplication section rewritten for theuniqueHashmapping.docs/content/releases/os_upgrading/3.0.200.md— upgrade note describing the one-time reimport effect (existing SAST/Secrets findings re-key fromissueIdtouniqueHash), superseding the 3.0.100 note. This is intentionally a one-time hit rather than a versioned (V2) parser; theunique_id_from_toolmapping is now considered final.Checklist
dev. (Branched off the latestbugfix, the correct base for a bug fix.)bugfixbranch.bugfix)