Skip to content

Bound 'B' aux array element count in ParseAuxFields (heap OOB read)#1083

Open
evilgensec wants to merge 1 commit into
google:r1.10from
evilgensec:fix-sam-aux-B-oob
Open

Bound 'B' aux array element count in ParseAuxFields (heap OOB read)#1083
evilgensec wants to merge 1 commit into
google:r1.10from
evilgensec:fix-sam-aux-B-oob

Conversation

@evilgensec

Copy link
Copy Markdown

Problem

ParseAuxFields() in third_party/nucleus/io/sam_reader.cc parses BAM aux tags. The
'B' (array) branch reads the element count straight from the record bytes:

const int n_elements = le_to_u32(s);
s += 4;

and then iterates n_elements times, dereferencing the buffer on each element
(le_to_u32(s) / le_to_i16(s) / le_to_float(s) ... s += element_size), with no
check that n_elements * element_size fits within the bytes remaining in the record.

Every scalar aux branch in the same function already bounds-checks before reading
(if (size < 0 || end - s < size) return DataLoss(...)); the 'B' array branch is the
one path that trusts the on-file count. A BAM record whose 'B' array declares more
elements than the record contains makes the loop walk past
end = b->data + b->l_data, a heap out-of-bounds read. Out-of-bounds bytes are appended
to the parsed read's INFO field (information disclosure into output), and a large count
reads past the mapping and crashes.

htslib's bam_read1 validates only the fixed core fields (qname/cigar/seq/qual) and
CIGAR/qlen; it does not validate the aux region, so the malformed count reaches this
code unchecked.

Reachability

Triggered when aux parsing is enabled: make_examples --parse_sam_aux_fields
(make_examples_core.py passes parse_aux_fields=self.options.parse_sam_aux_fields),
which DeepTrio enables by default (scripts/run_deeptrio.py), and the public
SamReader(parse_aux_fields=True) API. Input is an untrusted BAM.

Fix

Reject a 'B' element count whose total byte size exceeds the remaining record,
mirroring the scalar branches and computing in 64-bit (the count is read as a uint32
into an int):

if (n_elements < 0 ||
    static_cast<int64_t>(n_elements) * element_size > (end - s)) {
  return ::nucleus::DataLoss("B-array length exceeds record for tag " + tag);
}

Verified with AddressSanitizer against htslib 1.18: a crafted BAM (a BX:B:I array
declaring 1,000,000 elements with one element of payload) is accepted by sam_read1
and triggers a heap-buffer-overflow read in the unpatched ParseAuxFields; with this
change the record is rejected with DataLoss and no out-of-bounds access occurs.

The BAM aux 'B' (array) branch in ParseAuxFields read the element count
(n_elements) directly from the record bytes and then iterated that many
elements, dereferencing the buffer on each iteration, without checking the
count against the bytes remaining in the record. Every scalar aux branch in
the same function already bounds-checks (`end - s < size`); the 'B' array
branch did not. A BAM whose 'B' array declares more elements than the record
contains caused a heap out-of-bounds read past the htslib record buffer when
parse_aux_fields is enabled (e.g. make_examples --parse_sam_aux_fields, which
DeepTrio enables by default).

Reject an element count whose total byte size exceeds the remaining record,
computed in 64-bit since n_elements is read as a uint32 into an int.
@pichuan

pichuan commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Hi @evilgensec,

Thanks for the PR!

Because of the way our project is set up, we aren't able to merge GitHub PRs directly.

If you are okay with it, I can review and test your changes. If they look good, I will commit them, crediting your GitHub username and referencing this PR in the commit description.

Let me know if that works for you, or if you'd prefer that we close this PR instead.

-pichuan

@pichuan pichuan self-assigned this Jun 24, 2026
@evilgensec

Copy link
Copy Markdown
Author

Hi @pichuan

Thank you for the clarification. I’m happy to proceed with that approach.

Please feel free to review and test the changes, and if everything looks good, you may commit them on my behalf while crediting my GitHub username and referencing this PR in the commit message.

My main goal is to ensure the contribution is properly recognized and remains eligible under the Google VRP contribution guidelines.

Thanks for your help, and please let me know if you need anything further from me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants