Bound 'B' aux array element count in ParseAuxFields (heap OOB read)#1083
Bound 'B' aux array element count in ParseAuxFields (heap OOB read)#1083evilgensec wants to merge 1 commit into
Conversation
The BAM aux 'B' (array) branch in ParseAuxFields read the element count (n_elements) directly from the record bytes and then iterated that many elements, dereferencing the buffer on each iteration, without checking the count against the bytes remaining in the record. Every scalar aux branch in the same function already bounds-checks (`end - s < size`); the 'B' array branch did not. A BAM whose 'B' array declares more elements than the record contains caused a heap out-of-bounds read past the htslib record buffer when parse_aux_fields is enabled (e.g. make_examples --parse_sam_aux_fields, which DeepTrio enables by default). Reject an element count whose total byte size exceeds the remaining record, computed in 64-bit since n_elements is read as a uint32 into an int.
|
Hi @evilgensec, Thanks for the PR! Because of the way our project is set up, we aren't able to merge GitHub PRs directly. If you are okay with it, I can review and test your changes. If they look good, I will commit them, crediting your GitHub username and referencing this PR in the commit description. Let me know if that works for you, or if you'd prefer that we close this PR instead. -pichuan |
|
Hi @pichuan Thank you for the clarification. I’m happy to proceed with that approach. Please feel free to review and test the changes, and if everything looks good, you may commit them on my behalf while crediting my GitHub username and referencing this PR in the commit message. My main goal is to ensure the contribution is properly recognized and remains eligible under the Google VRP contribution guidelines. Thanks for your help, and please let me know if you need anything further from me. |
Problem
ParseAuxFields()inthird_party/nucleus/io/sam_reader.ccparses BAM aux tags. The'B'(array) branch reads the element count straight from the record bytes:and then iterates
n_elementstimes, dereferencing the buffer on each element(
le_to_u32(s)/le_to_i16(s)/le_to_float(s)...s += element_size), with nocheck that
n_elements * element_sizefits within the bytes remaining in the record.Every scalar aux branch in the same function already bounds-checks before reading
(
if (size < 0 || end - s < size) return DataLoss(...)); the'B'array branch is theone path that trusts the on-file count. A BAM record whose
'B'array declares moreelements than the record contains makes the loop walk past
end = b->data + b->l_data, a heap out-of-bounds read. Out-of-bounds bytes are appendedto the parsed read's INFO field (information disclosure into output), and a large count
reads past the mapping and crashes.
htslib's
bam_read1validates only the fixed core fields (qname/cigar/seq/qual) andCIGAR/qlen; it does not validate the aux region, so the malformed count reaches this
code unchecked.
Reachability
Triggered when aux parsing is enabled:
make_examples --parse_sam_aux_fields(
make_examples_core.pypassesparse_aux_fields=self.options.parse_sam_aux_fields),which DeepTrio enables by default (
scripts/run_deeptrio.py), and the publicSamReader(parse_aux_fields=True)API. Input is an untrusted BAM.Fix
Reject a
'B'element count whose total byte size exceeds the remaining record,mirroring the scalar branches and computing in 64-bit (the count is read as a
uint32into an
int):Verified with AddressSanitizer against htslib 1.18: a crafted BAM (a
BX:B:Iarraydeclaring 1,000,000 elements with one element of payload) is accepted by
sam_read1and triggers a heap-buffer-overflow read in the unpatched
ParseAuxFields; with thischange the record is rejected with
DataLossand no out-of-bounds access occurs.