GH-1179: Correct the size of var-width vector with >0 start offset during vector append #1180
Open
jordepic wants to merge 1 commit into
…width vectors with non-zero start offsets

VectorAppender computed the delta vector's data size as its last offset value, which is only correct when the offset buffer starts at zero. Vectors imported through the C data interface from sliced arrays can have a non-zero first offset; appending them copied the unreferenced data buffer prefix into the target, inflating it on every append until allocation eventually failed with OversizedAllocationException.

Compute the data size as the distance between the first and last offsets, copy from the first offset, and rebase appended offsets accordingly.

Fixes apache#1179.
Thank you for opening a pull request! Please label the PR with one or more of:
Also, add the 'breaking-change' label if appropriate. See CONTRIBUTING.md for details.
Author
Could a maintainer please add the bug-fix label here?
What's Changed
Fix VectorAppender data size computation for variable-width vectors with non-zero start offsets
When appending a variable-width vector in DataFusion Comet, I was receiving exceptions
caused by allocating too much memory. Comet passes variable-width arrays back
to Java whose first offset buffer entry is greater than 0. Prior to this change, arrow-java
determined how many bytes to copy by looking only at the last entry in the offset buffer,
disregarding the value of the first. If first = 100 and last = 200, Java would
copy 200 bytes instead of 100. This change fixes that by copying only the bytes between
the first and last offsets.
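To illustrate the arithmetic, here is a minimal sketch of the corrected append on plain Java arrays. It is not the arrow-java implementation: `VarWidthAppendSketch`, `Vec`, `referencedDataSize`, and `append` are hypothetical names made up for this example, and the buffers are modeled as `int[]` and `byte[]` rather than Arrow buffers.

```java
import java.util.Arrays;

/** Hypothetical plain-array model of a variable-width vector; not the arrow-java API. */
final class VarWidthAppendSketch {

    /** offsets has valueCount + 1 entries; offsets[0] may be > 0 for a sliced array. */
    static final class Vec {
        final int[] offsets;
        final byte[] data;

        Vec(int[] offsets, byte[] data) {
            this.offsets = offsets;
            this.data = data;
        }

        int valueCount() {
            return offsets.length - 1;
        }
    }

    /** Bytes actually referenced by the vector: last offset minus first offset. */
    static int referencedDataSize(Vec v) {
        return v.offsets[v.valueCount()] - v.offsets[0];
    }

    /** Appends delta to target, copying only the bytes delta's offsets refer to. */
    static Vec append(Vec target, Vec delta) {
        int targetEnd = target.offsets[target.valueCount()];
        int deltaStart = delta.offsets[0];
        int deltaSize = referencedDataSize(delta);

        // Keep the target's bytes up to its last offset, then copy the delta's
        // referenced range, starting at its first offset rather than at byte 0.
        byte[] data = Arrays.copyOf(target.data, targetEnd + deltaSize);
        System.arraycopy(delta.data, deltaStart, data, targetEnd, deltaSize);

        // Rebase each appended offset: remove the delta's start, add the target's end.
        int[] offsets = Arrays.copyOf(target.offsets, target.offsets.length + delta.valueCount());
        for (int i = 1; i <= delta.valueCount(); i++) {
            offsets[target.valueCount() + i] = delta.offsets[i] - deltaStart + targetEnd;
        }
        return new Vec(offsets, data);
    }
}
```

In this model, a delta whose first offset is 100 and last offset is 200 has a `referencedDataSize` of 100, so the 100-byte prefix that no offset points at is never copied into the target.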
Closes #1179