Improve memory accounting for ArrowBytesViewMap#20077
Conversation
There was a problem hiding this comment.
Pull request overview
This PR attempts to improve memory accounting for ArrowBytesViewMap by changing from calculating null buffer size based on logical length (self.nulls.len() / 8) to using the allocated size (self.nulls.allocated_size()). The intent is to account for allocated memory rather than just the used memory, which aligns with DataFusion's memory accounting goals.
Changes:
- Modified
ArrowBytesViewMap::size()method to useNullBufferBuilder::allocated_size()for calculating null buffer memory usage
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
alamb
left a comment
There was a problem hiding this comment.
Thanks for working on this @vigneshsiva11 🙏
| let in_progress_size = self.in_progress.capacity(); | ||
| let completed_size: usize = self.completed.iter().map(|b| b.len()).sum(); | ||
| let nulls_size = self.nulls.len() / 8; | ||
| let nulls_size = self.nulls.allocated_size() / 8; |
There was a problem hiding this comment.
Allocated size is in bytes (not bits) https://docs.rs/arrow/latest/arrow/array/struct.NullBufferBuilder.html#method.allocated_size
Return the allocated size of this builder, in bytes, useful for memory accounting.
So I don't think we should still be dividing by 8
There was a problem hiding this comment.
Thanks for the clarification and the link! I’ve updated the code to use allocated_size() directly without dividing by 8.
There was a problem hiding this comment.
The code still appears to divide by 8 🤔
There was a problem hiding this comment.
According to the comment in datafusion/physical-plan/src/aggregates/group_values/null_builder.rs:66, "NullBufferBuilder::allocated_size returns capacity in bits". Other places in the codebase that use NullBufferBuilder directly (such as MaybeNullBufferBuilder::allocated_size() at datafusion/physical-plan/src/aggregates/group_values/null_builder.rs:67
Copilot is confused by this comment elsewhere in the source code (it's there).
Adjust nulls_size calculation to use allocated_size directly.
|
You’re right. I’ve updated the PR to remove the division by 8 and now use allocated_size() directly, as it already returns bytes. |
Thanks @vigneshsiva11 |
Which issue does this PR close?
Rationale for this change
ArrowBytesViewMap was previously accounting for the logical number of null bits when reporting memory usage. This under-reported memory consumption is because NullBufferBuilder may allocate more memory than is currently used.
Memory accounting in DataFusion is expected to reflect allocated memory rather than logical usage to ensure accurate memory tracking.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?