perf: improve performance of array_repeat function#20049
perf: improve performance of array_repeat function#20049Jefffrey merged 9 commits intoapache:mainfrom
array_repeat function#20049Conversation
| let mut outer_offsets = Vec::with_capacity(counts.len() + 1); | ||
| outer_offsets.push(O::zero()); | ||
| let mut running_offset = 0usize; | ||
| for &count in counts.iter() { |
There was a problem hiding this comment.
Can use OffsetBuffer::from_lengths here I believe?
There was a problem hiding this comment.
Would using OffsetBuffer::from_lengths cause an extra loop over the lengths array internally?
I'm a bit concerned this could introduce some performance overhead.
There was a problem hiding this comment.
Here we already have to iterate over counts, which is equivalent to what from_lengths is doing I believe?
There was a problem hiding this comment.
Oh, I misread that, the outer offsets can indeed be replaced with from_lengths, which is clearer. Thanks!
| let data_type = list_array.data_type(); | ||
| let value_type = list_array.value_type(); | ||
| let mut new_values = vec![]; | ||
| let counts = count_array.values(); |
There was a problem hiding this comment.
What happens if an element of count_array is null? It looks like we still access the underlying primitive?
There was a problem hiding this comment.
will produce an empty array. This function seems to assume count[i] >= 0 and doesn’t handle count[i] being null. This also confused me, but I kept the original behavior since I’m not sure whether existing users rely on null count being treated as empty array.
There was a problem hiding this comment.
We might need a followup issue if this is the existing behaviour; we can't rely on null slots being 0 value (potential bug), moreover I feel we should be consistent with null semantics (usually a null input would return null 🤔)
There was a problem hiding this comment.
I agree. I’ve opened a follow-up issue to track this #20075.
Since this could potentially introduce a user-facing behavioral change around null semantics, I’d prefer to keep this PR focused on the performance improvement. I can address the null-handling semantics separately in a follow-up PR.
|
Thanks @lyne7-sc 🚀 |
Which issue does this PR close?
Rationale for this change
The current
array_repeatimplementation built intermediate arrays row by row usingMutableArrayData + concat. This caused unnecessary allocations and per-row overhead.This PR replaces it with a vectorized implementation based on
offsets + take, which constructs the output in a single pass and reduces intermediate memory usage.What changes are included in this PR?
Reimplemented
general_repeatandgeneral_list_repeatusingtake, eliminating per-row materialization.Benchmarks
Are these changes tested?
Yes. Existing SLT tests pass without modification.
Are there any user-facing changes?
No.