GH-48846: [C++] Optimize ReadMessage to read metadata and body in one go#48975
GH-48846: [C++] Optimize ReadMessage to read metadata and body in one go#48975abhishek593 wants to merge 4 commits intoapache:mainfrom
Conversation
|
|
pitrou
left a comment
There was a problem hiding this comment.
Thank you for the submission @abhishek593. This looks good generally but I've made a number of suggestions below, feel free to update your PR when you have some time!
cpp/src/arrow/ipc/message.h
Outdated
| const int64_t offset, const int32_t metadata_length, io::RandomAccessFile* file, | ||
| const FieldsLoaderFunction& fields_loader = {}); | ||
|
|
||
| /// \brief Read encapsulated RPC message from position in file |
There was a problem hiding this comment.
Ok, I think "RPC" is a typo, it should be "IPC", could you perhaps fix all occurrences in this file?
| // there are 3 read IOs before reading body: | ||
| // 1) read magic and footer length IO | ||
| // 2) read footer IO | ||
| // 3) read record batch metadata IO | ||
| EXPECT_EQ(read_ranges.size(), 3 + expected_body_read_lengths.size()); |
There was a problem hiding this comment.
Ok, can we keep a less strict assertion to check that we can at least access the two first elements without segfaulting?
Something like:
// there are at least 2 read IOs before reading body:
// 1) read magic and footer length IO
// 2) read footer IO
EXPECT_GE(read_ranges.size(), 2);
cpp/src/arrow/ipc/read_write_test.cc
Outdated
| // there are 3 read IOs before reading body: | ||
| // 1) read magic and footer length IO | ||
| // 2) read footer IO | ||
| // 3) read record batch metadata IO |
There was a problem hiding this comment.
This comment only matches one of the two code paths. Perhaps fix it or use a separate comment per code path?
| int64_t total_body = 0; | ||
| for (auto len : expected_body_read_lengths) total_body += len; | ||
|
|
||
| EXPECT_GT(read_ranges[2].length, total_body); |
There was a problem hiding this comment.
Add a small comment explaining why? Also can take inspiration from the other code path and add:
EXPECT_LE(read_ranges[2].length, total_body + footer_length);
| } | ||
| } | ||
|
|
||
| Result<std::unique_ptr<Message>> ReadMessage(const int64_t offset, |
There was a problem hiding this comment.
There's a lot of code duplication with the previous ReadMessage overload above (the one that takes a FieldsLoaderFunction). Can we refactor using a common helper so that most of the duplication goes away?
There was a problem hiding this comment.
Refactored both the overloads into a single ReadMessageInternal that takes an optional body_length.
… helper and fix documentation
10990b2 to
cc7dd86
Compare
Rationale for this change
ReadMessageAsync takes a body_length parameter and reads Message metadata + body in one go, but the blocking version ReadMessage reads the body length from the Message and issues a second read for the body. This PR adds a ReadMessage overload that takes the body length as parameter and does a single read like the async version does.
What changes are included in this PR?
Are these changes tested?
Yes, added TestReadMessage.ReadBodyWithLength and updated other tests to use the new overload.
Are there any user-facing changes?
No.