Skip to content

GH-48846: [C++] Optimize ReadMessage to read metadata and body in one go#48975

Open
abhishek593 wants to merge 4 commits intoapache:mainfrom
abhishek593:feature/ipc-read-message-optimization
Open

GH-48846: [C++] Optimize ReadMessage to read metadata and body in one go#48975
abhishek593 wants to merge 4 commits intoapache:mainfrom
abhishek593:feature/ipc-read-message-optimization

Conversation

@abhishek593
Copy link
Contributor

@abhishek593 abhishek593 commented Jan 24, 2026

Rationale for this change

ReadMessageAsync takes a body_length parameter and reads Message metadata + body in one go, but the blocking version ReadMessage reads the body length from the Message and issues a second read for the body. This PR adds a ReadMessage overload that takes the body length as parameter and does a single read like the async version does.

What changes are included in this PR?

  • Added ReadMessage overload accepting body_length
  • Updated ReadMessageFromBlock to use the new ReadMessage overload when reading full record batches.
  • Added new ReadBodyWithLength test case and updated IO range validation tests.

Are these changes tested?

Yes, added TestReadMessage.ReadBodyWithLength and updated other tests to use the new overload.

Are there any user-facing changes?

No.

@github-actions
Copy link

⚠️ GitHub issue #48846 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the submission @abhishek593. This looks good generally but I've made a number of suggestions below, feel free to update your PR when you have some time!

const int64_t offset, const int32_t metadata_length, io::RandomAccessFile* file,
const FieldsLoaderFunction& fields_loader = {});

/// \brief Read encapsulated RPC message from position in file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think "RPC" is a typo, it should be "IPC", could you perhaps fix all occurrences in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines -3006 to -3010
// there are 3 read IOs before reading body:
// 1) read magic and footer length IO
// 2) read footer IO
// 3) read record batch metadata IO
EXPECT_EQ(read_ranges.size(), 3 + expected_body_read_lengths.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, can we keep a less strict assertion to check that we can at least access the two first elements without segfaulting?
Something like:

  // there are at least 2 read IOs before reading body:
  // 1) read magic and footer length IO
  // 2) read footer IO
  EXPECT_GE(read_ranges.size(), 2);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 3045 to 3048
// there are 3 read IOs before reading body:
// 1) read magic and footer length IO
// 2) read footer IO
// 3) read record batch metadata IO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment only matches one of the two code paths. Perhaps fix it or use a separate comment per code path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int64_t total_body = 0;
for (auto len : expected_body_read_lengths) total_body += len;

EXPECT_GT(read_ranges[2].length, total_body);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a small comment explaining why? Also can take inspiration from the other code path and add:

  EXPECT_LE(read_ranges[2].length, total_body + footer_length);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

Result<std::unique_ptr<Message>> ReadMessage(const int64_t offset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of code duplication with the previous ReadMessage overload above (the one that takes a FieldsLoaderFunction). Can we refactor using a common helper so that most of the duplication goes away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored both the overloads into a single ReadMessageInternal that takes an optional body_length.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 16, 2026
@abhishek593 abhishek593 force-pushed the feature/ipc-read-message-optimization branch from 10990b2 to cc7dd86 Compare February 16, 2026 17:00
@abhishek593 abhishek593 requested a review from pitrou February 16, 2026 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants