-
Notifications
You must be signed in to change notification settings - Fork 247
Ledger Chunk download API #7550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Eddy Ashton <edashton@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a new API for downloading committed ledger chunks, designed to enable ledger backup processes that don't have direct access to ledger storage directories. The implementation adds HTTP endpoints /node/ledger-chunk and /node/ledger-chunk/{chunk_name} with intelligent redirection logic to handle scenarios where nodes may not have all historical chunks (e.g., when started from a snapshot).
Changes:
- New ledger chunk download API with HEAD/GET endpoints supporting byte-range requests
- Serialization of ledger access using mutexes to ensure thread-safe concurrent access
- fsync of committed chunks before closing to ensure file integrity for remote serving
- New
ReadLedgerSubsysteminterface for accessing ledger metadata - Comprehensive end-to-end tests covering chunk access, redirection, and gap scenarios
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/schema.py | Added test registration for ledger chunk download |
| tests/infra/remote.py | Added helper to get main ledger directory |
| tests/infra/node.py | Added version check for LedgerChunkRead feature and helper method |
| tests/infra/interfaces.py | Enabled LedgerChunkRead operator feature on file serving interface |
| tests/e2e_operations.py | Added comprehensive tests for chunk access, redirection scenarios, and gap handling |
| src/node/rpc/node_frontend.h | Updated API version to 5.0.1 |
| src/node/rpc/ledger_subsystem.h | New subsystem implementation for ledger read operations |
| src/node/rpc/ledger_interface.h | New interface definition for ledger subsystem |
| src/node/rpc/file_serving_handlers.h | Implemented ledger chunk endpoints with redirection logic and byte-range support |
| src/host/test/ledger.cpp | Updated ledger test capture to include file sizes |
| src/host/run.cpp | Added ledger parameter to enclave creation |
| src/host/ledger.h | Added state_lock for thread safety, fsync for committed files, new query methods |
| src/enclave/main.cpp | Updated to pass ledger reference to enclave |
| src/enclave/entry_points.h | Updated signature to accept ledger reference |
| src/enclave/enclave.h | Installed ReadLedgerSubsystem in node context |
| include/ccf/http_consts.h | Added CCF_LEDGER_CHUNK_NAME header constant |
| doc/schemas/node_openapi.json | Added OpenAPI schema for new endpoints, updated version |
eddyashton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
The consolidation of helper functions in file_serving_handlers.h is great. I think we could go slightly further and factor out the ?since= param lookup, and possibly standardise the pattern for subsystem lookup, but these can be deferred to a future PR. I'm a little suspicious of the concept of init_idx, but it looks like its populated and tested correctly so I'm happy.
I had a go, but it's mandatory in one case and optional in the other, and so it's a little awkward. I'll try again. |
Co-authored-by: Eddy Ashton <edashton@microsoft.com>
I am also unhappy with that because LedgerFiles clearly has too much state already, and this is more state. But on the other hand, this is the right cutoff, and there is no other obvious way to get it. |
Adding an API aiming to allow a ledger backup process that does not have access to the ledger storage directories to efficiently fetch committed ledger chunks for archival/retention purposes.
Typical Scenario
Alternative Scenario
The initial node (Primary in this case) that client hits has started from a snapshot, and does not have some past chunks. To make this more readable, let's say that Primary started from
snapshot_100.committedand locally has:Backup has:
/ledger-chunk?since=indexneeds to redirect to the next node in a stable order when a file is not found and index is strictly less than locally known start index (i.e. the startup snapshot version)