Conversation
Mesa Description
This PR introduces a local, file-based caching layer to improve file read performance and reduce network traffic to the Mesa API.
Key Changes
Description generated by Mesa.
Performed full review of d6db315...26ecc80
Analysis
- The caching library relies on nightly-only Rust features but doesn't enable all necessary unstable gates, making it incompilable on both stable and nightly.
- The FileCache implementation always wipes the target directory on initialization, preventing cache persistence across restarts and undermining the purpose of a file-backed cache.
- Eviction operations mix synchronous filesystem calls in async workers without proper blocking isolation, risking runtime stalls during large batch evictions (see the sketch after this list).
- The architecture splits caching into a separate crate, but the integration has inconsistencies that may impact production usability.
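On the blocking-isolation point, a minimal sketch of one way to keep synchronous filesystem deletions off the async worker threads, assuming a tokio runtime; the function name and batch shape are illustrative, not the crate's actual API:

```rust
use std::path::PathBuf;

/// Remove a batch of evicted cache files without stalling the async runtime.
/// `tokio::task::spawn_blocking` moves the synchronous `std::fs` calls onto the
/// blocking thread pool, so a large eviction batch cannot stall reactor threads.
async fn evict_batch(paths: Vec<PathBuf>) -> std::io::Result<()> {
    tokio::task::spawn_blocking(move || {
        for path in paths {
            match std::fs::remove_file(&path) {
                Ok(()) => {}
                // A concurrent eviction may have removed the file already.
                Err(e) if e.kind() == std::io::ErrorKind::NotFound => {}
                Err(e) => return Err(e),
            }
        }
        Ok(())
    })
    .await
    .expect("blocking eviction task panicked")
}
```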
The upsert_async + LRU notification sequence was not atomic: concurrent
inserts or evictions could interleave between the HashMap mutation and
the LRU tracker message, causing stale DeleterCtx{fid} values that make
remove_if_async silently fail and leave entries permanently un-evictable.
Fix by switching to entry_async (holds the bucket lock while allocating a
global monotonic version), then using a sync try_send-based upsert() for
the LRU notification so it cannot be lost to task cancellation.
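As a rough illustration of the non-cancellable notification described above, assuming a tokio mpsc channel to the LRU worker; the Message shape and upsert signature echo the commit message but are not the crate's actual types:

```rust
use tokio::sync::mpsc;

enum Message {
    Upserted { fid: u64, version: u64 },
}

/// Synchronous LRU notification: `try_send` succeeds immediately in the common
/// case; if the channel is full, the message is handed to a spawned task so it
/// cannot be lost when the caller's future is cancelled.
fn upsert(tx: &mpsc::Sender<Message>, fid: u64, version: u64) {
    match tx.try_send(Message::Upserted { fid, version }) {
        Ok(()) => {}
        Err(mpsc::error::TrySendError::Full(msg)) => {
            let tx = tx.clone();
            // The spawned task owns the message; dropping the caller cannot lose it.
            tokio::spawn(async move {
                let _ = tx.send(msg).await;
            });
        }
        Err(mpsc::error::TrySendError::Closed(_)) => {
            // LRU worker has shut down (e.g. during runtime shutdown); nothing to do.
        }
    }
}
```

Here the version is assumed to have been allocated by the caller while still holding the scc bucket lock, so the worker can order messages per key.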
Key changes:
- Add Versioned trait to lru.rs for version-based message deduplication (see the sketch after this list)
- Replace Message::Inserted with Message::Upserted; worker deduplicates
by comparing versions and gracefully handles missing keys on Accessed
- Replace async insert() with sync upsert() using try_send + spawn
fallback, making the LRU notification non-cancellable
- FileCache::insert uses entry_async to atomically read old entry and
allocate version under the bucket lock; all post-lock operations
(guard defuse, LRU upsert, size accounting) are synchronous
- Add check-cfg for loom and wire up the sync shim module to enable
future loom-based concurrency testing
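A rough sketch of the version-based deduplication idea with illustrative names; the actual Versioned trait and worker state in lru.rs will differ:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Messages carrying a monotonically increasing version, letting the worker
/// detect and drop stale notifications.
trait Versioned {
    fn version(&self) -> u64;
}

struct Upsert<K> {
    key: K,
    version: u64,
}

impl<K> Versioned for Upsert<K> {
    fn version(&self) -> u64 {
        self.version
    }
}

enum Message<K> {
    Upserted(Upsert<K>),
    Accessed { key: K },
}

/// Worker-side state: remembers the newest version seen for each key.
struct LruWorker<K> {
    latest: HashMap<K, u64>,
}

impl<K: Hash + Eq> LruWorker<K> {
    fn handle(&mut self, msg: Message<K>) {
        match msg {
            Message::Upserted(up) => {
                let v = up.version();
                match self.latest.entry(up.key) {
                    Entry::Vacant(e) => {
                        e.insert(v);
                        // ... add the key to the recency order ...
                    }
                    Entry::Occupied(mut e) if v > *e.get() => {
                        e.insert(v);
                        // ... move the key to the front of the recency order ...
                    }
                    // Stale message (older version): ignore it.
                    Entry::Occupied(_) => {}
                }
            }
            Message::Accessed { key } => {
                // Gracefully handle keys that were never inserted or already evicted.
                if self.latest.contains_key(&key) {
                    // ... bump recency for `key` ...
                }
            }
        }
    }
}
```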
Force-pushed from 32a76ff to c741f6c
- DeletionIndicator::have_pending_work now checks != 0 (was >= 1<<32) so the eviction loop waits for in-flight deletions, not just pending batches, preventing cascading over-eviction.
- A DeletionGuard drop guard on spawned deletion tasks ensures observe_deletion() fires even on cancellation or panic (see the sketch below).
- Size accounting moved inside the scc bucket lock in insert() to prevent transient AtomicUsize underflow when concurrent inserts to the same key interleave their deltas.
- access() changed from async send().await (cancellable, panics on channel close) to sync try_send + tokio::spawn fallback, matching the existing upsert() pattern. Removes the unreachable!() panic path during runtime shutdown.
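A minimal sketch of the drop-guard idea, assuming the indicator exposes an in-flight counter behind observe_deletion(); the struct fields and the spawn_deletion helper are illustrative, not the crate's actual code:

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

struct DeletionIndicator {
    in_flight: AtomicU64,
}

impl DeletionIndicator {
    fn observe_deletion(&self) {
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }

    fn have_pending_work(&self) -> bool {
        // Wait for in-flight deletions, not just pending batches.
        self.in_flight.load(Ordering::SeqCst) != 0
    }
}

/// RAII guard: `observe_deletion()` runs on drop, so the counter is decremented
/// even if the deletion task is cancelled or panics midway.
struct DeletionGuard(Arc<DeletionIndicator>);

impl Drop for DeletionGuard {
    fn drop(&mut self) {
        self.0.observe_deletion();
    }
}

fn spawn_deletion(indicator: Arc<DeletionIndicator>, path: PathBuf) {
    indicator.in_flight.fetch_add(1, Ordering::SeqCst);
    let guard = DeletionGuard(indicator);
    tokio::spawn(async move {
        // The guard lives for the whole task; any exit path reports completion.
        let _guard = guard;
        let _ = tokio::fs::remove_file(&path).await;
    });
}
```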
Force-pushed from f5d78d6 to abfdc5e
The unified ReadableCache/WritableCache traits and Sync* variants had no consumers outside traits.rs itself. Remove them along with the negative impls that required nightly, keeping only AsyncReadableCache and AsyncWritableCache, which FileCache actually implements.
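For orientation, a hypothetical sketch of what the two retained traits might look like; the actual signatures in traits.rs (key and value types, error handling) may differ:

```rust
use std::future::Future;

// Hypothetical shapes of the retained traits; FileCache would implement both.
pub trait AsyncReadableCache {
    type Key;
    type Value;

    fn get(&self, key: &Self::Key) -> impl Future<Output = Option<Self::Value>> + Send;
}

pub trait AsyncWritableCache {
    type Key;
    type Value;

    fn insert(&self, key: Self::Key, value: Self::Value) -> impl Future<Output = ()> + Send;
    fn remove(&self, key: &Self::Key) -> impl Future<Output = ()> + Send;
}
```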
… false test, increase sleep durations
- Fix concurrent_inserts_with_eviction: add sequential warmup so LRU worker has eviction candidates before concurrent inserts trigger eviction (fixes flakiness where all entries survived)
- Add try_cull_returns_false_when_channel_full test
- Increase sleep durations from 50ms to 100ms in LRU tests
- Tighten concurrent_cull_requests lower bound assertion
The Dockerfile only copied src/, but the library crate lives at lib/lib.rs, causing the Docker build to fail with "couldn't read lib/lib.rs".
Force-pushed from 9a11b90 to 4b16c59
No description provided.