MES-719: Local Caching #38

Merged
markovejnovic merged 33 commits into main from
marko/mes-719-local-caching-pt-2
Feb 17, 2026

@markovejnovic
Collaborator

No description provided.

@mesa-dot-dev
mesa-dot-dev bot commented Feb 13, 2026

Mesa Description

This PR introduces a local, file-based caching layer to improve file read performance and reduce network traffic to the Mesa API.

Key Changes

  • New Caching Library (lib/cache):

    • A new, reusable cache library was created to house all caching logic.
    • Introduced FileCache, a thread-safe, asynchronous file-based cache for storing arbitrary byte data.
    • Implemented an asynchronous LRU (Least Recently Used) eviction strategy to manage the cache's size on disk, ensuring it doesn't grow indefinitely.
  • Filesystem Integration:

    • The RepoFs module, which handles file system operations for individual repositories, is now equipped with the FileCache.
    • The read operation was updated to follow a cache-aside pattern: it first attempts to retrieve file content from the local cache. On a cache miss, it fetches the content from the Mesa API and then stores it in the cache for subsequent requests.
  • Configuration and Testing:

    • Cache behavior is now configurable, with settings passed down from the main daemon configuration.
    • Comprehensive correctness and concurrency tests were added for both the FileCache implementation and the underlying LRU eviction logic to ensure reliability under load.
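
The cache-aside read path described under Filesystem Integration can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `CacheAsideReader` and `fetch_remote` are hypothetical names, an in-memory `HashMap` stands in for the on-disk FileCache, and the real implementation is asynchronous.

```rust
use std::collections::HashMap;

/// Hypothetical reader; the HashMap is a stand-in for the on-disk FileCache.
pub struct CacheAsideReader<F>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    cache: HashMap<String, Vec<u8>>,
    /// Hypothetical stand-in for the call that fetches content from the Mesa API.
    fetch_remote: F,
}

impl<F> CacheAsideReader<F>
where
    F: FnMut(&str) -> Result<Vec<u8>, String>,
{
    pub fn new(fetch_remote: F) -> Self {
        Self { cache: HashMap::new(), fetch_remote }
    }

    pub fn read(&mut self, path: &str) -> Result<Vec<u8>, String> {
        // 1. Try the local cache first.
        if let Some(bytes) = self.cache.get(path) {
            return Ok(bytes.clone());
        }
        // 2. On a miss, fall back to the remote source.
        let bytes = (self.fetch_remote)(path)?;
        // 3. Store the result for subsequent reads; failed fetches are not cached.
        self.cache.insert(path.to_owned(), bytes.clone());
        Ok(bytes)
    }
}
```

With this shape, a second read of the same path is served locally and the remote fetch runs only once per key until the entry is evicted.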

Description generated by Mesa.

@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of d6db315...26ecc80

Analysis

  1. The caching library relies on nightly-only Rust features but does not enable all of the required unstable feature gates, so it fails to compile on both stable and nightly toolchains.

  2. The FileCache implementation always wipes the target directory on initialization, preventing cache persistence across restarts and undermining the purpose of a file-backed cache.

  3. Eviction runs synchronous filesystem calls inside async workers without isolating the blocking work, which risks stalling the runtime during large batch evictions.

  4. The architecture splits caching into a separate crate, but the integration has inconsistencies that may impact production usability.
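
To illustrate point 2, one possible alternative to wiping the directory on initialization is to create it if needed and index whatever is already there. The sketch below uses only the standard library; `open_cache_dir` and `total_bytes` are hypothetical helpers, and whether persistence across restarts is actually wanted here is the author's call.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: open (or create) a cache directory and index the files
/// already present, mapping file name to size in bytes.
pub fn open_cache_dir(dir: &Path) -> io::Result<HashMap<String, u64>> {
    // create_dir_all succeeds if the directory already exists; nothing is deleted.
    fs::create_dir_all(dir)?;
    let mut index = HashMap::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_file() {
            index.insert(entry.file_name().to_string_lossy().into_owned(), meta.len());
        }
    }
    Ok(index)
}

/// Hypothetical helper: bytes currently on disk, e.g. to seed size accounting.
pub fn total_bytes(index: &HashMap<String, u64>) -> u64 {
    index.values().sum()
}
```

A real cache would also need to validate surviving entries and rebuild its eviction order, which this sketch does not attempt.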

0 files reviewed | 1 comment

The upsert_async + LRU notification sequence was not atomic: concurrent
inserts or evictions could interleave between the HashMap mutation and
the LRU tracker message, causing stale DeleterCtx{fid} values that make
remove_if_async silently fail and leave entries permanently un-evictable.

Fix by switching to entry_async (holds the bucket lock while allocating a
global monotonic version), then using a sync try_send-based upsert() for
the LRU notification so it cannot be lost to task cancellation.
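
A rough sketch of the try_send-with-fallback idea, using only the standard library: `std::sync::mpsc::sync_channel` and `std::thread::spawn` stand in for the bounded async channel and task spawn the commit refers to, and `notify` and `Notify` are hypothetical names.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};
use std::thread;

/// Hypothetical outcome type, used here only to make the three paths visible.
#[derive(Debug, PartialEq)]
pub enum Notify {
    Sent,
    Deferred,
    Closed,
}

/// Synchronous notification: there is no await point in the caller, so
/// cancelling the caller cannot drop the message halfway through.
pub fn notify<T: Send + 'static>(tx: &SyncSender<T>, msg: T) -> Notify {
    match tx.try_send(msg) {
        Ok(()) => Notify::Sent,
        Err(TrySendError::Full(msg)) => {
            // Channel is full: hand the message to a detached worker that
            // waits for capacity instead of blocking or dropping it.
            let tx = tx.clone();
            thread::spawn(move || {
                let _ = tx.send(msg);
            });
            Notify::Deferred
        }
        // Receiver is gone (for example during shutdown): report, do not panic.
        Err(TrySendError::Disconnected(_)) => Notify::Closed,
    }
}
```

One consequence of the fallback path is that a deferred message can arrive after a later one, which is presumably part of why the worker deduplicates by version.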

Key changes:
- Add Versioned trait to lru.rs for version-based message deduplication
- Replace Message::Inserted with Message::Upserted; worker deduplicates
  by comparing versions and gracefully handles missing keys on Accessed
- Replace async insert() with sync upsert() using try_send + spawn
  fallback, making the LRU notification non-cancellable
- FileCache::insert uses entry_async to atomically read old entry and
  allocate version under the bucket lock; all post-lock operations
  (guard defuse, LRU upsert, size accounting) are synchronous
- Add check-cfg for loom and wire up the sync shim module to enable
  future loom-based concurrency testing
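
The version-based deduplication in the key changes above could look roughly like this. It is a simplified, single-threaded sketch under assumptions: `Tracker`, `next_version`, and the message fields are hypothetical, a `Vec` stands in for a real LRU structure, and the actual `Versioned` trait is not reproduced.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Global monotonic version counter (assumed shape; the PR allocates the
/// version while holding the bucket lock).
static NEXT_VERSION: AtomicU64 = AtomicU64::new(1);

pub fn next_version() -> u64 {
    NEXT_VERSION.fetch_add(1, Ordering::Relaxed)
}

pub enum Message {
    Upserted { key: String, version: u64 },
    Accessed { key: String },
}

/// Hypothetical LRU worker state.
#[derive(Default)]
pub struct Tracker {
    latest: HashMap<String, u64>, // key -> newest version seen so far
    order: Vec<String>,           // least recently used first
}

impl Tracker {
    /// Returns true if the message changed the tracker's state.
    pub fn handle(&mut self, msg: Message) -> bool {
        match msg {
            Message::Upserted { key, version } => {
                // Drop stale or duplicate upserts that arrive out of order.
                if self.latest.get(&key).is_some_and(|&seen| seen >= version) {
                    return false;
                }
                self.latest.insert(key.clone(), version);
                self.touch(key);
                true
            }
            Message::Accessed { key } => {
                // A missing key (already evicted, or not yet seen) is ignored.
                if !self.latest.contains_key(&key) {
                    return false;
                }
                self.touch(key);
                true
            }
        }
    }

    fn touch(&mut self, key: String) {
        self.order.retain(|k| k != &key);
        self.order.push(key);
    }

    pub fn least_recent(&self) -> Option<&str> {
        self.order.first().map(String::as_str)
    }
}
```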

markovejnovic force-pushed the marko/mes-719-local-caching-pt-2 branch from 32a76ff to c741f6c on February 16, 2026 04:07

- DeletionIndicator::have_pending_work now checks != 0 (was >= 1<<32)
  so the eviction loop waits for in-flight deletions, not just pending
  batches, preventing cascading over-eviction.  A DeletionGuard drop
  guard on spawned deletion tasks ensures observe_deletion() fires even
  on cancellation or panic.

- Size accounting moved inside the scc bucket lock in insert() to
  prevent transient AtomicUsize underflow when concurrent inserts to the
  same key interleave their deltas.

- access() changed from async send().await (cancellable, panics on
  channel close) to sync try_send + tokio::spawn fallback, matching the
  existing upsert() pattern.  Removes the unreachable!() panic path
  during runtime shutdown.
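
The `have_pending_work` change in the first item reads as if the indicator packs two counters into one 64-bit word. The sketch below assumes that layout (pending batches in the high 32 bits, in-flight deletions in the low 32 bits); only `have_pending_work`, `observe_deletion`, and `DeletionGuard` are names from the commit message, and everything else is hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// One pending batch, counted in the high 32 bits (assumed layout).
const BATCH_UNIT: u64 = 1 << 32;

#[derive(Default)]
pub struct DeletionIndicator {
    state: AtomicU64,
}

impl DeletionIndicator {
    /// Hypothetical: a batch was queued for the deletion worker.
    pub fn submit_batch(&self) {
        self.state.fetch_add(BATCH_UNIT, Ordering::AcqRel);
    }

    /// Hypothetical: a worker starts a batch of `n` deletions. The in-flight
    /// count is added before the batch is removed, so the word never reads
    /// zero while work remains.
    pub fn start_batch(&self, n: u32) {
        self.state.fetch_add(u64::from(n), Ordering::AcqRel);
        self.state.fetch_sub(BATCH_UNIT, Ordering::AcqRel);
    }

    pub fn observe_deletion(&self) {
        self.state.fetch_sub(1, Ordering::AcqRel);
    }

    /// New check: any non-zero bit means queued or in-flight work.
    pub fn have_pending_work(&self) -> bool {
        self.state.load(Ordering::Acquire) != 0
    }

    /// Old check, kept for comparison: sees queued batches only.
    pub fn have_pending_batches_only(&self) -> bool {
        self.state.load(Ordering::Acquire) >= BATCH_UNIT
    }
}

/// Drop guard: the deletion is counted as finished even if the task that
/// owns the guard is cancelled or unwinds from a panic.
pub struct DeletionGuard(pub Arc<DeletionIndicator>);

impl Drop for DeletionGuard {
    fn drop(&mut self) {
        self.0.observe_deletion();
    }
}
```

Under this layout, the old check returns false as soon as the last batch has been picked up, even though its deletions are still running, which matches the over-eviction the commit describes.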

markovejnovic force-pushed the marko/mes-719-local-caching-pt-2 branch from f5d78d6 to abfdc5e on February 16, 2026 04:44

… false test, increase sleep durations

- Fix concurrent_inserts_with_eviction: add sequential warmup so LRU
  worker has eviction candidates before concurrent inserts trigger
  eviction (fixes flakiness where all entries survived)
- Add try_cull_returns_false_when_channel_full test
- Increase sleep durations from 50ms to 100ms in LRU tests
- Tighten concurrent_cull_requests lower bound assertion

The Dockerfile only copied src/ but the library crate lives at lib/lib.rs,
causing the Docker build to fail with "couldn't read lib/lib.rs".

markovejnovic force-pushed the marko/mes-719-local-caching-pt-2 branch from 9a11b90 to 4b16c59 on February 16, 2026 09:18

markovejnovic merged commit 946cf37 into main on Feb 17, 2026
6 checks passed