Skip to content

Comments

fix(rivetkit): fix sqlite teardown logic#4259

Closed
NathanFlurry wants to merge 1 commit intomainfrom
02-22-fix_rivetkit_fix_sqlite_teardown_logic
Closed

fix(rivetkit): fix sqlite teardown logic#4259
NathanFlurry wants to merge 1 commit intomainfrom
02-22-fix_rivetkit_fix_sqlite_teardown_logic

Conversation

@NathanFlurry
Copy link
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4259 February 22, 2026 20:59 Destroyed
@railway-app
Copy link

railway-app bot commented Feb 22, 2026

🚅 Deployed to the rivet-pr-4259 environment in rivet-frontend

Service Status Web Updated (UTC)
website ❌ Build Failed (View Logs) Web Feb 22, 2026 at 9:01 pm
mcp-hub ✅ Success (View Logs) Web Feb 22, 2026 at 9:00 pm
frontend-cloud ❌ Build Failed (View Logs) Web Feb 22, 2026 at 9:00 pm
frontend-inspector ❌ Build Failed (View Logs) Web Feb 22, 2026 at 8:59 pm
ladle ❌ Build Failed (View Logs) Web Feb 22, 2026 at 8:59 pm

Copy link
Member Author

NathanFlurry commented Feb 22, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Feb 22, 2026

PR Review: fix(rivetkit): fix sqlite teardown logic

This PR fixes a real correctness bug: onDisconnect was previously fire-and-forget, so the actor could sleep (and close the SQLite DB) before the callback finished writing to it. The overall approach is sound and the fix is well-structured.


Core Fix: connection-manager.ts

The key change is correct. Moving resetSleepTimer() from before the try block to inside the finally block means the sleep timer is only re-evaluated after the disconnect callback completes. The old code reset it immediately on disconnect, creating a race window where the DB could be torn down mid-callback.

pendingDisconnectCount implementation looks correct. Incremented before the try, decremented in finally, with a Math.max(0, ...) guard as a reasonable defensive measure. One minor note: the sleep check ordering in mod.ts:

for (const _conn of this.connectionManager.connections.values()) {
    return CanSleep.ActiveConns; // always returns on first conn
}
if (this.connectionManager.pendingDisconnectCount > 0) {
    return CanSleep.ActiveDisconnectCallbacks;
}

The ActiveDisconnectCallbacks guard is only reachable when there are zero active connections. This ordering is correct (disconnects can only be pending after the connection is already removed from the map), but the TODO comment on line 1532 about the commented-out isHibernatable check is still there — worth revisiting since it causes the loop to always return on the first connection regardless of hibernatability.


conn/mod.ts

Moving CONN_DRIVER_SYMBOL = undefined to finally is correct. Previously the symbol was cleared after connDisconnected without a guarantee of execution if the call threw. The new structure ensures it's always cleared. The added CONN_DRIVER_SYMBOL = undefined in the else branch for the "missing driver state" warning is also correct (it was already falsy, just being explicit now).


db/mod.ts + db/drizzle/mod.ts

The closed flag + mutex for close is correct. Wrapping close() in the mutex prevents a race where execute() could run concurrently with or after close(). The idempotent close (returning early if closed is already true) prevents double-close errors.

Minor: The ensureOpen() helper is called inside the mutex in execute(), and isClosed() is checked inside the mutex in the drizzle proxy callback — both are inside the mutex, so by the time they run the check is authoritative. This is correct.


sqlite-runtime.ts

configureSqliteRuntimeDatabase looks reasonable. busy_timeout = 5000 reduces transient lock failures during restarts. WAL mode for file-backed databases is a good default for concurrent read/write.

Minor: busy_timeout is set for all paths including ":memory:". In-memory databases can't be locked by other processes, so it's harmless but unnecessary for that case. Not worth changing.

Potential concern: PRAGMA journal_mode = WAL is applied universally to all file-backed databases. This is generally safe, but it creates WAL sidecar files (.db-wal, .db-shm) alongside the database. If anything downstream does a simple file copy of the .db file (e.g., for backup/snapshot), it would get an inconsistent state without the WAL file. Worth documenting or confirming this is acceptable.


Tests

The new test in actor-db.ts correctly validates the fix by checking that a 250ms-delayed disconnect insert completes before the actor sleeps. The test uses timing (SLEEP_WAIT_MS + 250 = 400ms), which is consistent with the existing sleep test pattern.

One issue with the test: It doesn't explicitly disconnect — it relies on the actor's implicit sleep mechanism to trigger disconnect. If the actor doesn't sleep within SLEEP_WAIT_MS (150ms), the test could pass trivially (the insert may not have happened yet, and getDisconnectInsertCount is called on a fresh connection that wakes the actor before disconnect fires). Consider making the disconnect explicit, or at least adding a comment explaining the timing assumption.

The sqlite-lock-repro.test.ts is a useful reproduction case documenting the underlying SQLite behavior being worked around. The it.skip fallback for Node < 22 is correct. One note: this test verifies the problem rather than the fix — it documents expected SQLite behavior (locks throw) rather than validating the WAL/busy_timeout mitigation works. That's fine as a regression-repro test.


Summary

The fix is correct and well-tested. The main things worth a follow-up:

  1. The TODO comment around isHibernatable in #canSleep() is still live — this means all non-hibernatable connections also prevent sleep today regardless of the new pendingDisconnectCount check. Intentional for now?
  2. The disconnect test relies on implicit timing to trigger onDisconnect. Consider an explicit disconnect or a comment explaining the assumption.
  3. WAL sidecar files — confirm snapshots/backups handle .db-wal/.db-shm correctly.

@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 65d3775 to 6840f4f Compare February 22, 2026 21:06
@jog1t jog1t mentioned this pull request Feb 22, 2026
11 tasks
@NathanFlurry NathanFlurry marked this pull request as ready for review February 23, 2026 06:40
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 6840f4f to 63849b3 Compare February 23, 2026 18:22
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 63849b3 to 6840f4f Compare February 23, 2026 18:34
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 6840f4f to 81946f1 Compare February 23, 2026 18:35
@jog1t jog1t force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 81946f1 to 854e432 Compare February 23, 2026 23:02
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 854e432 to 81946f1 Compare February 23, 2026 23:34
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 81946f1 to cdb63b8 Compare February 24, 2026 02:39
@NathanFlurry NathanFlurry mentioned this pull request Feb 24, 2026
11 tasks
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from cdb63b8 to 48bfc9a Compare February 24, 2026 02:57
@NathanFlurry NathanFlurry force-pushed the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch from 48bfc9a to 81946f1 Compare February 24, 2026 03:19
@graphite-app
Copy link
Contributor

graphite-app bot commented Feb 24, 2026

Merge activity

  • Feb 24, 3:21 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Feb 24, 3:22 AM UTC: CI is running for this pull request on a draft pull request (#4274) due to your merge queue CI optimization settings.
  • Feb 24, 3:23 AM UTC: Merged by the Graphite merge queue via draft PR: #4274.

graphite-app bot pushed a commit that referenced this pull request Feb 24, 2026
# Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

## Checklist:

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
@graphite-app graphite-app bot closed this Feb 24, 2026
@graphite-app graphite-app bot deleted the 02-22-fix_rivetkit_fix_sqlite_teardown_logic branch February 24, 2026 03:23
Comment on lines +1 to +4
import { afterEach, describe, expect, it } from "vitest";
import { mkdtempSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These import statements need to be sorted according to Biome's import sorting rules. Node.js built-in modules (fs, path, os) should typically come before other imports, and they should be alphabetically sorted.

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant