Handle Abrupt Server Disconnect in Middle of Tool Call#1321
Closed
EiffL wants to merge 5 commits intomodelcontextprotocol:mainfrom
Closed
Handle Abrupt Server Disconnect in Middle of Tool Call#1321EiffL wants to merge 5 commits intomodelcontextprotocol:mainfrom
EiffL wants to merge 5 commits intomodelcontextprotocol:mainfrom
Conversation
fix: handle transport exceptions and ensure proper cleanup of in-flig…
felixweinberger
requested changes
Sep 23, 2025
| try: | ||
| await ctx.read_stream_writer.send(e) | ||
| finally: | ||
| await ctx.read_stream_writer.aclose() |
Contributor
There was a problem hiding this comment.
Closing on any type of exception seems to broad - we should be more explicit about the failure cases that definitely mean the session is unusable, there might be recoverable errors.
Comment on lines
+339
to
+353
| # Transport-level exception. Forward it to the incoming | ||
| # handler for logging/observation, then fail all | ||
| # in-flight requests so callers don't hang forever. | ||
| await self._handle_incoming(message) | ||
| error = ErrorData(code=CONNECTION_CLOSED, message=str(message)) | ||
| # Send error to any pending request response streams immediately | ||
| for id, stream in list(self._response_streams.items()): | ||
| try: | ||
| await stream.send(JSONRPCError(jsonrpc="2.0", id=id, error=error)) | ||
| await stream.aclose() | ||
| except Exception: | ||
| pass | ||
| self._response_streams.clear() | ||
| # Break out of the receive loop; connection is no longer usable. | ||
| break |
Contributor
There was a problem hiding this comment.
This seems like a duplication of all the logic we have at the bottom of _receive_loop (L440-450) (except for the break)
The changes here also don't seem necessary to make the tests you added pass - do we need this?
Contributor
|
Closing this for inactivity as we're working through our backlog at the moment to reduce our review queue size - feel free to reopen if you plan to come back to this at a later time @EiffL |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Propagates transport disconnects (e.g. server killed mid tool call) as CONNECTION_CLOSED errors instead of hanging indefinitely. Adds regression test.
Motivation and Context
A call_tool over the StreamableHTTP transport would hang forever if the server closed the SSE/HTTP connection mid-response (e.g. process crash/kill). The underlying RemoteProtocolError was logged but in‑flight requests never completed. This change converts that low-level failure into a deterministic JSON-RPC error for every pending request, matching expected resilience semantics.
How Has This Been Tested?
Breaking Changes
None. Behavior only changes for abnormal transport termination; normal flows unaffected. No public API changes.
Types of changes
Checklist
Additional context
Implementation details:
In _handle_sse_response (client transport) on exception: send the exception into the session read stream then close the writer to signal termination.
In BaseSession._receive_loop: on receiving an Exception, broadcast a synthesized JSONRPCError (code CONNECTION_CLOSED) to all pending request streams and clear them.
Integrated test placed alongside existing StreamableHTTP tests for consistency. Potential follow-up: introduce optional reconnection/resumption logic for recoverable disconnects.