Skip to content

fix(mcp): add timeout to background thread join in stop() to prevent hangs#1792

Open
giulio-leone wants to merge 3 commits intostrands-agents:mainfrom
giulio-leone:fix/mcp-client-stop-join-timeout
Open

fix(mcp): add timeout to background thread join in stop() to prevent hangs#1792
giulio-leone wants to merge 3 commits intostrands-agents:mainfrom
giulio-leone:fix/mcp-client-stop-join-timeout

Conversation

@giulio-leone
Copy link

Summary

When an Agent holding an MCPClient is created inside a function and goes out of scope, the Agent.__del__ finalizer calls MCPClient.stop() which invokes _background_thread.join(). If the background thread cannot exit promptly (e.g. transport subprocess teardown is slow), join() blocks indefinitely, causing the entire process to hang on exit.

Changes

  • Add a timeout (equal to startup_timeout, default 30s) to the join() call in stop()
  • If the thread does not exit within the timeout, log a warning and proceed with cleanup
  • The thread is already a daemon thread (daemon=True), so it will be cleaned up by the interpreter on process exit

Testing

  • Updated existing tests to verify join() is called with timeout
  • Added new test test_stop_does_not_hang_when_join_times_out covering the timeout path

Fixes #1732

…hangs

When an Agent holding an MCPClient goes out of scope inside a function,
the Agent.__del__ finalizer calls MCPClient.stop() which calls
_background_thread.join(). If the background thread cannot exit promptly
(e.g. transport subprocess teardown is slow), join() blocks indefinitely,
causing the entire process to hang on exit.

Add a timeout (equal to startup_timeout, default 30s) to the join() call.
If the thread does not exit within the timeout, log a warning and proceed
with cleanup. The thread is already a daemon thread (daemon=True), so it
will be cleaned up by the interpreter on process exit.

Fixes strands-agents#1732

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 1, 2026 01:53
@github-actions github-actions bot added the size/s label Mar 1, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Prevents process hangs on interpreter shutdown by ensuring MCPClient.stop() doesn’t block indefinitely when joining the background thread (notably when called from Agent.__del__).

Changes:

  • Add a timeout (based on startup_timeout) to _background_thread.join() in MCPClient.stop()
  • Log a warning when the join times out and continue cleanup
  • Update/add tests to assert join(timeout=...) is used and exercise the timeout path

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/strands/tools/mcp/mcp_client.py Adds join(timeout=...) + warning when the background thread doesn’t exit promptly.
tests/strands/tools/mcp/test_mcp_client.py Updates existing stop() tests for the new join timeout behavior and adds a new timeout-path test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 555 to 559
# join should have been called with timeout
mock_thread.join.assert_called_once_with(timeout=client._startup_timeout)
# Cleanup should still proceed even though thread is alive
assert client._background_thread is None
assert client._background_thread_event_loop is None
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_stop_does_not_hang_when_join_times_out asserts that cleanup proceeds (event loop close + clearing fields) even when mock_thread.is_alive() remains True. In real execution, a still-alive thread likely means the asyncio loop is still running, and calling loop.close() from the main thread would raise RuntimeError: Cannot close a running event loop. Consider updating the test to assert stop() returns and logs a warning, while not closing the loop / not resetting state until the thread has actually exited (or otherwise ensuring a thread-safe shutdown path).

Suggested change
# join should have been called with timeout
mock_thread.join.assert_called_once_with(timeout=client._startup_timeout)
# Cleanup should still proceed even though thread is alive
assert client._background_thread is None
assert client._background_thread_event_loop is None
# join should have been called with timeout, and stop() should return promptly
mock_thread.join.assert_called_once_with(timeout=client._startup_timeout)

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread cleanup design follows Python's standard daemon thread pattern — threads are marked as daemon so they don't prevent process exit, and cleanup happens via atexit handlers.

…imeout

When join() times out and the thread is still alive, return early
without closing the event loop or resetting internal state. This
prevents RuntimeError from closing a running loop and avoids
allowing a second start() while teardown is still in progress.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giulio-leone
Copy link
Author

All CI checks pass. Ready for review.

Expand the docstring to explicitly describe the expected behavior:
skip event loop close and state reset when thread is still alive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Managed MCPClient integration hangs on exit

2 participants