[BUG] Tool Calls Not Applied Until End of Generation (Causes Timeout with Large Responses) #11482

@billybasass

Description

Problem (one or two sentences)

When running Qwen3 Coder Next in LM Studio connected to Roo Code (VS Code extension), tool calls are not executed incrementally during generation. Instead, they appear to be applied only after the model finishes generating its entire response.

If the model produces a large response, this causes Roo Code to time out before the tool calls are applied.

This is not a long prompt processing issue — the problem occurs during response generation.

Context (who is affected and when)


Environment

  • Model: Qwen3 Coder Next
  • Backend: LM Studio
  • Client: Roo Code (VS Code extension)
  • Connection: LM Studio local server → Roo Code
  • Hardware: Strix Halo, 128 GB
  • OS: Windows

Important Clarification

This is not caused by:

  • Slow prompt processing
  • Initial inference delay
  • Hardware limitations

The issue happens specifically while the model is generating a long response.


Additional Notes

It appears Roo Code may be:

  • Buffering the entire response before parsing tool calls, or
  • Waiting for a full completion event before executing tools

If tool calls were processed incrementally from the stream, this timeout issue would likely not occur.
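For illustration, here is a minimal sketch of what incremental parsing could look like. This is not Roo Code's actual implementation; the chunk shapes mirror the OpenAI-compatible streaming format (`tool_calls` deltas with an `index` and fragmented JSON `arguments`), and the tool name and arguments are made up:

```python
import json

# Simulated streaming deltas in the OpenAI-compatible chunk shape:
# a tool call's JSON arguments arrive as fragments across chunks.
chunks = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "read_file", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"path": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"main.py"}'}}]},
    {"finish_reason": "tool_calls"},
]

def stream_tool_calls(chunks):
    """Accumulate argument fragments per tool-call index and yield each
    call as soon as its JSON arguments parse, instead of waiting for the
    full completion event."""
    pending = {}
    for chunk in chunks:
        for tc in chunk.get("tool_calls", []):
            slot = pending.setdefault(tc["index"], {"name": None, "args": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["args"] += fn.get("arguments", "")
            try:
                args = json.loads(slot["args"])
            except json.JSONDecodeError:
                continue  # arguments still incomplete; keep streaming
            yield slot["name"], args  # dispatch immediately
            del pending[tc["index"]]

calls = list(stream_tool_calls(chunks))
```

With this approach the call is available as soon as its last argument fragment arrives, well before the model finishes the rest of its response.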


Potential Area to Investigate

  • Streaming tool call parsing
  • OpenAI-compatible streaming implementation
  • Function/tool call handling in streaming mode

If logs or additional diagnostics are needed, I can provide them.

Reproduction steps

  1. Run LM Studio with Qwen3 Coder Next.

  2. Connect Roo Code to LM Studio.

  3. Trigger a coding task that produces:

    • A long generation
    • Multiple tool calls

  4. Observe that tool calls are not applied until the model finishes generating.

  5. Observe that large outputs result in a timeout before tool execution.
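To reproduce outside Roo Code and watch the raw stream directly, a request along these lines can be sent to LM Studio's OpenAI-compatible endpoint (by default `POST http://localhost:1234/v1/chat/completions`). The model id, prompt, and tool schema below are illustrative assumptions, not taken from the report:

```python
import json

# Hypothetical streaming request payload; send it with any HTTP client
# and read the response line by line to observe tool_calls deltas
# arriving while generation is still in progress.
payload = {
    "model": "qwen3-coder-next",
    "stream": True,  # ask for incremental deltas during generation
    "messages": [
        {"role": "user", "content": "Refactor main.py and run the tests."}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "write_file",  # illustrative tool, not Roo Code's
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    }],
}
body = json.dumps(payload)
```

If tool-call fragments are visible in the stream long before the final chunk, the delay is on the client (buffering) side rather than in LM Studio's generation.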

Expected result

  • Tool calls should be detected and executed as soon as they are generated (streaming tool execution).
  • Roo Code should not wait for the full model completion before applying tool calls.
  • Large responses should not cause timeouts if tool calls are already available in the stream.

Actual result

  • The model begins generating a response normally.
  • Tool calls appear in the output stream.
  • Roo Code does not apply the tool calls immediately; it waits until the model completes generation.
  • For large responses, this results in a timeout before tool execution occurs.

Variations tried (optional)

No response

App Version

3.47.3

API Provider (optional)

None

Model Used (optional)

Qwen3 Coder Next

Roo Code Task Links (optional)

No response

Relevant logs or errors (optional)
