RESOURCE_EXHAUSTED (429) errors when triggering ADK agents. #4323

@enesdemirag


RESOURCE_EXHAUSTED (429) errors when triggering ADK agents concurrently via Vertex AI Reasoning Engine


Issue Description

Describe the Bug:
When triggering an ADK agent multiple times in quick succession, the request fails with a streaming error that ultimately resolves to a 429 RESOURCE_EXHAUSTED error from Vertex AI. The error is surfaced by ADK as a 500 during response streaming.

Observed error:

{
  "error": "500: An error occurred while streaming the response: 429 Too Many Requests.",
  "details": {
    "message": "Resource exhausted. Please try again later.",
    "status": "RESOURCE_EXHAUSTED"
  }
}

The error message points to the ADK and Vertex AI 429 documentation, but it’s unclear where the actual bottleneck is and how it should be handled when using ADK in production.
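For callers that need to tell a wrapped quota error apart from a genuine server failure, here is a minimal sketch that inspects a payload shaped like the one above. The helper name `is_quota_error` is hypothetical, and the field names (`error`, `details.status`) are assumed only from the observed error in this report, not from any documented ADK contract.

```python
import json


def is_quota_error(payload):
    """Return True if an error payload looks like a wrapped 429.

    Assumes the payload shape seen in this report: a top-level "error"
    string and an optional "details" object with a "status" field.
    """
    details = payload.get("details") or {}
    if details.get("status") == "RESOURCE_EXHAUSTED":
        return True
    # Fall back to the message text, since the 429 is wrapped in a 500.
    return "429" in str(payload.get("error", ""))


observed = json.loads("""
{
  "error": "500: An error occurred while streaming the response: 429 Too Many Requests.",
  "details": {
    "message": "Resource exhausted. Please try again later.",
    "status": "RESOURCE_EXHAUSTED"
  }
}
""")
```

Calling `is_quota_error(observed)` on the payload above returns `True`, while a payload with neither marker, such as `{"error": "500: Internal error"}`, returns `False`.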


Steps to Reproduce:

  1. Deploy a backend service on Google Cloud Run.
  2. Use Google ADK with Vertex AI Reasoning Engine (us-central1).
  3. Trigger the same agent multiple times in a short time window (≈ concurrent or burst traffic).
  4. Observe 429 RESOURCE_EXHAUSTED errors surfaced as streaming failures.

Expected Behavior:
Requests may slow down or queue, but should not fail with a hard error during streaming. Ideally, retries or backoff would be handled gracefully.
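As an illustration of the kind of graceful handling meant here, the following is a minimal sketch of caller-side retries with exponential backoff and full jitter. It is not ADK's behavior or API: `ResourceExhaustedError` is a hypothetical stand-in for whatever exception the client actually raises on a 429, and `call_with_backoff` and its parameters are illustrative names.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the exception raised on a 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ResourceExhaustedError.

    Uses exponential backoff with full jitter: before retry k (0-based),
    sleep a random time in [0, min(max_delay, base_delay * 2**k)).
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
```

The `sleep` and `rng` parameters exist only so the delays can be controlled in tests. Note that retrying a streaming call restarts the stream from the beginning, so this only fits requests that are safe to repeat.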


Observed Behavior:
Requests fail with 429 RESOURCE_EXHAUSTED, wrapped as a 500 streaming error by ADK.


Environment Details:

  • ADK Library Version: latest
  • Python Version: 3.12

Model Information:

  • Are you using LiteLLM: No
  • Models used: gemini-2.5-pro, gemini-2.5-flash
  • Gemini tier: Tier 2
  • Reasoning Engine location: us-central1

❓ Questions / Clarification Needed

  1. Is this error strictly caused by the Vertex AI quota below?

    Query Reasoning Engine requests per minute per region = 30
    
  2. Will increasing this quota fully resolve the issue, or are there additional ADK-level or Reasoning Engine concurrency limits/bottlenecks?

  3. Does ADK provide any built-in retry, backoff, or queueing mechanism for 429 RESOURCE_EXHAUSTED errors?

  4. Are there recommended production patterns when using ADK + Reasoning Engine behind Cloud Run?

  5. Is it possible to self-host the Reasoning Engine locally inside my server and use ADK, so that I only need to worry about the Gemini LLM request quotas?
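For context on questions 1 and 4, one pattern I am considering is a client-side limiter that keeps each process under the 30 requests per minute quota. This is a minimal sketch, not an ADK or Vertex AI feature; `SlidingWindowLimiter` and its methods are hypothetical names, and the limiter is per process, so it does not bound the combined rate across several Cloud Run instances.

```python
import collections
import time


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (per process)."""

    def __init__(self, limit=30, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = collections.deque()  # times of admitted calls

    def _evict(self, now):
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def try_acquire(self):
        """Record a call and return True if a slot is free, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self):
        """Seconds until the next slot opens (0.0 if one is free now)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window - (now - self._stamps[0])
```

A caller would check `try_acquire()` before querying the agent and, on `False`, either queue the request for `wait_time()` seconds or reject it early with its own 429.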

I’m planning to launch this service soon and want to ensure the setup is production-safe under burst traffic.

Labels

  • core: [Component] This issue is related to the core interface and implementation
  • request clarification: [Status] The maintainer needs clarification or more information from the author
  • stale: [Status] Issues which have been marked inactive since there is no user response
