202 changes: 202 additions & 0 deletions declarative-api-explainer.md
@@ -0,0 +1,202 @@
# WebMCP declarative API

See discussion in https://github.com/webmachinelearning/webmcp/issues/22 that led to the creation of
this proposal.

## Motivation

WebMCP lets developers expose intricate functionality backed by a website's JavaScript functions to
an agent as "tools", effectively turning the site into an "MCP server". Agents can see the list of
tools a site offers paired with natural language descriptions of what the tools do, and invoke them
with structured data.

With WebMCP, agents can perform complex actions like booking a flight or reserving a table by
hooking into a site's own code designed to perform those actions, instead of the agent having to
figure it out manually through a brittle series of screenshots, scrolls, and out-of-date screen
reads.

However, not all site functionality is exposed via JavaScript functions, and features that *are*
take some effort to rewrite with an agent invoker in mind. Much of a site's functionality is
provided via semantic HTML elements like `<form>` and its various inputs. To **make it easier** for
developers to expose this kind of site functionality while still using the semantic web, we
propose:

1. New attributes that augment `<form>`s and [form-associated
   elements](https://html.spec.whatwg.org/#form-associated-element), exposing them as WebMCP
   tools to agents.
2. Algorithms that deterministically "compile" a form and its associated inputs down to a WebMCP
   "input schema", so that the agent knows how to fill out the form and submit it.
3. Two ways of getting a form response back to the agent that invoked the form tool:
   1. `SubmitEvent#respondWith()`, which lets JavaScript on the page override the default form
      action, and pipe a response back to the agent without navigating the page.
   2. Extracting `<script type="application/ld+json">` tags on the page that the form navigated to,

I had some time to try out the JSON-LD cross document ergonomics this week and I'm leaning towards declarative tools not concerning themselves with cross-document outputs at all. I'd be curious to hear others' thoughts on this. The cross-document concern feels a bit self-inflicted and I found myself bypassing it in favor of read tools on the navigated-to page that the model can call on follow ups.

The model just needs to know: (a) did I fill the form correctly, and if not, what are the validation errors? (this does not cause a navigation) and (b) did it submit successfully? (this could just be a generic message). From there it can follow up with read tools on the destination page to verify context.

This would let us drop concerns around output schema for declarative cross document tools as well.

+1 to @bwalderman's <context> idea. That or something like it feels like the right primitive here rather than a cross-document output API bolted onto forms. This sort of thing would be useful in general rather than as a specific solution for form submissions that cause navigations.

I also have concerns about <script type="application/json-ld">. The <script> tag isn't exactly intended for this purpose to begin with, and it's being used only to solve for cross-document form submission. That's just one specific case of a more general problem: What happens when an agent needs to initialize or catch up on state, either because it started browsing a new page, or because it used a tool (imperative or declarative) that navigated? Read tools are important for the agent to get this initial state after navigation, so I'm leaning towards figuring out the declarative approach for read tools, and then letting this be the solution for cross-document form submission. This is the reason why I brought the <context> tag from the VOIX framework to everyone's attention. We'll want something like it anyway for the declarative approach to be considered complete, and it also happens to be a reasonable solution for cross-document form submission.

It may not end up being called <context>. It would probably be <resource> to keep aligned with MCP, but the semantics would be the same.

and using that structured data as a response to the form.

## Form attributes

```html
<form
toolname="Search flights"
tooldescription="This form searches flights and displays [...]"
toolautosubmit>
```

The `toolname` attribute is analogous to the imperative API's
[`ModelContextTool#name`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-name),
while `tooldescription` is analogous to
[`ModelContextTool#description`](https://webmachinelearning.github.io/webmcp/#dom-modelcontexttool-description).

The `toolautosubmit` [boolean attribute](https://html.spec.whatwg.org/C#boolean-attribute) lets the
agent submit the form on the user's behalf after filling it out, without requiring the user to check
it manually before submitting. If this attribute is missing when the agent finishes filling out the
form, the browser brings the submit button into focus, and the agent should then tell the user to
check the form contents and submit it manually.

When a form with these attributes is inserted or removed, or when these attributes are updated, the
form's declarative WebMCP tool is updated to match. A newly created tool's input schema is generated
according to [Input schema synthesis](#input-schema-synthesis).

### Name and description

The [`name`](https://html.spec.whatwg.org/C#attr-fe-name) attribute on form control elements
supplies the name of each "property" in the input schema generated for a declarative tool.

Since there's no pre-existing description attribute we can use, we introduce the
`toolparamdescription` attribute for form control elements, which contributes the
[description](https://json-schema.org/draft/2020-12/json-schema-validation#name-title-and-description)
of each "property" in the input schema generated for a declarative tool.

With this, the following imperative structure:

```js
window.navigator.modelContext.registerTool({
  name: "search-cars",
  description: "Perform a car make/model search",
  inputSchema: {
    type: "object",
    properties: {
      make: { type: "string", description: "The vehicle's make (e.g., BMW, Ford)" },
      model: { type: "string", description: "The vehicle's model (e.g., 330i, F-150)" },
    },
    required: ["make", "model"]
  },
  execute({make, model}, agent) { ... }
});
```

... is equivalent to the following declarative form:

```html
<form toolname="search-cars" tooldescription="Perform a car make/model search" [...]>
  <input type=text name="make" toolparamdescription="The vehicle's make (e.g., BMW, Ford)" required>
  <input type=text name="model" toolparamdescription="The vehicle's model (e.g., 330i, F-150)" required>
  <button type=submit>Search</button>
</form>
```

## Processing model

### Changes to form reset

When a form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) **OR** its tool
declaration changes (as a result of `toolname` attribute modifications, for example), then any
in-flight invocation of the tool will be cancelled, and the agent will be notified of this
cancellation.

### Input schema synthesis

TODO: The exact algorithms for reducing a form, its form-associated elements, and *their* attributes
like [`step`](https://html.spec.whatwg.org/C#the-step-attribute) and
[`min`](https://html.spec.whatwg.org/C#attr-input-min) to an input schema are TBD. We need to
concretely specify how various form-associated elements like `<input>` and `<select>` reduce to a
JSON Schema that includes `anyOf`, `oneOf`, and `maximum`/`minimum` declarations.

Chromium is implementing a loose version of this and will conduct testing/trials to see if what
we've come up with should be supported by the community as a general approach.
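
Since the real algorithm is still TBD, the following is only a rough sketch of the kind of reduction this section has in mind. Everything in it is an assumption made for illustration: the `synthesizeInputSchema` function, the plain-object stand-ins for form controls, and the specific mappings (numeric inputs to `minimum`/`maximum`, `<select>` options to `oneOf`). It is not what Chromium implements.

```javascript
// Hypothetical sketch only: the actual synthesis algorithm is unspecified.
// `controls` are plain-object stand-ins for a form's controls, not DOM nodes.
function synthesizeInputSchema(controls) {
  const schema = { type: "object", properties: {}, required: [] };
  for (const control of controls) {
    if (!control.name) continue; // Assumption: unnamed controls add no property.
    const property = {};
    if (control.type === "number" || control.type === "range") {
      property.type = "number";
      if (control.min !== undefined) property.minimum = Number(control.min);
      if (control.max !== undefined) property.maximum = Number(control.max);
    } else if (control.type === "select") {
      // Assumption: each <option> value becomes one `const` alternative.
      property.oneOf = control.options.map((value) => ({ const: value }));
    } else {
      property.type = "string";
    }
    if (control.toolparamdescription) {
      property.description = control.toolparamdescription;
    }
    schema.properties[control.name] = property;
    if (control.required) schema.required.push(control.name);
  }
  return schema;
}

const exampleSchema = synthesizeInputSchema([
  { name: "make", type: "text", required: true,
    toolparamdescription: "The vehicle's make (e.g., BMW, Ford)" },
  { name: "year", type: "number", min: "1990", max: "2026" },
  { name: "fuel", type: "select", options: ["petrol", "diesel", "electric"] },
]);
console.log(JSON.stringify(exampleSchema, null, 2));
```

A real implementation would read these values from the form's elements and would also need to handle `step`, radio groups, and the other control types that this sketch ignores.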

### Getting the form response to the agent

When a form element performs a navigation, the first `<script type=application/ld+json>` tag on the
target page is used as the cross-document tool's "response" that gets sent to the model.

When no such tag is present, we'll probably decide that the page's entire contents are sent to the

@MiguelsPizza MiguelsPizza Mar 4, 2026

Any thoughts on defaulting to a simple "form submitted successfully" message as the tool response? The signal to noise ratio of raw HTML can be huge and I'd argue it's better not to send it at all rather than send all of it.

"Accurate semantic representation" and "useful to the model" aren't the same thing. A content-heavy result page could easily blow out context in a way that's hard to predict or debug.

model as the response, since that's an accurate semantic representation of the result of the tool.
However, this is technically TBD at the moment.
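
As a rough illustration of the cross-document case, the lookup could behave like the sketch below. In the proposal this step would be performed by the browser rather than by page script. The `extractJsonLdResponse` name, the `null` result when no tag is found, and the treatment of malformed JSON are all assumptions, since the fallback behavior is still TBD.

```javascript
// Sketch under assumptions: `doc` is a Document, or any object with a
// compatible querySelector(). querySelector() returns the first match in
// document order, matching "the first <script type=application/ld+json> tag".
function extractJsonLdResponse(doc) {
  const script = doc.querySelector('script[type="application/ld+json"]');
  if (!script) return null; // Fallback is TBD in the explainer; null is a placeholder.
  try {
    return JSON.parse(script.textContent);
  } catch {
    return null; // Assumption: malformed JSON-LD is treated like a missing tag.
  }
}
```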

When the form element does *NOT* perform a navigation, JavaScript can hand-craft the response to the
agent via the `SubmitEvent#respondWith()` method described below.

### Pseudo-classes

Authors might want a way to bring to the user's attention or otherwise highlight a declarative
WebMCP form that was filled out by the agent, and is waiting for the user to check the form and
submit it. (This is essentially only relevant for forms without the `toolautosubmit` attribute). To
support this, we introduce the CSS pseudo-classes `:tool-form-active` and `:tool-submit-active`.

The `:tool-form-active` pseudo-class matches `<form>` elements whose declarative tool is "running".
The exact definition of this will be clarified in the specification, but in short, a declarative
tool is considered "running" starting when the form is being filled out with agent output, until one
of the following:

- The form is [reset](https://html.spec.whatwg.org/C#concept-form-reset) or removed from the DOM
- The Promise returned from `SubmitEvent#respondWith()` resolves with a tool output
- The form's `toolname` or `tooldescription` attributes are modified, added, or removed
- The form is automatically submitted with the agent output, due to the `toolautosubmit` attribute

The `:tool-submit-active` pseudo-class matches the submit button of a `:tool-form-active` form
element.

### Events

**Additions to `SubmitEvent`**

The `SubmitEvent` interface gets two new members: `agentInvoked`, which lets `submit` event handlers
react to agent-invoked form submissions, and the `respondWith()` method.

This method takes a `Promise<any>` that resolves to the response that the agent will consume. This
method is used to override the default behavior of the form submission; the form's `action` will NOT
navigate, and `preventDefault()` must be called before this method is called.

```idl
[Exposed=Window]
interface SubmitEvent : Event {
  // ...
  readonly attribute boolean agentInvoked;
  undefined respondWith(Promise<any> agentResponse);
};
```
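
A `submit` handler using these members might look roughly like the sketch below. `agentInvoked` and `respondWith()` are the members proposed above, while `searchCars`, `handleSearchSubmit`, and the `form[toolname="search-cars"]` selector are hypothetical names chosen for illustration.

```javascript
// Hypothetical application logic that produces the tool's output.
async function searchCars({ make, model }) {
  return { results: [`${make} ${model}`] };
}

// Sketch of a submit handler. For agent-invoked submissions it cancels the
// default navigation and pipes a response back to the agent instead.
function handleSearchSubmit(event, formValues) {
  if (!event.agentInvoked) return; // User-driven submissions proceed as usual.
  event.preventDefault(); // Must be called before respondWith().
  event.respondWith(searchCars(formValues));
}

// Browser-only wiring, skipped when no DOM is available.
if (typeof document !== "undefined") {
  const form = document.querySelector('form[toolname="search-cars"]');
  form?.addEventListener("submit", (event) => {
    handleSearchSubmit(event, Object.fromEntries(new FormData(form)));
  });
}
```

Keeping the handler separate from the DOM wiring makes it possible to exercise the logic with a stand-in event object before any browser ships the proposed members.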

**`toolactivated` and `toolcanceled` events**

We introduce these events that get fired at the `Window` object when a WebMCP tool is run, and when
its invocation is canceled.

The `toolactivated` event gives the developer a hook to perform any actions, such as bringing the
form to the user's attention, once a declarative tool is filled out but before it is submitted.
(This presumes the absence of the `toolautosubmit` attribute). This event can be seen as the
JavaScript equivalent of the [`:tool-form-active` pseudo-class](#pseudo-classes).
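
Listening for these events could look like the minimal sketch below. It assumes only the event names and the `Window` target described above. The `observeToolLifecycle` helper is hypothetical, and no event properties are read because none are specified yet.

```javascript
// Sketch: attach listeners for the two proposed events to any EventTarget.
// In a page, `target` would be `window`.
function observeToolLifecycle(target, log = console.log) {
  target.addEventListener("toolactivated", () => {
    // For example, scroll the filled-out form into view for the user to check.
    log("tool activated");
  });
  target.addEventListener("toolcanceled", () => {
    log("tool canceled");
  });
}

// Browser-only wiring, skipped outside a window context.
if (typeof window !== "undefined") {
  observeToolLifecycle(window);
}
```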


Some open questions:

> [!WARNING]
> Should these events fire for imperative tool call invocations as well? Chromium
> [seems to do
> that](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/script_tools/model_context.cc;l=265-274;drc=2af6413cf36d701fdaffb09188f2ab2a5be37f4f).

> [!WARNING]
> For declarative, should they be fired at `Window` or at the `<form>` that registered the tool in
> the first place, and bubble up to the document that way? See
> https://github.com/webmachinelearning/webmcp/issues/126.

## Integration with other imperative API bits

It's an open question whether [an
`outputSchema`](https://github.com/webmachinelearning/webmcp/issues/9) makes sense for declarative
WebMCP tools, and therefore whether the `agentResponse` Promise passed to
`SubmitEvent#respondWith()` must resolve to an object conforming to such a schema.

It is TBD how *declarative* WebMCP tools will be exposed to any interface that exposes a site's
tools to JavaScript. See https://github.com/webmachinelearning/webmcp/issues/51 for context. Should
a declarative WebMCP tool be able to be invoked from such an interface, should it exist in the
future? Almost certainly, yes. But details are TBD.