name: html-in-the-loop
description: Use the local html-in-the-loop MCP runtime to collect structured browser choices or form inputs, then continue the actual reasoning, generation, or business processing in the agent.

HTML in the Loop

Use this skill when the user wants an interactive browser step whose result becomes structured input for the agent.

Core Rule

HTML is only the input surface. It should collect user choices, form fields, clicks, typed text, or simple UI state, then emit a structured event back to the agent.

Do not put business processing in the HTML. The agent must do the reasoning, writing, scoring, summarizing, recommendations, transformations, or final answer after receiving the event from wait_for_interaction.

Allowed in HTML:

  • Render options, forms, controls, contenteditable regions, and UI-only state.
  • Validate basic UI requirements such as "at least one option selected".
  • Count selected items or mirror selected labels for user feedback.
  • Emit the raw structured selection through AgentBridge.emit(...), data-agent-event, automatic field changes, or form submission.
  • Show a simple submitted/received confirmation.

Not allowed in HTML unless the user explicitly asks for a pure frontend demo:

  • Generate final explanations, summaries, recommendations, or plans.
  • Rank, score, classify, or transform the user's choices as the final task.
  • Call remote APIs or hide business logic in client-side JavaScript.
  • Render the final agent answer before the agent has processed the event.

Workflow

  1. Call create_session with a concise title, description, and optional event schema.
  2. Generate self-contained HTML whose job is only to collect input.
  3. Call render_html with that HTML.
  4. Send the returned session URL to the user in the chat, using the exact full URL from the tool result.
  5. Call wait_for_interaction for the expected event type.
  6. Treat the returned event payload as the next user input.
  7. Produce the final result in the agent response. Render a second HTML result page only if the user explicitly wants the agent-computed output shown in the browser.
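
The steps above can be sketched as one async function. This is a minimal sketch, not the runtime's actual client: `callTool(name, args)` and `sendToUser(text)` are hypothetical callbacks supplied by the host agent, and the argument and result field names (`session_id`, `url`, `event_type`, `payload`) are assumptions. Only the tool names `create_session`, `render_html`, and `wait_for_interaction` come from this skill.

```javascript
// Sketch of the workflow. `callTool` and `sendToUser` are hypothetical
// callbacks; argument and result field names are assumptions.
async function collectInput(callTool, sendToUser, { title, description, html, eventType }) {
  // Step 1: open a session.
  const session = await callTool("create_session", { title, description });

  // Steps 2-3: render input-only HTML into that session.
  const rendered = await callTool("render_html", {
    session_id: session.session_id,
    html,
  });

  // Step 4: hand the exact returned URL to the user before waiting.
  await sendToUser(`Please open this page to make your selection: ${rendered.url}`);

  // Step 5: wait until the expected event arrives.
  const event = await callTool("wait_for_interaction", {
    session_id: session.session_id,
    event_type: eventType,
  });

  // Step 6: the payload is the next user input; step 7 happens in the agent.
  return event.payload;
}
```

The function returns only the raw payload, so the reasoning or writing step stays in the agent rather than in the page.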

URL Handoff Requirement

After render_html returns, the browser page is ready but the user cannot interact unless the agent sends the URL. Always include the exact returned URL in the next assistant message before waiting for interaction.

Recommended handoff:

Please open this page to make your selection: <returned-url>
I will wait for your submission and then continue processing.

Do not merely say "open the page", "interact with the browser", or "use the returned URL" without including the actual URL.

Event Contract

Prefer explicit event names that describe the input, not the final processing:

  • fruit_selection
  • prd_direction_selected
  • preference_submitted
  • items_selected

Manual emit example:

<button onclick="AgentBridge.emit('fruit_selection', { selected: selectedFruits() }, { ui: 'fruit-list' })">
  Submit selection
</button>
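
The button above calls `selectedFruits()`, which the page has to define itself. A minimal sketch of such a helper follows, assuming a hypothetical `FRUITS` catalog and checkboxes named `fruit` whose `value` is the fruit id. It only filters the catalog and does no scoring or text generation.

```javascript
// Hypothetical catalog; ids and fields are illustrative only.
const FRUITS = [
  { id: "apple", name: "Apple" },
  { id: "banana", name: "Banana" },
  { id: "cherry", name: "Cherry" },
];

// Pure helper: keep catalog entries whose id is checked, in catalog order.
function pickSelected(catalog, checkedIds) {
  const checked = new Set(checkedIds);
  return catalog.filter((item) => checked.has(item.id));
}

// Browser wiring (assumes <input type="checkbox" name="fruit" value="<id>">).
function selectedFruits() {
  const boxes = document.querySelectorAll('input[name="fruit"]:checked');
  return pickSelected(FRUITS, Array.from(boxes, (box) => box.value));
}
```

Keeping the filtering in a pure function makes it easy to check outside the browser, while `selectedFruits()` stays a thin DOM wrapper.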

Form example. Forms emit automatically on submit; without data-agent-event, the event type is form_submit:

<form data-agent-event="preference_submitted">
  <input name="audience" value="developers">
  <button type="submit">Submit</button>
</form>

Standalone input example. Text inputs, textareas, selects, checkbox/radio/date/color/range/file controls, and contenteditable regions emit committed changes automatically:

<input name="topic" placeholder="Topic">
<textarea name="notes"></textarea>
<select name="tone">
  <option>concise</option>
  <option>deep</option>
</select>
<div contenteditable data-agent-name="freeform_notes"></div>

Default field event types are field_change and content_edit. Use data-agent-input-event="custom_type" to override a field-level event type. Put data-agent-ignore on a control or parent subtree when an input should not be returned to the agent.

The event payload should be raw and reusable:

{
  "selected": [
    {
      "id": "banana",
      "name": "Banana",
      "flavor": "Creamy and sweet",
      "note": "Good for a quick energy boost"
    }
  ],
  "count": 1
}

After receiving this event, the agent writes the explanation or performs the requested processing.
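
Before using the payload, the agent can check that it has the expected raw shape. The following is a minimal sketch assuming the `selected`/`count` layout shown above; `validateSelectionPayload` is a hypothetical helper, not part of the runtime.

```javascript
// Hypothetical agent-side check for the payload shape shown above.
// Returns a list of problems; an empty list means the payload is usable.
function validateSelectionPayload(payload) {
  const problems = [];
  if (payload === null || typeof payload !== "object") {
    return ["payload is not an object"];
  }
  if (!Array.isArray(payload.selected)) {
    return ["selected is not an array"];
  }
  payload.selected.forEach((item, index) => {
    if (!item || typeof item.id !== "string" || item.id === "") {
      problems.push(`selected[${index}] has no stable id`);
    }
    if (!item || typeof item.name !== "string") {
      problems.push(`selected[${index}] has no human-readable name`);
    }
  });
  if (payload.count !== payload.selected.length) {
    problems.push("count does not match selected.length");
  }
  return problems;
}
```

If the list of problems is non-empty, the agent can ask the user to resubmit instead of reasoning from a malformed event.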

Fruit Selection Pattern

For a request like "Generate a fruit list, then generate an explanation based on the fruits I check":

  1. HTML renders a fruit checklist and submit button.
  2. HTML emits fruit_selection with the selected fruit objects.
  3. HTML shows only "Selection submitted. Return to Codex to see the result."
  4. The agent reads the fruit_selection payload and generates the fruit explanation in chat.

Do not make the fruit page compute or render the explanation itself.

Safety

  • Keep the runtime local through 127.0.0.1.
  • Do not put secrets into generated HTML.
  • Keep event payloads small, explicit, and JSON-serializable.
  • Prefer stable ids plus human labels in payloads so the agent can reason from the result without scraping the HTML.
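
Some of these rules can be checked mechanically before handing off a URL or trusting an event. A minimal sketch: `isLocalSessionUrl` and `isSmallJsonPayload` are hypothetical helpers, and the 16 KB default limit is an arbitrary assumption rather than a runtime setting.

```javascript
// Hypothetical helpers; the size limit is an arbitrary illustration.
function isLocalSessionUrl(url) {
  try {
    return new URL(url).hostname === "127.0.0.1";
  } catch {
    return false; // not a parseable URL
  }
}

function isSmallJsonPayload(payload, maxBytes = 16 * 1024) {
  let text;
  try {
    text = JSON.stringify(payload);
  } catch {
    return false; // e.g. circular references or BigInt values
  }
  if (text === undefined) return false; // functions, undefined, symbols
  return new TextEncoder().encode(text).length <= maxBytes;
}
```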