| name | html-in-the-loop |
|---|---|
| description | Use the local html-in-the-loop MCP runtime to collect structured browser choices or form inputs, then continue the actual reasoning, generation, or business processing in the agent. |

Use this skill when the user wants an interactive browser step whose result becomes structured input for the agent.

HTML is only the input surface. It should collect user choices, form fields, clicks, typed text, or simple UI state, then emit a structured event back to the agent.

Do not put business processing in the HTML. The agent must do the reasoning, writing, scoring, summarizing, recommending, and transforming, and produce the final answer, after receiving the event from `wait_for_interaction`.

Allowed in HTML:
- Render options, forms, controls, contenteditable regions, and UI-only state.
- Validate basic UI requirements such as "at least one option selected".
- Count selected items or mirror selected labels for user feedback.
- Emit the raw structured selection through `AgentBridge.emit(...)`, `data-agent-event`, automatic field changes, or form submission.
- Show a simple submitted/received confirmation.

Not allowed in HTML unless the user explicitly asks for a pure frontend demo:
- Generate final explanations, summaries, recommendations, or plans.
- Rank, score, classify, or transform the user's choices as the final task.
- Call remote APIs or hide business logic in client-side JavaScript.
- Render the final agent answer before the agent has processed the event.

Workflow:
- Call `create_session` with a concise title, description, and optional event schema.
- Generate self-contained HTML whose job is only to collect input.
- Call `render_html` with that HTML.
- Send the returned session URL to the user in the chat, using the exact full URL from the tool result.
- Call `wait_for_interaction` for the expected event type.
- Treat the returned event payload as the next user input.
- Produce the final result in the agent response. Render a second HTML result page only if the user explicitly wants the agent-computed output shown in the browser.
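
The workflow steps above can be sketched from the agent side as follows. This is a minimal illustration, not the runtime's client: `callTool` is a hypothetical async function `(name, args) => result` provided by whatever MCP client is in use, and the argument and result field names (`session_id`, `url`, `event_type`, `payload`) are assumptions. Only the three tool names come from this document.

```javascript
// Agent-side sketch of the session workflow.
// ASSUMPTIONS: `callTool` is a hypothetical MCP client helper, and the
// field names session_id, url, event_type, and payload are illustrative.
async function startInputStep(callTool, { title, description, html, eventType }) {
  // 1. Create the session.
  const session = await callTool("create_session", { title, description });

  // 2. Render the input-only HTML into that session.
  const rendered = await callTool("render_html", {
    session_id: session.session_id,
    html,
  });

  // 3. Hand back the exact URL first, so the caller can send it to the
  //    user before it starts waiting for the interaction.
  return {
    url: rendered.url,
    waitForEvent: async () => {
      const event = await callTool("wait_for_interaction", {
        session_id: session.session_id,
        event_type: eventType,
      });
      // 4. The payload is treated as the next user input.
      return event.payload;
    },
  };
}
```

Splitting the step in two (URL first, wait second) makes it hard to start waiting without having the URL in hand to send to the user.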

After `render_html` returns, the browser page is ready, but the user cannot interact with it unless the agent sends the URL. Always include the exact returned URL in the next assistant message before waiting for interaction.

Recommended handoff:

```text
Please open this page to complete your selection: <returned-url>
I will wait for your submission and then continue processing.
```

Do not only say "open the page", "interact with the browser", or "use the returned URL" without including the actual URL.
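
One way to enforce the handoff rule is to build the message with a small helper that refuses to run without a real URL. The helper name and the `http(s)` check are assumptions made for illustration; the wording mirrors the recommended handoff.

```javascript
// Build the chat handoff message from the URL returned by render_html.
// ASSUMPTION: the returned URL uses an http or https scheme.
function formatHandoffMessage(returnedUrl) {
  if (typeof returnedUrl !== "string" || !/^https?:\/\//.test(returnedUrl)) {
    throw new Error("handoff requires the exact URL returned by render_html");
  }
  return [
    `Please open this page to complete your selection: ${returnedUrl}`,
    "I will wait for your submission and then continue processing.",
  ].join("\n");
}
```

Throwing on a missing URL turns the "do not only say 'open the page'" rule into a failure that is visible before the agent starts waiting.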

Prefer explicit event names that describe the input, not the final processing:
- `fruit_selection`
- `prd_direction_selected`
- `preference_submitted`
- `items_selected`

Manual emit example:
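
```html
<!-- selectedFruits() is assumed to be a helper defined elsewhere on the page
     that returns the currently checked items. -->
<button onclick="AgentBridge.emit('fruit_selection', { selected: selectedFruits() }, { ui: 'fruit-list' })">
  Submit selection
</button>
```

Form example. Forms emit automatically on submit; without `data-agent-event`, the event type is `form_submit`:
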
```html
<form data-agent-event="preference_submitted">
  <input name="audience" value="developers">
  <button type="submit">Submit</button>
</form>
```

Standalone input example. Text inputs, textareas, selects, checkbox/radio/date/color/range/file controls, and contenteditable regions emit committed changes automatically:

```html
<input name="topic" placeholder="Topic">
<textarea name="notes"></textarea>
<select name="tone">
  <option>concise</option>
  <option>deep</option>
</select>
<div contenteditable data-agent-name="freeform_notes"></div>
```

Default field event types are `field_change` and `content_edit`. Use `data-agent-input-event="custom_type"` to override a field-level event type. Put `data-agent-ignore` on a control or parent subtree when an input should not be returned to the agent.
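
The field-level rules above can be modeled as a small decision function. This is a reading aid, not the runtime's code. It assumes that `data-agent-ignore` takes precedence over an override, and that `content_edit` applies to contenteditable regions while `field_change` applies to other controls; neither detail is stated explicitly here. The `field` object is a hypothetical stand-in for an element's relevant attributes.

```javascript
// Model of which event type a field change would produce.
// ASSUMPTIONS: ignore beats override; contenteditable -> content_edit,
// every other control -> field_change. `field` is a plain object, not a DOM node.
function resolveFieldEventType(field) {
  // data-agent-ignore on the control or a parent subtree suppresses the event.
  if (field.ignore || field.ancestorIgnored) return null;
  // data-agent-input-event overrides the default type.
  if (field.inputEvent) return field.inputEvent;
  // Fall back to the default types.
  return field.contentEditable ? "content_edit" : "field_change";
}
```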
The event payload should be raw and reusable:

```json
{
  "selected": [
    {
      "id": "banana",
      "name": "Banana",
      "flavor": "Creamy and sweet",
      "note": "Good for a quick energy boost"
    }
  ],
  "count": 1
}
```

After receiving this event, the agent writes the explanation or performs the requested processing.

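
A payload of this shape can be assembled in the page with a few lines of UI-only code. The sketch below uses a hypothetical `FRUITS` catalog and helper name. It copies the selected items as they are and counts them, without ranking, scoring, or rewriting anything.

```javascript
// Hypothetical catalog; ids and labels are illustrative.
const FRUITS = [
  { id: "banana", name: "Banana", flavor: "Creamy and sweet" },
  { id: "apple", name: "Apple", flavor: "Crisp and tart" },
];

// Build the raw selection payload from a list of selected ids.
function buildSelectionPayload(catalog, selectedIds) {
  const wanted = new Set(selectedIds);
  // Keep catalog order and copy the raw fields; unknown ids are dropped.
  const selected = catalog
    .filter((item) => wanted.has(item.id))
    .map((item) => ({ ...item }));
  return { selected, count: selected.length };
}
```

The result could then be passed to `AgentBridge.emit('fruit_selection', payload)`, leaving all interpretation to the agent.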
For a request like "Generate a fruit list, then write a description based on the fruits I check":
- HTML renders a fruit checklist and submit button.
- HTML emits `fruit_selection` with the selected fruit objects.
- HTML shows only "Selection submitted. Please return to Codex to view the result."
- The agent reads the `fruit_selection` payload and generates the fruit explanation in chat.

Do not make the fruit page compute or render the explanation itself.
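
As a rough sketch of the input-only page for this example, the agent could generate the checklist HTML from its own fruit list. The function names are hypothetical. The form relies on the automatic submit behavior with `data-agent-event` described earlier; how the runtime serializes checked boxes is not specified here, so the agent is assumed to map returned ids back to its own list. The confirmation message is left out for brevity.

```javascript
// Escape text for safe use in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a checklist form that only collects input; no explanation logic.
function renderFruitChecklist(fruits) {
  const rows = fruits
    .map(
      (f) =>
        `  <label><input type="checkbox" name="fruit" value="${escapeHtml(f.id)}"> ${escapeHtml(f.name)}</label>`
    )
    .join("\n");
  return [
    '<form data-agent-event="fruit_selection">',
    rows,
    '  <button type="submit">Submit selection</button>',
    "</form>",
  ].join("\n");
}
```

The returned string would be passed to `render_html`, and the explanation is written by the agent only after the `fruit_selection` event arrives.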

- Keep the runtime local through `127.0.0.1`.
- Do not put secrets into generated HTML.
- Keep event payloads small, explicit, and JSON-serializable.
- Prefer stable ids plus human labels in payloads so the agent can reason from the result without scraping the HTML.
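
The payload guidelines above can be checked mechanically before emitting. The following is a minimal sketch with a hypothetical helper name. The 4096-byte default is an arbitrary illustrative limit, not a runtime constraint, and the `selected` array shape follows the fruit example.

```javascript
// Return a list of problems found in a payload (an empty list means it looks fine).
// ASSUMPTIONS: maxBytes is an arbitrary limit; items live under `selected`.
function checkPayload(payload, maxBytes = 4096) {
  let json;
  try {
    json = JSON.stringify(payload);
  } catch (err) {
    // Circular structures and BigInt values land here.
    return [`not JSON-serializable: ${err.message}`];
  }
  if (json === undefined) {
    // Functions, symbols, and undefined have no JSON form.
    return ["not JSON-serializable: value has no JSON form"];
  }

  const problems = [];
  const size = new TextEncoder().encode(json).length;
  if (size > maxBytes) {
    problems.push(`payload is ${size} bytes, over the ${maxBytes}-byte limit`);
  }

  const items = payload && Array.isArray(payload.selected) ? payload.selected : [];
  items.forEach((item, i) => {
    if (!item || typeof item.id !== "string" || item.id === "") {
      problems.push(`selected[${i}] is missing a stable id`);
    }
  });
  return problems;
}
```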