feat(gpt-apps): add tools and widget descriptors #375
base: feat/gpt-apps
Thank you for the PR 👍
I would keep the introduction of new tools to a bare minimum; there are already plenty of tools, and new tools with duplicated logic are not great for agents.
So I would remove the fetch-actor-details-widget tool as per my comment and merge its logic into the existing one. For call-actor-async (I would rename it from call-actor-widget and generalize its description so it is not so widget-centric and is possibly usable outside ?uiMode) and the related get-actor-run-status, I would add logic to the tool loader that enforces this pair of async tools for ?uiMode instead of the original call-actor tool, so we do not confuse the agent/LLM with complicated descriptions and instructions. We are already working on the new asynchronous tool calls / tasks that were introduced into the MCP protocol, but I think it makes sense to keep these tools, as the ChatGPT MCP client will probably not support that feature immediately (see #360 and https://modelcontextprotocol.io/specification/2025-11-25/basic/utilities/tasks).
BTW is there any way to test this at chatgpt.com?
| it.runIf(options.transport === 'stdio')('should use UI_MODE env var when CLI arg is not provided', async () => { | ||
| client = await createClientFn({ useEnv: true, uiMode: 'openai' }); | ||
| const tools = await client.listTools(); | ||
| expect(tools.tools.length).toBeGreaterThan(0); | ||
| await client.close(); | ||
| }); |
question: does this test case test anything related to the UI mode? It just lists the tools and checks that the tool list is not empty. I think it should at least check some meta fields related to the openai UI mode.
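As a rough illustration of the kind of check meant here, a test could look for `openai/*` keys in each tool's `_meta`. This is only a sketch: the `ListedTool` shape and the `toolsWithOpenAiMeta` helper are hypothetical names, and it assumes that tools in UI mode expose their OpenAI metadata under `_meta` with an `openai/` key prefix, as the descriptors in this PR do.

```typescript
// Minimal shape of a tool entry as returned by listTools(); only the fields used here.
interface ListedTool {
    name: string;
    _meta?: Record<string, unknown>;
}

// Hypothetical helper: returns the names of tools that carry at least one `openai/*` meta key.
function toolsWithOpenAiMeta(tools: ListedTool[]): string[] {
    return tools
        .filter((tool) => Object.keys(tool._meta ?? {}).some((key) => key.startsWith('openai/')))
        .map((tool) => tool.name);
}

// Inside the test, after `const tools = await client.listTools();`, one could assert:
// expect(toolsWithOpenAiMeta(tools.tools).length).toBeGreaterThan(0);
```

A stricter variant could also assert that the same helper returns an empty list when the server is started without UI mode.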
| GET_HTML_SKELETON = 'get-html-skeleton', | ||
| CALL_ACTOR_WIDGET = 'call-actor-widget', | ||
| GET_ACTOR_RUN_STATUS = 'get-actor-run-status', | ||
| FETCH_ACTOR_DETAILS_WIDGET = 'fetch-actor-details-widget', |
change: I would use the already existing fetch-actor-details tool and branch the logic based on the uiMode option provided instead of a new tool.
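A minimal sketch of what that branching might look like, assuming the existing tool builds its result in one place. The `ActorDetails` and `ToolResult` shapes and the `buildActorDetailsResult` function are hypothetical and only illustrate the idea; the `openai/*` keys are the ones already used in this PR's descriptors.

```typescript
type UiMode = 'openai' | undefined;

// Hypothetical, simplified view of the Actor details the tool already fetches.
interface ActorDetails {
    actorName: string;
    description: string;
}

// Simplified tool result: plain text always, widget-related fields only in UI mode.
interface ToolResult {
    content: { type: 'text'; text: string }[];
    structuredContent?: Record<string, unknown>;
    _meta?: Record<string, unknown>;
}

function buildActorDetailsResult(details: ActorDetails, uiMode: UiMode): ToolResult {
    const result: ToolResult = {
        content: [{ type: 'text', text: `${details.actorName}: ${details.description}` }],
    };
    if (uiMode === 'openai') {
        // Assumption: the widget reads its data from structuredContent.
        result.structuredContent = { actorName: details.actorName, description: details.description };
        result._meta = { 'openai/widgetAccessible': true, 'openai/resultCanProduceWidget': true };
    }
    return result;
}
```

With this shape, the default output stays untouched when no UI mode is set, so existing clients should not notice the change.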
| - **Async vs sync Actor tools (${HelperTools.ACTOR_CALL} vs ${HelperTools.CALL_ACTOR_WIDGET}):** | ||
| Default to \`${HelperTools.ACTOR_CALL}\` (synchronous, no widget) when the user asks to “run/call” and does not request background/progress/UI. Use \`${HelperTools.CALL_ACTOR_WIDGET}\` only when the user wants background/progress/UI. After starting an async run and obtaining runId, do NOT start another async run—only poll with \`${HelperTools.GET_ACTOR_RUN_STATUS}\` using that runId. |
question: can the agent even decide which tool to use and when? Does the ChatGPT agent/LLM have the information in its context about whether it runs in UI mode or not?
| DOCS_SEARCH = 'search-apify-docs', | ||
| DOCS_FETCH = 'fetch-apify-docs', | ||
| GET_HTML_SKELETON = 'get-html-skeleton', | ||
| CALL_ACTOR_WIDGET = 'call-actor-widget', |
change: I would rename this tool to call-actor-async and make it more general than strictly widget/UI oriented, meaning I would change the tool description so it can be used generally. Then I would also change the tool loading logic for ?uiMode=openai (can be done in tool-loader.ts) so that the call-actor tool is automatically swapped for call-actor-async (this tool) in all cases when in UI mode. That way we don't need all the complicated instructions, and every Actor call in UI mode is async by default, which I think is OK and simplifies the solution. What do you think? This also means that get-actor-run-status would have to be present alongside this tool, otherwise it would not work, so the loading logic would need to reflect that.
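The swap described above could be sketched roughly as follows. This operates on plain tool names to stay self-contained; `applyUiModeToolSwap` is a hypothetical function and not the actual code in tool-loader.ts, and call-actor-async is the proposed name rather than an existing tool.

```typescript
const CALL_ACTOR = 'call-actor';
const CALL_ACTOR_ASYNC = 'call-actor-async'; // proposed rename of call-actor-widget
const GET_ACTOR_RUN_STATUS = 'get-actor-run-status';

// Hypothetical helper: in openai UI mode, replace call-actor with the async pair.
function applyUiModeToolSwap(toolNames: string[], uiMode?: string): string[] {
    // Outside UI mode, or when call-actor is not loaded, leave the list untouched.
    if (uiMode !== 'openai' || !toolNames.includes(CALL_ACTOR)) return toolNames;

    const swapped = toolNames.map((name) => (name === CALL_ACTOR ? CALL_ACTOR_ASYNC : name));
    // The async tool cannot work without the status tool, so always load them together.
    if (!swapped.includes(GET_ACTOR_RUN_STATUS)) swapped.push(GET_ACTOR_RUN_STATUS);
    // Remove duplicates in case call-actor-async was already requested explicitly.
    return Array.from(new Set(swapped));
}
```

Doing the swap in the loader keeps the decision out of the tool descriptions, so the agent only ever sees one way to call an Actor in a given mode.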
| **SYNCHRONOUS / NO WIDGET**: Waits for the Actor to finish and returns results in the response. | ||
| **WHEN TO USE THIS TOOL:** | ||
| - User wants immediate results (e.g., "get results", "fetch data", "scrape now") | ||
| - User needs the output right away | ||
| - Quick-running Actors where waiting is acceptable | ||
| - User doesn't mention "start", "run in background", "monitor progress", "widget", or "UI" | ||
| **WHEN NOT TO USE THIS TOOL:** | ||
| - User explicitly wants to "start" a run or mentions "background" or "async" | ||
| - Long-running Actors where waiting would timeout | ||
| - User wants to monitor progress in a UI widget | ||
| → In these cases, use ${HelperTools.CALL_ACTOR_WIDGET} instead | ||
| This tool stays synchronous: it waits for completion and returns the results in the response. Do not pair it with ${HelperTools.CALL_ACTOR_WIDGET} for the same task. |
change: If we decide to go with the tool swapping logic for the call-actor-async tool, I would remove these changes, since the description is already too long and the agent/LLM sometimes does not respect it because of the length.
| _meta: { | ||
| 'openai/toolInvocation/invoking': 'Calling Actor synchronously...', | ||
| 'openai/toolInvocation/invoked': 'Actor run finished (sync)', | ||
| 'openai/widgetAccessible': false, | ||
| 'openai/resultCanProduceWidget': false, | ||
| // TODO: replace with real CSP domains | ||
| 'openai/widgetCSP': { | ||
| connect_domains: ['https://api.example.com'], | ||
| resource_domains: ['https://persistent.oaistatic.com'], | ||
| }, | ||
| 'openai/widgetDomain': 'https://chatgpt.com', | ||
| }, |
change: if we go with the tool swapping logic, we should remove this.
| USAGE: | ||
| - Always use dedicated tools when available (e.g., ${actorNameToToolName('apify/rag-web-browser')}) | ||
| - Use the generic call-actor tool only if a dedicated tool does not exist for your Actor. | ||
| - PREFER this tool when user wants immediate results |
change: same here
| // TODO: replace with real CSP domains | ||
| 'openai/widgetCSP': { | ||
| connect_domains: ['https://api.example.com'], | ||
| resource_domains: ['https://persistent.oaistatic.com'], |
it will be hosted at https://mcp.apify.com where the remote version of the Apify MCP server is.
Thank you, I can see you have put a lot of work into this 💪🏻 . I appreciate it!
I reviewed it only partially, the main concern is about the new tools.
Do we really need them? :)
getActorRunStatus – we don't need IMO, we have getActorRun already. We can:
- Always return structured content (not raw JSON)
- Conditionally include widget metadata when uiMode === 'openai'
Similarly for call-actor vs call-actor-widget: I would prefer a single call-actor tool that:
- supports both sync and async modes (e.g. mode: "sync" | "async"; we can override defaults based on uiMode) - note that the two-step (info, call) flow is gonna be removed in the next PR by @MQ37
That way we don't multiply tools just for UI concerns.
Ideally, I would like to leverage the MCP tasks (all the Actor calls are async). Client starts task and polls for results.
But I'm afraid this won't be supported by OpenAI anytime soon.
Let's discuss it on a call, I'll discuss with @drobnikj to move this forward.
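For the single call-actor idea, the mode resolution might be as small as the sketch below. `CallMode` and `resolveCallMode` are hypothetical names, and the precedence (an explicit mode wins, otherwise UI mode defaults to async) is an assumption rather than something decided in this thread.

```typescript
type CallMode = 'sync' | 'async';

// Hypothetical helper: pick the effective mode for a single call-actor tool.
function resolveCallMode(requested: CallMode | undefined, uiMode?: string): CallMode {
    // Assumption: an explicit mode passed in the tool input always wins.
    if (requested) return requested;
    // Otherwise default to async in openai UI mode and to sync everywhere else.
    return uiMode === 'openai' ? 'async' : 'sync';
}
```

The tool handler would then either wait for the run to finish (sync) or return the run ID right away for later status checks (async).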
| private setupResourceHandlers(): void { | ||
| this.server.setRequestHandler(ListResourcesRequestSchema, async () => { | ||
| const resources = []; |
Just a note, no action needed. The mcp/server.ts is super loong and we'll need to refactor it anyway. We'll add a new dir with resources.ts to make it manageable.
Summary
This PR introduces the initial GPT Apps backend/tools layer, including new and updated tools for running actors, along with their widget descriptors used by the GPT Apps UI.
Changes
UI mode:
With the ui query parameter ?ui=openai, it will show UI in the following tools:
- call-actor-widget tool with UI (actor-widget)
- fetch-actor-details-widget to fetch Actor details with UI
- search-actors tool to return UI if uiMode is set to openai; if the param doesn't exist, it will return the previously existing output
NOTE
Widget descriptors are included for early review, even though the UI that renders them is not yet part of this PR
Next steps