feat: add programmatic_tool_caller vended tool#1092
Conversation
TypeScript port of the Python programmatic_tool_caller from strands-agents/tools (PR strands-agents#387). Lets the model write a single block of JavaScript that calls other tools as async functions, eliminating the per-tool-call LLM round-trip for loops and parallel fan-outs.
- New tool at strands-ts/src/vended-tools/programmatic-tool-caller/
- Uses new AsyncFunction(...) (no eval, no vm; dependency-free)
- Each registered tool is exposed as an async function (hyphen->underscore alias mirroring ToolRegistry.resolve); calls go through agent.tool.X.invoke(input, { recordDirectToolCall: false }) so inner calls don't pollute message history (uses the sdk-typescript#985 primitive)
- Auto-unwraps ToolResultBlock (matches Python _execute_tool)
- Captured console.{log,info,warn,error,...} output is returned as tool text; real stdout/stderr are never written to (test isolation)
- Env-var configurable: PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS, PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES (Node built-ins allow-list)
- console plus injected module names are reserved; clear error on tool name collision
- 21 unit tests covering the parity matrix in the README
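To make the mechanism in the commit message above concrete, here is a minimal sketch of running a code string with tools bound as async functions and console output captured. It is not the PR's implementation: `ToolFn`, `runUserCode`, and `fakeConsole` are hypothetical names, tools are plain functions rather than `agent.tool.X.invoke(...)` calls, and `JSON.stringify` stands in for `util.inspect`.

```typescript
// Hypothetical stand-in for a registered tool: an async function of one input.
type ToolFn = (input: unknown) => Promise<unknown>

type AsyncFunctionConstructor = new (...args: string[]) => (...args: unknown[]) => Promise<void>
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as AsyncFunctionConstructor

// Runs `code` with every tool bound as an async function and `console`
// replaced by a capturing stub. Only the captured lines are returned.
async function runUserCode(code: string, tools: Record<string, ToolFn>): Promise<string> {
  const lines: string[] = []
  const capture = (...args: unknown[]): void => {
    // Assumption: JSON.stringify stands in for util.inspect to stay dependency-free.
    lines.push(args.map((a) => (typeof a === 'string' ? a : JSON.stringify(a))).join(' '))
  }
  const fakeConsole = { log: capture, info: capture, warn: capture, error: capture }

  // Hyphens are not valid in identifiers, so `my-tool` is bound as `my_tool`.
  const names = Object.keys(tools).map((name) => name.replace(/-/g, '_'))

  // Parameter names come first; the last argument is the function body.
  const userFn = new AsyncFunction('console', ...names, code)
  await userFn(fakeConsole, ...Object.values(tools))
  return lines.length > 0 ? lines.join('\n') : '(no output)'
}
```

A script such as `const rs = await Promise.all([1, 2, 3].map((n) => double_it({ n })))` followed by `console.log(rs.join(','))` would fan out three tool calls inside one execution; only the logged line comes back to the caller.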
|
Assessment: Comment Well-structured port of the Python Review Categories
The implementation is solid and the parity table with Python is helpful for maintenance. |
Iterating on review comments from strands-agents#1092:
1. Rename test file to programmatic-tool-caller.test.node.ts
   - Was failing on all CI test matrix entries (Node 20/22/24 x ubuntu/macOS/windows)
   - Browser test env can't load fs/path/util; the Node-only file needs the .node.ts suffix per dev-docs/TESTING.md and the bash/file-editor pattern
2. Switch env-var management to vi.stubEnv()
   - vitest config has unstubEnvs: true, removing the need for manual beforeEach/afterEach restore loops
3. Tighten AsyncFunction typing
   - Replace 'any' + eslint-disable with a proper AsyncFunctionConstructor type, which removes the redundant cast on userFn assignment
4. Replace 'as unknown as Agent' double cast with a runtime type guard
   - hasToolProxy() narrows context.agent to Agent in one place, visible to static analysis if Agent ever changes shape
5. Drop _internals export and unexport executeProgrammaticCode
   - Both were dead public API: tests exercise the tool only through the programmaticToolCaller export, not via _internals or the standalone fn
No behaviour changes. All 21 PTC tests still pass; type-check, lint, format clean. The 4 telemetry/config test failures observed in CI are pre-existing on upstream/main (verified by checking out fda9850 and 67ac970).
Iterated on review feedback (commit
|
| # | Path | Feedback | Fix |
|---|---|---|---|
| 1 | __tests__/programmatic-tool-caller.test.ts | Should be .test.node.ts | Renamed (also fixes CI failure on all 9 matrix entries) |
| 2 | __tests__/...test.node.ts:44 | Use vi.stubEnv() instead of manual env save/restore | Refactored, dropped ENV_KEYS + afterEach (-14 lines) |
| 3 | programmatic-tool-caller.ts (_internals) | Dead code | Removed |
| 4 | programmatic-tool-caller.ts:307 (AsyncFunction typing) | any + eslint-disable can be tightened | Replaced with typed AsyncFunctionConstructor |
| 5 | programmatic-tool-caller.ts:398 (as unknown as Agent) | Use type guard | Added hasToolProxy(agent): agent is Agent |
| 6 | index.ts:9 (executeProgrammaticCode) | Doesn't need public API | Unexported (module-internal) |
Detailed per-comment notes
1. Test file rename — discussion_r3289677850
Root cause of all 9 call-ts-test matrix failures. Browser project (vitest.config.ts lines 51-57) included *.test.ts and the file pulls in Node-only APIs (util, dynamic import('fs')/'path'/'os'). The .test.node.ts suffix excludes it from the browser project per the bash/file-editor pattern. Verified locally: npx vitest run --project unit-browser src/vended-tools/programmatic-tool-caller/ now matches 0 files; node project runs all 21 in 22ms.
2. vi.stubEnv() — discussion_r3289678059
Switched both the inline assignments (process.env.X = '...' → vi.stubEnv('X', '...')) and the manual beforeEach/afterEach save-restore loop. The beforeEach now just zero-stubs both env vars so values from a previous unit-node run can't leak in:
    beforeEach(() => {
      vi.stubEnv('PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS', '')
      vi.stubEnv('PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES', '')
    })

unstubEnvs: true handles cleanup.
3. _internals removed — discussion_r3289678272
Confirmed no callers via grep -rn '_internals' strands-ts/src/.
4. AsyncFunction typing — discussion_r3289678426
Used your suggested signature with Promise<void> instead of Promise<unknown> (the wrapped user code's return value is intentionally discarded — the tool surfaces output via captured console):
    type AsyncFunctionConstructor = new (...args: string[]) => (...args: unknown[]) => Promise<void>
    const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor as AsyncFunctionConstructor

Dropped both the eslint-disable @typescript-eslint/no-explicit-any comment and the as (...args: unknown[]) => Promise<void> cast on userFn.
5. Type guard — discussion_r3289678635
Added hasToolProxy predicate above programmaticToolCaller:
    function hasToolProxy(agent: unknown): agent is Agent {
      return (
        typeof agent === 'object' &&
        agent !== null &&
        'tool' in agent &&
        typeof (agent as Agent).tool === 'object' &&
        (agent as Agent).tool !== null
      )
    }

The callback narrows once and uses context.agent directly with no cast.
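As a usage sketch of the guard, assuming a stand-in `Agent` interface (the SDK's real type is richer) and a hypothetical `listToolNames` helper, narrowing once lets the rest of the callback use `context.agent` without a cast:

```typescript
// Stand-in for the SDK's Agent type; only the `tool` proxy matters here.
interface Agent {
  tool: Record<string, { invoke: (input: unknown) => Promise<unknown> }>
}

function hasToolProxy(agent: unknown): agent is Agent {
  return (
    typeof agent === 'object' &&
    agent !== null &&
    'tool' in agent &&
    typeof (agent as Agent).tool === 'object' &&
    (agent as Agent).tool !== null
  )
}

// Hypothetical callback body: narrow once, then use the agent without a cast.
function listToolNames(context: { agent: unknown }): string[] {
  if (!hasToolProxy(context.agent)) {
    throw new Error('programmatic_tool_caller requires an agent with a tool proxy')
  }
  return Object.keys(context.agent.tool)
}
```

The guard only checks that a non-null `tool` object exists; it does not validate the shape of individual tool entries.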
6. executeProgrammaticCode unexported — discussion_r3289678866
Now a regular module-internal async function. index.ts only exports programmaticToolCaller, the ProgrammaticToolCallerInput type, and ALLOWED_EXTRA_MODULES / RESERVED_NAMESPACE_NAMES.
CI verification
Local verification:
✓ unit-node src/vended-tools/programmatic-tool-caller/ 21 tests passed (22ms)
✓ unit-browser src/vended-tools/programmatic-tool-caller/ 0 files matched (correctly skipped — fix CI failure)
✓ type-check no errors
✓ lint clean
✓ format clean
The 4 unrelated telemetry/__tests__/config.test* failures one might see in npm run test:coverage are pre-existing on upstream/main — verified by checking out fda9850 (PR base) and 67ac970 (current upstream HEAD) and reproducing the same 4 failures with no PTC code in tree. Not part of this PR.
Net diff
3 files changed, 40 insertions(+), 46 deletions(-)
.../__tests__/programmatic-tool-caller.test.ts
→ .../__tests__/programmatic-tool-caller.test.node.ts (93% similarity, -14 lines)
index.ts -1 export (executeProgrammaticCode)
programmatic-tool-caller.ts -23 / +27
Ready for re-review. Still draft — mark ready when you're happy.
|
Assessment: Request Changes All prior review feedback has been cleanly addressed in Details
Clean implementation overall — the iteration addressed all previous concerns well. |
Address Strands Review feedback on PR strands-agents#1092: `fs/promises` in `ALLOWED_EXTRA_MODULES` would crash at runtime because `/` is not valid in a JavaScript identifier, so `new AsyncFunction("fs/promises", code)` raises SyntaxError: Arg string terminates parameters early.
Fix: introduce `moduleNameToIdentifier()` that replaces every non-identifier char with `_` (e.g. fs/promises -> fs_promises). Apply it in `loadExtraModules` before storing the loaded module so the namespace map keys are always valid JS identifiers. Reserved-name detection in `buildNamespace` already operates on Object.keys of the modules record, so it picks up the normalized name automatically.
- Add unit test covering PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES=fs/promises
- Update README to document normalized binding (fs/promises -> fs_promises)
- All 22 PTC unit tests pass; full npm run check clean (2790 tests passed, no type errors).
Iterated on Strands Review feedback (commit
|
| # | Path | Feedback | Fix |
|---|---|---|---|
| 7 | types.ts:27 | fs/promises will crash new AsyncFunction('fs/promises', ...) (slash isn't a valid identifier char) | Added moduleNameToIdentifier(name) helper, applied in loadExtraModules before storing in record |
| 8 | programmatic-tool-caller.ts:75 | Same root cause via Object.keys(extraModules) → AsyncFunction params | Same fix: Object.keys(extraModules) now returns already-normalized keys, so no change needed in buildNamespace |
Implementation notes
The fix (programmatic-tool-caller.ts):
    function moduleNameToIdentifier(name: string): string {
      return name.replace(/[^A-Za-z0-9_$]/g, '_')
    }

Applied in loadExtraModules:

    const identifier = moduleNameToIdentifier(name)
    result[identifier] = 'default' in mod && mod.default !== undefined ? mod.default : mod

Reserved-name detection in buildNamespace already does for (const name of Object.keys(extraModules)) reservedNames.add(name); since the keys are now normalized, the reserved set picks them up automatically. No collision-detection changes needed.
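The failure and the fix can be reproduced in isolation. The sketch below reuses `moduleNameToIdentifier` from the notes above; `canBindAs` is a hypothetical probe that uses the plain `Function` constructor, which parses its parameter list the same way for this case.

```typescript
// Same normalization as in the PR: every non-identifier character becomes '_'.
function moduleNameToIdentifier(name: string): string {
  return name.replace(/[^A-Za-z0-9_$]/g, '_')
}

// Hypothetical probe: can `name` be used as a parameter of a generated function?
function canBindAs(name: string): boolean {
  try {
    new Function(name, 'return 0')
    return true
  } catch {
    return false
  }
}

const rawModule = 'fs/promises'
const rawWorks = canBindAs(rawModule) // false: '/' is not valid in a parameter name
const normalized = moduleNameToIdentifier(rawModule) // 'fs_promises'
const normalizedWorks = canBindAs(normalized) // true
```

Two properties of this mapping are worth keeping in mind: it is not injective (`fs/promises` and `fs_promises` map to the same identifier), and it leaves a leading digit in place, so it is only sufficient for names like those on the allow-list.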
New test in programmatic-tool-caller.test.node.ts:
    it('normalizes module names with non-identifier chars (fs/promises -> fs_promises)', async () => {
      vi.stubEnv('PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES', 'fs/promises')
      const agent = makeAgent()
      const result = await runCode(agent, `
        console.log(typeof fs_promises)
        console.log(typeof fs_promises.readFile)
      `)
      expect(result.status).toBe('success')
      expect(getText(result)).toBe(['object', 'function'].join('\n'))
    })

README updated to document the normalized binding name.
Pre-PR review (this time, properly)
@mkmeral called this out — I should have been catching the fs/promises bug myself before pushing. The original allow-list said modules were "exposed under their original name" and I never traced what new AsyncFunction('fs/promises', ...) would actually do. A 30-second mental walk-through (or just running the tool with PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES=fs/promises once) would have caught it.
Going forward I'll explicitly self-review parity ports of this kind: for each allow-listed value, mentally trace it through every consumer, and exercise any value that crosses an obvious lexical boundary (slashes, dots, hyphens) before pushing.
Local readiness verification
✓ unit-node src/vended-tools/programmatic-tool-caller/ 22 tests passed (was 21)
✓ type-check no errors
✓ lint clean
✓ format clean
✓ npm run check 2790/2790 pass (with OTEL_RESOURCE_ATTRIBUTES unset — the agent runner injects `service.name=strands-coder` which leaks into the test process and fails 2 unrelated `telemetry/__tests__/config.test.ts` cases. Not my code, not on CI.)
All 8 review threads marked resolved.
Net diff: 3 files / +40 / -6.
🤖 AI agent response. Strands Agents. Feedback welcome!
Fresh-context pre-PR review (iteration 1) findings + fixes:
- README inaccuracies (HIGH): said EXTRA_MODULES restricts user code's network/fs access (it does NOT; it only pre-binds names), and claimed real stdout/stderr are 'never written to' (user code can bypass via globalThis.console / process.stdout). Both corrected.
- util.inspect poisoning (MEDIUM): when user code enables 'util' as an EXTRA_MODULE and reassigns util.inspect, the capture console for the rest of the process was using the live (poisoned) export. Snapshot inspect at module load.
- Reserved-word / leading-digit tool names (MEDIUM): registry permits names like 'return' or '1foo', but new AsyncFunction(name, body) rejects them, throwing SyntaxError on EVERY user execution (DoSes the whole tool). Skip such tools with a logger.warn so unrelated tools stay callable.
- README documents the new skip behaviour and the best-effort capture caveat; mentions globalThis / process.stdout escape hatches.
- Tests: +4 cases covering reserved-word skip, leading-digit skip, util.inspect mutation immunity, and post-return async write drop.
npm run check is 100% clean (2794 passed) when run with leaked OTEL_* vars unset per AGENTS hint.
Refs: agent-of-mkmeral/strands-coder-private#222
PR: strands-agents#1092
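One way to implement the skip described above is to test each tool name before it is used as a parameter. This is a sketch under assumptions, not the PR's code: `isBindableToolName` and `partitionToolNames` are hypothetical names, the regex is ASCII-only, hyphen aliasing is assumed to have happened already, and the caller is expected to log the skipped names.

```typescript
// ASCII-only identifier shape (an assumption; JS also allows Unicode letters).
const IDENTIFIER_SHAPE = /^[A-Za-z_$][A-Za-z0-9_$]*$/

const AsyncFn = Object.getPrototypeOf(async function () {}).constructor as new (...args: string[]) => unknown

function isBindableToolName(name: string): boolean {
  if (!IDENTIFIER_SHAPE.test(name)) return false // e.g. '1foo'
  try {
    // Reserved words such as 'return' pass the regex but throw SyntaxError here.
    new AsyncFn(name, '')
    return true
  } catch {
    return false
  }
}

// Splits names into those that can be bound and those that must be skipped,
// so one bad name no longer breaks every execution.
function partitionToolNames(names: string[]): { bound: string[]; skipped: string[] } {
  const bound: string[] = []
  const skipped: string[] = []
  for (const name of names) {
    if (isBindableToolName(name)) {
      bound.push(name)
    } else {
      skipped.push(name)
    }
  }
  return { bound, skipped }
}
```

Probing with the constructor itself avoids maintaining a hand-written reserved-word list, at the cost of one throwaway function per name.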
|
Assessment: Approve All prior feedback (8 threads) has been addressed cleanly. The What was verified
Well-executed port with good defensive coding against edge cases. |
    function getAllowedTools(allTools: Tool[]): Set<string> {
      const registered = new Set(allTools.map((t) => t.name).filter((n) => n !== 'programmatic_tool_caller'))

      const envAllowed = (process.env.PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS ?? '').trim()
Does this (using process.env) work in browsers? Is this the best pattern to use? How do we configure other vended tools?
    // not have a vended-tool-level interactive prompt helper (see bash.ts;
    // it also runs without user confirmation), so we follow the same
    // permissive behaviour and simply log a single warning line.
    const bypassConsent = (process.env.BYPASS_TOOL_CONSENT ?? '').toLowerCase() === 'true'
Get rid of BYPASS_TOOL_CONSENT. We don't have that in TS vended tools; instead we use the HITL intervention handler, so this is not necessary.
    export const programmaticToolCaller = tool({
      name: 'programmatic_tool_caller',
      description:
        "Execute JavaScript code with access to the agent's other tools as async functions. " +
Add more description for usage, how output is captured, and so on. Give examples. Check out the tool descriptions from the strands-agents/tools package.
…ion, MCP tests
Address mkmeral's review threads + pre-merge should-fixes:
- Browser-safe configuration (review L120): introduce createProgrammaticToolCaller(config) with allowedTools / extraModules options. process.env reads are now guarded behind typeof process checks and used only as a Node-only fallback, so the tool no longer throws in browser bundles. Matches how other vended tools take params (not env). Precedence: config object > env var > default.
- Remove BYPASS_TOOL_CONSENT (review L372): the TS SDK gates tools via InterventionHandler (beforeToolCall -> confirm/deny), not per-tool env flags. Dropped the warn/bypass branch; documented the HITL pattern.
- Richer tool description (review L502): expanded the model-facing description (calling convention, output capture, sequential/parallel/loop examples) modeled on the Python tool docstring.
- Add committed MCP integration tests (pre-merge should-fix): McpTool-backed cases for text-result unwrap, isError propagation into user try/catch, and raw-content for mixed results. Plus factory-config tests.
34 unit-node tests pass; type-check, eslint, prettier, browser bundle all clean.
Addressed @mkmeral's review (commit
|
|
/strands review the latest changes in |
🤖 AI Review — commit
|
| Check | Command | Result |
|---|---|---|
| Unit tests | npx vitest run --project unit-node src/vended-tools/programmatic-tool-caller/ | ✅ 34 passed, 0 type errors |
| Typecheck | npx tsc --noEmit --project src/tsconfig.json | ✅ clean |
| Lint | npx eslint src/vended-tools/programmatic-tool-caller/ | ✅ clean |
| Format | npx prettier --check src/vended-tools/programmatic-tool-caller/ | ✅ clean |
| Browser bundle | npx esbuild src/index.ts --bundle --platform=browser --format=esm --packages=external | ✅ 389kb, done |
| Browser bundle (PTC entry) | esbuild .../programmatic-tool-caller/index.ts --platform=browser --packages=external | ✅ 35.1kb, done |
Targeted scrutiny
(a) Browser-safety — ✅
The process reference is correctly guarded: the only runtime read is typeof process !== 'undefined' ? process.env : undefined in readEnv. grep over the package shows no other unguarded process use (remaining hits are doc comments). The documented --packages=external bundle truly succeeds.
One nuance worth recording: the module statically imports import { inspect as utilInspect } from 'util'. With everything externalized this is a non-issue (and is the documented build mode). If someone fully resolves the bundle without --packages=external, esbuild errors on util — but that's consistent with existing repo convention (e.g. vended-plugins/skills/* statically import fs/path), and the README is explicit that env/Node-only behavior degrades gracefully and you should pass config for browser targets. No change required.
(b) config > env > default precedence + allowedTools: [] edge — ✅ (verified), test gap noted
Logic is correct: config.allowedTools ?? parseCsvEnv(...) means an explicit [] is honored (nullish coalescing, not ||), so an empty allow-list yields an empty exposed set. I confirmed this empirically by adding a throwaway test: createProgrammaticToolCaller({ allowedTools: [] }) → calling any tool fails with ... is not defined, i.e. zero tools exposed, while an unrelated tool registry stays intact. ✅
Nit 1 (non-blocking): this exact []-means-nothing edge is the one behavior not covered by a committed test. The factory suite covers ['allowed_only'], env-override, extraModules, and empty-config-exposes-all — but not allowedTools: []. Since the whole point of the ??-over-|| choice is this edge, a one-line test would lock it against future regression.
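The precedence and the `[]` edge from (b) can be sketched as follows. This is an illustration under assumptions: the `parseCsvEnv` signature, `resolveAllowedTools`, and the `Env` type are hypothetical, and the environment is passed in explicitly (in the PR it is read behind a `typeof process !== 'undefined'` guard) so the sketch behaves the same in Node and in a browser.

```typescript
type Env = Record<string, string | undefined>

// Returns undefined when the variable is unset or blank, so `??` can fall through.
function parseCsvEnv(env: Env | undefined, key: string): string[] | undefined {
  const raw = (env?.[key] ?? '').trim()
  if (raw === '') return undefined
  return raw
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part !== '')
}

// Precedence: config object > env var > default (all registered tools).
function resolveAllowedTools(
  registered: string[],
  config: { allowedTools?: string[] },
  env: Env | undefined,
): Set<string> {
  // `??` rather than `||`: an explicit empty array is honored, not skipped.
  const requested = config.allowedTools ?? parseCsvEnv(env, 'PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS')
  const candidates = registered.filter((name) => name !== 'programmatic_tool_caller')
  if (requested === undefined) return new Set(candidates)
  const wanted = new Set(requested)
  return new Set(candidates.filter((name) => wanted.has(name)))
}
```

With `||` in place of `??`, `allowedTools: []` would be indistinguishable from "not configured" only if empty arrays were falsy; they are not in JavaScript, but the nullish form states the intent directly and also keeps other falsy-looking config values from being skipped.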
(c) Public API cleanliness — ✅
index.ts exports exactly programmaticToolCaller, createProgrammaticToolCaller, the ProgrammaticToolCallerInput / ProgrammaticToolCallerConfig types, and the ALLOWED_EXTRA_MODULES / RESERVED_NAMESPACE_NAMES constants. No dead exports; the package is wired as a subpath export in package.json consistent with the other vended tools.
Nit 2 (cosmetic): the PROGRAMMATIC_TOOL_CALLER_DESCRIPTION doc comment says it is "Kept as a module constant so it can be unit-tested and reused," but it is neither exported nor directly unit-tested. Either drop that clause or add a trivial assertion. Harmless.
(d) MCP test depth — ✅
The MCP tests mock at the right seam: they fake only McpClient.callTool and construct a real McpTool. That means the actual McpTool.stream path runs — _isMcpToolResult validation, the content[] → ToolResultBlock mapping (_mapMcpContent), and the isError → error-status translation are all exercised, not bypassed. The three cases (text unwrap, isError → user try/catch, and mixed text+json returned as raw block array via JsonBlock) genuinely cover PTC's unwrapToolResult against MCP-shaped results. Good level of integration without requiring a server.
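The three behaviours those tests cover can be sketched with simplified result types. The block and result shapes here are stand-ins (the SDK uses `TextBlock` / `JsonBlock` classes and `instanceof` checks), and joining text with a newline is an assumption.

```typescript
// Simplified stand-ins for the SDK's block classes.
type TextBlock = { type: 'text'; text: string }
type JsonBlock = { type: 'json'; json: unknown }
type Block = TextBlock | JsonBlock

interface ToolResult {
  status: 'success' | 'error'
  content: Block[]
}

const isText = (block: Block): block is TextBlock => block.type === 'text'

function unwrapToolResult(result: ToolResult): string | Block[] {
  const texts = result.content.filter(isText).map((block) => block.text)
  if (result.status === 'error') {
    // Surfaces as a rejected promise, so user code can use try/catch.
    throw new Error(texts.join('\n') || 'tool call failed')
  }
  // All-text results collapse to one string; mixed results stay as raw blocks.
  return texts.length === result.content.length ? texts.join('\n') : result.content
}
```

Returning the raw block array for mixed content keeps structured data intact instead of forcing it through a lossy string conversion.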
(e) Type holes / unsafe casts — ✅
No any, no double-casts. The hasToolProxy runtime guard replaces the previous unchecked cast for narrowing LocalAgent → Agent (nice). Remaining as uses are localized and sound: await import(name) as {default?...} (dynamic import is inherently any-ish), content as TextBlock[] guarded by a preceding every(b => b instanceof TextBlock), and result.content[0] as TextBlock on the success path where a TextBlock is always constructed. The AsyncFunction constructor retrieval via Object.getPrototypeOf(async function(){}).constructor is the canonical pattern and is typed explicitly.
Also confirmed
- BYPASS_TOOL_CONSENT fully removed; README documents the InterventionHandler.beforeToolCall HITL pattern with a working example. Correct for the TS SDK.
- util.inspect snapshot-at-load defends the capture console against runtime mutation when util is exposed as an extra module, and there's a dedicated test proving the poison doesn't leak across runs. Nice touch.
- Reserved-name conflict detection, JS-reserved-word / bad-identifier tool skipping, and fs/promises → fs_promises normalization are all covered.
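The snapshot-at-load defence reduces to holding a reference that user code cannot reassign. In this sketch `sharedUtil` is a hypothetical stand-in for a mutable module object such as Node's `util`, with `JSON.stringify` in place of the real `inspect`.

```typescript
// Stand-in for a mutable module object that is also handed to user code.
const sharedUtil = { inspect: (value: unknown): string => JSON.stringify(value) }

// Snapshot taken once at load time; later reassignment cannot reach it.
const inspectSnapshot = sharedUtil.inspect

// Looks the function up on every call, so it sees reassignments.
const formatLive = (value: unknown): string => sharedUtil.inspect(value)
// Uses the load-time snapshot, so it does not.
const formatSafe = (value: unknown): string => inspectSnapshot(value)

// Simulated user code that reassigns the export.
sharedUtil.inspect = () => 'poisoned'

const liveOutput = formatLive({ a: 1 }) // 'poisoned'
const safeOutput = formatSafe({ a: 1 }) // '{"a":1}'
```

The snapshot protects only against reassignment of the property; it does not help if the captured function itself depends on other mutable state.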
Summary
Nothing blocking. The two nits (add an allowedTools: [] test; trim/realize the "unit-tested" comment on the description constant) are polish. Code quality, browser-safety guarding, precedence semantics, MCP integration depth, and type safety all check out.
— Posted by an automated AI review agent (Strands). I did not merge or push any code; verification was run locally against PR head 136bf32.
PR loop complete: marked ready for review ✅
Ran a full review cycle on the review-feedback changes:
Commits:
Independent AI review agent verdict: ✅ Approve (the two nits it raised are both now fixed by the two follow-up commits). It cloned the PR head, ran the full matrix locally, and specifically validated browser-safety (no unguarded Final local verification ( @mkmeral the three threads you opened ( 🤖 AI agent response. Strands Agents. Feedback welcome! |
|
This repository has been merged into the strands-agents/harness-sdk monorepo and will be archived shortly. All new development happens there. If this PR is still relevant, please recreate it against the monorepo. The code now lives under Apologies for the disruption, and thank you for contributing! |
Summary
Adds a TypeScript port of the Python `programmatic_tool_caller` tool (strands-agents/tools#387). Lets agents write JS code that calls other tools as `await toolName(input)`, reducing API round-trips and enabling loops/parallelism/chaining without polluting the conversation context.
Built on top of the direct tool calling primitive from #985: each in-script tool call goes through `agent.tool.<name>.invoke(input, { recordDirectToolCall: false })` so inner calls don't mutate `agent.messages`.
cc @mkmeral
Usage
Only `console.log/error/warn/info/debug/trace/dir` output is returned to the agent. Tool results stay in the script's local scope unless explicitly logged, the same model as the Python tool's `print()` semantics.
Parity with Python
`programmatic_tool_caller`
- `async def __user_code__():` + `asyncio.run` / `new AsyncFunction(...names, code)`
- `await tool_name(**kwargs)` / `await toolName(input)`
- `print()` returned / `console.{log,error,warn,info,debug,trace,dir}` returned
- `record_direct_tool_call=False` / `recordDirectToolCall: false` (per #985)
- `BYPASS_TOOL_CONSENT`
- `PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS`
- `PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES` (Python stdlib) / `fs`, `fs/promises`, `path`, `crypto`, `url`, `util`, `querystring`, `os`, `buffer`, `stream`, `events`)
- (`my_tool` ↔ `my-tool`) / (`ToolRegistry.resolve`)
- `console` + injected modules reserved
- `BaseException` (`SystemExit`/`KeyboardInterrupt`) caught / `status: 'error'`
- `ToolResultBlock` (`.content[].text` joined)
Implementation notes
- No `eval`, no `vm`: uses `new (Object.getPrototypeOf(async function(){}).constructor)(...names, code)` (the `AsyncFunction` constructor) so user code runs in an isolated function scope. No new dependencies.
- `process.stdout/stderr` are never written to during user-code execution. Non-string args coerced via `util.inspect`.
- `bash` is the same). Tool defaults to execute and emits a `logger.warn` code preview when `BYPASS_TOOL_CONSENT !== 'true'`. Documented in the README.
- `programmatic_tool_caller` and `python_repl`. No sandbox: trust required, just like the Python equivalent.
Files
Total: +1053 lines, 7 files.
Tests (21)
- `console.log(1+1)`, `(no output)` empty case, all 4 console levels, `util.inspect` coercion
- `await tool` text + `try/catch` of error result + `Promise.all` parallel + `for` loop
- `console`; extra modules allow-list good + skipped
- `recordDirectToolCall: false` does NOT mutate `agent.messages` (outer + inner); outer recording adds exactly 3 messages even with N inner tool calls
Quality gate
`npm run check` from repo root: lint + format + type-check + browser-bundle + 2789 unit tests + package smoke all pass with a clean env.
Tracking
Follows up agent-of-mkmeral/strands-coder-private#222.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.