Migrate to Vercel AI SDK and enable experimental multi-step tool execution #28

Merged
Kulikowski merged 15 commits into main from evals-multiple-expected-calls
Feb 25, 2026

@Kulikowski
Contributor

@Kulikowski Kulikowski commented Feb 25, 2026

Replaced direct API bindings with the AI SDK by Vercel, unlocking native structured tool-calling support.

Updated the evaluation loop and JSON format to support sequentially validating arrays of expectedCalls.
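
As a rough illustration of what sequentially validating an array of expectedCalls could look like, here is a minimal sketch. The `ToolCall` and `ValidationResult` shapes and the `validateCalls` helper are hypothetical names made up for this example; the PR description does not show the real JSON format or the evaluator's actual code, and nothing here depends on the AI SDK.

```typescript
// Hypothetical shape of one tool call (assumption, not the PR's real format).
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical result type: where the first mismatch happened, if any.
interface ValidationResult {
  passed: boolean;
  failedAt: number | null;
  reason: string | null;
}

// Deep equality for JSON-like values: object key order is ignored,
// array element order matters.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const objA = a as Record<string, unknown>;
  const objB = b as Record<string, unknown>;
  const keysA = Object.keys(objA);
  if (keysA.length !== Object.keys(objB).length) return false;
  return keysA.every(
    (k) => Object.prototype.hasOwnProperty.call(objB, k) && deepEqual(objA[k], objB[k]),
  );
}

// Compare the calls the model made against the expected calls, step by step,
// and stop at the first step that does not match.
function validateCalls(expected: ToolCall[], actual: ToolCall[]): ValidationResult {
  for (let i = 0; i < expected.length; i++) {
    const want = expected[i];
    const got = actual[i];
    if (got === undefined) {
      return { passed: false, failedAt: i, reason: `missing call "${want.name}"` };
    }
    if (got.name !== want.name) {
      return {
        passed: false,
        failedAt: i,
        reason: `expected "${want.name}", got "${got.name}"`,
      };
    }
    if (!deepEqual(got.args, want.args)) {
      return { passed: false, failedAt: i, reason: `arguments differ for "${want.name}"` };
    }
  }
  if (actual.length > expected.length) {
    return { passed: false, failedAt: expected.length, reason: "unexpected extra call" };
  }
  return { passed: true, failedAt: null, reason: null };
}
```

Failing on the first mismatching step keeps the report easy to read for multi-step runs; a real evaluator might instead collect every mismatch or allow looser argument matching.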

Modularized Evaluator: Split the monolithic evaluator.ts into a clean src/evaluator/ directory:

  • models.ts: Instantiates LLM backends.

  • browser.ts: Extracts Puppeteer web integration.

  • mappers.ts: Normalizes raw extension schemas into strict AI SDK interfaces.

  • prompts.ts: Extracts static system context.

  • index.ts: The core orchestrator.

@Kulikowski Kulikowski changed the title from "Evals multiple expected calls" to "Migrate to Vercel AI SDK and enable experimental multi-step tool execution" Feb 25, 2026
@Kulikowski Kulikowski marked this pull request as ready for review February 25, 2026 11:29
@Kulikowski
Contributor Author

As discussed with @andreban, we will add support for pluggable backends, with the Vercel AI SDK as one of them. This is coming in the next PRs :)

Member

@andreban andreban left a comment

LGTM, with a note to reintroduce the Backend abstraction in the next PR so that it's easy to add LLMs not supported by the Vercel SDK.

@Kulikowski Kulikowski merged commit 03d8319 into main Feb 25, 2026
2 checks passed
@beaufortfrancois beaufortfrancois deleted the evals-multiple-expected-calls branch February 25, 2026 13:49
3 participants