Some AI widget/runtime improvements I’ve been experimenting with #3409

simoncherry9 · 2026-05-27T00:00:38Z

Hi, I’ve spent quite a bit of time improving the AI widget/runtime stack and wanted to ask whether maintainers would be interested in upstreaming some of this work.

Most of these changes are already running in my own environment and autonomous workflows, so they’re working code rather than ideas or mockups. That said, I’m still polishing, stress testing, and cleaning up edge cases before proposing larger contributions or PRs.

Main areas I’ve been working on

OpenAI-style tool calling compatibility

I added a compatibility layer so providers/models following the OpenAI format can handle tool calls in a more consistent way.

The main goal was to reduce provider-specific hacks and make backend integration less painful.
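
To make the idea concrete, here is a minimal sketch of what such a compatibility layer could look like. It is not the actual implementation: `ToolCall`, `normalize_tool_calls`, and `tool_result_message` are hypothetical names, and the only thing taken as given is the OpenAI chat message shape (`tool_calls[].function.name` / `.arguments`, with results sent back as `role: "tool"` messages).

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """Provider-neutral tool call (hypothetical internal shape)."""
    call_id: str
    name: str
    arguments: dict[str, Any]


def normalize_tool_calls(message: dict[str, Any]) -> list[ToolCall]:
    """Convert an OpenAI-style assistant message into ToolCall objects."""
    calls = []
    for raw in message.get("tool_calls") or []:
        fn = raw.get("function", {})
        args = fn.get("arguments") or "{}"
        # The OpenAI format sends arguments as a JSON string; some compatible
        # backends send an already-decoded object, so accept both.
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                args = {"_raw": args}  # keep malformed input for debugging
        calls.append(ToolCall(raw.get("id", ""), fn.get("name", ""), args))
    return calls


def tool_result_message(call: ToolCall, output: str) -> dict[str, str]:
    """Build the message that returns a tool result to the model."""
    return {"role": "tool", "tool_call_id": call.call_id, "content": output}
```

With something like this in place, the rest of the runtime only ever sees `ToolCall` objects, and per-provider quirks stay inside the normalizer.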

Automatic local llama.cpp discovery

I added support for automatically detecting local GGUF models and active llama.cpp runtimes.

I’m already using this locally for self-hosted/offline workflows without needing to manually register every model/runtime.
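
As a rough illustration of the discovery step, the sketch below scans directories for `.gguf` files and probes a URL for a running server. The function names and the idea of passing in candidate directories and URLs are assumptions for the example, not the actual code. The probe relies on the OpenAI-compatible `GET /v1/models` route that llama.cpp’s `llama-server` exposes.

```python
import json
import urllib.request
from pathlib import Path
from typing import Optional


def find_gguf_models(roots: list[Path]) -> list[Path]:
    """Return all *.gguf files under the given directories, sorted."""
    found: list[Path] = []
    for root in roots:
        if root.is_dir():  # silently skip directories that do not exist
            found.extend(p for p in root.rglob("*.gguf") if p.is_file())
    return sorted(found)


def probe_llama_server(base_url: str, timeout: float = 0.5) -> Optional[list[str]]:
    """Return the model ids served at base_url, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return [m.get("id", "") for m in payload.get("data", [])]
```

A caller could run `find_gguf_models([Path.home() / "models"])` and `probe_llama_server("http://127.0.0.1:8080")` at startup and register whatever comes back. The directory is a placeholder, and 8080 is the port `llama-server` listens on by default.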

Streaming tool execution

I reworked part of the execution flow so tool output can stream in real time while tasks are still running instead of waiting for full completion.

This made a pretty noticeable difference for things like:

shell commands
searches
multi-step workflows
longer automation tasks
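
For the shell-command case, the streaming behavior can be sketched with a generator that yields output as the process produces it. This is a simplified illustration under my own assumptions (the name `stream_command` and the trailing exit-code line are made up for the example), not the real execution flow.

```python
import subprocess
import sys
from collections.abc import Iterator


def stream_command(argv: list[str]) -> Iterator[str]:
    """Yield a command's output line by line while it is still running."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr so errors stream too
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    assert proc.stdout is not None
    finished = False
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
        finished = True
    finally:
        if not finished:
            proc.kill()  # consumer stopped early; don't leave the child running
        proc.stdout.close()
        proc.wait()
    yield f"[exit code {proc.returncode}]"


if __name__ == "__main__":
    # Each line is printed as soon as the child process emits it.
    for chunk in stream_command([sys.executable, "-c", "print('a'); print('b')"]):
        print(chunk)
```

The consumer (UI, log, or model context) can forward each chunk immediately instead of waiting for the command to finish.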

Runtime/provider abstraction

I started separating provider/inference logic from orchestration/runtime execution logic.

This made it easier to:

switch between local models and remote APIs
run hybrid setups
add new backends
avoid duplicated provider-specific code
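
A minimal sketch of this kind of split, assuming a single-method provider interface. `Provider`, `LocalStub`, `RemoteStub`, and `Runtime` are hypothetical names for illustration; the stubs stand in for real local and remote backends.

```python
from typing import Optional, Protocol


class Provider(Protocol):
    """Inference side: turns a prompt into text and knows nothing else."""

    def complete(self, prompt: str) -> str: ...


class LocalStub:
    """Stand-in for a local model backend."""

    def complete(self, prompt: str) -> str:
        return f"local:{prompt}"


class RemoteStub:
    """Stand-in for a remote API backend."""

    def complete(self, prompt: str) -> str:
        return f"remote:{prompt.upper()}"


class Runtime:
    """Orchestration side: picks a provider but never inspects its internals."""

    def __init__(self, providers: dict[str, Provider], default: str) -> None:
        if default not in providers:
            raise KeyError(f"unknown default provider: {default}")
        self.providers = providers
        self.default = default

    def run(self, prompt: str, provider: Optional[str] = None) -> str:
        backend = self.providers[provider or self.default]
        return backend.complete(prompt)


runtime = Runtime({"local": LocalStub(), "remote": RemoteStub()}, default="local")
```

Switching backends or running a hybrid setup then comes down to the `provider` argument, and adding a backend means writing one class with a `complete` method.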

Experimental memory/personalization layer

I also added a small experimental memory system so the assistant can remember information the user explicitly wants persisted across sessions/interactions.

The goal here is to make the agent feel less stateless and more contextual over time.

Things like:

user preferences
recurring instructions
operational context
common workflows
expected assistant behavior
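
To show the general shape, here is a small sketch of an explicit, file-backed memory store. The `MemoryStore` class, the JSON file format, and the `as_prompt` rendering are all assumptions made for the example. The important property is that nothing is stored unless `remember` is called on the user’s behalf.

```python
import json
from pathlib import Path


class MemoryStore:
    """Key/value memory persisted as JSON (hypothetical format)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.items: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def remember(self, key: str, value: str) -> None:
        """Persist something the user explicitly asked to keep."""
        self.items[key] = value
        self._save()

    def forget(self, key: str) -> None:
        """Remove an entry; forgetting an unknown key is a no-op."""
        self.items.pop(key, None)
        self._save()

    def as_prompt(self) -> str:
        """Render stored entries as lines that can be prepended to a session."""
        return "\n".join(f"- {k}: {v}" for k, v in sorted(self.items.items()))

    def _save(self) -> None:
        # Write to a temp file and rename it so a crash cannot leave a
        # half-written memory file behind.
        tmp = self.path.with_suffix(self.path.suffix + ".tmp")
        tmp.write_text(json.dumps(self.items, indent=2), encoding="utf-8")
        tmp.replace(self.path)
```

At the start of a session, the runtime could load the store and inject `as_prompt()` into the system context, which covers preferences and recurring instructions without any implicit data collection.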

Autonomous workflow improvements

I’ve also been testing more complex autonomous loops, multi-step execution, context handling, and tool orchestration in realistic workflows rather than only basic chat scenarios.

General cleanup/refactoring

A lot of this also involved cleaning up the internal runtime flow in order to:

reduce duplicated logic
simplify debugging
improve maintainability
make future extensions easier

Everything is still evolving and I’m continuing to refine behavior and reliability, but I figured I’d ask first in case maintainers are interested in any of these directions.

If so, I’d be happy to clean things up further and split specific parts into upstream-friendly PRs.