---
title: "Runtime Resolution"
sidebarTitle: "Runtime Resolution"
description: "General runtime-resolution subsystem for resolving agent runtimes at turn-time"
icon: "rotate"
---

The runtime-resolution subsystem maps agent and model references to concrete runtime instances at turn-time, enabling dynamic model switching and custom routing logic.

**Handoffs no longer use this subsystem.** As of [PR #2178](MervinPraison/PraisonAI#2178), handoffs always delegate directly to `agent.chat()` / `agent.achat()`. The runtime-resolution layer is used by other parts of the SDK for agent-level runtime configuration. If you came here for handoff execution behaviour, see [Agent Handoffs](/docs/features/handoffs).
graph LR
    subgraph "Runtime Resolution"
        A[🤖 Agent] --> B[🔍 Resolve Runtime]
        B --> C[(Cache)]
        C -->|hit| E[✅ Execute]
        C -->|miss| F[🏗️ Create Runtime]
        F --> C
        F --> E
    end

    classDef agent fill:#8B0000,stroke:#7C90A0,color:#fff
    classDef process fill:#F59E0B,stroke:#7C90A0,color:#fff
    classDef cache fill:#189AB4,stroke:#7C90A0,color:#fff
    classDef success fill:#10B981,stroke:#7C90A0,color:#fff

    class A agent
    class B,F process
    class C cache
    class E success

Quick Start

Update the target model at any time. The next invocation picks up the change automatically because `agent.chat()` reads the live `llm` value.

from praisonaiagents import Agent

researcher = Agent(
    name="Researcher",
    instructions="Research the topic and summarise it",
    llm="gpt-4o-mini",
)

writer = Agent(
    name="Writer",
    instructions="Write a polished article",
    llm="gpt-4o-mini",
    handoffs=[researcher],
)

# Swap the researcher to a different model at any point
researcher.llm = "claude-3-sonnet"

# The next invocation automatically uses claude-3-sonnet for the researcher
writer.start("Research and write about ocean currents")

Use `get_runtime_cache` and `clear_runtime_cache` to debug or force a fresh resolution.

from praisonaiagents.runtime import get_runtime_cache, clear_runtime_cache

# See what runtimes are cached across sessions
cache = get_runtime_cache()
for session_id, entries in cache.items():
    for cache_key, (runtime, cached_at) in entries.items():
        print(f"{cache_key}: {runtime.provider}/{runtime.model_ref}")

# Force fresh resolution for a specific session
clear_runtime_cache(session_id="session_123")

# Clear all cached runtimes
clear_runtime_cache()

Override the built-in resolver to control how models map to runtimes.

from praisonaiagents import Agent
from praisonaiagents.runtime import (
    set_global_resolver,
    SessionContext,
)
from praisonaiagents.runtime.resolve import (
    RuntimeResolver,
    AgentRuntimeProtocol,
    LLMRuntimeWrapper,
)
from praisonaiagents.llm.llm import LLM

class MyResolver(RuntimeResolver):
    def supports_model(self, model_ref: str) -> bool:
        return model_ref.startswith(("gpt-", "claude-", "my-model-"))

    def resolve(self, agent_id, model_ref, session_ctx, **kwargs):
        # Route "my-model-*" to a custom endpoint
        if model_ref.startswith("my-model-"):
            llm = LLM(model="gpt-4o-mini", api_base="https://my.endpoint/v1")
        else:
            llm = LLM(model=model_ref)
        return LLMRuntimeWrapper(llm=llm, model_ref=model_ref, agent_id=agent_id)

set_global_resolver(MyResolver())

agent = Agent(name="MyAgent", instructions="Help users", llm="my-model-fast")
agent.start("Hello!")

How It Works

The subsystem reads the agent's current `llm` (or `model`) attribute at invocation time, not at construction time.

sequenceDiagram
    participant User
    participant Agent
    participant RuntimeSubsystem
    participant Cache

    User->>Agent: start("Research and write...")
    Agent->>RuntimeSubsystem: resolve(agent_id, current_llm)
    RuntimeSubsystem->>Cache: check TTL cache
    alt cache hit (< 5 min)
        Cache-->>RuntimeSubsystem: cached runtime
    else cache miss or expired
        RuntimeSubsystem->>RuntimeSubsystem: create LLMRuntimeWrapper
        RuntimeSubsystem->>Cache: store & return
    end
    RuntimeSubsystem-->>Agent: runtime instance
    Agent->>Agent: agent.chat(prompt)
    Agent-->>User: response
Handoffs always execute via `agent.chat()` / `agent.achat()` directly — they do not call into this subsystem. Runtime resolution is used for agent-level model configuration and caching, not for handoff execution.
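
The invocation-time lookup described above can be illustrated without the SDK. The following is a minimal sketch, assuming a hypothetical `SketchAgent` class (not the PraisonAI `Agent`) whose `chat()` reads `self.llm` on every call instead of capturing it in the constructor:

```python
class SketchAgent:
    """Hypothetical stand-in for an agent; not the PraisonAI Agent class."""

    def __init__(self, name: str, llm: str) -> None:
        self.name = name
        self.llm = llm  # plain attribute that may be reassigned later

    def chat(self, prompt: str) -> str:
        # Read self.llm at call time, so a reassignment made between
        # invocations is visible to the very next call.
        model_ref = self.llm
        return f"[{model_ref}] {self.name}: {prompt}"


agent = SketchAgent(name="Researcher", llm="gpt-4o-mini")
first = agent.chat("hello")

agent.llm = "gpt-4o"  # swap the model between invocations
second = agent.chat("hello")

print(first)
print(second)
```

Because nothing is captured at construction time, no restart or re-creation of the agent is needed for the second call to see the new model reference.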

Configuration Options

SessionContext

Passed to `resolve_runtime` to scope caching and track depth.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `session_id` | `str` | - | Required. Used as the first segment of the cache key |
| `timestamp` | `float` | `time.time()` if `<= 0` | Session start time |
| `parent_agent_id` | `Optional[str]` | `None` | Name of the agent that triggered resolution |
| `handoff_depth` | `int` | `0` | Current nesting depth |
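
To make the field defaults concrete, here is a rough sketch of a dataclass with the same four fields. `SessionContextSketch` is a hypothetical name, and the `__post_init__` logic is only an assumption about how the "`time.time()` if `<= 0`" default might behave; the real `SessionContext` may be implemented differently.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionContextSketch:
    """Illustrative mirror of the documented fields; not the library's class."""

    session_id: str
    timestamp: float = 0.0
    parent_agent_id: Optional[str] = None
    handoff_depth: int = 0

    def __post_init__(self) -> None:
        # Assumed behaviour: a non-positive timestamp means "use the current time".
        if self.timestamp <= 0:
            self.timestamp = time.time()


# A root context, then a child context one nesting level deeper.
ctx = SessionContextSketch(session_id="session_123")
child = SessionContextSketch(
    session_id=ctx.session_id,
    timestamp=ctx.timestamp,
    parent_agent_id="Coordinator",
    handoff_depth=ctx.handoff_depth + 1,
)
```

Reusing the parent's `session_id` keeps both contexts under the same cache scope, while `handoff_depth` records how deep the nesting has gone.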

Cache constants

| Constant | Value | Meaning |
| --- | --- | --- |
| `_cache_ttl_seconds` | `300` | Each cached runtime lives for 5 minutes |
| `_cleanup_interval` | `600` | Background cleanup daemon runs every 10 minutes |

Cache keys use the format `"{session_id}:{agent_id}:{model_ref}"`. Caches are session-isolated, so different conversations never share runtimes.
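
The key format and the session isolation it implies can be sketched with plain dictionaries. The nested shape `{session_id: {cache_key: (runtime, cached_at)}}` is inferred from the Quick Start loop over `get_runtime_cache()`; the helper names `make_cache_key`, `store`, and `clear` are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed shape: {session_id: {cache_key: (runtime, cached_at)}}
CacheSketch = Dict[str, Dict[str, Tuple[Any, float]]]


def make_cache_key(session_id: str, agent_id: str, model_ref: str) -> str:
    """Build a key in the documented "{session_id}:{agent_id}:{model_ref}" format."""
    return f"{session_id}:{agent_id}:{model_ref}"


def store(cache: CacheSketch, session_id: str, agent_id: str,
          model_ref: str, runtime: Any, cached_at: float) -> str:
    """Store a runtime under its session bucket and return the key used."""
    key = make_cache_key(session_id, agent_id, model_ref)
    cache.setdefault(session_id, {})[key] = (runtime, cached_at)
    return key


def clear(cache: CacheSketch, session_id: Optional[str] = None) -> None:
    """Drop one session's entries, or everything when no session is given."""
    if session_id is None:
        cache.clear()
    else:
        cache.pop(session_id, None)


cache: CacheSketch = {}
key_a = store(cache, "session_a", "Analyst", "gpt-4o-mini", "runtime-a", 100.0)
key_b = store(cache, "session_b", "Analyst", "gpt-4o-mini", "runtime-b", 100.0)

# Changing the model produces a different key, so a stale entry is never reused.
key_a_upgraded = make_cache_key("session_a", "Analyst", "gpt-4o")

sessions_before = sorted(cache)
clear(cache, session_id="session_a")
sessions_after = sorted(cache)
```

Since `model_ref` is part of the key, reassigning an agent's model leads to a cache miss on the next resolution, which is consistent with model changes taking effect on the next call.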


Common Patterns

Mid-conversation model swap

from praisonaiagents import Agent

analyst = Agent(name="Analyst", instructions="Analyse data", llm="gpt-4o-mini")
coordinator = Agent(name="Coordinator", instructions="Coordinate", handoffs=[analyst])

# First few turns use gpt-4o-mini
coordinator.start("Quick summary of Q1 sales")

# Upgrade to a more capable model for a detailed report
analyst.llm = "gpt-4o"
coordinator.start("Full analysis of Q1 vs Q2 with trend forecasting")

Force cache refresh

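from praisonaiagents import Agent
from praisonaiagents.runtime import clear_runtime_cache

agent = Agent(name="MyAgent", instructions="Help users", llm="gpt-4o-mini")

# After rotating API keys or changing model config, evict every cached runtime
clear_runtime_cache()

# The next call resolves a fresh runtime
agent.start("Continue with the updated model settings")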

Introspect resolved runtimes

from praisonaiagents.runtime import get_runtime_cache

cache = get_runtime_cache()
total = sum(len(entries) for entries in cache.values())
print(f"Active runtimes: {total} across {len(cache)} sessions")

Best Practices

Model re-resolution happens at each invocation boundary. Changing `agent.llm` is effective immediately for the next call; no restart is needed.

The 5-minute TTL means old runtimes may linger after you rotate API keys. Call `clear_runtime_cache()` to evict all entries and force fresh connections.

Return `False` from `supports_model` for models you do not handle. The built-in `DefaultRuntimeResolver` acts as the final fallback, so returning `False` simply delegates back to it.

If you need to change how a handoff target executes, configure the target agent itself (instructions, tools, llm). The target agent's full `chat()` pipeline runs on every handoff, and this subsystem is not in that path.
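
The `supports_model` fallback can be pictured as a two-step chain: ask the custom resolver first, then fall through to the default. This is a simplified sketch with hypothetical classes and a hypothetical `resolve_with_fallback` helper; it returns strings instead of runtimes, and the SDK's actual dispatch logic may differ.

```python
from typing import Optional


class ResolverSketch:
    """Hypothetical minimal resolver interface for illustration only."""

    def supports_model(self, model_ref: str) -> bool:
        raise NotImplementedError

    def resolve(self, model_ref: str) -> str:
        raise NotImplementedError


class CustomResolverSketch(ResolverSketch):
    def supports_model(self, model_ref: str) -> bool:
        # Only claim the models this resolver knows how to handle.
        return model_ref.startswith("my-model-")

    def resolve(self, model_ref: str) -> str:
        return f"custom:{model_ref}"


class DefaultResolverSketch(ResolverSketch):
    def supports_model(self, model_ref: str) -> bool:
        return True  # the final fallback accepts everything

    def resolve(self, model_ref: str) -> str:
        return f"default:{model_ref}"


def resolve_with_fallback(
    model_ref: str,
    custom: Optional[ResolverSketch],
    default: ResolverSketch,
) -> str:
    # Use the custom resolver only when it claims the model.
    if custom is not None and custom.supports_model(model_ref):
        return custom.resolve(model_ref)
    return default.resolve(model_ref)


custom = CustomResolverSketch()
default = DefaultResolverSketch()
handled = resolve_with_fallback("my-model-fast", custom, default)
delegated = resolve_with_fallback("gpt-4o-mini", custom, default)
```

Keeping `supports_model` narrow means a custom resolver never has to reimplement handling for models it does not care about.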

Public API

All names are importable from `praisonaiagents.runtime`:

from praisonaiagents.runtime import (
    resolve_runtime,
    SessionContext,
    RuntimeProtocol,
    get_runtime_cache,
    clear_runtime_cache,
    set_global_resolver,
)

resolve_runtime

def resolve_runtime(
    agent_id: str,
    model_ref: str,
    session_ctx: SessionContext,
    **kwargs,
) -> AgentRuntimeProtocol:
    ...

The main entry point. Checks the TTL cache first; creates and caches a new runtime if none exists or the entry expired.

RuntimeProtocol / AgentRuntimeProtocol

Protocols that custom runtimes must satisfy. Only relevant when building a custom resolver.

from typing import Any, Protocol

class RuntimeProtocol(Protocol):
    def execute(self, prompt: str, **kwargs) -> Any: ...
    async def aexecute(self, prompt: str, **kwargs) -> Any: ...
    @property
    def model_ref(self) -> str: ...
    @property
    def provider(self) -> str: ...

class AgentRuntimeProtocol(RuntimeProtocol):
    @property
    def supports_streaming(self) -> bool: ...
    @property
    def supports_tools(self) -> bool: ...
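
As an illustration of what satisfying these protocols might look like, the sketch below defines a local copy of the protocol shape (`RuntimeProtocolSketch`, marked `runtime_checkable` purely for the demonstration) and a hypothetical `EchoRuntime` that implements every listed member. Neither name exists in the SDK.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RuntimeProtocolSketch(Protocol):
    """Local copy of the documented shape, for illustration only."""

    def execute(self, prompt: str, **kwargs: Any) -> Any: ...
    async def aexecute(self, prompt: str, **kwargs: Any) -> Any: ...
    @property
    def model_ref(self) -> str: ...
    @property
    def provider(self) -> str: ...


class EchoRuntime:
    """Hypothetical runtime that echoes the prompt instead of calling a model."""

    def __init__(self, model_ref: str, provider: str = "echo") -> None:
        self._model_ref = model_ref
        self._provider = provider

    def execute(self, prompt: str, **kwargs: Any) -> str:
        return f"{self._model_ref} echo: {prompt}"

    async def aexecute(self, prompt: str, **kwargs: Any) -> str:
        # Delegate to the synchronous path; a real runtime would await I/O here.
        return self.execute(prompt, **kwargs)

    @property
    def model_ref(self) -> str:
        return self._model_ref

    @property
    def provider(self) -> str:
        return self._provider

    @property
    def supports_streaming(self) -> bool:
        return False

    @property
    def supports_tools(self) -> bool:
        return False


runtime = EchoRuntime(model_ref="my-model-fast")
sync_result = runtime.execute("ping")
async_result = asyncio.run(runtime.aexecute("ping"))
```

The class does not inherit from the protocol; structural typing means it only needs to expose members with matching names.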

Related

Agent-to-agent delegation: handoffs always use `agent.chat()`
HandoffConfig reference
Secure tool boundaries during handoff
Filter context passed during handoff