Optimize agent learnings: summarizer LLM, tool-call compression, orchestrator-only reflections #1419

@senamakel

Description

Summary

Optimize the agent learning / reflection pipeline so it can run frequently
without burning orchestrator-tier inference. Introduce a dedicated cheap
summarizer LLM for reflection + transcript-ingest synthesis, compress
tool-call history before reflection sees it, and scope reflections to the
orchestrator agent only.

Follow-up to the heuristic transcript-ingest pipeline shipped in
#1406 — that PR keeps the seam clean for an LLM-driven extractor to plug
in without touching callers; this issue covers the "real" LLM path.

Problem

The current learning pipeline has three pressure points:

  1. Reflections need a model, but not the orchestrator's. Reflection /
    ingest is going to fire often (per session-memory threshold crossing,
    on transcript close, on segment close). Running it on the same
    high-tier model the orchestrator uses is wasteful — most of what
    reflection produces is short, structured summaries that a cheap model
    handles fine. A separate summarizer tier keeps the hot path expensive
    and the cold path cheap.

  2. Tool-call history is the dominant token cost. Reflection over a
    raw transcript drags every tool call's full output through the
    summarizer, which (a) inflates cost and (b) blows past the
    summarizer's context window. We need a compression / concatenation
    pass that collapses tool calls into per-tool digests (count, success
    rate, key outputs) before reflection sees them.

  3. Reflection should be orchestrator-only. Today the hooks fire on
    any agent that crosses the threshold, including sub-agents and
    specialists. Sub-agent transcripts are short-lived, scoped to a
    single delegation, and almost never carry durable user context worth
    surfacing in future chats. Restricting reflection to the orchestrator
    removes a class of low-signal extractions and matches the user's
    mental model of "what the assistant remembers across chats."

Constraints worth calling out:

  • Summarizer context window is significantly smaller than the
    orchestrator's, so any pre-summarizer compression has to be aggressive
    enough to fit a multi-turn transcript into the smaller window.
  • Reflection / ingest must remain background-first — the orchestrator
    turn's user-visible latency must not regress.
  • The summarizer must be swappable (local Ollama vs cloud) and
    opt-out-able so users without a configured summarizer fall back to
    the existing heuristic path from #1406 (transcript-to-memory ingestion
    pipeline). A config sketch follows this list.
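
A minimal sketch of what that knob could look like, assuming a Rust-side
config struct; `SummarizerBackend`, `SummarizerConfig`, and the field names
are illustrative, not a proposed final API:

```rust
/// Hypothetical workspace-level summarizer config, separate from the
/// orchestrator provider. All names here are illustrative.
#[derive(Debug, Clone)]
pub enum SummarizerBackend {
    /// Cloud-hosted cheap model (the default when configured).
    Cloud { model: String, api_base: String },
    /// Local Ollama model for offline setups.
    Ollama { model: String },
    /// Nothing configured: fall back to the #1406 heuristic path.
    Disabled,
}

#[derive(Debug, Clone)]
pub struct SummarizerConfig {
    pub backend: SummarizerBackend,
    /// Context-window cap the compression pass must budget against.
    pub context_window_tokens: usize,
}
```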

Solution (optional)

Sketch — happy to iterate:

  • New SummarizerProvider trait + config knob. Pluggable model,
    separate from the orchestrator provider. Cloud default + Ollama
    fallback. Carries its own context-window cap so callers can budget.
    See the trait sketch after this list.
  • Tool-call digest layer. Before any reflection / ingest LLM call,
    collapse tool messages into a
    ToolCallDigest { name, count, success_rate, p95_duration_ms, sample_inputs, sample_outputs }
    shape. Drop raw outputs past a small per-tool cap. See the compression
    sketch after this list.
  • Orchestrator-only gating. Add an is_orchestrator() (or
    agent_role) check on the reflection / ingest hooks; skip silently
    for sub-agents. Keep the existing turn-level ReflectionHook for the
    orchestrator path. See the gating sketch after this list.
  • Two-stage extract for transcript-ingest. Stage 1 (heuristic, what
    #1406 shipped) generates candidates. Stage 2 (summarizer) merges
    near-duplicate candidates, scores importance, and writes the merged
    output back to conversation_memory / conversation_reflections. See
    the pipeline sketch after this list.
  • Telemetry: surface summarizer cost / latency / context-fill alongside
    the existing reflection metrics so we can tune the trigger thresholds.
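
For the first bullet, a minimal sketch of the provider seam, assuming an
async-trait-based abstraction; SummarizerProvider is named in this issue,
but the method names and error type are illustrative:

```rust
use async_trait::async_trait;

/// Pluggable summarizer seam (sketch). The orchestrator provider stays
/// untouched; the reflection / ingest paths call this instead.
#[async_trait]
pub trait SummarizerProvider: Send + Sync {
    /// Maximum prompt size the backing model accepts; callers budget
    /// their compression pass against this cap.
    fn context_window_tokens(&self) -> usize;

    /// Produce a short structured summary for a prompt that already
    /// fits within `context_window_tokens()`.
    async fn summarize(&self, prompt: &str) -> anyhow::Result<String>;
}
```

Keeping the context-window cap on the trait lets the digest layer size its
output per backend instead of hardcoding one budget.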
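For the digest layer, a sketch of the compression pass; ToolCallRecord is an
assumed minimal transcript shape (the real types live in the harness), while
the digest fields mirror the shape named in the bullet above:

```rust
use std::collections::BTreeMap;

/// Per-tool digest fed to the summarizer instead of raw tool output.
pub struct ToolCallDigest {
    pub name: String,
    pub count: u32,
    pub success_rate: f32,
    pub p95_duration_ms: u64,
    pub sample_inputs: Vec<String>,
    pub sample_outputs: Vec<String>,
}

/// Assumed minimal record for one executed tool call (illustrative).
pub struct ToolCallRecord {
    pub name: String,
    pub ok: bool,
    pub duration_ms: u64,
    pub input: String,
    pub output: String,
}

const MAX_SAMPLES_PER_TOOL: usize = 2;
const MAX_SAMPLE_CHARS: usize = 256;

fn truncate(s: &str, max: usize) -> String {
    s.chars().take(max).collect()
}

/// Collapse raw tool calls into per-tool digests before reflection sees them.
pub fn digest_tool_calls(calls: &[ToolCallRecord]) -> Vec<ToolCallDigest> {
    let mut groups: BTreeMap<&str, Vec<&ToolCallRecord>> = BTreeMap::new();
    for call in calls {
        groups.entry(call.name.as_str()).or_default().push(call);
    }
    groups
        .into_iter()
        .map(|(name, group)| {
            let mut durations: Vec<u64> = group.iter().map(|c| c.duration_ms).collect();
            durations.sort_unstable();
            // Nearest-rank p95: index ceil(0.95 * n) - 1.
            let p95_idx = ((durations.len() * 95 + 99) / 100).saturating_sub(1);
            let successes = group.iter().filter(|c| c.ok).count();
            ToolCallDigest {
                name: name.to_string(),
                count: group.len() as u32,
                success_rate: successes as f32 / group.len() as f32,
                p95_duration_ms: durations[p95_idx],
                // Drop raw inputs/outputs past a small per-tool cap.
                sample_inputs: group.iter().take(MAX_SAMPLES_PER_TOOL)
                    .map(|c| truncate(&c.input, MAX_SAMPLE_CHARS)).collect(),
                sample_outputs: group.iter().take(MAX_SAMPLES_PER_TOOL)
                    .map(|c| truncate(&c.output, MAX_SAMPLE_CHARS)).collect(),
            }
        })
        .collect()
}
```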
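For the gating bullet, the check itself is small; AgentRole and the function
shape are assumptions about the harness, not the shipped API:

```rust
/// Assumed role tag on the agent harness (illustrative).
pub enum AgentRole {
    Orchestrator,
    SubAgent,
    Specialist,
}

/// Called at the top of the reflection / transcript-ingest hooks.
/// Non-orchestrator agents skip silently: no error, no log noise.
pub fn should_reflect(role: &AgentRole) -> bool {
    matches!(role, AgentRole::Orchestrator)
}
```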
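And for the two-stage extract, a sketch of how the stages could compose,
reusing the SummarizerProvider trait sketched above; the helper bodies are
stubs standing in for the #1406 heuristic extractor and a prompt/parse pair:

```rust
pub struct MemoryCandidate {
    pub text: String,
    pub importance: f32,
}

fn heuristic_candidates(_transcript: &str) -> Vec<MemoryCandidate> {
    Vec::new() // stand-in for the #1406 heuristic extractor
}

fn build_merge_prompt(candidates: &[MemoryCandidate]) -> String {
    // The real version must budget against context_window_tokens().
    candidates.iter().map(|c| c.text.as_str()).collect::<Vec<_>>().join("\n")
}

fn parse_candidates(_raw: &str) -> Vec<MemoryCandidate> {
    Vec::new() // stand-in for parsing the summarizer's structured output
}

pub async fn extract_memories(
    transcript: &str,
    summarizer: Option<&dyn SummarizerProvider>,
) -> anyhow::Result<Vec<MemoryCandidate>> {
    // Stage 1: heuristic candidates (always runs; also the opt-out fallback).
    let candidates = heuristic_candidates(transcript);

    // Stage 2: only when a summarizer is configured.
    let Some(summarizer) = summarizer else {
        return Ok(candidates);
    };
    let merged = summarizer.summarize(&build_merge_prompt(&candidates)).await?;
    Ok(parse_candidates(&merged))
}
```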

Acceptance criteria

  • Summarizer provider integration — a dedicated cheap summarizer
    is configurable per workspace, distinct from the orchestrator provider,
    and used by the reflection + transcript-ingest paths.
  • Tool-call compression — reflection / ingest never feeds raw
    multi-call tool history to the summarizer; calls are collapsed into
    per-tool digests that fit the summarizer's smaller context window.
  • Orchestrator-only reflections — sub-agents and specialists no
    longer trigger reflection / transcript ingest; only the
    user-facing orchestrator does.
  • Background-first preserved — orchestrator turn latency does not
    regress; summarizer calls run on the same fire-and-forget surface used
    by spawn_session_memory_extraction and spawn_transcript_ingestion (see
    the sketch after this list).
  • Heuristic fallback intact — when no summarizer is configured,
    the heuristic path from #1406 remains the source of truth so reflection
    doesn't silently break for offline users.
  • Observability — debug logs surface summarizer cost, latency,
    context-fill, and how many tool calls were compressed in each pass.
  • Diff coverage ≥ 80% — the implementing PR meets the changed-lines
    coverage gate (Vitest + cargo-llvm-cov, enforced by `.github/workflows/coverage.yml`).
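
On the background-first criterion, a sketch of the fire-and-forget shape,
mirroring the surface spawn_session_memory_extraction uses today;
spawn_reflection and run_reflection are illustrative names, and error
handling is deliberately swallow-and-log:

```rust
use std::sync::Arc;

/// Illustrative body: digest construction and prompting are elided.
async fn run_reflection(
    transcript: &str,
    summarizer: &dyn SummarizerProvider,
) -> anyhow::Result<()> {
    let _summary = summarizer.summarize(transcript).await?;
    Ok(())
}

/// The orchestrator turn returns immediately; the summarizer call runs
/// in the background and can never block or fail the user-visible path.
fn spawn_reflection(transcript: String, summarizer: Arc<dyn SummarizerProvider>) {
    tokio::spawn(async move {
        if let Err(err) = run_reflection(&transcript, summarizer.as_ref()).await {
            tracing::debug!(?err, "background reflection failed");
        }
    });
}
```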

Related

  • Builds on #1406 (transcript-to-memory ingestion pipeline) — extends
    src/openhuman/learning/transcript_ingest/ with an LLM-driven extractor.
  • Related modules: `src/openhuman/learning/reflection.rs`,
    `src/openhuman/agent/harness/session/turn.rs::spawn_session_memory_extraction`,
    `src/openhuman/context/session_memory.rs`.

Labels

  • agent: Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.
  • feature: Net-new user-facing capability or product behavior.
  • memory: Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/.
