feat(learning): summarizer LLM, tool-call digests, orchestrator-only reflection (#1419) #1519

Draft

senamakel wants to merge 1 commit into tinyhumansai:main from senamakel:issue/1419-optimize-agent-learnings-summarizer-llm


Conversation

@senamakel (Member) commented May 12, 2026

Summary

  • New SummarizerProvider trait + ConfiguredSummarizer wrapper. Reflection and transcript-ingest can route through a cheap dedicated model instead of burning orchestrator-tier inference. Opt-in via learning.summarizer.enabled — defaults preserve the heuristic-only path from #1406 (feat(memory): transcript-to-memory ingestion pipeline, issue #1399).
  • Tool-call digest layer collapses raw multi-call tool history into per-tool aggregates (count, success rate, p95 duration, bounded samples) so the summarizer's smaller context window never sees raw payloads.
  • Reflection hook + spawn_transcript_ingestion gated to the user-facing orchestrator only — sub-agent transcripts no longer trigger reflection writes.
  • Two-stage transcript ingest: heuristic stage (unchanged) + Stage 2 near-duplicate merge over Jaccard similarity. Paraphrases of the same preference / decision collapse into one merged candidate with combined provenance before dedupe + persist.
  • Telemetry: summarizer dispatches log label, model hint, input chars, cap, fill ratio, latency. Merge pass logs compressed-pair counts.
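The per-tool aggregation the digest layer performs can be sketched roughly as follows. This is a minimal illustration, not the PR's actual digest.rs: the ToolRecord/Digest names and field set here are assumptions, and the real module also tracks p95 duration and renders a prompt block.

```rust
use std::collections::BTreeMap;

// Illustrative record shape; the PR's real struct lives in
// learning::summarizer::digest and may differ.
struct ToolRecord {
    name: String,
    success: bool,
    duration_ms: u64,
    output: String,
}

#[derive(Default)]
struct Digest {
    count: usize,
    success_count: usize,
    durations: Vec<u64>,
    samples: Vec<String>, // bounded to SAMPLES_PER_TOOL entries
}

const SAMPLES_PER_TOOL: usize = 2;
const SAMPLE_CHARS: usize = 160;

// Group raw tool calls into per-tool aggregates so the summarizer
// never sees raw payloads. BTreeMap keeps iteration deterministic.
fn compress(records: &[ToolRecord]) -> BTreeMap<String, Digest> {
    let mut by_name: BTreeMap<String, Digest> = BTreeMap::new();
    for r in records {
        let d = by_name.entry(r.name.clone()).or_default();
        d.count += 1;
        if r.success {
            d.success_count += 1;
        }
        d.durations.push(r.duration_ms);
        if d.samples.len() < SAMPLES_PER_TOOL {
            // Truncate samples so one verbose output can't dominate.
            d.samples.push(r.output.chars().take(SAMPLE_CHARS).collect());
        }
    }
    by_name
}
```

A transcript with ten calls to the same tool thus collapses into one digest entry with counts plus two bounded samples, instead of ten raw payloads.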

Problem

Reflection and transcript-ingest fire often (per session-memory threshold crossing, on transcript close, on segment close). Running them on the same high-tier model the orchestrator uses is wasteful — most of what reflection produces is short, structured summaries that a cheap model handles fine. Two other pressure points compound the cost: raw tool-call history is the dominant token sink, and reflection currently fires on every agent (orchestrator + sub-agents + specialists) even though sub-agent transcripts almost never carry durable user context worth surfacing across chats.

See #1419 for the full motivation and constraints (background-first, swappable cloud vs Ollama, opt-out-able with heuristic fallback).

Solution

  • src/openhuman/learning/summarizer/: trait + cloud-backed impl. ConfiguredSummarizer wraps any Provider and routes via a model hint (hint:fast by default), with a clamped context-window cap so misconfiguration can't produce zero-byte truncation.
  • summarizer::digest: compress_tool_calls groups by tool name, keeps SAMPLES_PER_TOOL (2) bounded samples per tool, truncates outputs to 160 chars. render_tool_digests produces a compact prompt block.
  • learning::reflection: when a summarizer is supplied via ReflectionHook::with_summarizer, the reflection LLM call is routed through it (with chars-cap truncation as a safety net). Otherwise the legacy local/cloud routing applies. The reflection prompt now compresses repeated tool calls into digests; the legacy per-call line is preserved when every tool was invoked at most once.
  • agent::harness::session::builder: only registers ReflectionHook when agent_id == "orchestrator". Optionally wraps the routed cloud provider in ConfiguredSummarizer when learning.summarizer.enabled && source == Cloud.
  • agent::harness::session::turn::spawn_transcript_ingestion: early-return when the agent isn't the orchestrator.
  • learning::transcript_ingest::merge: Jaccard-similarity (0.55 threshold) merge step before dedupe/persist. Collapses same-kind near-duplicates, picks the longest snippet as representative, unions provenance message indices, promotes importance to High if any member of the group was High.
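The Stage 2 merge described above can be sketched in a few lines. This is a simplified stand-in for transcript_ingest/merge.rs: the 0.55 threshold and longest-snippet-wins rule come from the PR description, but the greedy single-pass grouping, the whitespace tokenizer, and the omission of stopword filtering and provenance/importance merging are assumptions of this sketch.

```rust
use std::collections::HashSet;

// Token-set Jaccard similarity. The PR additionally filters
// stopwords before comparing; this sketch skips that step.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // two empty candidates are trivially identical
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

// Merge threshold quoted in the PR description.
const MERGE_THRESHOLD: f64 = 0.55;

// Greedy collapse: a candidate joins the first existing group it is
// similar enough to; the longest snippet in a group becomes the
// representative, mirroring the behaviour described above.
fn collapse(candidates: Vec<String>) -> Vec<String> {
    let mut reps: Vec<String> = Vec::new();
    for c in candidates {
        if let Some(i) = reps
            .iter()
            .position(|r| jaccard(r, &c) >= MERGE_THRESHOLD)
        {
            if c.len() > reps[i].len() {
                reps[i] = c; // keep the longest paraphrase
            }
        } else {
            reps.push(c);
        }
    }
    reps
}
```

Two paraphrases of the same preference share most tokens (Jaccard well above 0.55) and collapse into one candidate, while an unrelated decision survives as its own entry.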

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — new modules ship with unit tests; reflection routing has an end-to-end test through the summarizer trait.
  • Coverage matrix updated — N/A: behaviour-only change extending existing learning subsystem coverage rows.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related
  • No new external network dependencies introduced
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: background pipeline; no release-cut surface.
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime: Rust core only (desktop). No frontend, no Tauri shell changes. Hooks remain tokio::spawn fire-and-forget, so orchestrator turn latency is unaffected.
  • Compatibility: opt-in. With learning.summarizer.enabled = false (default) the only behavioural changes are (a) reflection prompts use per-tool digests when a turn has repeated calls, and (b) sub-agents no longer trigger reflection / transcript-ingest. The heuristic fallback from #1406 stays intact for offline users.
  • Cost: when enabled, frequent reflection / transcript-ingest passes route to the cheap-tier model via hint:fast instead of the orchestrator's reasoning model.

Related

Closes #1419
Builds on #1406 (heuristic transcript-to-memory ingestion pipeline).

Test plan

  • cargo test --lib openhuman::learning -- --test-threads=1 — 132 passed
  • cargo check --manifest-path Cargo.toml — clean
  • cargo check --manifest-path app/src-tauri/Cargo.toml — clean
  • cargo fmt — clean

Summary by CodeRabbit

  • New Features

    • Added optional dedicated summarizer configuration for learning and reflection operations with cloud or local routing options.
    • Implemented tool call compression in reflection prompts for improved efficiency.
    • Added transcript near-duplicate detection and merging to reduce redundant memory extraction.
  • Bug Fixes

    • Restricted reflection and transcript ingestion to orchestrator agents, preventing unintended execution on sub-agents.

Review Change Stack

…reflection (tinyhumansai#1419)

- New `SummarizerProvider` trait + `ConfiguredSummarizer` wrapper so
  reflection + transcript-ingest can run on a cheap dedicated model
  instead of the orchestrator-tier provider. Opt-in via
  `learning.summarizer.enabled`; heuristic fallback from tinyhumansai#1406 stays
  intact for offline users.
- Tool-call digest layer (`learning::summarizer::digest`) collapses
  multi-call tool history into per-tool aggregates (count, success
  rate, p95 duration, bounded input/output samples) so the smaller
  summarizer context window never sees raw payloads.
- Reflection prompt now compresses repeated tool calls into digests
  before the LLM call; preserves the legacy per-call rendering when
  every tool was invoked at most once (keeps existing assertions
  green).
- Reflection hook + transcript-ingestion spawn are gated to the
  orchestrator agent only — sub-agent transcripts no longer trigger
  reflection or conversational-memory writes.
- Two-stage transcript ingest: heuristic Stage 1 (unchanged) +
  Stage 2 near-duplicate merge over Jaccard similarity on tokens.
  Collapses paraphrases of the same preference / decision before
  dedupe + persist.
- Telemetry: summarizer dispatches log label, model hint, input
  chars, cap, fill ratio, latency. Merge pass logs compressed-pair
  counts.
- Tests: per-tool digest grouping/truncation, summarizer routing
  end-to-end, prompt compression on repeated tool calls, near-duplicate
  preference collapse with provenance merging.
@senamakel senamakel requested a review from a team May 12, 2026 06:41
@coderabbitai (Bot) commented May 12, 2026

📝 Walkthrough

This PR implements a dedicated cheap summarizer LLM for agent learning: it adds a pluggable SummarizerProvider abstraction configurable per workspace, compresses tool-call history into per-tool digests before reflection/ingestion processes transcripts, and restricts both reflection and transcript ingestion to the orchestrator agent to reduce low-signal extractions from sub-agents.

Changes

Dedicated Summarizer for Reflection & Transcript Ingest

  • Summarizer configuration contract (src/openhuman/config/schema/learning.rs, src/openhuman/config/schema/mod.rs, src/openhuman/config/mod.rs, src/openhuman/learning/mod.rs): SummarizerConfig struct and SummarizerSource enum are added to the learning schema (cloud vs. local, disabled by default, "hint:fast" model, 6000-char context) and re-exported through the config and learning module hierarchies.
  • SummarizerProvider trait and ConfiguredSummarizer implementation (src/openhuman/learning/summarizer/mod.rs, src/openhuman/learning/summarizer/tests.rs): the SummarizerProvider trait defines a context-window budget, label, and one-shot async prompt API; ConfiguredSummarizer wraps an existing provider, dispatches via simple_chat, logs input/output sizes and latency, and clamps the context window to a minimum.
  • Tool-call digest compression (src/openhuman/learning/summarizer/digest.rs): ToolCallDigest aggregates tool name, invocation/success counts, p95 duration, and bounded truncated input/output samples; compress_tool_calls groups by tool name deterministically and accumulates statistics; render_tool_digests formats digests into compact bullet-style prompt blocks.
  • Reflection hook summarizer integration (src/openhuman/learning/reflection.rs, src/openhuman/learning/reflection_tests.rs): ReflectionHook gains an optional SummarizerProvider field via the new with_summarizer constructor; reflection prompt generation compresses repeated tool calls into per-tool digests when appropriate; run_reflection routes through the summarizer when present, truncating to its context window before dispatching.
  • Transcript ingest near-duplicate merge stage (src/openhuman/learning/transcript_ingest/merge.rs): the new stage-2 merge module collapses near-duplicate MemoryCandidates using a Jaccard-similarity threshold over tokenized, stopword-filtered content; it selects the longest content as representative and merges provenance indices.
  • Transcript ingest API and merge stage wiring (src/openhuman/learning/transcript_ingest/mod.rs): new public APIs ingest_transcript_path_with_summarizer and ingest_session_transcript_with_summarizer accept an optional SummarizerProvider and pass it into the stage-2 merge; existing functions delegate to the new summarizer-aware versions with None.
  • Session agent orchestrator-only gating (src/openhuman/agent/harness/session/builder.rs, src/openhuman/agent/harness/session/turn.rs): reflection hook registration is gated on an is_orchestrator agent check and conditionally wires the optional dedicated summarizer provider (resolving/creating a routed provider when the cloud source is enabled); transcript ingestion spawn is gated to the orchestrator agent only. Both log debug messages when skipped.
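Pulling the configuration facts above together, the opt-in knob likely looks something like the following. This is a hypothetical sketch: the values (disabled by default, cloud/local source, "hint:fast", 6000 chars) come from the walkthrough, but the exact key names and section path are assumptions.

```toml
# Hypothetical shape of the summarizer config; key names are assumed
# from the walkthrough, not verified against the schema.
[learning.summarizer]
enabled = true            # defaults to false (heuristic-only path)
source = "cloud"          # "cloud" | "local"
model_hint = "hint:fast"  # routes to the cheap-tier model
max_context_chars = 6000  # clamped to at least 1024 downstream
```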

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1406: Predecessor transcript-ingest heuristic pipeline that this PR extends with LLM-driven extraction via the new stage-2 merge and summarizer APIs.
  • tinyhumansai/openhuman#1344: Modifies agent session builder reflection wiring to inject reflection chunks; this PR modifies the same builder path to gate reflection to orchestrator and wire optional summarizer provider.
  • tinyhumansai/openhuman#1460: Introduces summarizer types and config that align with and extend the new summarizer abstraction, provider, and configuration introduced in this PR.

Poem

🐰 A rabbit hops through transcripts, snippets, tools—
Compressing calls with summarizer rules,
While orchestrators alone may dream and grow,
Sub-agents stay silent, swift, and low.

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change (summarizer LLM, tool-call digests, orchestrator-only reflection) and aligns with the primary objectives in linked issue #1419.
  • Linked Issues Check: ✅ Passed. All major objectives from issue #1419 are met: SummarizerProvider trait and ConfiguredSummarizer wrapper added [summarizer/mod.rs]; tool-call compression via compress_tool_calls and render_tool_digests implemented [summarizer/digest.rs]; reflection/ingest limited to the orchestrator agent [builder.rs, turn.rs]; background-first behaviour preserved with fire-and-forget hooks; heuristic fallback intact when no summarizer is configured; telemetry logs added for summarizer dispatch/latency/context fill; comprehensive test coverage added [reflection_tests.rs, summarizer/tests.rs].
  • Out of Scope Changes Check: ✅ Passed. All code changes are directly scoped to the objectives in issue #1419: new summarizer infrastructure, tool-call digest compression, orchestrator gating, the transcript-ingest near-duplicate merge stage, and observability. Configuration schema updates (learning.rs, schema/mod.rs) and re-export adjustments (config/mod.rs, learning/mod.rs) properly support the feature. No unrelated changes detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.


@coderabbitai (Bot) left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/session/builder.rs`:
- Around line 825-857: The current builder only constructs a SummarizerProvider
when learning.summarizer.source == SummarizerSource::Cloud, leaving a
local-enabled config ineffective because ReflectionHook::run_reflection checks
self.summarizer.is_some(); update the builder to either (A) instantiate and
assign a local implementation of crate::openhuman::learning::SummarizerProvider
when config.learning.summarizer.source == SummarizerSource::Local (e.g., wrap
the existing local summarizer implementation used by ReflectionHook into an
Arc<dyn SummarizerProvider> similar to ConfiguredSummarizer) or (B) explicitly
reject/return an Err when config.learning.summarizer.enabled == true and source
== Local so the config cannot silently be no-op; modify the Summarizer creation
block (the let summarizer: Option<Arc<dyn
crate::openhuman::learning::SummarizerProvider>> = ...) to implement one of
these behaviors so ReflectionHook::run_reflection sees a Some(...) for local
setups or the config fails fast.

In `@src/openhuman/learning/summarizer/digest.rs`:
- Around line 62-99: The code currently records max durations in max_dur and
assigns that to ToolCallDigest.p95_duration_ms; instead collect a Vec<u64> of
durations per tool while iterating tool_calls (e.g., create a HashMap<String,
Vec<u64>> durations_by_name), push record.duration_ms into it for each record,
then after the loop compute the 95th percentile per tool by sorting the vector
and selecting the value at index = ((n as f64 * 0.95).ceil() as
usize).saturating_sub(1) (or equivalent) and assign that value to
entry.p95_duration_ms on the corresponding ToolCallDigest; remove the max_dur
logic and ensure types match u64 when assigning to p95_duration_ms.
- Around line 83-87: The code is pushing raw serialized tool arguments
(record.arguments.to_string()) into entry.sample_inputs which can leak secrets;
instead sanitize/redact sensitive fields before truncation and pushing. Update
the logic around the sample push (where
entry.sample_inputs.push(truncate(&record.arguments.to_string(), ...))) to call
a sanitizer function (e.g., sanitize_arguments or redact_sensitive_fields) that
removes or masks known secret keys/patterns (authorization, api_key,
access_token, bearer tokens, emails, long hex strings) and returns a safe
string, then pass that sanitized string into truncate and push; ensure the
sanitizer is deterministic and documented and reuse it anywhere else arguments
are sampled.

In `@src/openhuman/learning/summarizer/mod.rs`:
- Around line 18-145: Move the operational summarizer implementation out of
mod.rs into a new ops.rs (or summarizer_ops.rs) and keep mod.rs export-focused:
leave the trait SummarizerProvider, public constants
MIN_SUMMARIZER_CONTEXT_CHARS and DEFAULT_SUMMARIZER_CONTEXT_CHARS, and pub use
exports in mod.rs, then relocate ConfiguredSummarizer struct, its impl/new() and
the async_trait impl (context_window_chars, label, prompt) plus runtime logging
and tests to the new ops file; update mod.rs to pub mod ops; and adjust any
imports to reference ops::ConfiguredSummarizer (or re-export it from mod.rs) so
callers keep the same public API.

In `@src/openhuman/learning/transcript_ingest/merge.rs`:
- Around line 61-70: collapse_near_duplicates currently only toggles
MergeReport.used_summarizer based on summarizer.is_some() but never actually
calls the SummarizerProvider, so ingest_*_with_summarizer is a no-op; fix by
invoking the summarizer when you collapse/merge a group: where you compute the
merged content for a group inside collapse_near_duplicates (and the similar
block around the other occurrence noted), call summarizer.unwrap().prompt(...)
(or the appropriate SummarizerProvider::prompt method) with the group texts,
replace the merged text with the prompt result, and ensure
MergeReport.used_summarizer reflects actual use only when the prompt is invoked.

📥 Commits

Reviewing files that changed from the base of the PR and between 99d2015 and 015b851.

📒 Files selected for processing (13)
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/config/mod.rs
  • src/openhuman/config/schema/learning.rs
  • src/openhuman/config/schema/mod.rs
  • src/openhuman/learning/mod.rs
  • src/openhuman/learning/reflection.rs
  • src/openhuman/learning/reflection_tests.rs
  • src/openhuman/learning/summarizer/digest.rs
  • src/openhuman/learning/summarizer/mod.rs
  • src/openhuman/learning/summarizer/tests.rs
  • src/openhuman/learning/transcript_ingest/merge.rs
  • src/openhuman/learning/transcript_ingest/mod.rs

Comment on lines +825 to +857 in src/openhuman/agent/harness/session/builder.rs:

    // #1419: opt-in dedicated summarizer. When the cloud
    // source is selected we reuse the same routed provider
    // and just route the call through a different model
    // hint. Local summarizers are handled inside
    // ReflectionHook itself via the legacy local-AI path,
    // so we skip the trait wiring there.
    let summarizer: Option<Arc<dyn crate::openhuman::learning::SummarizerProvider>> =
        if config.learning.summarizer.enabled
            && config.learning.summarizer.source
                == crate::openhuman::config::SummarizerSource::Cloud
        {
            let provider: Arc<dyn crate::openhuman::providers::Provider> =
                match reflection_provider.clone() {
                    Some(p) => p,
                    None => Arc::from(providers::create_routed_provider(
                        config.api_url.as_deref(),
                        config.api_key.as_deref(),
                        &config.reliability,
                        &config.model_routes,
                        &model_name,
                    )?),
                };
            Some(Arc::new(
                crate::openhuman::learning::ConfiguredSummarizer::new(
                    provider,
                    config.learning.summarizer.model_hint.clone(),
                    config.learning.summarizer.max_context_chars,
                    "reflection-summarizer",
                ),
            ))
        } else {
            None
        };

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Local summarizer config is still a no-op.

This only builds a dedicated SummarizerProvider for SummarizerSource::Cloud, but ReflectionHook::run_reflection only takes the dedicated path when self.summarizer.is_some(). With learning.summarizer.enabled = true and source = Local, reflection falls back to the legacy route and ignores the summarizer model hint/context cap entirely. Please either wire a local SummarizerProvider here or reject that config explicitly so it cannot silently do nothing.
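A minimal sketch of option (B), rejecting the silently-ignored combination at config-validation time. The enum/struct here are stand-ins for the real schema types, and the error message is illustrative:

```rust
// Illustrative fail-fast check; the real SummarizerSource and
// SummarizerConfig live in openhuman's config schema.
#[derive(PartialEq)]
enum SummarizerSource { Cloud, Local }

struct SummarizerConfig {
    enabled: bool,
    source: SummarizerSource,
}

fn validate(cfg: &SummarizerConfig) -> Result<(), String> {
    if cfg.enabled && cfg.source == SummarizerSource::Local {
        // Better to fail fast than silently fall back to legacy routing.
        return Err(
            "learning.summarizer: source = \"local\" is not wired to a \
             SummarizerProvider yet; use \"cloud\" or disable the summarizer"
                .to_string(),
        );
    }
    Ok(())
}
```

Option (A), wiring a local provider, is the more complete fix but needs the existing local summarizer wrapped behind the trait, as the comment above outlines.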


Comment on lines +62 to +99 in src/openhuman/learning/summarizer/digest.rs:

    let mut max_dur: std::collections::HashMap<String, u64> = std::collections::HashMap::new();

    for record in tool_calls {
        let entry = by_name
            .entry(record.name.clone())
            .or_insert_with(|| ToolCallDigest {
                name: record.name.clone(),
                count: 0,
                success_count: 0,
                p95_duration_ms: 0,
                sample_inputs: Vec::new(),
                sample_outputs: Vec::new(),
            });
        entry.count += 1;
        if record.success {
            entry.success_count += 1;
        }
        let cap = max_dur.entry(record.name.clone()).or_insert(0);
        if record.duration_ms > *cap {
            *cap = record.duration_ms;
        }
        if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_inputs
                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
        }
        if entry.sample_outputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_outputs
                .push(truncate(&record.output_summary, SAMPLE_OUTPUT_CHARS));
        }
    }

    for (name, dur) in max_dur {
        if let Some(entry) = by_name.get_mut(&name) {
            entry.p95_duration_ms = dur;
        }
    }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

p95_duration_ms is currently computed as max, not p95.

Line 79 and Line 95 make this a max-duration metric for every tool, which mislabels telemetry/prompt stats and can skew downstream reflection logic.

💡 Proposed fix (compute true p95 per tool)
-    let mut max_dur: std::collections::HashMap<String, u64> = std::collections::HashMap::new();
+    let mut durations: std::collections::HashMap<String, Vec<u64>> =
+        std::collections::HashMap::new();
@@
-        let cap = max_dur.entry(record.name.clone()).or_insert(0);
-        if record.duration_ms > *cap {
-            *cap = record.duration_ms;
-        }
+        durations
+            .entry(record.name.clone())
+            .or_default()
+            .push(record.duration_ms);
@@
-    for (name, dur) in max_dur {
+    for (name, mut vals) in durations {
+        vals.sort_unstable();
+        let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
+        let p95 = vals[idx.min(vals.len() - 1)];
         if let Some(entry) = by_name.get_mut(&name) {
-            entry.p95_duration_ms = dur;
+            entry.p95_duration_ms = p95;
         }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

    let mut durations: std::collections::HashMap<String, Vec<u64>> =
        std::collections::HashMap::new();

    for record in tool_calls {
        let entry = by_name
            .entry(record.name.clone())
            .or_insert_with(|| ToolCallDigest {
                name: record.name.clone(),
                count: 0,
                success_count: 0,
                p95_duration_ms: 0,
                sample_inputs: Vec::new(),
                sample_outputs: Vec::new(),
            });
        entry.count += 1;
        if record.success {
            entry.success_count += 1;
        }
        durations
            .entry(record.name.clone())
            .or_default()
            .push(record.duration_ms);
        if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_inputs
                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
        }
        if entry.sample_outputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_outputs
                .push(truncate(&record.output_summary, SAMPLE_OUTPUT_CHARS));
        }
    }

    for (name, mut vals) in durations {
        vals.sort_unstable();
        let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
        let p95 = vals[idx.min(vals.len() - 1)];
        if let Some(entry) = by_name.get_mut(&name) {
            entry.p95_duration_ms = p95;
        }
    }
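The nearest-rank index formula used in the proposed fix can be sanity-checked in isolation. A standalone sketch (the function name is hypothetical, not the PR's code):

```rust
// Nearest-rank p95 over raw durations, using the same index formula
// as the suggested fix: idx = ceil(n * 0.95) - 1 on the sorted values.
fn p95_ms(mut vals: Vec<u64>) -> u64 {
    assert!(!vals.is_empty(), "p95 of an empty sample set is undefined");
    vals.sort_unstable();
    let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
    // The min() guard keeps the index in bounds even if the float
    // arithmetic rounds up past the last element.
    vals[idx.min(vals.len() - 1)]
}
```

With durations 1..=100 this picks the 95th sorted value, and with a single sample it degrades to that sample, which max-tracking coincidentally also returns; the two metrics only diverge once a tool has outlier-heavy multi-call history.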

Comment on lines +83 to +87 in src/openhuman/learning/summarizer/digest.rs:

        if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_inputs
                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
        }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raw tool arguments are copied into digest samples without sanitization.

Line 86 serializes full argument JSON into summarizer prompts. This can leak API keys, bearer tokens, emails, or other sensitive payloads when cloud summarizer routing is enabled.

💡 Proposed hardening (sanitize before sampling)
-            entry
-                .sample_inputs
-                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
+            let safe_input = sanitize_input_sample(&record.arguments.to_string());
+            entry
+                .sample_inputs
+                .push(truncate(&safe_input, SAMPLE_OUTPUT_CHARS));
+fn sanitize_input_sample(raw: &str) -> String {
+    let lower = raw.to_ascii_lowercase();
+    let secret_markers = [
+        "authorization", "bearer ", "token", "api_key", "apikey", "secret", "password", "sk-", "ghp_",
+    ];
+    if secret_markers.iter().any(|m| lower.contains(m)) {
+        "[redacted: potential secret]".to_string()
+    } else {
+        raw.to_string()
+    }
+}
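The marker-based redaction in the proposed hardening is easy to exercise standalone. A sketch with a hypothetical function name; as the comment notes, a production sanitizer would more likely redact individual fields than drop the whole sample:

```rust
// Deterministic, whole-sample redaction keyed on substring markers,
// matching the heuristic in the proposed hardening above.
fn redact_if_suspicious(raw: &str) -> String {
    let lower = raw.to_ascii_lowercase();
    let markers = [
        "authorization", "bearer ", "token", "api_key", "apikey",
        "secret", "password", "sk-", "ghp_",
    ];
    if markers.iter().any(|m| lower.contains(m)) {
        "[redacted: potential secret]".to_string()
    } else {
        raw.to_string()
    }
}
```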

Comment on lines +18 to +145
pub mod digest;

use crate::openhuman::providers::Provider;
use async_trait::async_trait;
use std::sync::Arc;

pub use digest::{compress_tool_calls, render_tool_digests, ToolCallDigest};

/// Minimum context-window cap (in characters) we'll honour. Anything
/// smaller would clip even a one-paragraph reflection prompt, so we
/// floor at this value rather than failing silently.
pub const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;

/// Default character cap for the summarizer's context window. Tuned to
/// fit comfortably inside a small local model (e.g. 4k-token Ollama
/// chat models) after prompt scaffolding and reserved completion space.
pub const DEFAULT_SUMMARIZER_CONTEXT_CHARS: usize = 6_000;

/// Trait for a cheap summarizer model, distinct from the orchestrator
/// provider. Implementations promise:
///
/// - **Cheap and fast** — callers fire this in the background on every
/// threshold crossing, so latency / cost matters more than peak
/// quality.
/// - **Bounded context** — [`context_window_chars`] is the *callable*
/// budget for `prompt`. Callers must compress inputs (see [`digest`])
/// to fit.
/// - **Short outputs** — the trait targets sub-paragraph summaries.
/// `prompt` returns the raw model response; parsing is the caller's
/// responsibility.
#[async_trait]
pub trait SummarizerProvider: Send + Sync {
/// Approximate character budget for `prompt` input. Callers must
/// truncate / digest inputs to stay under this cap; the
/// implementation makes no guarantee about behaviour when the cap
/// is exceeded.
fn context_window_chars(&self) -> usize;

/// Human-readable identifier used in logs / telemetry. Should not
/// include credentials or absolute paths.
fn label(&self) -> &str;

/// Run a one-shot summarization. Returns the model's raw response.
async fn prompt(&self, prompt: &str) -> anyhow::Result<String>;
}

/// Thin [`SummarizerProvider`] backed by an existing
/// [`Provider`]. Used by the cloud path; the local path is wired
/// separately by the caller (see `learning::reflection`).
pub struct ConfiguredSummarizer {
provider: Arc<dyn Provider>,
model_hint: String,
context_window_chars: usize,
label: String,
}

impl ConfiguredSummarizer {
    /// Construct a configured summarizer.
    ///
    /// `context_window_chars` is clamped to at least
    /// [`MIN_SUMMARIZER_CONTEXT_CHARS`] so a misconfigured value cannot
    /// produce zero-byte truncation downstream.
    pub fn new(
        provider: Arc<dyn Provider>,
        model_hint: impl Into<String>,
        context_window_chars: usize,
        label: impl Into<String>,
    ) -> Self {
        Self {
            provider,
            model_hint: model_hint.into(),
            context_window_chars: context_window_chars.max(MIN_SUMMARIZER_CONTEXT_CHARS),
            label: label.into(),
        }
    }
}

#[async_trait]
impl SummarizerProvider for ConfiguredSummarizer {
    fn context_window_chars(&self) -> usize {
        self.context_window_chars
    }

    fn label(&self) -> &str {
        &self.label
    }

    async fn prompt(&self, prompt: &str) -> anyhow::Result<String> {
        let started = std::time::Instant::now();
        let input_chars = prompt.chars().count();
        let fill_ratio = (input_chars as f32) / (self.context_window_chars.max(1) as f32);
        log::debug!(
            "[summarizer] dispatch label={} model={} input_chars={} cap={} fill={:.2}",
            self.label,
            self.model_hint,
            input_chars,
            self.context_window_chars,
            fill_ratio,
        );
        let result = self
            .provider
            .simple_chat(prompt, &self.model_hint, 0.2)
            .await;
        let elapsed = started.elapsed();
        match &result {
            Ok(out) => log::info!(
                "[summarizer] label={} model={} input_chars={} output_chars={} latency_ms={}",
                self.label,
                self.model_hint,
                input_chars,
                out.chars().count(),
                elapsed.as_millis(),
            ),
            Err(e) => log::warn!(
                "[summarizer] label={} model={} input_chars={} latency_ms={} error={e}",
                self.label,
                self.model_hint,
                input_chars,
                elapsed.as_millis(),
            ),
        }
        result
    }
}

#[cfg(test)]
#[path = "tests.rs"]
mod tests;
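The trait's contract says behaviour is unspecified once a prompt exceeds `context_window_chars`, so callers must cut first. A minimal caller-side sketch of that truncation (the helper `truncate_to_chars` is hypothetical, not part of this PR; it just illustrates a char-boundary-safe clamp to the budget):

```rust
/// Sketch (not in the PR): clamp an input to a summarizer's character
/// budget before dispatch, on a valid UTF-8 char boundary.
fn truncate_to_chars(input: &str, cap: usize) -> &str {
    match input.char_indices().nth(cap) {
        // Byte offset of the first char past the cap: slicing there
        // keeps exactly `cap` chars without splitting a code point.
        Some((byte_idx, _)) => &input[..byte_idx],
        None => input, // already within budget
    }
}

fn main() {
    // Mirrors the clamp in `ConfiguredSummarizer::new`: a misconfigured
    // cap is floored rather than failing silently.
    const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;
    let cap = 10usize.max(MIN_SUMMARIZER_CONTEXT_CHARS);
    assert_eq!(cap, 1024);

    // Multi-byte chars count as one char, not several bytes.
    assert_eq!(truncate_to_chars("héllo", 3), "hél");
    assert_eq!(truncate_to_chars("short", cap), "short");
}
```

Counting chars rather than bytes matches the `input_chars` telemetry above, at the cost of an O(n) scan per dispatch.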
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Split the operational summarizer code out of mod.rs.

This new module puts the trait, implementation, constants, and runtime logging directly in src/openhuman/learning/summarizer/mod.rs. It would be better to keep mod.rs export-focused and move the concrete code into sibling files now, while the surface is still small.

As per coding guidelines, "src/openhuman/**/mod.rs: Keep domain mod.rs export-focused with light exports; place operational code in ops.rs, store.rs, types.rs, etc."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/learning/summarizer/mod.rs` around lines 18 - 145, Move the
operational summarizer implementation out of mod.rs into a new ops.rs (or
summarizer_ops.rs) and keep mod.rs export-focused: leave the trait
SummarizerProvider, public constants MIN_SUMMARIZER_CONTEXT_CHARS and
DEFAULT_SUMMARIZER_CONTEXT_CHARS, and pub use exports in mod.rs, then relocate
ConfiguredSummarizer struct, its impl/new() and the async_trait impl
(context_window_chars, label, prompt) plus runtime logging and tests to the new
ops file; update mod.rs to pub mod ops; and adjust any imports to reference
ops::ConfiguredSummarizer (or re-export it from mod.rs) so callers keep the same
public API.
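Under that guideline, the proposed split could look like the following sketch; the `ops.rs` file name and the re-export are the reviewer's suggestion, not code from this PR:

```rust
// src/openhuman/learning/summarizer/mod.rs, kept export-focused.
// Lightweight surface (constants, trait) stays here.

pub mod ops;

// Re-export so callers keep the same public API after the move.
pub use ops::ConfiguredSummarizer;

pub const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;
pub const DEFAULT_SUMMARIZER_CONTEXT_CHARS: usize = 6_000;
```

`ConfiguredSummarizer`, its `impl` blocks, the runtime logging, and the tests would then live in `ops.rs`, leaving `mod.rs` as declarations and re-exports only.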

Comment on lines +61 to +70
pub async fn collapse_near_duplicates(
    candidates: Vec<MemoryCandidate>,
    summarizer: Option<&Arc<dyn SummarizerProvider>>,
) -> (Vec<MemoryCandidate>, MergeReport) {
    let mut report = MergeReport {
        input: candidates.len(),
        output: candidates.len(),
        merged_pairs: 0,
        used_summarizer: summarizer.is_some(),
    };
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The transcript-ingest summarizer path never actually runs.

collapse_near_duplicates only records summarizer.is_some() in used_summarizer; the merge behavior is identical either way and no SummarizerProvider::prompt(...) call happens. That makes ingest_*_with_summarizer a behavioral no-op while the report/logging claims a summarizer was used. Either invoke the summarizer when collapsing a group, or keep this API/reporting purely heuristic until the LLM rewrite path is implemented.

Also applies to: 137-144

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/learning/transcript_ingest/merge.rs` around lines 61 - 70,
collapse_near_duplicates currently only toggles MergeReport.used_summarizer
based on summarizer.is_some() but never actually calls the SummarizerProvider,
so ingest_*_with_summarizer is a no-op; fix by invoking the summarizer when you
collapse/merge a group: where you compute the merged content for a group inside
collapse_near_duplicates (and the similar block around the other occurrence
noted), call summarizer.unwrap().prompt(...) (or the appropriate
SummarizerProvider::prompt method) with the group texts, replace the merged text
with the prompt result, and ensure MergeReport.used_summarizer reflects actual
use only when the prompt is invoked.
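The Stage 2 similarity check the PR description names can be sketched independently of the summarizer wiring. This is an illustrative word-level Jaccard function; the PR's actual tokenization and merge threshold are not shown here, so both are assumptions:

```rust
use std::collections::HashSet;

/// Word-level Jaccard similarity between two candidate texts: size of
/// the word-set intersection over the size of the union.
fn jaccard(a: &str, b: &str) -> f32 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    let inter = sa.intersection(&sb).count();
    let union = sa.union(&sb).count();
    if union == 0 { 0.0 } else { inter as f32 / union as f32 }
}

fn main() {
    // Paraphrases of the same preference score high and would be
    // collapsed into one merged candidate with combined provenance.
    let a = "user prefers dark mode in the editor";
    let b = "user prefers dark mode in every editor";
    assert!(jaccard(a, b) > 0.7);

    // Unrelated candidates score low and survive dedupe separately.
    assert!(jaccard(a, "ship the release on friday") < 0.2);
}
```

Per the review above, a group whose pairwise similarity clears the threshold is where a `SummarizerProvider::prompt` call would rewrite the group texts into one merged candidate, and only then should `used_summarizer` be set.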

@senamakel senamakel marked this pull request as draft May 12, 2026 07:07
Successfully merging this pull request may close these issues.

Optimize agent learnings: summarizer LLM, tool-call compression, orchestrator-only reflections
