feat(learning): summarizer LLM, tool-call digests, orchestrator-only reflection (#1419) #1519

Draft

senamakel wants to merge 1 commit into tinyhumansai:main from senamakel:issue/1419-optimize-agent-learnings-summarizer-llm


Conversation

@senamakel (Member) commented May 12, 2026

Summary

  • New SummarizerProvider trait + ConfiguredSummarizer wrapper. Reflection and transcript-ingest can route through a cheap dedicated model instead of burning orchestrator-tier inference. Opt-in via learning.summarizer.enabled — defaults preserve the heuristic-only path from #1406 (feat(memory): transcript-to-memory ingestion pipeline, issue #1399).
  • Tool-call digest layer collapses raw multi-call tool history into per-tool aggregates (count, success rate, p95 duration, bounded samples) so the summarizer's smaller context window never sees raw payloads.
  • Reflection hook + spawn_transcript_ingestion gated to the user-facing orchestrator only — sub-agent transcripts no longer trigger reflection writes.
  • Two-stage transcript ingest: heuristic stage (unchanged) + Stage 2 near-duplicate merge over Jaccard similarity. Paraphrases of the same preference / decision collapse into one merged candidate with combined provenance before dedupe + persist.
  • Telemetry: summarizer dispatches log label, model hint, input chars, cap, fill ratio, latency. Merge pass logs compressed-pair counts.
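The per-tool aggregation the digest layer performs can be sketched roughly as follows. This is a minimal illustration, not the PR's actual digest.rs: the ToolRecord/Digest names and field set here are assumptions, and the real module also tracks p95 duration and renders a prompt block.

```rust
use std::collections::BTreeMap;

// Illustrative record shape; the PR's real struct lives in
// learning::summarizer::digest and may differ.
struct ToolRecord {
    name: String,
    success: bool,
    duration_ms: u64,
    output: String,
}

#[derive(Default)]
struct Digest {
    count: usize,
    success_count: usize,
    durations: Vec<u64>,
    samples: Vec<String>, // bounded to SAMPLES_PER_TOOL entries
}

const SAMPLES_PER_TOOL: usize = 2;
const SAMPLE_CHARS: usize = 160;

// Group raw tool calls into per-tool aggregates so the summarizer
// never sees raw payloads. BTreeMap keeps iteration deterministic.
fn compress(records: &[ToolRecord]) -> BTreeMap<String, Digest> {
    let mut by_name: BTreeMap<String, Digest> = BTreeMap::new();
    for r in records {
        let d = by_name.entry(r.name.clone()).or_default();
        d.count += 1;
        if r.success {
            d.success_count += 1;
        }
        d.durations.push(r.duration_ms);
        if d.samples.len() < SAMPLES_PER_TOOL {
            // Truncate samples so one verbose output can't dominate.
            d.samples.push(r.output.chars().take(SAMPLE_CHARS).collect());
        }
    }
    by_name
}
```

A transcript with ten calls to the same tool thus collapses into one digest entry with counts plus two bounded samples, instead of ten raw payloads.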

Problem

Reflection and transcript-ingest fire often (per session-memory threshold crossing, on transcript close, on segment close). Running them on the same high-tier model the orchestrator uses is wasteful — most of what reflection produces is short, structured summaries that a cheap model handles fine. Two other pressure points compound the cost: raw tool-call history is the dominant token sink, and reflection currently fires on every agent (orchestrator + sub-agents + specialists) even though sub-agent transcripts almost never carry durable user context worth surfacing across chats.

See #1419 for the full motivation and constraints (background-first, swappable cloud vs Ollama, opt-out-able with heuristic fallback).

Solution

  • src/openhuman/learning/summarizer/: trait + cloud-backed impl. ConfiguredSummarizer wraps any Provider and routes via a model hint (hint:fast by default), with a clamped context-window cap so misconfiguration can't produce zero-byte truncation.
  • summarizer::digest: compress_tool_calls groups by tool name, keeps SAMPLES_PER_TOOL (2) bounded samples per tool, truncates outputs to 160 chars. render_tool_digests produces a compact prompt block.
  • learning::reflection: when a summarizer is supplied via ReflectionHook::with_summarizer, the reflection LLM call is routed through it (with chars-cap truncation as a safety net). Otherwise the legacy local/cloud routing applies. The reflection prompt now compresses repeated tool calls into digests; the legacy per-call line is preserved when every tool was invoked at most once.
  • agent::harness::session::builder: only registers ReflectionHook when agent_id == "orchestrator". Optionally wraps the routed cloud provider in ConfiguredSummarizer when learning.summarizer.enabled && source == Cloud.
  • agent::harness::session::turn::spawn_transcript_ingestion: early-return when the agent isn't the orchestrator.
  • learning::transcript_ingest::merge: Jaccard-similarity (0.55 threshold) merge step before dedupe/persist. Collapses same-kind near-duplicates, picks the longest snippet as representative, unions provenance message indices, promotes importance to High if any member of the group was High.
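The Stage 2 merge described above can be sketched in a few lines. This is a simplified stand-in for transcript_ingest/merge.rs: the 0.55 threshold and longest-snippet-wins rule come from the PR description, but the greedy single-pass grouping, the whitespace tokenizer, and the omission of stopword filtering and provenance/importance merging are assumptions of this sketch.

```rust
use std::collections::HashSet;

// Token-set Jaccard similarity. The PR additionally filters
// stopwords before comparing; this sketch skips that step.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // two empty candidates are trivially identical
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

// Merge threshold quoted in the PR description.
const MERGE_THRESHOLD: f64 = 0.55;

// Greedy collapse: a candidate joins the first existing group it is
// similar enough to; the longest snippet in a group becomes the
// representative, mirroring the behaviour described above.
fn collapse(candidates: Vec<String>) -> Vec<String> {
    let mut reps: Vec<String> = Vec::new();
    for c in candidates {
        if let Some(i) = reps
            .iter()
            .position(|r| jaccard(r, &c) >= MERGE_THRESHOLD)
        {
            if c.len() > reps[i].len() {
                reps[i] = c; // keep the longest paraphrase
            }
        } else {
            reps.push(c);
        }
    }
    reps
}
```

Two paraphrases of the same preference share most tokens (Jaccard well above 0.55) and collapse into one candidate, while an unrelated decision survives as its own entry.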

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — new modules ship with unit tests; reflection routing has an end-to-end test through the summarizer trait.
  • Coverage matrix updated — N/A: behaviour-only change extending existing learning subsystem coverage rows.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related
  • No new external network dependencies introduced
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: background pipeline; no release-cut surface.
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime: Rust core only (desktop). No frontend, no Tauri shell changes. Hooks remain tokio::spawn fire-and-forget, so orchestrator turn latency is unaffected.
  • Compatibility: opt-in. With learning.summarizer.enabled = false (default) the only behavioural changes are (a) reflection prompts use per-tool digests when a turn has repeated calls, and (b) sub-agents no longer trigger reflection / transcript-ingest. The heuristic fallback from #1406 stays intact for offline users.
  • Cost: when enabled, frequent reflection / transcript-ingest passes route to the cheap-tier model via hint:fast instead of the orchestrator's reasoning model.

Related

Closes #1419
Builds on #1406 (heuristic transcript-to-memory ingestion pipeline).

Test plan

  • cargo test --lib openhuman::learning -- --test-threads=1 — 132 passed
  • cargo check --manifest-path Cargo.toml — clean
  • cargo check --manifest-path app/src-tauri/Cargo.toml — clean
  • cargo fmt — clean

Summary by CodeRabbit

  • New Features

    • Added optional dedicated summarizer configuration for learning and reflection operations with cloud or local routing options.
    • Implemented tool call compression in reflection prompts for improved efficiency.
    • Added transcript near-duplicate detection and merging to reduce redundant memory extraction.
  • Bug Fixes

    • Restricted reflection and transcript ingestion to orchestrator agents, preventing unintended execution on sub-agents.

Review Change Stack

…reflection (tinyhumansai#1419)

- New `SummarizerProvider` trait + `ConfiguredSummarizer` wrapper so
  reflection + transcript-ingest can run on a cheap dedicated model
  instead of the orchestrator-tier provider. Opt-in via
  `learning.summarizer.enabled`; heuristic fallback from tinyhumansai#1406 stays
  intact for offline users.
- Tool-call digest layer (`learning::summarizer::digest`) collapses
  multi-call tool history into per-tool aggregates (count, success
  rate, p95 duration, bounded input/output samples) so the smaller
  summarizer context window never sees raw payloads.
- Reflection prompt now compresses repeated tool calls into digests
  before the LLM call; preserves the legacy per-call rendering when
  every tool was invoked at most once (keeps existing assertions
  green).
- Reflection hook + transcript-ingestion spawn are gated to the
  orchestrator agent only — sub-agent transcripts no longer trigger
  reflection or conversational-memory writes.
- Two-stage transcript ingest: heuristic Stage 1 (unchanged) +
  Stage 2 near-duplicate merge over Jaccard similarity on tokens.
  Collapses paraphrases of the same preference / decision before
  dedupe + persist.
- Telemetry: summarizer dispatches log label, model hint, input
  chars, cap, fill ratio, latency. Merge pass logs compressed-pair
  counts.
- Tests: per-tool digest grouping/truncation, summarizer routing
  end-to-end, prompt compression on repeated tool calls, near-duplicate
  preference collapse with provenance merging.
@senamakel senamakel requested a review from a team May 12, 2026 06:41
@coderabbitai (Bot) commented May 12, 2026

📝 Walkthrough

This PR implements a dedicated cheap summarizer LLM for agent learning: it adds a pluggable SummarizerProvider abstraction configurable per workspace, compresses tool-call history into per-tool digests before reflection/ingestion processes transcripts, and restricts both reflection and transcript ingestion to the orchestrator agent to reduce low-signal extractions from sub-agents.

Changes

Dedicated Summarizer for Reflection & Transcript Ingest

  • Summarizer configuration contract (src/openhuman/config/schema/learning.rs, src/openhuman/config/schema/mod.rs, src/openhuman/config/mod.rs, src/openhuman/learning/mod.rs): SummarizerConfig struct and SummarizerSource enum are added to the learning schema (cloud vs. local, disabled by default, "hint:fast" model, 6000-char context) and re-exported through the config and learning module hierarchies.
  • SummarizerProvider trait and ConfiguredSummarizer implementation (src/openhuman/learning/summarizer/mod.rs, src/openhuman/learning/summarizer/tests.rs): the SummarizerProvider trait defines a context-window budget, label, and one-shot async prompt API; ConfiguredSummarizer wraps an existing provider, dispatches via simple_chat, logs input/output sizes and latency, and clamps the context window to a minimum.
  • Tool-call digest compression (src/openhuman/learning/summarizer/digest.rs): ToolCallDigest aggregates tool name, invocation/success counts, p95 duration, and bounded truncated input/output samples; compress_tool_calls groups by tool name deterministically and accumulates statistics; render_tool_digests formats digests into compact bullet-style prompt blocks.
  • Reflection hook summarizer integration (src/openhuman/learning/reflection.rs, src/openhuman/learning/reflection_tests.rs): ReflectionHook gains an optional SummarizerProvider field via the new with_summarizer constructor; reflection prompt generation compresses repeated tool calls into per-tool digests when appropriate; run_reflection routes through the summarizer when present, truncating to its context window before dispatching.
  • Transcript ingest near-duplicate merge stage (src/openhuman/learning/transcript_ingest/merge.rs): the new stage-2 merge module collapses near-duplicate MemoryCandidates using a Jaccard-similarity threshold over tokenized, stopword-filtered content; it selects the longest content as representative and merges provenance indices.
  • Transcript ingest API and merge stage wiring (src/openhuman/learning/transcript_ingest/mod.rs): new public APIs ingest_transcript_path_with_summarizer and ingest_session_transcript_with_summarizer accept an optional SummarizerProvider and pass it into the stage-2 merge; existing functions delegate to the new summarizer-aware versions with None.
  • Session agent orchestrator-only gating (src/openhuman/agent/harness/session/builder.rs, src/openhuman/agent/harness/session/turn.rs): reflection hook registration is gated on an is_orchestrator agent check and conditionally wires the optional dedicated summarizer provider (resolving/creating a routed provider when the cloud source is enabled); transcript ingestion spawn is gated to the orchestrator agent only. Both log debug messages when skipped.
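Pulling the configuration facts above together, the opt-in knob likely looks something like the following. This is a hypothetical sketch: the values (disabled by default, cloud/local source, "hint:fast", 6000 chars) come from the walkthrough, but the exact key names and section path are assumptions.

```toml
# Hypothetical shape of the summarizer config; key names are assumed
# from the walkthrough, not verified against the schema.
[learning.summarizer]
enabled = true            # defaults to false (heuristic-only path)
source = "cloud"          # "cloud" | "local"
model_hint = "hint:fast"  # routes to the cheap-tier model
max_context_chars = 6000  # clamped to at least 1024 downstream
```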

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1406: Predecessor transcript-ingest heuristic pipeline that this PR extends with LLM-driven extraction via the new stage-2 merge and summarizer APIs.
  • tinyhumansai/openhuman#1344: Modifies agent session builder reflection wiring to inject reflection chunks; this PR modifies the same builder path to gate reflection to orchestrator and wire optional summarizer provider.
  • tinyhumansai/openhuman#1460: Introduces summarizer types and config that align with and extend the new summarizer abstraction, provider, and configuration introduced in this PR.

Poem

🐰 A rabbit hops through transcripts, snippets, tools—
Compressing calls with summarizer rules,
While orchestrators alone may dream and grow,
Sub-agents stay silent, swift, and low.

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change (summarizer LLM, tool-call digests, orchestrator-only reflection) and aligns with the primary objectives in linked issue #1419.
  • Linked Issues Check: ✅ Passed. All major objectives from issue #1419 are met: SummarizerProvider trait and ConfiguredSummarizer wrapper added [summarizer/mod.rs]; tool-call compression via compress_tool_calls and render_tool_digests implemented [summarizer/digest.rs]; reflection/ingest limited to the orchestrator agent [builder.rs, turn.rs]; background-first behaviour preserved with fire-and-forget hooks; heuristic fallback intact when no summarizer is configured; telemetry logs added for summarizer dispatch/latency/context fill; comprehensive test coverage added [reflection_tests.rs, summarizer/tests.rs].
  • Out of Scope Changes Check: ✅ Passed. All code changes are directly scoped to the objectives in issue #1419: new summarizer infrastructure, tool-call digest compression, orchestrator gating, the transcript-ingest near-duplicate merge stage, and observability. Configuration schema updates (learning.rs, schema/mod.rs) and re-export adjustments (config/mod.rs, learning/mod.rs) properly support the feature. No unrelated changes detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.


@coderabbitai (Bot) left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/session/builder.rs`:
- Around line 825-857: The current builder only constructs a SummarizerProvider
when learning.summarizer.source == SummarizerSource::Cloud, leaving a
local-enabled config ineffective because ReflectionHook::run_reflection checks
self.summarizer.is_some(); update the builder to either (A) instantiate and
assign a local implementation of crate::openhuman::learning::SummarizerProvider
when config.learning.summarizer.source == SummarizerSource::Local (e.g., wrap
the existing local summarizer implementation used by ReflectionHook into an
Arc<dyn SummarizerProvider> similar to ConfiguredSummarizer) or (B) explicitly
reject/return an Err when config.learning.summarizer.enabled == true and source
== Local so the config cannot silently be no-op; modify the Summarizer creation
block (the let summarizer: Option<Arc<dyn
crate::openhuman::learning::SummarizerProvider>> = ...) to implement one of
these behaviors so ReflectionHook::run_reflection sees a Some(...) for local
setups or the config fails fast.

In `@src/openhuman/learning/summarizer/digest.rs`:
- Around line 62-99: The code currently records max durations in max_dur and
assigns that to ToolCallDigest.p95_duration_ms; instead collect a Vec<u64> of
durations per tool while iterating tool_calls (e.g., create a HashMap<String,
Vec<u64>> durations_by_name), push record.duration_ms into it for each record,
then after the loop compute the 95th percentile per tool by sorting the vector
and selecting the value at index = ((n as f64 * 0.95).ceil() as
usize).saturating_sub(1) (or equivalent) and assign that value to
entry.p95_duration_ms on the corresponding ToolCallDigest; remove the max_dur
logic and ensure types match u64 when assigning to p95_duration_ms.
- Around line 83-87: The code is pushing raw serialized tool arguments
(record.arguments.to_string()) into entry.sample_inputs which can leak secrets;
instead sanitize/redact sensitive fields before truncation and pushing. Update
the logic around the sample push (where
entry.sample_inputs.push(truncate(&record.arguments.to_string(), ...))) to call
a sanitizer function (e.g., sanitize_arguments or redact_sensitive_fields) that
removes or masks known secret keys/patterns (authorization, api_key,
access_token, bearer tokens, emails, long hex strings) and returns a safe
string, then pass that sanitized string into truncate and push; ensure the
sanitizer is deterministic and documented and reuse it anywhere else arguments
are sampled.

In `@src/openhuman/learning/summarizer/mod.rs`:
- Around line 18-145: Move the operational summarizer implementation out of
mod.rs into a new ops.rs (or summarizer_ops.rs) and keep mod.rs export-focused:
leave the trait SummarizerProvider, public constants
MIN_SUMMARIZER_CONTEXT_CHARS and DEFAULT_SUMMARIZER_CONTEXT_CHARS, and pub use
exports in mod.rs, then relocate ConfiguredSummarizer struct, its impl/new() and
the async_trait impl (context_window_chars, label, prompt) plus runtime logging
and tests to the new ops file; update mod.rs to pub mod ops; and adjust any
imports to reference ops::ConfiguredSummarizer (or re-export it from mod.rs) so
callers keep the same public API.

In `@src/openhuman/learning/transcript_ingest/merge.rs`:
- Around line 61-70: collapse_near_duplicates currently only toggles
MergeReport.used_summarizer based on summarizer.is_some() but never actually
calls the SummarizerProvider, so ingest_*_with_summarizer is a no-op; fix by
invoking the summarizer when you collapse/merge a group: where you compute the
merged content for a group inside collapse_near_duplicates (and the similar
block around the other occurrence noted), call summarizer.unwrap().prompt(...)
(or the appropriate SummarizerProvider::prompt method) with the group texts,
replace the merged text with the prompt result, and ensure
MergeReport.used_summarizer reflects actual use only when the prompt is invoked.

📥 Commits

Reviewing files that changed from the base of the PR and between 99d2015 and 015b851.

📒 Files selected for processing (13)
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/config/mod.rs
  • src/openhuman/config/schema/learning.rs
  • src/openhuman/config/schema/mod.rs
  • src/openhuman/learning/mod.rs
  • src/openhuman/learning/reflection.rs
  • src/openhuman/learning/reflection_tests.rs
  • src/openhuman/learning/summarizer/digest.rs
  • src/openhuman/learning/summarizer/mod.rs
  • src/openhuman/learning/summarizer/tests.rs
  • src/openhuman/learning/transcript_ingest/merge.rs
  • src/openhuman/learning/transcript_ingest/mod.rs

Comment on lines +825 to +857 in src/openhuman/agent/harness/session/builder.rs:

    // #1419: opt-in dedicated summarizer. When the cloud
    // source is selected we reuse the same routed provider
    // and just route the call through a different model
    // hint. Local summarizers are handled inside
    // ReflectionHook itself via the legacy local-AI path,
    // so we skip the trait wiring there.
    let summarizer: Option<Arc<dyn crate::openhuman::learning::SummarizerProvider>> =
        if config.learning.summarizer.enabled
            && config.learning.summarizer.source
                == crate::openhuman::config::SummarizerSource::Cloud
        {
            let provider: Arc<dyn crate::openhuman::providers::Provider> =
                match reflection_provider.clone() {
                    Some(p) => p,
                    None => Arc::from(providers::create_routed_provider(
                        config.api_url.as_deref(),
                        config.api_key.as_deref(),
                        &config.reliability,
                        &config.model_routes,
                        &model_name,
                    )?),
                };
            Some(Arc::new(
                crate::openhuman::learning::ConfiguredSummarizer::new(
                    provider,
                    config.learning.summarizer.model_hint.clone(),
                    config.learning.summarizer.max_context_chars,
                    "reflection-summarizer",
                ),
            ))
        } else {
            None
        };

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Local summarizer config is still a no-op.

This only builds a dedicated SummarizerProvider for SummarizerSource::Cloud, but ReflectionHook::run_reflection only takes the dedicated path when self.summarizer.is_some(). With learning.summarizer.enabled = true and source = Local, reflection falls back to the legacy route and ignores the summarizer model hint/context cap entirely. Please either wire a local SummarizerProvider here or reject that config explicitly so it cannot silently do nothing.
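A minimal sketch of option (B), rejecting the silently-ignored combination at config-validation time. The enum/struct here are stand-ins for the real schema types, and the error message is illustrative:

```rust
// Illustrative fail-fast check; the real SummarizerSource and
// SummarizerConfig live in openhuman's config schema.
#[derive(PartialEq)]
enum SummarizerSource { Cloud, Local }

struct SummarizerConfig {
    enabled: bool,
    source: SummarizerSource,
}

fn validate(cfg: &SummarizerConfig) -> Result<(), String> {
    if cfg.enabled && cfg.source == SummarizerSource::Local {
        // Better to fail fast than silently fall back to legacy routing.
        return Err(
            "learning.summarizer: source = \"local\" is not wired to a \
             SummarizerProvider yet; use \"cloud\" or disable the summarizer"
                .to_string(),
        );
    }
    Ok(())
}
```

Option (A), wiring a local provider, is the more complete fix but needs the existing local summarizer wrapped behind the trait, as the comment above outlines.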


Comment on lines +62 to +99 in src/openhuman/learning/summarizer/digest.rs:

    let mut max_dur: std::collections::HashMap<String, u64> = std::collections::HashMap::new();

    for record in tool_calls {
        let entry = by_name
            .entry(record.name.clone())
            .or_insert_with(|| ToolCallDigest {
                name: record.name.clone(),
                count: 0,
                success_count: 0,
                p95_duration_ms: 0,
                sample_inputs: Vec::new(),
                sample_outputs: Vec::new(),
            });
        entry.count += 1;
        if record.success {
            entry.success_count += 1;
        }
        let cap = max_dur.entry(record.name.clone()).or_insert(0);
        if record.duration_ms > *cap {
            *cap = record.duration_ms;
        }
        if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_inputs
                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
        }
        if entry.sample_outputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_outputs
                .push(truncate(&record.output_summary, SAMPLE_OUTPUT_CHARS));
        }
    }

    for (name, dur) in max_dur {
        if let Some(entry) = by_name.get_mut(&name) {
            entry.p95_duration_ms = dur;
        }
    }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

p95_duration_ms is currently computed as max, not p95.

Line 79 and Line 95 make this a max-duration metric for every tool, which mislabels telemetry/prompt stats and can skew downstream reflection logic.

💡 Proposed fix (compute true p95 per tool)
-    let mut max_dur: std::collections::HashMap<String, u64> = std::collections::HashMap::new();
+    let mut durations: std::collections::HashMap<String, Vec<u64>> =
+        std::collections::HashMap::new();
@@
-        let cap = max_dur.entry(record.name.clone()).or_insert(0);
-        if record.duration_ms > *cap {
-            *cap = record.duration_ms;
-        }
+        durations
+            .entry(record.name.clone())
+            .or_default()
+            .push(record.duration_ms);
@@
-    for (name, dur) in max_dur {
+    for (name, mut vals) in durations {
+        vals.sort_unstable();
+        let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
+        let p95 = vals[idx.min(vals.len() - 1)];
         if let Some(entry) = by_name.get_mut(&name) {
-            entry.p95_duration_ms = dur;
+            entry.p95_duration_ms = p95;
         }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

    let mut durations: std::collections::HashMap<String, Vec<u64>> =
        std::collections::HashMap::new();

    for record in tool_calls {
        let entry = by_name
            .entry(record.name.clone())
            .or_insert_with(|| ToolCallDigest {
                name: record.name.clone(),
                count: 0,
                success_count: 0,
                p95_duration_ms: 0,
                sample_inputs: Vec::new(),
                sample_outputs: Vec::new(),
            });
        entry.count += 1;
        if record.success {
            entry.success_count += 1;
        }
        durations
            .entry(record.name.clone())
            .or_default()
            .push(record.duration_ms);
        if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_inputs
                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
        }
        if entry.sample_outputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_outputs
                .push(truncate(&record.output_summary, SAMPLE_OUTPUT_CHARS));
        }
    }

    for (name, mut vals) in durations {
        vals.sort_unstable();
        let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
        let p95 = vals[idx.min(vals.len() - 1)];
        if let Some(entry) = by_name.get_mut(&name) {
            entry.p95_duration_ms = p95;
        }
    }
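The nearest-rank index formula used in the proposed fix can be sanity-checked in isolation. A standalone sketch (the function name is hypothetical, not the PR's code):

```rust
// Nearest-rank p95 over raw durations, using the same index formula
// as the suggested fix: idx = ceil(n * 0.95) - 1 on the sorted values.
fn p95_ms(mut vals: Vec<u64>) -> u64 {
    assert!(!vals.is_empty(), "p95 of an empty sample set is undefined");
    vals.sort_unstable();
    let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
    // The min() guard keeps the index in bounds even if the float
    // arithmetic rounds up past the last element.
    vals[idx.min(vals.len() - 1)]
}
```

With durations 1..=100 this picks the 95th sorted value, and with a single sample it degrades to that sample, which max-tracking coincidentally also returns; the two metrics only diverge once a tool has outlier-heavy multi-call history.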

Comment on lines +83 to +87 in src/openhuman/learning/summarizer/digest.rs:

        if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
            entry
                .sample_inputs
                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
        }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Raw tool arguments are copied into digest samples without sanitization.

Line 86 serializes full argument JSON into summarizer prompts. This can leak API keys, bearer tokens, emails, or other sensitive payloads when cloud summarizer routing is enabled.

💡 Proposed hardening (sanitize before sampling)
-            entry
-                .sample_inputs
-                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
+            let safe_input = sanitize_input_sample(&record.arguments.to_string());
+            entry
+                .sample_inputs
+                .push(truncate(&safe_input, SAMPLE_OUTPUT_CHARS));
+fn sanitize_input_sample(raw: &str) -> String {
+    let lower = raw.to_ascii_lowercase();
+    let secret_markers = [
+        "authorization", "bearer ", "token", "api_key", "apikey", "secret", "password", "sk-", "ghp_",
+    ];
+    if secret_markers.iter().any(|m| lower.contains(m)) {
+        "[redacted: potential secret]".to_string()
+    } else {
+        raw.to_string()
+    }
+}
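The marker-based redaction in the proposed hardening is easy to exercise standalone. A sketch with a hypothetical function name; as the comment notes, a production sanitizer would more likely redact individual fields than drop the whole sample:

```rust
// Deterministic, whole-sample redaction keyed on substring markers,
// matching the heuristic in the proposed hardening above.
fn redact_if_suspicious(raw: &str) -> String {
    let lower = raw.to_ascii_lowercase();
    let markers = [
        "authorization", "bearer ", "token", "api_key", "apikey",
        "secret", "password", "sk-", "ghp_",
    ];
    if markers.iter().any(|m| lower.contains(m)) {
        "[redacted: potential secret]".to_string()
    } else {
        raw.to_string()
    }
}
```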

Comment on lines +18 to +145
pub mod digest;

use crate::openhuman::providers::Provider;
use async_trait::async_trait;
use std::sync::Arc;

pub use digest::{compress_tool_calls, render_tool_digests, ToolCallDigest};

/// Minimum context-window cap (in characters) we'll honour. Anything
/// smaller would clip even a one-paragraph reflection prompt, so we
/// floor at this value rather than failing silently.
pub const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;

/// Default character cap for the summarizer's context window. Tuned to
/// fit comfortably inside a small local model (e.g. 4k-token Ollama
/// chat models) after prompt scaffolding and reserved completion space.
pub const DEFAULT_SUMMARIZER_CONTEXT_CHARS: usize = 6_000;

/// Trait for a cheap summarizer model, distinct from the orchestrator
/// provider. Implementations promise:
///
/// - **Cheap and fast** — callers fire this in the background on every
/// threshold crossing, so latency / cost matters more than peak
/// quality.
/// - **Bounded context** — [`context_window_chars`] is the *callable*
/// budget for `prompt`. Callers must compress inputs (see [`digest`])
/// to fit.
/// - **Short outputs** — the trait targets sub-paragraph summaries.
/// `prompt` returns the raw model response; parsing is the caller's
/// responsibility.
#[async_trait]
pub trait SummarizerProvider: Send + Sync {
/// Approximate character budget for `prompt` input. Callers must
/// truncate / digest inputs to stay under this cap; the
/// implementation makes no guarantee about behaviour when the cap
/// is exceeded.
fn context_window_chars(&self) -> usize;

/// Human-readable identifier used in logs / telemetry. Should not
/// include credentials or absolute paths.
fn label(&self) -> &str;

/// Run a one-shot summarization. Returns the model's raw response.
async fn prompt(&self, prompt: &str) -> anyhow::Result<String>;
}

/// Thin [`SummarizerProvider`] backed by an existing
/// [`Provider`]. Used by the cloud path; the local path is wired
/// separately by the caller (see `learning::reflection`).
pub struct ConfiguredSummarizer {
provider: Arc<dyn Provider>,
model_hint: String,
context_window_chars: usize,
label: String,
}

impl ConfiguredSummarizer {
    /// Construct a configured summarizer.
    ///
    /// `context_window_chars` is clamped to at least
    /// [`MIN_SUMMARIZER_CONTEXT_CHARS`] so a misconfigured value cannot
    /// produce zero-byte truncation downstream.
    pub fn new(
        provider: Arc<dyn Provider>,
        model_hint: impl Into<String>,
        context_window_chars: usize,
        label: impl Into<String>,
    ) -> Self {
        Self {
            provider,
            model_hint: model_hint.into(),
            context_window_chars: context_window_chars.max(MIN_SUMMARIZER_CONTEXT_CHARS),
            label: label.into(),
        }
    }
}

#[async_trait]
impl SummarizerProvider for ConfiguredSummarizer {
    fn context_window_chars(&self) -> usize {
        self.context_window_chars
    }

    fn label(&self) -> &str {
        &self.label
    }

    async fn prompt(&self, prompt: &str) -> anyhow::Result<String> {
        let started = std::time::Instant::now();
        let input_chars = prompt.chars().count();
        let fill_ratio = (input_chars as f32) / (self.context_window_chars.max(1) as f32);
        log::debug!(
            "[summarizer] dispatch label={} model={} input_chars={} cap={} fill={:.2}",
            self.label,
            self.model_hint,
            input_chars,
            self.context_window_chars,
            fill_ratio,
        );
        let result = self
            .provider
            .simple_chat(prompt, &self.model_hint, 0.2)
            .await;
        let elapsed = started.elapsed();
        match &result {
            Ok(out) => log::info!(
                "[summarizer] label={} model={} input_chars={} output_chars={} latency_ms={}",
                self.label,
                self.model_hint,
                input_chars,
                out.chars().count(),
                elapsed.as_millis(),
            ),
            Err(e) => log::warn!(
                "[summarizer] label={} model={} input_chars={} latency_ms={} error={e}",
                self.label,
                self.model_hint,
                input_chars,
                elapsed.as_millis(),
            ),
        }
        result
    }
}

#[cfg(test)]
#[path = "tests.rs"]
mod tests;
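The trait's contract says behaviour is unspecified once a prompt exceeds `context_window_chars`, so callers must cut first. A minimal caller-side sketch of that truncation (the helper `truncate_to_chars` is hypothetical, not part of this PR; it just illustrates a char-boundary-safe clamp to the budget):

```rust
/// Sketch (not in the PR): clamp an input to a summarizer's character
/// budget before dispatch, on a valid UTF-8 char boundary.
fn truncate_to_chars(input: &str, cap: usize) -> &str {
    match input.char_indices().nth(cap) {
        // Byte offset of the first char past the cap: slicing there
        // keeps exactly `cap` chars without splitting a code point.
        Some((byte_idx, _)) => &input[..byte_idx],
        None => input, // already within budget
    }
}

fn main() {
    // Mirrors the clamp in `ConfiguredSummarizer::new`: a misconfigured
    // cap is floored rather than failing silently.
    const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;
    let cap = 10usize.max(MIN_SUMMARIZER_CONTEXT_CHARS);
    assert_eq!(cap, 1024);

    // Multi-byte chars count as one char, not several bytes.
    assert_eq!(truncate_to_chars("héllo", 3), "hél");
    assert_eq!(truncate_to_chars("short", cap), "short");
}
```

Counting chars rather than bytes matches the `input_chars` telemetry above, at the cost of an O(n) scan per dispatch.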
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Split the operational summarizer code out of mod.rs.

This new module puts the trait, implementation, constants, and runtime logging directly in src/openhuman/learning/summarizer/mod.rs. It would be better to keep mod.rs export-focused and move the concrete code into sibling files now, while the surface is still small.

As per coding guidelines, "src/openhuman/**/mod.rs: Keep domain mod.rs export-focused with light exports; place operational code in ops.rs, store.rs, types.rs, etc."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/learning/summarizer/mod.rs` around lines 18 - 145, Move the
operational summarizer implementation out of mod.rs into a new ops.rs (or
summarizer_ops.rs) and keep mod.rs export-focused: leave the trait
SummarizerProvider, public constants MIN_SUMMARIZER_CONTEXT_CHARS and
DEFAULT_SUMMARIZER_CONTEXT_CHARS, and pub use exports in mod.rs, then relocate
ConfiguredSummarizer struct, its impl/new() and the async_trait impl
(context_window_chars, label, prompt) plus runtime logging and tests to the new
ops file; update mod.rs to pub mod ops; and adjust any imports to reference
ops::ConfiguredSummarizer (or re-export it from mod.rs) so callers keep the same
public API.
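Under that guideline, the proposed split could look like the following sketch; the `ops.rs` file name and the re-export are the reviewer's suggestion, not code from this PR:

```rust
// src/openhuman/learning/summarizer/mod.rs, kept export-focused.
// Lightweight surface (constants, trait) stays here.

pub mod ops;

// Re-export so callers keep the same public API after the move.
pub use ops::ConfiguredSummarizer;

pub const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;
pub const DEFAULT_SUMMARIZER_CONTEXT_CHARS: usize = 6_000;
```

`ConfiguredSummarizer`, its `impl` blocks, the runtime logging, and the tests would then live in `ops.rs`, leaving `mod.rs` as declarations and re-exports only.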

Comment on lines +61 to +70
pub async fn collapse_near_duplicates(
    candidates: Vec<MemoryCandidate>,
    summarizer: Option<&Arc<dyn SummarizerProvider>>,
) -> (Vec<MemoryCandidate>, MergeReport) {
    let mut report = MergeReport {
        input: candidates.len(),
        output: candidates.len(),
        merged_pairs: 0,
        used_summarizer: summarizer.is_some(),
    };
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The transcript-ingest summarizer path never actually runs.

collapse_near_duplicates only records summarizer.is_some() in used_summarizer; the merge behavior is identical either way and no SummarizerProvider::prompt(...) call happens. That makes ingest_*_with_summarizer a behavioral no-op while the report/logging claims a summarizer was used. Either invoke the summarizer when collapsing a group, or keep this API/reporting purely heuristic until the LLM rewrite path is implemented.

Also applies to: 137-144

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/learning/transcript_ingest/merge.rs` around lines 61 - 70,
collapse_near_duplicates currently only toggles MergeReport.used_summarizer
based on summarizer.is_some() but never actually calls the SummarizerProvider,
so ingest_*_with_summarizer is a no-op; fix by invoking the summarizer when you
collapse/merge a group: where you compute the merged content for a group inside
collapse_near_duplicates (and the similar block around the other occurrence
noted), call summarizer.unwrap().prompt(...) (or the appropriate
SummarizerProvider::prompt method) with the group texts, replace the merged text
with the prompt result, and ensure MergeReport.used_summarizer reflects actual
use only when the prompt is invoked.
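The Stage 2 similarity check the PR description names can be sketched independently of the summarizer wiring. This is an illustrative word-level Jaccard function; the PR's actual tokenization and merge threshold are not shown here, so both are assumptions:

```rust
use std::collections::HashSet;

/// Word-level Jaccard similarity between two candidate texts: size of
/// the word-set intersection over the size of the union.
fn jaccard(a: &str, b: &str) -> f32 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    let inter = sa.intersection(&sb).count();
    let union = sa.union(&sb).count();
    if union == 0 { 0.0 } else { inter as f32 / union as f32 }
}

fn main() {
    // Paraphrases of the same preference score high and would be
    // collapsed into one merged candidate with combined provenance.
    let a = "user prefers dark mode in the editor";
    let b = "user prefers dark mode in every editor";
    assert!(jaccard(a, b) > 0.7);

    // Unrelated candidates score low and survive dedupe separately.
    assert!(jaccard(a, "ship the release on friday") < 0.2);
}
```

Per the review above, a group whose pairwise similarity clears the threshold is where a `SummarizerProvider::prompt` call would rewrite the group texts into one merged candidate, and only then should `used_summarizer` be set.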

@senamakel senamakel marked this pull request as draft May 12, 2026 07:07
Successfully merging this pull request may close these issues.

Optimize agent learnings: summarizer LLM, tool-call compression, orchestrator-only reflections
