feat(learning): summarizer LLM, tool-call digests, orchestrator-only reflection (#1419) #1519
Conversation
…reflection (tinyhumansai#1419)
- New `SummarizerProvider` trait + `ConfiguredSummarizer` wrapper so reflection + transcript-ingest can run on a cheap dedicated model instead of the orchestrator-tier provider. Opt-in via `learning.summarizer.enabled`; the heuristic fallback from tinyhumansai#1406 stays intact for offline users.
- Tool-call digest layer (`learning::summarizer::digest`) collapses multi-call tool history into per-tool aggregates (count, success rate, p95 duration, bounded input/output samples) so the smaller summarizer context window never sees raw payloads.
- Reflection prompt now compresses repeated tool calls into digests before the LLM call; preserves the legacy per-call rendering when every tool was invoked at most once (keeps existing assertions green).
- Reflection hook + transcript-ingestion spawn are gated to the orchestrator agent only — sub-agent transcripts no longer trigger reflection or conversational-memory writes.
- Two-stage transcript ingest: heuristic Stage 1 (unchanged) + Stage 2 near-duplicate merge over Jaccard similarity on tokens. Collapses paraphrases of the same preference / decision before dedupe + persist.
- Telemetry: summarizer dispatches log label, model hint, input chars, cap, fill ratio, and latency. The merge pass logs compressed-pair counts.
- Tests: per-tool digest grouping/truncation, summarizer routing end-to-end, prompt compression on repeated tool calls, near-duplicate preference collapse with provenance merging.
📝 Walkthrough
This PR implements a dedicated cheap summarizer LLM for agent learning, adding a pluggable `SummarizerProvider` abstraction.
Changes: Dedicated Summarizer for Reflection & Transcript Ingest
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent/harness/session/builder.rs`:
- Around line 825-857: The current builder only constructs a SummarizerProvider
when learning.summarizer.source == SummarizerSource::Cloud, leaving a
local-enabled config ineffective because ReflectionHook::run_reflection checks
self.summarizer.is_some(); update the builder to either (A) instantiate and
assign a local implementation of crate::openhuman::learning::SummarizerProvider
when config.learning.summarizer.source == SummarizerSource::Local (e.g., wrap
the existing local summarizer implementation used by ReflectionHook into an
Arc<dyn SummarizerProvider> similar to ConfiguredSummarizer) or (B) explicitly
reject/return an Err when config.learning.summarizer.enabled == true and source
== Local so the config cannot silently be no-op; modify the Summarizer creation
block (the let summarizer: Option<Arc<dyn
crate::openhuman::learning::SummarizerProvider>> = ...) to implement one of
these behaviors so ReflectionHook::run_reflection sees a Some(...) for local
setups or the config fails fast.
In `@src/openhuman/learning/summarizer/digest.rs`:
- Around line 62-99: The code currently records max durations in max_dur and
assigns that to ToolCallDigest.p95_duration_ms; instead collect a Vec<u64> of
durations per tool while iterating tool_calls (e.g., create a HashMap<String,
Vec<u64>> durations_by_name), push record.duration_ms into it for each record,
then after the loop compute the 95th percentile per tool by sorting the vector
and selecting the value at index = ((n as f64 * 0.95).ceil() as
usize).saturating_sub(1) (or equivalent) and assign that value to
entry.p95_duration_ms on the corresponding ToolCallDigest; remove the max_dur
logic and ensure types match u64 when assigning to p95_duration_ms.
- Around line 83-87: The code is pushing raw serialized tool arguments
(record.arguments.to_string()) into entry.sample_inputs which can leak secrets;
instead sanitize/redact sensitive fields before truncation and pushing. Update
the logic around the sample push (where
entry.sample_inputs.push(truncate(&record.arguments.to_string(), ...))) to call
a sanitizer function (e.g., sanitize_arguments or redact_sensitive_fields) that
removes or masks known secret keys/patterns (authorization, api_key,
access_token, bearer tokens, emails, long hex strings) and returns a safe
string, then pass that sanitized string into truncate and push; ensure the
sanitizer is deterministic and documented and reuse it anywhere else arguments
are sampled.
In `@src/openhuman/learning/summarizer/mod.rs`:
- Around line 18-145: Move the operational summarizer implementation out of
mod.rs into a new ops.rs (or summarizer_ops.rs) and keep mod.rs export-focused:
leave the trait SummarizerProvider, public constants
MIN_SUMMARIZER_CONTEXT_CHARS and DEFAULT_SUMMARIZER_CONTEXT_CHARS, and pub use
exports in mod.rs, then relocate ConfiguredSummarizer struct, its impl/new() and
the async_trait impl (context_window_chars, label, prompt) plus runtime logging
and tests to the new ops file; update mod.rs to pub mod ops; and adjust any
imports to reference ops::ConfiguredSummarizer (or re-export it from mod.rs) so
callers keep the same public API.
In `@src/openhuman/learning/transcript_ingest/merge.rs`:
- Around line 61-70: collapse_near_duplicates currently only toggles
MergeReport.used_summarizer based on summarizer.is_some() but never actually
calls the SummarizerProvider, so ingest_*_with_summarizer is a no-op; fix by
invoking the summarizer when you collapse/merge a group: where you compute the
merged content for a group inside collapse_near_duplicates (and the similar
block around the other occurrence noted), call summarizer.unwrap().prompt(...)
(or the appropriate SummarizerProvider::prompt method) with the group texts,
replace the merged text with the prompt result, and ensure
MergeReport.used_summarizer reflects actual use only when the prompt is invoked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 423267af-bfe6-4a34-a673-62cca549b2ef
📒 Files selected for processing (13)
- src/openhuman/agent/harness/session/builder.rs
- src/openhuman/agent/harness/session/turn.rs
- src/openhuman/config/mod.rs
- src/openhuman/config/schema/learning.rs
- src/openhuman/config/schema/mod.rs
- src/openhuman/learning/mod.rs
- src/openhuman/learning/reflection.rs
- src/openhuman/learning/reflection_tests.rs
- src/openhuman/learning/summarizer/digest.rs
- src/openhuman/learning/summarizer/mod.rs
- src/openhuman/learning/summarizer/tests.rs
- src/openhuman/learning/transcript_ingest/merge.rs
- src/openhuman/learning/transcript_ingest/mod.rs
In `src/openhuman/agent/harness/session/builder.rs`:

```rust
// #1419: opt-in dedicated summarizer. When the cloud
// source is selected we reuse the same routed provider
// and just route the call through a different model
// hint. Local summarizers are handled inside
// ReflectionHook itself via the legacy local-AI path,
// so we skip the trait wiring there.
let summarizer: Option<Arc<dyn crate::openhuman::learning::SummarizerProvider>> =
    if config.learning.summarizer.enabled
        && config.learning.summarizer.source
            == crate::openhuman::config::SummarizerSource::Cloud
    {
        let provider: Arc<dyn crate::openhuman::providers::Provider> =
            match reflection_provider.clone() {
                Some(p) => p,
                None => Arc::from(providers::create_routed_provider(
                    config.api_url.as_deref(),
                    config.api_key.as_deref(),
                    &config.reliability,
                    &config.model_routes,
                    &model_name,
                )?),
            };
        Some(Arc::new(
            crate::openhuman::learning::ConfiguredSummarizer::new(
                provider,
                config.learning.summarizer.model_hint.clone(),
                config.learning.summarizer.max_context_chars,
                "reflection-summarizer",
            ),
        ))
    } else {
        None
    };
```
Local summarizer config is still a no-op.
This only builds a dedicated SummarizerProvider for SummarizerSource::Cloud, but ReflectionHook::run_reflection only takes the dedicated path when self.summarizer.is_some(). With learning.summarizer.enabled = true and source = Local, reflection falls back to the legacy route and ignores the summarizer model hint/context cap entirely. Please either wire a local SummarizerProvider here or reject that config explicitly so it cannot silently do nothing.
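A minimal sketch of option (B), rejecting the unwired combination up front; the error text is illustrative and not from the PR:

```rust
// Hypothetical fail-fast guard placed before the summarizer wiring
// shown above; assumes the enclosing builder returns anyhow::Result.
if config.learning.summarizer.enabled
    && config.learning.summarizer.source == crate::openhuman::config::SummarizerSource::Local
{
    anyhow::bail!(
        "learning.summarizer.source = \"local\" has no SummarizerProvider wiring yet; \
         use source = \"cloud\" or disable learning.summarizer"
    );
}
```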
In `src/openhuman/learning/summarizer/digest.rs`:

```rust
let mut max_dur: std::collections::HashMap<String, u64> = std::collections::HashMap::new();

for record in tool_calls {
    let entry = by_name
        .entry(record.name.clone())
        .or_insert_with(|| ToolCallDigest {
            name: record.name.clone(),
            count: 0,
            success_count: 0,
            p95_duration_ms: 0,
            sample_inputs: Vec::new(),
            sample_outputs: Vec::new(),
        });
    entry.count += 1;
    if record.success {
        entry.success_count += 1;
    }
    let cap = max_dur.entry(record.name.clone()).or_insert(0);
    if record.duration_ms > *cap {
        *cap = record.duration_ms;
    }
    if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
        entry
            .sample_inputs
            .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
    }
    if entry.sample_outputs.len() < SAMPLES_PER_TOOL {
        entry
            .sample_outputs
            .push(truncate(&record.output_summary, SAMPLE_OUTPUT_CHARS));
    }
}

for (name, dur) in max_dur {
    if let Some(entry) = by_name.get_mut(&name) {
        entry.p95_duration_ms = dur;
    }
}
```
p95_duration_ms is currently computed as max, not p95.
Line 79 and Line 95 make this a max-duration metric for every tool, which mislabels telemetry/prompt stats and can skew downstream reflection logic.
💡 Proposed fix (compute true p95 per tool)

```diff
-    let mut max_dur: std::collections::HashMap<String, u64> = std::collections::HashMap::new();
+    let mut durations: std::collections::HashMap<String, Vec<u64>> =
+        std::collections::HashMap::new();
@@
-        let cap = max_dur.entry(record.name.clone()).or_insert(0);
-        if record.duration_ms > *cap {
-            *cap = record.duration_ms;
-        }
+        durations
+            .entry(record.name.clone())
+            .or_default()
+            .push(record.duration_ms);
@@
-    for (name, dur) in max_dur {
+    for (name, mut vals) in durations {
+        vals.sort_unstable();
+        let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
+        let p95 = vals[idx.min(vals.len() - 1)];
         if let Some(entry) = by_name.get_mut(&name) {
-            entry.p95_duration_ms = dur;
+            entry.p95_duration_ms = p95;
         }
     }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let mut durations: std::collections::HashMap<String, Vec<u64>> =
    std::collections::HashMap::new();
for record in tool_calls {
    let entry = by_name
        .entry(record.name.clone())
        .or_insert_with(|| ToolCallDigest {
            name: record.name.clone(),
            count: 0,
            success_count: 0,
            p95_duration_ms: 0,
            sample_inputs: Vec::new(),
            sample_outputs: Vec::new(),
        });
    entry.count += 1;
    if record.success {
        entry.success_count += 1;
    }
    durations
        .entry(record.name.clone())
        .or_default()
        .push(record.duration_ms);
    if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
        entry
            .sample_inputs
            .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
    }
    if entry.sample_outputs.len() < SAMPLES_PER_TOOL {
        entry
            .sample_outputs
            .push(truncate(&record.output_summary, SAMPLE_OUTPUT_CHARS));
    }
}
for (name, mut vals) in durations {
    vals.sort_unstable();
    let idx = (((vals.len() as f64) * 0.95).ceil() as usize).saturating_sub(1);
    let p95 = vals[idx.min(vals.len() - 1)];
    if let Some(entry) = by_name.get_mut(&name) {
        entry.p95_duration_ms = p95;
    }
}
```
In `src/openhuman/learning/summarizer/digest.rs`:

```rust
if entry.sample_inputs.len() < SAMPLES_PER_TOOL {
    entry
        .sample_inputs
        .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
}
```
Raw tool arguments are copied into digest samples without sanitization.
Line 86 serializes full argument JSON into summarizer prompts. This can leak API keys, bearer tokens, emails, or other sensitive payloads when cloud summarizer routing is enabled.
💡 Proposed hardening (sanitize before sampling)

```diff
-            entry
-                .sample_inputs
-                .push(truncate(&record.arguments.to_string(), SAMPLE_OUTPUT_CHARS));
+            let safe_input = sanitize_input_sample(&record.arguments.to_string());
+            entry
+                .sample_inputs
+                .push(truncate(&safe_input, SAMPLE_OUTPUT_CHARS));
```

```diff
+fn sanitize_input_sample(raw: &str) -> String {
+    let lower = raw.to_ascii_lowercase();
+    let secret_markers = [
+        "authorization", "bearer ", "token", "api_key", "apikey", "secret", "password", "sk-", "ghp_",
+    ];
+    if secret_markers.iter().any(|m| lower.contains(m)) {
+        "[redacted: potential secret]".to_string()
+    } else {
+        raw.to_string()
+    }
+}
```
In `src/openhuman/learning/summarizer/mod.rs`:

```rust
pub mod digest;

use crate::openhuman::providers::Provider;
use async_trait::async_trait;
use std::sync::Arc;

pub use digest::{compress_tool_calls, render_tool_digests, ToolCallDigest};

/// Minimum context-window cap (in characters) we'll honour. Anything
/// smaller would clip even a one-paragraph reflection prompt, so we
/// floor at this value rather than failing silently.
pub const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;

/// Default character cap for the summarizer's context window. Tuned to
/// fit comfortably inside a small local model (e.g. 4k-token Ollama
/// chat models) after prompt scaffolding and reserved completion space.
pub const DEFAULT_SUMMARIZER_CONTEXT_CHARS: usize = 6_000;

/// Trait for a cheap summarizer model, distinct from the orchestrator
/// provider. Implementations promise:
///
/// - **Cheap and fast** — callers fire this in the background on every
///   threshold crossing, so latency / cost matters more than peak
///   quality.
/// - **Bounded context** — [`context_window_chars`] is the *callable*
///   budget for `prompt`. Callers must compress inputs (see [`digest`])
///   to fit.
/// - **Short outputs** — the trait targets sub-paragraph summaries.
///   `prompt` returns the raw model response; parsing is the caller's
///   responsibility.
#[async_trait]
pub trait SummarizerProvider: Send + Sync {
    /// Approximate character budget for `prompt` input. Callers must
    /// truncate / digest inputs to stay under this cap; the
    /// implementation makes no guarantee about behaviour when the cap
    /// is exceeded.
    fn context_window_chars(&self) -> usize;

    /// Human-readable identifier used in logs / telemetry. Should not
    /// include credentials or absolute paths.
    fn label(&self) -> &str;

    /// Run a one-shot summarization. Returns the model's raw response.
    async fn prompt(&self, prompt: &str) -> anyhow::Result<String>;
}

/// Thin [`SummarizerProvider`] backed by an existing
/// [`Provider`]. Used by the cloud path; the local path is wired
/// separately by the caller (see `learning::reflection`).
pub struct ConfiguredSummarizer {
    provider: Arc<dyn Provider>,
    model_hint: String,
    context_window_chars: usize,
    label: String,
}

impl ConfiguredSummarizer {
    /// Construct a configured summarizer.
    ///
    /// `context_window_chars` is clamped to at least
    /// [`MIN_SUMMARIZER_CONTEXT_CHARS`] so a misconfigured value cannot
    /// produce zero-byte truncation downstream.
    pub fn new(
        provider: Arc<dyn Provider>,
        model_hint: impl Into<String>,
        context_window_chars: usize,
        label: impl Into<String>,
    ) -> Self {
        Self {
            provider,
            model_hint: model_hint.into(),
            context_window_chars: context_window_chars.max(MIN_SUMMARIZER_CONTEXT_CHARS),
            label: label.into(),
        }
    }
}

#[async_trait]
impl SummarizerProvider for ConfiguredSummarizer {
    fn context_window_chars(&self) -> usize {
        self.context_window_chars
    }

    fn label(&self) -> &str {
        &self.label
    }

    async fn prompt(&self, prompt: &str) -> anyhow::Result<String> {
        let started = std::time::Instant::now();
        let input_chars = prompt.chars().count();
        let fill_ratio = (input_chars as f32) / (self.context_window_chars.max(1) as f32);
        log::debug!(
            "[summarizer] dispatch label={} model={} input_chars={} cap={} fill={:.2}",
            self.label,
            self.model_hint,
            input_chars,
            self.context_window_chars,
            fill_ratio,
        );
        let result = self
            .provider
            .simple_chat(prompt, &self.model_hint, 0.2)
            .await;
        let elapsed = started.elapsed();
        match &result {
            Ok(out) => log::info!(
                "[summarizer] label={} model={} input_chars={} output_chars={} latency_ms={}",
                self.label,
                self.model_hint,
                input_chars,
                out.chars().count(),
                elapsed.as_millis(),
            ),
            Err(e) => log::warn!(
                "[summarizer] label={} model={} input_chars={} latency_ms={} error={e}",
                self.label,
                self.model_hint,
                input_chars,
                elapsed.as_millis(),
            ),
        }
        result
    }
}

#[cfg(test)]
#[path = "tests.rs"]
mod tests;
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Split the operational summarizer code out of mod.rs.
This new module puts the trait, implementation, constants, and runtime logging directly in src/openhuman/learning/summarizer/mod.rs. It would be better to keep mod.rs export-focused and move the concrete code into sibling files now, while the surface is still small.
As per coding guidelines, "src/openhuman/**/mod.rs: Keep domain mod.rs export-focused with light exports; place operational code in ops.rs, store.rs, types.rs, etc."
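A minimal sketch of the split the guideline implies; the file name `ops.rs` follows the cited convention, and everything elided stays as it is today:

```rust
// src/openhuman/learning/summarizer/mod.rs — export-focused after the split
pub mod digest;
pub mod ops;

pub use digest::{compress_tool_calls, render_tool_digests, ToolCallDigest};
pub use ops::ConfiguredSummarizer; // re-export keeps the public API stable

pub const MIN_SUMMARIZER_CONTEXT_CHARS: usize = 1024;
pub const DEFAULT_SUMMARIZER_CONTEXT_CHARS: usize = 6_000;

// The SummarizerProvider trait definition stays here unchanged;
// ConfiguredSummarizer, its impls, runtime logging, and tests move to ops.rs.
```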
In `src/openhuman/learning/transcript_ingest/merge.rs`:

```rust
pub async fn collapse_near_duplicates(
    candidates: Vec<MemoryCandidate>,
    summarizer: Option<&Arc<dyn SummarizerProvider>>,
) -> (Vec<MemoryCandidate>, MergeReport) {
    let mut report = MergeReport {
        input: candidates.len(),
        output: candidates.len(),
        merged_pairs: 0,
        used_summarizer: summarizer.is_some(),
    };
```
The transcript-ingest summarizer path never actually runs.
collapse_near_duplicates only records summarizer.is_some() in used_summarizer; the merge behavior is identical either way and no SummarizerProvider::prompt(...) call happens. That makes ingest_*_with_summarizer a behavioral no-op while the report/logging claims a summarizer was used. Either invoke the summarizer when collapsing a group, or keep this API/reporting purely heuristic until the LLM rewrite path is implemented.
Also applies to: 137-144
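A hedged sketch of the invocation the reviewer is asking for; `group_texts`, `merged`, and the prompt wording are illustrative, and `used_summarizer` is assumed to be initialized to `false`:

```rust
// Inside collapse_near_duplicates, where a near-duplicate group is merged:
if group_texts.len() > 1 {
    if let Some(s) = summarizer {
        let joined = group_texts.join("\n- ");
        let ask = format!(
            "Merge these near-duplicate memory notes into one concise note:\n- {joined}"
        );
        match s.prompt(&ask).await {
            Ok(rewritten) => {
                merged.content = rewritten;
                report.used_summarizer = true; // reflects actual use, not mere presence
            }
            // On error, keep the heuristic representative snippet.
            Err(e) => log::debug!("[merge] summarizer rewrite failed: {e}"),
        }
    }
}
```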
Summary
- `SummarizerProvider` trait + `ConfiguredSummarizer` wrapper. Reflection and transcript-ingest can route through a cheap dedicated model instead of burning orchestrator-tier inference. Opt-in via `learning.summarizer.enabled` — defaults preserve the heuristic-only path from feat(memory): transcript-to-memory ingestion pipeline (#1399) #1406.
- `spawn_transcript_ingestion` gated to the user-facing orchestrator only — sub-agent transcripts no longer trigger reflection writes.
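For orientation, a caller-side sketch of consuming the trait; the function name and truncation strategy are illustrative, not the PR's exact code:

```rust
use std::sync::Arc;

// Illustrative caller (assumes SummarizerProvider is in scope): honour
// the provider's character budget, then fire one summarization call.
async fn summarize_block(
    summarizer: &Arc<dyn SummarizerProvider>,
    digest_block: &str,
) -> anyhow::Result<String> {
    let cap = summarizer.context_window_chars();
    let input: String = digest_block.chars().take(cap).collect();
    summarizer.prompt(&input).await
}
```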
Problem

Reflection and transcript-ingest fire often (per session-memory threshold crossing, on transcript close, on segment close). Running them on the same high-tier model the orchestrator uses is wasteful — most of what reflection produces is short, structured summaries that a cheap model handles fine. Two other pressure points compound the cost: raw tool-call history is the dominant token sink, and reflection currently fires on every agent (orchestrator + sub-agents + specialists) even though sub-agent transcripts almost never carry durable user context worth surfacing across chats.
See #1419 for the full motivation and constraints (background-first, swappable cloud vs Ollama, opt-out-able with heuristic fallback).
Solution
- `src/openhuman/learning/summarizer/`: trait + cloud-backed impl. `ConfiguredSummarizer` wraps any `Provider` and routes via a model hint (`hint:fast` by default), with a clamped context-window cap so misconfiguration can't produce zero-byte truncation.
- `summarizer::digest`: `compress_tool_calls` groups by tool name, keeps `SAMPLES_PER_TOOL` (2) bounded samples per tool, truncates outputs to 160 chars. `render_tool_digests` produces a compact prompt block.
- `learning::reflection`: when a summarizer is supplied via `ReflectionHook::with_summarizer`, the reflection LLM call is routed through it (with chars-cap truncation as a safety net). Otherwise the legacy local/cloud routing applies. The reflection prompt now compresses repeated tool calls into digests; the legacy per-call line is preserved when every tool was invoked at most once.
- `agent::harness::session::builder`: only registers `ReflectionHook` when `agent_id == "orchestrator"`. Optionally wraps the routed cloud provider in `ConfiguredSummarizer` when `learning.summarizer.enabled && source == Cloud`.
- `agent::harness::session::turn::spawn_transcript_ingestion`: early-return when the agent isn't the orchestrator.
- `learning::transcript_ingest::merge`: Jaccard-similarity (0.55 threshold) merge step before dedupe/persist (see the sketch after this list). Collapses same-kind near-duplicates, picks the longest snippet as representative, unions provenance message indices, promotes importance to `High` if any member of the group was `High`.
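For concreteness, a self-contained sketch of the token-level Jaccard test described above; whitespace tokenization is an assumption, and the PR's tokenizer may differ:

```rust
use std::collections::HashSet;

/// Jaccard similarity over token sets; candidate pairs whose score
/// crosses the PR's 0.55 threshold are treated as near-duplicates.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    let inter = ta.intersection(&tb).count() as f64;
    let union = ta.union(&tb).count() as f64;
    inter / union
}

fn main() {
    let a = "prefers dark mode in the editor";
    let b = "user prefers dark mode in editor";
    // 5 shared tokens / 7 total ≈ 0.71 > 0.55, so these would collapse.
    assert!(jaccard(a, b) > 0.55);
}
```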
Submission Checklist

- `Closes #NNN` in the `## Related` section

Impact
- Reflection and transcript-ingest remain `tokio::spawn` fire-and-forget, so orchestrator turn latency is unaffected.
- With `learning.summarizer.enabled = false` (default) the only behavioural changes are (a) reflection prompts use per-tool digests when a turn has repeated calls, and (b) sub-agents no longer trigger reflection / transcript-ingest. Heuristic fallback from feat(memory): transcript-to-memory ingestion pipeline (#1399) #1406 stays intact for offline users.
- When the summarizer is enabled, those calls route through `hint:fast` instead of the orchestrator's reasoning model.

Related
Closes #1419
Builds on #1406 (heuristic transcript-to-memory ingestion pipeline).
Test plan
- `cargo test --lib openhuman::learning -- --test-threads=1` — 132 passed
- `cargo check --manifest-path Cargo.toml` — clean
- `cargo check --manifest-path app/src-tauri/Cargo.toml` — clean
- `cargo fmt` — clean