address codex review: gate pruning on adaptive_memory, cap tool calls per response, canonicalize skill paths

michaelneale · michaelneale · commit 37962b3d6597 · 2026-04-16T12:35:49.000+10:00
Signed-off-by: Michael Neale <michael.neale@gmail.com>
diff --git a/crates/goose/src/agents/knowledge_review.rs b/crates/goose/src/agents/knowledge_review.rs
@@ -339,6 +339,9 @@ async fn run_knowledge_extraction(
         messages.push(response_message);
 
         for tool_request in &tool_requests {
+            if tool_calls_made >= MAX_REVIEW_TOOL_CALLS {
+                break;
+            }
             tool_calls_made += 1;
 
             let tool_call = match &tool_request.tool_call {
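The new guard sits inside the per-request loop and runs before the counter is incremented. A single model response carrying many tool requests therefore stops at the cap instead of running all of them before any check outside this loop can act. The sketch below assumes `tool_calls_made` is a running total for the whole review (its declaration is outside the hunk). It models requests as plain `&str`, and both the cap value of 3 and the tool names are hypothetical.

```rust
/// Hypothetical cap; the real MAX_REVIEW_TOOL_CALLS value is not shown in the diff.
const MAX_REVIEW_TOOL_CALLS: usize = 3;

/// Dispatches tool requests in order, stopping once the running
/// `tool_calls_made` counter reaches the cap. Returns the requests that ran.
fn dispatch_capped<'a>(tool_requests: &[&'a str], tool_calls_made: &mut usize) -> Vec<&'a str> {
    let mut executed = Vec::new();
    for tool_request in tool_requests {
        // Check before incrementing, as in the diff, so the counter
        // never exceeds MAX_REVIEW_TOOL_CALLS.
        if *tool_calls_made >= MAX_REVIEW_TOOL_CALLS {
            break;
        }
        *tool_calls_made += 1;
        executed.push(*tool_request);
    }
    executed
}

fn main() {
    let mut tool_calls_made = 0;
    // First response asks for two tools: both fit under the cap.
    let first = dispatch_capped(&["read_memory", "list_skills"], &mut tool_calls_made);
    // Second response asks for three: only one more fits.
    let second = dispatch_capped(&["patch_skill", "create_skill", "add_memory"], &mut tool_calls_made);
    println!("{:?} {:?} total={}", first, second, tool_calls_made);
}
```

Because the counter is shared across responses, the cap bounds the whole extraction run rather than resetting on each response.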
diff --git a/crates/goose/src/agents/platform_extensions/skills.rs b/crates/goose/src/agents/platform_extensions/skills.rs
@@ -357,7 +357,10 @@ pub async fn handle_patch_skill(
     }
 
     let goose_skills_dir = Paths::config_dir().join("skills");
-    if !skill.path.starts_with(&goose_skills_dir) {
+    let canonical_skills_dir = std::fs::canonicalize(&goose_skills_dir).unwrap_or(goose_skills_dir);
+    let canonical_skill_path =
+        std::fs::canonicalize(&skill.path).unwrap_or_else(|_| skill.path.clone());
+    if !canonical_skill_path.starts_with(&canonical_skills_dir) {
         return Ok(CallToolResult::error(vec![Content::text(
             "Cannot patch externally installed skills. Create a new skill in goose's directory instead.",
         )]));
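`Path::starts_with` compares path components lexically. It does not resolve `..` or follow symlinks, so a path such as `skills/../outside` still "starts with" `skills`. Canonicalizing both sides first, as the patch does, compares the locations the paths actually resolve to. The sketch below mirrors that check, including the fallback to the raw path when `canonicalize` fails (for example, when the path does not exist). The helper name `is_inside` and the temp-dir layout are hypothetical.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Returns true if `candidate` resolves to a location inside `root`.
/// Both sides are canonicalized; on failure the raw path is used,
/// which degrades to the old lexical comparison.
fn is_inside(root: &Path, candidate: &Path) -> bool {
    let root = fs::canonicalize(root).unwrap_or_else(|_| root.to_path_buf());
    let candidate = fs::canonicalize(candidate).unwrap_or_else(|_| candidate.to_path_buf());
    candidate.starts_with(&root)
}

fn main() -> std::io::Result<()> {
    // Hypothetical layout under the system temp dir.
    let base: PathBuf = std::env::temp_dir().join("skills_path_demo");
    let skills = base.join("skills");
    fs::create_dir_all(skills.join("my-skill"))?;
    fs::create_dir_all(base.join("outside"))?;

    let inside_path = skills.join("my-skill");
    let escaping_path = skills.join("..").join("outside");

    // Lexical check: passes even though the path escapes `skills`.
    println!("raw starts_with: {}", escaping_path.starts_with(&skills));
    // Canonical checks: only the genuinely nested path passes.
    println!("inside: {}", is_inside(&skills, &inside_path));
    println!("escaping: {}", is_inside(&skills, &escaping_path));

    fs::remove_dir_all(&base)?;
    Ok(())
}
```

Canonicalizing the skills directory as well matters on platforms where the config directory itself sits behind a symlink. It also means a symlink inside the skills directory that points at an external install resolves to its target and is rejected like any other external skill.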
diff --git a/crates/goose/src/agents/snapshots/goose__agents__prompt_manager__tests__all_platform_extensions.snap.new b/crates/goose/src/agents/snapshots/goose__agents__prompt_manager__tests__all_platform_extensions.snap.new
@@ -0,0 +1,204 @@
+---
+source: crates/goose/src/agents/prompt_manager.rs
+assertion_line: 458
+expression: system_prompt
+---
+You are a general-purpose AI agent called goose, created by AAIF (Agentic AI Foundation).
+goose is being developed as an open-source software project.
+
+# Extensions
+
+Extensions provide additional tools and context from different data sources and applications.
+You can dynamically enable or disable extensions as needed to help complete tasks.
+
+Because you dynamically load extensions, your conversation history may refer
+to interactions with extensions that are not currently active. The currently
+active extensions are below. Each of these extensions provides tools that are
+in your tool specification.
+
+
+## Extension Manager
+
+### Instructions
+Extension Management
+
+Use these tools to discover, enable, and disable extensions, as well as review resources.
+
+Available tools:
+- search_available_extensions: Find extensions available to enable/disable
+- manage_extensions: Enable or disable extensions
+- list_resources: List resources from extensions
+- read_resource: Read specific resources from extensions
+
+When you lack the tools needed to complete a task, use search_available_extensions first
+to discover what extensions can help.
+
+Use manage_extensions to enable or disable specific extensions by name.
+Use list_resources and read_resource to work with extension data and resources.
+
+## adaptive_memory
+
+### Instructions
+You have persistent adaptive memory across sessions.
+The most valuable memory prevents the user from having to repeat themselves.
+Save proactively — don't wait to be asked.
+
+WHEN TO SAVE:
+- User corrects you or says 'remember this' / 'don't do that again' → save immediately
+- User shares a preference, habit, or personal detail (name, role, timezone, coding style) → target: user
+- You discover something about the environment (OS, installed tools, project structure, build commands) → target: memory
+- You learn a convention, API quirk, or workflow specific to this user's setup → target: memory
+- You identify a stable fact useful in future sessions → target: memory
+
+PRIORITY: User preferences and corrections > environment facts > procedural knowledge.
+
+Do NOT save: task progress, session outcomes, temporary state, things easily re-discovered.
+
+ACTIONS: add, replace (old_text identifies entry), remove (old_text identifies entry)
+
+Memory has hard size limits. Adds that exceed the limit are REJECTED.
+Replace or remove existing entries to make room first.
+
+══════════════════════════════════════════════
+USER PROFILE (who the user is) [60% — 837/1375 chars]
+══════════════════════════════════════════════
+micn strongly dislikes git force pushing — prefers clean history practices like revert commits over rewriting history
+§
+micn works on the goose project — currently active on the micn/goose-memory-learning branch (adaptive memory feature)
+§
+Strongly prefers clean git practices: avoids force pushing, prefers revert commits over rewriting history. Values code hygiene and proper commit discipline.
+§
+Prefers pragmatic solutions over perfect ones. When implementing security/privacy features (like secret redaction), willing to err on the side of over-redaction (false positives) rather than risk leaking secrets (false negatives).
+§
+Prefers minimal PR descriptions: when creating PRs, avoids mentioning sensitive details (e.g., "secrets", "masking") and keeps language high-level/preventative rather than specific about the vulnerability.
+══════════════════════════════════════════════
+MEMORY (your personal notes) [87% — 1914/2200 chars]
+══════════════════════════════════════════════
+Goose issue #8475: User got 401 "User not found" from OpenRouter. Root cause was transient account propagation delay (~23 min), not a goose bug. Same config in both diagnostic bundles (diag7 failed 05:51 UTC, diag8 worked 06:14 UTC). Secondary issue: API key was exposed in public diagnostics zip because it was stored in config.yaml as plaintext instead of in keyring.
+§
+Goose diagnostics redaction approach: Use Shannon entropy (>3.5 bits/char) + character composition heuristics to detect secrets. Secrets are long (≥20 chars), high-entropy, and contain only alphanumeric + hyphens/underscores. Special case: JWTs have exactly 3 dot-separated base64 segments (each ≥4 chars). This catches API keys, bearer tokens, JWTs while preserving URLs, model names, descriptions, hostnames, versions.
+§
+Windows Credential Manager has 2560-byte blob size limit (UTF-16 encoded). Goose stores all secrets as single JSON blob in keyring. If blob exceeds limit, keyring returns Error::TooLong, but is_keyring_availability_error() doesn't catch it (only checks for "keyring", "dbus", "platform secure storage" keywords). Result: write fails entirely, no fallback to file storage. This is a potential bug on Windows with many configured providers.
+§
+Goose config secret storage: Secrets go to keyring via set_secret() (never to config.yaml). If keyring fails with availability error, falls back to secrets.yaml. Normal UI/CLI flows correctly route secrets via set_secret(). If a secret appears in config.yaml, it was either manually edited by user or written by a tool outside goose.
+§
+PR #8567 review feedback from Codex: (1) JWT tokens with dots weren't caught because '.' was in denylist — fixed by special-casing JWT shape (3 dot-separated base64 segments). (2) unwrap_or_default() silently swallowed I/O errors — fixed by using fs::read()? + String::from_utf8_lossy() to propagate errors while handling non-UTF8 gracefully.
+## analyze
+
+### Instructions
+Analyze code structure using tree-sitter AST parsing. Three auto-selected modes:
+- Directory path → structure overview (file tree with function/class counts)
+- File path → semantic details (functions, classes, imports, call counts)
+- Any path + focus parameter → symbol call graph (incoming/outgoing chains)
+
+For large codebases, delegate analysis to a subagent and retain only the summary.
+
+## apps
+
+apps supports resources.
+### Instructions
+Use this extension to create, manage, and iterate on custom HTML/CSS/JavaScript apps.
+## chatrecall
+
+### Instructions
+Chat Recall
+
+Search past conversations and load session summaries when the user expects some memory or context.
+
+Two modes:
+- Search mode: Use query with keywords/synonyms to find relevant messages
+- Load mode: Use session_id to get first and last messages of a specific session
+
+## code_execution
+
+### Instructions
+General:
+    - BATCH MULTIPLE TOOL CALLS INTO ONE `execute_typescript` CALL.
+    - These tools exists to reduce round-trips. When a task requires multiple tool calls:
+        - WRONG: Multiple `execute_typescript` calls, each with one tool
+        - RIGHT: One `execute_typescript` call with a script that calls all needed tools
+    - Only `return` and `console.log` data you need, tools could have very large responses.
+    - IMPORTANT: All tool calls are ASYNC. Use await for each call.
+WORKFLOW:
+    1. Use the `list_functions` and `get_function_details` tools to discover tools signatures and input/output types.
+    2. Write ONE script that calls ALL tools needed for the task and execute that script with `execute_typescript`, no need to import anything, all the namespaces returned by `list_functions` and `get_function_details` will be available globally.
+## developer
+
+### Instructions
+Use the developer extension to build software and operate a terminal.
+
+Make sure to use the tools *efficiently* - reading all the content you need in as few
+iterations as possible and then making the requested edits or running commands. You are
+responsible for managing your context window, and to minimize unnecessary turns which
+cost the user money.
+
+For editing software, prefer the flow of using tree to understand the codebase structure
+and file sizes. When you need to search, prefer rg which correctly respects gitignored
+content. Then use cat or sed to gather the context you need, always reading before editing.
+Use write and edit to efficiently make changes. Test and verify as appropriate.
+
+## orchestrator
+
+### Instructions
+Manage agent sessions: list, view, start, send messages, and interrupt agents.
+## skills
+
+### Instructions
+
+
+You have these skills at your disposal, when it is clear they can help you solve a problem or you are asked to use them:
+• agent-tools - Use when interacting with Block services — Slack, Google Drive, Google Calendar, Gmail, Snowflake, Jira, GitHub, Glean, Salesforce, Datadog, Linear, Airtable, PagerDuty, Sentry, Notion, Workday, Asana, and more. Always load this skill before accessing any Block service.
+• goose-doc-guide - Reference goose documentation to create, configure, or explain goose-specific features like recipes, extensions, sessions, and providers. You MUST fetch relevant goose docs before answering. You MUST NOT rely on training data or assumptions for any goose-specific fields, values, names, syntax, or commands.
+## summarize
+
+
+## summon
+
+
+## todo
+
+### Instructions
+Your todo content is automatically available in your context.
+
+Workflow:
+- Start: write initial checklist
+- During: update progress
+- End: verify all complete
+
+Template:
+- [x] Requirement 1
+- [ ] Task
+  - [ ] Sub-task
+- [ ] Requirement 2
+- [ ] Another task
+
+## tom
+
+
+
+
+# Response Guidelines
+
+Use Markdown formatting for all responses.
+
+# Knowledge Management
+
+When working with memory and skills extensions:
+
+## Memory
+The most valuable memory prevents the user from having to repeat themselves.
+Save proactively — don't wait to be asked:
+- User corrects you or says "remember this" / "don't do that again" → save immediately
+- User shares a preference, habit, or personal detail → save to target "user"
+- You discover something about the environment (OS, tools, project structure, build commands) → save to target "memory"
+- You learn a convention, API quirk, or workflow specific to this user's setup → save to target "memory"
+- Do NOT save: task progress, session outcomes, temporary state, things easily re-discovered
+
+Priority: User preferences and corrections > environment facts > procedural knowledge.
+When memory is at capacity, curate: replace outdated entries, remove low-value ones, consolidate related entries.
+
+## Skills
+After completing complex work (many tool calls, error recovery, or non-obvious workflows),
+consider saving a reusable skill with create_skill.
+If you loaded a skill and found it wrong or incomplete, patch it immediately with patch_skill.
+Skills that aren't maintained become liabilities.
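The snapshot's memory headers report usage as a percentage of a hard character limit, and the instructions state that adds exceeding the limit are rejected. The reported figures are consistent with truncating integer division: 837 × 100 / 1375 = 60.87, shown as 60%, and 1914 × 100 / 2200 = 87. The sketch below reproduces that arithmetic and a reject-on-overflow add. It assumes usage is counted in `char`s over entries joined by a `\n§\n` separator, as the snapshot's layout suggests; the real counting rule and helper names are not shown in this diff and are hypothetical here.

```rust
/// Assumed separator between entries, matching the "§" lines in the snapshot.
const ENTRY_SEPARATOR: &str = "\n§\n";

/// Integer percentage, truncated (837 of 1375 chars renders as 60, not 61).
fn usage_percent(used: usize, limit: usize) -> usize {
    used * 100 / limit
}

/// Characters used by a store, counted over the joined text (assumption).
fn used_chars(entries: &[String]) -> usize {
    entries.join(ENTRY_SEPARATOR).chars().count()
}

/// Appends `new_entry` only if the store stays within `limit`;
/// otherwise leaves the store untouched and returns an error.
fn try_add(entries: &mut Vec<String>, new_entry: &str, limit: usize) -> Result<usize, String> {
    let mut candidate = entries.clone();
    candidate.push(new_entry.to_string());
    let used = used_chars(&candidate);
    if used > limit {
        return Err(format!("rejected: {used}/{limit} chars; replace or remove entries first"));
    }
    *entries = candidate;
    Ok(used)
}

fn main() {
    println!("user profile: {}%", usage_percent(837, 1375));
    println!("memory: {}%", usage_percent(1914, 2200));

    let mut store = vec!["a".repeat(40)];
    // 40 + 3 (separator) + 20 = 63, within a 64-char limit.
    println!("{:?}", try_add(&mut store, &"b".repeat(20), 64));
    // 63 + 3 + 1 = 67 exceeds the limit, so this add is rejected.
    println!("{:?}", try_add(&mut store, "c", 64));
}
```

Rejecting rather than silently evicting is what forces the "replace or remove existing entries first" workflow the prompt describes.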
diff --git a/crates/goose/src/context_mgmt/mod.rs b/crates/goose/src/context_mgmt/mod.rs
@@ -148,11 +148,22 @@ pub async fn compact_messages(
 
     let messages_to_compact = messages.as_slice();
 
+    let adaptive_memory_active = match extension_manager {
+        Some(ext_mgr) => {
+            ext_mgr
+                .is_extension_enabled(
+                    crate::agents::platform_extensions::adaptive_memory::EXTENSION_NAME,
+                )
+                .await
+        }
+        None => false,
+    };
+
     let (summary_message, summarization_usage) = do_compact(
         provider,
         session_id,
         messages_to_compact,
-        extension_manager.is_some(),
+        adaptive_memory_active,
     )
     .await?;
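Before this change, the last argument to `do_compact` was `true` whenever any extension manager was passed in, regardless of which extensions were active. It is now `true` only when the adaptive_memory extension is enabled, which is what the commit message calls gating pruning on adaptive_memory. The sketch below contrasts the two rules with a synchronous, hypothetical stand-in for the extension manager. The real `is_extension_enabled` is async, and the value `"adaptive_memory"` for `EXTENSION_NAME` is an assumption based on the snapshot's section header.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for goose's extension manager.
/// The real `is_extension_enabled` is async; this sketch is synchronous.
struct ExtensionManager {
    enabled: HashSet<String>,
}

impl ExtensionManager {
    fn is_extension_enabled(&self, name: &str) -> bool {
        self.enabled.contains(name)
    }
}

/// Assumed value of adaptive_memory::EXTENSION_NAME.
const ADAPTIVE_MEMORY: &str = "adaptive_memory";

/// Old rule: any extension manager at all turns the flag on.
fn flag_before(ext_mgr: Option<&ExtensionManager>) -> bool {
    ext_mgr.is_some()
}

/// New rule: the flag is on only when adaptive_memory is enabled.
fn flag_after(ext_mgr: Option<&ExtensionManager>) -> bool {
    match ext_mgr {
        Some(m) => m.is_extension_enabled(ADAPTIVE_MEMORY),
        None => false,
    }
}

fn main() {
    let developer_only = ExtensionManager {
        enabled: ["developer".to_string()].into_iter().collect(),
    };
    println!(
        "before={} after={}",
        flag_before(Some(&developer_only)),
        flag_after(Some(&developer_only))
    );
}
```

With a synchronous check this could be written as `ext_mgr.is_some_and(|m| ...)`. Because the real call is awaited, it cannot go inside a plain closure like that, which is likely why the diff spells out the `match`.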