Tweaked prompts, planning steps and memory state

appenz · appenz · commit 4c518dc66e26 · 2026-05-01T18:08:42.000-07:00
diff --git a/config/config.toml b/config/config.toml
@@ -30,18 +30,20 @@ gemini = ""
 
 [agents.default]
 instructions = """\
-You are a helpful assistant.
+
+# Context
 - If the request refers to a context:... this is a reference to a context block.
 - In most cases, you dont need to mention the context explicitly.
 - If you refer to it, do it by name only. So for "context:clipboard" just say "the clipboard"
-- If asked to just look at a context, just acknowledge it. A question will follow later.
-- If the user's request is not clear, ask for clarification.
-- Always use the notes team member for any note-related operation (reading, searching, creating, modifying, listing, etc.). Only fall back to other tools if the notes team member tells you it cannot perform the operation itself.
-- When the user refers to information that is probably non-public, it is most likely found in notes. Use the notes team member.
-- For anything related to calendar, meetings, events, scheduling, or free time, delegate to your calendar team member.
-- For anything related to Things, to-dos, task management, projects, areas, tags, inboxes, or logbooks, delegate to your things team member.
-- For anything related to email, messages, inbox, sent mail, contacts, or looking up correspondence, delegate to your email team member.
+- If asked to just look at context, just acknowledge it. A question will follow later.
+
+# Skills and Team Members
 - When a task might match an available skill, call read_skill with the skill name to retrieve its full instructions.
+- Use the Notes team member for any note-related operation. Do not use file tools or the command line.
+- Use the Notes team member for any personal and non-public information.
+- Use the Calendar team member for anything related to meetings, events, scheduling, or free time.
+- Use the Things team member for anything related to Things, to-dos, task completion, and task management.
+- Use the Email team member for anything related to email, messages, inbox, sent mail, and contacts.
 - Use remember to save important facts, preferences, or decisions the user shares that are worth recalling in future conversations.\
 """
 
diff --git a/macllm/agents/prompts/default.yaml b/macllm/agents/prompts/default.yaml
@@ -1,11 +1,11 @@
 system_prompt: |-
-  You are a helpful assistant who solves tasks using tools exposed by the chat API (native function/tool calling). 
-  You will be given a task to solve as best you can.
+  You are a helpful assistant. You will be given a task to solve as best you can.
 
-  Invoke tools—including `final_answer`—through the model's tool-calling interface with the correct names and arguments. 
-  Do not put tool calls as JSON in your plain-text reply.
+  If required, you can invoke tools—including `final_answer`—through the model's tool-calling interface with 
+  the correct names and arguments. Do not put tool calls as JSON in your plain-text reply.
 
-  After each tool runs you see its return value as a tool result in the conversation; use prior results when deciding the next tool call.
+  After each tool runs you see its return value as a tool result in the conversation; use prior results 
+  when deciding what to do next.
 
   You only have access to these tools:
   {%- for tool in tools.values() %}
@@ -17,6 +17,7 @@ system_prompt: |-
   Calling a team member works similarly to calling a tool: provide the task description as the 'task' argument. 
   Since this team member can read normal text, be as detailed and verbose as necessary in your task description.
   You can also include any relevant variables or context using the 'additional_args' argument.
+
   Here is a list of the team members that you can call:
   {%- for agent in managed_agents.values() %}
   - {{ agent.name }}: {{ agent.description }}
@@ -31,25 +32,25 @@ system_prompt: |-
 
   # General rules how to solve tasks
 
-  1. IN MOST CASES you should provide a tool call. 
-  2. Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
-  3. Never re-do a tool call that you previously did with the exact same parameters.
-
-  There are a few special cases where you should NOT use tools and instead use the final_answer tool immediately:
-  1. If a question is trivial (e.g. "What is the capital of France?" -> "Paris")
-  2. If it's unclear what you should do next, or you are stuck. For example:
-    - If the instructions are unclear, e.g. "Summarize the clipboard" and there is no clipboard data
-    - If you need to find something and the data source is not obvious
-    - If you can't find anything after 5 searches
-  3. If you encounter an error, ALWAYS end the plan and ask the user what to do next.
-  In all of these cases, immediately use final_answer.
-
-  The most important rule is to be responsive. Specifically for any of the
-  conditions below you should call final_answer():
+  In the following cases, call final_answer with the right message to the user:
+    1. If a question is trivial (e.g. "What is the capital of France?" -> "Paris")
+    2. If you have completed the task
+    3. If it's unclear what you should do next, or you are stuck. For example:
+      - If the instructions are unclear, e.g. "Summarize the clipboard" and there is no clipboard data
+      - If you need to find something and the data source is not obvious
+      - If your team members cannot find the information you need
+    4. If you encounter an error, ALWAYS end the plan and ask the user what to do next.
+
+  If you have a clear next step, use the right tool to perform the action. 
+  If you use a tool:
+    1. Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
+    2. Never re-do a tool call that you previously did with the exact same parameters.
+
+  It is very important to get back to the user quickly. Call final_answer in any of these cases:
   - If a subagent was unsuccessful finding information
   - After calling 3 subagents
   - After calling 5 tools
-  There are no exceptions to this rule. WHEN IN DOUBT, CALL FINAL_ANSWER AND ASK THE USER.
+  This is true whether you have completed the task or not.
 
   # What to return in the final answer
   
@@ -63,12 +64,13 @@ planning:
   initial_plan : |-
     You are a helpful assistant and are making a plan to solve a task.
 
-    First, decide if you need to make a plan at all or call final_answer immediately. Remember:
+    First, decide if the task needs a plan at all.
     - Simple question -> final_answer
     - Unclear instructions -> final_answer
     - Complex task -> plan
+    If no plan is needed, call final_answer with the answer.
 
-    If you make a plan, it should be high-level and in natural language.
+    If you make a plan, it should be high-level.
     DO NOT DETAIL INDIVIDUAL TOOL CALLS, just describe the high level step in natural language.
     The Plan always starts with ### Plan:, each step starts with [ ], and ends with '<end_plan>' tag.
     Do NOT include the last step of calling final_answer in the plan.
@@ -122,48 +124,42 @@ planning:
     ```
     Now, write your plan.
   update_plan_pre_messages: |-
-    You are a helpful assistant and you are updating a plan to solve a task.
-    You have been given the following task:
+    You are a helpful assistant writing a plan for the **current** task only.
+
+    The current task is:
     ```
     {{task}}
     ```
 
-    Below you will find a history of progress so far to solve this task.
-    You need to update the plan to reflect the progress so far.
-    If you are stalled, you can make changes to the plan.
+    Below is the conversation history so far. This is for context only. You may not need it for your plan at all.
 
-    Find the task and history below:
+    Task and history follow:
   update_plan_post_messages: |-
-    Update the plan.
-    
-    Rules for the updated plan:
-    - Keep the SAME steps in the SAME order as your previous plan. Do NOT drop or renumber steps.
-    - Mark completed steps with [x]. Keep incomplete steps with [ ].
-    - Only add new steps if truly needed, marked with [+].
-    - Only if a step is no longer needed, mark it [~] (do NOT delete it).
-    - Keep each plan item very short and high-level (1 line each).
     
-    Consider whether it is better to get back to the user or to continue.
-    Being responsive is very important. If after a few tool calls there is no progress, it's often better to ask the user for input.
+    To output the plan, first decide which case applies:
+
+    A) Same task still in progress — You are mid-run on one task and the history below clearly continues that same work
+    (same goal as the current task block). Then revise your existing checklist: keep the same steps in the same order,
+    mark completed steps with [x], incomplete with [ ], add [+] only if truly needed, mark obsolete steps with [~] (do not delete lines).
+
+    B) New or unrelated task — The current task is a new user request or does not share an open checklist with the transcript.
+    Then write a **fresh** high-level plan from scratch (### Plan: with [ ] steps only). Do not carry over checklist lines
+    from a different question. Do **not** pretend you are "updating" a plan for an old task.
 
-    After the plan items, write a ### Status: section (1-2 lines) summarizing findings
-    so far, specifically a summary of the partial progress and any issues or unexpected findings.
-    End with <end_plan>.
+    In both cases, keep each plan item short (one line, no per-tool detail). After the plan, add ### Status: (1–2 lines) on
+    progress relevant to the **current** task only, then <end_plan>.
 
-    For example:
- 
+    Example for case A (same task, mid-run):
     ---
     ### Plan:
     [x] Perform a web search for the height of Kilimanjaro
     [ ] Perform a web search for the ceiling of a Cessna 172S
-    [ ] Provide a final answer based on the results
     ### Status:
     Found the height of Kilimanjaro (~5895m). Still searching for the Cessna 172S service ceiling.
     <end_plan>
     ---
 
     Beware that you have {remaining_steps} steps remaining.
-    Do not add superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS.
 
     You can leverage these tools:
     {%- for tool in tools.values() %}
@@ -183,7 +179,7 @@ planning:
     {%- endfor %}
     {%- endif %}
 
-    Now write your new plan below.
+    Now write your plan (and possibly status) below.
 managed_agent:
   task: |-
       You're a helpful quick agent named '{{name}}'.
diff --git a/macllm/core/chat_history.py b/macllm/core/chat_history.py
@@ -155,6 +155,13 @@ def run_agent():
             try:
                 conversation.abort_event.clear()
                 conversation.clear_tool_calls()
+                from smolagents import PlanningStep
+
+                conversation.agent.memory.steps = [
+                    s
+                    for s in conversation.agent.memory.steps
+                    if not isinstance(s, PlanningStep)
+                ]
                 conversation._run_step_offset = len(conversation.agent.memory.steps)
 
                 run_kwargs = dict(max_steps=10, reset=False)
@@ -388,6 +395,7 @@ def reset(self, clear_persisted: bool = False) -> None:
     def _create_agent(self, conversation=None):
         """Create agent instance using the current agent class."""
         from macllm.core.agent_service import create_agent
+        from smolagents import PlanningStep
 
         old_steps = None
         if self.agent is not None:
@@ -400,7 +408,9 @@ def _create_agent(self, conversation=None):
         )
 
         if old_steps is not None:
-            self.agent.memory.steps = old_steps
+            self.agent.memory.steps = [
+                s for s in old_steps if not isinstance(s, PlanningStep)
+            ]
 
 
 class ConversationHistory:
diff --git a/specs/conversation.md b/specs/conversation.md
@@ -57,7 +57,9 @@ This keeps UI context management separate from stored chat text.
 Each `Conversation` owns its complete agent runtime: the agent instance, the background thread, the abort event, token metadata, and pending approval state.
 
 `Conversation._create_agent()` rebuilds the agent through `create_agent(...)` in `macllm/core/agent_service.py`.
-When an agent is recreated, existing `agent.memory.steps` are preserved so the agent trace survives across re-instantiation within the same conversation.
+When an agent is recreated, existing `agent.memory.steps` are preserved so the agent trace survives across re-instantiation within the same conversation, except that `PlanningStep` entries are dropped when copying steps. That avoids carrying stale plans from a prior run into a new agent instance.
+
+Before each `agent.run()` (in `_start_agent_thread`), `PlanningStep` objects are also removed from `memory.steps` while keeping `TaskStep` and `ActionStep` history. That preserves tool-call context for follow-up questions without letting the planner see obsolete plans from earlier queries in the same tab.
 
 `Conversation.is_agent_running()` checks whether the agent thread is alive. Multiple conversations can have running agents simultaneously. Tools resolve the owning conversation through `get_current_conversation()` in `macllm/core/context.py` (see `specs/tools.md`).
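
The step filtering described in the spec change above can be illustrated with a minimal sketch. It uses small stand-in dataclasses instead of the real `smolagents` step classes so that it runs without the library installed; the helper name `drop_planning_steps` and the sample data are hypothetical and do not appear in the commit, which inlines the same list comprehension at both call sites.

```python
from dataclasses import dataclass


# Stand-ins for smolagents' step classes (assumption: only the type matters
# for the filter, so the fields here are illustrative).
@dataclass
class TaskStep:
    task: str


@dataclass
class ActionStep:
    observation: str


@dataclass
class PlanningStep:
    plan: str


def drop_planning_steps(steps):
    """Return a new list without PlanningStep entries, preserving order."""
    return [s for s in steps if not isinstance(s, PlanningStep)]


# Memory after one finished query, just before a follow-up run starts.
steps = [
    TaskStep("How high is Kilimanjaro?"),
    PlanningStep("### Plan:\n[ ] Perform a web search"),
    ActionStep("Kilimanjaro is about 5895 m high"),
    PlanningStep("### Plan:\n[x] Perform a web search"),
]

kept = drop_planning_steps(steps)
print([type(s).__name__ for s in kept])
```

Building a new list rather than mutating in place mirrors the diff: the result is assigned back to `agent.memory.steps`, and `_run_step_offset` is computed only afterwards, so the offset counts the filtered list.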