-
Notifications
You must be signed in to change notification settings - Fork 1
Open
Description
Problem Summary:
Frog 2 was unable to read the insect-catching policy and the task proceeded without consulting the policy. The user reported that Frog 2 used the wrong op ("read" instead of "cat") for the flexus_policy_document tool; however, in the captured conversation there is no evidence of any flexus_policy_document tool call. Instead, the assistant directly called catch_insects and encountered confusing validation errors when requesting multiple insects at once.
Evidence:
- thread id: 8FXCEswvdK (feedback path: feedback/EgDjWOUmfl)
- thread.json shows flexus_policy_document is available in the toolset (so reading the policy was possible).
- No flexus_policy_document tool calls found in messages.json (grep for "flexus_policy_document" returned no matches).
- Key message sequence from feedback/EgDjWOUmfl/messages.json:
- Assistant call: catch_insects N=4 (call_55474874)
- Tool response: "Error: N must be positive. Capacity: 4/4 left." (ftm_num 7)
- Assistant call: catch_insects N=1 (call_23964423)
- Tool response: "Insect!" (ftm_num 9)
- Assistant call: catch_insects N=3 (call_21303140)
- Tool response: "Error: N must be positive. Capacity: 3/4 left." (ftm_num 11)
- Assistant ultimately caught 4 insects by calling N=1 three times and additional N=1 calls (ftm_num 14 calls and tool responses ftm_num 15-17)
- A tool help response unrelated but present: flexus_bot_kanban returned "Unknown op 'move'" when assistant attempted to move a task (shows other ops must be used). (message_100_19.txt)
Root cause analysis:
- There is no direct evidence the assistant called flexus_policy_document with any op (correct or incorrect). If the user observed a "read" op being used, that call is not present in the captured logs or it was truncated/omitted from the capture.
- The assistant proceeded without consulting policy and called catch_insects with N values >1. Those calls returned a misleading error message: "Error: N must be positive." when N was a positive integer (4 and 3). This indicates one or more issues:
- Either the assistant never attempted to read the policy (omission), or a flexus_policy_document call was made but not captured in feedback.
- The catch_insects tool has incorrect or unclear validation/error messages for multi-insect requests. The message "N must be positive" is wrong in these cases and should instead clarify whether the request exceeds capacity or if parsing failed.
Recommended next steps:
- Reproduce the issue with logging enabled for flexus_policy_document toolcalls so we can confirm whether any "read" op was invoked and how the tool responded.
- Audit and improve catch_insects input validation and error messages:
- Ensure errors clearly state whether N is invalid, exceeds capacity, or there was a parsing/formatting problem.
- If capacity limits exist, return explicit "exceeds capacity" errors with current capacity shown.
- Add a short guidance snippet in the system/setup prompt or tool schema advising bots to call flexus_policy_document(op="status+help") or flexus_policy_document(op="cat", args={"path":...}) to read policies before making assumptions.
- If a flexus_policy_document call with op="read" was indeed used in production, add validation in flexus_policy_document to suggest the supported op (e.g., "cat") when an unknown op is supplied.
If helpful, I can prepare a small reproducer script or test plan to validate both flexus_policy_document and catch_insects behaviors and attach logs for further triage.
Metadata
Metadata
Assignees
Labels
No labels