[8FXCEswvdK] Missing flexus_policy_document call — assistant never read insect policy; misleading catch_insects errors

Problem Summary:
Frog 2 was unable to read the insect-catching policy and the task proceeded without consulting the policy. The user reported that Frog 2 used the wrong op ("read" instead of "cat") for the flexus_policy_document tool; however, in the captured conversation there is no evidence of any flexus_policy_document tool call. Instead, the assistant directly called catch_insects and encountered confusing validation errors when requesting multiple insects at once.

Evidence:
- thread id: 8FXCEswvdK (feedback path: feedback/EgDjWOUmfl)
- thread.json shows flexus_policy_document is available in the toolset (so reading the policy was possible).
- No flexus_policy_document tool calls found in messages.json (grep for "flexus_policy_document" returned no matches).
- Key message sequence from feedback/EgDjWOUmfl/messages.json:
  - Assistant call: catch_insects N=4 (call_55474874)
  - Tool response: "Error: N must be positive. Capacity: 4/4 left." (ftm_num 7)
  - Assistant call: catch_insects N=1 (call_23964423)
  - Tool response: "Insect!" (ftm_num 9)
  - Assistant call: catch_insects N=3 (call_21303140)
  - Tool response: "Error: N must be positive. Capacity: 3/4 left." (ftm_num 11)
  - Assistant ultimately caught 4 insects by calling N=1 three times and additional N=1 calls (ftm_num 14 calls and tool responses ftm_num 15-17)
- A tool help response unrelated but present: flexus_bot_kanban returned "Unknown op 'move'" when assistant attempted to move a task (shows other ops must be used). (message_100_19.txt)

Root cause analysis:
- There is no direct evidence the assistant called flexus_policy_document with any op (correct or incorrect). If the user observed a "read" op being used, that call is not present in the captured logs or it was truncated/omitted from the capture.
- The assistant proceeded without consulting policy and called catch_insects with N values >1. Those calls returned a misleading error message: "Error: N must be positive." when N was a positive integer (4 and 3). This indicates one or more issues:
  1) Either the assistant never attempted to read the policy (omission), or a flexus_policy_document call was made but not captured in feedback.
  2) The catch_insects tool has incorrect or unclear validation/error messages for multi-insect requests. The message "N must be positive" is wrong in these cases and should instead clarify whether the request exceeds capacity or if parsing failed.

Recommended next steps:
1) Reproduce the issue with logging enabled for flexus_policy_document toolcalls so we can confirm whether any "read" op was invoked and how the tool responded.
2) Audit and improve catch_insects input validation and error messages:
   - Ensure errors clearly state whether N is invalid, exceeds capacity, or there was a parsing/formatting problem.
   - If capacity limits exist, return explicit "exceeds capacity" errors with current capacity shown.
3) Add a short guidance snippet in the system/setup prompt or tool schema advising bots to call flexus_policy_document(op="status+help") or flexus_policy_document(op="cat", args={"path":...}) to read policies before making assumptions.
4) If a flexus_policy_document call with op="read" was indeed used in production, add validation in flexus_policy_document to suggest the supported op (e.g., "cat") when an unknown op is supplied.

If helpful, I can prepare a small reproducer script or test plan to validate both flexus_policy_document and catch_insects behaviors and attach logs for further triage.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[8FXCEswvdK] Missing flexus_policy_document call — assistant never read insect policy; misleading catch_insects errors #85

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[8FXCEswvdK] Missing flexus_policy_document call — assistant never read insect policy; misleading catch_insects errors #85

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions