[8FXCEswvdK] Missing flexus_policy_document call — assistant never read insect policy; misleading catch_insects errors #85

@flexus-teams

Description

Problem Summary:
Frog 2 was unable to read the insect-catching policy and the task proceeded without consulting the policy. The user reported that Frog 2 used the wrong op ("read" instead of "cat") for the flexus_policy_document tool; however, in the captured conversation there is no evidence of any flexus_policy_document tool call. Instead, the assistant directly called catch_insects and encountered confusing validation errors when requesting multiple insects at once.

Evidence:

  • thread id: 8FXCEswvdK (feedback path: feedback/EgDjWOUmfl)
  • thread.json shows flexus_policy_document is available in the toolset (so reading the policy was possible).
  • No flexus_policy_document tool calls found in messages.json (grep for "flexus_policy_document" returned no matches).
  • Key message sequence from feedback/EgDjWOUmfl/messages.json:
    • Assistant call: catch_insects N=4 (call_55474874)
    • Tool response: "Error: N must be positive. Capacity: 4/4 left." (ftm_num 7)
    • Assistant call: catch_insects N=1 (call_23964423)
    • Tool response: "Insect!" (ftm_num 9)
    • Assistant call: catch_insects N=3 (call_21303140)
    • Tool response: "Error: N must be positive. Capacity: 3/4 left." (ftm_num 11)
    • The assistant ultimately caught 4 insects using only N=1 calls: the call above plus three more (calls at ftm_num 14; tool responses at ftm_num 15-17).
  • Unrelated but present in the capture: flexus_bot_kanban returned "Unknown op 'move'" when the assistant attempted to move a task, which shows that tool expects a different op. (message_100_19.txt)

Root cause analysis:

  • There is no direct evidence the assistant called flexus_policy_document with any op (correct or incorrect). If the user observed a "read" op being used, that call is not present in the captured logs or it was truncated/omitted from the capture.
  • The assistant proceeded without consulting policy and called catch_insects with N values >1. Those calls returned a misleading error message: "Error: N must be positive." when N was a positive integer (4 and 3). This indicates one or more issues:
    1. Either the assistant never attempted to read the policy (omission), or a flexus_policy_document call was made but not captured in feedback.
    2. The catch_insects tool has incorrect or unclear validation/error messages for multi-insect requests. The message "N must be positive" is wrong in these cases and should instead clarify whether the request exceeds capacity or if parsing failed.
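
To make the distinction in item 2 concrete, here is a minimal sketch of validation that reports each failure mode separately. The real catch_insects implementation is not shown in this thread, so the function name, its parameters, and the idea of a per-call limit (max_per_call) are assumptions for illustration only; the captured logs do not say why N=4 and N=3 were rejected.

```python
def validate_catch_request(n, remaining, capacity, max_per_call=None):
    """Return an error string for a bad request, or None if the request is valid.

    Hypothetical helper: names and rules are assumptions, not the real tool.
    """
    left = f"Capacity: {remaining}/{capacity} left."
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        return f"Error: N must be an integer, got {n!r}. {left}"
    if n <= 0:
        return f"Error: N must be positive, got {n}. {left}"
    # Assumed rule: an optional cap on how many insects one call may request.
    if max_per_call is not None and n > max_per_call:
        return f"Error: N={n} exceeds the per-call limit of {max_per_call}. {left}"
    if n > remaining:
        return f"Error: N={n} exceeds remaining capacity. {left}"
    return None


if __name__ == "__main__":
    # With an assumed per-call limit of 1, N=4 is rejected for the stated reason.
    print(validate_catch_request(4, remaining=4, capacity=4, max_per_call=1))
    print(validate_catch_request(1, remaining=4, capacity=4, max_per_call=1))
```

If a per-call limit turns out to be the actual rule, a message of this shape would have told the assistant to retry with N=1 immediately instead of guessing.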

Recommended next steps:

  1. Reproduce the issue with logging enabled for flexus_policy_document tool calls so we can confirm whether any "read" op was invoked and how the tool responded.
  2. Audit and improve catch_insects input validation and error messages:
    • Ensure errors clearly state whether N is invalid, exceeds capacity, or there was a parsing/formatting problem.
    • If capacity limits exist, return explicit "exceeds capacity" errors with current capacity shown.
  3. Add a short guidance snippet in the system/setup prompt or tool schema advising bots to call flexus_policy_document(op="status+help") or flexus_policy_document(op="cat", args={"path":...}) to read policies before making assumptions.
  4. If a flexus_policy_document call with op="read" was indeed used in production, add validation in flexus_policy_document to suggest the supported op (e.g., "cat") when an unknown op is supplied.
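
Step 4 could look roughly like the sketch below. The op names "status+help" and "cat" are the ones mentioned in this issue; the full set of supported ops, the alias table, and the function name are assumptions, since the flexus_policy_document source is not part of this thread.

```python
# Ops named in this issue; the real tool may support more.
SUPPORTED_OPS = ("status+help", "cat")

# Hypothetical table of commonly guessed op names and their supported equivalents.
OP_ALIASES = {"read": "cat"}


def unknown_op_message(op):
    """Return a helpful error for an unsupported op, or None if the op is supported."""
    if op in SUPPORTED_OPS:
        return None
    message = f"Unknown op {op!r}."
    suggestion = OP_ALIASES.get(op)
    if suggestion is not None:
        message += f" Did you mean {suggestion!r}?"
    message += " Supported ops: " + ", ".join(SUPPORTED_OPS) + "."
    return message


if __name__ == "__main__":
    print(unknown_op_message("read"))
    # Unknown op 'read'. Did you mean 'cat'? Supported ops: status+help, cat.
```

Listing the supported ops in every unknown-op error also covers guesses that are not in the alias table.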

If helpful, I can prepare a small reproducer script or test plan to validate both flexus_policy_document and catch_insects behaviors and attach logs for further triage.
