Fix example evals #4

@davidaparicio

Some of the example evals fail and need to be fixed. For example:

FAILED examples/triage_agent/evals.py::test_conversation_is_successful[messages0] - assert False == True

or this one:

___________________________________________________________________________________________________ test_does_not_call_weather_when_not_asked[Hi!] ____________________________________________________________________________________________________

query = 'Hi!'

    @pytest.mark.parametrize(
        "query",
        [
            "Who's the president of the United States?",
            "What is the time right now?",
            "Hi!",
        ],
    )
    def test_does_not_call_weather_when_not_asked(query):
        tool_calls = run_and_get_tool_calls(weather_agent, query)

>       assert not tool_calls
E       assert not [{'function': {'arguments': '{"location": "New York", "time": "now"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]

examples/weather_agent/evals.py:44: AssertionError
==== short test summary info =====
FAILED examples/weather_agent/evals.py::test_does_not_call_weather_when_not_asked[Who's the president of the United States?] - assert not [{'function': {'arguments': '{"location": "United States"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]
FAILED examples/weather_agent/evals.py::test_does_not_call_weather_when_not_asked[What is the time right now?] - assert not [{'function': {'arguments': '{"location": "", "time": "now"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]
FAILED examples/weather_agent/evals.py::test_does_not_call_weather_when_not_asked[Hi!] - assert not [{'function': {'arguments': '{"location": "New York", "time": "now"}', 'name': 'get_weather'}, 'id': 'call_0', 'type': 'function'}]
====== 3 failed, 3 passed in 3.10s ======
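
To make failures like the ones above easier to read while triaging, the tool-call payloads can be reduced to function names and parsed arguments. The snippet below is only a sketch: `summarize_tool_calls` is a hypothetical helper that does not exist in the repository, and it assumes the tool calls have exactly the dict shape shown in the pytest output.

```python
import json


def summarize_tool_calls(tool_calls):
    """Return (function name, parsed arguments) pairs for each tool call.

    Hypothetical helper; assumes each call is a dict with a "function" entry
    holding "name" and a JSON-encoded "arguments" string, as in the log above.
    """
    summary = []
    for call in tool_calls:
        function = call["function"]
        summary.append((function["name"], json.loads(function["arguments"])))
    return summary


# Payload copied from the "Hi!" failure in the output above.
tool_calls = [
    {
        "function": {
            "arguments": '{"location": "New York", "time": "now"}',
            "name": "get_weather",
        },
        "id": "call_0",
        "type": "function",
    }
]

print(summarize_tool_calls(tool_calls))
```

Used in an assertion message (for example `assert not tool_calls, summarize_tool_calls(tool_calls)`), this would show directly that the agent called `get_weather` with invented arguments for a query that never mentioned the weather.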
