Fix failing llama test #2819


Merged · 11 commits into crewAIInc:main · May 16, 2025

Conversation

Vidit-Ostwal (Contributor)

Fixes #2817

Registered a new interaction.

@Vidit-Ostwal mentioned this pull request on May 13, 2025
@@ -359,7 +359,7 @@ def test_convert_with_instructions():

 @pytest.mark.vcr(filter_headers=["authorization"])
 def test_converter_with_llama3_2_model():
-    llm = LLM(model="ollama/llama3.2:3b", base_url="http://localhost:11434")
+    llm = LLM(model="openrouter/meta-llama/llama-3.2-3b-instruct", api_key='ABC')
Contributor
Why has this change fixed the flaky test?

@Vidit-Ostwal (Contributor Author) · May 13, 2025

Honestly, I am still confused.

Initially I tested the previous cassette file locally on 3.10, 3.11, and 3.12, and it worked completely fine.

Still, the test case kept failing on this PR, so I thought that recording a different interaction might resolve the issue.

I still don't completely understand why the test became flaky in the first place (I couldn't find anything odd).

Also, reading from the logs of another PR:

model = 'llama3.2:3b'
messages = [{'content': 'Name: Alice Llama, Age: 30', 'role': 'user'}, {'content': '', 'role': 'assistant', 'tool_calls': [{'func...\': \'Age\', \'type\': \'integer\'}}, \'required\': [\'age\', \'name\'], \'type\': \'object\'}}}\n', 'role': 'system'}]
timeout = 600.0, temperature = None, top_p = None, n = None, stream = None
stream_options = None, stop = None, max_completion_tokens = None
max_tokens = None, modalities = None, prediction = None, audio = None
presence_penalty = None, frequency_penalty = None, logit_bias = None
user = None, reasoning_effort = None, response_format = None, seed = None
tools = [{'function': {'description': 'Correctly extracted `SimpleModel` with all the required parameters with correct types',...'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['age', 'name'], 'type': 'object'}}, 'type': 'function'}]
tool_choice = {'function': {'name': 'SimpleModel'}, 'type': 'function'}
logprobs = None, top_logprobs = None, parallel_tool_calls = None
deployment_id = None, extra_headers = None, functions = None
function_call = None, base_url = None, api_version = None, api_key = None
model_list = None, thinking = None
kwargs = {'litellm_call_id': 'c1f47fba-ec49-40af-ba33-f12f2ebebea9', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7fd020dc6810>}
args = {'acompletion': False, 'api_base': 'http://localhost:11434', 'api_key': None, 'api_version': None, ...}
api_base = 'http://localhost:11434', mock_response = None
mock_tool_calls = None, mock_timeout = None, force_timeout = 600
logger_fn = None

I feel that at runtime it's not able to match the cassette file, so it ends up trying to make an actual call.
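One way to make that failure loud rather than silent, assuming the suite uses pytest-recording (the plugin that provides the pytest.mark.vcr marker seen in the diff), is to pin the record mode to "none" in a vcr_config fixture: any request that doesn't match a recorded interaction then raises vcrpy's CannotOverwriteExistingCassetteException instead of going out to the network. A minimal sketch under that assumption:

```python
# conftest.py sketch, assuming pytest-recording; forces replay-only behaviour
# so a cassette mismatch fails fast instead of making a live HTTP call.
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": ["authorization"],
        # "none" = never record; an unmatched request raises an error
        # rather than silently hitting the real endpoint.
        "record_mode": "none",
        # vcrpy's default matchers, listed explicitly; add "body" to also
        # require the request payload to match what was recorded.
        "match_on": ["method", "scheme", "host", "port", "path", "query"],
    }
```

With that in place, a flaky run would at least surface as an explicit matching error pointing at the request that diverged from the cassette.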

@Vidit-Ostwal requested a review from lucasgomide on May 14, 2025 at 04:47
@Vidit-Ostwal (Contributor Author)

I ran this test case 50 times on 3.10, 3.11, and 3.12.

[Screenshots of the test runs, 2025-05-16 at 8:08 PM and 8:10 PM]

Successful on the first test run :)

@Vidit-Ostwal requested a review from lucasgomide on May 16, 2025 at 19:10
@lucasgomide (Contributor) left a comment

Thanks @Vidit-Ostwal, I really appreciate your work.

I think you've shown that this interaction fixes the flaky test, so I believe it's okay to go ahead and merge it.
I'm going to be very honest though… it's still unclear to me why the flaky test was happening. I'm not a big fan of this kind of situation: something still feels unclear, even to you. You just changed the model, and the test magically stopped failing.
Another point: I'm not sure why we wrote a test for the converter with a custom LLM model. But since you are using a very similar approach, it might be a safety net(?).

Going to approve it with this consideration... but again, good work!

@lucasgomide merged commit aa6e5b7 into crewAIInc:main on May 16, 2025
6 checks passed
@Vidit-Ostwal (Contributor Author)

> I think you've shown that this interaction fixes the flaky test, so I believe it's okay to go ahead and merge it. I'm going to be very honest though… it's still unclear to me why the flaky test was happening. I'm not a big fan of this kind of situation: something still feels unclear, even to you.

Agreed, I'm not a big fan of this either. Code is deterministic, but this randomness defies explanation.

> You just changed the model, and the test magically stopped failing.

I think the issue was with the .yaml interaction file. I’m not exactly sure what went wrong—especially since all the matchers seemed to be working fine—but my hunch is that re-recording the same HTTP interaction could solve it. It’s a bit like restarting your PC or phone when something’s off and you don’t know why. Not the biggest fan of that approach, but sometimes it just works.
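For what it's worth, a sketch of that re-record, assuming pytest-recording merges vcr marker kwargs into the cassette config the same way it already does for filter_headers: temporarily raise the record mode, run the test once against the live endpoint with a real key, commit the refreshed .yaml cassette, then drop the override so CI goes back to pure replay.

```python
# Hedged sketch of a one-off cassette refresh; record_mode="all" as a marker
# kwarg is an assumption, mirroring how filter_headers is passed today.
import pytest

from crewai import LLM


@pytest.mark.vcr(filter_headers=["authorization"], record_mode="all")
def test_converter_with_llama3_2_model():
    # While re-recording, swap the placeholder for a real OpenRouter key so
    # the live call succeeds; the authorization header is filtered out of the
    # recorded cassette anyway.
    llm = LLM(model="openrouter/meta-llama/llama-3.2-3b-instruct", api_key="ABC")
    ...
```

After one green run the override is removed, and the fresh interaction in the .yaml file is what every subsequent replay matches against.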

> Another point: I'm not sure why we wrote a test for the converter with a custom LLM model. But since you are using a very similar approach, it might be a safety net(?).

Yup, I'm also confused about why the converter was tested specifically with llama3_2.
@lucasgomide, thanks for your input.

@Vidit-Ostwal deleted the fix-fail-llama-test branch on May 16, 2025 at 19:29
Development

Successfully merging this pull request may close these issues.

[BUG] Ollama Test Case Issue (#2817)