Fix fail llama test #2819
Conversation
tests/utilities/test_converter.py (outdated)
@@ -359,7 +359,7 @@ def test_convert_with_instructions():

@pytest.mark.vcr(filter_headers=["authorization"])
def test_converter_with_llama3_2_model():
-    llm = LLM(model="ollama/llama3.2:3b", base_url="http://localhost:11434")
+    llm = LLM(model="openrouter/meta-llama/llama-3.2-3b-instruct", api_key='ABC')
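For context, a minimal sketch of what the updated test might look like in full. Only the `LLM(...)` line comes from the diff above; the `SimpleModel`, `Converter` usage, and assertions are assumptions inferred from the test name and the log excerpt later in this thread.

```python
import pytest
from pydantic import BaseModel

from crewai import LLM
from crewai.utilities.converter import Converter


class SimpleModel(BaseModel):
    name: str
    age: int


# Sketch only: the VCR cassette replays the recorded HTTP interaction, so the
# API key is a placeholder and no live request should be made on replay.
@pytest.mark.vcr(filter_headers=["authorization"])
def test_converter_with_llama3_2_model():
    llm = LLM(model="openrouter/meta-llama/llama-3.2-3b-instruct", api_key="ABC")
    converter = Converter(
        llm=llm,
        text="Name: Alice Llama, Age: 30",
        model=SimpleModel,
        instructions="Extract the name and age.",
    )
    output = converter.to_pydantic()
    assert output.name == "Alice Llama"
    assert output.age == 30
```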
Why has this change fixed the flaky test?
Honestly, I am still confused.
Initially I tested the previous cassette file locally on Python 3.10, 3.11, and 3.12, and it worked completely fine.
Still, the test kept failing on the PR, so I thought that recording a different interaction might resolve the issue.
I still don't fully understand why this test became flaky in the first place (I couldn't find anything odd).
Also, reading from the logs of another PR:
model = 'llama3.2:3b'
messages = [{'content': 'Name: Alice Llama, Age: 30', 'role': 'user'}, {'content': '', 'role': 'assistant', 'tool_calls': [{'func...\': \'Age\', \'type\': \'integer\'}}, \'required\': [\'age\', \'name\'], \'type\': \'object\'}}}\n', 'role': 'system'}]
timeout = 600.0, temperature = None, top_p = None, n = None, stream = None
stream_options = None, stop = None, max_completion_tokens = None
max_tokens = None, modalities = None, prediction = None, audio = None
presence_penalty = None, frequency_penalty = None, logit_bias = None
user = None, reasoning_effort = None, response_format = None, seed = None
tools = [{'function': {'description': 'Correctly extracted `SimpleModel` with all the required parameters with correct types',...'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['age', 'name'], 'type': 'object'}}, 'type': 'function'}]
tool_choice = {'function': {'name': 'SimpleModel'}, 'type': 'function'}
logprobs = None, top_logprobs = None, parallel_tool_calls = None
deployment_id = None, extra_headers = None, functions = None
function_call = None, base_url = None, api_version = None, api_key = None
model_list = None, thinking = None
kwargs = {'litellm_call_id': 'c1f47fba-ec49-40af-ba33-f12f2ebebea9', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7fd020dc6810>}
args = {'acompletion': False, 'api_base': 'http://localhost:11434', 'api_key': None, 'api_version': None, ...}
api_base = 'http://localhost:11434', mock_response = None
mock_tool_calls = None, mock_timeout = None, force_timeout = 600
logger_fn = None
My feeling is that at runtime it isn't picking up the cassette file and ends up trying to make an actual call.
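For reference, a hedged sketch of how a pytest-recording/vcrpy setup typically decides between replaying a cassette and hitting the network. The values below are illustrative assumptions, not necessarily what crewAI's actual conftest configures.

```python
# conftest.py sketch, assuming pytest-recording on top of vcrpy.
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    return {
        # With "once" (vcrpy's default) a missing cassette file triggers a real
        # HTTP call so it can be recorded; with "none", a request that is not
        # already in the cassette raises an error instead of going out.
        "record_mode": "once",
        # Playback happens only when every matcher agrees with the recorded
        # request, so changing the model/base_url changes what is sent and the
        # cassette lookup can miss even though the .yaml file exists.
        "match_on": ["method", "scheme", "host", "port", "path", "query"],
        "filter_headers": ["authorization"],
    }
```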
Thanks @Vidit-Ostwal, I really appreciate your work.
I think you've shown that this interaction fixes the flaky test, so I believe it's okay to go ahead and merge it.
I'm going to be very honest though… it's still unclear to me why the flaky test was happening. I'm not a big fan of this kind of situation: something still feels unclear, even to you. You just changed the model, and the test magically stopped failing.
Another point: I'm not sure why we wrote a converter test against a specific LLM model in the first place. But since you're using a very similar approach, it should be safe(?).
Going to approve with that consideration in mind. Again, good work!
Agreed, I'm not a big fan of this either. Code is deterministic, but this randomness defies explanation.
I think the issue was with the .yaml interaction file. I'm not exactly sure what went wrong (especially since all the matchers seemed to be working fine), but my hunch is that re-recording the same HTTP interaction could solve it. It's a bit like restarting your PC or phone when something's off and you don't know why. Not the biggest fan of that approach, but sometimes it just works.
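If the cassette needs to be refreshed again, one hedged way to do a one-off re-record with pytest-recording is to delete the stale .yaml and temporarily loosen the record mode on the marker. Whether crewAI's conftest permits this override is an assumption on my part.

```python
import pytest


# Temporary override, assuming pytest-recording merges marker kwargs into the
# vcr_config: "new_episodes" appends any request not yet in the cassette while
# existing interactions keep replaying. A valid API key is needed for the one
# live call that gets recorded; drop the override again afterwards so CI stays
# replay-only.
@pytest.mark.vcr(filter_headers=["authorization"], record_mode="new_episodes")
def test_converter_with_llama3_2_model():
    ...
```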
Yup, I'm also confused about why the converter was tested specifically with llama3_2.
Fixes #2817
Registered a new interaction.