Description
I am experiencing non-reproducible results with the Azure OpenAI API, even when setting parameters explicitly to ensure deterministic behavior. My goal is to get identical outputs when running the same prompt under the same conditions. Steps to reproduce:
- Set up the API connection with the default API version, 2024-02-01. The azure_openai_endpoint is fixed, and pf_deployment is gpt-4o-mini.
- Run the prompt with the following fixed parameters:
{
    "temperature": 0,
    "top_p": top_p,  # swept from 0 to 1 in a loop to check whether the problem exists for all values
    "seed": 42,
    "max_tokens": 100,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": False
}
- Repeat the request 10 times per top_p value.
- Observe the output variance.
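The steps above can be sketched end to end. The environment-variable names and the prompt are placeholders, and the client usage assumes the `openai>=1.0` Python SDK with its `AzureOpenAI` client:

```python
import os
from collections import Counter


def run_once(prompt: str, top_p: float) -> str:
    """One chat completion with every randomness-related parameter pinned.

    The endpoint and key are read from placeholder env vars; the deployment
    name matches my setup (gpt-4o-mini).
    """
    from openai import AzureOpenAI  # pip install "openai>=1.0"

    client = AzureOpenAI(
        api_version="2024-02-01",  # the default version where I first saw this
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # pf_deployment
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        top_p=top_p,
        seed=42,
        max_tokens=100,
        frequency_penalty=0,
        presence_penalty=0,
        stream=False,
    )
    return resp.choices[0].message.content


def distinct_outputs(outputs: list[str]) -> Counter:
    """Count how many distinct responses came back.

    Deterministic behavior would yield exactly one bucket.
    """
    return Counter(outputs)


if __name__ == "__main__":
    for top_p in (0.0, 0.25, 0.5, 0.75, 1.0):
        outputs = [run_once("Summarize: ...", top_p) for _ in range(10)]
        n = len(distinct_outputs(outputs))
        print(f"top_p={top_p}: {n} distinct responses out of 10")
```

With deterministic behavior, every `top_p` line would report 1 distinct response; in practice I see several.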
Expected Behavior:
With temperature = 0 and a fixed seed, I expect deterministic outputs—meaning the same response should be returned each time for identical inputs.
Observed Behavior:
Despite fixing temperature = 0, seed = 42, and all other parameters, the responses vary.
Sometimes all ten responses to the same prompt are different (e.g., in summarization outputs).
This issue persists across multiple API versions, including the default 2024-02-01 and 2025-02-01-preview.
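The seed documentation describes determinism as best-effort and only comparable between responses that share the same `system_fingerprint` (a field on chat completion responses). A small helper (hypothetical name) to bucket the varying outputs by fingerprint could separate backend drift from variance on an identical backend:

```python
from collections import defaultdict


def group_by_fingerprint(results: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group response texts by system_fingerprint.

    results: list of (system_fingerprint, response_text) pairs collected
    from repeated identical requests. Per the seed docs, determinism is
    only expected within a single fingerprint, so:
      - several fingerprints, one text each  -> variance may be backend drift
      - one fingerprint, several texts       -> variance despite an identical backend
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for fingerprint, text in results:
        groups[fingerprint].add(text)
    return dict(groups)
```

In my runs the variance appears even when the fingerprint does not change, which is what makes the behavior look like a bug rather than documented drift.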