test(e2e): stabilize Mistral required streaming #1483

Merged: CatherineSue merged 1 commit into main from codex/fix-mistral-required-streaming-flake on May 13, 2026
@CatherineSue CatherineSue commented May 13, 2026

Member

Description

Problem

After PR #1478 merged, TestToolChoiceMistral.test_tool_choice_required_streaming started showing flakiness: required streaming sometimes produced no streamed tool-call chunks. The comparable non-streaming required test already pins temperature=0.2, and the stricter required streaming arguments test pins temperature=0.1; this smoke test was still using the backend default sampling temperature.
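To make the sampling-variance argument concrete, here is a toy sketch of temperature-scaled softmax. The two logits are hypothetical stand-ins for "start a tool call" versus "emit plain text", and 1.0 is only an arbitrary comparison point, not the backend's actual default; nothing here is measured from the Mistral model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: index 0 = "start a tool call", index 1 = "plain text".
logits = [2.0, 1.0]
p_comparison = softmax_with_temperature(logits, 1.0)  # arbitrary higher temperature
p_pinned = softmax_with_temperature(logits, 0.1)      # value pinned by this PR

print(f"P(tool call) at T=1.0: {p_comparison[0]:.4f}")
print(f"P(tool call) at T=0.1: {p_pinned[0]:.4f}")
```

Lowering the temperature concentrates probability on the highest-scoring token, which is why a pinned low value should make the smoke test less sensitive to sampling noise.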

Solution

Pin the Mistral required streaming smoke test to temperature=0.1 so the assertion exercises streaming tool-call plumbing instead of sampling variance. I also checked the Mistral e2e setup: it uses the model default chat template and --tool-call-parser mistral; no special chat template is injected. The Mistral path still uses the structural-tag constraint from MistralParser::build_structural_tag.

Changes

  • Set temperature=0.1 for test_tool_choice_required_streaming in the shared tool-choice e2e base.
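The change can be pictured roughly as follows. This is a minimal sketch, not the code in test_function_calling.py: the helper names, the model name, and the fake stream are hypothetical, and the chunk layout assumes OpenAI-style streaming deltas (choices[0].delta.tool_calls). Only temperature=0.1, tool_choice="required", and streaming come from the PR description.

```python
from types import SimpleNamespace

PINNED_TEMPERATURE = 0.1  # value taken from the PR description


def build_required_streaming_request(model, messages, tools):
    """Assemble kwargs for a streaming request that must produce a tool call."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "tool_choice": "required",
        "stream": True,
        "temperature": PINNED_TEMPERATURE,
    }


def collect_tool_calls(stream):
    """Merge streamed tool-call deltas into {index: {"name", "arguments"}}."""
    calls = {}
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        for tc in getattr(delta, "tool_calls", None) or []:
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            fn = getattr(tc, "function", None)
            if fn is None:
                continue
            if getattr(fn, "name", None):
                entry["name"] = fn.name  # the name arrives once
            if getattr(fn, "arguments", None):
                entry["arguments"] += fn.arguments  # arguments arrive in pieces
    return calls


def _fake_chunk(name=None, arguments=None, index=0):
    """Build a stand-in for one streamed chunk (hypothetical test double)."""
    fn = SimpleNamespace(name=name, arguments=arguments)
    tc = SimpleNamespace(index=index, function=fn)
    delta = SimpleNamespace(tool_calls=[tc])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk(name="get_weather", arguments=""),
    _fake_chunk(arguments='{"city": '),
    _fake_chunk(arguments='"Paris"}'),
]
request = build_required_streaming_request("hypothetical-model", [], [])
tool_calls = collect_tool_calls(fake_stream)

# The smoke-test style assertion: at least one tool call was streamed.
assert len(tool_calls) > 0
```

An empty result from collect_tool_calls corresponds to the failure mode described in the Problem section, where the stream carried no tool-call chunks.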

Test Plan

  • python3 -m py_compile e2e_test/chat_completions/test_function_calling.py
  • GPU e2e not run locally; this should be validated by the PR e2e matrix, with the Mistral required streaming job rerun a few times to confirm the flake rate drops.
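As a rough guide to how many reruns "a few times" needs to be, the arithmetic below assumes independent runs and a hypothetical pre-fix flake rate of 20%; the PR does not state the real rate. If a test fails with probability p per run, it passes n runs in a row with probability (1 - p)^n.

```python
import math


def prob_all_pass(flake_rate, runs):
    """Probability that a test flaking at flake_rate passes every one of runs."""
    return (1.0 - flake_rate) ** runs


def runs_needed(flake_rate, confidence):
    """Smallest n so that a test flaking at flake_rate fails at least once
    in n independent runs with probability >= confidence."""
    if not 0.0 < flake_rate < 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - flake_rate))


# Hypothetical numbers, for illustration only.
assumed_flake_rate = 0.2
print(prob_all_pass(assumed_flake_rate, 3))
print(runs_needed(assumed_flake_rate, 0.95))
```

Under that assumption, three clean reruns would still happen about half the time even without the fix, so a larger batch of reruns gives a more convincing signal.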

Checklist

  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

Release Notes

This release contains internal test enhancements with no user-visible changes.

  • Tests
    • Improved streaming function calling test coverage with adjusted test parameters.

Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
@github-actions github-actions Bot added the tests Test changes label May 13, 2026
@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 94e7874f-06e6-4f98-bc0b-31bbbd81bbe9

📥 Commits

Reviewing files that changed from the base of the PR and between 401e666 and a6c1265.

📒 Files selected for processing (1)
  • e2e_test/chat_completions/test_function_calling.py

Walkthrough

The streaming chat completions test now explicitly sets temperature=0.1 in its request configuration.

Changes

Streaming Test Configuration

Layer / File(s) | Summary
Streaming test temperature parameter (e2e_test/chat_completions/test_function_calling.py) | The test_tool_choice_required_streaming method adds temperature=0.1 to the streaming request configuration.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A tiny parameter, warm and bright,
temperature=0.1 shines with light,
In the streaming test it now does dwell,
Consistency served with care so well! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: stabilizing a flaky Mistral streaming test by pinning temperature, which is the primary modification in the changeset.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@claude claude Bot left a comment

Looks good — single-line fix to stabilize a flaky e2e test by pinning temperature, consistent with the non-streaming sibling (temperature=0.2 at line 1037) and the stricter streaming test (temperature=0.1). No issues found.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the test_tool_choice_required_streaming test case in e2e_test/chat_completions/test_function_calling.py by adding a temperature parameter set to 0.1 to the chat completions request. This change likely improves the determinism of the test output. I have no feedback to provide as there were no review comments.

@CatherineSue CatherineSue merged commit cce4a63 into main May 13, 2026
102 checks passed
@CatherineSue CatherineSue deleted the codex/fix-mistral-required-streaming-flake branch May 13, 2026 05:20
zach-li-sudo pushed a commit to zach-li-sudo/smg that referenced this pull request May 13, 2026
Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>