feat(llm/anthropic): drop top_p when both temperature and top_p set (closes #100) #159
Status: Open
…e in request payload task=t1 validator:

- DIFF: 2 files changed, 30 insertions(+) — src/esperanto/providers/llm/anthropic.py (+7: import logging, logger = logging.getLogger(__name__), nested if/logger.debug block in _create_request_payload) and tests/providers/llm/test_anthropic_provider.py (+23: new test_temperature_drops_top_p_with_debug_log).
- AC1: pass — test_temperature_drops_top_p_with_debug_log sets anthropic_model.top_p=0.95 (the fixture has temperature=0.7), calls chat_complete, then asserts call_args[1]['json'] contains 'temperature' and NOT 'top_p'. Test passed (45/45 Anthropic suite).
- AC2: pass — the same test uses caplog.at_level(DEBUG, logger='esperanto.providers.llm.anthropic') and asserts that some record with levelno==DEBUG has both 'top_p' and 'Anthropic' in its message. The logged string, 'Dropping top_p — Anthropic recommends setting only temperature OR top_p, not both.', contains both required words. Test passed.
- AC3: pass — pre-existing test_chat_complete (fixture: temperature=0.7, no top_p) passed; the payload assertions confirm temperature==0.7 is present and no top_p key exists. The new code path (temperature not None, top_p is None) reaches payload['temperature']=... without calling logger.debug. No regression.
- AC4: pass — pre-existing test_top_p_used_when_temperature_not_set passed: a model with temperature=None and top_p=0.95 calls _create_request_payload; the test asserts 'top_p' in payload, payload['top_p']==0.95, and 'temperature' not in payload. The elif branch is correct.
- AC5: pass — test_temperature_drops_top_p_with_debug_log exists at line 408 of tests/providers/llm/test_anthropic_provider.py, uses the anthropic_model fixture (mocked HTTP client), sets top_p on the instance, calls chat_complete, inspects call_args[1]['json'] for the payload shape, and checks caplog for a DEBUG record. Passed.
- AC6: pass — all 45 pre-existing Anthropic tests passed in uv run pytest tests/providers/llm/test_anthropic_provider.py -v --no-cov.
- AC7: pass — uv run pytest tests/providers tests/unit tests/common_types tests/test_deprecation_warnings.py -q --no-cov exited 0. Result: 945 passed, 1 skipped, 28 warnings. A previous run had a HuggingFace Hub HTTP 429 rate-limit error in test_qwen_model_configuration (pre-existing external flakiness); a re-run cleared it.
- AC8: pass — uv run ruff check . output: 'All checks passed!', exit 0. No new ruff errors.
- AC9: pass — uv run mypy src/esperanto output: 'Success: no issues found in 73 source files', exit 0. No new mypy errors.
Summary
Anthropic models reject API requests where both `temperature` and `top_p` are sent simultaneously (HTTP 400). This applies to all Anthropic models — version-string sniffing isn't needed; Anthropic's own docs discourage setting both.
This PR sanitizes the request in the Anthropic provider's request-builder: when both are set on the instance, `top_p` is dropped (keeping `temperature`, the more common knob), and a DEBUG log is emitted.
```python
if temperature is not None and top_p is not None:
    logger.debug(
        "Dropping top_p — Anthropic recommends setting only temperature OR top_p, not both."
    )
    payload["temperature"] = temperature
elif temperature is not None:
    payload["temperature"] = temperature
elif top_p is not None:
    payload["top_p"] = top_p
```
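Per the diff summary, the provider change amounts to a module-level logger plus this nested branch inside `_create_request_payload`. A self-contained sketch of how the pieces fit together, where the class name, constructor, and method body are illustrative assumptions rather than the provider's actual code:

```python
import logging

logger = logging.getLogger(__name__)  # module-level logger, as described in the diff

class AnthropicModelSketch:
    """Hypothetical stand-in for the Anthropic provider; only the
    sampling-parameter handling mirrors this PR."""

    def __init__(self, temperature=None, top_p=None):
        self.temperature = temperature
        self.top_p = top_p

    def _create_request_payload(self, messages):
        payload = {"messages": messages}
        if self.temperature is not None and self.top_p is not None:
            # Anthropic returns HTTP 400 when both knobs are sent; keep
            # temperature, the more common one, and log the drop at DEBUG.
            logger.debug(
                "Dropping top_p — Anthropic recommends setting only "
                "temperature OR top_p, not both."
            )
            payload["temperature"] = self.temperature
        elif self.temperature is not None:
            payload["temperature"] = self.temperature
        elif self.top_p is not None:
            payload["top_p"] = self.top_p
        return payload

# Both set: top_p is silently dropped from the wire payload.
payload = AnthropicModelSketch(temperature=0.7, top_p=0.95)._create_request_payload([])
assert payload["temperature"] == 0.7 and "top_p" not in payload

# Only top_p set: it passes through untouched.
payload = AnthropicModelSketch(top_p=0.95)._create_request_payload([])
assert payload["top_p"] == 0.95 and "temperature" not in payload
```

The instance attributes stay untouched; only the outgoing payload is sanitized, so callers can still read back whatever they configured.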
This is the canonical implementation of the Model Quirks vs Unsupported Features principle from ARCHITECTURE.md (committed in #150): users should not need to know which model has which quirk.
Diff shape
```
src/esperanto/providers/llm/anthropic.py | 7 +++++++
tests/providers/llm/test_anthropic_provider.py | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+)
```
Test plan
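The validator notes above describe the new test's assertions (AC1: top_p absent from the payload; AC2: a DEBUG record mentioning both "top_p" and "Anthropic"). A minimal executable sketch of those same checks, using a hypothetical `build_payload` stand-in and a list-backed log handler in place of the real test's pytest `caplog` fixture and mocked HTTP client:

```python
import logging

logger = logging.getLogger("esperanto.providers.llm.anthropic")

def build_payload(temperature=None, top_p=None):
    """Hypothetical stand-in for the provider's request builder."""
    payload = {}
    if temperature is not None and top_p is not None:
        logger.debug(
            "Dropping top_p — Anthropic recommends setting only "
            "temperature OR top_p, not both."
        )
        payload["temperature"] = temperature
    elif temperature is not None:
        payload["temperature"] = temperature
    elif top_p is not None:
        payload["top_p"] = top_p
    return payload

# Capture log records without pytest: route emit() into a plain list.
records = []
handler = logging.Handler()
handler.emit = records.append
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

payload = build_payload(temperature=0.7, top_p=0.95)

# AC1: temperature survives, top_p is dropped.
assert "temperature" in payload and "top_p" not in payload
# AC2: a DEBUG record mentions both "top_p" and "Anthropic".
assert any(
    r.levelno == logging.DEBUG
    and "top_p" in r.getMessage()
    and "Anthropic" in r.getMessage()
    for r in records
)
```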
References
- "…and `top_p` cannot both be specified for this model." for anthropic passthrough (BerriAI/litellm#15097)

Closes #100.