Skip to content

feat(llm/anthropic): drop top_p when both temperature and top_p set (closes #100)#159

Open
lfnovo wants to merge 1 commit intomainfrom
harny/issue-100
Open

feat(llm/anthropic): drop top_p when both temperature and top_p set (closes #100)#159
lfnovo wants to merge 1 commit intomainfrom
harny/issue-100

Conversation

@lfnovo
Copy link
Copy Markdown
Owner

@lfnovo lfnovo commented May 2, 2026

Summary

Anthropic models reject API requests where both `temperature` and `top_p` are sent simultaneously (HTTP 400). This applies to all Anthropic models — version-string sniffing isn't needed; Anthropic's own docs discourage setting both.

This PR sanitizes the request in the Anthropic provider's request-builder: when both are set on the instance, `top_p` is dropped (keeping `temperature`, the more common knob), and a DEBUG log is emitted.

```python
if temperature is not None and top_p is not None:
logger.debug("Dropping top_p — Anthropic recommends setting only temperature OR top_p, not both.")
payload["temperature"] = temperature
elif temperature is not None:
payload["temperature"] = temperature
elif top_p is not None:
payload["top_p"] = top_p
```

This is the canonical implementation of the Model Quirks vs Unsupported Features principle from ARCHITECTURE.md (committed in #150): users should not need to know which model has which quirk.

Diff shape

```
src/esperanto/providers/llm/anthropic.py | 7 +++++++
tests/providers/llm/test_anthropic_provider.py | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+)
```

Test plan

  • New `test_temperature_drops_top_p_with_debug_log`: sets both, asserts outgoing payload has `temperature` but NOT `top_p`, asserts DEBUG log is emitted with both 'top_p' and 'Anthropic' in message
  • Pre-existing `test_chat_complete` (only `temperature` set): unchanged behavior, no log
  • Pre-existing `test_top_p_used_when_temperature_not_set` (only `top_p` set): unchanged behavior, top_p passes through
  • All 45 Anthropic tests pass
  • `uv run pytest tests/providers tests/unit tests/common_types tests/test_deprecation_warnings.py -q --no-cov` exits 0 (945 passed)
  • `uv run ruff check .` clean
  • `uv run mypy src/esperanto` clean

References

Closes #100.

…e in request payload

task=t1
validator: DIFF: 2 files changed, 30 insertions(+) — src/esperanto/providers/llm/anthropic.py (+7: import logging, logger = logging.getLogger(__name__), nested if/logger.debug block in _create_request_payload) and tests/providers/llm/test_anthropic_provider.py (+23: new test_temperature_drops_top_p_with_debug_log).; AC1: pass — test_temperature_drops_top_p_with_debug_log sets anthropic_model.top_p=0.95 (fixture has temperature=0.7), calls chat_complete, then asserts call_args[1]['json'] contains 'temperature' and NOT 'top_p'. Test passed (45/45 Anthropic suite).; AC2: pass — Same test uses caplog.at_level(DEBUG, logger='esperanto.providers.llm.anthropic') and asserts any record with levelno==DEBUG has both 'top_p' and 'Anthropic' in message. Logged string is 'Dropping top_p — Anthropic recommends setting only temperature OR top_p, not both.' — contains both required words. Test passed.; AC3: pass — Pre-existing test_chat_complete (fixture: temperature=0.7, no top_p) passed; payload asserts temperature==0.7 present, no top_p key. The new code path (temperature not None, top_p is None) reaches payload['temperature']=... without calling logger.debug. No regression.; AC4: pass — Pre-existing test_top_p_used_when_temperature_not_set passed: model with temperature=None and top_p=0.95, _create_request_payload called, asserts 'top_p' in payload, payload['top_p']==0.95, 'temperature' not in payload. The elif branch is correct.; AC5: pass — test_temperature_drops_top_p_with_debug_log exists at line 408 of tests/providers/llm/test_anthropic_provider.py, uses the anthropic_model fixture (mocked HTTP client), sets top_p on instance, calls chat_complete, inspects call_args[1]['json'] for payload shape, checks caplog for DEBUG record. Passed.; AC6: pass — All 45 pre-existing Anthropic tests passed in uv run pytest tests/providers/llm/test_anthropic_provider.py -v --no-cov.; AC7: pass — uv run pytest tests/providers tests/unit tests/common_types tests/test_deprecation_warnings.py -q --no-cov exited 0. Result: 945 passed, 1 skipped, 28 warnings. Previous run had a HuggingFace Hub HTTP 429 rate-limit error in test_qwen_model_configuration (pre-existing external flakiness); re-run cleared it.; AC8: pass — uv run ruff check . output: 'All checks passed!' exit 0. No new ruff errors.; AC9: pass — uv run mypy src/esperanto output: 'Success: no issues found in 73 source files' exit 0. No new mypy errors.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support models that reject simultaneous temperature + top_p (e.g. Claude 4.x)

1 participant