Releases: hud-evals/hud-python

v0.5.0

17 Dec 06:14
cea28de

New major version of the HUD SDK (v0.5.0)!

What's new:
  • Simpler environments: a new Environment class with @env.tool() and @env.scenario() decorators lets you define both tools and evaluations in one place with a cleaner syntax. This cuts around 30% of the lines of code in our sample environments and makes it easier to create, track, and run 100+ tasks through both the SDK and the platform.
  • Built-in A/B testing: the new hud.eval() lets you test multiple models or configs in one line of code using the variants and group parameters, with everything tracked on the platform if you use hud.ai/models.
  • Unified model API: call Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint.
  • Other observability features: see all live jobs on the platform, store and track tasks on the new evalsets/environment scenario pages, and use the improved trace UI.
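
To make the A/B testing idea concrete, here is a minimal standalone sketch of how a set of variants and a group size could expand into individual runs. It does not call the HUD SDK: the helper name expand_runs, the "repeat" key, and the placeholder model names are hypothetical, and it assumes that variants means "one run per combination of values" and group means "repeat each combination this many times". Check the HUD docs for the actual hud.eval() behavior.

```python
from itertools import product


def expand_runs(variants: dict[str, list], group: int = 1) -> list[dict]:
    """Hypothetical helper: one config per combination of variant values,
    repeated `group` times. Not part of the HUD SDK."""
    keys = list(variants)
    runs = []
    # Cartesian product over the value lists, in the order the keys were given.
    for combo in product(*(variants[k] for k in keys)):
        config = dict(zip(keys, combo))
        # Repeat each combination so results can be averaged across runs.
        for repeat in range(group):
            runs.append({**config, "repeat": repeat})
    return runs


# Placeholder model names and temperatures, for illustration only.
runs = expand_runs(
    {"model": ["model-a", "model-b"], "temperature": [0.0, 0.7]},
    group=3,
)
print(len(runs))  # 2 models x 2 temperatures x 3 repeats = 12
```

Under these assumptions, two models and two temperatures with a group of 3 yield 12 runs, which is the kind of fan-out a single call with variants and group is meant to replace.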

Migration/Backwards compatibility:
Existing environments and task configs still work with all old (v4) commands!
If you wish to migrate to v5, all tasks work via Task.from_v4(), with the hud eval CLI command, and with --remote runs. To migrate to the new environments, see https://docs.hud.ai/migration

v0.4.74 - Longer tool calling timeouts and Bedrock Agent

12 Dec 19:00

What's Changed

Full Changelog: v0.4.73...v0.4.74

v0.4.73 - Small models update

08 Dec 12:23
dcc2d8c

Merge pull request #231 from hud-evals/l/update-model-names-gateway

L/update model names gateway

v0.4.72 - Custom trace ids for running tasks

07 Dec 09:47

v0.4.71 - Model and hud dev updates

07 Dec 09:38
2f35642

What's Changed

New Contributors

Full Changelog: v0.4.70...v0.4.71

v0.4.70 - Gemini and Gemini CUA Agent Support + Remote Running

05 Dec 01:10

What's Changed

Full Changelog: v0.4.69...v0.4.70

v0.4.69 - OAI Computer Use Support, Grok 4 Support, GLM 4.5v support

01 Dec 17:11

What's Changed

Full Changelog: v0.4.68...v0.4.69

v0.4.68 - Remote rollouts, hud gateway, and passthrough inference

28 Nov 12:44

What's Changed

  • Introduced Responses Agent for OpenAI
  • Improved tools provided by hud
  • Added Codex agent and tools
  • Introduced remote eval runs with passthrough inference
  • Removed unnecessary dependencies

Full Changelog: v0.4.67...v0.4.68

v0.4.67 - Fixed cache control on Claude Agent

22 Nov 01:50

What's Changed

  • Avoid cache control on tools, since it is applied automatically (by @pimpale in #205)

Full Changelog: v0.4.66...v0.4.67

v0.4.66 - Additional beta CLI commands

21 Nov 04:55
083300b

Merge pull request #202 from hud-evals/j/openai-rft

OpenAI RFT