Releases · hud-evals/hud-python

17 Dec 06:14

v0.5.0

cea28de


A new major version of the HUD SDK (v0.5.0)!

What's new:
Simpler environments: The new Environment class, with its @env.tool() and @env.scenario() decorators, lets you define both tools and evaluations in one place with a cleaner syntax. It cuts around 30% of the lines of code in our sample environments and makes it easier to create, track, and run 100+ tasks via both the SDK and the platform!
Built-in A/B testing: The new hud.eval() lets you test multiple models or configs in one line of code using the variants and group parameters, with every run tracked on the platform when using hud.ai/models.
Unified model API: Call Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint.
Other observability features: See all live jobs on the platform, store and track tasks on the new evalsets and environment scenario pages, and use the improved trace UI.
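
These notes don't show the new API itself, so the following is only a toy, stdlib-only model of the two ideas in the first two bullets: registering tools and scenarios on one environment object via decorators, and an A/B-style eval that runs every combination of variant values `group` times each. ToyEnvironment, toy_eval, the add tool, the sum_task scenario and the fake agent are all hypothetical; consult the HUD docs for the real Environment and hud.eval() signatures.

```python
from collections.abc import Callable
from itertools import product
from typing import Any


# Hypothetical stand-in for the SDK's Environment class; this is NOT the real hud API.
class ToyEnvironment:
    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: dict[str, Callable[..., Any]] = {}
        self.scenarios: dict[str, Callable[..., float]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator factory: registers a function the agent may call, keyed by name."""
        def register(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.tools[fn.__name__] = fn
            return fn
        return register

    def scenario(self) -> Callable[[Callable[..., float]], Callable[..., float]]:
        """Decorator factory: registers an evaluation that returns a reward in [0, 1]."""
        def register(fn: Callable[..., float]) -> Callable[..., float]:
            self.scenarios[fn.__name__] = fn
            return fn
        return register


def toy_eval(
    env: ToyEnvironment,
    scenario: str,
    agent: Callable[[dict[str, Any]], Any],
    variants: dict[str, list[Any]],
    group: int,
) -> dict[str, float]:
    """Hypothetical A/B runner: each combination of variant values runs `group` (>= 1) times."""
    keys = sorted(variants)
    summary: dict[str, float] = {}
    for combo in product(*(variants[k] for k in keys)):
        config = dict(zip(keys, combo))
        rewards = [env.scenarios[scenario](agent(config)) for _ in range(group)]
        label = ", ".join(f"{k}={v}" for k, v in config.items())
        summary[label] = sum(rewards) / len(rewards)  # mean reward per config
    return summary


env = ToyEnvironment("calculator")


@env.tool()
def add(a: int, b: int) -> int:
    return a + b


@env.scenario()
def sum_task(answer: int) -> float:
    # Full reward only if the agent's answer matches the tool's result for 2 + 3.
    return 1.0 if answer == env.tools["add"](2, 3) else 0.0


# Fake "agents": model-a answers correctly, model-b is off by one.
summary = toy_eval(
    env,
    "sum_task",
    agent=lambda cfg: 5 if cfg["model"] == "model-a" else 4,
    variants={"model": ["model-a", "model-b"]},
    group=2,
)
print(summary)  # {'model=model-a': 1.0, 'model=model-b': 0.0}
```

Keeping tools and scenarios on the same object is what lets one definition serve both the agent (tools) and the grader (scenarios) in this sketch.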

Migration/Backwards compatibility:
Existing environments and task configs work with all old (v4) commands!
If you wish to migrate to v5, all existing tasks can be loaded via Task.from_v4() and also work with the hud eval CLI command and with --remote runs! See how to migrate to the new environments: https://docs.hud.ai/migration


12 Dec 19:00

Parth220

v0.4.74

39a525e

v0.4.74 - Longer tool calling timeouts and Bedrock Agent

What's Changed

add environment variable resolution to hud eval toml by @jdchawla29 in #233
Move reasoning content to AgentResponse.reasoning by @jdchawla29 in #234
Increase timeout settings to support long-running operations. by @jdchawla29 in #236
update model configuration in eval CLI and template files by @jdchawla29 in #237
Bedrock claude agent by @dylanbowman314 in #211
configurable client timeout for MCP operations by @jdchawla29 in #238
hotfixes by @jdchawla29 in #240
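
PR #233 adds environment variable resolution to the hud eval TOML config. The notes don't specify the placeholder syntax, so this is only a sketch under an assumed shell-style `${VAR}` / `${VAR:-default}` convention, applied recursively to a config after it has been parsed (for example with tomllib). resolve_env, the placeholder pattern, and the config keys below are hypothetical, not hud's implementation.

```python
import os
import re
from collections.abc import Mapping
from typing import Any, Optional

# Assumed placeholder syntax: ${VAR} or ${VAR:-default}. Illustrative only.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_env(value: Any, environ: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} placeholders in strings inside a parsed config."""
    env = os.environ if environ is None else environ

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(value, str):
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    return value  # numbers, booleans, dates are left as-is


# A config shaped like tomllib.loads() output (keys are hypothetical).
config = {
    "eval": {
        "model": "${MODEL:-claude}",
        "api_key": "${MY_API_KEY}",
        "max_steps": 10,
    }
}
resolved = resolve_env(config, {"MY_API_KEY": "sk-test"})
print(resolved)  # {'eval': {'model': 'claude', 'api_key': 'sk-test', 'max_steps': 10}}
```

Failing loudly on an unset variable with no default, rather than leaving the placeholder in place, keeps a missing secret from silently reaching a run.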

Full Changelog: v0.4.73...v0.4.74

Contributors

dylanbowman314 and jdchawla29


08 Dec 12:23

lorenss-m

v0.4.73

dcc2d8c

v0.4.73 - Small models update

Merge pull request #231 from hud-evals/l/update-model-names-gateway

L/update model names gateway


07 Dec 09:47

lorenss-m

v0.4.72

5248072

v0.4.72 - Custom trace ids for running tasks

add trace in run task


07 Dec 09:38

lorenss-m

v0.4.71

2f35642

v0.4.71 - Model and hud dev updates

What's Changed

Improved lock file by @shfunc in #188
Migrate hud init to clone from separate repos by @farrelmahaztra in #187
allow remote execution restriction for gemini agents by @jdchawla29 in #230
Hud dev additions by @lorenss-m in #216

New Contributors

@shfunc made their first contribution in #188

Full Changelog: v0.4.70...v0.4.71

Contributors

farrelmahaztra, lorenss-m, and 2 other contributors


05 Dec 01:10

Parth220

v0.4.70

38d4d8e

v0.4.70 - Gemini and Gemini CUA Agent Support + Remote Running

What's Changed

claude streaming by @pimpale in #227
add openai tools and tests by @pimpale in #226
Split Gemini into Gemini and Gemini CUA by @pimpale in #228

Full Changelog: v0.4.69...v0.4.70

Contributors

pimpale


01 Dec 17:11

Parth220

v0.4.69

2e1a671

v0.4.69 - OAI Computer Use Support, Grok 4 Support, GLM 4.5v support

What's Changed

Add custom agent example using HUD Gateway for inference by @jdchawla29 in #223
Preflight Check For Remote Agent Execution by @jdchawla29 in #222
update model references by @jdchawla29 in #225

Full Changelog: v0.4.68...v0.4.69

Contributors

jdchawla29


28 Nov 12:44

Parth220

v0.4.68

4a8687b

v0.4.68 - Remote rollouts, hud gateway, and passthrough inference

What's Changed

Introduced Responses Agent for OpenAI
Improved tools provided by hud
Added Codex agent and tools
Introduced remote eval runs with passthrough inference.
Removed unnecessary dependencies

Full Changelog: v0.4.67...v0.4.68


22 Nov 01:50

Parth220

v0.4.67

32ab11c

v0.4.67 - Fixed cache control on Claude Agent

What's Changed

avoid cache control on tools, since it is applied automatically by @pimpale in #205
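
As a rough illustration only (not the actual change in #205), a helper that drops the `cache_control` field from tool definitions in the shape the Anthropic Messages API uses (`name`, `description`, `input_schema`) might look like this. The helper name and the get_weather tool are made up for the example.

```python
import copy
from typing import Any


def strip_tool_cache_control(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return deep copies of tool definitions with any top-level "cache_control" removed."""
    cleaned = []
    for tool in tools:
        tool_copy = copy.deepcopy(tool)  # never mutate the caller's definitions
        tool_copy.pop("cache_control", None)
        cleaned.append(tool_copy)
    return cleaned


tools = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "input_schema": {"type": "object", "properties": {"city": {"type": "string"}}},
        "cache_control": {"type": "ephemeral"},
    }
]
cleaned = strip_tool_cache_control(tools)
print("cache_control" in cleaned[0], "cache_control" in tools[0])  # False True
```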

Full Changelog: v0.4.66...v0.4.67

Contributors

pimpale


21 Nov 04:55

lorenss-m

v0.4.66

083300b

v0.4.66 - Additional beta CLI commands

Merge pull request #202 from hud-evals/j/openai-rft

OpenAI RFT
