Releases: hud-evals/hud-python
v0.5.0
New major version of the HUD SDK (v0.5.0)!
What's new:
- Simpler environments: New Environment class with @env.tool() and @env.scenario() decorators, which let you define both tools and evaluations in one place with a cleaner syntax. This cuts around 30% of the lines of code in our sample environments and makes it easier to create, track, and run 100+ tasks via both the SDK and the platform!
- Built-in A/B testing: New hud.eval() lets you test multiple models/configs in one line of code with the variants and group parameters, all tracked on the platform if you use hud.ai/models.
- Unified model API: Call Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint.
- Other observability features: See all live jobs on the platform, store and track tasks on the new evalsets and environment scenario pages, and inspect runs in an improved trace UI.
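As a rough illustration of what a variants/group sweep amounts to, the sketch below expands a dict of options into individual run configs. It is plain Python and does not call the SDK: `expand_variants`, the `repeat` key, and the model names are hypothetical, and it assumes that variants form a cross product and that group means "repeat each combination N times". The actual hud.eval() behavior may differ.

```python
from itertools import product


def expand_variants(variants, group=1):
    """Return one run config per (variant combination, repeat index).

    Hypothetical helper for illustration only; not part of the HUD SDK.
    """
    keys = list(variants)
    runs = []
    # Cross product of every option list, e.g. each model with each temperature.
    for combo in product(*(variants[key] for key in keys)):
        config = dict(zip(keys, combo))
        # Repeat each combination `group` times so results can be averaged.
        for repeat in range(group):
            runs.append({**config, "repeat": repeat})
    return runs


runs = expand_variants(
    {"model": ["model-a", "model-b"], "temperature": [0.0, 0.7]},
    group=3,
)
print(len(runs))  # 2 models x 2 temperatures x 3 repeats = 12
```

Keeping the expansion separate from execution makes it easy to see up front how many runs a sweep will launch.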
Migration/Backwards compatibility:
Existing environments and task configs work with all old (v4) commands!
If you wish to migrate to v5, all existing tasks work via Task.from_v4(), with the hud eval CLI command, and with --remote runs! To migrate to the new environments, see https://docs.hud.ai/migration
v0.4.74 - Longer tool calling timeouts and Bedrock Agent
What's Changed
- add environment variable resolution to hud eval toml by @jdchawla29 in #233
- Move reasoning content to AgentResponse.reasoning by @jdchawla29 in #234
- Increase timeout settings to support long-running operations. by @jdchawla29 in #236
- update model configuration in eval CLI and template files by @jdchawla29 in #237
- Bedrock claude agent by @dylanbowman314 in #211
- configurable client timeout for MCP operations by @jdchawla29 in #238
- hotfixes by @jdchawla29 in #240
Full Changelog: v0.4.73...v0.4.74
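To give a sense of what environment variable resolution in a config file can look like, here is a minimal sketch that substitutes `${NAME}` placeholders in an already-parsed config. The `${NAME}` syntax, the `resolve_env` helper, and the config keys are assumptions made for illustration; the placeholder syntax and behavior that hud eval actually supports are not stated in these notes.

```python
import os
import re

# Matches placeholders such as ${EXAMPLE_API_KEY} (assumed syntax).
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, env=None):
    """Recursively replace ${NAME} placeholders with values from `env`.

    Hypothetical helper; unknown variables are left untouched.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    # Numbers, booleans, and other types pass through unchanged.
    return value


config = {
    "agent": {"api_key": "${EXAMPLE_API_KEY}", "timeout": 300},
    "tags": ["${EXAMPLE_TAG}", "nightly"],
}
resolved = resolve_env(config, env={"EXAMPLE_API_KEY": "secret"})
```

Leaving unknown placeholders in place (rather than replacing them with an empty string) is one possible design choice; it makes a missing variable easier to spot in the resolved config.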
v0.4.73 - Small models update
Merge pull request #231 from hud-evals/l/update-model-names-gateway (update model names gateway)
v0.4.72 - Custom trace ids for running tasks
add trace in run task
v0.4.71 - Model and hud dev updates
What's Changed
- Improved lock file by @shfunc in #188
- Migrate hud init to clone from separate repos by @farrelmahaztra in #187
- allow remote execution restriction for gemini agents by @jdchawla29 in #230
- Hud dev additions by @lorenss-m in #216
Full Changelog: v0.4.70...v0.4.71
v0.4.70 - Gemini and Gemini CUA Agent Support + Remote Running
v0.4.69 - OAI Computer Use Support, Grok 4 Support, GLM 4.5v support
What's Changed
- Add custom agent example using HUD Gateway for inference by @jdchawla29 in #223
- Preflight Check For Remote Agent Execution by @jdchawla29 in #222
- update model references by @jdchawla29 in #225
Full Changelog: v0.4.68...v0.4.69
v0.4.68 - Remote rollouts, hud gateway, and passthrough inference
What's Changed
- Introduced Responses Agent for OpenAI
- Improved tools provided by hud
- Added Codex agent and tools
- Introduced Remote eval runs, with passthrough inference.
- Removed unnecessary deps
Full Changelog: v0.4.67...v0.4.68
v0.4.67 - Fixed cache control on Claude Agent
Full Changelog: v0.4.66...v0.4.67
v0.4.66 - Additional beta CLI commands
Merge pull request #202 from hud-evals/j/openai-rft (OpenAI RFT)