Changelog

All notable changes to this project will be documented in this file.

[0.4] - 2025-05-21

🚀 Features

  • Support thinking model in judge
  • Add nb_tool_calls as ops metric + add MCP_BRIDGE_URL + format
  • Parquet dataset support + ocr metrics + notebook demo
  • Add and handle new with_vision and prelude_prompt attributes
  • Calculate the environmental impact of models during response generation.
  • Add two new environmental metrics: energy_consumption and gwp_consumption.

🔧 Improvements

  • [UI] Display the environmental block in the OPS pane and in experiments_set metric results.

🐛 Bug Fixes

  • (runners) Nb_tool_calls metrics computation
  • Improve log level warning.
  • (ui) Use two-decimal float precision in score table.
  • Temporary url for MCP bridge
  • Tool activation and rag metrics error handling
  • (ui) Show dataset name and all model params in experiment set overview
  • (mcp) Allow tool_choice tuning
  • (clients) Add support for model aliases in v1/models
  • (api) Judge model must be unique in a set.
  • Strip answer + think
  • Fix multi-step agent loop generation if max_steps is reached.
  • Remove rerun_metric in patch exp + better handle error in patch expset route + fix format
  • Disallow model_judge patch for experiment and experiment_set
  • Parquet support and schemas
  • (schema) Rename prompt_system to system_prompt
  • Columns_map for ocr marker demo dataset
  • Dataset views
  • (tasks) Empty query

💼 Other

  • Strip answer + do not test on integer
  • (runner) Limit deep search steps + tool_choice 'none'.
  • Import collections
  • (mcp) Fix the multi-step loop
  • Unbound variables
  • Non blocking model sync

[0.3.1] - 2025-04-02

🚀 Features

  • [API] Support Anthropic, OpenAI, Mistral and Albert providers for judge models through the judge_model parameter in experiments (models are fetched from the OpenAI API v1/models endpoints)
  • [SCRIPTS] Add convenience scripts to run experiments from an isolated environment (e.g. cortex; see the tutorial)

🔧 Improvements

  • [UI] Add a special card for orphan experiments at the bottom of the experiments list.
  • [UI] Order experiment sets newest first
  • [UI] Remove the old, confusing experiments menu in favor of the experiment sets menu alone (renamed simply to "experiments")

[0.3] - 2025-03-27

🚀 Features

  • Added experiment set with cross-validation parameters and demo notebooks.
  • Integrated multiple RAG metrics for deep evaluation.
  • Supported delete experiment route for admin users.
  • Introduced new retry and post routes with UI improvements.
  • Added experiments 'finished' and 'failure' ratio in overview.
  • Integrated MCP support and multi-step LLM generations with MCP client bridge.
  • Added new tests to increase code coverage and addressed pydantic warnings.
  • Implemented loop limit and tool call step saving.
  • Improved sorting and metrics highlighting in the experiment set score table.

🐛 Bug Fixes

  • Enhanced error handling for missing metric input and baseline demo notebook.
  • Removed unnecessary attributes and improved schema validation.
  • Fixed various UI bugs and improved experiment view.
  • Improved notebook variable names and used public endpoints.
  • Enhanced GitHub Actions CI and addressed Alembic issues.
  • Corrected schema serialization and computation needs.
  • Improved experiment status updates and endpoint terminology.
  • Handled unknown model cases and improved dataset visibility.
  • Fixed various typos and improved model sorting and ops board status.
  • Improved schema validation and error detail return for API.
  • Addressed issues with experiment view and retry functionality.

🛠️ Code Improvements

  • Reorganized code structure (pip ready) and fixed import issues.
  • Moved API components to clients and adjusted imports accordingly.

🔥 Hotfixes

  • Addressed dataset and SQL float compatibility issues.
  • Updated configuration files for supervisord and Alembic.

⚙️ Operations

  • Added Docker and Docusaurus configuration files.
  • Fixed supervisord path for deployment.