V0.12.1 #106 (Merged)
Check failure on lines 246 to 279, reported by code scanning (CodeQL): Missing rate limiting (severity: High).
itomek pushed a commit that referenced this pull request on Mar 12, 2026.
# GAIA v0.12.1 Release Notes

## Overview

This patch release fixes bugs and improves the evaluation framework, with most changes in the visualization and reporting tools. All changes improve the reliability and usability of the `gaia eval`, `gaia visualize`, and `gaia report` commands.

## What's Changed

### Bug Fixes

#### 🔧 Fix Evaluation Visualizer Model Count and Path Issues (#823)

Fixed several critical issues in the `gaia visualize` and `gaia report` commands:

- **Incorrect Model Count in Consolidated Report**: Fixed the model count calculation in the webapp so it shows the correct number of models (it was showing only 4 instead of 8).
  - Unique models are now counted directly from `metadata.evaluation_files` instead of from the filtered/grouped data.
- **Windows Path Separator Bug**: Fixed a cross-platform compatibility issue in the `isMainEvaluationEntry()` function.
  - Both Unix (`/`) and Windows (`\`) path separators are now handled correctly.
- **Incorrect Default Directory Paths**: Updated the default paths to match the actual evaluation output locations.
  - Changed from `workspace/evaluation` to `workspace/output/evaluations`
  - Changed from `workspace/experiments` to `workspace/output/experiments`
- **Outdated Report Filename**: Renamed the default report from `LLM_RAG_Evaluation_Report.md` to `LLM_Evaluation_Report.md`.
  - The new name better reflects support for multiple evaluation types (RAG, summarization, etc.).
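
The model-count and path-separator fixes above rest on two small ideas, shown here as a Python sketch for illustration. The actual changes live in `src/gaia/eval/webapp/public/app.js` and are not reproduced here. The names `path_segments` and `count_unique_models` are hypothetical, and the `model` key on each `metadata.evaluation_files` entry is an assumed data shape. The code shows one way to make a path check separator-agnostic, not how `isMainEvaluationEntry()` is written.

```python
from __future__ import annotations

import re

# One or more "/" or "\" characters. Treating both as separators lets the
# same check work on paths written on Unix or on Windows.
_SEPARATORS = re.compile(r"[\\/]+")


def path_segments(path: str) -> list[str]:
    """Split a Unix- or Windows-style path into its non-empty components."""
    return [part for part in _SEPARATORS.split(path) if part]


def count_unique_models(evaluation_files: list[dict]) -> int:
    """Count distinct models from the raw evaluation file metadata.

    Counting from the full file list, rather than from data that has already
    been filtered or grouped, means no model is lost to earlier processing.
    The "model" key on each entry is an assumed data shape.
    """
    return len({entry["model"] for entry in evaluation_files if "model" in entry})


if __name__ == "__main__":
    # Illustrative input only: one model appears twice, with mixed separators.
    files = [
        {"path": "output/evaluations/model-a/run1.json", "model": "model-a"},
        {"path": r"output\evaluations\model-a\run2.json", "model": "model-a"},
        {"path": r"output\evaluations\model-b\run1.json", "model": "model-b"},
    ]
    print(count_unique_models(files))  # 2
    print(path_segments(files[1]["path"]))
```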

**Files Changed**: `src/gaia/cli.py`, `src/gaia/eval/eval.py`, `src/gaia/eval/webapp/public/app.js`

### Improvements

#### 📊 Standardize Evaluation Workflow Default Directories (#820)

Implemented consistent default parameters across all evaluation commands, using a unified directory structure:

```
./output/
├── test_data/      # gaia generate
├── groundtruth/    # gaia groundtruth
├── experiments/    # gaia batch-experiment
└── evaluations/    # gaia eval
```

**Key Changes**:

- Added centralized directory constants in `cli.py`
- Added `GAIA_WORKSPACE` environment variable support for flexible workspace management
- Updated all command defaults to use the new structure
- Updated documentation in `docs/eval.md` and `docs/cli.md`

**Benefits**:

- Consistency: All evaluation artifacts are organized in one location
- Maintainability: Centralized constants eliminate duplication
- Flexibility: The workspace environment variable makes it easier to manage multiple projects
- Cleanup: There is a single directory to clean or ignore

**Files Changed**: Multiple files, including the CLI, evaluation modules, webapp components, and documentation

#### 🏷️ Improve Reporting for Cloud Model Identifiers (#834)

Enhanced the model counting logic in the Evaluation Visualizer to support additional cloud model identifiers:

- Added support for the 'gpt-4' and 'gemini' model identifiers
- Improved the accuracy of model classification in reports

**Files Changed**: `src/gaia/eval/webapp/public/app.js`

## Contributors

- Kalin Ovtcharov (@kalin-ovtcharov)

## Upgrade Notes

If you have existing evaluation workflows, note the following directory changes:

- `./evaluation` → `./output/evaluations`
- `./experiments` → `./output/experiments`

If needed, set the `GAIA_WORKSPACE` environment variable to use a custom workspace location.

---

**Full Changelog**: v0.12.0...v0.12.1
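
Scripts that need to locate these folders can follow the directory layout from #820 and the moves listed under Upgrade Notes. The Python sketch below shows one approach and is not GAIA's code. The constant and function names are hypothetical. It assumes that `GAIA_WORKSPACE`, when set, names the workspace root, that the root otherwise defaults to the current directory, and that outputs go under `<workspace>/output/`. The migration helper is also illustrative and defaults to a dry run.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical names for the per-command output folders from #820.
TEST_DATA_DIR = "test_data"      # gaia generate
GROUNDTRUTH_DIR = "groundtruth"  # gaia groundtruth
EXPERIMENTS_DIR = "experiments"  # gaia batch-experiment
EVALUATIONS_DIR = "evaluations"  # gaia eval

# Old -> new locations, relative to the workspace, from the Upgrade Notes.
LEGACY_MOVES = {
    "evaluation": Path("output") / EVALUATIONS_DIR,
    "experiments": Path("output") / EXPERIMENTS_DIR,
}


def output_dirs(env: Optional[Mapping[str, str]] = None) -> dict[str, Path]:
    """Map each command to its default output directory.

    Assumes GAIA_WORKSPACE, when set, is the workspace root and that the
    root otherwise defaults to the current directory.
    """
    env = os.environ if env is None else env
    output = Path(env.get("GAIA_WORKSPACE", ".")) / "output"
    return {
        "generate": output / TEST_DATA_DIR,
        "groundtruth": output / GROUNDTRUTH_DIR,
        "batch-experiment": output / EXPERIMENTS_DIR,
        "eval": output / EVALUATIONS_DIR,
    }


def migrate_legacy_dirs(workspace: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan, and optionally perform, moves from the old layout to the new one.

    A move is skipped when the old directory is missing or the destination
    already exists, so existing results are never overwritten.
    """
    planned = []
    for old_name, new_rel in LEGACY_MOVES.items():
        old, new = workspace / old_name, workspace / new_rel
        if old.is_dir() and not new.exists():
            planned.append((old, new))
            if not dry_run:
                new.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(old), str(new))
    return planned


if __name__ == "__main__":
    for command, path in output_dirs().items():
        print(f"{command:>16}: {path}")
    # Dry run: lists the planned moves without touching the disk.
    print(migrate_legacy_dirs(Path(".")))
```

Because the helper defaults to a dry run, you can review the planned moves before calling it with `dry_run=False`. Skipping any destination that already exists keeps the helper from merging old and new results.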