
V0.12.1 #106

Merged: kovtcharov merged 3 commits into main from v0.12.1 on Oct 22, 2025

@kovtcharov
Collaborator

GAIA v0.12.1 Release Notes

Overview

This patch release focuses on bug fixes and improvements to the evaluation framework, particularly addressing issues with the visualization and reporting tools. All changes improve the reliability and usability of the gaia eval, gaia visualize, and gaia report commands.

What's Changed

Bug Fixes

🔧 Fix Evaluation Visualizer Model Count and Path Issues (#823)

Fixed multiple critical issues in the gaia visualize and gaia report commands:

  • Incorrect Model Count in Consolidated Report: Fixed the model count calculation in the webapp so it shows the correct number of models (it previously showed only 4 instead of 8)

    • Now counts unique models directly from metadata.evaluation_files instead of from filtered/grouped data
  • Windows Path Separator Bug: Fixed a cross-platform compatibility issue in the isMainEvaluationEntry() function

    • Now handles both Unix (/) and Windows (\) path separators correctly
  • Incorrect Default Directory Paths: Updated the default paths to match the actual evaluation output locations

    • Changed from workspace/evaluation to workspace/output/evaluations
    • Changed from workspace/experiments to workspace/output/experiments
  • Outdated Report Filename: Updated the default report filename from LLM_RAG_Evaluation_Report.md to LLM_Evaluation_Report.md

    • The new name better reflects support for multiple evaluation types (RAG, summarization, etc.)

Files Changed: src/gaia/cli.py, src/gaia/eval/eval.py, src/gaia/eval/webapp/public/app.js
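The two webapp fixes above (counting unique models and accepting both path separators) can be illustrated with a short sketch. The real change is JavaScript in src/gaia/eval/webapp/public/app.js and is not reproduced here; the Python below only shows the idea. It assumes a hypothetical convention, not stated in these notes, that each entry in metadata.evaluation_files is a relative path whose first component names the model. The helper names split_path and count_unique_models are illustrative.

```python
from __future__ import annotations

import re


def split_path(path: str) -> list[str]:
    """Split a path on both Unix '/' and Windows '\\' separators, dropping empty parts."""
    return [part for part in re.split(r"[\\/]+", path) if part]


def count_unique_models(evaluation_files: list[str]) -> int:
    """Count distinct models across *all* evaluation files, before any filtering or grouping.

    Hypothetical convention: the first path component is the model name,
    e.g. "gpt-4/summary.json" or "gpt-4\\summary.json".
    """
    models = set()
    for entry in evaluation_files:
        parts = split_path(entry)
        if parts:
            models.add(parts[0])
    return len(models)


files = ["gpt-4/rag.json", "gpt-4\\summary.json", "gemini/rag.json"]
print(count_unique_models(files))  # 2
```

Splitting on a character class rather than a single separator lets the same lookup work for paths written on Windows or Unix, and counting from the full, unfiltered list is what keeps a grouped view from undercounting models.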

Improvements

📊 Standardize Evaluation Workflow Default Directories (#820)

Implemented consistent default parameters across all evaluation commands with a unified directory structure:

./output/
├── test_data/          # gaia generate
├── groundtruth/        # gaia groundtruth
├── experiments/        # gaia batch-experiment
└── evaluations/        # gaia eval

Key Changes:

  • Added centralized directory constants in cli.py
  • Added GAIA_WORKSPACE environment variable support for flexible workspace management
  • Updated all command defaults to use the new structure
  • Updated documentation in docs/eval.md and docs/cli.md

Benefits:

  • Consistency: All evaluation artifacts organized in one location
  • Maintainability: Centralized constants eliminate duplication
  • Flexibility: The GAIA_WORKSPACE environment variable makes it easy to manage multiple projects
  • Cleanup: Single directory to clean or ignore

Files Changed: Multiple files including CLI, evaluation modules, webapp components, and documentation
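A minimal sketch of how centralized directory defaults can honor GAIA_WORKSPACE. It assumes (not stated in these notes) that the variable points at the workspace root, that the current directory is used when it is unset, and that every artifact lives under an output/ subfolder as in the tree shown. The function name default_output_dirs is hypothetical; this is not the actual code in cli.py.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def default_output_dirs(env: Mapping[str, str] | None = None) -> dict[str, Path]:
    """Return default artifact directories rooted at GAIA_WORKSPACE (assumed fallback: cwd)."""
    env = os.environ if env is None else env
    output = Path(env.get("GAIA_WORKSPACE", ".")) / "output"
    return {
        "test_data": output / "test_data",      # gaia generate
        "groundtruth": output / "groundtruth",  # gaia groundtruth
        "experiments": output / "experiments",  # gaia batch-experiment
        "evaluations": output / "evaluations",  # gaia eval
    }


print(default_output_dirs({"GAIA_WORKSPACE": "/data/project_a"})["evaluations"])
```

Keeping the paths in one place lets each command's argument parser reuse the same defaults instead of repeating string literals, and switching projects only requires changing the environment variable.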

🏷️ Improve Reporting for Cloud Model Identifiers (#834)

Enhanced model counting logic in the Evaluation Visualizer to support additional cloud model identifiers:

  • Added support for 'gpt-4' and 'gemini' model identifiers
  • Improved accuracy of model classification in reports

Files Changed: src/gaia/eval/webapp/public/app.js
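The identifier-based classification can be sketched as a case-insensitive substring match. Only 'gpt-4' and 'gemini' come from these notes; the visualizer's pre-existing identifiers are not listed here, and the matching rule and the names is_cloud_model and count_by_kind are assumptions for illustration, not the logic in app.js.

```python
from __future__ import annotations

# Identifiers added in this release; the visualizer's pre-existing ones are not listed here.
CLOUD_MODEL_IDENTIFIERS = ("gpt-4", "gemini")


def is_cloud_model(model_id: str) -> bool:
    """Treat a model as cloud-hosted if its name contains a known identifier (case-insensitive)."""
    name = model_id.lower()
    return any(ident in name for ident in CLOUD_MODEL_IDENTIFIERS)


def count_by_kind(model_ids: list[str]) -> dict[str, int]:
    """Count distinct cloud and local models in a list that may contain duplicates."""
    counts = {"cloud": 0, "local": 0}
    for model_id in set(model_ids):
        counts["cloud" if is_cloud_model(model_id) else "local"] += 1
    return counts


print(count_by_kind(["gpt-4o", "Gemini-1.5-Pro", "Qwen3-4B", "gpt-4o"]))  # {'cloud': 2, 'local': 1}
```

A substring rule is simple but coarse: any cloud model whose name contains none of the listed identifiers is counted as local until its identifier is added to the tuple.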

Contributors

  • Kalin Ovtcharov (@kalin-ovtcharov)

Upgrade Notes

If you have existing evaluation workflows, note the following directory changes:

  • ./evaluation → ./output/evaluations
  • ./experiments → ./output/experiments

You can set the GAIA_WORKSPACE environment variable to use a custom workspace location if needed.
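If you prefer to move existing artifacts rather than regenerate them, a small one-off script can handle the two directory changes listed in these upgrade notes. This is a hypothetical helper, not a GAIA command. It assumes the old folders sit directly in the workspace root, and it skips any destination that already exists so nothing is overwritten.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Old -> new locations relative to the workspace root, per the upgrade notes.
MOVES = {
    Path("evaluation"): Path("output") / "evaluations",
    Path("experiments"): Path("output") / "experiments",
}


def migrate_workspace(workspace: Path) -> list[tuple[Path, Path]]:
    """Move old-layout folders to the new layout; return the (old, new) pairs that were moved."""
    moved = []
    for old_rel, new_rel in MOVES.items():
        old, new = workspace / old_rel, workspace / new_rel
        # Skip missing sources and never overwrite an existing destination.
        if old.is_dir() and not new.exists():
            new.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(old), str(new))
            moved.append((old, new))
    return moved
```

Call it once, for example `migrate_workspace(Path("path/to/workspace"))`. Running it again is harmless, because folders that were already moved no longer exist at their old locations.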


Full Changelog: v0.12.0...v0.12.1

kovtcharov requested review from itomek and vgodsoe October 22, 2025 06:30
kovtcharov self-assigned this Oct 22, 2025
Comment thread src/gaia/eval/webapp/server.js Dismissed
Comment thread src/gaia/eval/webapp/server.js Outdated
Comment on lines 246 to 279

Check failure: Code scanning / CodeQL

Missing rate limiting (severity: High)

This route handler performs a file system access, but is not rate-limited. (Reported 4 times.)
Comment thread src/gaia/eval/webapp/server.js Dismissed
kovtcharov enabled auto-merge (squash) October 22, 2025 06:37
kovtcharov disabled auto-merge October 22, 2025 06:38
kovtcharov merged commit 84f0fd2 into main Oct 22, 2025
19 of 23 checks passed
kovtcharov deleted the v0.12.1 branch October 22, 2025 06:39
itomek pushed a commit that referenced this pull request Mar 12, 2026