
[OPIK-4987] [FE][BE] feat: integrate Ollie Console sidebar into Opik frontend#5680

Draft
Nimrod007 wants to merge 75 commits into main from nimrodlahav/add-ollie-sidebar-to-opik

Conversation


@Nimrod007 Nimrod007 commented Mar 16, 2026

Summary

  • Add Ollie Console chat sidebar to Opik frontend behind OLLIE_CONSOLE_ENABLED feature toggle
  • Uses the comet plugin system — zero ollie code in the open-source codebase
  • Plugin loads @comet-ml/ollie-sidebar library from private comet-ml/ollie-console repo
  • Toggle defaults to false — zero impact on open-source users

What changed

Backend:

  • ollieConsoleEnabled field in ServiceTogglesConfig.java
  • Config entry in config.yml via TOGGLE_OLLIE_CONSOLE_ENABLED env var

Frontend (open-source):

  • OLLIE_CONSOLE_ENABLED feature toggle (enum + default state)
  • OllieSidebar slot added to PluginsStore (null for OSS)
  • PageLayout renders plugin if provided, with lazy-load fallback for dev mode
  • Content area adjusts width via --ollie-sidebar-width CSS variable
  • Vite resolve.alias for React to prevent duplicate instances
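The nullable plugin-slot pattern above can be sketched in plain TypeScript. This is an illustration only — names like resolveSidebar and devFallback are hypothetical, not the actual Opik API:

```typescript
// Stand-in for a React component type, to keep the sketch dependency-free.
type SidebarComponent = () => string;

// The plugin store exposes a nullable slot; OSS builds leave it null.
interface PluginsStore {
  OllieSidebar: SidebarComponent | null;
}

// Hypothetical dev-mode fallback that would normally be lazy-loaded.
const devFallback: SidebarComponent = () => "dev-mode fallback sidebar";

// PageLayout's choice: render the registered plugin if provided, else the fallback.
function resolveSidebar(store: PluginsStore): SidebarComponent {
  return store.OllieSidebar ?? devFallback;
}

const ossStore: PluginsStore = { OllieSidebar: null };
const cometStore: PluginsStore = {
  OllieSidebar: () => "comet plugin sidebar",
};

console.log(resolveSidebar(ossStore)());   // OSS build falls back
console.log(resolveSidebar(cometStore)()); // comet build uses the plugin
```

The key property is that the open-source tree only ever sees the null slot and the fallback; the private plugin fills the slot at registration time.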

Frontend (comet plugin — private):

  • plugins/comet/OllieSidebar.tsx — the actual implementation
    • Lazy-loads ChatSidebar from @comet-ml/ollie-sidebar
    • CSS loaded at runtime via ?raw to bypass Tailwind 3 PostCSS
    • Open/close state persisted in localStorage
    • Feature-toggled via OLLIE_CONSOLE_ENABLED

Architecture

Open-source Opik                      Private (comet plugin)
┌─────────────────────────┐    ┌──────────────────────────────────┐
│ PluginsStore:           │    │ plugins/comet/OllieSidebar.tsx   │
│   OllieSidebar = null   │◄───│   lazy-loads @comet-ml/ollie-   │
│                         │    │   sidebar from npm/CDN           │
│ PageLayout:             │    │   passes bridge API (auth,       │
│   plugin ?? fallback    │    │   workspace, theme)              │
└─────────────────────────┘    └──────────────────────────────────┘
                                           │
                                           ▼
                              comet-ml/ollie-console repo
                              (separate release cycle)
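The bridge API named in the diagram (auth, workspace, theme) might look roughly like the following. The field names and shapes are assumptions for illustration — the real interface lives in the private plugin:

```typescript
// Hypothetical shape of the bridge the comet plugin passes into the sidebar.
interface OllieBridge {
  auth: { getToken: () => string };
  workspace: { name: string };
  theme: "light" | "dark";
}

// A minimal consumer, standing in for what the sidebar library might read on mount.
function describeBridge(bridge: OllieBridge): string {
  return `workspace=${bridge.workspace.name} theme=${bridge.theme}`;
}

const bridge: OllieBridge = {
  auth: { getToken: () => "fake-token" },
  workspace: { name: "demo" },
  theme: "dark",
};

console.log(describeBridge(bridge)); // workspace=demo theme=dark
```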

Local development setup

Prerequisites: Clone both repos as siblings in the same parent directory:

your-code-dir/
├── opik/                    # this repo
└── ollie-console/           # git clone git@github.com:comet-ml/ollie-console.git

Step 1: Build the ollie-sidebar library

cd ollie-console
npm install
cd packages/ollie-sidebar
npm run build    # produces dist/index.mjs, dist/index.js, dist/styles.css

Step 2: Install Opik frontend dependencies

cd opik/apps/opik-frontend
npm install      # resolves @comet-ml/ollie-sidebar via file: reference to ../../../ollie-console/packages/ollie-sidebar

Step 3: Enable the feature toggle

In apps/opik-backend/config.yml, temporarily change the default:

ollieConsoleEnabled: ${TOGGLE_OLLIE_CONSOLE_ENABLED:-"true"}

Step 4: Start everything

cd opik
./scripts/dev-runner.sh --restart

Open http://localhost:5174 — the Ollie sidebar appears on the right.

After changing ollie-sidebar code:

cd ollie-console/packages/ollie-sidebar
npm run build    # rebuild library
# Vite HMR picks up the changes automatically

Test plan

  • Frontend type-checks cleanly (tsc --noEmit)
  • Frontend builds cleanly (vite build)
  • Toggle off: No sidebar rendered, no console errors
  • Toggle on: Sidebar renders at 380px, content area shrinks
  • No duplicate React errors (verified with Playwright)
  • Plugin system: OllieSidebar loaded via plugin in comet mode
  • Dev mode fallback: OllieSidebar lazy-loaded directly when plugin not active
  • Backend builds cleanly

🤖 Generated with Claude Code

…frontend

Add Ollie Console chat sidebar behind OLLIE_CONSOLE_ENABLED feature
toggle. The sidebar renders as a 380px right panel using the
@comet-ml/ollie-sidebar library from the private ollie-console repo.

Backend:
- Add ollieConsoleEnabled to ServiceTogglesConfig (default: false)

Frontend:
- Add OLLIE_CONSOLE_ENABLED feature toggle
- Create OllieSidebar wrapper with lazy loading (React.lazy)
- Integrate into PageLayout with dynamic width via CSS variables
- Load ollie CSS at runtime (?raw) to bypass Tailwind 3 PostCSS
- Alias React in Vite config to prevent duplicate React instances
- Install @comet-ml/ollie-sidebar via file: reference (local dev)
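The React alias mentioned above might look like this in vite.config.ts. This is a sketch under the assumption of a standard Vite setup — exact paths may differ, and Vite's resolve.dedupe is an alternative way to achieve the same single-instance guarantee:

```typescript
// vite.config.ts (sketch): force a single React instance so the
// file:-linked @comet-ml/ollie-sidebar resolves to the app's own copy
// instead of bundling a duplicate React from its node_modules.
import { defineConfig } from "vite";
import path from "node:path";

export default defineConfig({
  resolve: {
    alias: {
      react: path.resolve("./node_modules/react"),
      "react-dom": path.resolve("./node_modules/react-dom"),
    },
  },
});
```

Duplicate React instances are the classic failure mode for file:-linked component libraries (hooks throw "Invalid hook call"), which is why the test plan verifies their absence with Playwright.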

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added dependencies Pull requests that update a dependency file java Pull requests that update Java code Frontend Backend typescript *.ts *.tsx labels Mar 16, 2026

📋 PR Linter Failed

Missing Sections. The description is missing the following sections:

  • ## Details
  • ## Change checklist
  • ## Issues
  • ## Testing
  • ## Documentation


github-actions bot commented Mar 16, 2026

Backend Tests - Unit Tests

1 496 tests   1 494 ✅  54s ⏱️
  180 suites      2 💤
  180 files        0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 7

1 137 tests   1 136 ✅  6m 2s ⏱️
    9 suites      1 💤
    9 files        0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 15

172 tests   170 ✅  3m 44s ⏱️
 27 suites    2 💤
 27 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 5

251 tests   251 ✅  2m 17s ⏱️
 26 suites    0 💤
 26 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 11

 23 files   23 suites   3m 47s ⏱️
131 tests 131 ✅ 0 💤 0 ❌
113 runs  113 ✅ 0 💤 0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 12

184 tests   182 ✅  6m 0s ⏱️
 36 suites    2 💤
 36 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 8

280 tests   280 ✅  4m 46s ⏱️
 22 suites    0 💤
 22 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 13

445 tests   442 ✅  3m 45s ⏱️
 21 suites    3 💤
 21 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 16

 16 files   16 suites   1m 0s ⏱️
187 tests 187 ✅ 0 💤 0 ❌
165 runs  165 ✅ 0 💤 0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 6

105 tests   105 ✅  2m 47s ⏱️
 23 suites    0 💤
 23 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 9

322 tests   321 ✅  8m 48s ⏱️
 24 suites    1 💤
 24 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 4

1 362 tests   1 362 ✅  8m 49s ⏱️
    5 suites      0 💤
    5 files        0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 10

253 tests   251 ✅  7m 7s ⏱️
 21 suites    2 💤
 21 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 3

313 tests   313 ✅  9m 49s ⏱️
 29 suites    0 💤
 29 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 1

413 tests   413 ✅  13m 13s ⏱️
 24 suites    0 💤
 24 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.

Comment on lines +775 to +778
agentConfigurationEnabled: ${TOGGLE_AGENT_CONFIGURATION_ENABLED:-"false"}
# Default: false
# Description: Whether or not Ollie Console sidebar is enabled
ollieConsoleEnabled: ${TOGGLE_OLLIE_CONSOLE_ENABLED:-"false"}

ollieConsoleEnabled was added as a feature flag, but its comment doesn't follow the required template and omits explicit metadata: the default (false), the format (boolean), the component it gates (the Ollie Console sidebar in the Opik UI), its global scope, the operational/safety rationale, and a suggested safe range. This omission makes the flag harder to discover, not clearly production-safe, and risks it being enabled before FE and backend are ready. Can we update the comment to explicitly state: default false; boolean format; that it gates the Ollie Console sidebar in the Opik UI; its global scope; the safety rationale (disabling preserves the current sidebar, while enabling requires an FE rollout and a backend feature flag); and a recommendation to default to false until both FE and backend ship?

Finding type: Config defaults and compatibility | Severity: 🟢 Low



Prompt for AI Agents:

In apps/opik-backend/config.yml around lines 775 to 778, the configuration key
`ollieConsoleEnabled` currently has an incomplete comment. Replace the current two-line
comment with a standard template comment that explicitly states: Default: false;
Units/format: boolean toggle; Component/behavior gated: Ollie Console sidebar in the
Opik UI; Scope: global default (backend flag); Operational impact/safety rationale:
disabling preserves the existing sidebar experience, enabling requires coordinated FE
rollout and backend flag activation; Suggested safe range: false until both frontend and
backend changes are deployed. Keep the same comment style/indentation as the neighboring
`agentConfigurationEnabled` entry.


Commit cc50b4f addressed this comment by removing the ollieConsoleEnabled toggle and its incomplete comment from apps/opik-backend/config.yml, eliminating the need for the requested metadata.

Comment on lines +13 to +24
const loadOllieCss = () => {
const id = "ollie-sidebar-styles";
if (document.getElementById(id)) return;

import("@comet-ml/ollie-sidebar/styles.css?raw").then((css) => {
const style = document.createElement("style");
style.id = id;
style.textContent = css.default;
document.head.appendChild(style);
});
};


The component uses magic literals for config: useLocalStorageState("ollie-sidebar-open"…), a style element id = "ollie-sidebar-styles", and the collapsed width 32. Hardcoding these values risks inconsistencies and makes changes error‑prone. Can we extract them into named constants (e.g. OLLIE_SIDEBAR_STORAGE_KEY, OLLIE_SIDEBAR_STYLE_ID, OLLIE_SIDEBAR_COLLAPSED_WIDTH) and reference those instead?

Finding type: Avoid hardcoded configuration values | Severity: 🟢 Low



Prompt for AI Agents:

In apps/opik-frontend/src/components/layout/OllieSidebar/OllieSidebar.tsx around lines
13 to 45, the code currently uses hardcoded literals: the localStorage key
"ollie-sidebar-open", the style element id "ollie-sidebar-styles", and the collapsed
width literal 32. Refactor by adding named constants near the top of the file (for
example OLLIE_SIDEBAR_STORAGE_KEY, OLLIE_SIDEBAR_STYLE_ID, and
OLLIE_SIDEBAR_COLLAPSED_WIDTH), replace the raw string/number occurrences in
loadOllieCss, the useLocalStorageState call, and the onWidthChange call with these
constants, and ensure the constants are exported or documented if they will be reused
elsewhere.


Commit cc50b4f addressed this comment by removing the OllieSidebar plugin file entirely, so the hardcoded storage key, style ID, and collapsed width literals no longer exist in that component, resolving the concern about extracting those values into constants.

Comment on lines +35 to +46
const [isOpen, setIsOpen] = useLocalStorageState("ollie-sidebar-open", {
defaultValue: true,
});

useEffect(() => {
if (!isEnabled) {
onWidthChange(0);
return;
}
onWidthChange(isOpen ? OLLIE_SIDEBAR_WIDTH : 32);
}, [isEnabled, isOpen, onWidthChange]);


isOpen is only set to false in handleClose and never reset to true, so after closing the sidebar onWidthChange keeps emitting 32px and the layout stays collapsed. Can we switch to a controlled open prop or add an onOpen callback that calls setIsOpen(true) so the width toggles back to 380px when the sidebar reopens?

Finding types: prefer direct React patterns Logical Bugs | Severity: 🔴 High
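One way to reason about the reported bug is to isolate the width logic as a pure function driven by a single controlled open state (a sketch using the constant values from this PR; the real component wires this through useLocalStorageState and onWidthChange):

```typescript
// Constants mirroring the PR's values (380px open, 32px collapsed rail).
const OLLIE_SIDEBAR_WIDTH = 380;
const OLLIE_SIDEBAR_COLLAPSED_WIDTH = 32;

// Pure width resolution: disabled -> 0, open -> full width, closed -> rail.
function resolveSidebarWidth(isEnabled: boolean, isOpen: boolean): number {
  if (!isEnabled) return 0;
  return isOpen ? OLLIE_SIDEBAR_WIDTH : OLLIE_SIDEBAR_COLLAPSED_WIDTH;
}

// With a controlled `open` prop, reopening flows through the same state
// variable, so the emitted width returns to 380 instead of sticking at 32.
console.log(resolveSidebarWidth(true, false)); // 32
console.log(resolveSidebarWidth(true, true));  // 380
```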



Prompt for AI Agents:

In apps/opik-frontend/src/components/layout/OllieSidebar/OllieSidebar.tsx around lines
35-61, the local state `isOpen` (from useLocalStorageState("ollie-sidebar-open")) is
only set to false via handleClose and never set back to true when ChatSidebar reopens,
so onWidthChange keeps emitting 32 and the layout never expands. Refactor so the
ChatSidebar is controlled: pass an explicit `open={isOpen}` prop (instead of
defaultOpen) and add an onOpen/onToggle handler that calls `setIsOpen(true)` when the
sidebar opens. Also ensure the useEffect that calls onWidthChange reads the controlled
open state so width is set to OLLIE_SIDEBAR_WIDTH when the user opens the sidebar and
back to 32 when closed. Preserve the localStorage initial state (use the current
defaultValue) and remove reliance on defaultOpen for future toggles.


Commit cc50b4f addressed this comment by deleting apps/opik-frontend/src/plugins/comet/OllieSidebar.tsx, which removed the affected component and eliminated the previous ChatSidebar state management issue.

Comment on lines +57 to +61
return (
<div className="absolute right-0 top-[var(--banner-height)] bottom-0 z-10">
<Suspense>
<ChatSidebar onClose={handleClose} defaultOpen={isOpen} />
</Suspense>

The UI renders empty while ChatSidebar lazy-loads because the surrounding <Suspense> has no fallback. Per .agents/skills/opik-frontend/performance.md this omits the required placeholder; can we add an explicit fallback to <Suspense> (even null or a small skeleton)?

Finding type: AI Coding Guidelines | Severity: 🟢 Low



Prompt for AI Agents:

In apps/opik-frontend/src/components/layout/OllieSidebar/OllieSidebar.tsx around lines
57 to 61, the Suspense wrapper for ChatSidebar is rendered without a fallback, causing
an empty render while the lazy chunk loads. Update the return so Suspense includes an
explicit fallback prop (for example a small skeleton placeholder div or a concise
loading indicator, or at minimum fallback={null}) and ensure the placeholder matches the
sidebar dimensions/positioning so the layout doesn't shift. Keep the change local to the
JSX return (no major refactor) and prefer a small accessible placeholder component if
available.


github-actions bot commented Mar 16, 2026

Backend Tests - Integration Group 14

267 tests   267 ✅  8m 48s ⏱️
 29 suites    0 💤
 29 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

TS SDK E2E Tests - Node 18

238 tests   236 ✅  18m 15s ⏱️
 25 suites    2 💤
  1 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

TS SDK E2E Tests - Node 20

238 tests   236 ✅  18m 0s ⏱️
 25 suites    2 💤
  1 files      0 ❌

Results for commit 5d49549.

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Python SDK E2E Tests Results (Python 3.10)

238 tests  ±0   236 ✅ ±0   9m 45s ⏱️ +47s
  1 suites ±0     2 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 5d49549. ± Comparison against base commit def49d0.

This pull request removes 1 and adds 1 tests. Note that renamed tests count towards both.
tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d06fb-aea2-73a8-8a19-20ba83a7ff2b]
tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d0b5f-7f4b-739d-8313-606b854df7b4]

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Python SDK E2E Tests Results (Python 3.13)

238 tests  ±0   236 ✅ ±0   10m 47s ⏱️ + 1m 32s
  1 suites ±0     2 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 5d49549. ± Comparison against base commit def49d0.

This pull request removes 1 and adds 1 tests. Note that renamed tests count towards both.
tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d06f9-82af-7745-a955-52411673549d]
tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d0b5e-820a-7c57-99df-49da4a2c8e5e]

♻️ This comment has been updated with latest results.


github-actions bot commented Mar 16, 2026

Python SDK E2E Tests Results (Python 3.11)

238 tests  ±0   236 ✅ ±0   9m 43s ⏱️ +20s
  1 suites ±0     2 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 5d49549. ± Comparison against base commit def49d0.

This pull request removes 1 and adds 1 tests. Note that renamed tests count towards both.
tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d06f8-1eb9-7fab-a908-4b947aa7f0e8]
tests.e2e.test_tracing ‑ test_opik_client__update_trace__happy_flow[None-None-None-None-019d0b5f-8ddb-7ab4-97b2-ca22b6252d85]

♻️ This comment has been updated with latest results.

thiagohora and others added 25 commits March 25, 2026 10:38
…lback lookups (#5760)

* [OPIK-4938] [BE] Add project-scoped endpoint tests for prompts, datasets, experiments, and dashboards

Add FindProjectPrompts, FindProjectDatasets, FindProjectExperiments, and
FindProjectDashboards nested test classes that mirror their workspace-scoped
counterparts, exercising the /v1/private/projects/{projectId}/{resource}
endpoints with filtering, pagination, and sorting coverage.

Also extract shared assertion helpers (assertPromptsPage, assertDashboardPage)
to outer test classes for reuse, and add getProjectDashboards/getProjectPrompts
client methods to the respective resource clients.

* Revision 2: Fix missing @RequiredPermissions on ProjectPromptsResource and rename duplicate "By" in test method names

* Revision 3: Remove @RequiredPermissions from ProjectPromptsResource that caused test compilation issue

* Revision 4: Apply spotless formatting

* [OPIK-4938] [BE] Add X-Opik-Deprecation header for workspace-wide fallback lookups

When a dataset, prompt, or dashboard is found via workspace-wide search
(i.e., the requested projectId did not match but a workspace-wide record
exists), the response now includes X-Opik-Deprecation with a formatted
message warning that explicit project scoping will be required in a
future version.

* Revision 2: Add tests for X-Opik-Deprecation workspace-fallback header

Tests verify:
- Header is returned with full formatted message when fallback to
  workspace-wide search is used (non-matching/non-existent project)
- Header is absent when the entity is found directly in the requested project

* Revision 3: Use WORKSPACE_FALLBACK_MESSAGE_TEMPLATE constant in tests

* Revision 4: Fix fallback message when project name is provided but does not exist

* Revision 5: Remove workspace fallback message from DashboardService (no expose endpoint)

* Revision 6: Extract setWorkspaceFallbackFor helper to centralize fallback message setting

* Revision 7: Fix OutOfScopeException in reactive stream by introducing resolveDatasetByName

Split the blocking findByName(DatasetIdentifier) from the reactive stream
path. Added resolveDatasetByName(DatasetIdentifier, Visibility) to the
DatasetService interface that captures workspaceId on the request thread
and returns a Mono<Dataset>, so the reactive chain in DatasetItemService
no longer accesses the @RequestScoped RequestContext from a non-request
thread.

---------

Co-authored-by: Andres Cruz <andresc@comet.com>
…_config() (#5751)

* [NA] [SDK] feat: require @track context when calling get_agent_config()

Raises RuntimeError if get_agent_config() is called outside a function
decorated with @opik.track, replacing the previous approach of emitting
a warning on attribute access when the mask didn't match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix tests

* use opik_context

* use context manager in tests

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix breadcrumb to show experiment name instead of static "Compare"
- Rename "Legacy dataset" label to "Dataset" in evaluation suites
- Hide item ID column by default on compare experiments page
- Open global settings dialog on global assertions tag click
… correctness (#5726)

* [OPIK-5050] [BE] fix: replace FINAL with LIMIT 1 BY in trace thread queries and add exponential backoff

Replace FINAL with LIMIT 1 BY subqueries in TraceThreadDAO to avoid
full table scans on the closing job hot path. Push status filter into
subquery for FIND_PENDING_CLOSURE_THREADS_SQL to skip inactive rows
early. Fix correctness issue in FIND_THREADS_BY_PROJECT_SQL where
mutable column filters were applied before deduplication, potentially
returning stale results. Add exponential backoff to
TraceThreadsClosingJob to prevent hammering ClickHouse on consecutive
failures.

* [OPIK-5050] [BE] chore: address PR review feedback

- Simplify LIMIT 1 BY (workspace_id, project_id, thread_id, id) to
  LIMIT 1 BY id in FIND_PENDING_CLOSURE_THREADS_SQL and
  OPEN_CLOSURE_THREADS_SQL for consistency with rest of codebase
- Add SETTINGS log_comment to FIND_THREADS_BY_PROJECT_SQL,
  FIND_PENDING_CLOSURE_THREADS_SQL, and OPEN_CLOSURE_THREADS_SQL
  for query observability in ClickHouse
- Fix log placeholder formatting to use single-quoted '{}' per
  project conventions

* [OPIK-5050] [BE] chore: fix import ordering (spotless)

* [OPIK-5050] [BE] fix: escape angle brackets in SQL to prevent StringTemplate interpolation

The < and > SQL comparison operators in FIND_PENDING_CLOSURE_THREADS_SQL
were being interpreted as StringTemplate delimiters after switching from
raw string to getSTWithLogComment, silently corrupting the query.

* [OPIK-5050] [BE] fix: revert log_comment on queries with SQL angle brackets

FIND_PENDING_CLOSURE_THREADS_SQL and OPEN_CLOSURE_THREADS_SQL contain
SQL < and > operators which StringTemplate interprets as template
delimiters, silently corrupting the rendered query. Revert these two
queries to use raw strings like the original code. Keep log_comment
on FIND_THREADS_BY_PROJECT_SQL which only uses ST template expressions.

* [OPIK-5050] [BE] perf: use time-bounded FINAL with minmax skip index for closing job query

Replace LIMIT 1 BY approach with time-bounded FINAL + minmax skip index
on last_updated_at. The closing job query now only scans recent granules
instead of the entire trace_threads table, reducing granules read from
369/369 to 12/369 (97% reduction) in benchmarks.

- Add cached getMaxTimeoutMarkThreadAsInactive to compute lookback window
- Bind cached_max_inactive_period parameter in DAO
- Add use_skip_indexes_if_final=1 SETTINGS to enable skip index with FINAL
- Add cache config for max_timeout (30min TTL)

* [OPIK-5050] [BE] perf: add cold start lookback, GROUP BY optimization, increase default job interval

- Add 7-day cold start lookback on first run after startup to catch threads
  that became stale during outages
- Normal lookback floor: max(maxTimeout + 1h, 1 day) via minmax skip index
- GROUP BY workspace_id, project_id, status with min(last_updated_at) in
  subquery to reduce rows before workspace_configurations JOIN
- Increase default OPIK_CLOSE_TRACE_THREAD_JOB_INTERVAL from 3s to 15s
- Add minmax skip index migration for last_updated_at (GRANULARITY 1)

* [OPIK-5050] [BE] chore: add cache config docs, bump lock time to match interval

- Add documentation comment for max_timeout_mark_thread_as_inactive cache
- Increase closeTraceThreadJobLockTime from 4s to 14s to match the 15s
  job interval (prevents premature lock release with more accumulated work)

* [OPIK-5050] [BE] chore: update MAX_BACKOFF_EXPONENT comment for 15s interval

* [OPIK-5050] [BE] chore: improve backoff comment, use LEFT ANY JOIN for workspace config

- Clarify MAX_BACKOFF_EXPONENT comment to describe doubling pattern
- Use LEFT ANY JOIN for workspace_configurations (at most one row per
  workspace after FINAL dedup, communicates intent and is slightly more efficient)

* [OPIK-5050] [BE] fix: move success handler to onComplete, add migration comment

- Mono<Void> never emits onNext, so completedFirstRun/backoff reset was
  unreachable. Move success logic to onComplete (3rd subscribe arg).
- Add --comment to migration file per convention.

* [OPIK-5050] [BE] chore: move cold-start lookback and max backoff exponent to config

Move COLD_START_LOOKBACK and MAX_BACKOFF_EXPONENT from hardcoded
constants to TraceThreadConfig, backed by env vars
OPIK_CLOSE_TRACE_THREAD_COLD_START_LOOKBACK (default 7d) and
OPIK_CLOSE_TRACE_THREAD_MAX_BACKOFF_EXPONENT (default 5).

Also fix log message in onComplete handler ("started" -> "completed").

* [OPIK-5050] [BE] fix: rename migration 71 -> 73 to avoid conflict with main

* [OPIK-5050] [BE] chore: remove benchmark script, expand config docs
* [OPIK-4714] add new permission

* [OPIK-4714] add permission checks

* [OPIK-4714] add checks for dataset items

* [OPIK-4714] Add missing imports

* [OPIK-4714] refactor

* [OPIK-4714] fix after merge
Co-authored-by: Andres Cruz <andresc@comet.com>
)

Bumps [com.diffplug.spotless:spotless-maven-plugin](https://github.com/diffplug/spotless) from 3.3.0 to 3.4.0.
- [Release notes](https://github.com/diffplug/spotless/releases)
- [Changelog](https://github.com/diffplug/spotless/blob/main/CHANGES.md)
- [Commits](diffplug/spotless@lib/3.3.0...maven/3.4.0)

---
updated-dependencies:
- dependency-name: com.diffplug.spotless:spotless-maven-plugin
  dependency-version: 3.4.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Andres Cruz <andresc@comet.com>
#5780)

Bumps [org.jdbi:jdbi3-stringtemplate4](https://github.com/jdbi/jdbi) from 3.51.0 to 3.52.0.
- [Release notes](https://github.com/jdbi/jdbi/releases)
- [Changelog](https://github.com/jdbi/jdbi/blob/master/RELEASE_NOTES.md)
- [Commits](jdbi/jdbi@v3.51.0...v3.52.0)

---
updated-dependencies:
- dependency-name: org.jdbi:jdbi3-stringtemplate4
  dependency-version: 3.52.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Andres Cruz <andresc@comet.com>
…cs (#5784)

* add[docs]: section for running online evals retrospectively

* Optimised images with calibre/image-actions

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…istence (#5787)

* [OPIK-4966] [FE] refactor: extract shared deps from v1 for v1/v2 coexistence

Move components, hooks, types, and constants out of v1/ into shared
locations so both v1 and v2 can use the same instances. This prevents
broken contexts and duplicated code when v2 is cloned from v1.

Extracted to shared:
- PageBodyStickyContainer → src/shared/
- PageBodyScrollContainer context → src/contexts/
- BaseTraceDataTypeIcon → src/shared/
- VerticallySplitCellWrapper → src/shared/
- UserComment folder → src/shared/
- GoogleColabCardCore → src/shared/
- ConfigurationType, GoogleColabCardCoreProps → src/types/shared
- ProviderGridOption → src/types/providers
- WorkspacePreference types/constants → src/constants/workspace-preferences
- theme-provider, server-sync-provider, feature-toggles-provider → src/contexts/
- integration-scripts, integration-logs → src/constants/

Moved to v1 (not shared, has v1 deps):
- TraceCountCell → v1/pages-shared/traces/
- PromptImprovementDialog → v1/pages-shared/llm/

Updated dependency-cruiser:
- Broadened no-shared-importing-pages to block all v1/v2 imports
- Added no-shared-infra-importing-versioned rule
- Removed stale v1 provider exceptions from hooks rule

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [OPIK-4966] [FE] chore: remove stale dependency-cruiser violation rules

Cleanup outdated dependency-cruiser exceptions for `no-hooks-importing-components` and `no-shared-importing-pages`, aligning with recent refactors.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…v2 (#5791)

Clone all v1 page components, layout, pages-shared, and router into the
v2 directory structure. Update all imports from @/v1/ to @/v2/ within
cloned files. Update dependency-cruiser known violations baseline to
include v2 copies of pre-existing circular deps.

This establishes the v2 code base that will be independently modified
for project-first navigation.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…optimization write endpoints (#5772)

* [OPIK-4938] [BE] Add project_name support to dataset, experiment and optimization endpoints

- Add project_name field to DatasetItemBatch and propagate it through DatasetItemService to scope dataset items to a project
- Resolve project_name to project_id in DatasetService and ExperimentService when creating datasets/experiments
- Add project_id column to optimizations table via Liquibase migration (000073)
- Expose project_id (read-only) and project_name (write-only) on Optimization model
- Fix NullPointerException in OptimizationService when project_name is null by using AbstractMap.SimpleEntry instead of Map.entry
- Add integration tests for project-scoped dataset creation in DatasetsResourceTest, ExperimentsResourceTest, and OptimizationsResourceTest

* Revision 2: Add ProjectOptimizationsResource for project-scoped optimization listing

* Revision 3: Add project_id filter to OptimizationsResource and integration tests for ProjectOptimizationsResource

* [OPIK-4938] [BE] Fix DatasetItemBatch project resolution and centralize test factory calls

Fix Reactor empty Mono bug in DatasetItemService.getDatasetId() where batches
without projectId/projectName caused the flatMap to never execute (data loss).
Added switchIfEmpty to handle the null-project case properly.
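
The empty-Mono hazard described above has a close stdlib analogy: mapping over an empty container silently skips the downstream step. This sketch is illustrative only (names like `resolveBuggy` are hypothetical, and `Optional.or` stands in for Reactor's `switchIfEmpty`), not the actual service code:

```java
import java.util.Optional;

public class EmptyPipelineSketch {
    // Buggy shape: when no projectName is present, the pipeline is empty
    // and the mapping step never runs -- the batch is silently dropped.
    static Optional<String> resolveBuggy(Optional<String> projectName) {
        return projectName.map(name -> "dataset-for-" + name);
    }

    // Fixed shape: an explicit fallback handles the no-project case,
    // mirroring switchIfEmpty in the Reactor pipeline.
    static Optional<String> resolveFixed(Optional<String> projectName) {
        return projectName
                .map(name -> "dataset-for-" + name)
                .or(() -> Optional.of("default-dataset"));
    }

    public static void main(String[] args) {
        System.out.println(resolveBuggy(Optional.empty())); // Optional.empty
        System.out.println(resolveFixed(Optional.empty())); // Optional[default-dataset]
    }
}
```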

Centralize factory.manufacturePojo(DatasetItemBatch.class) and
factory.manufacturePojo(DatasetItem.class) calls in DatasetResourceClient
to null out server-assigned fields (projectId, projectName, datasetId, etc.),
preventing PODAM-generated random UUIDs from causing 404 errors in tests.

* Revision 2: Address PR review comments

- Rename migration 000073 → 000074 to avoid prefix conflict with main
- Add trailing blank line to migration file per guidelines
- Remove @RequiredPermissions(EXPERIMENT_VIEW) from ProjectOptimizationsResource.find()
  to match unrestricted access pattern of the global endpoint
- Add dataset_name query param to ProjectOptimizationsResource.find()
  for parity with global /v1/private/optimizations endpoint
- Fix @Schema description on DatasetItemBatch.projectId (was "dataset_name must be
  provided", now "project_name must be provided")
- Use DatasetItemBatch builder instead of positional constructor in
  DatasetExportJobSubscriberResourceTest

* Revision 3: Fix insertInvalidDatasetItemWorkspace test failure

Use DatasetResourceClient helpers and null out datasetId to avoid PODAM
generating random UUIDs that cause 404s in DatasetItemService resolution.

* Revision 4: Simplify resolveProjectId in OptimizationService

Inline context accesses inside fromCallable lambda, consistent with
DatasetItemService.resolveProjectId pattern.

* Revision 5: Support projectId on Optimization write + validate on upsert

Remove READ_ONLY from Optimization.projectId so callers can pass it
directly. Add a resolveProjectId branch that validates the provided
projectId exists in the workspace before using it, mirroring the
DatasetItemBatch projectId/projectName duality.

* Revision 6: Clarify projectName/projectId as optional in DatasetItemBatch schema

Both fields are optional (both null = no project scoping). Update @Schema
descriptions to remove misleading "must be provided" language and describe
precedence rules instead.

* Revision 7: Extract resolveProjectIdOrCreate into ProjectService

Both OptimizationService and ExperimentService had identical inline logic
for resolving a project from (projectId, projectName): validate the id if
provided, getOrCreate from the name otherwise, return empty if neither.

The shared helper lives in ProjectService.resolveProjectIdOrCreate and uses
deferContextual so callers no longer need to extract workspaceId/userName
themselves. ExperimentService's logic is also aligned to projectId-first
priority, consistent with OptimizationService and DatasetItemService.
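
The projectId-first precedence described above can be sketched as follows. All names and signatures here are hypothetical stand-ins for the real `ProjectService.resolveProjectIdOrCreate`, which operates reactively and pulls workspace context via `deferContextual`:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

public class ProjectResolutionSketch {
    static Optional<UUID> resolveProjectIdOrCreate(
            UUID projectId, String projectName,
            Set<UUID> workspaceProjects, Map<String, UUID> byName) {
        if (projectId != null) {
            // id wins when provided, but must exist in the workspace
            if (!workspaceProjects.contains(projectId)) {
                throw new IllegalArgumentException("project not in workspace");
            }
            return Optional.of(projectId);
        }
        if (projectName != null) {
            // get-or-create by name
            return Optional.of(byName.computeIfAbsent(projectName, n -> {
                UUID id = UUID.randomUUID();
                workspaceProjects.add(id);
                return id;
            }));
        }
        return Optional.empty(); // neither provided: no project scoping
    }

    public static void main(String[] args) {
        System.out.println(resolveProjectIdOrCreate(
                null, null, new java.util.HashSet<>(), new java.util.HashMap<>()));
    }
}
```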

* Revision 8: Add OpenAPI schema descriptions for Optimization project_id/project_name

Matches the existing descriptions on Experiment, making the auto-create
and precedence semantics visible in the generated API docs.

* [OPIK-4938] [BE] Fix trailing blank line in migration 000074
#5691)

* [OPIK-5019] [BE] feat: add LLM model registry service and API endpoint

Add a YAML-based model registry that loads supported LLM models at
startup from a classpath resource, with optional local override file
for self-hosted customers. Expose via GET /v1/private/llm/models.

- LlmModelDefinition record with id, qualifiedName, structuredOutput, reasoning
- LlmModelRegistryService loads/merges/caches from YAML
- LlmModelsResource REST endpoint
- llm-models-default.yaml with 525 models across 5 providers
- 52 reasoning models tagged (OpenAI o-series, DeepSeek R1, QwQ, :thinking)
- 9 unit tests covering load, merge, override, reload, immutability
- Guice wiring in LlmModule, config in OpikConfiguration + config.yml

No changes to existing routing or frontend — additive only.

Implements OPIK-5019: [BE] Add LLM model registry service and API endpoint
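
The merge semantics for the optional override file can be sketched roughly as below: override entries replace defaults by model id, new ids are appended, and blank ids are skipped. Type and method names are assumptions for illustration, not the registry's actual API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ModelRegistryMergeSketch {
    record ModelDef(String id, boolean structuredOutput, boolean reasoning) {}

    static List<ModelDef> merge(List<ModelDef> defaults, List<ModelDef> overrides) {
        Map<String, ModelDef> byId = new LinkedHashMap<>();
        for (ModelDef d : defaults) {
            if (d.id() != null && !d.id().isBlank()) byId.put(d.id(), d);
        }
        if (overrides != null) { // guard null/empty override list from malformed YAML
            for (ModelDef o : overrides) {
                if (o.id() != null && !o.id().isBlank()) byId.put(o.id(), o);
            }
        }
        return List.copyOf(byId.values()); // immutable result
    }

    public static void main(String[] args) {
        System.out.println(merge(
                List.of(new ModelDef("gpt-4o", true, false)),
                List.of(new ModelDef("gpt-4o", true, true))).size()); // 1
    }
}
```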

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(registry): narrow reload() catch and guard null lists in merge

- Catch only UncheckedIOException | IllegalStateException in reload()
  instead of broad Exception
- Skip null/empty override lists in merge() to prevent NPE from
  malformed customer YAML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(registry): make LlmModelRegistryService self-injectable via Guice

Integration tests that disable LlmModule (e.g. AutomationRuleEvaluatorsResourceTest,
ManualEvaluationResourceTest) failed because vyarus auto-config discovered
LlmModelsResource but had no binding for LlmModelRegistryService.

Move from @Provides in LlmModule to @Inject @Singleton on the service itself,

with a package-private constructor for unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(registry): address PR review feedback

- Remove @JsonProperty annotations that broke global snake_case convention in HTTP responses
- Move LlmModelDefinition to com.comet.opik.api (response DTO, not infrastructure)
- Add null/blank id guard in merge() for both default and override entries
- Add @NonNull on merge() parameters
- Add // visible for testing comment on package-private constructor
- Add comment on volatile explaining scheduled refresh intent (OPIK-5020)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(registry): deep-copy provider lists for full immutability

Map.copyOf() only makes the outer map immutable; the List values from
Jackson are mutable ArrayLists. Added immutable() helper to wrap each
list with List.copyOf() in the load() fast paths.
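
The pitfall fixed here is easy to reproduce: `Map.copyOf` makes only the outer map immutable, while mutable `List` values remain shared references. A minimal demonstration, with a `deep` helper standing in for the `immutable()` helper mentioned above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImmutabilitySketch {
    // Shallow: outer map is immutable, but List values are shared references.
    static Map<String, List<String>> shallow(Map<String, List<String>> m) {
        return Map.copyOf(m);
    }

    // Deep: wrap each value with List.copyOf so later mutation cannot leak in.
    static Map<String, List<String>> deep(Map<String, List<String>> m) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        m.forEach((k, v) -> out.put(k, List.copyOf(v)));
        return Map.copyOf(out);
    }

    public static void main(String[] args) {
        List<String> models = new ArrayList<>(List.of("gpt-4o"));
        Map<String, List<String>> src = Map.of("openai", models);
        Map<String, List<String>> s = shallow(src);
        Map<String, List<String>> d = deep(src);
        models.add("o3"); // mutate the original list after copying
        System.out.println(s.get("openai").size()); // 2 -- mutation leaked through
        System.out.println(d.get("openai").size()); // 1 -- deep copy unaffected
    }
}
```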

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixed formatting

---------

Co-authored-by: Andrei Căutișanu <andreicautisanu@Andreis-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* [OPIK-4891] [BE] Data retention policy enforcement job

- Add retention rule CRUD endpoints and scheduled enforcement job
- Delete only traces and spans (children first); feedback scores and
  comments are left as lightweight orphans (~3% of storage)
- 3-day sliding window [cutoff-3d, cutoff) for incremental processing
- Experiment exclusion: traces/spans linked to experiments are protected
  via NOT IN subquery with allow_nondeterministic_mutations=1
- Cutoff normalized to start-of-day UTC using InstantToUUIDMapper
- Two delete patterns: applyToPast=true (simple IN) and applyToPast=false
  (per-workspace OR conditions with max(minId, cutoff-3d))
- UUID v7 range partitioning splits workspace space across N fractions/day
- Distributed locking via LockService for multi-instance safety
- lightweight_deletes_sync=1 ensures mutations complete before returning
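
The window arithmetic above can be sketched with plain `java.time`: the cutoff is normalized to start-of-day UTC, and deletes scan the sliding window `[cutoff - N days, cutoff)`. Method names here are illustrative, not the job's real API:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class RetentionWindowSketch {
    // Subtract the retention period, then truncate to start-of-day UTC.
    static Instant cutoff(Instant now, int retentionDays) {
        return now.minus(Duration.ofDays(retentionDays))
                  .truncatedTo(ChronoUnit.DAYS);
    }

    // Lower bound of the incremental-processing window [start, cutoff).
    static Instant windowStart(Instant cutoff, int slidingWindowDays) {
        return cutoff.minus(Duration.ofDays(slidingWindowDays));
    }

    public static void main(String[] args) {
        Instant c = cutoff(Instant.parse("2026-03-16T10:30:00Z"), 30);
        System.out.println(c);                  // 2026-02-14T00:00:00Z
        System.out.println(windowStart(c, 3));  // 2026-02-11T00:00:00Z
    }
}
```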

* Make sliding window days configurable via retention.slidingWindowDays

* Bump migration to 000059, fix slidingWindowDays @Min(1)

* Fix test failures and address PR review comments

- Fix RetentionRulesResourceTest: applyToPast default is now true
- Fix RetentionPolicyServiceTest: remove feedback_scores/comments
  assertions (not part of retention deletion), add Awaitility waits
  for ClickHouse async write consistency
- Make organizationLevel write-only in RetentionRule (excluded from
  read responses since it's only used on create)
- Wrap log placeholders in single quotes per codebase convention

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…5788)

* [OPIK-4938] [BE] Add project_name support to dataset, experiment and optimization endpoints

- Add project_name field to DatasetItemBatch and propagate it through DatasetItemService to scope dataset items to a project
- Resolve project_name to project_id in DatasetService and ExperimentService when creating datasets/experiments
- Add project_id column to optimizations table via Liquibase migration (000073)
- Expose project_id (read-only) and project_name (write-only) on Optimization model
- Fix NullPointerException in OptimizationService when project_name is null by using AbstractMap.SimpleEntry instead of Map.entry
- Add integration tests for project-scoped dataset creation in DatasetsResourceTest, ExperimentsResourceTest, and OptimizationsResourceTest

* Revision 2: Add ProjectOptimizationsResource for project-scoped optimization listing

* Revision 3: Add project_id filter to OptimizationsResource and integration tests for ProjectOptimizationsResource

* [OPIK-4938] [BE] Fix DatasetItemBatch project resolution and centralize test factory calls

Fix Reactor empty Mono bug in DatasetItemService.getDatasetId() where batches
without projectId/projectName caused the flatMap to never execute (data loss).
Added switchIfEmpty to handle the null-project case properly.

Centralize factory.manufacturePojo(DatasetItemBatch.class) and
factory.manufacturePojo(DatasetItem.class) calls in DatasetResourceClient
to null out server-assigned fields (projectId, projectName, datasetId, etc.),
preventing PODAM-generated random UUIDs from causing 404 errors in tests.

* Revision 2: Address PR review comments

- Rename migration 000073 → 000074 to avoid prefix conflict with main
- Add trailing blank line to migration file per guidelines
- Remove @RequiredPermissions(EXPERIMENT_VIEW) from ProjectOptimizationsResource.find()
  to match unrestricted access pattern of the global endpoint
- Add dataset_name query param to ProjectOptimizationsResource.find()
  for parity with global /v1/private/optimizations endpoint
- Fix @Schema description on DatasetItemBatch.projectId (was "dataset_name must be
  provided", now "project_name must be provided")
- Use DatasetItemBatch builder instead of positional constructor in
  DatasetExportJobSubscriberResourceTest

* Revision 3: Fix insertInvalidDatasetItemWorkspace test failure

Use DatasetResourceClient helpers and null out datasetId to avoid PODAM
generating random UUIDs that cause 404s in DatasetItemService resolution.

* Revision 4: Simplify resolveProjectId in OptimizationService

Inline context accesses inside fromCallable lambda, consistent with
DatasetItemService.resolveProjectId pattern.

* Revision 5: Support projectId on Optimization write + validate on upsert

Remove READ_ONLY from Optimization.projectId so callers can pass it
directly. Add a resolveProjectId branch that validates the provided
projectId exists in the workspace before using it, mirroring the
DatasetItemBatch projectId/projectName duality.

* Revision 6: Clarify projectName/projectId as optional in DatasetItemBatch schema

Both fields are optional (both null = no project scoping). Update @Schema
descriptions to remove misleading "must be provided" language and describe
precedence rules instead.

* Revision 7: Extract resolveProjectIdOrCreate into ProjectService

Both OptimizationService and ExperimentService had identical inline logic
for resolving a project from (projectId, projectName): validate the id if
provided, getOrCreate from the name otherwise, return empty if neither.

The shared helper lives in ProjectService.resolveProjectIdOrCreate and uses
deferContextual so callers no longer need to extract workspaceId/userName
themselves. ExperimentService's logic is also aligned to projectId-first
priority, consistent with OptimizationService and DatasetItemService.

* Revision 8: Add OpenAPI schema descriptions for Optimization project_id/project_name

Matches the existing descriptions on Experiment, making the auto-create
and precedence semantics visible in the generated API docs.

* [OPIK-4938] [BE] Fix trailing blank line in migration 000074

* [OPIK-4938] [BE] Minor test and code cleanup from PR review feedback

- Merge duplicate test pairs in DatasetsResourceTest: tests that checked
  deprecated behavior and response headers now combined into single tests
- Add callRetrievePromptVersion helper to PromptResourceClient and use it
  in PromptResourceTest instead of raw HTTP calls
- Fix StringUtils.isNotEmpty → isNotBlank in OptimizationDAO ClickHouse
  row mapper for proper null/whitespace handling of project_id field

* [OPIK-4938] [BE] Remove unused import and use factory directly in DatasetsResourceTest

* [OPIK-4938] [BE] Use DatasetResourceClient factory methods instead of direct PODAM calls

* [OPIK-4938] [BE] Restore separate deprecation header tests in DatasetsResourceTest

* [OPIK-4938] [BE] Fix incorrect null assertion in twoSuiteExperiments test

computeRunSummaries populates summaries for any experiment that has
assertion results, regardless of group size. passThreshold defaults to 1
when no executionPolicy is set, so single-run experiments get independent
PASSED/FAILED summaries.

* Refactor project ID string check in OptimizationDAO

* [OPIK-4938] [BE] Restore FindProjectDatasets tests accidentally deleted in refactoring

Restores 6 integration tests for GET /v1/private/projects/{projectId}/datasets
that were accidentally deleted in a prior sed-based refactoring commit.

The tests cover: basic pagination, page size limiting, default sort by created
date, sorting by all valid fields (parameterized), filtering (parameterized),
and case-insensitive name search.

Also promotes getDatasets__whenFetchingAllDatasets__thenReturnDatasetsSortedByByValidFields
and getValidFilters to static in FindDatasets so they can be shared as
external @MethodSource providers.

---------

Co-authored-by: Andres Cruz <andresc@comet.com>
…onfiguration (#5782)

* [OPIK-5189] [FE] fix: render ChatPrompt messages in optimizer trial configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address baz review — extract shared helper, segment-aware key matching, deduplicate isMessagesArray

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: scope optimizer meta filtering to hasStructuredPrompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply ChatPrompt rendering fixes to V2 components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…d cleaner structure (#5800)

* OPIK-4897 reset to global execution policy

* [OPIK-5032] [FE] fix: eval suite experiment export with assertions and cleaner structure

* [OPIK-5032] [FE] fix: prettier lint and policy override logic

- Remove extra parentheses in DatasetItemsActionsPanel (prettier)
- Simplify policyChanged to `policy != null` so disabling global
  policy always persists the override, even when values match defaults
…ing (#5802)

Implements the v2 sidebar with project-first navigation as part of the
IA Revamp (OPIK-4617).

Route tree:
- All feature routes moved under /$ws/projects/$projectId/...
- V1 compat splat redirects catch old workspace-level URLs
- SDK redirects (RedirectProjects, RedirectDatasets) unchanged

Active project:
- activeProjectId stored in AppStore (single source of truth)
- useActiveProject hook syncs localStorage/API fallback → store
- ProjectPage syncs URL $projectId → store

Sidebar:
- Project selector dropdown with search, edit/delete actions
- Grouped menu sections: Observability, Evaluation, Prompt engineering,
  Optimization, Production
- Workspace section at bottom with workspace selector + Dashboards +
  Configuration
- SupportHub moved to TopBar
- Insights and Agent configuration shown as disabled (no route yet)

Internal links:
- ~40 navigate()/Link references updated to project-scoped paths
- Shared hooks (useSuiteIdFromURL, usePromptIdFromURL, etc.) duplicated
  in v2 with project-scoped from: paths
- useNavigateToExperiment and useLoadPlayground duplicated in v2
- ResourceLink uses projectUrl field for version-aware URL resolution
- matchRoute/useParams from: patterns updated for project-scoped routes

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…code-review (#5783)

* [OPIK-4688] [INFRA] Auto-trigger FERN update on BE merge and notify #code-review

- Trigger workflow on push to main when BE Java/OpenAPI/pom files change
- Add concurrency group to prevent parallel runs
- Extract merge author and originating PR for traceability
- Add PR body context linking back to the triggering BE merge
- Add notify-slack job that posts to #code-review via SLACK_WEBHOOK_URL_CODE_REVIEW
- Include Slack user mention mapping for author tagging
- Fail the workflow if Slack notification doesn't send successfully

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4688] [INFRA] Add contents:read permission for originating PR lookup

The gh API call to find the originating PR needs contents:read
permission at the job level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4688] [INFRA] Add pull_request trigger for testing workflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4688] [INFRA] Remove job-level permissions that blocked branch creation

The remote-branch-action and create-pull-request steps need
contents:write via the default GITHUB_TOKEN. The explicit
contents:read restriction was blocking the push.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4688] [INFRA] Temporarily allow notify-slack on pull_request for testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-4688] [INFRA] Add auto FERN update on BE merge with Slack notification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revision 2: Fix jq syntax — use quoted keys for JSON objects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revision 3: Fix jq — wrap blocks array in parentheses for concatenation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revision 4: Address PR comments — refine path filter and remove pull_request trigger

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revision 5: Extract trigger context into separate job with scoped permissions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revision 6: Add explicit permissions per job and skip Slack post for testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revision 7: Remove temporary PR testing triggers and Slack skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
)

* [OPIK-4518] [BE] Restore stable dataset item IDs in versioned API

After dataset versioning was introduced, the old API endpoints started
returning version-specific row IDs (changing per version) as the `id`
field instead of the stable `dataset_item_id`. This broke user flows
where items would get new IDs with each new version snapshot.

Fix: expose `dataset_item_id AS id` in all versioned item queries so
the `id` field is stable across versions. Update streaming pagination
cursor to use `dataset_item_id` instead of row `id`. Remove the
row-ID-to-dataset-item-ID mapping logic in the service since incoming
`id` values are now already stable `dataset_item_id`s.

* fix(backend): clean up review issues in stable dataset item IDs

- Remove unused workspaceId param from getDatasetItemWorkspace DAO
  (query is intentionally unscoped for cross-workspace validation)
- Fix allMatch validation gap in ExperimentItemService: verify all
  requested item IDs were found before checking workspace ownership
- Remove identity map indirection from editItemsViaSelectInsert now
  that IDs are stable dataset_item_ids

* fix(backend): renumber migration files to avoid conflicts with main

000062 → 000065, 000063 → 000066 (skip indexes and projection for
dataset_item_id). Updated changeset IDs inside files to match.

* fix(backend): clarify DatasetItemEdit.id is the stable dataset_item_id

* Fix warnings on updated code

* feat(backend): resolve experiment dataset_item_id in trace queries

Resolve old physical row IDs to stable dataset_item_ids in the
experiments_agg CTE using a targeted LEFT JOIN to
dataset_item_versions. No dedup needed since id and dataset_item_id
are immutable columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(backend): remove manual data migration for experiment items

The dual-join approach handles both old (physical row ID) and new
(stable dataset_item_id) experiment items at query time, making
the manual migration unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(backend): dual-join all experiment-item queries for backward compat

All queries joining experiment_items.dataset_item_id with
dataset_item_versions now match on both physical row id AND stable
dataset_item_id, so old experiment items (storing row IDs) and new
ones (storing stable IDs) both resolve correctly without requiring
a data migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): renumber migrations 000065/000066 → 000070/000071

Avoid collision with main's 000065-000069 added since last rebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): renumber migrations 000070/000071 → 000073/000074

Main now has 000070-000072, so bump to next available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): use if() instead of COALESCE for FixedString LEFT JOIN

ClickHouse FixedString(36) columns return null bytes (not SQL NULL)
on LEFT JOIN miss, so COALESCE never falls through. Use if(div.id != '')
to check whether the join actually matched.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(backend): remove ClickHouse projection migration

Defer the dataset_item_id projection to a follow-up. The skip indexes
(000073) provide sufficient coverage for now. The projection can be
added later if large-version performance requires it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf(backend): use arrayJoin to avoid duplicate CTE evaluation

Consolidate OR-expanded filter conditions that referenced the same CTE
twice into single arrayJoin([col1, col2]) calls. Each CTE is now
evaluated exactly once per usage site.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): rename skip indexes to follow naming convention

Use idx_{table}_{column} pattern: idx_dataset_item_versions_dataset_item_id_bf
and idx_dataset_item_versions_dataset_item_id_minmax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): restore optimized SELECT_EXPERIMENT_ITEMS_OUTPUT_COLUMNS

Revert to main's simpler query that resolves trace_ids directly from
experiment_items without joining dataset_item_versions. The dual-join
treatment is unnecessary here — output column discovery only needs
trace_ids, not dataset item mapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): add two-level dedup to aggregated CTEs, fix naming/javadoc

- Add LIMIT 1 BY dataset_item_id to dataset_items_agg_resolved,
  dataset_items_aggr_resolved, and ExperimentAggregatesDAO's
  dataset_item_versions_resolved CTEs. Without this, the OR-condition
  joins could match one experiment item to multiple dataset item rows
  from different versions, inflating groupArray results.
- Remove orphaned Javadoc from deleted validateMappingsBelongToSameDataset
- Rename deletedRowIds to deletedIds (now holds stable IDs, not row IDs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): fix stats query item_agg filter for dual-ID compat

The dataset_item_filters in item_agg selected only physical id,
missing new experiment items that store stable dataset_item_id.
Use arrayJoin([id, dataset_item_id]) and add two-level dedup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): renumber migration 000073 → 000074

Main added 000073_add_minmax_index_trace_threads_last_updated_at.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(backend): unify CTE column aliases to dataset_item_id AS id, id AS row_id

All aggregated CTEs (dataset_items_agg_resolved, dataset_items_aggr_resolved,
dataset_item_versions_resolved) now use the same convention as
dataset_items_resolved: dataset_item_id AS id (stable), id AS row_id (physical).
Updated all join conditions, arrayJoin filters, and GROUP BY references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(backend): add multi-dataset rejection tests for delete and batch update

Verify that delete and batch update requests with item IDs spanning
multiple datasets (without explicit datasetId) return 400.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… returns 100%) - hotfix (#5805)

* [OPIK-5219] [BE] Fix pass_rate query to read from assertion_results instead of feedback_scores

The GET_PASS_RATE_AGGREGATION query in ExperimentAggregatesDAO was reading
from feedback_scores to determine run pass/fail, but assertion scores with
category_name="suite_assertion" are routed exclusively to the assertion_results
table by FeedbackScoreService. This caused every run to have no scores in
feedback_scores, defaulting to "passed" and producing 100% pass rate.

Replaced feedback_scores_combined/feedback_scores_final CTEs with
assertion_results_final using the same ROW_NUMBER() deduplication pattern
as ExperimentDAO.java.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [OPIK-5219] [BE] Fix flaky test: use shared workspace for pass rate aggregation test

The test was creating a new workspace per run, but populateAggregations
silently returned empty when getExperimentData couldn't find the
experiment in the freshly-created workspace (ClickHouse timing).
Use the static shared workspace and createExperimentItemWithData helper
matching all other passing tests in this class.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added documentation Improvements or additions to documentation python Pull requests that update Python code tests Including test files, or tests related like configuration. Python SDK TypeScript SDK labels Mar 25, 2026