feat: route prompt-template experiments through batched create [SC-65568] by quinn-galileo · Pull Request #622 · rungalileo/galileo-js

quinn-galileo · 2026-06-03T19:51:25Z

Summary

Brings the JS SDK to parity with the Python SDK for the playground-batching work: prompt-template experiments now create and trigger in a single createExperiment(trigger=true, …) call, so they enter the batched playground path when the backend playground_batching flag is enabled.

Previously runExperiment's prompt-template path used a 3-call flow ending in an explicit createPromptRunJob (POST /jobs). That job-submission route is untouched by the batching backend, so a JS-created experiment never entered the batched path even with the flag on (no regression — it fell to the legacy single-job path — but no batching benefit either).
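For readers comparing the two flows, here is a minimal sketch of the call sequences. It is not the SDK's code: `RecordingClient`, `registerScorerSettings`, the made-up experiment id, and the synchronous method bodies are hypothetical stand-ins (the real client calls are presumably asynchronous). Only `createExperiment`, `createPromptRunJob`, and the `trigger` flag are taken from the description above.

```typescript
// Hypothetical stand-in for the API client: it only records which calls were made.
class RecordingClient {
  readonly calls: string[] = [];

  createExperiment(_name: string, trigger?: boolean): { id: string } {
    this.calls.push(trigger ? "createExperiment(trigger=true)" : "createExperiment");
    return { id: "exp-1" }; // made-up id for illustration
  }

  // Hypothetical name for the scorer-settings registration step.
  registerScorerSettings(_experimentId: string): void {
    this.calls.push("registerScorerSettings");
  }

  createPromptRunJob(_experimentId: string): void {
    this.calls.push("createPromptRunJob (POST /jobs)");
  }
}

// Legacy prompt-template flow: three calls, ending in an explicit job submission.
function legacyFlow(client: RecordingClient): string[] {
  const experiment = client.createExperiment("demo");
  client.registerScorerSettings(experiment.id);
  client.createPromptRunJob(experiment.id);
  return client.calls;
}

// New flow: create and trigger in a single call, with no explicit job submission.
function batchedFlow(client: RecordingClient): string[] {
  client.createExperiment("demo", true);
  return client.calls;
}
```

The point of the sketch is that the new flow never reaches the job-submission route, which is the route the batching backend does not touch.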

Changes

experiment-service.ts / galileo-client.ts: createExperiment widened by positional append (name, dataset, trigger?, scorers?, promptTemplateVersionId?, promptSettings?) — backward-compatible for the published API; the new fields are sent only when provided. createPromptRunJob kept but marked @deprecated.
utils/metrics.ts: createMetricConfigs accepts a nullish runId → resolve-only mode (resolves scorer configs without registering them server-side), mirroring Python's create_metric_configs(project_id, None, …).
entities/experiments.ts: runExperiment branches experiment creation by mode. Function path unchanged (untriggered create + scorer registration + local run). Prompt-template path resolves scorers (resolve-only) + dataset + prompt-version, then creates with trigger=true in one call, applies tags after, builds the link/message client-side, and no longer calls createPromptRunJob.
Docs updated (AGENTS.md, experiments-reference.md).
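The two mechanics in the list above, resolve-only scorer resolution and optional fields sent only when provided, can be sketched roughly as below. Every type, function name, and body key here (`ScorerConfig`, `resolveMetricConfigs`, `buildCreateBody`, `datasetId`) is an assumption made for illustration; the actual signatures in `utils/metrics.ts` and `experiment-service.ts` may differ.

```typescript
// All types and names below are illustrative assumptions, not the SDK's definitions.
interface ScorerConfig {
  name: string;
}

interface MetricResolution {
  scorers: ScorerConfig[];
  registered: boolean; // true only when the configs were attached to a run
}

// Resolve-only mode: a nullish runId means "resolve the scorers, register nothing".
function resolveMetricConfigs(
  runId: string | null | undefined,
  metricNames: string[],
  register: (runId: string, scorers: ScorerConfig[]) => void
): MetricResolution {
  const scorers = metricNames.map((name) => ({ name }));
  if (runId == null) {
    return { scorers, registered: false };
  }
  register(runId, scorers);
  return { scorers, registered: true };
}

// Positional append: the new parameters are optional and are left out of the
// request body entirely when the caller does not pass them.
function buildCreateBody(
  name: string,
  datasetId: string | null,
  trigger?: boolean,
  scorers?: ScorerConfig[],
  promptTemplateVersionId?: string,
  promptSettings?: Record<string, unknown>
): Record<string, unknown> {
  const body: Record<string, unknown> = { name, datasetId };
  if (trigger !== undefined) body.trigger = trigger;
  if (scorers !== undefined) body.scorers = scorers;
  if (promptTemplateVersionId !== undefined) body.promptTemplateVersionId = promptTemplateVersionId;
  if (promptSettings !== undefined) body.promptSettings = promptSettings;
  return body;
}
```

Under these assumptions, an existing two-argument call produces the same body as before, while the prompt-template path passes the resolved scorers straight into the create body.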

Behavior note

Return shape {experiment, link, message} is unchanged. The prompt-path message text now comes from a client-built string (was the server job message) — non-breaking, but flagged for anyone string-matching it.

Testing

Full Jest suite green (1750 tests); tsc + eslint clean. Updated the 8 prompt-path assertions and added resolve-only + create-body-shape tests, plus a local-metric-guard test and a PromptTemplateVersion-path test.

Local e2e — validated against a local stack in both flag states (the change is unconditional trigger=true, so it must be correct regardless of the flag):

`playground_batching`	Result	Path
enabled	1 `playground_run` job, `trace_id_count=5`, `batch_id` set, `completed`	batched ✓
disabled	1 `playground_run` job, `trace_id_count=0`, `batch_id=None`, `completed`	legacy single-job ✓

With the flag off, the API routes the trigger=true create to the legacy single-job path and the experiment completes normally — no regression for tenants without the flag. With it on, the run enters the batched path. Matches the Python SDK's behavior.

🤖 Generated with Claude Code

…568]

Prompt-template experiments now create and trigger in a single createExperiment(trigger=true, scorers, promptTemplateVersionId, promptSettings) call instead of the legacy createExperiment + run-scorer-settings + explicit createPromptRunJob (POST /jobs) flow. The explicit job-submission route always used the single-job path, so JS-created experiments never entered the batched playground path even with the backend flag enabled; routing through trigger=true fixes that and matches the Python SDK.

createMetricConfigs gains a resolve-only mode (nullish runId) so scorers can be resolved without server-side registration and passed in the create body. createPromptRunJob is kept but deprecated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-03T19:54:16Z

Codecov Report

❌ Patch coverage is 91.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.45%. Comparing base (61b9bf7) to head (514c24f).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/api-client/galileo-client.ts	0.00%	2 Missing ⚠️
src/entities/experiments.ts	94.11%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #622      +/-   ##
==========================================
+ Coverage   80.36%   80.45%   +0.08%     
==========================================
  Files          85       85              
  Lines        7502     7521      +19     
  Branches     2250     2293      +43     
==========================================
+ Hits         6029     6051      +22     
+ Misses       1462     1459       -3     
  Partials       11       11

Flag	Coverage Δ
unittests	`80.45% <91.66%> (+0.08%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/api-client/services/experiment-service.ts	`41.53% <100.00%> (+16.53%)`	⬆️
src/utils/metrics.ts	`79.42% <100.00%> (ø)`
src/api-client/galileo-client.ts	`43.99% <0.00%> (ø)`
src/entities/experiments.ts	`88.59% <94.11%> (+0.06%)`	⬆️

quinn-galileo · 2026-06-03T19:58:51Z

/astra review

galileo-astra

⚠️ This review was generated by an AI agent (Astra) and may contain mistakes. Please verify all suggestions independently.

Verdict: approve — Backward-compatible, type-safe refactor with sound logic and good test coverage of the behavior changes; only minor testing/maintainability nits remain.

Follow-ups

Suggested follow-up work that could be tracked as Shortcut stories:

src/api-client/services/experiment-service.ts:69-76: createExperiment now takes 6 positional parameters including a positional boolean trigger (a classic boolean-trap; call sites read ..., true, scorerConfigs, ...). Consider migrating the API-client/service createExperiment to an options object (matching the entity-level Experiments.createExperiment({...}) shape) for readability and to avoid future positional-append churn. Non-blocking and out of scope for this PR.

…ion path [SC-65568]

Adds the two regression tests Astra flagged as uncovered:
- prompt template + LocalMetricConfig rejects with the function-only guard message
- prompt-template run accepts a PromptTemplateVersion passed directly

Also adds .prettierignore for the semantic-release-generated CHANGELOG.md so `pre-commit run --all-files` stops failing on a file that is never hand-formatted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

quinn-galileo · 2026-06-03T20:41:17Z

Thanks for the review. Addressed the testing nits in 514c24f (added the local-metric-guard and PromptTemplateVersion tests).

On the createExperiment positional-boolean follow-up: noted, but keeping positional-append deliberately here. GalileoApiClient.createExperiment is part of the published API surface, so switching to an options object would be a breaking change (semantic-release major) unless overloaded — that's exactly why I appended params rather than reshaping the signature. Happy to file a separate story to migrate it to the {...} shape (matching the entity-level Experiments.createExperiment) if we want to take the major bump.
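For reference, the "unless overloaded" option could look roughly like the sketch below. This is a hypothetical shape, not a proposal for the actual signature: `CreateExperimentOptions`, the `string | null` dataset type, and the function returning its normalized arguments (instead of issuing a request) are all assumptions for illustration.

```typescript
// Hypothetical options shape; property names mirror the positional parameters.
interface CreateExperimentOptions {
  name: string;
  dataset: string | null;
  trigger?: boolean;
  scorers?: string[];
  promptTemplateVersionId?: string;
}

// Overload 1: the existing positional form keeps compiling for current callers.
function createExperiment(
  name: string,
  dataset: string | null,
  trigger?: boolean,
  scorers?: string[],
  promptTemplateVersionId?: string
): CreateExperimentOptions;
// Overload 2: the options-object form avoids the positional boolean.
function createExperiment(options: CreateExperimentOptions): CreateExperimentOptions;
// Implementation: normalizes both call styles into one options object.
function createExperiment(
  nameOrOptions: string | CreateExperimentOptions,
  dataset: string | null = null,
  trigger?: boolean,
  scorers?: string[],
  promptTemplateVersionId?: string
): CreateExperimentOptions {
  if (typeof nameOrOptions !== "string") {
    return { ...nameOrOptions };
  }
  return { name: nameOrOptions, dataset, trigger, scorers, promptTemplateVersionId };
}
```

With overloads like these, `createExperiment("demo", "ds-1", true)` and `createExperiment({ name: "demo", dataset: "ds-1", trigger: true })` would normalize to the same values, so the options form could be introduced without a major bump and the positional form deprecated later.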

Also added .prettierignore for the generated CHANGELOG.md so pre-commit is green.

baz-reviewer Bot reviewed Jun 3, 2026

Comment thread src/entities/experiments.ts

galileo-astra Bot reviewed Jun 3, 2026

Comment thread src/entities/experiments.ts

quinn-galileo requested review from BipinShetty and jonahe June 3, 2026 20:31

baz-reviewer Bot approved these changes Jun 3, 2026

quinn-galileo wants to merge 2 commits into main from quinn/sc-65568-js-batched-experiment-runs
