
feat: route prompt-template experiments through batched create [SC-65568] #622

Open
quinn-galileo wants to merge 2 commits into main from quinn/sc-65568-js-batched-experiment-runs

Conversation

quinn-galileo commented Jun 3, 2026

Contributor

Summary

Brings the JS SDK to parity with the Python SDK for the playground-batching work: prompt-template experiments now create and trigger in a single createExperiment(trigger=true, …) call, so they enter the batched playground path when the backend playground_batching flag is enabled.

Previously runExperiment's prompt-template path used a 3-call flow ending in an explicit createPromptRunJob (POST /jobs). That job-submission route is untouched by the batching backend, so a JS-created experiment never entered the batched path even with the flag on (no regression — it fell to the legacy single-job path — but no batching benefit either).
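
For readers comparing the two flows, the difference in call sequence can be sketched roughly as below. This is a minimal illustration, not the SDK code: RecordingClient is a hypothetical stand-in that only records calls, createRunScorerSettings is an assumed name for the scorer-registration step, all signatures and ids are made up, and the calls are synchronous here although the real client methods are presumably async.

```typescript
// Hypothetical recording stand-in for the API client. Method names follow the
// PR text where possible; signatures, return values, and
// createRunScorerSettings are assumptions made for this sketch.
type RecordedCall = { method: string; args: unknown[] };

class RecordingClient {
  readonly calls: RecordedCall[] = [];

  createExperiment(
    name: string,
    datasetId: string,
    trigger?: boolean,
    scorers?: string[],
    promptTemplateVersionId?: string,
  ): string {
    this.calls.push({
      method: "createExperiment",
      args: [name, datasetId, trigger, scorers, promptTemplateVersionId],
    });
    return "exp-1"; // made-up experiment id
  }

  createRunScorerSettings(experimentId: string, scorers: string[]): void {
    this.calls.push({ method: "createRunScorerSettings", args: [experimentId, scorers] });
  }

  createPromptRunJob(experimentId: string, promptTemplateVersionId: string): void {
    this.calls.push({
      method: "createPromptRunJob",
      args: [experimentId, promptTemplateVersionId],
    });
  }
}

// Legacy prompt-template flow: three calls, ending in the explicit job submission.
function legacyPromptFlow(client: RecordingClient, scorers: string[], versionId: string): string {
  const experimentId = client.createExperiment("demo", "dataset-1");
  client.createRunScorerSettings(experimentId, scorers);
  client.createPromptRunJob(experimentId, versionId);
  return experimentId;
}

// New flow: a single create call that also triggers the run.
function batchedPromptFlow(client: RecordingClient, scorers: string[], versionId: string): string {
  return client.createExperiment("demo", "dataset-1", true, scorers, versionId);
}

const legacyClient = new RecordingClient();
legacyPromptFlow(legacyClient, ["correctness"], "ptv-1");

const batchedClient = new RecordingClient();
batchedPromptFlow(batchedClient, ["correctness"], "ptv-1");
```

In this sketch the legacy client records three calls and the batched client records one, with trigger set to true; whether that one call is actually batched is decided by the backend flag, not by the SDK.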

Changes

  • experiment-service.ts / galileo-client.ts: createExperiment widened by positional append (name, dataset, trigger?, scorers?, promptTemplateVersionId?, promptSettings?) — backward-compatible for the published API; the new fields are sent only when provided. createPromptRunJob kept but marked @deprecated.
  • utils/metrics.ts: createMetricConfigs accepts a nullish runId, which selects a resolve-only mode (resolves scorer configs without registering them server-side), mirroring Python's create_metric_configs(project_id, None, …).
  • entities/experiments.ts: runExperiment branches experiment creation by mode. Function path unchanged (untriggered create + scorer registration + local run). Prompt-template path resolves scorers (resolve-only) + dataset + prompt-version, then creates with trigger=true in one call, applies tags after, builds the link/message client-side, and no longer calls createPromptRunJob.
  • Docs updated (AGENTS.md, experiments-reference.md).
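
A rough sketch of how the prompt-template path described above fits together: scorers are resolved without registration, then passed in a create body that includes only the fields that were provided. Everything here is illustrative and assumed rather than taken from the diff: resolveMetricConfigs, buildCreateExperimentBody, the scorer catalogue, and the body field names are hypothetical, and the real wire format may differ.

```typescript
// All names below are illustrative; the real SDK types and wire field names may differ.
interface ScorerConfig {
  name: string;
}

// Made-up catalogue standing in for the server-side scorer lookup.
const KNOWN_SCORERS: Record<string, ScorerConfig | undefined> = {
  correctness: { name: "correctness" },
  toxicity: { name: "toxicity" },
};

// Records registrations so the example can show that resolve-only mode makes none.
const registered: Array<{ runId: string; scorers: ScorerConfig[] }> = [];

function resolveMetricConfigs(
  runId: string | null | undefined,
  metrics: string[],
): ScorerConfig[] {
  const resolved = metrics.map((metric) => {
    const config = KNOWN_SCORERS[metric];
    if (config === undefined) throw new Error(`Unknown metric: ${metric}`);
    return config;
  });
  // A nullish runId means resolve-only: skip the registration side effect.
  if (runId != null) registered.push({ runId, scorers: resolved });
  return resolved;
}

interface CreateExperimentBody {
  name: string;
  datasetId: string;
  trigger?: boolean;
  scorers?: ScorerConfig[];
  promptTemplateVersionId?: string;
  promptSettings?: Record<string, unknown>;
}

// Positional append: the new parameters are optional and are added to the
// body only when the caller actually provides them.
function buildCreateExperimentBody(
  name: string,
  datasetId: string,
  trigger?: boolean,
  scorers?: ScorerConfig[],
  promptTemplateVersionId?: string,
  promptSettings?: Record<string, unknown>,
): CreateExperimentBody {
  const body: CreateExperimentBody = { name, datasetId };
  if (trigger !== undefined) body.trigger = trigger;
  if (scorers !== undefined) body.scorers = scorers;
  if (promptTemplateVersionId !== undefined) body.promptTemplateVersionId = promptTemplateVersionId;
  if (promptSettings !== undefined) body.promptSettings = promptSettings;
  return body;
}

// Prompt-template path: resolve-only, then one triggered create.
const resolvedScorers = resolveMetricConfigs(null, ["correctness"]);
const promptBody = buildCreateExperimentBody("demo", "dataset-1", true, resolvedScorers, "ptv-1");

// Function path: an existing two-argument call site produces the same body as before.
const functionBody = buildCreateExperimentBody("demo", "dataset-1");
```

The point of the conditional assignments is that existing two-argument callers send exactly the body they sent before the signature was widened.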

Behavior note

Return shape {experiment, link, message} is unchanged. The prompt-path message text now comes from a client-built string (was the server job message) — non-breaking, but flagged for anyone string-matching it.

Testing

  • Full Jest suite green (1750 tests); tsc + eslint clean. Updated the 8 prompt-path assertions and added resolve-only + create-body-shape tests, plus a local-metric-guard test and a PromptTemplateVersion-path test.

  • Local e2e — validated against a local stack in both flag states (the change is unconditional trigger=true, so it must be correct regardless of the flag):

    | playground_batching | Result | Path |
    |---|---|---|
    | enabled | 1 playground_run job, trace_id_count=5, batch_id set, completed | batched ✓ |
    | disabled | 1 playground_run job, trace_id_count=0, batch_id=None, completed | legacy single-job ✓ |

    With the flag off, the API routes the trigger=true create to the legacy single-job path and the experiment completes normally — no regression for tenants without the flag. With it on, the run enters the batched path. Matches the Python SDK's behavior.

🤖 Generated with Claude Code

…568]

Prompt-template experiments now create and trigger in a single
createExperiment(trigger=true, scorers, promptTemplateVersionId, promptSettings)
call instead of the legacy createExperiment + run-scorer-settings + explicit
createPromptRunJob (POST /jobs) flow. The explicit job-submission route always
used the single-job path, so JS-created experiments never entered the batched
playground path even with the backend flag enabled; routing through trigger=true
fixes that and matches the Python SDK. createMetricConfigs gains a resolve-only
mode (nullish runId) so scorers can be resolved without server-side registration
and passed in the create body. createPromptRunJob is kept but deprecated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov Bot commented Jun 3, 2026


Codecov Report

❌ Patch coverage is 91.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.45%. Comparing base (61b9bf7) to head (514c24f).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/api-client/galileo-client.ts | 0.00% | 2 Missing ⚠️ |
| src/entities/experiments.ts | 94.11% | 2 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main     #622      +/-   ##
==========================================
+ Coverage   80.36%   80.45%   +0.08%     
==========================================
  Files          85       85              
  Lines        7502     7521      +19     
  Branches     2250     2293      +43     
==========================================
+ Hits         6029     6051      +22     
+ Misses       1462     1459       -3     
  Partials       11       11              
| Flag | Coverage Δ |
|---|---|
| unittests | 80.45% <91.66%> (+0.08%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/api-client/services/experiment-service.ts | 41.53% <100.00%> (+16.53%) ⬆️ |
| src/utils/metrics.ts | 79.42% <100.00%> (ø) |
| src/api-client/galileo-client.ts | 43.99% <0.00%> (ø) |
| src/entities/experiments.ts | 88.59% <94.11%> (+0.06%) ⬆️ |

Comment thread src/entities/experiments.ts
@quinn-galileo

Contributor Author

/astra review

@galileo-astra galileo-astra Bot left a comment


⚠️ This review was generated by an AI agent (Astra) and may contain mistakes. Please verify all suggestions independently.

Verdict: approve — Backward-compatible, type-safe refactor with sound logic and good test coverage of the behavior changes; only minor testing/maintainability nits remain.

Follow-ups

Suggested follow-up work that could be tracked as Shortcut stories:

  • src/api-client/services/experiment-service.ts:69-76: createExperiment now takes 6 positional parameters including a positional boolean trigger (a classic boolean-trap; call sites read ..., true, scorerConfigs, ...). Consider migrating the API-client/service createExperiment to an options object (matching the entity-level Experiments.createExperiment({...}) shape) for readability and to avoid future positional-append churn. Non-blocking and out of scope for this PR.
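
To make the readability concern concrete, here is a small sketch of the suggested options-object shape delegating to a positional function. It is only an illustration: createExperimentPositional, createExperimentFromOptions, and the ExperimentRequest type are hypothetical names, and the parameter order simply mirrors the list in the PR description.

```typescript
// Hypothetical request shape; field names are assumptions for this sketch.
interface ExperimentRequest {
  name: string;
  datasetId: string;
  trigger?: boolean;
  scorers?: string[];
  promptTemplateVersionId?: string;
  promptSettings?: Record<string, unknown>;
}

// Positional form: the third argument is a bare boolean, so a call site
// reads as (..., true, scorers, ...) with no hint of what `true` means.
function createExperimentPositional(
  name: string,
  datasetId: string,
  trigger?: boolean,
  scorers?: string[],
  promptTemplateVersionId?: string,
  promptSettings?: Record<string, unknown>,
): ExperimentRequest {
  return { name, datasetId, trigger, scorers, promptTemplateVersionId, promptSettings };
}

// Options-object form: every argument is named at the call site, and new
// fields can be added later without another positional append.
type CreateExperimentOptions = ExperimentRequest;

function createExperimentFromOptions(options: CreateExperimentOptions): ExperimentRequest {
  return createExperimentPositional(
    options.name,
    options.datasetId,
    options.trigger,
    options.scorers,
    options.promptTemplateVersionId,
    options.promptSettings,
  );
}

const positionalRequest = createExperimentPositional("demo", "dataset-1", true, ["correctness"]);

const namedRequest = createExperimentFromOptions({
  name: "demo",
  datasetId: "dataset-1",
  trigger: true,
  scorers: ["correctness"],
});
```

Both calls build the same request in this sketch; only the second makes the meaning of `true` visible where it is written.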

Comment thread src/entities/experiments.ts
…ion path [SC-65568]

Adds the two regression tests Astra flagged as uncovered:
- prompt template + LocalMetricConfig rejects with the function-only guard message
- prompt-template run accepts a PromptTemplateVersion passed directly

Also adds .prettierignore for the semantic-release-generated CHANGELOG.md so
`pre-commit run --all-files` stops failing on a file that is never hand-formatted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@quinn-galileo

Contributor Author

Thanks for the review. Addressed the testing nits in 514c24f (added the local-metric-guard and PromptTemplateVersion tests).

On the createExperiment positional-boolean follow-up: noted, but keeping positional-append deliberately here. GalileoApiClient.createExperiment is part of the published API surface, so switching to an options object would be a breaking change (semantic-release major) unless overloaded — that's exactly why I appended params rather than reshaping the signature. Happy to file a separate story to migrate it to the {...} shape (matching the entity-level Experiments.createExperiment) if we want to take the major bump.
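
For reference, the "unless overloaded" route could look roughly like the sketch below, which accepts both the existing positional form and an options object in one function. This is an assumption about how such a migration might be done, not code from this PR; the function here is a simplified stand-in with fewer parameters than the real createExperiment, and it returns the normalized arguments instead of calling the API.

```typescript
// Simplified, hypothetical stand-in for the client method's arguments.
interface CreateExperimentOptions {
  name: string;
  datasetId: string;
  trigger?: boolean;
  scorers?: string[];
}

// Overload signatures: the positional form keeps existing callers compiling,
// while the options form is available to new callers.
function createExperiment(options: CreateExperimentOptions): CreateExperimentOptions;
function createExperiment(
  name: string,
  datasetId: string,
  trigger?: boolean,
  scorers?: string[],
): CreateExperimentOptions;
function createExperiment(
  nameOrOptions: string | CreateExperimentOptions,
  datasetId?: string,
  trigger?: boolean,
  scorers?: string[],
): CreateExperimentOptions {
  // Options-object call: copy the named fields in a fixed order.
  if (typeof nameOrOptions !== "string") {
    return {
      name: nameOrOptions.name,
      datasetId: nameOrOptions.datasetId,
      trigger: nameOrOptions.trigger,
      scorers: nameOrOptions.scorers,
    };
  }
  // Positional call: datasetId is required by the overload signature, but the
  // implementation signature cannot express that, so check at runtime.
  if (datasetId === undefined) {
    throw new Error("datasetId is required in the positional form");
  }
  return { name: nameOrOptions, datasetId, trigger, scorers };
}

const fromPositional = createExperiment("demo", "dataset-1", true, ["correctness"]);
const fromOptions = createExperiment({
  name: "demo",
  datasetId: "dataset-1",
  trigger: true,
  scorers: ["correctness"],
});
```

The trade-off is a slightly more complex implementation signature in exchange for avoiding the major version bump.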

Also added .prettierignore for the generated CHANGELOG.md so pre-commit is green.
