
docs: add experiment CI/CD integration guide #2888

Merged: wochinge merged 20 commits into main from tobias/lfe-9366-documentation on May 6, 2026


@wochinge (Contributor) commented May 4, 2026

Summary

  • Added a dedicated CI/CD integration page for Langfuse experiments, covering GitHub Actions setup, action inputs/outputs, secrets, RunnerContext, regression failures, and non-GitHub CI patterns.
  • Documented the upcoming experiment(context: RunnerContext) contract so GitHub Action examples use context.runExperiment / context.run_experiment with action-injected dataset, SDK client, and metadata defaults.
  • Moved the existing Pytest/Vitest CI examples out of the SDK page into the new CI/CD page, and replaced the old SDK section with a concise cross-link.
  • Added explicit docs anchors for evaluator and CI/CD deep links, and registered the new page in the experiments docs navigation.

Linear

Major Decisions

  • Made the new CI/CD page the canonical place for experiment gating docs to avoid duplicating long CI examples across the SDK and CI/CD pages.
  • Kept GitHub Action docs focused on the action-specific RunnerContext flow, while keeping direct SDK-based examples for other CI/CD systems.

Disclaimer: Experimental PR review

Greptile Summary

This PR adds a dedicated CI/CD integration guide for Langfuse experiments, covering the langfuse/experiment-action GitHub Actions workflow, RunnerContext contract, regression gating via RegressionError, and equivalent Pytest/Vitest patterns for other CI systems. The existing CI examples are moved from the SDK page to the new page with clean cross-links.

  • The TypeScript Vitest examples use eval as an arrow-function parameter name (lines 534 and 557). Strict mode forbids eval as a binding name, and ES modules are always strict, so the snippet fails to compile for anyone who copies it.
  • The Python Pytest examples also shadow the built-in eval via loop variables, and next() without a default can raise a bare StopIteration instead of producing a clear test failure.

Confidence Score: 3/5

Not safe to merge as-is — the TypeScript code examples contain a strict-mode syntax error that will fail compilation for any reader who copies them.

Two P1 findings (using eval as a TypeScript parameter name — a hard syntax error in strict mode) pull the score below the P1 ceiling of 4. The remaining findings are P2 style/robustness issues.

content/docs/evaluation/experiments/experiments-in-ci-cd.mdx — Vitest TypeScript example (lines 534, 557) and Pytest Python example (lines 418–420, 443–445).

Important Files Changed

| Filename | Overview |
|---|---|
| content/docs/evaluation/experiments/experiments-in-ci-cd.mdx | New CI/CD integration guide. Two TypeScript code examples use eval as a parameter name, which is a syntax error in strict mode; Python examples shadow the built-in eval; next() without a default can surface a confusing StopIteration; echo quoting in the action output YAML snippet is fragile. |
| content/docs/evaluation/experiments/experiments-via-sdk.mdx | Removed the duplicated CI examples and replaced them with cross-links to the new CI/CD page; added anchor IDs to the Evaluators and Testing in CI Environments headings. Clean change with no issues. |
| content/docs/evaluation/experiments/meta.json | Registers the new experiments-in-ci-cd page in the experiments navigation. Trivial and correct. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant PR as Pull Request
    participant GHA as GitHub Actions
    participant Action as langfuse/experiment-action
    participant Script as experiment(context)
    participant LF as Langfuse API

    PR->>GHA: push / pull_request event
    GHA->>Action: run with inputs (dataset_name, keys, metadata)
    Action->>LF: fetch dataset items (dataset_name + dataset_version)
    Action->>Script: call experiment(RunnerContext)
    Script->>LF: run_experiment / runExperiment (task + evaluators)
    LF-->>Script: ExperimentResult (run_evaluations)
    Script-->>Action: return result (or raise RegressionError)
    Action->>GHA: set outputs (result_json, failed)
    Action->>PR: post/update PR comment (pass/regression/error + scores)
```
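To make the hand-off in the diagram concrete, here is a self-contained Python simulation of it. Every name is a hypothetical stand-in written for this sketch (`RunnerContext`, `RegressionError`, `ExperimentResult`, `Evaluation`, `run_action`), not the Langfuse SDK's classes or the internals of langfuse/experiment-action, and the toy `run_experiment` loops locally instead of calling the Langfuse API. Only the overall shape comes from the diagram: the action calls `experiment(context)`, a `RegressionError` marks a regression, and the outcome becomes the `result_json` and `failed` outputs.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evaluation:
    """Hypothetical stand-in for one run-level evaluation."""
    name: str
    value: float


@dataclass
class ExperimentResult:
    """Hypothetical stand-in for the SDK's experiment result."""
    run_evaluations: list = field(default_factory=list)


class RegressionError(Exception):
    """Hypothetical stand-in: raised by the user script to signal a regression."""


@dataclass
class RunnerContext:
    """Hypothetical stand-in for the context the action injects.

    The real context also carries the SDK client and metadata defaults; this
    one only holds dataset items and a toy runner so the sketch is runnable.
    """
    dataset_items: list

    def run_experiment(self, task: Callable, evaluator: Callable) -> ExperimentResult:
        outputs = [task(item) for item in self.dataset_items]
        return ExperimentResult([evaluator(outputs, self.dataset_items)])


def run_action(experiment: Callable, context: RunnerContext) -> dict:
    """Call experiment(context) and map the outcome to the action's outputs."""
    try:
        result = experiment(context)
    except RegressionError as exc:
        status, details = "regression", {"message": str(exc)}
    except Exception as exc:  # any other failure is reported as an error
        status, details = "error", {"message": repr(exc)}
    else:
        scores = {evaluation.name: evaluation.value for evaluation in result.run_evaluations}
        status, details = "pass", {"scores": scores}
    return {
        "result_json": json.dumps({"status": status, **details}),
        "failed": "false" if status == "pass" else "true",
    }


# --- Hypothetical user script (task, evaluator and 0.9 threshold are made up) ---
def avg_accuracy(outputs, items):
    correct = sum(out == item["expected"] for out, item in zip(outputs, items))
    return Evaluation("avg_accuracy", correct / len(items))


def experiment(context):
    result = context.run_experiment(task=lambda item: item["input"].upper(),
                                    evaluator=avg_accuracy)
    accuracy = next((e.value for e in result.run_evaluations
                     if e.name == "avg_accuracy"), None)
    if accuracy is None or accuracy < 0.9:
        raise RegressionError(f"avg_accuracy={accuracy} is below 0.9")
    return result
```

Calling `run_action(experiment, RunnerContext([{"input": "a", "expected": "A"}]))` yields `failed == "false"` with a `"pass"` status; changing the expected value to anything else turns it into a `"regression"`. Catching `RegressionError` separately from other exceptions is what lets a PR comment distinguish "scores got worse" from "the script crashed".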
Prompt To Fix All With AI
Fix the following 5 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 5
content/docs/evaluation/experiments/experiments-in-ci-cd.mdx:534
**`eval` is a reserved identifier in TypeScript strict mode**

Using `eval` as an arrow-function parameter (`(eval) => ...`) is a syntax error in strict-mode code, and ES modules are always strict. TypeScript rejects the snippet with a strict-mode error, so anyone copying this example gets an immediate compilation failure. The same issue appears again on line 557.

```suggestion
      (e) => e.name === "avg_accuracy",
```

### Issue 2 of 5
content/docs/evaluation/experiments/experiments-in-ci-cd.mdx:557
**Same `eval` reserved-identifier error in the second test case**

Same strict-mode syntax error as on line 534 — `eval` cannot be used as a parameter name.

```suggestion
      (e) => e.name === "avg_accuracy",
```

### Issue 3 of 5
content/docs/evaluation/experiments/experiments-in-ci-cd.mdx:418-420
**`next()` without a default raises `StopIteration` on missing evaluation**

If no evaluation named `"avg_accuracy"` is present (e.g. because the run evaluator raised an exception), `next(...)` without a default raises `StopIteration`, which pytest reports as a bare `StopIteration` traceback rather than a clear assertion failure. Passing `None` as the default and checking for it explicitly makes the failure obvious. The same pattern on line 443 has the same problem.
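A minimal sketch of the suggested fix, assuming each entry in `result.run_evaluations` exposes `.name` and `.value`. The `Evaluation` dataclass and both helper names are hypothetical, written only for this illustration:

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical stand-in for an item in result.run_evaluations."""
    name: str
    value: float


def find_evaluation(run_evaluations, name):
    """Return the evaluation called `name`, or None if it is missing.

    Passing None as next()'s default turns "not found" into a value we can
    check, instead of letting StopIteration escape from the test.
    """
    return next((evaluation for evaluation in run_evaluations if evaluation.name == name), None)


def assert_min_score(run_evaluations, name, threshold):
    evaluation = find_evaluation(run_evaluations, name)
    # Explicit messages tell the reader exactly which check failed and why.
    assert evaluation is not None, f"run evaluation {name!r} missing (did the evaluator fail?)"
    assert evaluation.value >= threshold, f"{name} = {evaluation.value} is below {threshold}"
```

Inside a pytest test this would be called as `assert_min_score(result.run_evaluations, "avg_accuracy", 0.8)` (threshold chosen for illustration), so a missing evaluation shows up as an ordinary assertion failure with a readable message.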

### Issue 4 of 5
content/docs/evaluation/experiments/experiments-in-ci-cd.mdx:418-420
**`eval` shadows Python built-in in loop variable**

`eval` is a Python built-in function; using it as a loop variable name (`for eval in result.run_evaluations`) will shadow it and trigger linting warnings (e.g. `W0622` in pylint, `A001` in flake8-builtins). A minor rename like `evaluation` or `ev` avoids the issue. This same pattern also appears on line 444.
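For illustration, the renamed loop looks like this. The `Evaluation` dataclass and `scores_by_name` are hypothetical stand-ins, assuming each entry in `result.run_evaluations` has `.name` and `.value`:

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical stand-in for an item in result.run_evaluations."""
    name: str
    value: float


def scores_by_name(run_evaluations):
    """Map evaluation name -> value.

    The loop variable is called `evaluation`, not `eval`, so the built-in
    eval() is left alone and linters (pylint W0622, flake8-builtins A001)
    have nothing to flag.
    """
    scores = {}
    for evaluation in run_evaluations:
        scores[evaluation.name] = evaluation.value
    return scores
```

Beyond the lint warning, a `for eval in ...` loop inside a function makes `eval` a local name for the whole function body, so any call to the built-in `eval()` in that function would fail.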

### Issue 5 of 5
content/docs/evaluation/experiments/experiments-in-ci-cd.mdx:342
**Single-quoted `echo` breaks if JSON contains a single quote**

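`echo '${{ steps.experiment.outputs.result_json }}'` wraps the GitHub expression in single quotes. Because GitHub substitutes the expression into the script text before the shell runs, any single-quote character inside a JSON string value ends the quoted string early: the command fails, produces malformed JSON, or, with untrusted input, executes injected shell code. A safer idiom is to assign the output to an env var first: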

```yaml
- name: Store experiment result
  env:
    RESULT_JSON: ${{ steps.experiment.outputs.result_json }}
  run: printf '%s\n' "$RESULT_JSON" > experiment-result.json
```
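To see why the single-quoted form is fragile, the Python sketch below uses `shlex.split` (which follows POSIX shell quoting rules) as a stand-in for the runner's shell, and `subprocess` with an environment variable to mirror the `env:` pattern. The payload is made-up data, the script simulates the shell locally rather than running a real GitHub Actions step, and it assumes a POSIX `sh` is on the PATH:

```python
import json
import os
import shlex
import subprocess

# Hypothetical experiment output: one string value contains an apostrophe.
payload = json.dumps({"item": "it's a test", "score": 0.9})

# GitHub substitutes ${{ ... }} into the script text before the shell runs,
# so the single-quoted form hands the shell this literal command line.
unsafe_command = f"echo '{payload}' > experiment-result.json"


def tokenize(command: str):
    """Split a command using POSIX shell quoting rules, or report the error."""
    try:
        return shlex.split(command)
    except ValueError as exc:
        return f"parse error: {exc}"


def emit_via_env(value: str) -> str:
    """Mirror the env: pattern: the value arrives as data, never as shell syntax."""
    completed = subprocess.run(
        ["sh", "-c", 'printf "%s" "$RESULT_JSON"'],
        env={**os.environ, "RESULT_JSON": value},
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


print(tokenize(unsafe_command))  # the apostrophe in "it's" breaks the quoting
print(json.loads(emit_via_env(payload)) == json.loads(payload))
```

The env-var route round-trips the JSON unchanged because the shell only ever expands `"$RESULT_JSON"` as a single quoted word; `printf '%s'` is used because, unlike some `echo` implementations, it never reinterprets backslashes in its argument.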

Reviews (1): Last reviewed commit: "docs: add experiment CI/CD integration g..."

Greptile also left 1 inline comment on this PR.

@vercel (bot) commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| langfuse-docs | Ready | Preview, Comment | May 6, 2026 4:27pm |


@github-actions (bot) commented May 4, 2026

@claude review

@dosubot (bot) added the size:XL and docs labels May 4, 2026
Comment thread content/docs/evaluation/experiments/experiments-in-ci-cd.mdx Outdated
Comment thread content/guides/experiments-ci-cd.mdx
Comment thread content/guides/experiments-ci-cd.mdx
Comment thread content/guides/experiments-ci-cd.mdx
Comment thread content/guides/experiments-ci-cd.mdx

@claude (bot) left a comment

Author addressed the substantive feedback from my prior review (echo quoting, TS eval parameter, Python eval loop variable) in commit e29c238 — remaining items are docs-only nits, approving.

Extended reasoning...

Overview

Docs-only PR adding a new CI/CD integration page (experiments-in-ci-cd.mdx) for Langfuse experiments, with the existing Pytest/Vitest CI examples relocated from experiments-via-sdk.mdx and replaced with cross-links there. meta.json registers the new page in the experiments nav. No application code, schemas, or auth surfaces are touched.

Security risks

None. The page is rendered as MDX docs content. The content does include CI/CD recipes — script-injection via single-quoted echo '${{ ... }}' was the one real footgun a reader might copy verbatim, and that has now been fixed in commit e29c238 to use the env: + printf pattern recommended by GitHub, so the remaining sample is safe.

Level of scrutiny

Light. This is documentation only; the worst-case impact of a defect is a confused reader rather than a broken production system. Bug-hunting found a few documentation-quality issues (TS strict-mode eval parameter, Python eval shadowing the built-in, fragile shell quoting) and the author addressed all three in e29c238. The remaining nits — next() without an explicit default in the moved Pytest snippets, and the actions/checkout@v6 / setup-python@v6 / setup-node@v6 pins — are either pre-existing patterns carried over from the source page or potentially valid given the time elapsed since my knowledge cutoff (today is 2026-05-04, cutoff January 2026).

Other factors

The PR has clear intent, is well-organized (canonicalizes CI gating docs in one place rather than duplicating across pages), and the author has been responsive to bot feedback within the same review cycle. No CODEOWNERS-protected files, no test coverage to worry about (it is the test guidance), and the cross-links between the two pages are coherent. Given the docs-only scope and the addressed feedback, this is comfortable to shadow-approve.

Comment thread content/guides/experiments-ci-cd.mdx
Comment thread content/docs/evaluation/experiments/experiments-in-ci-cd.mdx Outdated
@wochinge (Contributor, Author) commented May 4, 2026

@greptileai

@claude (bot) left a comment

Docs-only PR; all my prior substantive concerns (eval rename, shell quoting via env var, if: always() on the storage step, LangTabs label consistency) are reflected in the current diff.

Extended reasoning...

Overview

This PR adds a new content/docs/evaluation/experiments/experiments-in-ci-cd.mdx page documenting CI/CD integration for Langfuse experiments (GitHub Action workflow, RunnerContext contract, RegressionError, secrets, action inputs/outputs), moves the existing Pytest/Vitest examples out of experiments-via-sdk.mdx (replaced with a cross-link), registers the new page in meta.json, adds a changelog entry, and adds an author entry + headshot.

Security risks

None material. Modifications are confined to MDX docs, a JSON authors file, and a JPG headshot — no application code, no auth, crypto, or permissions code is touched. The most security-relevant aspect is the example shell snippet for storing result_json, which now correctly uses the env: indirection pattern recommended by GitHub's own security hardening guide rather than interpolating ${{ ... }} directly into a shell command.

Level of scrutiny

Low-to-medium. This is a documentation-only change (size:XL by line count, but all prose + examples). The page has been through two prior bot review rounds and an explicit author re-tag, with each substantive concern I raised in earlier passes addressed in the current diff. The remaining open thread (@v6 action pins) was resolved by the author and the bug hunting system did not re-flag it in this run — this is the kind of judgment call that does not benefit from another bot reiteration.

Other factors

The bug hunting system in this run reported no bugs. All my prior inline comments are marked resolved on the GitHub side, and the corresponding code in the current diff matches the suggested fixes (eval/evaluation rename in three Python comprehensions, env-var pattern on the storage step, if: always() on the storage step, uniform ["Python SDK", "JS/TS SDK"] LangTabs labels in all three locations on the new page). The remaining @v6 pins are the author's intentional choice given their current-date context, and re-flagging them would be noise per the broken-record rule.

Comment thread data/authors.json Outdated

@Lotte-Verheyden (Member) left a comment

Content-wise the page is strong, but I would refrain from adding another docs page for it here (it should not live on the same level as experiments via SDKs). What about:

  • making this a guide page
  • adding a section on the experiments via SDK page that links to the guide page for implementation details; that section could be what is now the intro of your CI/CD page

Wdyt?

Comment thread content/docs/evaluation/experiments/experiments-via-sdk.mdx
@wochinge (Contributor, Author) commented May 6, 2026

@Lotte-Verheyden Done. The main difference from your suggestion is that I kept a snippet of GH Action usage in the main docs. This is in line with how, e.g., Braintrust does it.

@wochinge wochinge requested a review from Lotte-Verheyden May 6, 2026 10:08
@dosubot (bot) added the lgtm label May 6, 2026
@wochinge wochinge enabled auto-merge May 6, 2026 16:24
@dosubot (bot) added the auto-merge label May 6, 2026
@wochinge wochinge added this pull request to the merge queue May 6, 2026
Merged via the queue into main with commit 01ab727 May 6, 2026
13 checks passed
@wochinge wochinge deleted the tobias/lfe-9366-documentation branch May 6, 2026 16:29
@dosubot (bot) removed the auto-merge label May 6, 2026

Labels

  • docs
  • lgtm: This PR has been approved by a maintainer
  • size:XL: This PR changes 500-999 lines, ignoring generated files.


2 participants