docs: add experiment CI/CD integration guide#2888
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
@claude review |
There was a problem hiding this comment.
Author addressed the substantive feedback from my prior review (echo quoting, TS eval parameter, Python eval loop variable) in commit e29c238 — remaining items are docs-only nits, approving.
Extended reasoning...
Overview
Docs-only PR adding a new CI/CD integration page (experiments-in-ci-cd.mdx) for Langfuse experiments, with the existing Pytest/Vitest CI examples relocated from experiments-via-sdk.mdx and replaced with cross-links there. meta.json registers the new page in the experiments nav. No application code, schemas, or auth surfaces are touched.
Security risks
None. The page is rendered as MDX docs content. The content does include CI/CD recipes — script-injection via single-quoted echo '${{ ... }}' was the one real footgun a reader might copy verbatim, and that has now been fixed in commit e29c238 to use the env: + printf pattern recommended by GitHub, so the remaining sample is safe.
Level of scrutiny
Light. This is documentation only; the worst-case impact of a defect is a confused reader rather than a broken production system. Bug-hunting found a few documentation-quality issues (TS strict-mode eval parameter, Python eval shadowing the built-in, fragile shell quoting) and the author addressed all three in e29c238. The remaining nits — next() without an explicit default in the moved Pytest snippets, and the actions/checkout@v6 / setup-python@v6 / setup-node@v6 pins — are either pre-existing patterns carried over from the source page or potentially valid given the time elapsed since my knowledge cutoff (today is 2026-05-04, cutoff January 2026).
Other factors
The PR has clear intent, is well-organized (canonicalizes CI gating docs in one place rather than duplicating across pages), and the author has been responsive to bot feedback within the same review cycle. No CODEOWNERS-protected files, no test coverage to worry about (it is the test guidance), and the cross-links between the two pages are coherent. Given the docs-only scope and the addressed feedback, this is comfortable to shadow-approve.
There was a problem hiding this comment.
Docs-only PR; all my prior substantive concerns (eval rename, shell quoting via env var, if: always() on the storage step, LangTabs label consistency) are reflected in the current diff.
Extended reasoning...
Overview
This PR adds a new content/docs/evaluation/experiments/experiments-in-ci-cd.mdx page documenting CI/CD integration for Langfuse experiments (GitHub Action workflow, RunnerContext contract, RegressionError, secrets, action inputs/outputs), moves the existing Pytest/Vitest examples out of experiments-via-sdk.mdx (replaced with a cross-link), registers the new page in meta.json, adds a changelog entry, and adds an author entry + headshot.
Security risks
None material. Modifications are confined to MDX docs, a JSON authors file, and a JPG headshot — no application code, no auth, crypto, or permissions code is touched. The most security-relevant aspect is the example shell snippet for storing result_json, which now correctly uses the env: indirection pattern recommended by GitHub's own security hardening guide rather than interpolating ${{ ... }} directly into a shell command.
Level of scrutiny
Low-to-medium. This is a documentation-only change (size:XL by line count, but all prose + examples). The page has been through two prior bot review rounds and an explicit author re-tag, with each substantive concern I raised in earlier passes addressed in the current diff. The remaining open thread (@v6 action pins) was resolved by the author and the bug hunting system did not re-flag it in this run — this is the kind of judgment call that does not benefit from another bot reiteration.
Other factors
The bug hunting system in this run reported no bugs. All my prior inline comments are marked resolved on the GitHub side, and the corresponding code in the current diff matches the suggested fixes (eval/evaluation rename in three Python comprehensions, env-var pattern on the storage step, if: always() on the storage step, uniform ["Python SDK", "JS/TS SDK"] LangTabs labels in all three locations on the new page). The remaining @v6 pins are the author's intentional choice given their current-date context, and re-flagging them would be noise per the broken-record rule.
Lotte-Verheyden
left a comment
There was a problem hiding this comment.
Content-wise the page is strong, but I would try to refrain from adding another docs page for it here (it should not live on the same level as experiments via SDKs). What about
- making this a guide page
- having a section on the experiments via SDK page that then links to the guide page for details on how to implement it. The section on the experiments via SDK page can be what's now the intro of your CI/CD page
Wdyt?
|
@Lotte-Verheyden Done - main difference to your suggestion is that I kept a snippet of GH action usage in the main docs. This is in line how e.g. braintrust does it |
Summary
RunnerContext, regression failures, and non-GitHub CI patterns.experiment(context: RunnerContext)contract so GitHub Action examples usecontext.runExperiment/context.run_experimentwith action-injected dataset, SDK client, and metadata defaults.Linear
Major Decisions
RunnerContextflow, while keeping direct SDK-based examples for other CI/CD systems.Disclaimer: Experimental PR review
Greptile Summary
This PR adds a dedicated CI/CD integration guide for Langfuse experiments, covering the
langfuse/experiment-actionGitHub Actions workflow,RunnerContextcontract, regression gating viaRegressionError, and equivalent Pytest/Vitest patterns for other CI systems. The existing CI examples are moved from the SDK page to the new page with clean cross-links.evalas an arrow-function parameter name (lines 534 and 557), which is a reserved identifier in TypeScript strict mode and will cause a compilation error for anyone copying the snippet.evalvia loop variables, andnext()without a default can raiseStopIterationsilently instead of a clear test failure.Confidence Score: 3/5
Not safe to merge as-is — the TypeScript code examples contain a strict-mode syntax error that will fail compilation for any reader who copies them.
Two P1 findings (using
evalas a TypeScript parameter name — a hard syntax error in strict mode) pull the score below the P1 ceiling of 4. The remaining findings are P2 style/robustness issues.content/docs/evaluation/experiments/experiments-in-ci-cd.mdx — Vitest TypeScript example (lines 534, 557) and Pytest Python example (lines 418–420, 443–445).
Important Files Changed
evalas a parameter name, which is a syntax error in strict mode; Python examples shadow the built-ineval;next()without a default can surface a confusing StopIteration; echo quoting in the action output YAML snippet is fragile.experiments-in-ci-cdpage in the experiments navigation — trivial and correct.Sequence Diagram
sequenceDiagram participant PR as Pull Request participant GHA as GitHub Actions participant Action as langfuse/experiment-action participant Script as experiment(context) participant LF as Langfuse API PR->>GHA: push / pull_request event GHA->>Action: run with inputs (dataset_name, keys, metadata) Action->>LF: fetch dataset items (dataset_name + dataset_version) Action->>Script: call experiment(RunnerContext) Script->>LF: run_experiment / runExperiment (task + evaluators) LF-->>Script: ExperimentResult (run_evaluations) Script-->>Action: return result (or raise RegressionError) Action->>GHA: set outputs (result_json, failed) Action->>PR: post/update PR comment (pass/regression/error + scores)Prompt To Fix All With AI
Reviews (1): Last reviewed commit: "docs: add experiment CI/CD integration g..." | Re-trigger Greptile