Commit 80cdc4c

enriquesanchez-elastic, claude, and elasticmachine authored
[Agent Builder] Skip esql specs in agent-builder evals run (elastic#267522)
## Summary

- Adds `testIgnore: ['**/esql/**']` to the agent-builder Playwright eval config so the recursive `testDir` walk no longer picks up `evals/esql/esql.spec.ts`.
- That spec is owned by the separate `esql-generation` suite (`esql.playwright.config.ts`), so today it runs twice every weekly LLM evals cycle: ~4 min/run × ~7 connectors = **~28 min/week of duplicate work** in the agent-builder child.
- Same recursive-`testDir` bug pattern fixed for entity-analytics v1 in elastic#267451.

Refs elastic/security-team#17139

## Stacking

Stacked on top of elastic#267451 (`evals/ea-v1-ignore-v2-specs`), which extends `createPlaywrightEvalsConfig` to forward the optional `testIgnore`. Will retarget to `main` once elastic#267451 merges.

## Test plan

- [ ] `node scripts/type_check --project x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/tsconfig.json` clean
- [ ] `node scripts/eslint x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/playwright.config.ts` clean
- [ ] Next weekly `kibana-evals-weekly-llm-evals` run: agent-builder Playwright report no longer lists `evals/esql/esql.spec.ts`, p50 drops by ~4 min vs `build_13` baseline
- [ ] `esql-generation` suite continues to pass on its own config

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
1 parent 587b97c commit 80cdc4c

2 files changed

Lines changed: 8 additions & 0 deletions

x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/playwright.config.ts

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,11 @@ import { createPlaywrightEvalsConfig } from '@kbn/evals';
 
 export default createPlaywrightEvalsConfig({
   testDir: Path.resolve(__dirname, './evals'),
+  // The `esql-generation` suite (`esql.playwright.config.ts`) owns specs
+  // under `evals/esql/`. Excluding them here prevents the recursive
+  // `testDir` walk from picking them up and double-running them in the
+  // agent-builder weekly cycle.
+  testIgnore: ['**/esql/**'],
   // CI job timeout is ~1h; keep default low and use EVALUATION_REPETITIONS
   // for longer/higher-confidence runs.
   repetitions: 1,
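
To illustrate the effect of the ignore pattern above, here is a minimal sketch of the filtering it is meant to achieve. The helper `isUnderEsqlDir` and two of the three sample paths are hypothetical, and the check only approximates `**/esql/**` by looking for an `esql` directory segment; it is not Playwright's own glob matcher.

```typescript
// Hypothetical helper: approximates a `**/esql/**` ignore pattern by checking
// whether any directory segment of the path is exactly `esql`.
// This is a sketch, not Playwright's matching implementation.
function isUnderEsqlDir(filePath: string): boolean {
  const segments = filePath.split(/[\\/]+/).filter((s) => s.length > 0);
  // Every segment except the last one is a directory name.
  const dirs = segments.slice(0, -1);
  return dirs.indexOf('esql') !== -1;
}

// Sample paths as a recursive walk of `./evals` might find them.
// Only the first path appears in the commit message; the others are made up.
const discovered: string[] = [
  'evals/esql/esql.spec.ts',
  'evals/chat/chat.spec.ts',
  'evals/esql_helpers.spec.ts',
];

// Keep only the specs that are not under an `esql` directory.
const kept: string[] = discovered.filter((p) => !isUnderEsqlDir(p));
console.log(kept);
```

Note that a file whose name merely contains `esql` is kept in this sketch; only files below an `esql` directory are filtered out.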

x-pack/platform/packages/shared/kbn-evals/src/config/create_playwright_eval_config.ts

Lines changed: 3 additions & 0 deletions
@@ -24,11 +24,13 @@ export interface EvaluationTestOptions extends ScoutTestOptions {
  */
 export function createPlaywrightEvalsConfig({
   testDir,
+  testIgnore,
   repetitions,
   timeout,
   runGlobalSetup,
 }: {
   testDir: string;
+  testIgnore?: PlaywrightTestConfig['testIgnore'];
   repetitions?: number;
   timeout?: number;
   runGlobalSetup?: boolean;
@@ -106,5 +108,6 @@ export function createPlaywrightEvalsConfig({
     globalSetup: require.resolve('./setup.js'),
     globalTeardown: require.resolve('./teardown.js'),
     timeout: timeout ?? 5 * 60_000,
+    ...(testIgnore !== undefined ? { testIgnore } : {}),
   });
 }
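
The conditional spread in the last hunk can be sketched in isolation as follows. `IgnorePattern`, `SketchConfig`, and `buildSketchConfig` are hypothetical stand-ins for the real Playwright and `@kbn/evals` types, and the motivation shown (omitting the key entirely rather than passing an explicit `undefined`) is an assumption about intent, not something the commit states.

```typescript
// Hypothetical stand-in for PlaywrightTestConfig['testIgnore'].
type IgnorePattern = string | RegExp | Array<string | RegExp>;

// Hypothetical, simplified config shape used only for this sketch.
interface SketchConfig {
  testDir: string;
  timeout: number;
  testIgnore?: IgnorePattern;
}

function buildSketchConfig(opts: {
  testDir: string;
  timeout?: number;
  testIgnore?: IgnorePattern;
}): SketchConfig {
  return {
    testDir: opts.testDir,
    timeout: opts.timeout ?? 5 * 60_000,
    // Add the `testIgnore` key only when the caller supplied a value, so the
    // resulting object never carries an explicit `testIgnore: undefined`.
    ...(opts.testIgnore !== undefined ? { testIgnore: opts.testIgnore } : {}),
  };
}

const withoutIgnore = buildSketchConfig({ testDir: './evals' });
const withIgnore = buildSketchConfig({
  testDir: './evals',
  testIgnore: ['**/esql/**'],
});

console.log('testIgnore' in withoutIgnore); // false
console.log('testIgnore' in withIgnore); // true
```

Under this assumption, existing callers that do not pass `testIgnore` get the same object shape as before the change.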
