From 6c3621fb7be4713d1e0310209f8bafab0ce0f446 Mon Sep 17 00:00:00 2001 From: Cesare de Cal Date: Fri, 29 May 2026 10:10:25 +0200 Subject: [PATCH 1/6] drop hardcoded artifact paths --- .../skills/flaky-test-investigator/SKILL.md | 24 ++----------------- .github/CODEOWNERS | 1 + .github/workflows/failed-test-investigator.md | 17 +++---------- 3 files changed, 6 insertions(+), 36 deletions(-) diff --git a/.agents/skills/flaky-test-investigator/SKILL.md b/.agents/skills/flaky-test-investigator/SKILL.md index 9e7395951d22b..501febdb377e1 100644 --- a/.agents/skills/flaky-test-investigator/SKILL.md +++ b/.agents/skills/flaky-test-investigator/SKILL.md @@ -42,8 +42,6 @@ For every failure, try to retrieve: - **Server logs** (`kibana.log`, `elasticsearch.log` when present). Cross-reference the failure timestamp with any errors in the logs — a server-side 500 or unexpected warning is strong evidence the failure is a product bug, not a test bug. - **Full session trace** when the framework supports it (Scout / Playwright). Lets you scrub through every step, locator query, network call, and DOM snapshot. -How to actually find and download each artifact type is framework-specific — see "Retrieve failure artifacts" below. - Things to specifically check in the artifacts before forming a root-cause hypothesis: - **Did the expected element render at all?** If yes and the selector missed it → flaky selector (Tier 2 fix territory). If no → real rendering / race / data issue (Tier 1 territory). @@ -53,27 +51,9 @@ Things to specifically check in the artifacts before forming a root-cause hypoth If artifacts are not available (expired, not uploaded, no `read_artifacts` token), say so in the report rather than fabricating a hypothesis. "Screenshot would have resolved this; not available" is a valid open question. -### Retrieve failure artifacts - -The standard recipe is **list → filter by path → download by ID**, always scoped to the failed job's UUID. 
Two Buildkite gotchas to know about first: - -- **Failed-attempt jobs are hidden by default.** `/builds/` returns only the latest attempt; append `?include_retried_jobs=true` to find the original failing job (the one cited in `failed-test` comments). `retried` and `retried_in_job_id` link the two. -- **Per-job artifacts use a different endpoint than build-wide artifacts.** If a build retried to green, failure artifacts only live on the failed job's listing (`bk artifacts list -p --job-uuid `). Don't conclude "no screenshot uploaded" until you've checked there. - -**Scout** (`@kbn/scout-reporting`, not standard Playwright output — `playwright-report/`, `trace.zip`, and video are NOT published): - -- `.scout/reports/scout-playwright-test-failures-/test-failures-summary.json` — maps test name → HTML report. Start here. -- `.scout/reports/scout-playwright-test-failures-/.html` — self-contained: error, stdout, embedded screenshot. Usually sufficient on its own. -- `.scout/reports/scout-playwright-test-failures-/scout-failures-.ndjson` — one record per failure (`id` = ``, `owner`, `location`, `error.*`) for programmatic use. -- `**/.scout/test-artifacts//test-failed-.png` — plain Playwright screenshot; the PNG doesn't carry ``, so correlate via spec path. - -**FTR** (a single content `` links every artifact for one failure): - -- `target/test_failures/_.{json,log,html}` — `.json` is source of truth; full Kibana/ES stdout lives in `system-out` (there is no separate `kibana.log`). Pull this first. -- `/screenshots/failure/*-.png` and `/failure_debug/html/*-.html` — UI tests only; fetch only when the failure is UI-side. -- `.es/*.log` — transport/cluster-shaped failures. +### List failure artifacts -`target/test_failures/` is shared with Scout; filter by `.jobName` (e.g. `FTR Configs #90` vs `Scout Lane #12`) to keep only FTR. 
On Cloud FTR pipelines the layout differs: one self-contained HTML per failure at `-/html/.html` — no `target/test_failures/`, screenshot, or DOM artifacts. +`bk artifacts list -p --job-uuid ` returns a JSON listing of every artifact uploaded for the failing job. Pass `--job-uuid ` for the failed attempt (without it, `bk` only returns the latest attempt and hides retried failures). If a build retried to green, failure artifacts only live on the failed job's listing; don't conclude "no screenshot" until you've scoped to the right job UUID. ### Understand the scope diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index b166907cd90f5..cb55c7c162927 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -3539,6 +3539,7 @@ x-pack/solutions/observability/plugins/synthetics/server/saved_objects/synthetic /src/cli/ @elastic/kibana-operations /src/cli_keystore/ @elastic/kibana-operations /.github/workflows/ @elastic/kibana-operations +/.github/workflows/failed-test-investigator.md @elastic/kibana-operations @elastic/appex-qa /.github/aw/ @elastic/kibana-operations /.buildkite/ @elastic/kibana-operations /moon.yml @elastic/kibana-operations diff --git a/.github/workflows/failed-test-investigator.md b/.github/workflows/failed-test-investigator.md index 131a144815d36..aa3e582f16377 100644 --- a/.github/workflows/failed-test-investigator.md +++ b/.github/workflows/failed-test-investigator.md @@ -143,28 +143,17 @@ Post exactly one comment. Keep the visible portion very short and easy to read: 1. **One-line bold headline** stating the result kind and one identifying detail. 2. **Diagnosis** (≤5 concise bullet points): what broke and where, the most likely root cause. -3. **Next steps** (≤5 concise bullet points). +3. **Recommended next steps** (≤5 concise bullet points). -Put the full `flaky-test-investigator` skill output inside a collapsed `
<details><summary>Investigation details</summary> ... </details>` block (not in the visible portion). Open the block with a `#### Findings` subsection containing exactly these four bullets in this order; downstream tooling parses them, so preserve keys, casing, and `` - `key`: value `` shape. These bullets must live **inside `<details>
`**, never in the visible portion: - -- `classification`: `test-design` | `test-environment` | `application` | `external` | `inconclusive` -- `confidence`: `high` | `medium` | `low` -- `test.type`: `scout` (if `scout-playwright` label) | `ftr` | `jest` | `unknown` -- `test.file`: repo-relative path, or `unknown` +Put the full `flaky-test-investigator` skill output inside a collapsed `
<details><summary>Investigation details</summary> ... </details>` block (not in the visible portion). The skill's "Reporting" subsections should also be inside the collapsible section: - What the test does -- What failed and when - Where it ran - Root cause hypothesis - Evidence -- Failure screenshot -- Recommended next step +- Failure screenshot (omit this section if not available) - Open questions Blank lines around `<details>` and `</details>
` are required for the inner markdown to render. - -End the comment with this footer line (verbatim, on its own line after the `` block): - -`AI-generated, share feedback in [#appex-qa](https://elastic.slack.com/archives/C04HT4P1YS3)` From ff7adaa94bbbc6d55ddb6fbb335672080ad89c7f Mon Sep 17 00:00:00 2001 From: Cesare de Cal Date: Fri, 29 May 2026 14:47:18 +0200 Subject: [PATCH 2/6] consume `OPS_BUILDKITE_TOKEN` within workflow --- .../failed-test-investigator.lock.yml | 53 +++++++++++++------ .github/workflows/failed-test-investigator.md | 22 ++++++++ 2 files changed, 60 insertions(+), 15 deletions(-) diff --git a/.github/workflows/failed-test-investigator.lock.yml b/.github/workflows/failed-test-investigator.lock.yml index 4b9626f56e25b..a925531603b2a 100644 --- a/.github/workflows/failed-test-investigator.lock.yml +++ b/.github/workflows/failed-test-investigator.lock.yml @@ -1,5 +1,5 @@ -# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"7a6bfdcfebf53707c7f0d00bdf22f6dbbc733233b2fe1622d39b8af02417c824","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} -# gh-aw-manifest: 
{"version":1,"secrets":["GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GITHUB_TOKEN","LITELLM_API_KEY"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"373c709c69115d41ff229c7e5df9f8788daa9553","version":"v9"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"},{"repo":"github/gh-aw-actions/setup","sha":"d3abfe96a194bce3a523ed2093ddedd5704cdf62","version":"v0.74.4"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.46"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.9","digest":"sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388","pinned_image":"ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:lts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]} +# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"c1d254ccb5dbe15c323956116a381168a55d5f29e1407886c74026ab8a51bb8a","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} +# gh-aw-manifest: 
{"version":1,"secrets":["GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GITHUB_TOKEN","LITELLM_API_KEY","OPS_BUILDKITE_TOKEN"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"373c709c69115d41ff229c7e5df9f8788daa9553","version":"v9"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"},{"repo":"github/gh-aw-actions/setup","sha":"d3abfe96a194bce3a523ed2093ddedd5704cdf62","version":"v0.74.4"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.46"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.9","digest":"sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388","pinned_image":"ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:lts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]} # ___ _ _ # / _ \ | | (_) # | |_| | __ _ ___ _ __ | |_ _ ___ @@ -32,6 +32,7 @@ # - GH_AW_GITHUB_TOKEN # - GITHUB_TOKEN # - LITELLM_API_KEY +# - OPS_BUILDKITE_TOKEN # # Custom actions used: # - actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 @@ -221,20 +222,20 @@ jobs: run: | bash "${RUNNER_TEMP}/gh-aw/actions/create_prompt_first.sh" { - cat << 'GH_AW_PROMPT_5e3c69e2776b57d9_EOF' + cat << 
'GH_AW_PROMPT_f01985087bd37644_EOF' - GH_AW_PROMPT_5e3c69e2776b57d9_EOF + GH_AW_PROMPT_f01985087bd37644_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/xpia.md" cat "${RUNNER_TEMP}/gh-aw/prompts/temp_folder_prompt.md" cat "${RUNNER_TEMP}/gh-aw/prompts/markdown.md" cat "${RUNNER_TEMP}/gh-aw/prompts/safe_outputs_prompt.md" - cat << 'GH_AW_PROMPT_5e3c69e2776b57d9_EOF' + cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' Tools: add_comment, add_labels, missing_tool, missing_data, noop - GH_AW_PROMPT_5e3c69e2776b57d9_EOF + GH_AW_PROMPT_f01985087bd37644_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/mcp_cli_tools_prompt.md" - cat << 'GH_AW_PROMPT_5e3c69e2776b57d9_EOF' + cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' The following GitHub context information is available for this workflow: {{#if github.actor}} @@ -263,12 +264,12 @@ jobs: {{/if}} - GH_AW_PROMPT_5e3c69e2776b57d9_EOF + GH_AW_PROMPT_f01985087bd37644_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/github_mcp_tools_with_safeoutputs_prompt.md" - cat << 'GH_AW_PROMPT_5e3c69e2776b57d9_EOF' + cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' {{#runtime-import .github/workflows/failed-test-investigator.md}} - GH_AW_PROMPT_5e3c69e2776b57d9_EOF + GH_AW_PROMPT_f01985087bd37644_EOF } > "$GH_AW_PROMPT" - name: Interpolate variables and render templates uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 @@ -412,6 +413,27 @@ jobs: run: bash "${RUNNER_TEMP}/gh-aw/actions/configure_gh_for_ghe.sh" env: GH_TOKEN: ${{ github.token }} + - env: + BK_SHA256: 88867c0b983ad2afe1efc26f0df6b46b5673577c1aea95eba76992636fb9abe9 + BK_VERSION: 3.44.0 + OPS_BUILDKITE_TOKEN: ${{ secrets.OPS_BUILDKITE_TOKEN }} + name: Install Buildkite CLI and export BUILDKITE_API_TOKEN + run: |- + set -euo pipefail + tmp="$(mktemp -d)" + url="https://github.com/buildkite/cli/releases/download/v${BK_VERSION}/bk_${BK_VERSION}_linux_amd64.tar.gz" + curl -fsSL --retry 3 --retry-delay 2 "${url}" -o "${tmp}/bk.tgz" + echo "${BK_SHA256} ${tmp}/bk.tgz" | sha256sum -c - + tar -xzf 
"${tmp}/bk.tgz" -C "${tmp}" bk + install -d "${RUNNER_TEMP}/gh-aw/mcp-cli/bin" + install -m 0755 "${tmp}/bk" "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" + "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" --version + if [ -z "${OPS_BUILDKITE_TOKEN:-}" ]; then + echo "::error::OPS_BUILDKITE_TOKEN secret is not set" >&2 + exit 1 + fi + echo "BUILDKITE_API_TOKEN=${OPS_BUILDKITE_TOKEN}" >> "${GITHUB_ENV}" + - name: Configure Git credentials env: REPO_NAME: ${{ github.repository }} @@ -481,9 +503,9 @@ jobs: mkdir -p "${RUNNER_TEMP}/gh-aw/safeoutputs" mkdir -p /tmp/gh-aw/safeoutputs mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs - cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_378c4ec1b2b08951_EOF' + cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_2efcc88f2ce5920c_EOF' {"add_comment":{"hide_older_comments":true,"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"add_labels":{"allowed":["ai:auto-flaky-fix"],"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"false"},"report_incomplete":{}} - GH_AW_SAFE_OUTPUTS_CONFIG_378c4ec1b2b08951_EOF + GH_AW_SAFE_OUTPUTS_CONFIG_2efcc88f2ce5920c_EOF - name: Generate Safe Outputs Tools env: GH_AW_TOOLS_META_JSON: | @@ -696,7 +718,7 @@ jobs: export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host --add-host host.docker.internal:127.0.0.1 --user '"${MCP_GATEWAY_UID}"':'"${MCP_GATEWAY_GID}"' --group-add '"${DOCKER_SOCK_GID}"' -v '"${DOCKER_SOCK_PATH}"':/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DOCKER_HOST=unix:///var/run/docker.sock -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e 
GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:v0.3.9' GH_AW_NODE=$(which node 2>/dev/null || command -v node 2>/dev/null || echo node) - cat << GH_AW_MCP_CONFIG_44518aa00d902aad_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" + cat << GH_AW_MCP_CONFIG_2e97112b03cd1882_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" { "mcpServers": { "github": { @@ -736,7 +758,7 @@ jobs: "payloadDir": "${MCP_GATEWAY_PAYLOAD_DIR}" } } - GH_AW_MCP_CONFIG_44518aa00d902aad_EOF + GH_AW_MCP_CONFIG_2e97112b03cd1882_EOF - name: Mount MCP servers as CLIs id: mount-mcp-clis continue-on-error: true @@ -910,11 +932,12 @@ jobs: const { main } = require('${{ runner.temp }}/gh-aw/actions/redact_secrets.cjs'); await main(); env: - GH_AW_SECRET_NAMES: 'GH_AW_GITHUB_MCP_SERVER_TOKEN,GH_AW_GITHUB_TOKEN,GITHUB_TOKEN,LITELLM_API_KEY' + GH_AW_SECRET_NAMES: 'GH_AW_GITHUB_MCP_SERVER_TOKEN,GH_AW_GITHUB_TOKEN,GITHUB_TOKEN,LITELLM_API_KEY,OPS_BUILDKITE_TOKEN' SECRET_GH_AW_GITHUB_MCP_SERVER_TOKEN: ${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }} SECRET_GH_AW_GITHUB_TOKEN: ${{ secrets.GH_AW_GITHUB_TOKEN }} SECRET_GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} SECRET_LITELLM_API_KEY: 
${{ secrets.LITELLM_API_KEY }} + SECRET_OPS_BUILDKITE_TOKEN: ${{ secrets.OPS_BUILDKITE_TOKEN }} - name: Append agent step summary if: always() run: bash "${RUNNER_TEMP}/gh-aw/actions/append_agent_step_summary.sh" diff --git a/.github/workflows/failed-test-investigator.md b/.github/workflows/failed-test-investigator.md index aa3e582f16377..6e44d54d7cc6a 100644 --- a/.github/workflows/failed-test-investigator.md +++ b/.github/workflows/failed-test-investigator.md @@ -60,6 +60,28 @@ network: - elastic.litellm-prod.ai sandbox: agent: awf # Migrated from deprecated network setting +steps: + - name: Install Buildkite CLI and export BUILDKITE_API_TOKEN + env: + BK_VERSION: 3.44.0 + BK_SHA256: 88867c0b983ad2afe1efc26f0df6b46b5673577c1aea95eba76992636fb9abe9 + OPS_BUILDKITE_TOKEN: ${{ secrets.OPS_BUILDKITE_TOKEN }} + run: | + set -euo pipefail + tmp="$(mktemp -d)" + url="https://github.com/buildkite/cli/releases/download/v${BK_VERSION}/bk_${BK_VERSION}_linux_amd64.tar.gz" + curl -fsSL --retry 3 --retry-delay 2 "${url}" -o "${tmp}/bk.tgz" + echo "${BK_SHA256} ${tmp}/bk.tgz" | sha256sum -c - + tar -xzf "${tmp}/bk.tgz" -C "${tmp}" bk + install -d "${RUNNER_TEMP}/gh-aw/mcp-cli/bin" + install -m 0755 "${tmp}/bk" "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" + "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" --version + if [ -z "${OPS_BUILDKITE_TOKEN:-}" ]; then + echo "::error::OPS_BUILDKITE_TOKEN secret is not set" >&2 + exit 1 + fi + echo "BUILDKITE_API_TOKEN=${OPS_BUILDKITE_TOKEN}" >> "${GITHUB_ENV}" + safe-outputs: noop: report-as-issue: false From f02033e1c1e602cf79c4ab06c76f17f532a92fef Mon Sep 17 00:00:00 2001 From: Cesare de Cal Date: Fri, 29 May 2026 15:18:27 +0200 Subject: [PATCH 3/6] simplify investigation comment format --- .../failed-test-investigator.lock.yml | 28 +++++------ .github/workflows/failed-test-investigator.md | 49 +++++++++++++------ 2 files changed, 49 insertions(+), 28 deletions(-) diff --git a/.github/workflows/failed-test-investigator.lock.yml 
b/.github/workflows/failed-test-investigator.lock.yml index a925531603b2a..6943328432843 100644 --- a/.github/workflows/failed-test-investigator.lock.yml +++ b/.github/workflows/failed-test-investigator.lock.yml @@ -1,4 +1,4 @@ -# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"c1d254ccb5dbe15c323956116a381168a55d5f29e1407886c74026ab8a51bb8a","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} +# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"64d07d9d19741d16650989366d5651ab1a464189997d3bed54375d3c4a7d127d","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} # gh-aw-manifest: {"version":1,"secrets":["GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GITHUB_TOKEN","LITELLM_API_KEY","OPS_BUILDKITE_TOKEN"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"373c709c69115d41ff229c7e5df9f8788daa9553","version":"v9"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"},{"repo":"github/gh-aw-actions/setup","sha":"d3abfe96a194bce3a523ed2093ddedd5704cdf62","version":"v0.74.4"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.46"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.9","digest":"sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388","pinned_image":"ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:l
ts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]} # ___ _ _ # / _ \ | | (_) @@ -222,20 +222,20 @@ jobs: run: | bash "${RUNNER_TEMP}/gh-aw/actions/create_prompt_first.sh" { - cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' + cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' - GH_AW_PROMPT_f01985087bd37644_EOF + GH_AW_PROMPT_7e621c923ef88c74_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/xpia.md" cat "${RUNNER_TEMP}/gh-aw/prompts/temp_folder_prompt.md" cat "${RUNNER_TEMP}/gh-aw/prompts/markdown.md" cat "${RUNNER_TEMP}/gh-aw/prompts/safe_outputs_prompt.md" - cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' + cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' Tools: add_comment, add_labels, missing_tool, missing_data, noop - GH_AW_PROMPT_f01985087bd37644_EOF + GH_AW_PROMPT_7e621c923ef88c74_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/mcp_cli_tools_prompt.md" - cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' + cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' The following GitHub context information is available for this workflow: {{#if github.actor}} @@ -264,12 +264,12 @@ jobs: {{/if}} - GH_AW_PROMPT_f01985087bd37644_EOF + GH_AW_PROMPT_7e621c923ef88c74_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/github_mcp_tools_with_safeoutputs_prompt.md" - cat << 'GH_AW_PROMPT_f01985087bd37644_EOF' + cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' {{#runtime-import .github/workflows/failed-test-investigator.md}} - GH_AW_PROMPT_f01985087bd37644_EOF + GH_AW_PROMPT_7e621c923ef88c74_EOF } > "$GH_AW_PROMPT" - name: Interpolate variables and render templates uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 @@ -424,7 +424,7 @@ jobs: url="https://github.com/buildkite/cli/releases/download/v${BK_VERSION}/bk_${BK_VERSION}_linux_amd64.tar.gz" curl -fsSL --retry 3 --retry-delay 2 "${url}" -o "${tmp}/bk.tgz" echo "${BK_SHA256} ${tmp}/bk.tgz" | sha256sum -c - - tar -xzf 
"${tmp}/bk.tgz" -C "${tmp}" bk + tar -xzf "${tmp}/bk.tgz" -C "${tmp}" --strip-components=1 "bk_${BK_VERSION}_linux_amd64/bk" install -d "${RUNNER_TEMP}/gh-aw/mcp-cli/bin" install -m 0755 "${tmp}/bk" "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" --version @@ -503,9 +503,9 @@ jobs: mkdir -p "${RUNNER_TEMP}/gh-aw/safeoutputs" mkdir -p /tmp/gh-aw/safeoutputs mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs - cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_2efcc88f2ce5920c_EOF' + cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_7beb0cf3227a93f5_EOF' {"add_comment":{"hide_older_comments":true,"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"add_labels":{"allowed":["ai:auto-flaky-fix"],"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"false"},"report_incomplete":{}} - GH_AW_SAFE_OUTPUTS_CONFIG_2efcc88f2ce5920c_EOF + GH_AW_SAFE_OUTPUTS_CONFIG_7beb0cf3227a93f5_EOF - name: Generate Safe Outputs Tools env: GH_AW_TOOLS_META_JSON: | @@ -718,7 +718,7 @@ jobs: export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host --add-host host.docker.internal:127.0.0.1 --user '"${MCP_GATEWAY_UID}"':'"${MCP_GATEWAY_GID}"' --group-add '"${DOCKER_SOCK_GID}"' -v '"${DOCKER_SOCK_PATH}"':/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DOCKER_HOST=unix:///var/run/docker.sock -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e 
GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:v0.3.9' GH_AW_NODE=$(which node 2>/dev/null || command -v node 2>/dev/null || echo node) - cat << GH_AW_MCP_CONFIG_2e97112b03cd1882_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" + cat << GH_AW_MCP_CONFIG_fba672d45aa69ac1_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" { "mcpServers": { "github": { @@ -758,7 +758,7 @@ jobs: "payloadDir": "${MCP_GATEWAY_PAYLOAD_DIR}" } } - GH_AW_MCP_CONFIG_2e97112b03cd1882_EOF + GH_AW_MCP_CONFIG_fba672d45aa69ac1_EOF - name: Mount MCP servers as CLIs id: mount-mcp-clis continue-on-error: true diff --git a/.github/workflows/failed-test-investigator.md b/.github/workflows/failed-test-investigator.md index 6e44d54d7cc6a..46abc3e1b2e53 100644 --- a/.github/workflows/failed-test-investigator.md +++ b/.github/workflows/failed-test-investigator.md @@ -72,7 +72,7 @@ steps: url="https://github.com/buildkite/cli/releases/download/v${BK_VERSION}/bk_${BK_VERSION}_linux_amd64.tar.gz" curl -fsSL --retry 3 --retry-delay 2 "${url}" -o "${tmp}/bk.tgz" echo "${BK_SHA256} ${tmp}/bk.tgz" | sha256sum -c - - tar -xzf "${tmp}/bk.tgz" -C "${tmp}" bk + tar -xzf "${tmp}/bk.tgz" -C "${tmp}" --strip-components=1 "bk_${BK_VERSION}_linux_amd64/bk" install -d "${RUNNER_TEMP}/gh-aw/mcp-cli/bin" install -m 0755 "${tmp}/bk" "${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" 
"${RUNNER_TEMP}/gh-aw/mcp-cli/bin/bk" --version @@ -161,21 +161,42 @@ No other side-effects beyond posting the comment and updating the label. ## Comment format -Post exactly one comment. Keep the visible portion very short and easy to read: +Post exactly one comment on the issue. Keep it concise, actionable, and prioritize the most critical findings at the very top. Adapt the sections below to best fit the specific failure. -1. **One-line bold headline** stating the result kind and one identifying detail. -2. **Diagnosis** (≤5 concise bullet points): what broke and where, the most likely root cause. -3. **Recommended next steps** (≤5 concise bullet points). +Do not create standalone sections for "what the test does" "evidence," "where the test ran," or "failure screenshot". Integrate these details seamlessly into the sections below if it adds value. -Put the full `flaky-test-investigator` skill output inside a collapsed `
<details><summary>Investigation details</summary> ... </details>
` block (not in the visible portion). +### 1. The TL;DR (required) -The skill's "Reporting" subsections should also be inside the collapsible section: +Start with a clear heading, essential metadata, and a brief summary of the failure, followed by a horizontal rule. -- What the test does -- Where it ran -- Root cause hypothesis -- Evidence -- Failure screenshot (omit this section if not available) -- Open questions +``` +## {Classification}: {One-line description of what broke} -Blank lines around `<details>` and `</details>` are required for the inner markdown to render. +## **Classification:** {type} | **Confidence:** {level} | **Introduced by:** {commit/PR if known} + +**Summary:** One or two sentences explaining the exact failure point. +``` + +### 2. Proposed fix (required) + +Provide the most direct path to resolution immediately after the summary. + +- **Single file:** lead directly with the suggested code diff or specific action. +- **Multiple files:** use a brief table to list affected files, followed by the necessary changes. +- **No concrete fix:** clearly state what additional evidence or investigation is needed to propose one. + +### 3. Root Cause & Evidence (required) + +Explain _why_ the failure occurred, citing specific evidence. Choose the format that best fits the complexity of the bug: + +- Use concise paragraphs with inline Markdown links pointing to specific code lines, commits, or files. +- Use an ASCII timeline diagram for race conditions, multi-component bugs, or complex state leaks. +- Fold relevant evidence (like missing `data-test-subj` attributes, failing network calls, or screenshot descriptions) directly into this narrative. + +### 4. Additional context (optional) + +Include the following only if they provide high-value, actionable signal: + +- **Ruled out:** a brief note on alternative hypotheses that were investigated and dismissed. +- **Verification:** specific steps to reproduce the failure or confirm the fix. 
+- **Open questions:** unresolved design or environmental issues blocking a definitive fix. From fa332d2b4357abdb759fc8539f695ef31d3bf9fa Mon Sep 17 00:00:00 2001 From: Cesare de Cal Date: Fri, 29 May 2026 17:27:04 +0200 Subject: [PATCH 4/6] whitelist `buildkiteartifacts.com`, update comment format --- .../failed-test-investigator.lock.yml | 38 ++++++++++--------- .github/workflows/failed-test-investigator.md | 11 ++++-- 2 files changed, 27 insertions(+), 22 deletions(-) diff --git a/.github/workflows/failed-test-investigator.lock.yml b/.github/workflows/failed-test-investigator.lock.yml index 6943328432843..952ec5b7014fa 100644 --- a/.github/workflows/failed-test-investigator.lock.yml +++ b/.github/workflows/failed-test-investigator.lock.yml @@ -1,4 +1,4 @@ -# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"64d07d9d19741d16650989366d5651ab1a464189997d3bed54375d3c4a7d127d","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} +# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"7560f1fdf4c0cc9d8a374dc7a8f4ece8a8eb3af0b090f662e0844532317b7baa","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} # gh-aw-manifest: 
{"version":1,"secrets":["GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GITHUB_TOKEN","LITELLM_API_KEY","OPS_BUILDKITE_TOKEN"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"373c709c69115d41ff229c7e5df9f8788daa9553","version":"v9"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"},{"repo":"github/gh-aw-actions/setup","sha":"d3abfe96a194bce3a523ed2093ddedd5704cdf62","version":"v0.74.4"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.46"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.9","digest":"sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388","pinned_image":"ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:lts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]} # ___ _ _ # / _ \ | | (_) @@ -25,6 +25,7 @@ # Investigate a failed-test issue, classify the failure, and propose a fix when appropriate. 
# # Frontmatter env variables: +# - BUILDKITE_ORGANIZATION_SLUG: (main workflow) # - ISSUE_NUMBER: (main workflow) # # Secrets used: @@ -79,6 +80,7 @@ concurrency: run-name: "Failed Test Investigator" env: + BUILDKITE_ORGANIZATION_SLUG: elastic ISSUE_NUMBER: ${{ github.event.issue.number || github.event.inputs.issue_number }} jobs: @@ -132,7 +134,7 @@ jobs: GH_AW_INFO_EXPERIMENTAL: "false" GH_AW_INFO_SUPPORTS_TOOLS_ALLOWLIST: "true" GH_AW_INFO_STAGED: "false" - GH_AW_INFO_ALLOWED_DOMAINS: '["defaults","buildkite.com","*.buildkite.com","ci-stats.kibana.dev","github.com","api.github.com","chatgpt.com","elastic.litellm-prod.ai"]' + GH_AW_INFO_ALLOWED_DOMAINS: '["defaults","buildkite.com","*.buildkite.com","buildkiteartifacts.com","ci-stats.kibana.dev","github.com","api.github.com","chatgpt.com","elastic.litellm-prod.ai"]' GH_AW_INFO_FIREWALL_ENABLED: "true" GH_AW_INFO_AWF_VERSION: "v0.25.46" GH_AW_INFO_AWMG_VERSION: "" @@ -197,7 +199,7 @@ jobs: id: sanitized uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 env: - GH_AW_ALLOWED_DOMAINS: 
"*.buildkite.com,*.githubusercontent.com,anthropic.com,api.anthropic.com,api.github.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,buildkite.com,cdn.playwright.dev,chatgpt.com,ci-stats.kibana.dev,codeload.github.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,elastic.litellm-prod.ai,files.pythonhosted.org,ghcr.io,github-cloud.githubusercontent.com,github-cloud.s3.amazonaws.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,lfs.github.com,objects.githubusercontent.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,playwright.download.prss.microsoft.com,ppa.launchpad.net,pypi.org,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,sentry.io,statsig.anthropic.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com" + GH_AW_ALLOWED_DOMAINS: 
"*.buildkite.com,*.githubusercontent.com,anthropic.com,api.anthropic.com,api.github.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,buildkite.com,buildkiteartifacts.com,cdn.playwright.dev,chatgpt.com,ci-stats.kibana.dev,codeload.github.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,elastic.litellm-prod.ai,files.pythonhosted.org,ghcr.io,github-cloud.githubusercontent.com,github-cloud.s3.amazonaws.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,lfs.github.com,objects.githubusercontent.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,playwright.download.prss.microsoft.com,ppa.launchpad.net,pypi.org,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,sentry.io,statsig.anthropic.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com" with: script: | const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs'); @@ -222,20 +224,20 @@ jobs: run: | bash "${RUNNER_TEMP}/gh-aw/actions/create_prompt_first.sh" { - cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' + cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' - GH_AW_PROMPT_7e621c923ef88c74_EOF + GH_AW_PROMPT_78b5446074d0dc57_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/xpia.md" cat "${RUNNER_TEMP}/gh-aw/prompts/temp_folder_prompt.md" cat "${RUNNER_TEMP}/gh-aw/prompts/markdown.md" cat "${RUNNER_TEMP}/gh-aw/prompts/safe_outputs_prompt.md" - cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' + cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' Tools: add_comment, add_labels, missing_tool, missing_data, noop - GH_AW_PROMPT_7e621c923ef88c74_EOF + GH_AW_PROMPT_78b5446074d0dc57_EOF cat 
"${RUNNER_TEMP}/gh-aw/prompts/mcp_cli_tools_prompt.md" - cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' + cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' The following GitHub context information is available for this workflow: {{#if github.actor}} @@ -264,12 +266,12 @@ jobs: {{/if}} - GH_AW_PROMPT_7e621c923ef88c74_EOF + GH_AW_PROMPT_78b5446074d0dc57_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/github_mcp_tools_with_safeoutputs_prompt.md" - cat << 'GH_AW_PROMPT_7e621c923ef88c74_EOF' + cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' {{#runtime-import .github/workflows/failed-test-investigator.md}} - GH_AW_PROMPT_7e621c923ef88c74_EOF + GH_AW_PROMPT_78b5446074d0dc57_EOF } > "$GH_AW_PROMPT" - name: Interpolate variables and render templates uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 @@ -503,9 +505,9 @@ jobs: mkdir -p "${RUNNER_TEMP}/gh-aw/safeoutputs" mkdir -p /tmp/gh-aw/safeoutputs mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs - cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_7beb0cf3227a93f5_EOF' + cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_f5fcdc760153c6b8_EOF' {"add_comment":{"hide_older_comments":true,"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"add_labels":{"allowed":["ai:auto-flaky-fix"],"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"false"},"report_incomplete":{}} - GH_AW_SAFE_OUTPUTS_CONFIG_7beb0cf3227a93f5_EOF + GH_AW_SAFE_OUTPUTS_CONFIG_f5fcdc760153c6b8_EOF - name: Generate Safe Outputs Tools env: GH_AW_TOOLS_META_JSON: | @@ -718,7 +720,7 @@ jobs: export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host --add-host host.docker.internal:127.0.0.1 --user '"${MCP_GATEWAY_UID}"':'"${MCP_GATEWAY_GID}"' --group-add '"${DOCKER_SOCK_GID}"' -v 
'"${DOCKER_SOCK_PATH}"':/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DOCKER_HOST=unix:///var/run/docker.sock -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:v0.3.9' GH_AW_NODE=$(which node 2>/dev/null || command -v node 2>/dev/null || echo node) - cat << GH_AW_MCP_CONFIG_fba672d45aa69ac1_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" + cat << GH_AW_MCP_CONFIG_4579b1254a53c9d3_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" { "mcpServers": { "github": { @@ -758,7 +760,7 @@ jobs: "payloadDir": "${MCP_GATEWAY_PAYLOAD_DIR}" } } - GH_AW_MCP_CONFIG_fba672d45aa69ac1_EOF + GH_AW_MCP_CONFIG_4579b1254a53c9d3_EOF - name: Mount MCP servers as CLIs id: mount-mcp-clis continue-on-error: true @@ -862,7 +864,7 @@ jobs: printf '%s' "$(date +%s%3N)" > /tmp/gh-aw/agent_cli_start_ms.txt touch /tmp/gh-aw/agent-step-summary.md (umask 177 && touch /tmp/gh-aw/agent-stdio.log) - printf '%s\n' 
'{"$schema":"https://github.com/github/gh-aw-firewall/releases/download/v0.25.46/awf-config.schema.json","network":{"allowDomains":["*.buildkite.com","*.githubusercontent.com","anthropic.com","api.anthropic.com","api.github.com","api.snapcraft.io","archive.ubuntu.com","azure.archive.ubuntu.com","buildkite.com","cdn.playwright.dev","chatgpt.com","ci-stats.kibana.dev","codeload.github.com","crl.geotrust.com","crl.globalsign.com","crl.identrust.com","crl.sectigo.com","crl.thawte.com","crl.usertrust.com","crl.verisign.com","crl3.digicert.com","crl4.digicert.com","crls.ssl.com","elastic.litellm-prod.ai","files.pythonhosted.org","ghcr.io","github-cloud.githubusercontent.com","github-cloud.s3.amazonaws.com","github.com","host.docker.internal","json-schema.org","json.schemastore.org","keyserver.ubuntu.com","lfs.github.com","objects.githubusercontent.com","ocsp.digicert.com","ocsp.geotrust.com","ocsp.globalsign.com","ocsp.identrust.com","ocsp.sectigo.com","ocsp.ssl.com","ocsp.thawte.com","ocsp.usertrust.com","ocsp.verisign.com","packagecloud.io","packages.cloud.google.com","packages.microsoft.com","playwright.download.prss.microsoft.com","ppa.launchpad.net","pypi.org","raw.githubusercontent.com","registry.npmjs.org","s.symcb.com","s.symcd.com","security.ubuntu.com","sentry.io","statsig.anthropic.com","ts-crl.ws.symantec.com","ts-ocsp.ws.symantec.com","www.googleapis.com"]},"apiProxy":{"enabled":true,"enableTokenSteering":true,"maxRuns":500,"maxEffectiveTokens":25000000,"targets":{"anthropic":{"host":"elastic.litellm-prod.ai"}},"models":{"auto":["large"],"coding":["copilot/gpt-5*codex*","openai/gpt-5*codex*","gpt-5-codex"],"deep-research":["copilot/deep-research*","copilot/o3-deep-research*","copilot/o4-mini-deep-research*","google/deep-research*","gemini/deep-research*","openai/o3-deep-research*","openai/o4-mini-deep-research*"],"gemini-flash":["copilot/gemini-*flash*","google/gemini-*flash*","gemini/gemini-*flash*"],"gemini-flash-lite":["copilot/gemini-*flash*lite*","google
/gemini-*flash*lite*","gemini/gemini-*flash*lite*"],"gemini-pro":["copilot/gemini-*pro*","google/gemini-*pro*","gemini/gemini-*pro*"],"gemma":["copilot/gemma*","google/gemma*","gemini/gemma*"],"gpt-4.1":["copilot/gpt-4.1*","openai/gpt-4.1*"],"gpt-5":["copilot/gpt-5*","openai/gpt-5*"],"gpt-5-codex":["copilot/gpt-5*codex*","openai/gpt-5*codex*"],"gpt-5-mini":["copilot/gpt-5*mini*","openai/gpt-5*mini*"],"gpt-5-nano":["copilot/gpt-5*nano*","openai/gpt-5*nano*"],"gpt-5-pro":["copilot/gpt-5*pro*","openai/gpt-5*pro*"],"haiku":["copilot/*haiku*","anthropic/*haiku*"],"large":["sonnet","gpt-5-pro","gpt-5","gemini-pro"],"mini":["haiku","gpt-5-mini","gpt-5-nano","gemini-flash-lite"],"opus":["copilot/*opus*","anthropic/*opus*"],"reasoning":["copilot/o1*","copilot/o3*","copilot/o4*","openai/o1*","openai/o3*","openai/o4*"],"small":["mini"],"sonnet":["copilot/*sonnet*","anthropic/*sonnet*"],"vision":["copilot/gemini-*image*","gemini/gemini-*image*","copilot/gemini-*flash*","gemini/gemini-*flash*"]}},"container":{"imageTag":"0.25.46"}}' > "${RUNNER_TEMP}/gh-aw/awf-config.json" && cp "${RUNNER_TEMP}/gh-aw/awf-config.json" /tmp/gh-aw/awf-config.json + printf '%s\n' 
'{"$schema":"https://github.com/github/gh-aw-firewall/releases/download/v0.25.46/awf-config.schema.json","network":{"allowDomains":["*.buildkite.com","*.githubusercontent.com","anthropic.com","api.anthropic.com","api.github.com","api.snapcraft.io","archive.ubuntu.com","azure.archive.ubuntu.com","buildkite.com","buildkiteartifacts.com","cdn.playwright.dev","chatgpt.com","ci-stats.kibana.dev","codeload.github.com","crl.geotrust.com","crl.globalsign.com","crl.identrust.com","crl.sectigo.com","crl.thawte.com","crl.usertrust.com","crl.verisign.com","crl3.digicert.com","crl4.digicert.com","crls.ssl.com","elastic.litellm-prod.ai","files.pythonhosted.org","ghcr.io","github-cloud.githubusercontent.com","github-cloud.s3.amazonaws.com","github.com","host.docker.internal","json-schema.org","json.schemastore.org","keyserver.ubuntu.com","lfs.github.com","objects.githubusercontent.com","ocsp.digicert.com","ocsp.geotrust.com","ocsp.globalsign.com","ocsp.identrust.com","ocsp.sectigo.com","ocsp.ssl.com","ocsp.thawte.com","ocsp.usertrust.com","ocsp.verisign.com","packagecloud.io","packages.cloud.google.com","packages.microsoft.com","playwright.download.prss.microsoft.com","ppa.launchpad.net","pypi.org","raw.githubusercontent.com","registry.npmjs.org","s.symcb.com","s.symcd.com","security.ubuntu.com","sentry.io","statsig.anthropic.com","ts-crl.ws.symantec.com","ts-ocsp.ws.symantec.com","www.googleapis.com"]},"apiProxy":{"enabled":true,"enableTokenSteering":true,"maxRuns":500,"maxEffectiveTokens":25000000,"targets":{"anthropic":{"host":"elastic.litellm-prod.ai"}},"models":{"auto":["large"],"coding":["copilot/gpt-5*codex*","openai/gpt-5*codex*","gpt-5-codex"],"deep-research":["copilot/deep-research*","copilot/o3-deep-research*","copilot/o4-mini-deep-research*","google/deep-research*","gemini/deep-research*","openai/o3-deep-research*","openai/o4-mini-deep-research*"],"gemini-flash":["copilot/gemini-*flash*","google/gemini-*flash*","gemini/gemini-*flash*"],"gemini-flash-lite":["copilot/gem
ini-*flash*lite*","google/gemini-*flash*lite*","gemini/gemini-*flash*lite*"],"gemini-pro":["copilot/gemini-*pro*","google/gemini-*pro*","gemini/gemini-*pro*"],"gemma":["copilot/gemma*","google/gemma*","gemini/gemma*"],"gpt-4.1":["copilot/gpt-4.1*","openai/gpt-4.1*"],"gpt-5":["copilot/gpt-5*","openai/gpt-5*"],"gpt-5-codex":["copilot/gpt-5*codex*","openai/gpt-5*codex*"],"gpt-5-mini":["copilot/gpt-5*mini*","openai/gpt-5*mini*"],"gpt-5-nano":["copilot/gpt-5*nano*","openai/gpt-5*nano*"],"gpt-5-pro":["copilot/gpt-5*pro*","openai/gpt-5*pro*"],"haiku":["copilot/*haiku*","anthropic/*haiku*"],"large":["sonnet","gpt-5-pro","gpt-5","gemini-pro"],"mini":["haiku","gpt-5-mini","gpt-5-nano","gemini-flash-lite"],"opus":["copilot/*opus*","anthropic/*opus*"],"reasoning":["copilot/o1*","copilot/o3*","copilot/o4*","openai/o1*","openai/o3*","openai/o4*"],"small":["mini"],"sonnet":["copilot/*sonnet*","anthropic/*sonnet*"],"vision":["copilot/gemini-*image*","gemini/gemini-*image*","copilot/gemini-*flash*","gemini/gemini-*flash*"]}},"container":{"imageTag":"0.25.46"}}' > "${RUNNER_TEMP}/gh-aw/awf-config.json" && cp "${RUNNER_TEMP}/gh-aw/awf-config.json" /tmp/gh-aw/awf-config.json GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="" if [[ "${DOCKER_HOST:-}" =~ ^tcp:// ]]; then GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="--docker-host-path-prefix /tmp/gh-aw" @@ -954,7 +956,7 @@ jobs: uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 env: GH_AW_SAFE_OUTPUTS: ${{ steps.set-runtime-paths.outputs.GH_AW_SAFE_OUTPUTS }} - GH_AW_ALLOWED_DOMAINS: 
"*.buildkite.com,*.githubusercontent.com,anthropic.com,api.anthropic.com,api.github.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,buildkite.com,cdn.playwright.dev,chatgpt.com,ci-stats.kibana.dev,codeload.github.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,elastic.litellm-prod.ai,files.pythonhosted.org,ghcr.io,github-cloud.githubusercontent.com,github-cloud.s3.amazonaws.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,lfs.github.com,objects.githubusercontent.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,playwright.download.prss.microsoft.com,ppa.launchpad.net,pypi.org,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,sentry.io,statsig.anthropic.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com" + GH_AW_ALLOWED_DOMAINS: 
"*.buildkite.com,*.githubusercontent.com,anthropic.com,api.anthropic.com,api.github.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,buildkite.com,buildkiteartifacts.com,cdn.playwright.dev,chatgpt.com,ci-stats.kibana.dev,codeload.github.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,elastic.litellm-prod.ai,files.pythonhosted.org,ghcr.io,github-cloud.githubusercontent.com,github-cloud.s3.amazonaws.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,lfs.github.com,objects.githubusercontent.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,playwright.download.prss.microsoft.com,ppa.launchpad.net,pypi.org,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,sentry.io,statsig.anthropic.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com" GITHUB_SERVER_URL: ${{ github.server_url }} GITHUB_API_URL: ${{ github.api_url }} with: @@ -1523,7 +1525,7 @@ jobs: uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 env: GH_AW_AGENT_OUTPUT: ${{ steps.setup-agent-output-env.outputs.GH_AW_AGENT_OUTPUT }} - GH_AW_ALLOWED_DOMAINS: 
"*.buildkite.com,*.githubusercontent.com,anthropic.com,api.anthropic.com,api.github.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,buildkite.com,cdn.playwright.dev,chatgpt.com,ci-stats.kibana.dev,codeload.github.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,elastic.litellm-prod.ai,files.pythonhosted.org,ghcr.io,github-cloud.githubusercontent.com,github-cloud.s3.amazonaws.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,lfs.github.com,objects.githubusercontent.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,playwright.download.prss.microsoft.com,ppa.launchpad.net,pypi.org,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,sentry.io,statsig.anthropic.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com" + GH_AW_ALLOWED_DOMAINS: 
"*.buildkite.com,*.githubusercontent.com,anthropic.com,api.anthropic.com,api.github.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,buildkite.com,buildkiteartifacts.com,cdn.playwright.dev,chatgpt.com,ci-stats.kibana.dev,codeload.github.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,elastic.litellm-prod.ai,files.pythonhosted.org,ghcr.io,github-cloud.githubusercontent.com,github-cloud.s3.amazonaws.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,lfs.github.com,objects.githubusercontent.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,playwright.download.prss.microsoft.com,ppa.launchpad.net,pypi.org,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,sentry.io,statsig.anthropic.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com" GITHUB_SERVER_URL: ${{ github.server_url }} GITHUB_API_URL: ${{ github.api_url }} GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG: "{\"add_comment\":{\"hide_older_comments\":true,\"max\":1,\"target\":\"${{ github.event.issue.number || github.event.inputs.issue_number }}\"},\"add_labels\":{\"allowed\":[\"ai:auto-flaky-fix\"],\"max\":1,\"target\":\"${{ github.event.issue.number || github.event.inputs.issue_number }}\"},\"create_report_incomplete_issue\":{},\"missing_data\":{},\"missing_tool\":{},\"noop\":{\"max\":1,\"report-as-issue\":\"false\"},\"report_incomplete\":{}}" diff --git a/.github/workflows/failed-test-investigator.md b/.github/workflows/failed-test-investigator.md index 46abc3e1b2e53..452193d18e632 100644 --- a/.github/workflows/failed-test-investigator.md +++ b/.github/workflows/failed-test-investigator.md @@ -27,6 +27,8 @@ 
concurrency: env: ISSUE_NUMBER: &issue_number ${{ github.event.issue.number || github.event.inputs.issue_number }} + # Lets the agent omit `-o elastic` on every `bk` invocation (see https://buildkite.com/docs/pipelines/configure/environment-variables) + BUILDKITE_ORGANIZATION_SLUG: elastic engine: id: claude @@ -53,6 +55,7 @@ network: - defaults - buildkite.com - '*.buildkite.com' + - buildkiteartifacts.com - ci-stats.kibana.dev - github.com - api.github.com @@ -111,7 +114,7 @@ Investigate a failed-test issue, classify the failure, and propose a fix when ap ## Investigate -Investigate the test failure(s) using the `flaky-test-investigator` skill. +Investigate the test failure(s) using the `flaky-test-investigator` skill. Use all of the data at your disposal to reach a conclusion (source code, logs, failure screenshots, etc.). Every conclusion must cite specific evidence. Do not guess. @@ -161,9 +164,9 @@ No other side-effects beyond posting the comment and updating the label. ## Comment format -Post exactly one comment on the issue. Keep it concise, actionable, and prioritize the most critical findings at the very top. Adapt the sections below to best fit the specific failure. +Post exactly one comment on the issue. Keep it concise, actionable, and prioritize the most critical findings at the very top. Adapt the sections below to best fit the specific failure. **Use `####` for all subsections** (e.g., `#### Proposed Fix`, `#### Root Cause`). -Do not create standalone sections for "what the test does" "evidence," "where the test ran," or "failure screenshot". Integrate these details seamlessly into the sections below if it adds value. +Do not create standalone sections for "what the test does," "evidence," "where the test ran," or "failure screenshot". Integrate these details seamlessly into the sections below if they add value. Also, do not mention why the `ai:auto-flaky-fix` label isn't added. ### 1. 
The TL;DR (Required) @@ -199,4 +202,4 @@ Include the following only if they provide high-value, actionable signal: - **Ruled out:** a brief note on alternative hypotheses that were investigated and dismissed. - **Verification:** specific steps to reproduce the failure or confirm the fix. -- **Open questions:** unresolved design or environmental issues blocking a definitive fix. +- **Open questions:** unresolved design or environmental issues blocking a definitive fix ("a screenshot would have helped troubleshoot this" is a valid open question). From 1e13cefb99858e9d87e3b48c5d09d2cd599a006e Mon Sep 17 00:00:00 2001 From: Cesare de Cal Date: Fri, 29 May 2026 17:53:29 +0200 Subject: [PATCH 5/6] recompile failed-test-investigator lock --- .../failed-test-investigator.lock.yml | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/.github/workflows/failed-test-investigator.lock.yml b/.github/workflows/failed-test-investigator.lock.yml index 952ec5b7014fa..158e98267bc66 100644 --- a/.github/workflows/failed-test-investigator.lock.yml +++ b/.github/workflows/failed-test-investigator.lock.yml @@ -1,4 +1,4 @@ -# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"7560f1fdf4c0cc9d8a374dc7a8f4ece8a8eb3af0b090f662e0844532317b7baa","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} +# gh-aw-metadata: {"schema_version":"v3","frontmatter_hash":"72a4305af19891df8b07f5956439b97a8a17d4e457af82613a6c70afac16c99d","compiler_version":"v0.74.4","agent_id":"claude","agent_model":"opus"} # gh-aw-manifest: 
{"version":1,"secrets":["GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GITHUB_TOKEN","LITELLM_API_KEY","OPS_BUILDKITE_TOKEN"],"actions":[{"repo":"actions/checkout","sha":"de0fac2e4500dabe0009e67214ff5f5447ce83dd","version":"v6.0.2"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"373c709c69115d41ff229c7e5df9f8788daa9553","version":"v9"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/setup-node","sha":"48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e","version":"v6.4.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"},{"repo":"github/gh-aw-actions/setup","sha":"d3abfe96a194bce3a523ed2093ddedd5704cdf62","version":"v0.74.4"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.25.46"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.25.46"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.9","digest":"sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388","pinned_image":"ghcr.io/github/gh-aw-mcpg:v0.3.9@sha256:64828b42a4482f58fab16509d7f8f495a6d97c972a98a68aff20543531ac0388"},{"image":"ghcr.io/github/github-mcp-server:v1.0.4"},{"image":"node:lts-alpine","digest":"sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f","pinned_image":"node:lts-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f"}]} # ___ _ _ # / _ \ | | (_) @@ -224,20 +224,20 @@ jobs: run: | bash "${RUNNER_TEMP}/gh-aw/actions/create_prompt_first.sh" { - cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' + cat << 'GH_AW_PROMPT_0a58fb2a045bcf35_EOF' - GH_AW_PROMPT_78b5446074d0dc57_EOF + GH_AW_PROMPT_0a58fb2a045bcf35_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/xpia.md" cat "${RUNNER_TEMP}/gh-aw/prompts/temp_folder_prompt.md" cat "${RUNNER_TEMP}/gh-aw/prompts/markdown.md" cat 
"${RUNNER_TEMP}/gh-aw/prompts/safe_outputs_prompt.md" - cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' + cat << 'GH_AW_PROMPT_0a58fb2a045bcf35_EOF' Tools: add_comment, add_labels, missing_tool, missing_data, noop - GH_AW_PROMPT_78b5446074d0dc57_EOF + GH_AW_PROMPT_0a58fb2a045bcf35_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/mcp_cli_tools_prompt.md" - cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' + cat << 'GH_AW_PROMPT_0a58fb2a045bcf35_EOF' The following GitHub context information is available for this workflow: {{#if github.actor}} @@ -266,12 +266,12 @@ jobs: {{/if}} - GH_AW_PROMPT_78b5446074d0dc57_EOF + GH_AW_PROMPT_0a58fb2a045bcf35_EOF cat "${RUNNER_TEMP}/gh-aw/prompts/github_mcp_tools_with_safeoutputs_prompt.md" - cat << 'GH_AW_PROMPT_78b5446074d0dc57_EOF' + cat << 'GH_AW_PROMPT_0a58fb2a045bcf35_EOF' {{#runtime-import .github/workflows/failed-test-investigator.md}} - GH_AW_PROMPT_78b5446074d0dc57_EOF + GH_AW_PROMPT_0a58fb2a045bcf35_EOF } > "$GH_AW_PROMPT" - name: Interpolate variables and render templates uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0 @@ -505,9 +505,9 @@ jobs: mkdir -p "${RUNNER_TEMP}/gh-aw/safeoutputs" mkdir -p /tmp/gh-aw/safeoutputs mkdir -p /tmp/gh-aw/mcp-logs/safeoutputs - cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_f5fcdc760153c6b8_EOF' + cat > "${RUNNER_TEMP}/gh-aw/safeoutputs/config.json" << 'GH_AW_SAFE_OUTPUTS_CONFIG_06c7d1cf96454c8f_EOF' {"add_comment":{"hide_older_comments":true,"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"add_labels":{"allowed":["ai:auto-flaky-fix"],"max":1,"target":"${{ github.event.issue.number || github.event.inputs.issue_number }}"},"create_report_incomplete_issue":{},"missing_data":{},"missing_tool":{},"noop":{"max":1,"report-as-issue":"false"},"report_incomplete":{}} - GH_AW_SAFE_OUTPUTS_CONFIG_f5fcdc760153c6b8_EOF + GH_AW_SAFE_OUTPUTS_CONFIG_06c7d1cf96454c8f_EOF - name: Generate Safe Outputs Tools env: 
GH_AW_TOOLS_META_JSON: | @@ -720,7 +720,7 @@ jobs: export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host --add-host host.docker.internal:127.0.0.1 --user '"${MCP_GATEWAY_UID}"':'"${MCP_GATEWAY_GID}"' --group-add '"${DOCKER_SOCK_GID}"' -v '"${DOCKER_SOCK_PATH}"':/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DOCKER_HOST=unix:///var/run/docker.sock -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:v0.3.9' GH_AW_NODE=$(which node 2>/dev/null || command -v node 2>/dev/null || echo node) - cat << GH_AW_MCP_CONFIG_4579b1254a53c9d3_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" + cat << GH_AW_MCP_CONFIG_0fbddf2536aa0e14_EOF | "$GH_AW_NODE" "${RUNNER_TEMP}/gh-aw/actions/start_mcp_gateway.cjs" { "mcpServers": { "github": { @@ -760,7 +760,7 @@ jobs: "payloadDir": "${MCP_GATEWAY_PAYLOAD_DIR}" } } - GH_AW_MCP_CONFIG_4579b1254a53c9d3_EOF + GH_AW_MCP_CONFIG_0fbddf2536aa0e14_EOF - name: Mount MCP servers as 
CLIs id: mount-mcp-clis continue-on-error: true From 1de53498f1e01a60575ed95c4f2e2e0d389acc98 Mon Sep 17 00:00:00 2001 From: Cesare de Cal Date: Mon, 1 Jun 2026 09:38:39 +0200 Subject: [PATCH 6/6] add `--json` to `bk artifacts list` example in SKILL.md --- .agents/skills/flaky-test-investigator/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.agents/skills/flaky-test-investigator/SKILL.md b/.agents/skills/flaky-test-investigator/SKILL.md index 501febdb377e1..695ce3e49893c 100644 --- a/.agents/skills/flaky-test-investigator/SKILL.md +++ b/.agents/skills/flaky-test-investigator/SKILL.md @@ -53,7 +53,7 @@ If artifacts are not available (expired, not uploaded, no `read_artifacts` token ### List failure artifacts -`bk artifacts list -p --job-uuid ` returns a JSON listing of every artifact uploaded for the failing job. Pass `--job-uuid ` for the failed attempt (without it, `bk` only returns the latest attempt and hides retried failures). If a build retried to green, failure artifacts only live on the failed job's listing; don't conclude "no screenshot" until you've scoped to the right job UUID. +`bk artifacts list -p --job-uuid --json` returns a JSON listing of every artifact uploaded for the failing job. Pass `--job-uuid ` for the failed attempt (without it, `bk` only returns the latest attempt and hides retried failures). If a build retried to green, failure artifacts only live on the failed job's listing; don't conclude "no screenshot" until you've scoped to the right job UUID. ### Understand the scope