
ci: phase 1 merge-queue CI redesign#2877

Open
henrypark133 wants to merge 8 commits into staging from ci-merge-queue

Collaborator

@henrypark133 commented Apr 23, 2026

Summary

This PR implements Phase 1 of #2719 by redesigning CI on staging to support the future main merge-queue model.

This PR only changes CI behavior and supporting workflow/script wiring. It does not change the default branch, branch protections, staging-promotion flow, or merge-policy configuration yet.

What Changed

test.yml is now merge-queue aware, but direct PR triggering is limited to main

.github/workflows/test.yml now:

  • adds merge_group support for the future queue on main
  • supports workflow_call for the staging batch path
  • keeps one protected roll-up context: Run Tests
  • limits direct pull_request triggering to main

This means:

  • staging PRs keep the existing PR check surface for now
  • the new Run Tests behavior is exercised through workflow_call, future main PRs, and merge_group
  • the main-branch cutover remains a later phase
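The trigger rules above can be summarized as a small decision function. This is only a sketch under stated assumptions: `runs_new_graph` is a hypothetical helper, not something in test.yml, and its first argument is a label for the entry path rather than the literal value of `github.event_name` (inside a called workflow, GitHub reports the caller's event).

```shell
# Sketch only: mirrors the trigger rules described above, not the real test.yml.
# Prints "run" when the new Run Tests graph would execute, "skip" when a direct
# PR does not target main, and "unknown" for entry paths this PR does not describe.
runs_new_graph() (
  entry="$1"
  base_branch="${2:-}"
  case "$entry" in
    merge_group|workflow_call)
      echo "run" ;;
    pull_request)
      # Direct PR triggering is limited to PRs that target main.
      if [ "$base_branch" = "main" ]; then echo "run"; else echo "skip"; fi ;;
    *)
      echo "unknown" ;;
  esac
)
```

For example, `runs_new_graph pull_request staging` prints `skip`, which matches the operational caveat that this PR does not exercise the direct path on itself.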

Code style supports merge queue without changing the required check name

.github/workflows/code_style.yml now:

  • adds merge_group
  • keeps the existing roll-up job name stable: Code Style (fmt + gateway-js-syntax + clippy + deny)
  • uses a lighter clippy matrix on PRs and fuller non-PR coverage on push / merge-group paths

Replay and E2E are reusable workflow dependencies

  • .github/workflows/replay-gate.yml is reusable so Run Tests can depend on replay safely
  • .github/workflows/e2e.yml is reusable with mode: smoke|full
  • smoke and full web E2E are now explicit workflow-call paths instead of overlapping PR triggers
  • the standalone PR trigger was removed from e2e.yml to avoid duplicate smoke + full PR runs

Version-bump skip behavior remains safe in merge queue

scripts/check-version-bumps.sh now supports ALLOW_SKIP_VERSION_CHECK

test.yml resolves skip-version-check labels for merge_group, but only enables skip behavior when the queue candidate can be attributed to exactly one PR. Multi-PR queue candidates fail closed so skip labels cannot leak across queued PRs.
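A minimal sketch of the script-side gate, assuming the script only honors a skip request when the caller has explicitly allowed it. Only `ALLOW_SKIP_VERSION_CHECK` comes from this PR; the function name, the `true`/`false` argument, and the fail-closed default are assumptions for illustration.

```shell
# Sketch only: how a script such as check-version-bumps.sh could gate skipping.
# $1 is a hypothetical flag meaning "a skip request was detected".
should_skip_version_check() (
  skip_requested="$1"
  # Assumption: an unset variable is treated as "not allowed" (fail closed).
  allow="${ALLOW_SKIP_VERSION_CHECK:-false}"
  if [ "$skip_requested" = "true" ] && [ "$allow" = "true" ]; then
    echo "skip"
  else
    echo "check"
  fi
)
```

With this shape, a skip label on its own is never enough: the workflow must also export `ALLOW_SKIP_VERSION_CHECK=true`, which it would do only for single-PR queue candidates.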

Coverage Added In This PR

This branch now also closes a few deterministic coverage gaps that were already present in the repo but were not exercised by normal CI:

  • Slack Channel Tests
    • adds path-gated CI coverage for tests/slack_auth_integration.rs
  • broader Telegram integration coverage
    • the heavy runtime lane now runs the full telegram_auth_integration target instead of only one exact regression case
  • expanded reusable full Python E2E coverage
    • adds a web-regressions group:
      • test_auth_no_duplicate_response.py
      • test_message_persistence.py
      • test_pending_user_messages.py
    • adds a v2-engine group:
      • test_v2_engine_auth_flow.py
      • test_v2_engine_approval_flow.py
      • test_v2_engine_tool_lifecycle.py
      • test_v2_kernel_auth_preflight.py
      • test_v2_kernel_auth_gateway_flow.py
      • test_v2_thread_visibility.py

Parallelization Changes

To avoid turning the broader deterministic coverage into one long serial lane:

  • Python E2E remains matrix-parallel by group
  • the heavy runtime lane was split so it no longer serializes unrelated suites
  • e2e_thread_scheduling stays in Heavy Integration Tests
  • telegram_auth_integration now runs in a separate two-way sharded job using cargo nextest --partition hash:1/2 and hash:2/2
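As a rough illustration of the two-way sharding, the helper below prints (but does not run) the command for one shard. The `hash:M/N` partition syntax is the one quoted above; the helper name and the `--test telegram_auth_integration` selector are assumptions about how the job might pick the target.

```shell
# Sketch only: builds the nextest command line for shard M of N.
telegram_shard_cmd() (
  shard="$1"
  total="$2"
  # Reject shard indexes outside 1..N (and non-numeric input).
  if ! [ "$shard" -ge 1 ] 2>/dev/null || ! [ "$shard" -le "$total" ] 2>/dev/null; then
    echo "invalid shard $shard/$total" >&2
    exit 1
  fi
  echo "cargo nextest run --test telegram_auth_integration --partition hash:$shard/$total"
)
```

Calling `telegram_shard_cmd 1 2` and `telegram_shard_cmd 2 2` yields the two commands a matrix job could execute in parallel. Hash partitioning assigns each test to a shard by a hash of its name, so the split is stable across runs but not necessarily balanced by duration.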

Important Warnings / Likely Failure Modes

These changes are expected to surface real failures that were previously untested in normal CI.

Most likely breakage points:

  • Slack channel regressions
    • tests/slack_auth_integration.rs was not previously wired into CI
    • failures here likely indicate real Slack webhook / auth / thread-handling regressions, not just CI churn
  • Telegram integration regressions beyond the old exact test
    • the full telegram_auth_integration target has much broader coverage than the previous single regression case
    • failures may expose attachment handling, pairing, polling, or metadata regressions that were previously hidden
  • v2 engine browser regressions
    • the added v2-engine E2E group exercises auth, approval, tool lifecycle, and thread-visibility flows that were not part of the normal reusable E2E suite
    • failures here likely reflect real ENGINE_V2=true behavior gaps
  • web history / optimistic rendering regressions
    • the added web-regressions group covers history persistence, pending user messages, and duplicate auth-response suppression
    • failures here likely indicate real gateway/frontend regressions that previously surfaced only later, in coverage-only CI
  • nextest partitioning / shard-specific issues
    • the Telegram integration suite now relies on cargo-nextest in CI
    • if this fails, check nextest installation first, then shard behavior
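For the nextest case, a preflight check makes "not installed" easy to tell apart from a shard-specific failure. This is a sketch, not part of the PR: `require_tool` is a hypothetical helper, and it relies only on `cargo-nextest` being the binary name that Cargo resolves for `cargo nextest`.

```shell
# Sketch only: fail fast with a clear message when a tool is missing from PATH.
require_tool() (
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool found"
  else
    echo "missing: $tool is not on PATH" >&2
    exit 1
  fi
)

# Example use in a CI step (commented out so the sketch has no side effects):
# require_tool cargo-nextest || exit 1
```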

Operational caveat:

  • because this PR still targets staging, the new direct Run Tests PR path does not execute on this PR itself
  • the new Run Tests graph is primarily being prepared for:
    • workflow_call from staging batch CI
    • future main PRs
    • merge_group

Validation

Validated locally:

  • workflow YAML parses cleanly
  • git diff --check passes
  • the replay workflow action pin was corrected after live CI exposed a broken SHA
  • merge-group skip-label logic was exercised with a local stubbed-gh simulation:
    • single-PR queue candidate => skip can remain enabled
    • multi-PR queue candidate => skip is disabled
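The stubbed-gh simulation is not included in this description; the sketch below shows one way such a check could look. Everything here is an assumption for illustration: the `gh` shell function stands in for the real CLI, `STUB_PRS` is a hypothetical test variable, `OWNER/REPO` is a placeholder, and the query shape is a guess at how the workflow might list PRs for a queue commit.

```shell
# Stub standing in for the GitHub CLI: ignores its arguments and prints one
# PR number per line, taken from the space-separated STUB_PRS variable.
gh() {
  printf '%s\n' $STUB_PRS
}

# Prints "true" only when the queue candidate maps to exactly one PR.
resolve_allow_skip() (
  sha="$1"
  # Count non-empty lines; "|| true" keeps a zero count from aborting under set -e.
  count=$(gh api "repos/OWNER/REPO/commits/$sha/pulls" --jq '.[].number' | grep -c . || true)
  if [ "$count" -eq 1 ]; then echo "true"; else echo "false"; fi
)
```

Running `STUB_PRS='101' resolve_allow_skip abc123` prints `true`, while `STUB_PRS='101 102' resolve_allow_skip abc123` prints `false`, matching the two cases listed above. Zero attributed PRs also yields `false`, which keeps the behavior fail-closed.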

Not yet validated live on GitHub:

  • actual merge_group payload behavior
  • sacrificial queue execution on main
  • the full new Run Tests graph on a real main PR
  • the new Slack / Telegram / expanded E2E lanes all succeeding together in repo CI

Follow-Up / Phase 2

After this lands on staging:

  1. validate the new CI contract with sacrificial PRs and the staging batch flow
  2. create one final manual staging -> main promotion PR carrying the validated CI changes and remaining staged work
  3. after that promotion lands on main:
    • update/remove the Staging ruleset first so ~DEFAULT_BRANCH does not attach to main
    • switch the default branch to main
    • enable merge queue on main
    • bulk-retarget active non-bot PRs to main
    • close staging-promote/* PRs
    • remove staging-only promotion infrastructure

Closes the Phase 1 implementation work for #2719.

@github-actions (bot) added labels on Apr 23, 2026: size: XL (500+ changed lines), scope: ci (CI/CD workflows), risk: medium (Business logic, config, or moderate-risk modules), contributor: core (20+ merged PRs)
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces the ALLOW_SKIP_VERSION_CHECK environment variable to conditionally control the version check skip mechanism in scripts/check-version-bumps.sh. A bug was identified in the git log command where the use of the symmetric difference operator (...) could cause version checks to be skipped incorrectly if the base branch contains the skip flag; it is recommended to use the double-dot syntax (..) instead to correctly isolate PR commits.
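The difference between the two range forms can be reproduced in a throwaway repository. This is a sketch under stated assumptions: the `[skip-version-check]` marker and the branch names are hypothetical, since the script's actual marker and `git log` invocation are not shown here.

```shell
# Sketch only: builds a temporary repo where the base branch, not the PR branch,
# carries a skip marker, then counts marker hits for each range syntax.
demo_ranges() (
  set -e
  repo=$(mktemp -d)
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  git symbolic-ref HEAD refs/heads/base
  commit() {
    git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
      commit -q --allow-empty -m "$1"
  }
  commit "initial"
  git checkout -q -b pr
  commit "pr: feature work"
  git checkout -q base
  commit "base: unrelated change [skip-version-check]"
  git checkout -q pr
  # base..HEAD lists only commits reachable from HEAD but not from base.
  echo "two-dot:   $(git log --format=%s base..HEAD | grep -c 'skip-version-check' || true)"
  # base...HEAD lists commits reachable from either side but not both.
  echo "three-dot: $(git log --format=%s base...HEAD | grep -c 'skip-version-check' || true)"
)
```

Calling `demo_ranges` reports 0 marker hits for the two-dot range and 1 for the three-dot range, because the symmetric difference also includes the commit that exists only on the base branch.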

Comment thread scripts/check-version-bumps.sh Outdated
@henrypark133 henrypark133 marked this pull request as ready for review April 23, 2026 02:27
Copilot AI review requested due to automatic review settings April 23, 2026 02:27
Contributor

Copilot AI left a comment

Pull request overview

Phase 1 CI workflow redesign to prepare the repo for a future main-based GitHub merge queue, while keeping required check names stable and enabling reusable workflow composition for staging batching.

Changes:

  • Make test.yml merge-queue aware (merge_group) and reusable (workflow_call), with path/risk-based gating and a single protected roll-up check (Run Tests).
  • Update code_style.yml to run on merge_group while preserving the existing roll-up job name and adjusting clippy matrix behavior by event type.
  • Convert replay-gate.yml and e2e.yml into reusable workflows, including E2E mode: smoke|full selection and new CI wiring for expanded coverage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scripts/check-version-bumps.sh | Adds ALLOW_SKIP_VERSION_CHECK gating for skip behavior to support merge-queue-safe label handling. |
| .github/workflows/test.yml | Adds workflow_call + merge_group, expands change detection outputs, refactors job graph, and introduces new CI lanes (Slack + sharded Telegram + reusable dependencies). |
| .github/workflows/replay-gate.yml | Makes replay gate reusable via workflow_call and updates action pins/permissions. |
| .github/workflows/e2e.yml | Makes E2E reusable and adds mode-driven matrix configuration; removes overlapping PR trigger path. |
| .github/workflows/code_style.yml | Adds merge_group support, adjusts change detection, and updates clippy/no-panics gating while keeping the roll-up job name stable. |

Comment thread .github/workflows/test.yml Outdated
Comment thread .github/workflows/test.yml Outdated
Comment thread .github/workflows/replay-gate.yml
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7bcff52b9f

Comment thread .github/workflows/test.yml
Copilot AI review requested due to automatic review settings April 23, 2026 02:36
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


Comment thread scripts/check-version-bumps.sh
Comment thread .github/workflows/test.yml
Comment thread .github/workflows/test.yml
Comment thread .github/workflows/test.yml Outdated
Comment thread .github/workflows/test.yml