-
Notifications
You must be signed in to change notification settings - Fork 2
feat: implement LLM cost savings in CI #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add concurrency control to kill duplicate eval runs - Add PR triggers with keyword detection for [llm], [agent], [eval] - Trigger on ready_for_review and reopened PR events - Add /run-evals slash command - Update welcome message to explain eval triggers - Use single pr-welcome.md for all contributors Fixes #124 Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Welcome to the Airbyte Connector Builder MCP!Thank you for your contribution! We're excited to have you in the Airbyte community. Testing This Branch via MCPTo test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration: {
"mcpServers": {
"connector-builder-mcp-dev": {
"command": "uvx",
"args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1760122951-llm-cost-savings-ci", "connector-builder-mcp"]
}
}
} Testing This Branch via CLIYou can test this version of the MCP Server using the following CLI snippet: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1760122951-llm-cost-savings-ci#egg=airbyte-connector-builder-mcp' --help PR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
AI Builder EvaluationsAI builder evaluations run automatically under the following conditions:
Evaluations also run on a schedule (Mon/Wed/Fri at midnight UTC) and can be manually triggered via workflow dispatch. Helpful ResourcesIf you have any questions, feel free to ask in the PR comments or join our Slack community. |
📝 WalkthroughWalkthroughConsolidates PR welcome messaging into a single Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Dev as Developer
participant GH as GitHub
participant EvalsWF as "CI Run Evals\n(.github/workflows/ci-run-evals.yml)"
participant Runner as Runner (ubuntu-latest)
participant UV as uv/poe
Dev->>GH: Push to main or open PR (ready_for_review, reopened, synchronize) or commit with keywords
GH->>EvalsWF: Trigger workflow (path/title gating)
note right of EvalsWF #E6F0FF: Concurrency group cancels prior runs
EvalsWF->>Runner: Start job (checkout, setup uv, setup Python)
Runner->>UV: Install deps
Runner->>UV: Run evals (poe evals run --connector ...)
UV-->>Runner: Evals result (<=30m timeout)
Runner-->>GH: Job status
sequenceDiagram
autonumber
participant GH as GitHub
participant WelcomeWF as "Welcome Message WF\n(.github/workflows/welcome-message.yml)"
participant Template as ".github/pr-welcome.md"
GH->>WelcomeWF: PR opened / synchronized
WelcomeWF->>Template: Load fixed template path
Template-->>WelcomeWF: Template content
WelcomeWF-->>GH: Post unified welcome message
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Comment |
- ready_for_review and reopened should trigger evals unconditionally - Keywords [llm], [agent], [eval] now only filter synchronize events - This matches the requirements in issue #124 Co-Authored-By: AJ Steers <[email protected]>
…rkflow - Split run-evals job from ci-tests.yml into dedicated ci-run-evals.yml - Add concurrency control to prevent duplicate eval runs - Implement keyword-based triggers for synchronize events ([llm], [agent], [eval]) - Make ready_for_review and reopened triggers unconditional - Add OPENAI_AGENTS_LOG: info for reduced debug noise - Rename run-evals.yml to run-standard-evals.yml for clarity - Delete redundant build-connector-test.yml workflow - Update pr-welcome.md to advertise /build-connector for on-demand execution Fixes #124 Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
.github/workflows/run-standard-evals.yml (3)
23-37
: Set explicit GITHUB_TOKEN permissions for commentingIf the action posts PR comments, default token may be read-only for forks. Add minimal perms:
name: Run Standard Evals + +permissions: + contents: read + issues: write + pull-requests: writeConfirm this workflow needs to write PR comments; if not, drop issues/pull-requests scopes.
14-22
: Consider adding concurrency to prevent overlapping scheduled/manual runsTo cancel prior in‑progress runs of this workflow on same ref:
on: workflow_dispatch: @@ schedule: - cron: '0 0 * * 1,3,5' + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true
25-36
: Action tags verified; consider SHA pinningAll referenced action versions exist (actions/[email protected], actions/checkout@v5, aaronsteers/poe-command-processor@v1). For supply-chain stability, consider pinning to specific commit SHAs.
.github/workflows/ci-run-evals.yml (1)
23-56
: Add explicit permissions if evals post PR commentsIf the eval process writes to PRs/issues, set minimal token scopes.
name: CI Run Evals @@ on: @@ concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true + +permissions: + contents: read + issues: write + pull-requests: writeIf no commenting occurs, keep contents: read only.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/pr-welcome.md
(1 hunks).github/workflows/build-connector-test.yml
(0 hunks).github/workflows/ci-run-evals.yml
(1 hunks).github/workflows/ci-tests.yml
(0 hunks).github/workflows/run-standard-evals.yml
(1 hunks)
💤 Files with no reviewable changes (2)
- .github/workflows/ci-tests.yml
- .github/workflows/build-connector-test.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/pr-welcome.md
🔇 Additional comments (5)
.github/workflows/run-standard-evals.yml (1)
1-1
: Rename looks goodConsistent with new CI split.
.github/workflows/ci-run-evals.yml (4)
26-36
: PR gating logic matches requirements (case-sensitive keywords on synchronize)contains() is case-sensitive; only lowercase “[llm]”, “[agent]”, “[eval]” will match. ready_for_review/reopened run regardless of title, as intended.
Confirm lowercase-only tags are the policy (e.g., “[LLM]” will not trigger).
19-22
: Concurrency group aligns with objectivesThis will cancel in-progress runs per branch/PR. Good.
39-47
: Verify Action versions and pin SHAs if necessary
- actions/checkout@v5 and actions/setup-python@v6 are published majors (require Runner v2.327.1+)
- Confirm astral-sh/[email protected] exists or pin to a specific commit SHA for immutability
3-18
: Slash commands handled by separate dispatch workflow
ci-run-evals.yml rightly runs only on push and pull_request; slash commands are picked up by .github/workflows/slash-command-dispatch.yml (on issue_comment created) and routed via aaronsteers/poe-command-processor to run-standard-evals.yml.
Co-Authored-By: AJ Steers <[email protected]>
Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/ci-run-evals.yml (1)
38-38
: Job-level timeout is correctly applied (prior feedback addressed).
timeout-minutes: 30
at the job level is the right place; step-level timeouts are ignored.
🧹 Nitpick comments (5)
.github/workflows/ci-run-evals.yml (5)
13-17
: Add PR ‘edited’ event so title keyword changes can trigger runs.Currently, editing a PR title to add “[llm]”, “[agent]”, or “[eval]” won’t trigger this workflow. Include the edited action.
on: pull_request: types: - ready_for_review - reopened - synchronize + - edited
26-37
: Gate PR runs from forks to avoid secret-related failures; also handle ‘edited’ keyword runs.Forked PRs can’t access repo secrets; this job will fail or waste CI minutes. Add a fork guard and support title-edited keyword runs directly in the job IF.
if: | - github.event_name == 'push' || - (github.event_name == 'pull_request' && ( - github.event.action == 'ready_for_review' || - github.event.action == 'reopened' || - (github.event.action == 'synchronize' && ( - contains(github.event.pull_request.title, '[llm]') || - contains(github.event.pull_request.title, '[agent]') || - contains(github.event.pull_request.title, '[eval]') - )) - )) + github.event_name == 'push' || + ( + github.event_name == 'pull_request' && + !github.event.pull_request.head.repo.fork && + ( + github.event.action == 'ready_for_review' || + github.event.action == 'reopened' || + (github.event.action == 'edited' && ( + contains(github.event.pull_request.title, '[llm]') || + contains(github.event.pull_request.title, '[agent]') || + contains(github.event.pull_request.title, '[eval]') + )) || + (github.event.action == 'synchronize' && ( + contains(github.event.pull_request.title, '[llm]') || + contains(github.event.pull_request.title, '[agent]') || + contains(github.event.pull_request.title, '[eval]') + )) + ) + )Optional (for further savings): also skip drafts on synchronize by adding
&& !github.event.pull_request.draft
to that branch.
23-37
: Harden token permissions (least privilege).Explicitly set minimal permissions for the job.
jobs: run-evals: name: Run Evals (Single Connector) + permissions: + contents: read runs-on: ubuntu-latest
49-49
: Use the lockfile strictly in CI.Add
--frozen
to ensure deps match uv.lock and avoid accidental resolution.- - name: Install dependencies - run: uv sync --all-extras + - name: Install dependencies + run: uv sync --all-extras --frozen
50-57
: Optional guard at step-level if you don’t add the fork check.If you choose not to gate forks at the job IF, at least guard the eval step to skip cleanly without secrets.
- name: Run Evals + if: ${{ secrets.OPENAI_API_KEY != '' }} env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} OPENAI_AGENTS_LOG: info HTTPX_LOG_LEVEL: warning PHOENIX_API_KEY: ${{ secrets.AIRBYTE_CI_TESTING_PHOENIX_API_KEY }} PHOENIX_COLLECTOR_ENDPOINT: ${{ vars.AIRBYTE_CI_TESTING_PHOENIX_COLLECTOR_ENDPOINT }} run: uv run poe evals run --connector source-jsonplaceholder
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci-run-evals.yml
(1 hunks)
🔇 Additional comments (3)
.github/workflows/ci-run-evals.yml (3)
19-22
: Concurrency group is correctly scoped and cancels duplicates.Using
${{ github.workflow }}-${{ github.ref }}
will cancel prior in-flight runs per workflow and ref (PR/main). Good.
26-36
: Confirm commit-text gating requirement
This workflow currently only matches keywords in the PR title (on synchronize); no commit-message trigger is implemented. If commit-text matching is still required per the linked issue, add a precheck job to read commit messages via the API and gate this job accordingly.
1-3
: Confirm external triggers in other workflows
- run-standard-evals.yml defines the Mon/Wed/Fri
cron
andworkflow_dispatch
.- I didn’t find a
/run-evals
slash command in slash-command-dispatch.yml—please confirm it’s defined elsewhere or add it.
- Fetch commits using GitHub API on synchronize events - Check commit messages for [llm], [agent], [eval] keywords - Skip expensive steps when keywords not found - Keep ready_for_review and reopened events unconditional - Update documentation to reflect commit message checking Co-Authored-By: AJ Steers <[email protected]>
- Simplified to use git log -1 instead of gh api for all commits - Removed GitHub token dependency from check step - Updated step name to reflect checking latest commit only Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/build-connector-command.yml (1)
12-57
: Connector input ignoredWe expose an
inputs.connector
but the command still hardcodessource-jsonplaceholder
, so every run ignores the provided connector name. Wire the input through:- command: poe evals run --connector source-jsonplaceholder + command: poe evals run --connector ${{ inputs.connector }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/pr-welcome.md
(1 hunks).github/workflows/build-connector-command.yml
(3 hunks).github/workflows/ci-run-evals.yml
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-10T17:09:05.957Z
Learnt from: aaronsteers
PR: airbytehq/connector-builder-mcp#116
File: .github/workflows/ci-tests.yml:110-113
Timestamp: 2025-10-10T17:09:05.957Z
Learning: The connector-builder-mcp repository at airbytehq/connector-builder-mcp is a private/internal repository that does not accept external contributions or forks.
Applied to files:
.github/pr-welcome.md
📚 Learning: 2025-10-10T17:09:02.099Z
Learnt from: aaronsteers
PR: airbytehq/connector-builder-mcp#116
File: .github/workflows/ci-tests.yml:110-113
Timestamp: 2025-10-10T17:09:02.099Z
Learning: The connector-builder-mcp repository (airbytehq/connector-builder-mcp) is an internal/private repository that doesn't accept external contributions, so there are no forked PRs to consider in workflow configurations.
Applied to files:
.github/pr-welcome.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Evals (Single Connector)
- GitHub Check: Pytest (Fast)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci-run-evals.yml
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pytest (Fast)
- Replace hardcoded 'source-jsonplaceholder' with dynamic parameter - Fixes issue where connector input was ignored in workflow dispatch Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/build-connector-command.yml (1)
30-30
: Scope concurrency by connector to avoid canceling parallel buildsCurrent group uses only workflow and ref, so two dispatches on the same branch for different connectors will cancel each other. Include the connector in the group key.
Example:
concurrency: group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.connector }} cancel-in-progress: true
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/build-connector-command.yml
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
.github/workflows/build-connector-command.yml (1)
47-51
: Secrets/vars propagation looks correct; verify no accidental loggingUsing secrets for OPENAI_API_KEY and PHOENIX_API_KEY is good; endpoint via vars is fine if non-sensitive. Ensure the action doesn’t echo these envs at info/debug levels.
Can you confirm aaronsteers/poe-command-processor@v1 does not print env values and that PHOENIX_COLLECTOR_ENDPOINT is not sensitive? If unsure, I can suggest a redaction step or env whitelist for logs.
/autofix
|
feat: implement LLM cost savings in CI
Summary
This PR implements LLM cost savings in CI by refactoring GitHub Actions workflows to add targeted evaluation triggers and reduce unnecessary LLM API usage. The main changes include:
[llm]
,[agent]
, or[eval]
to conditionally run expensive eval steps on PR synchronize eventsOPENAI_AGENTS_LOG: info
)build-connector-test.yml
) and consolidated documentationready_for_review
andreopened
events to ensure proper quality gatesThe core optimization is that for PR synchronize events (new commits), expensive eval steps are only executed when commit messages contain LLM-related keywords, significantly reducing API costs for routine code changes.
Review & Testing Checklist for Human
[llm]
,[agent]
, or[eval]
and verify that eval steps run on synchronize events. Then test with commits without keywords to confirm skipping works.ready_for_review
andreopened
PR events (without keyword requirements).steps.check-commits.outputs.found == 'true'
works correctly and doesn't have edge cases where steps are unexpectedly skipped or run.ci-tests.yml
to the newci-run-evals.yml
doesn't break existing processes or expectations.Notes
git log -1 --pretty=%B
to check only the latest commit message (per feedback)Summary by CodeRabbit
Documentation
Chores
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.