feat: add azd CLI evaluation and testing framework #7202
Open: spboyer wants to merge 8 commits into main from feat/eval-framework
+7,314 −0
Commits
bce8473 feat: add azd CLI evaluation and testing framework (spboyer)
a3e7f2b docs: add authentication and secrets section to eval README (spboyer)
d134f20 docs: add comprehensive how-to guides for creating evals, graders, an… (spboyer)
26fba65 fix: resolve CI failures in eval unit tests and cspell (spboyer)
a4736a4 fix: stop command-sequencing tests from overriding AZD_CONFIG_DIR (spboyer)
c5113c4 docs: expand auth section with subscription config and no-popup guara… (spboyer)
839be3c refactor: address review feedback from @jongio and Copilot (spboyer)
24d2af3 fix: address round 2 review feedback from @jongio (spboyer)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| name: "Eval: E2E Lifecycle" | ||
| | ||
| on: | ||
|   schedule: | ||
|     # 6am UTC Monday | ||
|     - cron: "0 6 * * 1" | ||
|   workflow_dispatch: | ||
| | ||
| permissions: | ||
|   id-token: write | ||
|   contents: read | ||
| | ||
| jobs: | ||
|   e2e-lifecycle: | ||
|     runs-on: ubuntu-latest | ||
|     env: | ||
|       AZURE_ENV_NAME: eval-e2e-${{ github.run_id }} | ||
|     steps: | ||
|       - uses: actions/checkout@v4 | ||
| | ||
|       - uses: actions/setup-go@v5 | ||
|         with: | ||
|           go-version-file: "cli/azd/go.mod" | ||
| | ||
|       - uses: actions/setup-node@v4 | ||
|         with: | ||
|           node-version: "22" | ||
| | ||
|       - name: Build azd | ||
|         working-directory: cli/azd | ||
|         run: go build -o ./azd . | ||
| | ||
|       - name: Add azd to PATH | ||
|         run: echo "${{ github.workspace }}/cli/azd" >> "$GITHUB_PATH" | ||
| | ||
|       - name: Azure Login (OIDC) | ||
|         uses: azure/login@v2 | ||
|         with: | ||
|           client-id: ${{ secrets.AZURE_CLIENT_ID }} | ||
|           tenant-id: ${{ secrets.AZURE_TENANT_ID }} | ||
|           subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} | ||
| | ||
|       - name: Install Waza CLI | ||
|         run: npm install -g waza | ||
| | ||
|       - name: Install eval dependencies | ||
|         working-directory: cli/azd/test/eval | ||
|         run: npm ci | ||
| | ||
|       - name: Run lifecycle evaluations | ||
|         working-directory: cli/azd/test/eval | ||
|         continue-on-error: true | ||
|         env: | ||
|           COPILOT_CLI_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }} | ||
|           AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }} | ||
|         run: waza run --executor copilot-sdk --filter "tasks/lifecycle/" | ||
| | ||
|       - name: Upload E2E results | ||
|         if: always() | ||
|         uses: actions/upload-artifact@v4 | ||
|         with: | ||
|           name: e2e-results-${{ github.run_id }} | ||
|           path: cli/azd/test/eval/reports/ | ||
|           retention-days: 30 | ||
| | ||
|       - name: Cleanup Azure resources | ||
|         if: always() | ||
|         working-directory: cli/azd/test/eval | ||
|         run: | | ||
|           cd /tmp | ||
|           azd down --purge --force --no-prompt 2>/dev/null || true | ||
|         env: | ||
|           AZURE_ENV_NAME: eval-e2e-${{ github.run_id }} | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| name: "Eval: Weekly Report" | ||
| | ||
| on: | ||
|   schedule: | ||
|     # 8am UTC Monday, after E2E completes | ||
|     - cron: "0 8 * * 1" | ||
|   workflow_dispatch: | ||
| | ||
| permissions: | ||
|   contents: read | ||
|   actions: read | ||
| | ||
| jobs: | ||
|   generate-report: | ||
|     runs-on: ubuntu-latest | ||
|     steps: | ||
|       - uses: actions/checkout@v4 | ||
| | ||
|       - uses: actions/setup-node@v4 | ||
|         with: | ||
|           node-version: "22" | ||
| | ||
|       - name: Install eval dependencies | ||
|         working-directory: cli/azd/test/eval | ||
|         run: npm ci | ||
| | ||
|       - name: Download recent Waza artifacts | ||
|         env: | ||
|           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
|         run: | | ||
|           mkdir -p cli/azd/test/eval/reports/waza | ||
|           RUN_ID=$(gh api repos/${{ github.repository }}/actions/workflows/eval-waza.yml/runs \ | ||
|             --jq '.workflow_runs | map(select(.conclusion == "success")) | .[0].id // empty' 2>/dev/null) | ||
|           if [ -n "$RUN_ID" ]; then | ||
|             gh run download "$RUN_ID" -D cli/azd/test/eval/reports/waza 2>/dev/null || echo "No waza artifacts found" | ||
|           else | ||
|             echo "No successful waza runs found, skipping" | ||
|           fi | ||
| | ||
|       - name: Download recent E2E artifacts | ||
|         env: | ||
|           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
|         run: | | ||
|           mkdir -p cli/azd/test/eval/reports/e2e | ||
|           RUN_ID=$(gh api repos/${{ github.repository }}/actions/workflows/eval-e2e.yml/runs \ | ||
|             --jq '.workflow_runs | map(select(.conclusion == "success")) | .[0].id // empty' 2>/dev/null) | ||
|           if [ -n "$RUN_ID" ]; then | ||
|             gh run download "$RUN_ID" -D cli/azd/test/eval/reports/e2e 2>/dev/null || echo "No e2e artifacts found" | ||
|           else | ||
|             echo "No successful e2e runs found, skipping" | ||
|           fi | ||
| | ||
|       # TODO: Implement report generation script (scripts/generate-report.ts) | ||
|       # that diffs Waza result JSON files and produces regression-issues.json. | ||
|       # Once implemented, add a step to create GitHub issues from regressions. | ||
| | ||
|       - name: Upload aggregated artifacts | ||
|         uses: actions/upload-artifact@v4 | ||
|         with: | ||
|           name: eval-weekly-report-${{ github.run_id }} | ||
|           path: cli/azd/test/eval/reports/ | ||
|           retention-days: 90 | ||
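
The two download steps in this workflow pick the first run whose conclusion is "success", and the TODO describes a script that diffs result files to find regressions. A minimal TypeScript sketch of both ideas follows. The `WorkflowRun` and `TaskResult` shapes and both function names are hypothetical: the real GitHub API response carries many more fields, and Waza's actual result schema is not shown in this PR, so this is not the planned scripts/generate-report.ts.

```typescript
// Hypothetical, trimmed-down shapes used only for this sketch.
interface WorkflowRun {
  id: number;
  conclusion: string | null;
}

interface TaskResult {
  task: string;
  passed: boolean;
}

// Mirrors the jq filter in the download steps: take the first run in the
// list whose conclusion is "success"; undefined stands in for jq's `empty`.
function latestSuccessfulRunId(runs: WorkflowRun[]): number | undefined {
  return runs.find((run) => run.conclusion === "success")?.id;
}

// A task counts as a regression when it passed in the previous results and
// fails in the current ones. Tasks that are new or were already failing are
// not reported.
function findRegressions(previous: TaskResult[], current: TaskResult[]): string[] {
  const passedBefore = new Set(
    previous.filter((result) => result.passed).map((result) => result.task),
  );
  return current
    .filter((result) => !result.passed && passedBefore.has(result.task))
    .map((result) => result.task);
}
```

Returning only task names keeps the sketch small; a real report would likely also carry scores or failure details, depending on what the result files contain.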
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| name: "Eval: Unit Tests" | ||
| | ||
| on: | ||
|   pull_request: | ||
|     paths: | ||
|       - "cli/azd/test/eval/**" | ||
|       - "cli/azd/internal/mcp/**" | ||
|       - "cli/azd/cmd/mcp.go" | ||
|       - "cli/azd/cmd/root.go" | ||
| | ||
| permissions: | ||
|   contents: read | ||
| | ||
| jobs: | ||
|   unit-tests: | ||
|     runs-on: ubuntu-latest | ||
|     steps: | ||
|       - uses: actions/checkout@v4 | ||
| | ||
|       - uses: actions/setup-go@v5 | ||
|         with: | ||
|           go-version-file: "cli/azd/go.mod" | ||
| | ||
|       - uses: actions/setup-node@v4 | ||
|         with: | ||
|           node-version: "22" | ||
| | ||
|       - name: Build azd | ||
|         working-directory: cli/azd | ||
|         run: go build -o ./azd . | ||
| | ||
|       - name: Install eval dependencies | ||
|         working-directory: cli/azd/test/eval | ||
|         run: npm ci | ||
| | ||
|       - name: Run unit tests | ||
|         working-directory: cli/azd/test/eval | ||
|         run: npm run test:unit -- --ci | ||
| | ||
|       - name: Validate Waza task YAML | ||
|         working-directory: cli/azd/test/eval | ||
|         run: npm run waza:validate | ||
|         continue-on-error: true | ||
| | ||
|       - name: Upload test results | ||
|         if: always() | ||
|         uses: actions/upload-artifact@v4 | ||
|         with: | ||
|           name: eval-unit-results | ||
|           path: cli/azd/test/eval/reports/ | ||
|           retention-days: 30 | ||

spboyer marked the review conversation at the "Run unit tests" step (run: npm run test:unit -- --ci) as resolved.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| name: "Eval: Waza Runs" | ||
| | ||
| on: | ||
|   schedule: | ||
|     # 5am, 12pm, 8pm UTC, Tuesday through Saturday | ||
|     - cron: "0 5,12,20 * * 2-6" | ||
|   workflow_dispatch: | ||
| | ||
| permissions: | ||
|   contents: read | ||
| | ||
| jobs: | ||
|   waza-run: | ||
|     runs-on: ubuntu-latest | ||
|     steps: | ||
|       - uses: actions/checkout@v4 | ||
| | ||
|       - uses: actions/setup-go@v5 | ||
|         with: | ||
|           go-version-file: "cli/azd/go.mod" | ||
| | ||
|       - uses: actions/setup-node@v4 | ||
|         with: | ||
|           node-version: "22" | ||
| | ||
|       - name: Build azd | ||
|         working-directory: cli/azd | ||
|         run: go build -o ./azd . | ||
| | ||
|       - name: Add azd to PATH | ||
|         run: echo "${{ github.workspace }}/cli/azd" >> "$GITHUB_PATH" | ||
| | ||
|       - name: Install Waza CLI | ||
|         run: npm install -g waza | ||
| | ||
|       - name: Install eval dependencies | ||
|         working-directory: cli/azd/test/eval | ||
|         run: npm ci | ||
| | ||
|       - name: Run Waza evaluations | ||
|         working-directory: cli/azd/test/eval | ||
|         continue-on-error: true | ||
|         env: | ||
|           COPILOT_CLI_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }} | ||
|         run: waza run --executor copilot-sdk | ||
| | ||
|       - name: Upload Waza results | ||
|         if: always() | ||
|         uses: actions/upload-artifact@v4 | ||
|         with: | ||
|           name: waza-results-${{ github.run_id }} | ||
|           path: cli/azd/test/eval/reports/ | ||
|           retention-days: 30 | ||
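
To sanity-check that a schedule comment matches its cron expression (for example "0 5,12,20 * * 2-6" against "5am, 12pm, 8pm UTC, Tuesday through Saturday"), a small matcher can help. The sketch below is an illustration only: `fieldMatches` and `matchesCron` are hypothetical helpers, they handle `*`, comma lists and ranges but not steps or day names, and they simply AND all five fields, which is enough for the schedules in these workflows because day-of-month and month are `*`.

```typescript
// Test one cron field ("*", "5,12,20", "2-6") against a numeric value.
// Step syntax ("*/5") and names ("MON") are not handled in this sketch.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => {
    const bounds = part.split("-").map(Number);
    const lo = bounds[0] ?? NaN;
    const hi = bounds[1] ?? lo; // a single number is a range of one
    return value >= lo && value <= hi;
  });
}

// Check whether a UTC instant falls on a five-field cron schedule.
// Field order: minute, hour, day-of-month, month, day-of-week (0 = Sunday).
function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  const values = [
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1, // Date months are 0-based, cron months are 1-based
    date.getUTCDay(),
  ];
  return fields.every((field, i) => fieldMatches(field, values[i] ?? NaN));
}
```

For instance, `matchesCron("0 5,12,20 * * 2-6", new Date(Date.UTC(2024, 0, 2, 5, 0)))` checks Tuesday 2 January 2024 at 05:00 UTC against the Waza schedule.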
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| node_modules/ | ||
| dist/ | ||
| reports/*.json | ||
| reports/*.md | ||
| reports/junit.xml | ||
| !reports/.gitkeep |