Commit 11b1b12

feature(#1101): idempotent Foundry portal-tracking agent ensure (all 52) (#1101)
Adds scripts/ops/ensure_foundry_tracking_agents.py, an idempotent upsert script. It scans apps/<svc>/agent.yaml for direct-model tracking agents, composes the published instructions (prompt + hardening block, byte-identical to the output of holiday_peak_lib.agents.prompt_loader), and POSTs create-or-update requests against the Foundry assistants API for both the -fast and -rich roles.

Wires the script into deploy-azd-core as a new ensure-foundry-agents job that runs after the provision and deploy-foundry-models jobs, so the agent set self-heals on every deploy. The script is idempotent: re-running it produces noop=52 errors=0.

Closes the gap exposed when the truth-* and search-enrichment-agent services were added: those 10 agents had never been registered in Foundry. Also realigns the 42 stale agents (created 2026-02-26 with phi-4-mini-instruct and empty instructions) onto gpt-5-nano/gpt-5 with populated instructions, matching the workflow env.

Validation:
- Pre-push gate (8 checks, 705+ tests, 206s) PASSED.
- New unit tests (11) green.
- Live reconcile against dev Foundry applied; it now shows 52 agents (26 gpt-5-nano + 26 gpt-5), and a re-run yields noop=52.
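The create/update/noop bookkeeping behind "re-running produces noop=52" can be sketched as a pure planning step that compares desired agent specs against what is already registered. This is a minimal illustration under assumptions, not the script itself: `AgentSpec`, `plan_upserts`, `summarize`, and the `<svc>-fast`/`<svc>-rich` naming are hypothetical, and the HTTP calls to the Foundry assistants API are deliberately left out. Instructions are compared as exact strings here, so under this kind of check any byte difference from the runtime-loaded text (even trailing whitespace) would show up as an update on every run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical desired/live state of one Foundry agent."""
    name: str          # e.g. "<svc>-fast" or "<svc>-rich" (naming is an assumption)
    model: str         # model deployment name, e.g. "gpt-5-nano"
    instructions: str  # composed prompt + hardening block


def plan_upserts(desired: list[AgentSpec], current: dict[str, AgentSpec]) -> dict[str, list[str]]:
    """Classify each desired agent as create, update, or noop."""
    plan: dict[str, list[str]] = {"create": [], "update": [], "noop": []}
    for spec in desired:
        existing = current.get(spec.name)
        if existing is None:
            # Never registered (like the 10 missing agents): create it.
            plan["create"].append(spec.name)
        elif (existing.model, existing.instructions) != (spec.model, spec.instructions):
            # Stale model or instructions: exact comparison, so any byte differs -> update.
            plan["update"].append(spec.name)
        else:
            plan["noop"].append(spec.name)
    return plan


def summarize(plan: dict[str, list[str]]) -> str:
    """Render counts in a create=N update=N noop=N style."""
    return " ".join(f"{kind}={len(names)}" for kind, names in plan.items())


if __name__ == "__main__":
    desired = [
        AgentSpec("svc-a-fast", "gpt-5-nano", "prompt A\n\nhardening"),
        AgentSpec("svc-a-rich", "gpt-5", "prompt A\n\nhardening"),
    ]
    # One stale agent (old model, empty instructions) and one never registered.
    current = {"svc-a-fast": AgentSpec("svc-a-fast", "phi-4-mini-instruct", "")}
    print(summarize(plan_upserts(desired, current)))  # create=1 update=1 noop=0

    # Once the plan is applied, live state equals desired state: a re-run is all noop.
    current = {spec.name: spec for spec in desired}
    print(summarize(plan_upserts(desired, current)))  # create=0 update=0 noop=2
```

Keeping the planning step free of network calls makes the idempotency property easy to unit-test: applying a plan and planning again must yield only noops.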
1 parent 9b13d57 commit 11b1b12

4 files changed

Lines changed: 665 additions & 0 deletions

.github/workflows/deploy-azd.yml

Lines changed: 43 additions & 0 deletions
```diff
@@ -1753,6 +1753,49 @@ jobs:
       AZURE_RESOURCE_GROUP: ${{ inputs.projectName }}-${{ inputs.environment }}-rg
       AI_SERVICES_NAME: ${{ needs.provision.outputs.AI_SERVICES_NAME }}
 
+  ensure-foundry-agents:
+    runs-on: ubuntu-latest
+    needs:
+      - provision
+      - deploy-foundry-models
+    if: ${{ always() && !inputs.uiOnly && !inputs.skipProvision && needs.provision.result == 'success' && (needs.deploy-foundry-models.result == 'success' || needs.deploy-foundry-models.result == 'skipped') && needs.provision.outputs.PROJECT_ENDPOINT != '' }}
+    environment: ${{ inputs.githubEnvironment }}
+    env:
+      AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
+      AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
+      AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
+      PROJECT_ENDPOINT: ${{ needs.provision.outputs.PROJECT_ENDPOINT }}
+      MODEL_DEPLOYMENT_NAME_FAST: gpt-5-nano
+      MODEL_DEPLOYMENT_NAME_RICH: gpt-5
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ env.DEPLOY_SOURCE_CHECKOUT_REF }}
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install ensure script dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install "azure-identity>=1.17" "PyYAML>=6.0"
+
+      - name: Azure login (OIDC)
+        uses: azure/login@v2
+        with:
+          client-id: ${{ env.AZURE_CLIENT_ID }}
+          tenant-id: ${{ env.AZURE_TENANT_ID }}
+          subscription-id: ${{ env.AZURE_SUBSCRIPTION_ID }}
+
+      - name: Ensure Foundry portal-tracking agents (idempotent)
+        run: |
+          python scripts/ops/ensure_foundry_tracking_agents.py \
+            --project-endpoint "$PROJECT_ENDPOINT" \
+            --fast-model "$MODEL_DEPLOYMENT_NAME_FAST" \
+            --rich-model "$MODEL_DEPLOYMENT_NAME_RICH"
+
   deploy-crud:
     runs-on: ubuntu-latest
     needs:
```
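The `if:` gate on `ensure-foundry-agents` packs several conditions into one expression. By default GitHub Actions skips a job when any job in its `needs` list was skipped or failed; `always()` overrides that default, so the explicit result checks decide instead, which is what lets the job run when `deploy-foundry-models` was skipped. The hand translation below is a reading aid only (the function name and its arguments are hypothetical, and the endpoint is a placeholder); the workflow does not execute it.

```python
def ensure_job_runs(
    ui_only: bool,
    skip_provision: bool,
    provision_result: str,
    foundry_models_result: str,
    project_endpoint: str,
) -> bool:
    """Hypothetical Python mirror of the ensure-foundry-agents `if:` expression."""
    # always() only forces evaluation; these clauses then decide whether to run.
    return (
        not ui_only
        and not skip_provision
        and provision_result == "success"
        # A skipped model deployment is tolerated; failure or cancellation is not.
        and foundry_models_result in ("success", "skipped")
        # The provision job must have exported a non-empty project endpoint.
        and project_endpoint != ""
    )


if __name__ == "__main__":
    endpoint = "https://example.invalid/project"  # placeholder, not a real Foundry URL
    print(ensure_job_runs(False, False, "success", "skipped", endpoint))  # True
    print(ensure_job_runs(False, False, "success", "failure", endpoint))  # False
    print(ensure_job_runs(False, False, "success", "success", ""))        # False
```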
Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+# Workflow permission-cap linter (`scripts/ci/lint_workflow_permissions.py`)
+
+## What it enforces
+GitHub Actions rule: in a `workflow_call` chain, callee permissions can only be **maintained or reduced** by the caller. Violations → `startup_failure` before runner allocation.
+
+## How effective permissions are computed (CRITICAL)
+For each callee job, effective permissions = **per-job `permissions:` if present, else workflow-level `permissions:`**. The linter MUST seed the required-set from callee workflow-level perms; per-job maps override per-key. This was missing pre-PR #1100 and caused a false-negative on `deploy-azd-truth.yml`.
+
+## Caller-side fallback
+Same semantics: caller workflow-level `permissions:` are the fallback for jobs that omit `permissions:`. Already implemented.
+
+## Failure mode if you miss the fallback
+- Linter passes locally + in PR CI
+- GitHub orchestrator rejects with `startup_failure` (7s run, no logs)
+- Looks identical to a transient queue issue
+- PR #1097 → 2 days undetected → issue #1099
+
+## Tests
+`scripts/ci/tests/test_lint_workflow_permissions.py` — 5 cases:
+1. Caller grants required perms → pass
+2. Caller missing `pull-requests: write` (PR #1097 regression) → flag
+3. Caller `contents: read` vs callee per-job `contents: write` → flag
+4. Callee with no per-job perms → pass
+5. **Callee workflow-level `contents: write` (no per-job override), caller `contents: read` → flag** ← prevents recurrence of deploy-azd-truth.yml class
+
+## Operational notes
+- Required check `Permission-cap lint (cross-file nested-workflow rule)` in `.github/workflows/lint-actions.yml`
+- Emits `::error file=...::` GitHub annotations, exit 1 on violation
+- `actionlint` used with `-shellcheck=` to disable shellcheck (catches unrelated pre-existing SC2034/SC2129/SC2153)
```
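A minimal Python sketch of the two rules in these notes: effective permissions are seeded from the workflow-level map with per-job keys overriding per key (on both the caller and callee side), and a call is flagged when any scope a callee job needs exceeds what the caller grants. The function names, the `none < read < write` ordering table, and the choice to treat a scope the caller does not list as `none` are assumptions for illustration. YAML loading, string forms such as `read-all`, and the repository's default token permissions are not modeled, and this is not the actual `lint_workflow_permissions.py`.

```python
from typing import Optional

Perms = dict[str, str]
LEVELS = {"none": 0, "read": 1, "write": 2}  # assumed ordering of permission levels


def effective_perms(workflow_level: Optional[Perms], job_level: Optional[Perms]) -> Perms:
    """Seed from workflow-level permissions, then let per-job keys override per key."""
    merged = dict(workflow_level or {})
    merged.update(job_level or {})
    return merged


def required_by_callee(callee_workflow: Optional[Perms], callee_jobs: list[Optional[Perms]]) -> Perms:
    """Highest level any callee job needs for each scope (None = job omits permissions)."""
    required: Perms = {}
    for job in callee_jobs or [None]:
        for scope, level in effective_perms(callee_workflow, job).items():
            if LEVELS[level] > LEVELS[required.get(scope, "none")]:
                required[scope] = level
    return required


def check_call(
    caller_workflow: Optional[Perms],
    caller_job: Optional[Perms],
    callee_workflow: Optional[Perms],
    callee_jobs: list[Optional[Perms]],
    callee_path: str,
) -> int:
    """Print a GitHub ::error annotation per escalation; return 1 on violation, else 0."""
    granted = effective_perms(caller_workflow, caller_job)  # caller-side fallback
    required = required_by_callee(callee_workflow, callee_jobs)
    violations = [
        f"{scope}: callee needs {level}, caller grants {granted.get(scope, 'none')}"
        for scope, level in sorted(required.items())
        if LEVELS[level] > LEVELS[granted.get(scope, "none")]
    ]
    for message in violations:
        print(f"::error file={callee_path}::{message}")
    return 1 if violations else 0


if __name__ == "__main__":
    # Test case 5 from the notes: callee workflow-level contents: write with no
    # per-job override, caller contents: read.
    code = check_call({"contents": "read"}, None, {"contents": "write"}, [None], "deploy-azd-truth.yml")
    # prints: ::error file=deploy-azd-truth.yml::contents: callee needs write, caller grants read
    print("exit code:", code)  # exit code: 1
```

Without the seeding step in `effective_perms`, `required_by_callee` would return an empty map for case 5 and the call would pass, which is the false negative the notes describe.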
