Add continuous agent evaluation engine #974
Agent evaluation advisory gate
Pilot agent evaluations completed in advisory mode. Download the … This gate is intentionally non-blocking while baselines are calibrated.
evaluation_history.append(result_payload)
try:
    tracer.record_evaluation(result_payload)
except (AttributeError, TypeError):
976c340 to 4b02af8
Continuous evaluation PR validation status
Current branch head:
Validated so far:
Deployment validation still pending:
Known PR blocker:
Deployment validation update
Re-dispatched after the previous queued replacement run cancelled cleanly.
Conclusion: image build availability is confirmed, and the remaining live validation gap is workflow execution through rendered manifest commit, Flux reconciliation, and post-reconcile Foundry strict runtime validation.
…ates
Refactors wait-flux-reconciliation to actively poll until both kustomizations
apply the published manifest revision (post commit-rendered-manifests SHA)
AND any migrated HelmReleases reach Ready=UpgradeSucceeded. Without this,
ensure-foundry-agents and validate-agc-readiness observed the previous main
revision while preview manifests sat unapplied due to dependency cascades
(holiday-peak-gitops-holiday-peak-agents waits on holiday-peak-gitops-holiday-peak-crud).
Changes:
- commit-rendered-manifests: expose published_sha output (post-commit HEAD SHA)
- wait-flux-reconciliation:
* Force-reconcile GitRepository source until artifact contains published_sha
* Active poll for both kustomizations to lastAppliedRevision==published_sha
AND Ready=True (with reconcile triggered in dependency order: crud first,
agents after)
* For each migrated HelmRelease (changed agent service), force reconcile
the HR and wait for Ready=True with InstallSucceeded/UpgradeSucceeded/
ReconciliationSucceeded/TestSucceeded reason
restore-flux-source-default-branch already runs after gates via needs:, so
no reorder is required there.
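
For reviewers, the active-poll behaviour described in this commit can be sketched roughly as below. This is an illustration only, not the workflow's actual implementation: `KustomizationStatus`, `is_applied`, `wait_for_revision`, and the injected `get_status` callable are hypothetical names, and matching the revision by suffix (to tolerate a `branch@sha1:<sha>` style value) is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class KustomizationStatus:
    # Hypothetical container for the two fields the gate inspects.
    last_applied_revision: str
    ready: bool


def is_applied(status: KustomizationStatus, published_sha: str) -> bool:
    # Assumption: the revision string ends with the commit SHA.
    return status.ready and status.last_applied_revision.endswith(published_sha)


def wait_for_revision(
    names: Iterable[str],
    published_sha: str,
    get_status: Callable[[str], KustomizationStatus],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until every named kustomization has applied published_sha and is Ready."""
    deadline = clock() + timeout_s
    # Keep the caller's order so dependencies (crud before agents) are checked first.
    pending = list(names)
    while True:
        pending = [n for n in pending if not is_applied(get_status(n), published_sha)]
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"kustomizations not at {published_sha}: {', '.join(pending)}"
            )
        sleep(interval_s)
```

Injecting `get_status`, `sleep`, and `clock` keeps the loop testable without a cluster; in a real job `get_status` would presumably wrap a query against the Flux Kustomization status.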
Issue: #897
…s secret-masked
GitHub Actions secret-masks AGC_SUBNET_ID outputs that contain a subscription GUID matching a configured secret. The render publication-context steps then fall through to live-cluster recovery, which fails when the ApplicationLoadBalancer was previously pruned, producing a CRUD manifest without ALB+Gateway. Flux applies the truncated manifest, prunes them again, and validate-agc-readiness fails.
Fix: add an Azure CLI fallback that resolves the AGC delegated subnet directly via 'az network vnet subnet show' so the render context is authoritative even when both the masked output and the live cluster recovery are empty. Hard-fail when AGC_SUBNET_ID still cannot be resolved instead of silently rendering an incomplete manifest.
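
The resolution order this commit describes (job output, then live-cluster recovery, then Azure CLI, then hard failure) could look something like the sketch below. It is not the workflow's code: `is_usable`, `az_subnet_id`, and `resolve_subnet_id` are hypothetical helpers, and treating an empty value or one containing `***` as masked is an assumption about how the masked output surfaces.

```python
import subprocess
from typing import Callable, Optional, Sequence


def is_usable(value: Optional[str]) -> bool:
    # Assumption: a masked or skipped output arrives empty or containing "***".
    return bool(value and value.strip()) and "***" not in value


def az_subnet_id(resource_group: str, vnet: str, subnet: str) -> Optional[str]:
    """Ask the Azure CLI for the subnet resource ID; None if the call fails."""
    result = subprocess.run(
        [
            "az", "network", "vnet", "subnet", "show",
            "--resource-group", resource_group,
            "--vnet-name", vnet,
            "--name", subnet,
            "--query", "id",
            "--output", "tsv",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def resolve_subnet_id(sources: Sequence[Callable[[], Optional[str]]]) -> str:
    """Try each source in order; fail loudly rather than render without a subnet."""
    for source in sources:
        value = source()
        if is_usable(value):
            return value.strip()
    raise RuntimeError(
        "AGC_SUBNET_ID could not be resolved; refusing to render an incomplete manifest"
    )
```

A caller would pass the sources in priority order, for example the job output first, the cluster lookup second, and a `lambda` wrapping `az_subnet_id(...)` last, so the CLI is only invoked when the cheaper sources come back empty.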
Adds the foundation for continuous agent response evaluation using Azure AI Foundry evaluation with deterministic local fallback.
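
As a rough illustration of the "remote evaluation with deterministic local fallback" shape, here is a minimal sketch. None of these names come from the PR: `EvalResult`, `local_score`, `evaluate`, and the `remote` callable are hypothetical, the keyword-coverage scorer is only a stand-in for whatever local metric the engine uses, and no Azure AI Foundry API is called or implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class EvalResult:
    score: float
    source: str  # "remote" or "local"


def local_score(response: str, expected_terms: Sequence[str]) -> float:
    """Deterministic stand-in metric: fraction of expected terms present."""
    if not expected_terms:
        return 0.0
    text = response.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits / len(expected_terms)


def evaluate(
    response: str,
    expected_terms: Sequence[str],
    remote: Optional[Callable[[str], float]] = None,
) -> EvalResult:
    """Prefer the remote evaluator; fall back to the local scorer on any failure."""
    if remote is not None:
        try:
            return EvalResult(float(remote(response)), "remote")
        except Exception:
            # Broad catch is deliberate in this sketch: any remote failure
            # should degrade to the local score instead of failing the run.
            pass
    return EvalResult(local_score(response, expected_terms), "local")
```

Recording `source` alongside the score makes it visible in the evaluation history whether a given result came from the remote service or from the fallback.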
Summary:
Validation:
Live validation note:
Fixes #897