Skip to content

Conversation

@danehans
Copy link
Contributor

@danehans danehans commented Nov 4, 2025

Description

Fixes an issue where the inference extension did not program the managed Envoy proxy until controller restart due to mismatched endpoint sources between DP and IR translation paths.

Changes:

  • Updated processPoolBackendObjIR to use endpoints from IR instead of the pod index.
  • Removed pod index threading from plugin and collection initialization.
  • Relaxed route pass to not fail on empty endpoint sets (aligns with fail-open/503 behavior).
  • Simplified unit tests to seed IR endpoints directly, removing mock KRT index setup.
  • Passes errors to DP to ensure proper cluster/routing management.

Fixes: #12265

Change Type

/kind fix

Changelog

Fixes endpoint synchronization for inference extension plugin.

Additional Notes

This is not a backport from main since the Envoy-based data plane is being removed in v2.2.

@github-actions github-actions bot added kind/fix Categorizes issue or PR as related to a bug. release-note labels Nov 4, 2025
@danehans danehans force-pushed the issue_12265 branch 3 times, most recently from 48cc1ab to fc451d6 Compare November 4, 2025 19:35
@danehans danehans changed the title Inference: Fix EPP Endpoint Sync Inference: Fix EPP Endpoint Sync and Race Condition Nov 4, 2025
- Stores endpoints via atomic.Value and adds setEndpoints/getEndpoints to snapshot
  safely without locks.
- Updates Equals to compare endpoint snapshots without locks, fixing race condition
  in krt.Equal/DeepEqual.
- Switches error handling to hasErrors/snapshotErrors/setErrors. The backend path now
  returns empty ClusterLoadAssinment when errors exist.
- Updates tests to seed errors via setErrors and avoid direct field access.
- Keeps DP collection returning Backend IR on empty endpoints and relaxes route pass
  to allow empty endpoint sets.
- Passes errors to DP to ensure cluster management.

Signed-off-by: Daneyon Hansen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/fix Categorizes issue or PR as related to a bug. release-note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant