Status: Open
Labels: kind/bug (Something isn't working)
Description
What happened?
- When a server pod fails to run due to a resource quota, the quota error is visible only in the Spark operator logs. No events or error conditions are recorded on the SparkConnect CRD, as is done for SparkApplication.
- Additionally, the Spark Connect server pod can start if enough resources are available, but the executor pods will not start once the remaining quota is exhausted.
Reproduction Code
Add `-enable-resource-quota-enforcement=true` to the Spark operator webhook and configure a SparkConnect resource so that it requests more resources than the namespace quota allows.
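A minimal reproduction sketch. The ResourceQuota manifest is standard Kubernetes; the SparkConnect manifest is an assumption based on the operator's v1alpha1 API, so the exact field names, namespace, and resource values are illustrative and may need adjusting for your cluster:

```yaml
# Namespace quota small enough that the request below cannot be satisfied.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-quota            # illustrative name
  namespace: spark-jobs        # illustrative namespace
spec:
  hard:
    limits.cpu: "2"
    limits.memory: 4Gi
---
# Hypothetical SparkConnect resource requesting more CPU than the quota permits.
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect-quota-test
  namespace: spark-jobs
spec:
  sparkVersion: 3.5.0
  server:
    cores: 4                   # exceeds the 2-CPU quota above
    memory: 8g
```

With quota enforcement enabled on the webhook, applying these manifests should reproduce the silent failure: the operator logs show the quota rejection, but the SparkConnect status stays empty.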
Expected behavior
The quota error should be surfaced on the SparkConnect CRD, e.g. as an event or a status condition.
Actual behavior
The SparkConnect CRD is stuck in an empty state: the pods do not start, the status never changes, and the Spark operator cyclically retries starting Spark Connect.
Environment & Versions
- Kubernetes Version: 1.25.11
- Spark Operator Version: 2.3.0
- Apache Spark Version: >= 3.5.0
Additional context
Full resource-quota support is needed for SparkConnect, matching what already exists for SparkApplication.