
[BUG] No resource quota support in Spark Connect #2790

@Vadim-elo

Description

What happened?

  • When the Spark Connect server pod fails to start because of a resource quota, the quota error is visible only in the Spark operator controller logs. Unlike SparkApplication, the SparkConnect custom resource gets no events or error status.
  • Additionally, the Spark Connect server pod can start if the quota has room for it, but the executor pods then fail to start once the remaining quota is exhausted.
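The second point can be illustrated with a small simulation. This is a minimal sketch under a simplified model (CPU and memory requests only, no memory overhead or limits), with hypothetical names and sizes; it is not the operator's code. Admitting pods one at a time lets the server start while no executor fits. An up-front check of the combined request (server plus all executors), the kind of check the report asks to have applied to SparkConnect as it is to SparkApplication, would reject the whole request immediately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Simplified resource request: CPU in millicores, memory in MiB."""
    cpu_m: int
    mem_mi: int

    def __add__(self, other: "Request") -> "Request":
        return Request(self.cpu_m + other.cpu_m, self.mem_mi + other.mem_mi)

    def __sub__(self, other: "Request") -> "Request":
        return Request(self.cpu_m - other.cpu_m, self.mem_mi - other.mem_mi)

    def fits_in(self, available: "Request") -> bool:
        return self.cpu_m <= available.cpu_m and self.mem_mi <= available.mem_mi


def simulate(quota: Request, server: Request, executor: Request, instances: int) -> dict:
    """Admit pods one at a time against the quota (hypothetical model) and
    compare the outcome with an up-front check of the total request."""
    remaining = quota
    server_started = server.fits_in(remaining)
    executors_started = 0
    if server_started:
        remaining = remaining - server
        # Executors are only requested once the server is running.
        for _ in range(instances):
            if not executor.fits_in(remaining):
                break
            remaining = remaining - executor
            executors_started += 1
    # Total request: the server plus every executor instance.
    total = server
    for _ in range(instances):
        total = total + executor
    return {
        "server_started": server_started,
        "executors_started": executors_started,
        "total_fits_quota": total.fits_in(quota),
    }


# Hypothetical sizes: 2 CPU / 4 GiB quota, 1.5 CPU / 3 GiB server,
# two executors of 1 CPU / 2 GiB each.
result = simulate(Request(2000, 4096), Request(1500, 3072), Request(1000, 2048), 2)
print(result)
# {'server_started': True, 'executors_started': 0, 'total_fits_quota': False}
```

With these numbers the server is admitted and leaves 500m CPU / 1 GiB, so no executor can start, while the total check shows that the request could never have fit.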

Reproduction Code

Enable resource quota enforcement on the Spark operator webhook (`-enable-resource-quota-enforcement=true`), then create a SparkConnect resource that requests more resources than the namespace's resource quota allows.
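The reproduction needs a namespace-level quota for the SparkConnect requests to exceed. Below is a minimal sketch, assuming a hypothetical namespace, quota name, and limits; it uses only the standard Kubernetes `ResourceQuota` fields and does not cover the SparkConnect manifest itself, whose spec is not shown in this report.

```python
import json

# Hypothetical namespace and limits: choose limits below the combined
# requests of the Spark Connect server and its executors.
NAMESPACE = "spark-connect-test"


def resource_quota(name: str, namespace: str, cpu: str, memory: str) -> dict:
    """Build a core/v1 ResourceQuota manifest that caps CPU and memory requests."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
            }
        },
    }


if __name__ == "__main__":
    manifest = resource_quota("spark-connect-quota", NAMESPACE, cpu="2", memory="4Gi")
    # kubectl accepts JSON manifests, e.g.: python quota.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```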

Expected behavior

The quota error should be reported on the SparkConnect resource (as events and in its status), as it already is for SparkApplication.

Actual behavior

The SparkConnect resource's status stays empty and never changes, no pods start, and the Spark operator keeps retrying to start the Spark Connect server in a loop.

Environment & Versions

  • Kubernetes Version: 1.25.11
  • Spark Operator Version: 2.3.0
  • Apache Spark Version: >= 3.5.0

Additional context

Spark Connect needs full resource quota support, as SparkApplication already has.

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍.

Metadata

Labels

kind/bug (Something isn't working)