Open
Labels
kind/bug (Something isn't working)
Description
What happened?
- ✋ I have searched the open/closed issues and my issue is not listed.
When I restart spark-operator, a job sometimes fails with the error "driver pod already exist". I checked the Kubernetes events and found the following sequence:
- T0: event SparkApplicationSubmitted is emitted
- T1: the old spark-operator exits
- T2: the new spark-operator starts
- T3: event SparkApplicationAdded is emitted
- T4: event SparkApplicationSubmissionFailed is emitted
- T5: event SparkApplicationFailed is emitted
My guess is that this bug happens when spark-operator exits abruptly after the spark-submit command has completed but before updateSparkApplicationStatus has run, so the SparkApplication status is still "new" (""). When the new spark-operator comes up, it sees the "new" status and tries to submit the application again, even though the driver pod from the first submission already exists.
Reproduction Code
Keep submitting a large number of jobs and restart spark-operator while submissions are still in progress.
Expected behavior
All jobs start successfully, even if spark-operator is restarted during submission.
Actual behavior
Some jobs fail with the error "driver pod already exist".
Environment & Versions
- Kubernetes Version: 1.33
- Spark Operator Version: 2.3.0
- Apache Spark Version:
Additional context
No response
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍