Add defaultServiceAccountName option in Helm #1405

Open
wants to merge 1 commit into
base: main

jason810496

Why

I'm using Spark Operator with Enterprise Gateway, with the kernel.shareGatewayNamespace=true option enabled via Helm.
The Enterprise Gateway runs under the enterprise-gateway namespace, and I need to set EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME="spark-operator-spark".

Without this, the Driver Pods launched by the Spark Operator encounter the following error:

25/05/01 07:52:21 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/05/01 07:52:21 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-bdda30968acbb132-driver-svc.enterprise-gateway.svc:4040
25/05/01 07:52:21 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
25/05/01 07:52:21 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2979)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:559)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:750)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/enterprise-gateway/pods/jovyan-7927a7ed-7627-4639-bf30-0118614dc390-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "jovyan-7927a7ed-7627-4639-bf30-0118614dc390-driver" is forbidden: User "system:serviceaccount:enterprise-gateway:default" cannot get resource "pods" in API group "" in the namespace "enterprise-gateway".
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:639)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:576)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:543)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:471)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:453)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:947)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:221)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:187)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:86)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:79)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:78)
        at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:118)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2973)
        ... 14 more
25/05/01 07:52:21 INFO SparkUI: Stopped Spark web UI at http://spark-bdda30968acbb132-driver-svc.enterprise-gateway.svc:4040
25/05/01 07:52:21 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

What

Currently, I have to manually patch the Enterprise Gateway deployment to include EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME="spark-operator-spark" in order to resolve service account permission issues.

To make this easier and more configurable, we should introduce a Helm option to set the EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME environment variable directly.

@patilsanket48

@jason810496 you can set the KERNEL_SERVICE_ACCOUNT_NAME variable directly and it should work. Or do you have a specific use case?

jason810496 (Author) commented May 8, 2025

> @jason810496 you can directly set KERNEL_SERVICE_ACCOUNT_NAME variable and it should work or you have any specific case ?

Actually, I worked around it by setting EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME through extraEnv.
However, I think it would be more user-friendly if all EG_DEFAULT_KERNEL_* environment variables could be configured directly via Helm values.

extraEnv: {
    EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME: "spark-operator-spark"
}

lresende (Member) commented May 8, 2025

Both ways should accomplish the same thing. The idea is that you can set a global default and still override it at the kernel level. @jason810496, this looks good, thanks for the contribution. I need to find some time to go over the build issues (not related to your changes) before I can merge your PR.
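
For readers following along, the precedence described above (a gateway-wide default that a single kernel can override) can be sketched in a few lines of Python. This is only an illustration of the idea, not Enterprise Gateway's actual code: the function name `resolve_service_account` and the explicit `gateway_env` parameter are hypothetical, and the final fallback to `default` simply mirrors the `system:serviceaccount:enterprise-gateway:default` identity seen in the error log.

```python
import os
from typing import Mapping, Optional


def resolve_service_account(
    kernel_env: Mapping[str, str],
    gateway_env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the service account name for a kernel pod.

    Hypothetical helper, not part of Enterprise Gateway's API.
    Precedence: kernel-level value, then gateway-wide default,
    then the namespace's "default" service account.
    """
    if gateway_env is None:
        gateway_env = os.environ

    # 1. A value supplied for this specific kernel wins.
    kernel_level = kernel_env.get("KERNEL_SERVICE_ACCOUNT_NAME")
    if kernel_level:
        return kernel_level

    # 2. Otherwise use the gateway-wide default, if one is set.
    gateway_default = gateway_env.get("EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME")
    if gateway_default:
        return gateway_default

    # 3. With nothing configured, the pod runs as "default".
    return "default"


if __name__ == "__main__":
    gateway = {"EG_DEFAULT_KERNEL_SERVICE_ACCOUNT_NAME": "spark-operator-spark"}
    print(resolve_service_account({}, gateway))
    print(resolve_service_account({"KERNEL_SERVICE_ACCOUNT_NAME": "custom-sa"}, gateway))
```

With only the gateway-wide value set, every kernel picks up `spark-operator-spark`; a kernel that carries its own `KERNEL_SERVICE_ACCOUNT_NAME` keeps that value instead.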

3 participants