Spark K8s Constructor — Quick Reference
Date: 2025-01-26
Version: 0.1.0
Spark: 3.5.7, 4.1.0
Quick reference for Spark K8s Constructor commands, recipes, and presets.
# Install Spark 4.1
helm install spark charts/spark-4.1 -n spark --create-namespace
# Install Spark 3.5 Connect
helm install spark-connect charts/spark-3.5/charts/spark-connect -n spark
# Install Spark 3.5 Standalone
helm install spark-standalone charts/spark-3.5/charts/spark-standalone -n spark
# Upgrade release
helm upgrade spark charts/spark-4.1 -n spark
# Uninstall
helm uninstall spark -n spark
# Get values
helm get values spark -n spark
Installation with Presets
# Spark 4.1
helm install spark charts/spark-4.1 \
-f charts/spark-4.1/values-scenario-jupyter-connect-k8s.yaml \
-n spark
# Spark 3.5
helm install spark-connect charts/spark-3.5/charts/spark-connect \
-f charts/spark-3.5/charts/spark-connect/values-scenario-jupyter-connect-k8s.yaml \
-n spark
# Enable components
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.enabled=true \
--set jupyter.enabled=true
# Configure resources
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.resources.requests.memory=2Gi \
--set connect.resources.limits.memory=4Gi
# S3 configuration
helm upgrade spark charts/spark-4.1 -n spark \
--set global.s3.endpoint=http://minio:9000 \
--set global.s3.accessKey=minioadmin \
--set global.s3.secretKey=minioadmin
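A quick way to verify these settings is a write/read round-trip from a Spark Connect session. A minimal sketch, assuming a reachable Connect endpoint (e.g. port-forwarded to localhost:15002) and an existing spark-logs bucket — both assumptions, adjust to your environment:

```python
from pyspark.sql import SparkSession

# Assumed endpoint and bucket name; substitute your own.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.range(10).write.mode("overwrite").parquet("s3a://spark-logs/smoke-test")
print(spark.read.parquet("s3a://spark-logs/smoke-test").count())  # expect 10
```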
# Local validation (helm template renders without installing)
helm template test charts/spark-4.1 -f values.yaml
# Check presets
./scripts/validate-presets.sh
# Check security policies
./scripts/validate-policy.sh
# Show all values
helm show values charts/spark-4.1
# Render manifests to file
helm template spark charts/spark-4.1 -n spark > rendered.yaml
# Render with debug output
helm template spark charts/spark-4.1 -n spark --debug
# Release status
helm status spark -n spark
# Release history
helm history spark -n spark
# Enable/disable Spark Connect
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.enabled=true
# Backend mode: k8s (dynamic executors)
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.backendMode=k8s
# Backend mode: standalone (fixed cluster)
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.backendMode=standalone \
--set connect.standalone.masterService=spark-sa-spark-standalone-master
# Backend mode: operator (Spark Operator)
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.backendMode=operator
# Spark 3.5 - Master + Workers
helm install spark-standalone charts/spark-3.5/charts/spark-standalone \
-n spark \
--set sparkMaster.enabled=true \
--set sparkWorker.replicas=3
# Spark 4.1 - via Connect
helm install spark charts/spark-4.1 -n spark \
--set connect.enabled=true \
--set connect.backendMode=standalone \
--set standalone.enabled=true
# Enable History Server
helm upgrade spark charts/spark-4.1 -n spark \
--set historyServer.enabled=true
# Configure event log
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.eventLog.enabled=true \
--set connect.eventLog.dir=s3a://spark-logs/4.1/events
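These chart values presumably render to Spark's standard spark.eventLog.enabled and spark.eventLog.dir settings (an assumption about the chart's templating); if so, a Connect session should echo them back:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
# Standard Spark conf keys; the default avoids an error if a conf is unset.
print(spark.conf.get("spark.eventLog.enabled", "unset"))
print(spark.conf.get("spark.eventLog.dir", "unset"))
```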
# Enable Jupyter
helm upgrade spark charts/spark-4.1 -n spark \
--set jupyter.enabled=true
# Set Connect URL
helm upgrade spark charts/spark-4.1 -n spark \
--set jupyter.env.SPARK_CONNECT_URL=sc://spark-connect:15002
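Inside the notebook, the variable can be consumed like this; a minimal sketch, where the fallback URL is a hypothetical default:

```python
import os
from pyspark.sql import SparkSession

# Fallback is hypothetical; the in-cluster service name may differ.
url = os.environ.get("SPARK_CONNECT_URL", "sc://spark-connect:15002")
spark = SparkSession.builder.remote(url).getOrCreate()
spark.range(5).show()
```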
# Check driver logs
kubectl logs -n spark spark-driver-xxx -c spark-kubernetes-driver
# Describe driver pod
kubectl describe pod -n spark spark-driver-xxx
# Check events
kubectl get events -n spark --sort-by='.lastTimestamp'
# Diagnostic script
./scripts/recipes/troubleshoot/check-driver-logs.sh spark
# Test connection
./scripts/recipes/troubleshoot/test-s3-connection.sh spark
# Check secret
kubectl get secret -n spark s3-credentials -o yaml
# Test from pod
kubectl exec -n spark spark-connect-0 -- curl -sf http://minio:9000/minio/health/live
# Find OOMKilled pods
./scripts/recipes/troubleshoot/check-executor-logs.sh spark
# Increase executor memory
helm upgrade spark charts/spark-4.1 -n spark \
--set connect.sparkConf.'spark\.executor\.memory'=4g \
--set connect.sparkConf.'spark\.executor\.memoryOverhead'=1g
# Check event log
kubectl exec -n spark spark-history-server-0 -- ls -la /spark-events
# Check configuration
kubectl logs -n spark spark-history-server-0
# See recipe
cat docs/recipes/troubleshoot/history-server-empty.md
# Check permissions
./scripts/recipes/troubleshoot/check-rbac.sh spark spark
# Test can-i
kubectl auth can-i create pods -n spark \
--as=system:serviceaccount:spark:spark
# List all Spark pods
kubectl get pods -n spark -l 'app in (spark-connect,spark-standalone)'
# Driver pods
kubectl get pods -n spark -l spark-role=driver
# Executor pods
kubectl get pods -n spark -l spark-role=executor
# Pods with errors
kubectl get pods -n spark | grep -E '(Error|CrashLoopBackOff|OOMKilled)'
# Logs (last 100 lines)
kubectl logs -n spark <pod> --tail=100
# Follow logs
kubectl logs -n spark <pod> -f
# Logs from all containers
kubectl logs -n spark <pod> --all-containers=true
# Spark UI (driver on 4040)
kubectl port-forward -n spark <driver-pod> 4040:4040
# Spark Connect (15002)
kubectl port-forward -n spark svc/spark-connect 15002:15002
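With that port-forward active, a local PySpark client can attach directly; a minimal sketch:

```python
from pyspark.sql import SparkSession

# Requires pyspark with Spark Connect support (3.4+).
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
print(spark.version)
```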
# Jupyter (8888)
kubectl port-forward -n spark svc/jupyter 8888:8888
# History Server (18080)
kubectl port-forward -n spark svc/spark-history-server 18080:18080
# MinIO Console (9001)
kubectl port-forward -n spark svc/minio 9001:9001
# All services
kubectl get svc -n spark
# Service endpoints
kubectl get endpoints -n spark spark-connect
# Ingress
kubectl get ingress -n spark
# ConfigMap
kubectl get cm -n spark spark-connect-configmap -o yaml
# Secrets
kubectl get secrets -n spark
# Print the rendered spark-defaults.conf from the ConfigMap
kubectl get cm -n spark spark-connect-configmap -o jsonpath='{.data.spark-defaults\.conf}'
Spark Connect Variables

| Variable | Description |
|---|---|
| SPARK_CONNECT_URL | Spark Connect URL (sc://host:port) |
| SPARK_REMOTE | Alias for SPARK_CONNECT_URL |
| SPARK_HOME | Path to Spark |
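Recent PySpark versions (3.4+) pick up SPARK_REMOTE automatically, so a plain getOrCreate() attaches to Connect; a sketch with an assumed service URL:

```python
import os

# Normally set in the pod spec or shell; shown inline for illustration.
os.environ["SPARK_REMOTE"] = "sc://spark-connect:15002"

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # honors SPARK_REMOTE
```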
S3 Variables

| Variable | Description |
|---|---|
| AWS_ACCESS_KEY_ID | Access key |
| AWS_SECRET_ACCESS_KEY | Secret key |
| SPARK_S3_ACCESS_KEY | Spark S3 access key |
| SPARK_S3_SECRET_KEY | Spark S3 secret key |
| SPARK_S3_ENDPOINT | S3 endpoint URL |
Hive Metastore Variables

| Variable | Description |
|---|---|
| HIVE_METASTORE_URIS | Metastore thrift URIs |
| HIVE_METASTORE_WAREHOUSE_DIR | Warehouse location |
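If the metastore variables are wired through, a quick sanity check from a Connect session is to list databases; a minimal sketch, assuming a reachable endpoint:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW DATABASES").show()  # should list metastore databases
```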
Spark 4.1 Presets

| Preset | Description | Components |
|---|---|---|
| values-scenario-jupyter-connect-k8s.yaml | Jupyter + Connect + K8s backend | Connect, Jupyter |
| values-scenario-jupyter-connect-standalone.yaml | Jupyter + Connect + Standalone | Connect, Jupyter, Standalone |
| values-scenario-airflow-connect-k8s.yaml | Airflow + Connect + K8s backend | Connect, Airflow |
| values-scenario-airflow-connect-standalone.yaml | Airflow + Connect + Standalone | Connect, Airflow, Standalone |
| values-scenario-airflow-k8s-submit.yaml | Airflow + K8s submit mode | Airflow |
| values-scenario-airflow-operator.yaml | Airflow + Spark Operator | Airflow, Operator |
Spark 3.5 Connect Presets

| Preset | Description | Components |
|---|---|---|
| values-scenario-jupyter-connect-k8s.yaml | Jupyter + Connect + K8s | Connect, Jupyter |
| values-scenario-jupyter-connect-standalone.yaml | Jupyter + Connect + Standalone | Connect, Jupyter, Standalone |
Spark 3.5 Standalone Presets

| Preset | Description | Components |
|---|---|---|
| values-scenario-airflow-connect.yaml | Airflow + Standalone | Standalone, Airflow |
| values-scenario-airflow-k8s-submit.yaml | Airflow + K8s submit | Airflow |
| values-scenario-airflow-operator.yaml | Airflow + Operator | Airflow |
# Copy local script to pod
kubectl cp ./script.py spark-driver-xxx:/tmp/script.py -n spark
# Copy from pod
kubectl cp spark-driver-xxx:/tmp/output.csv ./output.csv -n spark
# Execute in pod
kubectl exec -n spark spark-connect-0 -- python -c "print('test')"
# Interactive shell
kubectl exec -n spark spark-connect-0 -it -- /bin/bash
# Edit ConfigMap in vi
kubectl edit cm -n spark spark-connect-configmap