Commit 271a002

feat: add oauth2-proxy sidecar support (#77)
### What type of PR is this?

Enhancement

### What this PR does / why we need it:

Adds comprehensive oauth2-proxy sidecar support for Spark Driver UI authentication.

OAuth2-proxy Integration: Implements native OAuth2-proxy sidecar containers with 35+ configuration options, supporting OIDC discovery, custom providers, and secure cookie management. Enables authenticated access to Spark Driver UI through configurable authentication providers.

Ingress Support: Adds TLS-enabled ingress configuration with intelligent port routing - OAuth proxy serves on port 4180, Spark UI on port 4040, with proper precedence resolution (CLI > Template > Default).

### Does this PR introduce a user-facing change?

Check the `docs/architecture.md` and `docs/ui.md` for changes. `spark.armada.oauth` contains configuration settings for the oauth proxy.

---------

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
1 parent 27aefe0 commit 271a002

13 files changed: +1128 -137 lines changed

docs/architecture.md

Lines changed: 44 additions & 0 deletions
@@ -235,6 +235,50 @@ They can be set in the [conf](../conf/spark-defaults.conf) file.
Annotations should be in the format key=value, e.g. `nginx.ingress.kubernetes.io/rewrite-target=/`.
- `spark.armada.driver.ingress.certName` - The name of the TLS certificate to use for the Ingress resource.
This is used when `spark.armada.driver.ingress.tls.enabled` is set to true.
- `spark.armada.driver.ingress.port` - The port to expose via Ingress. If not set, defaults to OAuth proxy port (if enabled) or Spark UI port.
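
The fallback described for `spark.armada.driver.ingress.port` can be sketched as a small shell function. This is only an illustration of the documented behaviour, not the project's implementation: `resolve_ingress_port` is a hypothetical helper, and the default ports 4180 (OAuth proxy) and 4040 (Spark UI) are the ones named in the PR description.

```bash
# Hypothetical helper (not part of armada-spark): mirrors the documented
# fallback for spark.armada.driver.ingress.port.
resolve_ingress_port() {
  local explicit_port="$1"       # spark.armada.driver.ingress.port, may be empty
  local oauth_enabled="$2"       # spark.armada.oauth.enabled ("true" or "false")
  local oauth_port="${3:-4180}"  # spark.armada.oauth.proxy.port (assumed default)
  local ui_port="${4:-4040}"     # Spark UI port (assumed default)

  if [ -n "$explicit_port" ]; then
    echo "$explicit_port"        # an explicitly configured port always wins
  elif [ "$oauth_enabled" = "true" ]; then
    echo "$oauth_port"           # otherwise route to the OAuth proxy if enabled
  else
    echo "$ui_port"              # otherwise expose the Spark UI directly
  fi
}

resolve_ingress_port "" true     # no explicit port, OAuth enabled
resolve_ingress_port "" false    # no explicit port, OAuth disabled
resolve_ingress_port 8443 true   # explicit port set
```

With OAuth enabled and no explicit port, the Ingress targets the proxy rather than the Spark UI, so unauthenticated traffic does not reach port 4040 through the Ingress.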

### OAuth2 Authentication Configuration

`armada-spark` supports OAuth2-based authentication for the Spark Driver WebUI using OAuth2-Proxy as a native sidecar.
For detailed setup instructions and examples, see [UI Access Documentation](./ui.md).

- `spark.armada.oauth.enabled` - Enable OAuth2 authentication for Spark UI.
- `spark.armada.oauth.clientId` - OAuth2 client ID.
- `spark.armada.oauth.clientSecret` - OAuth2 client secret.
- `spark.armada.oauth.clientSecretK8s` - Name of Kubernetes secret containing client secret.
- `spark.armada.oauth.issuerUrl` - OIDC issuer URL.
- `spark.armada.oauth.redirectUrl` - OAuth redirect URL.
- `spark.armada.oauth.proxy.image` - OAuth2-proxy Docker image.
- `spark.armada.oauth.proxy.port` - Port for OAuth2-proxy to listen on.
- `spark.armada.oauth.providerDisplayName` - Provider name shown in OAuth UI.
- `spark.armada.oauth.skipProviderDiscovery` - Skip OIDC discovery and use explicit endpoints.
- `spark.armada.oauth.loginUrl` - OIDC authorization endpoint.
- `spark.armada.oauth.redeemUrl` - OIDC token endpoint.
- `spark.armada.oauth.validateUrl` - OIDC userinfo endpoint.
- `spark.armada.oauth.jwksUrl` - OIDC JWKS endpoint.
- `spark.armada.oauth.extraAudiences` - Comma-separated list of additional OIDC audiences.
- `spark.armada.oauth.emailDomain` - Allowed email domains.
- `spark.armada.oauth.skipJwtBearerTokens` - Skip JWT bearer token validation.
- `spark.armada.oauth.skipProviderButton` - Skip provider selection button.
- `spark.armada.oauth.skipAuthPreflight` - Skip authentication for OPTIONS requests.
- `spark.armada.oauth.passHostHeader` - Pass Host header to upstream.
- `spark.armada.oauth.whitelistDomain` - Whitelist redirect domains.
- `spark.armada.oauth.cookieName` - OAuth session cookie name.
- `spark.armada.oauth.cookiePath` - Cookie path.
- `spark.armada.oauth.cookieSecure` - Require HTTPS for cookies.
- `spark.armada.oauth.cookieSamesite` - SameSite cookie attribute.
- `spark.armada.oauth.cookieCsrfPerRequest` - Enable CSRF per request.
- `spark.armada.oauth.cookieCsrfExpire` - CSRF cookie expiration duration.
- `spark.armada.oauth.tls.caCertPath` - Path to CA certificate for custom TLS validation.
- `spark.armada.oauth.tls.caBundlePath` - Path to CA bundle for custom TLS validation.
- `spark.armada.oauth.skipVerify` - Skip TLS certificate verification.
- `spark.armada.oauth.insecureSkipIssuerVerification` - Skip OIDC issuer verification.
- `spark.armada.oauth.insecureAllowUnverifiedEmail` - Allow unverified email addresses.
- `spark.armada.oauth.codeChallengeMethod` - PKCE code challenge method.
- `spark.armada.oauth.resources.cpu` - CPU resource limit/request for OAuth proxy.
- `spark.armada.oauth.resources.memory` - Memory resource limit/request for OAuth proxy.

See [UI Access Documentation](./ui.md) for examples and troubleshooting.

# Building `armada-spark`

docs/ui.md

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# Spark Driver UI Access

## Direct Access

### Port-forward (simplest)

```bash
kubectl -n <namespace> port-forward <driver-pod-name> 4040:4040
```

Then open: `http://localhost:4040`

**Finding the pod:** Check the Lookout UI for job details. The pod name is typically `armada-<job-id>-0`.
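
If the typical `armada-<job-id>-0` naming holds for your job, the port-forward command can be assembled from the job ID alone. The sketch below is illustrative: `driver_pod_name` and `port_forward_cmd` are hypothetical helpers, and the namespace and job ID in the example call are placeholders. The command is printed rather than executed so it can be checked against Lookout first.

```bash
# Hypothetical helper: builds the typical driver pod name from an Armada job ID.
# The naming pattern is "typical", not guaranteed, so verify it in Lookout.
driver_pod_name() {
  printf 'armada-%s-0\n' "$1"
}

# Hypothetical helper: prints (does not run) the port-forward command
# for the Spark UI on port 4040.
port_forward_cmd() {
  local namespace="$1"
  local job_id="$2"
  printf 'kubectl -n %s port-forward %s 4040:4040\n' \
    "$namespace" "$(driver_pod_name "$job_id")"
}

# Placeholder namespace and job ID, for illustration only.
port_forward_cmd spark-jobs 01hexamplejobid
```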

### Basic Ingress (no auth)

```bash
--conf spark.armada.driver.ingress.enabled=true
```

**Warning:** This exposes the UI without authentication to anyone who can reach the Ingress!

---

## OAuth2-Protected Access

Uses [oauth2-proxy](https://oauth2-proxy.github.io/oauth2-proxy/) as a native sidecar for authentication.

### Quick Start

```bash
/opt/spark/bin/spark-submit \
  --master armada://localhost:50051 \
  --deploy-mode cluster \
  --name my-secure-job \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.armada.container.image=armada-spark \
  --conf spark.armada.oauth.enabled=true \
  --conf spark.armada.oauth.clientId=spark-oauth-client \
  --conf spark.armada.oauth.clientSecret=your-secret \
  --conf spark.armada.oauth.issuerUrl=https://keycloak.example.com/realms/spark \
  --conf spark.armada.driver.ingress.enabled=true \
  --conf spark.armada.driver.ingress.tls.enabled=true \
  --conf spark.armada.driver.ingress.certName=my-tls-cert \
  local:///opt/spark/examples/jars/spark-examples.jar
```

**What happens:**
1. An `oauth` sidecar container is added to the driver pod.
2. Ingress → oauth2-proxy (port 4180) → authenticates user → Spark UI (localhost:4040).
3. oauth2-proxy terminates when the driver completes.

### Configuration Examples

See [OAuth2 Authentication Configuration](./architecture.md#oauth2-authentication-configuration) for all parameters.

**Using OIDC discovery:**
```bash
--conf spark.armada.oauth.enabled=true \
--conf spark.armada.oauth.clientId=my-client \
--conf spark.armada.oauth.clientSecret=my-secret \
--conf spark.armada.oauth.issuerUrl=https://provider.com/realms/spark \
--conf spark.armada.driver.ingress.enabled=true \
--conf spark.armada.driver.ingress.tls.enabled=true
```

**Manual endpoints (no discovery):**
```bash
--conf spark.armada.oauth.enabled=true \
--conf spark.armada.oauth.skipProviderDiscovery=true \
--conf spark.armada.oauth.loginUrl=https://provider.com/auth \
--conf spark.armada.oauth.redeemUrl=http://provider.svc.cluster.local/token \
--conf spark.armada.oauth.validateUrl=http://provider.svc.cluster.local/userinfo \
--conf spark.armada.oauth.jwksUrl=http://provider.svc.cluster.local/certs
```

**Use cluster-internal URLs** for `redeemUrl`/`validateUrl`/`jwksUrl`, and the external URL for `loginUrl`: the user's browser is redirected to `loginUrl`, while the proxy calls the other endpoints itself from inside the cluster.
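
The external/internal split can be made explicit by generating the flags from two base URLs. This is a sketch under assumptions: `oauth_endpoint_flags` is a hypothetical helper, and the `/auth`, `/token`, `/userinfo` and `/certs` paths are copied from the example above; real providers use their own paths.

```bash
# Hypothetical helper: prints --conf flags for manual endpoint configuration.
# Endpoint paths follow the example above and vary between providers.
oauth_endpoint_flags() {
  local external_base="$1"  # reachable from the user's browser
  local internal_base="$2"  # reachable from inside the cluster only

  printf '%s\n' "--conf spark.armada.oauth.skipProviderDiscovery=true"
  # The browser is redirected here, so it must be the external URL.
  printf '%s\n' "--conf spark.armada.oauth.loginUrl=${external_base}/auth"
  # These are called by oauth2-proxy itself, so cluster-internal URLs work.
  printf '%s\n' "--conf spark.armada.oauth.redeemUrl=${internal_base}/token"
  printf '%s\n' "--conf spark.armada.oauth.validateUrl=${internal_base}/userinfo"
  printf '%s\n' "--conf spark.armada.oauth.jwksUrl=${internal_base}/certs"
}

oauth_endpoint_flags https://provider.com http://provider.svc.cluster.local
```

The output is one flag per line, ready to be pasted into a `spark-submit` invocation with line continuations added.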

**Using K8s secrets (recommended):**
```bash
# Create the secret in the namespace where the job runs:
kubectl create secret generic spark-oauth-secret \
  --from-literal=client-secret=your-secret -n spark-jobs

# Then reference the secret by name in the spark-submit flags:
--conf spark.armada.oauth.clientId=my-client \
--conf spark.armada.oauth.clientSecretK8s=spark-oauth-secret
```

---

## Troubleshooting

### 502 Bad Gateway after login

**Cause:** The Spark UI is not running (the job finished too quickly or the UI is disabled).

**Check logs:**
```bash
kubectl logs -n <namespace> <driver-pod> -c oauth
```

Look for: `Error proxying to upstream server: dial tcp 127.0.0.1:4040: connect: connection refused`
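
To avoid scanning the logs by eye, the check can be scripted. The sketch below assumes the error text matches the line quoted above; `upstream_refused` is a hypothetical helper that reads log text on stdin, and the sample line stands in for real `kubectl logs` output.

```bash
# Hypothetical helper: succeeds if the log text on stdin contains the
# connection-refused error for the Spark UI upstream (127.0.0.1:4040).
upstream_refused() {
  grep -qF 'dial tcp 127.0.0.1:4040: connect: connection refused'
}

# Sample log line taken from the message quoted above.
sample='Error proxying to upstream server: dial tcp 127.0.0.1:4040: connect: connection refused'

if printf '%s\n' "$sample" | upstream_refused; then
  echo "oauth2-proxy is up, but nothing is listening on 127.0.0.1:4040"
fi
```

Against a live pod, the same function can be fed from `kubectl logs -n <namespace> <driver-pod> -c oauth | upstream_refused`.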

**Solutions:**
- Use a longer-running job (Spark Pi finishes in seconds)
- By default, the Spark UI has a 90s delay after job completion
- Verify `spark.ui.enabled=true` (default)

### Authentication keeps redirecting

**Cause:** Cookie configuration or OIDC provider issues.

**Solutions:**
```bash
# For HTTP (dev only):
--conf spark.armada.oauth.cookieSecure=false

# Check SameSite:
--conf spark.armada.oauth.cookieSamesite=lax

# Verify redirect URL matches OIDC provider config:
--conf spark.armada.oauth.redirectUrl=https://your-host/oauth2/callback
```
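
The redirect URL and the secure-cookie setting both follow from the URL users reach the proxy through, so they can be derived together. This is a sketch, not project code: `oauth_cookie_flags` is a hypothetical helper, the host names are placeholders, and the `/oauth2/callback` path is the one shown in the example above. The reasoning behind the first fix is that browsers do not send `Secure` cookies over plain HTTP, so the session cookie never comes back and the login loops.

```bash
# Hypothetical helper: derives redirectUrl and cookieSecure from the
# public base URL of the proxy.
oauth_cookie_flags() {
  local base_url="${1%/}"  # drop a single trailing slash, if present
  local secure=true

  case "$base_url" in
    http://*) secure=false ;;  # plain HTTP: Secure cookies would be dropped
  esac

  printf '%s\n' "--conf spark.armada.oauth.redirectUrl=${base_url}/oauth2/callback"
  printf '%s\n' "--conf spark.armada.oauth.cookieSecure=${secure}"
}

oauth_cookie_flags https://spark-ui.example.com/  # placeholder production host
oauth_cookie_flags http://localhost:4180          # placeholder dev setup
```

The generated `redirectUrl` still has to match the redirect URI registered with the OIDC provider exactly.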

### Finding ingress URL

In Lookout, the Ingress URL appears under the Result tab as soon as the Job is leased to a Cluster and bound to a Node.

Alternatively, look up the Ingress in the namespace where the Job is scheduled:
```bash
kubectl get ingress -n <namespace>
# Output: oauth-4180-armada-<job-id>-0.namespace.svc
```

---

## Resources

- [OAuth2 Configuration Reference](./architecture.md#oauth2-authentication-configuration)
- [oauth2-proxy docs](https://oauth2-proxy.github.io/oauth2-proxy/)
- [Spark UI docs](https://spark.apache.org/docs/latest/web-ui.html)
