Skip to content

Commit c3657cd

Browse files
committed
fix(auth): harden sandbox identity checks
Require persisted sandbox records before IssueSandboxToken and RefreshSandboxToken mint gateway JWTs. This closes the stale-token path where a deleted sandbox identity could continue refreshing itself until token expiry windows were repeatedly extended. Pin PushSandboxLogs streams to the first validated sandbox id. A sandbox now validates scope and sandbox existence once, then any later batch that changes sandbox_id is rejected instead of being accepted under the original validation. For Kubernetes bootstrap, add service_account_name to the Kubernetes driver config, set it on sandbox pod specs, and require TokenReview usernames to match system:serviceaccount:<sandbox-namespace>:<service-account>. The Helm chart provisions a dedicated sandbox ServiceAccount, places it in the sandbox namespace, scopes sandbox RBAC there, and writes the generated name into gateway.toml. Update Helm unit coverage, Helm README, gateway/driver docs, architecture notes, and debug-openshell-cluster guidance for the new sandbox ServiceAccount behavior. Validation: mise run pre-commit; Kubernetes smoke e2e via helm-dev-environment/k3d; Docker smoke e2e; Podman smoke e2e.
1 parent 5601273 commit c3657cd

24 files changed

Lines changed: 469 additions & 63 deletions

File tree

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -212,6 +212,18 @@ helm -n openshell get values openshell | grep sandboxNamespace
212212

213213
Then inspect sandbox resources in that namespace.
214214

215+
Check the configured sandbox service account when TokenReview bootstrap or
216+
sandbox registration fails. Helm creates a dedicated sandbox service account by
217+
default and writes it to `[openshell.drivers.kubernetes].service_account_name`;
218+
the gateway rejects projected tokens from other service accounts.
219+
220+
```bash
221+
helm -n openshell get values openshell | grep -A3 sandboxServiceAccount
222+
kubectl -n <sandbox-namespace> get serviceaccount openshell-sandbox
223+
kubectl -n openshell get configmap openshell-config -o jsonpath='{.data.gateway\.toml}'
224+
kubectl -n <sandbox-namespace> get sandbox <sandbox-name> -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
225+
```
226+
215227
### Step 6: Check VM-Backed Gateways
216228

217229
Use the VM driver logs and host diagnostics available in the user's environment. Verify:

architecture/gateway.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -50,10 +50,11 @@ Sandbox secrets are gateway-signed JWTs bound to a single sandbox ID. Docker,
5050
Podman, and VM drivers deliver the initial token through supervisor-only
5151
runtime material; Kubernetes supervisors exchange a projected ServiceAccount
5252
token through `IssueSandboxToken`. The gateway validates that projected token
53-
with Kubernetes `TokenReview`, checks the returned pod binding against the live
54-
pod UID, and reads the pod's sandbox annotation before minting the gateway JWT.
55-
Supervisors renew gateway JWTs in memory before expiry. Older tokens are not
56-
server-revoked; deployments bound replay exposure with short
53+
with Kubernetes `TokenReview`, requires the configured sandbox service account,
54+
checks the returned pod binding against the live pod UID, and reads the pod's
55+
sandbox annotation before minting the gateway JWT. Supervisors renew gateway
56+
JWTs in memory before expiry only while the sandbox record still exists. Older
57+
tokens are not server-revoked; deployments bound replay exposure with short
5758
`gateway_jwt.ttl_secs` lifetimes.
5859

5960
Sandbox JWTs are not user credentials. The gRPC router accepts

crates/openshell-driver-kubernetes/README.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,12 @@ The driver injects gateway callback configuration, sandbox identity, TLS client
3838
material, and the supervisor SSH socket path into the workload. Driver-owned
3939
values must override image-provided environment variables.
4040

41+
Sandbox pods run as `service_account_name` and keep
42+
`automountServiceAccountToken: false`. The only Kubernetes token exposed to the
43+
supervisor is an explicit, audience-bound projected token mounted at
44+
`/var/run/secrets/openshell/token` for the one-shot `IssueSandboxToken`
45+
bootstrap exchange.
46+
4147
The gateway uses the supervisor relay for connect, exec, and file sync. Sandbox
4248
pods do not need direct external ingress for SSH.
4349

crates/openshell-driver-kubernetes/src/config.rs

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,9 @@ use serde::{Deserialize, Serialize};
77
/// Default Kubernetes namespace for sandbox resources.
88
pub const DEFAULT_K8S_NAMESPACE: &str = "openshell";
99

10+
/// Default Kubernetes `ServiceAccount` assigned to sandbox pods.
11+
pub const DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME: &str = "default";
12+
1013
/// Default storage size for the workspace PVC.
1114
pub const DEFAULT_WORKSPACE_STORAGE_SIZE: &str = "2Gi";
1215

@@ -51,6 +54,9 @@ impl std::str::FromStr for SupervisorSideloadMethod {
5154
#[serde(default, deny_unknown_fields)]
5255
pub struct KubernetesComputeConfig {
5356
pub namespace: String,
57+
/// Kubernetes `ServiceAccount` assigned to sandbox pods and accepted by
58+
/// the gateway's `TokenReview` bootstrap authenticator.
59+
pub service_account_name: String,
5460
pub default_image: String,
5561
pub image_pull_policy: String,
5662
/// Image that provides the `openshell-sandbox` supervisor binary.
@@ -91,6 +97,7 @@ impl Default for KubernetesComputeConfig {
9197
fn default() -> Self {
9298
Self {
9399
namespace: DEFAULT_K8S_NAMESPACE.to_string(),
100+
service_account_name: DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME.to_string(),
94101
default_image: openshell_core::image::default_sandbox_image(),
95102
// Default empty so the gateway omits `imagePullPolicy` from pod
96103
// specs and Kubernetes applies its own default (Always for `latest`,
@@ -139,6 +146,15 @@ mod tests {
139146
);
140147
}
141148

149+
#[test]
150+
fn default_service_account_name_is_default() {
151+
let cfg = KubernetesComputeConfig::default();
152+
assert_eq!(
153+
cfg.service_account_name,
154+
DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME
155+
);
156+
}
157+
142158
#[test]
143159
fn serde_override_workspace_storage_size() {
144160
let json = serde_json::json!({
@@ -147,4 +163,13 @@ mod tests {
147163
let cfg: KubernetesComputeConfig = serde_json::from_value(json).unwrap();
148164
assert_eq!(cfg.workspace_default_storage_size, "10Gi");
149165
}
166+
167+
#[test]
168+
fn serde_override_service_account_name() {
169+
let json = serde_json::json!({
170+
"service_account_name": "openshell-sandbox"
171+
});
172+
let cfg: KubernetesComputeConfig = serde_json::from_value(json).unwrap();
173+
assert_eq!(cfg.service_account_name, "openshell-sandbox");
174+
}
150175
}

crates/openshell-driver-kubernetes/src/driver.rs

Lines changed: 38 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,8 @@
44
//! Kubernetes compute driver.
55
66
use crate::config::{
7-
DEFAULT_WORKSPACE_STORAGE_SIZE, KubernetesComputeConfig, SupervisorSideloadMethod,
7+
DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME, DEFAULT_WORKSPACE_STORAGE_SIZE, KubernetesComputeConfig,
8+
SupervisorSideloadMethod,
89
};
910
use futures::{Stream, StreamExt, TryStreamExt};
1011
use k8s_openapi::api::core::v1::{Event as KubeEventObj, Node};
@@ -320,6 +321,7 @@ impl KubernetesComputeDriver {
320321
supervisor_image: &self.config.supervisor_image,
321322
supervisor_image_pull_policy: &self.config.supervisor_image_pull_policy,
322323
supervisor_sideload_method: self.config.supervisor_sideload_method,
324+
service_account_name: &self.config.service_account_name,
323325
sandbox_id: &sandbox.id,
324326
sandbox_name: &sandbox.name,
325327
grpc_endpoint: &self.config.grpc_endpoint,
@@ -1048,6 +1050,7 @@ struct SandboxPodParams<'a> {
10481050
supervisor_image: &'a str,
10491051
supervisor_image_pull_policy: &'a str,
10501052
supervisor_sideload_method: SupervisorSideloadMethod,
1053+
service_account_name: &'a str,
10511054
sandbox_id: &'a str,
10521055
sandbox_name: &'a str,
10531056
grpc_endpoint: &'a str,
@@ -1069,6 +1072,7 @@ impl Default for SandboxPodParams<'_> {
10691072
supervisor_image: "",
10701073
supervisor_image_pull_policy: "",
10711074
supervisor_sideload_method: SupervisorSideloadMethod::default(),
1075+
service_account_name: DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME,
10721076
sandbox_id: "",
10731077
sandbox_name: "",
10741078
grpc_endpoint: "",
@@ -1223,6 +1227,13 @@ fn sandbox_template_to_k8s(
12231227
}
12241228
}
12251229

1230+
if !params.service_account_name.is_empty() {
1231+
spec.insert(
1232+
"serviceAccountName".to_string(),
1233+
serde_json::json!(params.service_account_name),
1234+
);
1235+
}
1236+
12261237
// Disable service account token auto-mounting for security hardening.
12271238
// Sandbox pods should not have access to the Kubernetes API by default.
12281239
spec.insert(
@@ -2492,6 +2503,32 @@ mod tests {
24922503
);
24932504
}
24942505

2506+
#[test]
2507+
fn sandbox_template_sets_configured_service_account_name() {
2508+
let params = SandboxPodParams {
2509+
service_account_name: "openshell-sandbox",
2510+
..Default::default()
2511+
};
2512+
let pod_template = sandbox_template_to_k8s(
2513+
&SandboxTemplate::default(),
2514+
false,
2515+
&std::collections::HashMap::new(),
2516+
true,
2517+
&params,
2518+
);
2519+
2520+
assert_eq!(
2521+
pod_template["spec"]["serviceAccountName"],
2522+
serde_json::json!("openshell-sandbox"),
2523+
"sandbox pods must run under the configured service account"
2524+
);
2525+
assert_eq!(
2526+
pod_template["spec"]["automountServiceAccountToken"],
2527+
serde_json::json!(false),
2528+
"explicit service account selection must not re-enable default token automounting"
2529+
);
2530+
}
2531+
24952532
#[test]
24962533
fn platform_config_bool_extracts_value() {
24972534
let template = SandboxTemplate {

crates/openshell-driver-kubernetes/src/lib.rs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,8 @@ pub mod driver;
66
pub mod grpc;
77

88
pub use config::{
9-
DEFAULT_WORKSPACE_STORAGE_SIZE, KubernetesComputeConfig, SupervisorSideloadMethod,
9+
DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME, DEFAULT_WORKSPACE_STORAGE_SIZE, KubernetesComputeConfig,
10+
SupervisorSideloadMethod,
1011
};
1112
pub use driver::{KubernetesComputeDriver, KubernetesDriverError};
1213
pub use grpc::ComputeDriverService;

crates/openshell-driver-kubernetes/src/main.rs

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,8 @@ use tracing_subscriber::EnvFilter;
1010
use openshell_core::VERSION;
1111
use openshell_core::proto::compute::v1::compute_driver_server::ComputeDriverServer;
1212
use openshell_driver_kubernetes::{
13-
ComputeDriverService, KubernetesComputeConfig, KubernetesComputeDriver,
14-
SupervisorSideloadMethod,
13+
ComputeDriverService, DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME, KubernetesComputeConfig,
14+
KubernetesComputeDriver, SupervisorSideloadMethod,
1515
};
1616

1717
#[derive(Parser, Debug)]
@@ -31,6 +31,13 @@ struct Args {
3131
#[arg(long, env = "OPENSHELL_SANDBOX_NAMESPACE", default_value = "default")]
3232
sandbox_namespace: String,
3333

34+
#[arg(
35+
long,
36+
env = "OPENSHELL_K8S_SANDBOX_SERVICE_ACCOUNT",
37+
default_value = DEFAULT_SANDBOX_SERVICE_ACCOUNT_NAME
38+
)]
39+
sandbox_service_account: String,
40+
3441
#[arg(long, env = "OPENSHELL_SANDBOX_IMAGE")]
3542
sandbox_image: Option<String>,
3643

@@ -88,6 +95,7 @@ async fn main() -> Result<()> {
8895

8996
let driver = KubernetesComputeDriver::new(KubernetesComputeConfig {
9097
namespace: args.sandbox_namespace,
98+
service_account_name: args.sandbox_service_account,
9199
default_image: args.sandbox_image.unwrap_or_default(),
92100
image_pull_policy: args.sandbox_image_pull_policy.unwrap_or_default(),
93101
supervisor_image: args

crates/openshell-server/src/auth/k8s_sa.rs

Lines changed: 44 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -137,17 +137,24 @@ pub struct LiveK8sResolver {
137137
pods_api: Api<Pod>,
138138
expected_audience: String,
139139
sandbox_namespace: String,
140+
expected_service_account: String,
140141
}
141142

142143
impl LiveK8sResolver {
143-
pub fn new(client: kube::Client, namespace: &str, expected_audience: String) -> Self {
144+
pub fn new(
145+
client: kube::Client,
146+
namespace: &str,
147+
expected_audience: String,
148+
expected_service_account: String,
149+
) -> Self {
144150
let token_reviews_api: Api<TokenReview> = Api::all(client.clone());
145151
let pods_api: Api<Pod> = Api::namespaced(client, namespace);
146152
Self {
147153
token_reviews_api,
148154
pods_api,
149155
expected_audience,
150156
sandbox_namespace: namespace.to_string(),
157+
expected_service_account,
151158
}
152159
}
153160
}
@@ -175,15 +182,20 @@ impl K8sIdentityResolver for LiveK8sResolver {
175182
let status = review
176183
.status
177184
.ok_or_else(|| Status::internal("TokenReview response missing status"))?;
178-
let Some(identity) =
179-
token_review_identity(&status, &self.expected_audience, &self.sandbox_namespace)?
185+
let Some(identity) = token_review_identity(
186+
&status,
187+
&self.expected_audience,
188+
&self.sandbox_namespace,
189+
&self.expected_service_account,
190+
)?
180191
else {
181192
return Ok(None);
182193
};
183194

184195
info!(
185196
pod_name = %identity.pod_name,
186197
pod_uid = %identity.pod_uid,
198+
service_account = %self.expected_service_account,
187199
"validated K8s SA token via TokenReview"
188200
);
189201

@@ -243,6 +255,7 @@ fn token_review_identity(
243255
status: &TokenReviewStatus,
244256
expected_audience: &str,
245257
sandbox_namespace: &str,
258+
expected_service_account: &str,
246259
) -> Result<Option<TokenReviewIdentity>, Status> {
247260
if status.authenticated != Some(true) {
248261
debug!(
@@ -270,15 +283,17 @@ fn token_review_identity(
270283
.username
271284
.as_deref()
272285
.ok_or_else(|| Status::permission_denied("TokenReview response missing username"))?;
273-
let expected_prefix = format!("system:serviceaccount:{sandbox_namespace}:");
274-
if !username.starts_with(&expected_prefix) {
286+
let expected_username =
287+
format!("system:serviceaccount:{sandbox_namespace}:{expected_service_account}");
288+
if username != expected_username {
275289
warn!(
276290
username = %username,
277291
sandbox_namespace = %sandbox_namespace,
278-
"K8s TokenReview principal is outside the sandbox namespace"
292+
service_account = %expected_service_account,
293+
"K8s TokenReview principal is not the configured sandbox service account"
279294
);
280295
return Err(Status::permission_denied(
281-
"SA token is outside the sandbox namespace",
296+
"SA token is not from the configured sandbox service account",
282297
));
283298
}
284299

@@ -388,7 +403,7 @@ mod tests {
388403
],
389404
);
390405

391-
let identity = token_review_identity(&status, "openshell-gateway", "openshell")
406+
let identity = token_review_identity(&status, "openshell-gateway", "openshell", "default")
392407
.unwrap()
393408
.expect("authenticated token should resolve");
394409

@@ -405,7 +420,7 @@ mod tests {
405420
};
406421

407422
assert!(
408-
token_review_identity(&status, "openshell-gateway", "openshell")
423+
token_review_identity(&status, "openshell-gateway", "openshell", "default")
409424
.unwrap()
410425
.is_none()
411426
);
@@ -423,7 +438,7 @@ mod tests {
423438
],
424439
);
425440

426-
let err = token_review_identity(&status, "openshell-gateway", "openshell")
441+
let err = token_review_identity(&status, "openshell-gateway", "openshell", "default")
427442
.expect_err("wrong audience must fail closed");
428443
assert_eq!(err.code(), tonic::Code::Unauthenticated);
429444
}
@@ -440,11 +455,28 @@ mod tests {
440455
],
441456
);
442457

443-
let err = token_review_identity(&status, "openshell-gateway", "openshell")
458+
let err = token_review_identity(&status, "openshell-gateway", "openshell", "default")
444459
.expect_err("other namespace must be rejected");
445460
assert_eq!(err.code(), tonic::Code::PermissionDenied);
446461
}
447462

463+
#[test]
464+
fn token_review_identity_requires_configured_service_account() {
465+
let status = token_review_status(
466+
true,
467+
vec!["openshell-gateway"],
468+
"system:serviceaccount:openshell:other",
469+
vec![
470+
(POD_NAME_EXTRA, "openshell-sandbox-a"),
471+
(POD_UID_EXTRA, "uid-a"),
472+
],
473+
);
474+
475+
let err = token_review_identity(&status, "openshell-gateway", "openshell", "default")
476+
.expect_err("other service account must be rejected");
477+
assert_eq!(err.code(), tonic::Code::PermissionDenied);
478+
}
479+
448480
#[test]
449481
fn token_review_identity_requires_pod_bound_extras() {
450482
let status = token_review_status(
@@ -454,7 +486,7 @@ mod tests {
454486
vec![],
455487
);
456488

457-
let err = token_review_identity(&status, "openshell-gateway", "openshell")
489+
let err = token_review_identity(&status, "openshell-gateway", "openshell", "default")
458490
.expect_err("non pod-bound tokens must be rejected");
459491
assert_eq!(err.code(), tonic::Code::PermissionDenied);
460492
}

0 commit comments

Comments
 (0)