Skip to content

Commit a42ce8d

Browse files
committed
fix(auth): harden sandbox gateway authentication
Address PR review feedback on the per-sandbox authentication changes. Remove the implicit permissive user fallback once sandbox or user auth is configured. Missing credentials now fail closed unless an explicit local mode is selected. Keep mTLS user auth as a local single-player option for Docker, Podman, and VM gateways, reject it for Kubernetes, and add an explicit unsafe unauthenticated-user switch for trusted local Skaffold/Kubernetes development. Deliver sandbox JWTs through driver-owned token files for Docker, Podman, and VM sandboxes instead of placing raw bearers in container or guest environment metadata. Strip token env overrides from user-provided sandbox environments and update debug-rpc helpers to print token fingerprints, expiry, and claims rather than raw bearer values. Make certgen upgrades recover existing TLS-only installs by creating just the missing gateway JWT signing material while preserving existing TLS certificates and keys. Keep partial-state failures for inconsistent TLS or JWT sets. Improve supervisor token refresh behavior for short JWT TTLs by removing the 60-second refresh floor, using shorter retry backoff, and re-running the Kubernetes ServiceAccount bootstrap path after unauthenticated refresh failures. Update Helm defaults, Skaffold values, e2e gateway setup, Python gateway metadata handling, architecture notes, published docs, and generated chart docs to describe the new auth modes and local development behavior. Validation: mise run pre-commit; Docker smoke e2e; Podman smoke e2e; Kubernetes smoke e2e.
1 parent f0e500d commit a42ce8d

41 files changed

Lines changed: 1463 additions & 275 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

architecture/gateway.md

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,12 @@ identity.
2323
## Protocol and Auth
2424

2525
The gateway listens on one service port and multiplexes gRPC and HTTP traffic.
26-
The default deployment mode is mTLS: clients and sandbox workloads present a
27-
certificate signed by the deployment CA before reaching application handlers.
26+
The default local single-user deployment mode is mTLS user authentication:
27+
clients present a certificate signed by the local deployment CA, and the
28+
gateway maps the verified certificate subject to a user principal. Kubernetes
29+
deployments use mTLS for transport only and require OIDC or a trusted access
30+
proxy for user authentication unless the explicit unsafe local-development
31+
`allow_unauthenticated_users` switch is enabled.
2832
When that service port is bound to loopback, the listener can also accept
2933
plaintext HTTP on the same port for sandbox service subdomains only. That local
3034
browser path is enabled by default and disabled with
@@ -37,14 +41,15 @@ Supported auth modes:
3741

3842
| Mode | Use |
3943
|---|---|
40-
| mTLS | Default direct gateway access for CLI, SDK, TUI, and sandbox callbacks. |
44+
| mTLS user auth | Local single-user Docker, Podman, and VM gateway access. |
4145
| Plaintext | Local development or a trusted reverse proxy boundary. |
46+
| Unauthenticated local users | Trusted Kubernetes dev or fully trusted proxy deployments only. |
4247
| Cloudflare JWT | Edge-authenticated deployments where Cloudflare Access supplies identity. |
4348
| OIDC | Bearer-token auth for users, with browser PKCE or client credentials login. |
4449

45-
Sandbox supervisor RPCs authenticate with either mTLS material or a sandbox
46-
secret depending on the runtime and deployment mode. User-facing mutations are
47-
authorized by role policy when OIDC or edge identity is enabled.
50+
Sandbox supervisor RPCs authenticate with gateway-minted sandbox JWTs when that
51+
authenticator is configured; mTLS does not grant sandbox identity. User-facing
52+
mutations are authorized by role policy when OIDC or edge identity is enabled.
4853

4954
Sandbox secrets are gateway-signed JWTs bound to a single sandbox ID. Docker,
5055
Podman, and VM drivers deliver the initial token through supervisor-only

crates/openshell-bootstrap/src/metadata.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ pub struct GatewayMetadata {
2525
#[serde(skip_serializing_if = "Option::is_none", default)]
2626
pub resolved_host: Option<String>,
2727

28-
/// Auth mode: `None` or `"mtls"` = mTLS (default), `"plaintext"` = direct HTTP,
28+
/// Auth mode: `None` or `"mtls"` = mTLS, `"plaintext"` = direct HTTP,
2929
/// `"cloudflare_jwt"` = CF JWT.
3030
#[serde(default, skip_serializing_if = "Option::is_none")]
3131
pub auth_mode: Option<String>,

crates/openshell-cli/src/main.rs

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -122,9 +122,9 @@ fn resolve_gateway_name(gateway_flag: &Option<String>) -> Option<String> {
122122

123123
/// Apply authentication token from local storage based on gateway auth mode.
124124
///
125-
/// Handles both Cloudflare Access (`edge_token`) and OIDC (`oidc_token`)
126-
/// auth modes by loading the stored token and setting it on `TlsOptions`.
127-
/// For OIDC, automatically refreshes the token if it's near expiry.
125+
/// Handles Cloudflare Access and OIDC auth modes by loading the stored token
126+
/// and setting it on `TlsOptions`. For OIDC, automatically refreshes the token
127+
/// if it's near expiry.
128128
fn apply_auth(tls: &mut TlsOptions, gateway_name: &str) {
129129
let Some(meta) = get_gateway_metadata(gateway_name) else {
130130
return;

crates/openshell-cli/src/tls.rs

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ impl TlsOptions {
100100
}
101101
}
102102

103-
/// Returns `true` when using bearer token auth (edge or OIDC).
103+
/// Returns `true` when using bearer token auth.
104104
pub fn is_bearer_auth(&self) -> bool {
105105
self.edge_token.is_some() || self.oidc_token.is_some()
106106
}
@@ -397,9 +397,9 @@ pub async fn build_channel(server: &str, tls: &TlsOptions) -> Result<Channel> {
397397
.keep_alive_while_idle(true);
398398

399399
let tls_config = if tls.oidc_token.is_some() {
400-
// OIDC bearer auth over HTTPS: use mTLS certs for the transport layer
401-
// when available (server may still require client certs), and layer
402-
// the Bearer token on top via the interceptor.
400+
// Bearer auth over HTTPS: use mTLS certs for the transport layer when
401+
// available (server may still require client certs), and layer the
402+
// Bearer token on top via the interceptor.
403403
require_tls_materials(server, tls).map_or_else(
404404
|_| {
405405
let resolved = tls.with_default_paths(server);

crates/openshell-core/src/auth.rs

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,9 +7,9 @@ use miette::Result;
77

88
/// Interceptor that injects authentication headers into every outgoing gRPC request.
99
///
10-
/// Supports OIDC Bearer tokens (standard `authorization` header) and
11-
/// Cloudflare Access tokens (custom headers). When no token is set, acts
12-
/// as a no-op. OIDC takes precedence over edge tokens.
10+
/// Supports application-layer Bearer tokens (standard `authorization`
11+
/// header) and Cloudflare Access tokens (custom headers). When no token is
12+
/// set, acts as a no-op. OIDC takes precedence over edge tokens.
1313
#[derive(Clone)]
1414
#[allow(clippy::struct_field_names)]
1515
pub struct EdgeAuthInterceptor {
@@ -21,14 +21,14 @@ pub struct EdgeAuthInterceptor {
2121
impl EdgeAuthInterceptor {
2222
/// Create an interceptor from optional token strings.
2323
///
24-
/// OIDC bearer token takes precedence over edge token. Returns a no-op
25-
/// interceptor when neither token is provided.
24+
/// OIDC bearer tokens take precedence over edge tokens. Returns a no-op
25+
/// interceptor when no token is provided.
2626
pub fn new(oidc_token: Option<&str>, edge_token: Option<&str>) -> Result<Self> {
2727
if let Some(token) = oidc_token {
2828
let bearer: tonic::metadata::MetadataValue<tonic::metadata::Ascii> =
2929
format!("Bearer {token}")
3030
.parse()
31-
.map_err(|_| miette::miette!("invalid OIDC token value"))?;
31+
.map_err(|_| miette::miette!("invalid bearer token value"))?;
3232
return Ok(Self {
3333
bearer_value: Some(bearer),
3434
header_value: None,

crates/openshell-core/src/config.rs

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -205,6 +205,17 @@ pub struct Config {
205205
#[serde(default)]
206206
pub oidc: Option<OidcConfig>,
207207

208+
/// Gateway user authentication behavior.
209+
#[serde(default)]
210+
pub auth: GatewayAuthConfig,
211+
212+
/// mTLS user authentication configuration. When enabled, a verified TLS
213+
/// client certificate can authenticate CLI/SDK callers as a
214+
/// `Principal::User`. This is for local single-user gateways only;
215+
/// sandbox identity is always carried by gateway-minted sandbox JWTs.
216+
#[serde(default)]
217+
pub mtls_auth: MtlsAuthConfig,
218+
208219
/// Gateway-minted sandbox JWT configuration. When `Some`, the gateway
209220
/// loads the signing key from disk and accepts gateway-issued sandbox
210221
/// JWTs as `Principal::Sandbox`. Required for the per-sandbox identity
@@ -320,6 +331,27 @@ pub struct OidcConfig {
320331
pub scopes_claim: String,
321332
}
322333

334+
/// mTLS user authentication for local, single-user gateways.
335+
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
336+
pub struct MtlsAuthConfig {
337+
/// When true, the gateway maps a verified TLS client certificate into a
338+
/// user principal. Keep disabled for Kubernetes deployments because
339+
/// Kubernetes sandbox pods and external users must not share user auth.
340+
#[serde(default)]
341+
pub enabled: bool,
342+
}
343+
344+
/// Gateway user authentication settings.
345+
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
346+
pub struct GatewayAuthConfig {
347+
/// When true, unauthenticated user/CLI calls are accepted as a local
348+
/// developer principal. This is an unsafe local-development escape hatch
349+
/// for trusted, non-shared gateways. Sandbox supervisor calls still use
350+
/// gateway-minted sandbox JWTs.
351+
#[serde(default)]
352+
pub allow_unauthenticated_users: bool,
353+
}
354+
323355
const fn default_jwks_ttl_secs() -> u64 {
324356
3600
325357
}
@@ -378,6 +410,8 @@ impl Config {
378410
log_level: default_log_level(),
379411
tls,
380412
oidc: None,
413+
auth: GatewayAuthConfig::default(),
414+
mtls_auth: MtlsAuthConfig::default(),
381415
gateway_jwt: None,
382416
database_url: String::new(),
383417
compute_drivers: vec![],
@@ -606,6 +640,12 @@ mod tests {
606640
assert!(cfg.health_bind_address.is_none());
607641
}
608642

643+
#[test]
644+
fn config_disables_unauthenticated_users_by_default() {
645+
let cfg = Config::new(None);
646+
assert!(!cfg.auth.allow_unauthenticated_users);
647+
}
648+
609649
#[test]
610650
fn service_routing_allows_loopback_plaintext_http_by_default() {
611651
let cfg = Config::new(None);

crates/openshell-core/src/lib.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,10 @@ pub mod sandbox_env;
2626
pub mod settings;
2727
pub mod time;
2828

29-
pub use config::{ComputeDriverKind, Config, GatewayJwtConfig, OidcConfig, TlsConfig};
29+
pub use config::{
30+
ComputeDriverKind, Config, GatewayAuthConfig, GatewayJwtConfig, MtlsAuthConfig, OidcConfig,
31+
TlsConfig,
32+
};
3033
pub use error::{ComputeDriverError, Error, Result};
3134
pub use metadata::{GetResourceVersion, ObjectId, ObjectLabels, ObjectName, SetResourceVersion};
3235

crates/openshell-driver-docker/src/lib.rs

Lines changed: 123 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,7 @@ const SUPERVISOR_MOUNT_PATH: &str = "/opt/openshell/bin/openshell-sandbox";
5555
const TLS_CA_MOUNT_PATH: &str = "/etc/openshell/tls/client/ca.crt";
5656
const TLS_CERT_MOUNT_PATH: &str = "/etc/openshell/tls/client/tls.crt";
5757
const TLS_KEY_MOUNT_PATH: &str = "/etc/openshell/tls/client/tls.key";
58+
const SANDBOX_TOKEN_MOUNT_PATH: &str = "/etc/openshell/auth/sandbox.jwt";
5859
const SANDBOX_COMMAND: &str = "sleep infinity";
5960
const SUPERVISOR_PATH: &str = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin";
6061
const HOST_OPENSHELL_INTERNAL: &str = "host.openshell.internal";
@@ -417,6 +418,7 @@ impl DockerComputeDriver {
417418
.and_then(|spec| spec.template.as_ref())
418419
.expect("validated sandbox has template");
419420
self.ensure_image_available(&template.image).await?;
421+
let token_file_created = write_sandbox_token_file(sandbox, &self.config).await?;
420422

421423
let container_name = container_name_for_sandbox(sandbox);
422424
let create_body = build_container_create_body(sandbox, &self.config)?;
@@ -431,6 +433,9 @@ impl DockerComputeDriver {
431433
)
432434
.await
433435
.map_err(|err| {
436+
if token_file_created {
437+
cleanup_sandbox_token_file(sandbox, &self.config);
438+
}
434439
create_status_from_docker_error("create docker sandbox container", err)
435440
})?;
436441

@@ -450,6 +455,9 @@ impl DockerComputeDriver {
450455
"Failed to clean up Docker container after start failure"
451456
);
452457
}
458+
if token_file_created {
459+
cleanup_sandbox_token_file(sandbox, &self.config);
460+
}
453461
return Err(create_status_from_docker_error(
454462
"start docker sandbox container",
455463
err,
@@ -482,8 +490,14 @@ impl DockerComputeDriver {
482490
)
483491
.await
484492
{
485-
Ok(()) => Ok(true),
486-
Err(err) if is_not_found_error(&err) => Ok(false),
493+
Ok(()) => {
494+
cleanup_sandbox_token_file_by_id(sandbox_id, &self.config);
495+
Ok(true)
496+
}
497+
Err(err) if is_not_found_error(&err) => {
498+
cleanup_sandbox_token_file_by_id(sandbox_id, &self.config);
499+
Ok(false)
500+
}
487501
Err(err) => Err(internal_status("delete docker sandbox container", err)),
488502
}
489503
}
@@ -905,7 +919,10 @@ impl ComputeDriver for DockerComputeDriver {
905919
}
906920
}
907921

908-
fn build_binds(config: &DockerDriverRuntimeConfig) -> Vec<String> {
922+
fn build_binds(
923+
sandbox: &DriverSandbox,
924+
config: &DockerDriverRuntimeConfig,
925+
) -> Result<Vec<String>, Status> {
909926
let mut binds = vec![format!(
910927
"{}:{}:ro,z",
911928
config.supervisor_bin.display(),
@@ -920,7 +937,101 @@ fn build_binds(config: &DockerDriverRuntimeConfig) -> Vec<String> {
920937
));
921938
binds.push(format!("{}:{}:ro,z", tls.key.display(), TLS_KEY_MOUNT_PATH));
922939
}
923-
binds
940+
if sandbox
941+
.spec
942+
.as_ref()
943+
.is_some_and(|spec| !spec.sandbox_token.is_empty())
944+
{
945+
binds.push(format!(
946+
"{}:{}:ro,z",
947+
sandbox_token_host_path(sandbox, config)?.display(),
948+
SANDBOX_TOKEN_MOUNT_PATH
949+
));
950+
}
951+
Ok(binds)
952+
}
953+
954+
fn sandbox_token_host_path(
955+
sandbox: &DriverSandbox,
956+
config: &DockerDriverRuntimeConfig,
957+
) -> Result<PathBuf, Status> {
958+
sandbox_token_host_path_by_id(&sandbox.id, config)
959+
}
960+
961+
fn sandbox_token_host_path_by_id(
962+
sandbox_id: &str,
963+
config: &DockerDriverRuntimeConfig,
964+
) -> Result<PathBuf, Status> {
965+
let base = openshell_core::paths::xdg_state_dir().map_err(|err| {
966+
Status::internal(format!(
967+
"resolve sandbox token state directory failed: {err}"
968+
))
969+
})?;
970+
Ok(base
971+
.join("openshell")
972+
.join("docker-sandbox-tokens")
973+
.join(config.sandbox_namespace.replace(['/', '\\'], "-"))
974+
.join(sandbox_id)
975+
.join("sandbox.jwt"))
976+
}
977+
978+
async fn write_sandbox_token_file(
979+
sandbox: &DriverSandbox,
980+
config: &DockerDriverRuntimeConfig,
981+
) -> Result<bool, Status> {
982+
let Some(spec) = sandbox.spec.as_ref() else {
983+
return Ok(false);
984+
};
985+
if spec.sandbox_token.is_empty() {
986+
return Ok(false);
987+
}
988+
let path = sandbox_token_host_path(sandbox, config)?;
989+
if let Some(parent) = path.parent() {
990+
openshell_core::paths::create_dir_restricted(parent).map_err(|err| {
991+
Status::internal(format!(
992+
"create sandbox token directory {} failed: {err}",
993+
parent.display()
994+
))
995+
})?;
996+
}
997+
tokio::fs::write(&path, format!("{}\n", spec.sandbox_token))
998+
.await
999+
.map_err(|err| {
1000+
Status::internal(format!(
1001+
"write sandbox token file {} failed: {err}",
1002+
path.display()
1003+
))
1004+
})?;
1005+
openshell_core::paths::set_file_owner_only(&path).map_err(|err| {
1006+
Status::internal(format!(
1007+
"restrict sandbox token file {} failed: {err}",
1008+
path.display()
1009+
))
1010+
})?;
1011+
Ok(true)
1012+
}
1013+
1014+
fn cleanup_sandbox_token_file(sandbox: &DriverSandbox, config: &DockerDriverRuntimeConfig) {
1015+
cleanup_sandbox_token_file_by_id(&sandbox.id, config);
1016+
}
1017+
1018+
fn cleanup_sandbox_token_file_by_id(sandbox_id: &str, config: &DockerDriverRuntimeConfig) {
1019+
let Ok(path) = sandbox_token_host_path_by_id(sandbox_id, config) else {
1020+
return;
1021+
};
1022+
if let Err(err) = std::fs::remove_file(&path)
1023+
&& err.kind() != std::io::ErrorKind::NotFound
1024+
{
1025+
warn!(
1026+
sandbox_id = %sandbox_id,
1027+
path = %path.display(),
1028+
error = %err,
1029+
"Failed to remove Docker sandbox token file"
1030+
);
1031+
}
1032+
if let Some(dir) = path.parent() {
1033+
let _ = std::fs::remove_dir(dir);
1034+
}
9241035
}
9251036

9261037
fn build_environment(sandbox: &DriverSandbox, config: &DockerDriverRuntimeConfig) -> Vec<String> {
@@ -979,16 +1090,17 @@ fn build_environment(sandbox: &DriverSandbox, config: &DockerDriverRuntimeConfig
9791090
);
9801091
}
9811092

982-
// Gateway-minted sandbox JWT (PR 3 of the per-sandbox identity series).
983-
// Passed via env var since Docker has no native secret mount that is
984-
// simpler than the existing bind-mount pattern; the trust boundary
985-
// (`docker inspect` access) is already equivalent to the TLS key mount.
1093+
environment.remove(openshell_core::sandbox_env::SANDBOX_TOKEN);
1094+
environment.remove(openshell_core::sandbox_env::SANDBOX_TOKEN_FILE);
1095+
1096+
// Gateway-minted sandbox JWT. Keep the raw bearer out of container
1097+
// metadata; the supervisor reads it from this driver-owned bind mount.
9861098
if let Some(spec) = sandbox.spec.as_ref()
9871099
&& !spec.sandbox_token.is_empty()
9881100
{
9891101
environment.insert(
990-
openshell_core::sandbox_env::SANDBOX_TOKEN.to_string(),
991-
spec.sandbox_token.clone(),
1102+
openshell_core::sandbox_env::SANDBOX_TOKEN_FILE.to_string(),
1103+
SANDBOX_TOKEN_MOUNT_PATH.to_string(),
9921104
);
9931105
}
9941106

@@ -1052,7 +1164,7 @@ fn build_container_create_body(
10521164
nano_cpus: resource_limits.nano_cpus,
10531165
memory: resource_limits.memory_bytes,
10541166
device_requests: docker_gpu_device_requests(spec.gpu, &spec.gpu_device),
1055-
binds: Some(build_binds(config)),
1167+
binds: Some(build_binds(sandbox, config)?),
10561168
restart_policy: Some(RestartPolicy {
10571169
name: Some(RestartPolicyNameEnum::UNLESS_STOPPED),
10581170
maximum_retry_count: None,

0 commit comments

Comments
 (0)