Skip to content

Commit 5374644

Browse files
lukekimkrinart
andauthored
Harden /v1/tools and /v1/nsql against unauthenticated / LLM-driven SQL (spiceai#10365)
* Harden /v1/tools and /v1/nsql against unauthenticated / LLM-driven SQL Addresses threat model items #50 and #51 (docs/threat_models/v2.0.0.md): - Add strict read-only SQL validator (validate_sql_query_read_only) that rejects every DDL/DML/COPY/non-prepared Statement node regardless of per-catalog writability. - Plumb a read_only flag through QueryBuilder/Query and apply the validator at all three plan execution sites (local, Ballista, async). - Default the built-in `sql` tool to read-only; operators may opt in via SqlTool::allow_writes(). LLM tool-use can no longer mutate data through the sql tool. - Run LLM-generated SQL from /v1/nsql under the read-only validator so prompt-injection-driven writes cannot reach writable catalogs. - Gate /v1/tools/* behind a require_auth_configured middleware: when runtime.auth is not set, these routes return 401 rather than invoking tool.call anonymously with attacker-controlled bodies. - Record the new mitigations in the v2.0.0 threat model. * refactor: clarify read-only SQL validation comments and enhance documentation for DDL/DML restrictions * Refactor authentication error response to use JSON format and add SQL tool descriptions for read-only and writable modes * Fix collapsible_if clippy lint in read-only validation path * Reject write-capable extension nodes in read-only validator Spice's planner can represent DDL/DML as LogicalPlan::Extension nodes (DdlExtensionNode, DmlExtensionNode, DistributedCayenne{Insert,Update, Delete,Merge}Node, CayenneMergeNode). The previous read-only validator only matched Ddl/Dml/Copy/Statement and would have let those plan shapes through, defeating the read-only guarantee on /v1/tools/sql and /v1/nsql. - Add Extension arm to validate_sql_query_read_only that denies any node whose UserDefinedLogicalNodeCore::name matches a curated list of write-capable extension names. - Test the deny mechanism with a stub UserDefinedLogicalNode and verify a non-write extension name is still allowed. - Add an integration test that exercises Spice's create_logical_plan wrapper end-to-end (cfg(not(windows))). - Reflect the PREPARE/EXECUTE/DEALLOCATE rejection in the SqlTool read-only description so LLM/tool-selection logic knows the posture. - Replace the PR-contextual 'Unverified in this review' phrasing in the threat model with the durable 'Unverified mitigation'. * Bypass SQL results cache for read-only query paths When ctx.read_only is set (e.g. the /v1/tools/sql read-only tool and the /v1/nsql LLM SQL path), both the SQL-keyed and plan-keyed results-cache lookups are now skipped inside get_plan_or_cached, and the returned RequestCacheManager is forced to CacheDisabled. Previously, a cache hit from a prior writable execution could short-circuit validate_sql_query_read_only, letting a cached result produced by a write-capable plan (e.g. LogicalPlan::Extension nodes like DmlExtension or DistributedCayenneInsert) be served on a read-only surface. Also move WRITE_CAPABLE_EXTENSION_NAMES into the cache crate as the single source of truth, and extend cache_is_enabled_for_plan to reject write-capable LogicalPlan::Extension nodes. Defense-in-depth: even on writable paths, write-capable extension plans are now never cached or populated in the results cache. * fix: flatten write-capable extension check to match guard in cache eligibility Removes one level of nesting as requested in review. --------- Co-authored-by: Viktor Yershov <viktor@spice.ai>
1 parent 924691c commit 5374644

9 files changed

Lines changed: 875 additions & 42 deletions

File tree

crates/cache/src/lib.rs

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,31 @@ pub use utils::filter_transient_error_responses;
5959
pub use utils::get_logical_plan_input_tables;
6060
pub use utils::to_cached_record_batch_stream;
6161

62+
/// Stable [`datafusion::logical_expr::UserDefinedLogicalNodeCore::name`] values for
63+
/// every Spice logical-plan extension node that performs (or dispatches) a write,
64+
/// a schema mutation, or any other side-effect that must not be reachable via a
65+
/// read-only SQL path and must not be served from or populated into the SQL
66+
/// results cache.
67+
///
68+
/// Keep this list in sync with:
69+
/// - `datafusion_ddl::DdlExtensionNode` → `"DdlExtension"`
70+
/// - `datafusion_dml::DmlExtensionNode` → `"DmlExtension"`
71+
/// - `cayenne::ddl::logical_nodes::CayenneMergeNode` → `"CayenneMerge"`
72+
/// - `runtime::datafusion::cayenne_ddl::logical_nodes::DistributedCayenne{Insert,Update,Delete}Node`
73+
/// → `"CayenneInsert"` / `"CayenneUpdate"` / `"CayenneDelete"` (they reuse the
74+
/// non-distributed names by design)
75+
/// - `runtime::datafusion::cayenne_ddl::logical_nodes::DistributedCayenneMergeNode`
76+
/// → `"DistributedCayenneMerge"`
77+
pub const WRITE_CAPABLE_EXTENSION_NAMES: &[&str] = &[
78+
"DdlExtension",
79+
"DmlExtension",
80+
"CayenneInsert",
81+
"CayenneUpdate",
82+
"CayenneDelete",
83+
"CayenneMerge",
84+
"DistributedCayenneMerge",
85+
];
86+
6287
use crate::result::embeddings::CachedEmbeddingResult;
6388

6489
#[derive(Debug, Snafu)]
@@ -551,6 +576,11 @@ impl QueryResultsCacheProvider {
551576
| LogicalPlan::Dml(..)
552577
| LogicalPlan::Copy { .. }
553578
| LogicalPlan::Statement(..) => return false,
579+
LogicalPlan::Extension(ext)
580+
if WRITE_CAPABLE_EXTENSION_NAMES.contains(&ext.node.name()) =>
581+
{
582+
return false;
583+
}
554584
_ => {}
555585
}
556586

crates/runtime/src/datafusion/query.rs

Lines changed: 40 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -81,7 +81,9 @@ use super::{
8181

8282
use super::managed_runtime;
8383
use crate::datafusion::{
84-
DataFusion, query::cache::RequestCacheManager, sql_validator::validate_sql_query_operations,
84+
DataFusion,
85+
query::cache::RequestCacheManager,
86+
sql_validator::{validate_sql_query_operations, validate_sql_query_read_only},
8587
};
8688
use managed_runtime::ManagedRuntimeError;
8789
use opentelemetry::KeyValue;
@@ -190,6 +192,11 @@ pub struct Query {
190192
df: Arc<crate::datafusion::DataFusion>,
191193
sql: QueryMethod,
192194
tracker: Option<QueryTracker>,
195+
/// When true, the validator additionally rejects DDL, DML, COPY, or any
196+
/// `LogicalPlan::Statement` node (including PREPARE/EXECUTE/DEALLOCATE),
197+
/// regardless of per-catalog writability. Set via [`QueryBuilder::read_only`];
198+
/// used by `/v1/tools/sql` and `/v1/nsql` to contain LLM-generated SQL.
199+
read_only: bool,
193200
}
194201

195202
macro_rules! handle_error {
@@ -310,14 +317,17 @@ impl Query {
310317
sql, parameters, ..
311318
} => {
312319
// Use the existing get_plan_or_cached which handles all cache control,
313-
// stale-while-revalidate, and query tracking
320+
// stale-while-revalidate, and query tracking. `read_only` is
321+
// threaded through so cached results cannot bypass
322+
// `validate_sql_query_read_only` below.
314323
match Query::get_plan_or_cached(
315324
&self.df,
316325
&session,
317326
Arc::clone(&request_context),
318327
sql,
319328
parameters.clone(),
320329
tracker,
330+
self.read_only,
321331
)
322332
.await?
323333
{
@@ -389,6 +399,12 @@ impl Query {
389399
let e = find_datafusion_root(e);
390400
return Err(Error::UnableToExecuteQuery { source: e });
391401
}
402+
if self.read_only
403+
&& let Err(e) = validate_sql_query_read_only(&plan)
404+
{
405+
let e = find_datafusion_root(e);
406+
return Err(Error::UnableToExecuteQuery { source: e });
407+
}
392408

393409
// Get the schema from the logical plan
394410
let schema = Arc::new(plan.schema().as_arrow().clone());
@@ -589,6 +605,7 @@ impl Query {
589605
sql,
590606
parameters.clone(),
591607
tracker,
608+
ctx.read_only,
592609
)
593610
.await?
594611
{
@@ -619,6 +636,19 @@ impl Query {
619636
)
620637
}
621638

639+
if ctx.read_only
640+
&& let Err(e) = validate_sql_query_read_only(&plan)
641+
{
642+
let e = find_datafusion_root(e);
643+
handle_error!(
644+
tracker,
645+
&request_context,
646+
ErrorCode::QueryPlanningError,
647+
e,
648+
UnableToExecuteQuery
649+
)
650+
}
651+
622652
// Proactively invalidate cached query state for tables affected by
623653
// DML mutations (INSERT, DELETE, UPDATE).
624654
// - results cache must be cleared so repeated SQL does not replay
@@ -878,6 +908,7 @@ impl Query {
878908
df: Arc::clone(df),
879909
sql: QueryMethod::Plan(Box::new(plan.clone())),
880910
tracker: None,
911+
read_only: false,
881912
}
882913
}
883914

@@ -930,6 +961,13 @@ impl Query {
930961
self.handle_schema_error(&request_context, &e);
931962
return Err(e);
932963
}
964+
if self.read_only
965+
&& let Err(e) = validate_sql_query_read_only(&plan)
966+
{
967+
let e = find_datafusion_root(e);
968+
self.handle_schema_error(&request_context, &e);
969+
return Err(e);
970+
}
933971
let dataset_schema = plan.schema().as_arrow().clone();
934972
let parameter_schema = parameter_schema_for_plan(&plan)?;
935973

crates/runtime/src/datafusion/query/builder.rs

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ pub struct QueryBuilder<'a> {
3131
parameters: Option<ParamValues>,
3232
table_allowlist: Option<ResolvedTableAwareAllowlist>,
3333
query_id: Uuid,
34+
read_only: bool,
3435
}
3536

3637
impl<'a> QueryBuilder<'a> {
@@ -41,6 +42,7 @@ impl<'a> QueryBuilder<'a> {
4142
parameters: None,
4243
query_id: Uuid::new_v4(),
4344
table_allowlist: None,
45+
read_only: false,
4446
}
4547
}
4648

@@ -62,6 +64,22 @@ impl<'a> QueryBuilder<'a> {
6264
self
6365
}
6466

67+
/// Enforce read-only SQL execution.
68+
///
69+
/// When enabled, the planned query is additionally checked with
70+
/// [`crate::datafusion::sql_validator::validate_sql_query_read_only`] and rejected if it
71+
/// contains any DDL, DML, COPY, or `LogicalPlan::Statement` node (including
72+
/// `PREPARE`/`EXECUTE`/`DEALLOCATE`) — regardless of whether the target
73+
/// catalogs/datasets are individually marked writable.
74+
///
75+
/// Used by surfaces that execute SQL on behalf of an LLM or unauthenticated caller
76+
/// (the built-in `sql` tool, `/v1/nsql`).
77+
#[must_use]
78+
pub fn read_only(mut self, read_only: bool) -> Self {
79+
self.read_only = read_only;
80+
self
81+
}
82+
6583
#[must_use]
6684
pub fn build(self) -> Query {
6785
let sql: Arc<str> = self.sql.into();
@@ -91,6 +109,7 @@ impl<'a> QueryBuilder<'a> {
91109
table_allowlist: self.table_allowlist,
92110
},
93111
tracker,
112+
read_only: self.read_only,
94113
}
95114
}
96115
}

crates/runtime/src/datafusion/query/cache.rs

Lines changed: 66 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -96,13 +96,27 @@ enum CacheResult {
9696

9797
impl Query {
9898
/// Returns a `LogicalPlan` if the result is not cached and needs to be executed, otherwise returns a cached `QueryResult`.
99+
///
100+
/// When `read_only` is true, both the SQL-keyed and plan-keyed results-cache
101+
/// lookups are skipped, and the returned [`RequestCacheManager`] is forced to
102+
/// [`CacheStatus::CacheDisabled`]. This is required because the read-only
103+
/// contract (enforced by [`super::validate_sql_query_read_only`]) runs on the
104+
/// [`LogicalPlan`] *after* `get_plan_or_cached` returns — a cache hit would
105+
/// otherwise short-circuit validation and let write-capable plans served from
106+
/// a prior cache-store bypass the read-only guarantee on `/v1/tools/sql` and
107+
/// `/v1/nsql`. The existing cache-eligibility check
108+
/// ([`cache::QueryResultsCacheProvider::cache_is_enabled_for_plan`]) only
109+
/// filters the classic DDL/DML/Copy/Statement plan variants and does not
110+
/// cover Spice's write-capable [`LogicalPlan::Extension`] nodes (e.g.
111+
/// `DmlExtension`, `DistributedCayenneInsert`).
99112
pub(super) async fn get_plan_or_cached(
100113
df: &Arc<DataFusion>,
101114
session: &SessionState,
102115
request_context: Arc<RequestContext>,
103116
sql: &str,
104117
parameters: Option<ParamValues>,
105118
tracker: Option<QueryTracker>,
119+
read_only: bool,
106120
) -> super::Result<PlanOrCached> {
107121
let cache_control = request_context.cache_control();
108122
let sql_cache_key = CacheKey::Query(sql, parameters.as_ref());
@@ -117,25 +131,32 @@ impl Query {
117131
_ => sql_cache_key,
118132
};
119133

120-
// Try to get cached results from SQL or client key
134+
// Try to get cached results from SQL or client key. When `read_only` is
135+
// true, skip the cache lookup entirely so read-only validation always
136+
// gets a chance to run on the freshly-planned query.
121137
let CacheResponse {
122138
tracker,
123139
raw_key: sql_or_client_raw_key,
124140
..
125-
} = match Self::try_get_cached_result(
126-
df,
127-
&request_context,
128-
tracker,
129-
&sql_or_user_cache_key,
130-
sql,
131-
)
132-
.await?
133-
{
134-
CacheResponse {
135-
result: CacheResult::Hit(result),
136-
..
137-
} => return Ok(PlanOrCached::Cached(result)),
138-
response => response,
141+
} = if read_only {
142+
CacheResponse::from(CacheResult::MissOrSkipped, CacheStatus::CacheDisabled)
143+
.with_query_tracker(tracker)
144+
} else {
145+
match Self::try_get_cached_result(
146+
df,
147+
&request_context,
148+
tracker,
149+
&sql_or_user_cache_key,
150+
sql,
151+
)
152+
.await?
153+
{
154+
CacheResponse {
155+
result: CacheResult::Hit(result),
156+
..
157+
} => return Ok(PlanOrCached::Cached(result)),
158+
response => response,
159+
}
139160
};
140161

141162
let sql_raw_cache_key = sql_cache_key.as_raw_key(Self::plan_hasher(df));
@@ -154,26 +175,32 @@ impl Query {
154175
}
155176
};
156177

157-
// Try to get cached results from plan
178+
// Try to get cached results from plan (skipped for read-only, same
179+
// reasoning as the SQL-keyed lookup above).
158180
let CacheResponse {
159181
mut tracker,
160182
raw_key: plan_raw_cache_key,
161183
status,
162184
..
163-
} = match Self::try_get_cached_result(
164-
df,
165-
&request_context,
166-
tracker,
167-
&CacheKey::LogicalPlan(&plan),
168-
sql,
169-
)
170-
.await?
171-
{
172-
CacheResponse {
173-
result: CacheResult::Hit(result),
174-
..
175-
} => return Ok(PlanOrCached::Cached(result)),
176-
response => response,
185+
} = if read_only {
186+
CacheResponse::from(CacheResult::MissOrSkipped, CacheStatus::CacheDisabled)
187+
.with_query_tracker(tracker)
188+
} else {
189+
match Self::try_get_cached_result(
190+
df,
191+
&request_context,
192+
tracker,
193+
&CacheKey::LogicalPlan(&plan),
194+
sql,
195+
)
196+
.await?
197+
{
198+
CacheResponse {
199+
result: CacheResult::Hit(result),
200+
..
201+
} => return Ok(PlanOrCached::Cached(result)),
202+
response => response,
203+
}
177204
};
178205

179206
let request_raw_cache_key = match request_context.cache_control() {
@@ -185,7 +212,15 @@ impl Query {
185212
}
186213
.unwrap_or(sql_raw_cache_key);
187214

188-
let cache_status = Self::should_cache_results(df, &plan, status);
215+
// Read-only requests must also not populate the results cache — the
216+
// plan has not yet been validated at this point, and we don't want a
217+
// writable surface's cached output to leak back through a read-only
218+
// caller on a later identical query.
219+
let cache_status = if read_only {
220+
CacheStatus::CacheDisabled
221+
} else {
222+
Self::should_cache_results(df, &plan, status)
223+
};
189224
tracker = tracker.map(|t| t.results_cache_hit(false));
190225

191226
Ok(PlanOrCached::Plan(

0 commit comments

Comments
 (0)