Commit 88d467e

fix: Correct ScyllaDB connector filter pushdown documentation
The docs incorrectly stated that filter pushdown was entirely disabled and all data was fetched via SELECT * queries. The actual implementation pushes down partition key equality filters (Exact) and clustering key comparison filters (Inexact, when partition key is present). Updated both vNext and v1.11.x docs to accurately describe the pushdown behavior, fixed the misleading "by default" on batch size (it is hardcoded), and corrected the Performance Considerations and Limitations sections.
1 parent e5fcb73 commit 88d467e
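
The pushdown rules described in the commit message can be sketched as a small classifier. This is an illustrative Python sketch, not the connector's actual code (the commit does not show the implementation). The names `Pushdown`, `Filter`, and `classify` are hypothetical, and the sketch assumes that one partition key equality filter is enough to enable clustering key pushdown.

```python
from dataclasses import dataclass
from enum import Enum


class Pushdown(Enum):
    EXACT = "exact"              # the source applies the filter fully
    INEXACT = "inexact"          # the source applies it, the engine re-checks
    UNSUPPORTED = "unsupported"  # evaluated locally only


@dataclass(frozen=True)
class Filter:
    # Hypothetical simple filter of the form `column op value`.
    column: str
    op: str
    value: object


# Operators the docs list for clustering key comparisons.
COMPARISONS = {"=", "<", "<=", ">", ">="}


def classify(filters, partition_keys, clustering_keys):
    """Return one Pushdown decision per filter, in input order."""
    # Assumption: any partition key equality filter enables clustering key pushdown.
    has_pk_eq = any(
        f.column in partition_keys and f.op == "=" for f in filters
    )
    decisions = []
    for f in filters:
        if f.column in partition_keys and f.op == "=":
            decisions.append(Pushdown.EXACT)
        elif f.column in clustering_keys and f.op in COMPARISONS and has_pk_eq:
            decisions.append(Pushdown.INEXACT)
        else:
            # Regular columns, OR conditions, and complex expressions land here.
            decisions.append(Pushdown.UNSUPPORTED)
    return decisions


example = [
    Filter("user_id", "=", 42),      # partition key equality
    Filter("ts", ">=", 100),         # clustering key comparison
    Filter("status", "=", "ok"),     # regular column
]
print(classify(example, {"user_id"}, {"ts"}))
```

Returning `INEXACT` for clustering key filters mirrors the documented behavior: the filter narrows what ScyllaDB returns, and DataFusion still re-checks it after retrieval.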

2 files changed

Lines changed: 42 additions & 20 deletions

website/docs/components/data-connectors/scylladb.md

Lines changed: 21 additions & 10 deletions
@@ -123,11 +123,24 @@ For decimals that exceed this precision, values may be truncated or rounded.
 
 ## Query Execution
 
-Due to fundamental differences between CQL and SQL, the connector implements a local filtering strategy. All data is fetched from ScyllaDB using `SELECT *` queries, and filtering, joins, aggregations, and other SQL operations are performed locally by DataFusion.
+The connector pushes down partition key and clustering key filters to CQL where possible. Filters that CQL cannot express are evaluated locally by DataFusion after data retrieval. Joins, aggregations, and other SQL operations are always performed locally.
 
-### Why Filter Pushdown is Disabled
+### Filter Pushdown
 
-CQL lacks many SQL constructs:
+The following filters are pushed down to ScyllaDB:
+
+| Filter type | Operators | Pushdown behavior |
+| --- | --- | --- |
+| Partition key equality | `=` | Always pushed down (Exact) |
+| Clustering key comparison | `=`, `<`, `<=`, `>`, `>=` | Pushed down when a partition key equality filter is present (Inexact) |
+| Regular column filters | Any | Not pushed down — evaluated locally by DataFusion |
+| OR conditions, complex expressions | Any | Not pushed down — evaluated locally by DataFusion |
+
+Clustering key filters are marked as **Inexact**, meaning DataFusion re-checks them after retrieval to ensure correctness.
+
+### CQL vs SQL
+
+CQL lacks many SQL constructs, which is why most filter types cannot be pushed down:
 
 | Feature | SQL | CQL |
 | ---------------- | --- | --- |
@@ -141,19 +154,17 @@ CQL lacks many SQL constructs:
 | COUNT(DISTINCT) | ✅ | ❌ |
 | Arbitrary WHERE | ✅ | ❌ |
 
-Because of these limitations, the connector:
+### Projection Pushdown
 
-1. Fetches full table data using `SELECT * FROM keyspace.table`
-2. Lets DataFusion handle all filtering locally after data retrieval
-3. Supports projection pushdown—only requested columns are transferred
+Projection pushdown is supported — only the columns referenced in the query are fetched from ScyllaDB.
 
 ### Streaming Execution
 
-Query results are streamed using the scylla driver's paging mechanism in batches of 8192 rows by default, minimizing memory usage for large result sets.
+Query results are streamed using the scylla driver's paging mechanism in batches of 8192 rows, minimizing memory usage for large result sets.
 
 ## Performance Considerations
 
-Since filter pushdown is disabled, all table data is transferred for each query. Consider the following optimizations:
+Partition key and clustering key filters reduce the amount of data transferred from ScyllaDB, but queries without these filters fetch all table data. Consider the following optimizations:
 
 ### Enable Acceleration
 
@@ -266,7 +277,7 @@ The following SQL operations cannot be pushed down to ScyllaDB and are performed
 - **Aggregations**: COUNT, SUM, AVG, etc. are computed locally
 - **Subqueries**: Nested queries are not supported in CQL
 - **Window functions**: RANK, ROW_NUMBER, etc. not supported
-- **Complex WHERE clauses**: CQL requires partition key in WHERE; Spice fetches all data
+- **Complex WHERE clauses**: Only partition key equality and clustering key comparisons are pushed down; other filters are evaluated locally
 - **ORDER BY**: Sorting is done locally
 
 ### Connector Limitations

website/versioned_docs/version-1.11.x/components/data-connectors/scylladb.md

Lines changed: 21 additions & 10 deletions
@@ -123,11 +123,24 @@ For decimals that exceed this precision, values may be truncated or rounded.
 
 ## Query Execution
 
-Due to fundamental differences between CQL and SQL, the connector implements a local filtering strategy. All data is fetched from ScyllaDB using `SELECT *` queries, and filtering, joins, aggregations, and other SQL operations are performed locally by DataFusion.
+The connector pushes down partition key and clustering key filters to CQL where possible. Filters that CQL cannot express are evaluated locally by DataFusion after data retrieval. Joins, aggregations, and other SQL operations are always performed locally.
 
-### Why Filter Pushdown is Disabled
+### Filter Pushdown
 
-CQL lacks many SQL constructs:
+The following filters are pushed down to ScyllaDB:
+
+| Filter type | Operators | Pushdown behavior |
+| --- | --- | --- |
+| Partition key equality | `=` | Always pushed down (Exact) |
+| Clustering key comparison | `=`, `<`, `<=`, `>`, `>=` | Pushed down when a partition key equality filter is present (Inexact) |
+| Regular column filters | Any | Not pushed down — evaluated locally by DataFusion |
+| OR conditions, complex expressions | Any | Not pushed down — evaluated locally by DataFusion |
+
+Clustering key filters are marked as **Inexact**, meaning DataFusion re-checks them after retrieval to ensure correctness.
+
+### CQL vs SQL
+
+CQL lacks many SQL constructs, which is why most filter types cannot be pushed down:
 
 | Feature | SQL | CQL |
 | ---------------- | --- | --- |
@@ -141,19 +154,17 @@ CQL lacks many SQL constructs:
 | COUNT(DISTINCT) | ✅ | ❌ |
 | Arbitrary WHERE | ✅ | ❌ |
 
-Because of these limitations, the connector:
+### Projection Pushdown
 
-1. Fetches full table data using `SELECT * FROM keyspace.table`
-2. Lets DataFusion handle all filtering locally after data retrieval
-3. Supports projection pushdown—only requested columns are transferred
+Projection pushdown is supported — only the columns referenced in the query are fetched from ScyllaDB.
 
 ### Streaming Execution
 
-Query results are streamed using the scylla driver's paging mechanism in batches of 8192 rows by default, minimizing memory usage for large result sets.
+Query results are streamed using the scylla driver's paging mechanism in batches of 8192 rows, minimizing memory usage for large result sets.
 
 ## Performance Considerations
 
-Since filter pushdown is disabled, all table data is transferred for each query. Consider the following optimizations:
+Partition key and clustering key filters reduce the amount of data transferred from ScyllaDB, but queries without these filters fetch all table data. Consider the following optimizations:
 
 ### Enable Acceleration
 
@@ -266,7 +277,7 @@ The following SQL operations cannot be pushed down to ScyllaDB and are performed
 - **Aggregations**: COUNT, SUM, AVG, etc. are computed locally
 - **Subqueries**: Nested queries are not supported in CQL
 - **Window functions**: RANK, ROW_NUMBER, etc. not supported
-- **Complex WHERE clauses**: CQL requires partition key in WHERE; Spice fetches all data
+- **Complex WHERE clauses**: Only partition key equality and clustering key comparisons are pushed down; other filters are evaluated locally
 - **ORDER BY**: Sorting is done locally
 
 ### Connector Limitations
