Commit 3ba272a

lukekim and claude authored
Add DataFrame API, Expr DSL, functions module, and SDK ergonomics (#151)
* feat: add query_arrow/pandas/polars/pylist helpers on Client

  Adds one-shot output ergonomics so callers don't have to reach for .read_all().to_pandas() on the underlying Flight or ADBC reader. Each helper accepts an optional `params` list that routes through the existing ADBC parameterized path, otherwise it goes through Flight as before. polars is gated behind an optional extra (`spicepy[polars]`) and raises a clear ImportError if missing.

* style: apply black formatting to _client.py

* feat: add DataFrame API, expression DSL, functions module, and catalog helpers

  Adds a credible subset of the datafusion-python surface on top of the existing Flight + ADBC client.

  Tier 1 (Client method additions):
  - catalog introspection: catalogs(), schemas(), tables(), describe(), get_schema()
  - explain(sql, analyze=, verbose=) wrapping EXPLAIN
  - streaming/output: query_pydict(), query_batches() iterator, show()
  - writers: write_parquet(), write_csv(), write_json() streaming Flight output to local files
  - DataFrame entry points: table(), sql(), from_arrow(), from_pandas(), from_pydict()

  Tier 2 (new modules):
  - spicepy._sql: identifier and literal escape helpers
  - spicepy._expr: Expr DSL with arithmetic/comparison/logical operator overloads, alias, cast, is_null, in_, between, asc/desc, CASE WHEN, window OVER(); col(), lit(), case() public builders
  - spicepy.functions: aggregates (sum/avg/min/max/count/count_distinct/stddev/variance/median/...), math (abs/round/ceil/floor/sqrt/power/ln/log/exp), strings (lower/upper/length/trim/concat/substr/replace/regexp_match/starts_with/ends_with), date/time (now/current_date/date_trunc/date_part/extract), null/control flow (coalesce/nullif/ifnull/case), window-only (row_number/rank/dense_rank/percent_rank/cume_dist/lag/lead/first_value/last_value/nth_value)
  - spicepy._dataframe.SpiceDataFrame: lazy SQL-compiling builder with select/with_column(s)/drop/rename/cast, filter/where/limit/head/offset, sort/order_by/distinct, union/intersect/except_, join (inner/left/right/full/semi/anti/cross) with key list or Expr, group_by().aggregate(), aggregate() (global), schema/explain, collect/to_arrow/to_pandas/to_polars/to_pylist/to_pydict/count/show
  - inline VALUES path for small client-side data via from_arrow/from_pandas/from_pydict

  239 new tests (test_sql, test_expr, test_functions, test_dataframe, extensions to test_client).

* chore(bandit): skip B608 - escaped SQL composition is the SDK's job

* docs(expr): explain why subclasses must not override __eq__

  Adds a note to Expr's docstring documenting that the comparison operators build SQL expression trees (DSL pattern, same as SQLAlchemy, pandas, polars, Ibis, datafusion-python) and that subclass __eq__ overrides would silently break filtering/joins. Addresses a wave of code-quality bot reviews that misapply a value-semantics __eq__ rule to a DSL.

* chore(deps): fold dependency updates into feature PR

* feat: add GITHUB_TOKEN environment variable for Spice installation in WSL on Windows

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 8d0250c commit 3ba272a
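
The commit message's note on `__eq__` can be illustrated with a toy expression class. This is a minimal sketch of the general DSL pattern, not the code in `spicepy/_expr.py`: the class body, `_binary`, and `quote_ident` are assumptions made for illustration, and only the names `Expr`, `col`, and `lit` come from the commit. The point is that comparison operators return a new expression holding SQL text, so a subclass whose `__eq__` returned a plain `bool` would hand `False` to a filter instead of a predicate.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


class Expr:
    """Toy SQL expression node (illustrative only, not spicepy's Expr)."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def _binary(self, op: str, other: object) -> Expr:
        # Wrap plain Python values as literals so both sides are Expr nodes.
        rhs = other if isinstance(other, Expr) else lit(other)
        return Expr(f"({self.sql} {op} {rhs.sql})")

    # Comparisons build SQL text; they never compare Python values.
    def __eq__(self, other: object) -> Expr:  # type: ignore[override]
        return self._binary("=", other)

    def __gt__(self, other: object) -> Expr:
        return self._binary(">", other)

    def __and__(self, other: object) -> Expr:
        return self._binary("AND", other)

    # Defining __eq__ would otherwise set __hash__ to None.
    __hash__ = object.__hash__


def col(name: str) -> Expr:
    return Expr(quote_ident(name))


def lit(value: object) -> Expr:
    if value is None:
        return Expr("NULL")
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return Expr("TRUE" if value else "FALSE")
    if isinstance(value, (int, float)):
        return Expr(repr(value))
    # Strings: single-quote and double any embedded single quotes.
    return Expr("'" + str(value).replace("'", "''") + "'")


predicate = (col("fare") > 10) & (col("city") == "O'Hare")
print(predicate.sql)
```

The parentheses around each comparison are needed on the Python side because `&` binds more tightly than `>` and `==`.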

16 files changed

Lines changed: 2950 additions & 1156 deletions

.github/workflows/test.yml

Lines changed: 4 additions & 2 deletions
@@ -97,7 +97,7 @@ jobs:
       # guest).
       - name: Set up WSL (Windows)
         if: matrix.os == 'windows-latest'
-        uses: Vampire/setup-wsl@887f39deb6c0976365e546926fe66f41b77d65ff # v6.1.0
+        uses: Vampire/setup-wsl@d1da7f2c0322a5ee4f24975344f67fc0f5baf364 # v7.0.0
         with:
           distribution: Ubuntu-24.04
           additional-packages: |
@@ -107,6 +107,8 @@ jobs:
       - name: install Spice in WSL (Windows)
         if: matrix.os == 'windows-latest'
         shell: wsl-bash {0}
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
           curl https://install.spiceai.org | /bin/bash
           $HOME/.spice/bin/spice install
@@ -220,7 +222,7 @@ jobs:
           pytest --cov=spicepy --cov-report=xml --cov-report=term-missing --ignore=tests/test_main.py tests/
       - name: Upload coverage to Codecov
         if: matrix.python-version == '3.12'
-        uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v4.5.0
+        uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
         with:
           files: ./coverage.xml
           fail_ci_if_error: false

pyproject.toml

Lines changed: 26 additions & 9 deletions
@@ -10,9 +10,9 @@ readme = "README.md"
 license = {text = "Apache-2.0"}
 requires-python = ">=3.11"
 dependencies = [
-    "pyarrow>=23.0.1",
+    "pyarrow>=24.0.0",
     "pandas>=3.0.2",
-    "certifi>=2026.2.25",
+    "certifi>=2026.4.22",
     "requests>=2.33.1",
 ]
 authors = [
@@ -41,26 +41,30 @@ Issues = "https://github.com/spiceai/spicepy/issues"
 test = [
     "pylint>=4.0.5",
     "flake8>=7.3.0",
-    "ruff>=0.15.10",
-    "mypy>=1.20.0",
+    "ruff>=0.15.12",
+    "mypy>=1.20.2",
     "pytest>=9.0.3",
     "pytest-cov>=7.1.0",
     "pytest-xdist>=3.8.0",
     "pytest-timeout>=2.4.0",
     "pytest_httpserver==1.1.5",
-    "types-requests>=2.33.0",
-    "pandas-stubs>=3.0.0",
+    "types-requests>=2.33.0.20260408",
+    "pandas-stubs>=3.0.0.260204",
     "black>=26.3.1",
     "bandit>=1.9.4",
     "pandas>=3.0.2",
-    "pyarrow>=23.0.1",
+    "pyarrow>=24.0.0",
     "adbc-driver-flightsql>=1.11.0",
     "adbc-driver-manager>=1.11.0",
+    "polars>=1.0.0",
 ]
 params = [
     "adbc-driver-flightsql>=1.11.0",
     "adbc-driver-manager>=1.11.0",
 ]
+polars = [
+    "polars>=1.0.0",
+]

 # ============== Tool Configuration ==============

@@ -93,6 +97,8 @@ module = [
     "adbc_driver_manager.*",
     "certifi",
     "pandas",
+    "polars",
+    "polars.*",
     "_pytest.*",
     "pytest.*",
 ]
@@ -185,6 +191,14 @@ ignore = [
     "PLR2004", # Magic values ok in tests
     "ARG", # Unused arguments ok in tests (fixtures)
     "T20", # Print statements ok in tests
+    "N812", # `functions as F` is the DataFrame convention
+    "S608", # asserting SQL fragments is the point of these tests
+]
+"spicepy/_client.py" = [
+    "S608", # SQL is constructed from escaped identifiers/literals; lint can't tell
+]
+"spicepy/_dataframe.py" = [
+    "S608", # The DataFrame layer's entire job is composing SQL; identifiers and literals are escaped
 ]

 [tool.ruff.lint.isort]
@@ -246,9 +260,12 @@ directory = "htmlcov"

 [tool.bandit]
 exclude_dirs = ["tests", ".venv"]
-skips = ["B101"] # assert_used
+skips = [
+    "B101", # assert_used
+    "B608", # hardcoded_sql_expressions: composing SQL with escaped identifiers/literals is this package's job
+]

 [dependency-groups]
 dev = [
-    "build>=1.4.3",
+    "build>=1.4.4",
 ]
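
The new `polars` extra pairs with the commit message's statement that polars is optional and that a missing install "raises a clear ImportError". One common way to implement that kind of gate is sketched below. The helper name `require_optional` and the message wording are hypothetical, and the diff shown here does not reveal how `spicepy/_client.py` actually does it.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "spicepy") -> ModuleType:
    """Import an optional dependency, or raise an ImportError naming the extra.

    Hypothetical helper: the name and message format are assumptions.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install '{package}[{extra}]'"
        ) from exc


# Deferring the import to call time keeps `import spicepy` working
# for users who never installed the extra.
try:
    require_optional("module_that_is_not_installed_xyz", "polars")
except ImportError as exc:
    print(exc)
```

A `to_polars()`-style method could call such a helper first and then convert an Arrow table with `polars.from_arrow`, so the hard dependency list stays limited to pyarrow, pandas, certifi, and requests.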

requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-pyarrow>=23.0.1
+pyarrow>=24.0.0
 pandas>=3.0.2
-certifi>=2026.2.25
+certifi>=2026.4.22
 requests>=2.33.1

spicepy/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -5,5 +5,19 @@
 """

 # flake8: noqa
+from . import functions
 from ._client import Client
+from ._dataframe import SpiceDataFrame
+from ._expr import Expr, case, col, lit
 from ._http import RefreshOpts
+
+__all__ = [
+    "Client",
+    "Expr",
+    "RefreshOpts",
+    "SpiceDataFrame",
+    "case",
+    "col",
+    "functions",
+    "lit",
+]
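
The commit message describes `SpiceDataFrame`, exported here, as a lazy SQL-compiling builder. It also justifies the S608/B608 lint skips on the grounds that identifiers and literals are escaped before being composed into SQL. The toy below sketches both ideas together under stated assumptions: `LazyFrame`, `filter_eq`, `to_sql`, `quote_ident`, and `quote_literal` are hypothetical names, not spicepy's API. Each method returns a new immutable object, and no SQL is produced until `to_sql()` is called.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


@dataclass(frozen=True)
class LazyFrame:
    """Toy lazy query builder (illustrative only, not SpiceDataFrame)."""

    table: str
    columns: tuple[str, ...] = ()
    predicates: tuple[str, ...] = ()
    row_limit: int | None = None

    def select(self, *names: str) -> LazyFrame:
        return replace(self, columns=tuple(quote_ident(n) for n in names))

    def filter_eq(self, column: str, value: str) -> LazyFrame:
        clause = f"{quote_ident(column)} = {quote_literal(value)}"
        return replace(self, predicates=self.predicates + (clause,))

    def limit(self, n: int) -> LazyFrame:
        return replace(self, row_limit=int(n))

    def to_sql(self) -> str:
        # Compile the accumulated state into a single SELECT statement.
        cols = ", ".join(self.columns) or "*"
        sql = f"SELECT {cols} FROM {quote_ident(self.table)}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


df = LazyFrame("taxi_trips").select("fare", "city").filter_eq("city", "O'Hare").limit(5)
print(df.to_sql())
```

Because every user-supplied name and value passes through a quoting function before it reaches the f-string, a linter that flags string-built SQL cannot tell the composition is escaped, which is the situation the lint skips above describe.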
