[pull] trunk from spiceai:trunk #754

Merged
pull[bot] merged 1 commit into TheRakeshPurohit:trunk from spiceai:trunk on Apr 19, 2026

pull[bot] commented Apr 19, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

* Add schema decomposition to HTTP connector

Mirrors the DynamoDB connector's `json_object: "*"` column-metadata
feature in the HTTP connector. When a dataset declares a set of
`columns:` with exactly one column marked `metadata.json_object: "*"`,
each JSON row returned by the endpoint is decomposed:

- Declared static columns are projected as top-level Utf8 fields (nulls
  preserved for absent keys).
- All remaining keys are gathered into the marked catch-all column as a
  sorted JSON object string.
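
The two rules above can be sketched in a few lines of std-only Rust. This is an illustration, not the connector's code: the `Row` alias, the `decompose_row` name, and the choice to model each JSON value as its already-serialized text are all assumptions made to avoid a JSON library dependency. Keys are assumed to need no escaping.

```rust
use std::collections::BTreeMap;

/// One JSON object row, modeled as key -> raw JSON value text.
/// Hypothetical stand-in for a real parsed JSON object.
type Row = BTreeMap<String, String>;

/// Splits a row into (static column values, catch-all JSON object string).
fn decompose_row(row: &Row, static_columns: &[&str]) -> (Vec<Option<String>>, String) {
    // Static columns are projected by name; an absent key stays None (null).
    let statics: Vec<Option<String>> = static_columns
        .iter()
        .map(|name| row.get(*name).cloned())
        .collect();

    // Every remaining key goes into the catch-all object.
    // BTreeMap iterates in key order, so the output is sorted by key.
    let rest: Vec<String> = row
        .iter()
        .filter(|(k, _)| !static_columns.contains(&k.as_str()))
        .map(|(k, v)| format!("\"{k}\":{v}"))
        .collect();

    (statics, format!("{{{}}}", rest.join(",")))
}
```

With a row holding `id`, `name`, and `genres` and static columns `id` and `summary`, the sketch yields the `id` value, a null for `summary`, and a catch-all object containing `genres` and `name` in that order.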

No data is silently dropped: every key lands in either a static column
or the catch-all JSON.
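
The "exactly one column marked" requirement could be checked roughly as follows. The `Column` and `MetaValue` types and the `find_catch_all` function are hypothetical, and the outcomes chosen here (no marker means no decomposition; a second marker, a non-`"*"` string, or a non-string value is an error) are assumptions, since the commit message does not spell out how each case is handled.

```rust
/// Hypothetical model of a metadata value from the dataset definition.
#[derive(Debug, Clone, PartialEq)]
enum MetaValue {
    Str(String),
    Other, // any non-string value (number, bool, ...)
}

/// Hypothetical model of one declared column.
struct Column {
    name: String,
    json_object: Option<MetaValue>,
}

/// Returns the name of the catch-all column, if exactly one is marked.
/// Ok(None) means decomposition is not configured (assumed behavior).
fn find_catch_all(columns: &[Column]) -> Result<Option<String>, String> {
    let mut marked: Option<String> = None;
    for col in columns {
        // Columns without the metadata key are ordinary static columns.
        let Some(marker_value) = &col.json_object else { continue };
        match marker_value {
            MetaValue::Str(s) if s == "*" => {
                if let Some(prev) = &marked {
                    return Err(format!(
                        "columns `{prev}` and `{}` are both marked as catch-all",
                        col.name
                    ));
                }
                marked = Some(col.name.clone());
            }
            // Assumed: anything other than the "*" wildcard is rejected.
            other => {
                return Err(format!(
                    "column `{}`: unsupported json_object value {other:?}",
                    col.name
                ))
            }
        }
    }
    Ok(marked)
}
```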

Also adds user-facing docs covering schema decomposition for both the
DynamoDB and HTTP connectors with TVmaze examples.

* fix: address schema-decomposition PR review comments

Round of Copilot review fixes on the HTTP schema-decomposition PR:

- json_nest.rs: switch the static_fields filter to compare via
  `.as_str()` instead of the prior `**c != json_field_name`. Both forms
  compile, but the `.as_str()` form is clearer to readers.
- provider.rs: in `From<Error> for DataFusionError`, box the outer
  `Error::JsonNesting { source }` so the variant's display context
  ("Failed to decompose HTTP response row into declared columns ...")
  travels with the `External` error instead of being stripped.
- provider.rs: unit coverage for `create_batch_from_rows_nested` via
  `HttpExec` — object rows, missing keys, non-object rows, empty
  catch-all, empty-projection fallback, and dispatch through the
  non-nested entry point.
- https.rs: unit coverage for `parse_http_json_nesting` — missing
  columns, missing marker, valid "*" marker, multiple markers,
  non-wildcard string, and non-string marker values.
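
The error-boxing change in the provider.rs item above is easiest to see side by side. The sketch below uses stand-in types only: `EngineError` plays the role of an engine error with an `External` variant, `ConnectorError` and `NestError` are hypothetical, and the display text is shortened rather than copied from the real variant. It assumes the earlier conversion boxed only the inner source, which is what "stripped" suggests.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical inner cause.
#[derive(Debug)]
struct NestError(String);

impl fmt::Display for NestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl StdError for NestError {}

/// Hypothetical connector error whose variant adds display context.
#[derive(Debug)]
enum ConnectorError {
    JsonNesting { source: NestError },
}

impl fmt::Display for ConnectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectorError::JsonNesting { source } => {
                write!(f, "Failed to decompose HTTP response row: {source}")
            }
        }
    }
}

impl StdError for ConnectorError {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            ConnectorError::JsonNesting { source } => Some(source),
        }
    }
}

/// Stand-in for an engine error with an `External` variant.
#[derive(Debug)]
enum EngineError {
    External(Box<dyn StdError + Send + Sync>),
}

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EngineError::External(e) => write!(f, "External error: {e}"),
        }
    }
}

/// Boxes only the inner source: the variant's context is lost.
fn convert_stripping(err: ConnectorError) -> EngineError {
    match err {
        ConnectorError::JsonNesting { source } => EngineError::External(Box::new(source)),
    }
}

/// Boxes the whole outer error: the variant's context is kept.
fn convert_keeping(err: ConnectorError) -> EngineError {
    EngineError::External(Box::new(err))
}
```

In this sketch, `convert_keeping` produces a message that still carries the "Failed to decompose" prefix, while `convert_stripping` shows only the inner cause.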

* fix: satisfy clippy pedantic in schema-decomposition code

- Backtick `DynamoDB` in two doc comments so clippy::doc_markdown passes.
- Rename `marker` → `marker_value` inside `parse_http_json_nesting` so it
  no longer trips clippy::similar_names against the surrounding `marked`
  binding.

* perf: skip catch-all JSON build when projection excludes it

Address Copilot follow-up review: when a query projection doesn't
include the catch-all column, the previous implementation still
decomposed every row into a HashMap and serialized the catch-all
JSON object — wasted work that scales with row width.

The nested batch builder now:

- walks the projection once, detects whether the catch-all column is
  included, and writes values directly into per-column StringBuilders
  in a single pass instead of materializing a full decomposed row per
  input;
- takes a fast path for object rows when the catch-all isn't
  projected: reads static fields straight out of the parsed JSON
  object and skips the HashMap + serialize step;
- falls back to the existing decompose_json_row path when the
  catch-all is needed or the row isn't a JSON object, which keeps
  error propagation and non-object row handling identical.
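
A simplified sketch of the single-pass idea, under several assumptions: rows are modeled as a `BTreeMap` of key to raw JSON value text, plain `Vec<Option<String>>` columns stand in for Arrow `StringBuilder`s, every row is an object (the non-object fallback is omitted), and the names `build_columns` and `serialize_rest` are hypothetical.

```rust
use std::collections::BTreeMap;

/// One JSON object row, modeled as key -> raw JSON value text (hypothetical).
type Row = BTreeMap<String, String>;

/// Serializes every non-static key into a key-sorted JSON object string.
fn serialize_rest(row: &Row, static_columns: &[&str]) -> String {
    let parts: Vec<String> = row
        .iter()
        .filter(|(k, _)| !static_columns.contains(&k.as_str()))
        .map(|(k, v)| format!("\"{k}\":{v}"))
        .collect();
    format!("{{{}}}", parts.join(","))
}

/// Builds one output column per projected name in a single pass over the rows.
fn build_columns(
    rows: &[Row],
    projection: &[&str],
    static_columns: &[&str],
    catch_all: &str,
) -> Vec<Vec<Option<String>>> {
    // Decide once whether the catch-all column is projected at all.
    let needs_catch_all = projection.contains(&catch_all);
    let mut columns: Vec<Vec<Option<String>>> = vec![Vec::new(); projection.len()];

    for row in rows {
        // Fast path: when the catch-all is not projected, nothing is serialized.
        let rest = needs_catch_all.then(|| serialize_rest(row, static_columns));
        for (out, name) in columns.iter_mut().zip(projection) {
            if *name == catch_all {
                out.push(rest.clone());
            } else {
                // Static fields are read straight from the row.
                out.push(row.get(*name).cloned());
            }
        }
    }
    columns
}
```

The saving comes from `needs_catch_all` being computed once per batch, so a narrow projection over wide rows only touches the keys it actually returns.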

Added two tests: one locks down the fast path (narrow projection
against wide rows) and one covers the fall-through on non-object
rows when the catch-all isn't projected.

pull[bot] locked and limited conversation to collaborators Apr 19, 2026
pull[bot] added the ⤵️ pull label Apr 19, 2026
pull[bot] merged commit a874314 into TheRakeshPurohit:trunk Apr 19, 2026
1 of 10 checks passed