[pull] trunk from spiceai:trunk#754
Merged
* Add schema decomposition to HTTP connector
Mirrors the DynamoDB connector's `json_object: "*"` column-metadata
feature in the HTTP connector. When a dataset declares a set of
`columns:` with exactly one column marked `metadata.json_object: "*"`,
each JSON row returned by the endpoint is decomposed:
- Declared static columns are projected as top-level Utf8 fields (nulls
preserved for absent keys).
- All remaining keys are gathered into the marked catch-all column as a
sorted JSON object string.
No data is silently dropped: every key lands in either a static column
or the catch-all JSON.
Also adds user-facing docs covering schema decomposition for both the
DynamoDB and HTTP connectors with TVmaze examples.
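The decomposition rule in this commit can be sketched in a few lines. This is a minimal, std-only illustration and not the connector's code: `DecomposedRow`, `decompose_row`, and `escape_json_key` are hypothetical names, and a row is modeled as a map from key to already-serialized JSON value text rather than a parsed JSON value. Whether the real connector keeps the raw JSON text for static columns is an assumption here.

```rust
use std::collections::BTreeMap;

/// Hypothetical result type: static column values in declared order,
/// plus the catch-all column as JSON object text.
#[derive(Debug, PartialEq)]
pub struct DecomposedRow {
    pub static_values: Vec<Option<String>>,
    pub catch_all: String,
}

/// Escape a key for use inside a JSON string literal
/// (quotes, backslashes, and control characters only).
fn escape_json_key(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    for ch in key.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Split one row into declared static columns and a catch-all object.
/// Assumption: values in `row` are raw JSON text (e.g. `1`, `"Arrow"`, `[1,2]`).
pub fn decompose_row(row: &BTreeMap<String, String>, static_columns: &[&str]) -> DecomposedRow {
    // Absent keys become None, which preserves nulls for static columns.
    let static_values = static_columns
        .iter()
        .map(|c| row.get(*c).cloned())
        .collect();

    // BTreeMap iterates in key order, so the catch-all object is sorted.
    let rest: Vec<String> = row
        .iter()
        .filter(|(k, _)| !static_columns.contains(&k.as_str()))
        .map(|(k, v)| format!("\"{}\":{}", escape_json_key(k), v))
        .collect();

    DecomposedRow {
        static_values,
        catch_all: format!("{{{}}}", rest.join(",")),
    }
}
```

Every key ends up in exactly one place: either a slot in `static_values` or an entry in `catch_all`, which is the "no data is silently dropped" property described above.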
* fix: address schema-decomposition PR review comments
Round of Copilot review fixes on the HTTP schema-decomposition PR:
- json_nest.rs: switch the static_fields filter to compare via
`.as_str()` instead of the prior `**c != json_field_name`. Both forms
compile, but the `.as_str()` form is clearer to readers.
- provider.rs: in `From<Error> for DataFusionError`, box the outer
`Error::JsonNesting { source }` so the variant's display context
("Failed to decompose HTTP response row into declared columns ...")
travels with the `External` error instead of being stripped.
- provider.rs: unit coverage for `create_batch_from_rows_nested` via
`HttpExec` — object rows, missing keys, non-object rows, empty
catch-all, empty-projection fallback, and dispatch through the
non-nested entry point.
- https.rs: unit coverage for `parse_http_json_nesting` — missing
columns, missing marker, valid "*" marker, multiple markers,
non-wildcard string, and non-string marker values.
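To illustrate the first provider.rs item, the sketch below contrasts boxing the whole outer error with boxing only its inner source. All types here are stand-ins: `EngineError` imitates an engine error with an `External` variant holding a boxed error, `ConnectorError` imitates the connector's error enum, and the display strings are invented for the example rather than copied from the real code.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the inner nesting error.
#[derive(Debug)]
pub struct NestingError(pub String);

impl fmt::Display for NestingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for NestingError {}

/// Stand-in for the connector error; the variant adds display context.
#[derive(Debug)]
pub enum ConnectorError {
    JsonNesting { source: NestingError },
}

impl fmt::Display for ConnectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Hypothetical message, not the connector's actual wording.
            Self::JsonNesting { source } => write!(f, "failed to decompose row: {source}"),
        }
    }
}

impl Error for ConnectorError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::JsonNesting { source } => Some(source),
        }
    }
}

/// Stand-in for an engine error with an `External` variant.
#[derive(Debug)]
pub enum EngineError {
    External(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::External(e) => write!(f, "External error: {e}"),
        }
    }
}

/// Context-preserving conversion: box the outer error as a whole.
impl From<ConnectorError> for EngineError {
    fn from(err: ConnectorError) -> Self {
        EngineError::External(Box::new(err))
    }
}

/// Lossy conversion, shown for comparison: only the inner source is boxed,
/// so the variant's display context is stripped.
pub fn into_engine_error_lossy(err: ConnectorError) -> EngineError {
    match err {
        ConnectorError::JsonNesting { source } => EngineError::External(Box::new(source)),
    }
}
```

With the `From` impl the variant's message prefix stays in the rendered error; with the lossy helper only the inner message survives.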
* fix: satisfy clippy pedantic in schema-decomposition code
- Backtick `DynamoDB` in two doc comments so clippy::doc_markdown passes.
- Rename `marker` → `marker_value` inside `parse_http_json_nesting` so it
no longer trips clippy::similar_names against the surrounding `marked`
binding.
* perf: skip catch-all JSON build when projection excludes it
Address Copilot follow-up review: when a query projection doesn't
include the catch-all column, the previous implementation still
decomposed every row into a HashMap and serialized the catch-all
JSON object — wasted work that scales with row width.
The nested batch builder now:
- walks the projection once, detects whether the catch-all column is
included, and writes values directly into per-column StringBuilders
in a single pass instead of materializing a full decomposed row per
input;
- takes a fast path for object rows when the catch-all isn't
projected: reads static fields straight out of the parsed JSON
object and skips the HashMap + serialize step;
- falls back to the existing decompose_json_row path when the
catch-all is needed or the row isn't a JSON object, which keeps
error propagation and non-object row handling identical.
Added two tests: one locks down the fast path (narrow projection
against wide rows) and one covers the fall-through on non-object
rows when the catch-all isn't projected.
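The fast-path dispatch described in this commit might look roughly like the following. It is a std-only sketch under several assumptions: `Row`, `Batch`, `build_batch`, and `catch_all_json` are hypothetical names, plain `Vec<Option<String>>` columns stand in for Arrow string builders, values are raw JSON text, keys are assumed to be plain ASCII, and a non-object row is assumed to land whole in the catch-all column with null static columns. The `full_decompositions` counter exists only to make the path taken observable.

```rust
use std::collections::BTreeMap;

/// Hypothetical row model: a JSON object (key -> raw JSON text) or anything else.
pub enum Row {
    Object(BTreeMap<String, String>),
    Other(String),
}

pub struct Batch {
    /// One output column per projected column name.
    pub columns: Vec<Vec<Option<String>>>,
    /// How many rows took the full (slow) decomposition path.
    pub full_decompositions: usize,
}

/// Serialize all non-static keys as a sorted JSON object.
/// Assumption: keys are plain ASCII, so Debug quoting matches JSON quoting.
fn catch_all_json(obj: &BTreeMap<String, String>, static_columns: &[&str]) -> String {
    let parts: Vec<String> = obj
        .iter()
        .filter(|(k, _)| !static_columns.contains(&k.as_str()))
        .map(|(k, v)| format!("{k:?}:{v}"))
        .collect();
    format!("{{{}}}", parts.join(","))
}

pub fn build_batch(
    rows: &[Row],
    static_columns: &[&str],
    catch_all: &str,
    projection: &[&str],
) -> Batch {
    // Walk the projection once to learn whether the catch-all is needed.
    let needs_catch_all = projection.contains(&catch_all);
    let mut columns: Vec<Vec<Option<String>>> = vec![Vec::new(); projection.len()];
    let mut full_decompositions = 0;

    for row in rows {
        match row {
            // Fast path: object row, catch-all not projected.
            // Read static fields directly and skip serialization.
            Row::Object(obj) if !needs_catch_all => {
                for (col, name) in columns.iter_mut().zip(projection) {
                    col.push(obj.get(*name).cloned());
                }
            }
            // Fallback: catch-all is projected, or the row is not an object.
            _ => {
                full_decompositions += 1;
                let (obj_opt, json) = match row {
                    Row::Object(obj) => (Some(obj), catch_all_json(obj, static_columns)),
                    Row::Other(raw) => (None, raw.clone()),
                };
                for (col, name) in columns.iter_mut().zip(projection) {
                    if *name == catch_all {
                        col.push(Some(json.clone()));
                    } else {
                        col.push(obj_opt.and_then(|o| o.get(*name).cloned()));
                    }
                }
            }
        }
    }

    Batch { columns, full_decompositions }
}
```

A narrow projection such as `&["id"]` over wide object rows never calls `catch_all_json`, which is the saving this commit targets; non-object rows still go through the fallback so their handling does not depend on the projection.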
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)