spec: Variant lower/upper bounds #12658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

amogh-jahagirdar merged 6 commits into apache:main from aihuaxu:variant-bound-spec

Apr 21, 2025

Contributor

aihuaxu commented Mar 26, 2025 •

edited by Fokko

Loading

This is to revise the bounds specification for Variant. In summary:

The writer determines which fields to collect bounds for in a Variant column. Field bounds are stored as serialized Variant objects, where each key is a normalized JSON path identifying a field, and each value is the corresponding lower or upper bound.

E.g.

For a Variant column with the schema as follows:

{
  "event_type": "login",
  "user.name": "Alex", 
  "tags": ["action", "drama"]
}

The collected bound object looks like:

{
  "$['event_type']": "login",
  "$['user.name']": "Alex",
  "$['tags']": "action"
}


          Variant bound spec

a93bd21

github-actions bot added the Specification label

Contributor Author

aihuaxu commented Mar 26, 2025

cc @rdblue, @danielcweeks, @RussellSpitzer

rdblue reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

amogh-jahagirdar reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

xxubai reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved


          Add array path and change to only use normalized path.

31dbfa2

aihuaxu force-pushed the variant-bound-spec branch from f03ba8f to 31dbfa2 Compare

March 27, 2025 04:58

aihuaxu requested review from rdblue, amogh-jahagirdar and xxubai

March 27, 2025 05:03

rdblue reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

rdblue reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

rdblue reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved


          Clarify the condition to collect bounds

af19925

xxubai approved these changes

View reviewed changes

Contributor

xxubai left a comment

LGTM

Contributor

danielcweeks commented Apr 3, 2025

@aihuaxu and @rdblue is there a reason we need to explicitly restrict the lower/upper bounds to shredded fields? I would think that the stats pruning would be useful for any field that a writer would want to include in the bound (regardless of whether it was shredded or not).

Contributor Author

aihuaxu commented Apr 4, 2025

@aihuaxu and @rdblue is there a reason we need to explicitly restrict the lower/upper bounds to shredded fields? I would think that the stats pruning would be useful for any field that a writer would want to include in the bound (regardless of whether it was shredded or not).

What we were thinking is that the bounds are collected from shredded column stats during shredding process. But it does seem reasonable to me to bounds and shredding can be separated: if a writer has the knowledge of the bounds and chooses not to shred, the bounds can still be used in pruning.

rdblue mentioned this pull request

API: Use normalized JSON path to identify Variant fields #12835

Merged

rdblue reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

rdblue reviewed

View reviewed changes

format/spec.md Show resolved Hide resolved

rdblue reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved


          Apply suggestions from code review

a28931d

Improve the wording

Co-authored-by: Ryan Blue <[email protected]>

RussellSpitzer reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

RussellSpitzer reviewed

View reviewed changes

format/spec.md Outdated Show resolved Hide resolved

RussellSpitzer approved these changes

View reviewed changes

Member

RussellSpitzer left a comment

Looks good to me, I think we need to clarify a bit here

rdblue reviewed

View reviewed changes

format/spec.md Outdated

+              * `$` -- the Variant root value
+              * `$['user.name']` -- the field `"user.name"` in the root value that is a Variant object
+              * `$['location']['latitude']` -- the field `latitude` in a nested `location` object
+              * `$['ids']` -- the `ids` array

Contributor

rdblue Apr 18, 2025

When I read the thread you raised on the dev list, I liked that you used "tags" as the example. Maybe we should change some of these to match the examples in the variant shredding spec?

Contributor Author

aihuaxu Apr 18, 2025

Let me do that to match as much as possible.

flyrain approved these changes

View reviewed changes

amogh-jahagirdar approved these changes

View reviewed changes

huaxingao approved these changes

View reviewed changes

aihuaxu and others added 2 commits

April 18, 2025 16:46


          Apply suggestions from code review

1df3a35

Clarify some sentences.

Co-authored-by: Russell Spitzer <[email protected]>


          Update examples for variant metadata

9a99971

aihuaxu force-pushed the variant-bound-spec branch from b16f660 to 9a99971 Compare

April 19, 2025 16:37

Fokko approved these changes

View reviewed changes

Contributor

amogh-jahagirdar commented Apr 21, 2025

Thanks @RussellSpitzer @rdblue @Fokko @flyrain @huaxingao @XBaith for reviewing and everyone for voting. Since the vote passed, I'll go ahead and merge

amogh-jahagirdar merged commit cc2e0ff into apache:main

2 checks passed

aihuaxu mentioned this pull request

[Spec] filed path update for Variant array metadata #13462

Open

6 tasks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

rdblue rdblue left review comments

RussellSpitzer RussellSpitzer approved these changes

Fokko Fokko approved these changes

flyrain flyrain approved these changes

huaxingao huaxingao approved these changes

xxubai xxubai approved these changes

amogh-jahagirdar amogh-jahagirdar approved these changes

Labels