Skip to content

feat: Compile schema.json and bundle in Python package #2115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 15, 2025

Conversation

effigies
Copy link
Collaborator

@effigies effigies commented May 11, 2025

In bids-standard/python-validator#2, I'm generating classes on the fly from schema.meta.context. As it stands, the import takes about 530ms on my machine, about 470ms of it spent loading the schema and constructing the classes.

This change builds schema.json during the Python packaging process and includes it instead of the YAML directory inside the tarball and wheel. Along with deferring a couple imports (here and in #2109), this brings the import time down to about 35ms, with 10ms spent on loading the schema and building the classes.

Editable installs are left loading YAML files, to avoid caching an old state.

@yarikoptic
Copy link
Collaborator

yarikoptic commented May 12, 2025

quick one, am I correct to assume that then it would "hardcode" python validator to

  • use 1 specific (potentially very old) bids-specification version for that release of bidsschematools?
  • we would need to release bidsschematools for every version of bids itself?
  • at runtime only 1 that specific version of bids could be validated?

To avoid "yes" to any of those questions, I would have thought for us to keep *tools separate from data. Tools ideally should support multiple versions of the standard overall and ideally even at runtime (e.g. consider validating two different bids datasets of two different versions in the same ipython session). If we are to bundle pip installable copy of standard, we could do it via separate package, may be off https://github.com/bids-standard/bids-schema

edit: nwb&pynwb are experiencing various issues due to pretty much such tight coupling

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per my comment in the main thread, I do not think we should ship any version of the schema with the tools.

@effigies
Copy link
Collaborator Author

Wrote a response, but apparently didn't finish or hit "Comment".

The schema is already bundled with the tools. This PR:

  1. Allows load_schema() to load a schema from JSON, not just a YAML directory.
  2. Replaces a YAML directory with an equivalent JSON during sdist build.
  3. Will load either a bundled JSON or YAML directory, to enable regular installs to load fast, while editable installs still get live updates.

If you want to remove the schema from the tools, that will definitely be a breaking change, and will require rethinking the whole package.

@effigies
Copy link
Collaborator Author

effigies commented May 15, 2025

@yarikoptic Want to join the schema call to discuss this?

@yarikoptic
Copy link
Collaborator

sorry, was/is busy... ok, so it is (as described) just for the sake of speed performance, gotcha... boost is significant indeed, so let's go for it.

@effigies effigies merged commit 1012d4d into bids-standard:maint/1.10.0 May 15, 2025
18 of 19 checks passed
@effigies effigies deleted the feat/compile-schema branch May 15, 2025 19:57
@effigies effigies added the exclude-from-changelog This item will not feature in the automatically generated changelog label May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-from-changelog This item will not feature in the automatically generated changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants