Replace global schema state with schema adapter dataclass #70

@effigies

This repository's current structure makes it awkward to work with multiple layouts in a single process. In the fMRIPrep use case, we need the following layouts:

  1. templateflow: not quite BIDS
  2. Input dataset: BIDS, but we look for some extra files for backwards-compatibility reasons
  3. Output dataset: uses unmerged BEPs

To me, the natural way to do this would be something like:

tf_schema: Namespace = patch_schema(...)
tf_tab = index_dataset('~/.cache/templateflow', schema=tf_schema)

ds_tab = index_dataset(input_path)

deriv_tab = index_dataset(output_path, schema='https://bids-specification.readthedocs.io/en/bepXYZ/schema.json')

I think at the moment, we would need to do something like:

tf_schema = patch_schema(...)
tf_schema_path = Path(tempfile.NamedTemporaryFile(suffix='.json', delete=False).name)
tf_schema_path.write_text(tf_schema.to_json())
set_bids_schema(tf_schema_path)
tf_tab = index_dataset('~/.cache/templateflow')
set_bids_schema()  # Reset

How would you feel about abstracting this global state into an object like:

@dataclass
class Schema:
    # Default, can be given a URL or path
    schema_path: PathLike | None = None
    # Alternate: Can be passed an instantiated schema
    bids_schema: Namespace | None = None

    def set_bids_schema(self, path: str | Path | None = None) -> None: ...
    def get_bids_schema(self) -> Namespace: ...
    def get_bids_entity_arrow_schema(self) -> pa.Schema: ...

You could then preserve the current API with:

_GLOBAL_SCHEMA = Schema()
set_bids_schema = _GLOBAL_SCHEMA.set_bids_schema
get_bids_schema = _GLOBAL_SCHEMA.get_bids_schema
get_bids_entity_arrow_schema = _GLOBAL_SCHEMA.get_bids_entity_arrow_schema
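To make the aliasing concrete, here is a minimal, self-contained sketch of the pattern. It is not the repository's code: `_load_schema` is a hypothetical stand-in for the real loader, `SimpleNamespace` stands in for the schema `Namespace`, and the Arrow method is left out so the example has no dependencies. The point is only that the module-level names are bound methods of one shared instance, while other instances keep their own state.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


def _load_schema(path: Optional[str]) -> SimpleNamespace:
    # Hypothetical stand-in for the real schema loader: it only records
    # where the schema would have been read from.
    return SimpleNamespace(source=path or "<bundled default>")


@dataclass
class Schema:
    schema_path: Optional[str] = None
    bids_schema: Optional[SimpleNamespace] = None

    def set_bids_schema(self, path: Optional[str] = None) -> None:
        # Changing the path invalidates any previously loaded schema.
        self.schema_path = path
        self.bids_schema = None

    def get_bids_schema(self) -> SimpleNamespace:
        # Load lazily and cache on the instance, not in module globals.
        if self.bids_schema is None:
            self.bids_schema = _load_schema(self.schema_path)
        return self.bids_schema


# Module-level aliases are bound methods of one shared instance.
_GLOBAL_SCHEMA = Schema()
set_bids_schema = _GLOBAL_SCHEMA.set_bids_schema
get_bids_schema = _GLOBAL_SCHEMA.get_bids_schema

# An independent adapter does not disturb the global one, and vice versa.
tf_schema = Schema(schema_path="templateflow-schema.json")
set_bids_schema("custom-schema.json")
```

Existing callers of `set_bids_schema()` and `get_bids_schema()` would keep working unchanged, since the names still resolve to callables with the same signatures.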

Functions that reach any of these somewhere in their call stack (particularly index_dataset) could be given a schema=_GLOBAL_SCHEMA default argument and pass it down explicitly.
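A rough sketch of what that default argument could look like, under the same assumptions as the proposal above. The `index_dataset` body and the `_schema_label` helper here are hypothetical placeholders, not the real indexing code; they only show the adapter being threaded through the call stack instead of being read from module state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schema:
    # Hypothetical, reduced adapter: it only remembers a schema location.
    schema_path: Optional[str] = None

    def set_bids_schema(self, path: Optional[str] = None) -> None:
        # Mutate in place so existing references see the change.
        self.schema_path = path


_GLOBAL_SCHEMA = Schema()
set_bids_schema = _GLOBAL_SCHEMA.set_bids_schema


def _schema_label(schema: Schema) -> str:
    # Helper deeper in the call stack receives the adapter explicitly.
    return schema.schema_path or "<bundled default>"


def index_dataset(root: str, schema: Schema = _GLOBAL_SCHEMA) -> dict:
    # Stand-in for real indexing: report which schema would be applied.
    return {"root": root, "schema": _schema_label(schema)}


# Two layouts in one process, with no set/reset of global state in between.
tf_tab = index_dataset("templateflow", schema=Schema("tf-schema.json"))
ds_tab = index_dataset("input")
```

One design detail worth noting: the default is bound once, at function definition time, so this only stays in sync with `set_bids_schema()` if that method mutates `_GLOBAL_SCHEMA` in place rather than rebinding the module-level name.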

LMK what you think and I'd be happy to put together a PR.
