This repository's current structure makes it awkward to work with multiple layouts in a single process. In the fMRIPrep use case, we need the following layouts:
- TemplateFlow: not quite BIDS
- Input dataset: BIDS, but we also look for some extra files for backwards-compatibility reasons
- Output dataset: uses unmerged BEPs

To me, the natural way to do this would be something like:
```python
tf_schema: Namespace = patch_schema(...)
tf_tab = index_dataset('~/.cache/templateflow', schema=tf_schema)
ds_tab = index_dataset(input_path)
deriv_tab = index_dataset(output_path, schema='https://bids-specification.readthedocs.io/en/bepXYZ/schema.json')
```
As far as I can tell, we currently need to do something like:
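```python
import tempfile
from pathlib import Path

tf_schema = patch_schema(...)
# NamedTemporaryFile returns a file object; Path() needs its .name
with tempfile.NamedTemporaryFile(suffix='.json', delete=False) as f:
    tf_schema_path = Path(f.name)
tf_schema_path.write_text(tf_schema.to_json())
set_bids_schema(tf_schema_path)
tf_tab = index_dataset('~/.cache/templateflow')
set_bids_schema()  # Reset to the default schema
```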
How would you feel about abstracting this global state into an object like:
```python
from os import PathLike
from pathlib import Path

import pyarrow as pa
from bidsschematools.types import Namespace


class Schema:
    # Default: can be given a URL or path
    schema_path: str | PathLike | None
    # Alternative: can be passed an instantiated schema
    bids_schema: Namespace

    def set_bids_schema(self, path: str | Path | None = None) -> None: ...
    def get_bids_schema(self) -> Namespace: ...
    def get_bids_entity_arrow_schema(self) -> pa.Schema: ...
```
You could then preserve the current API with:
```python
_GLOBAL_SCHEMA = Schema()
set_bids_schema = _GLOBAL_SCHEMA.set_bids_schema
get_bids_schema = _GLOBAL_SCHEMA.get_bids_schema
get_bids_entity_arrow_schema = _GLOBAL_SCHEMA.get_bids_entity_arrow_schema
```
Functions that call any of these functions somewhere in their call stack (particularly `index_dataset`) could be given a `schema=_GLOBAL_SCHEMA` default argument.
LMK what you think and I'd be happy to put together a PR.