
Commit 28ee47e

clane9 and kaitj authored
Downgrade minimal required python to 3.11 (#51)
* Downgrade minimal required python to 3.11

Requiring python>=3.12 is a bit of a burden. It's only required for `find_bids_datasets`, which uses `Path.walk`. Instead, downgrade the minimum python to 3.11 and add a guard on this function in case python<3.12.

Note, we could look into downgrading further. The next blocker would be `get_column_names`, which returns a `StrEnum` (introduced in 3.11). I'm hesitant to remove this though, because being able to treat these enum fields as native strings is nice.

* Switch Path.walk to os.walk for py311

- Enables use of `find_bids_datasets` in py311.
- `root` in `_indexing.py` is initially passed as its original type, with the walked `dirpath` cast to `Path`.
- Removed the <py312 error throughout the codebase and tests.

* Set up python matrix for testing

- Only runs after formatting.
- Updated dependencies to support other python versions.

* Implement iterative directory walk for find

`Path.walk` and `CloudPath.walk` depend on python>=3.12. Also, `CloudPath.walk` retrieves all files up front rather than iteratively. Here we add our own directory walk logic for iteratively finding BIDS datasets under a root directory.

* Update README.md
* Update module docs

---------

Co-authored-by: Jason Kai <21226986+kaitj@users.noreply.github.com>
1 parent c605559 commit 28ee47e

9 files changed

Lines changed: 874 additions & 644 deletions


.github/workflows/ci.yaml

Lines changed: 20 additions & 5 deletions

@@ -2,20 +2,20 @@ name: CI
 
 on:
   push:
-    branches: [ "main" ]
+    branches: ["main"]
   pull_request:
-    branches: [ "main" ]
+    branches: ["main"]
 
 env:
   UV_FROZEN: true
 
 jobs:
-  test:
+  format:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
         with:
-          submodules: 'true'
+          submodules: "true"
       - name: Install uv
         uses: astral-sh/setup-uv@v5
         with:
@@ -26,9 +26,24 @@ jobs:
         run: |
           uv run ruff check bids2table tests
           uv run ruff format --check bids2table tests
+
+  tests:
+    runs-on: ubuntu-latest
+    needs: format
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12", "3.13"]
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: "true"
+      - name: Install uv with python version
+        uses: astral-sh/setup-uv@v6
+        with:
+          python-version: ${{ matrix.python-version }}
       - name: Run tests
         run: |
-          uv run pytest \
+          uv run --all-extras pytest \
             --junitxml=pytest.xml \
             --cov-report=xml:coverage.xml \
             --cov=bids2table tests

README.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
 [![Docs](https://github.com/childmindresearch/bids2table/actions/workflows/docs.yaml/badge.svg?branch=main)](https://childmindresearch.github.io/bids2table/bids2table)
 [![codecov](https://codecov.io/gh/childmindresearch/bids2table/branch/main/graph/badge.svg?token=22HWWFWPW5)](https://codecov.io/gh/childmindresearch/bids2table)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
-![Python3](https://img.shields.io/badge/python->=3.12-blue.svg)
+![Python3](https://img.shields.io/badge/python->=3.11-blue.svg)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 
 Index [BIDS](https://bids-specification.readthedocs.io/en/stable/) datasets fast, locally or in the cloud.

bids2table/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@
 [![Docs](https://github.com/childmindresearch/bids2table/actions/workflows/docs.yaml/badge.svg?branch=main)](https://childmindresearch.github.io/bids2table/bids2table)
 [![codecov](https://codecov.io/gh/childmindresearch/bids2table/branch/main/graph/badge.svg?token=22HWWFWPW5)](https://codecov.io/gh/childmindresearch/bids2table)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
-![Python3](https://img.shields.io/badge/python->=3.12-blue.svg)
+![Python3](https://img.shields.io/badge/python->=3.11-blue.svg)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 
 Index [BIDS](https://bids-specification.readthedocs.io/en/stable/) datasets fast, locally or in the cloud.

bids2table/__main__.py

Lines changed: 4 additions & 7 deletions

@@ -67,6 +67,9 @@ def main():
     parser_index.set_defaults(func=_index_command)
 
     parser_find = subparsers.add_parser("find", help="Find BIDS datasets.")
+    parser_find.add_argument(
+        "--maxdepth", type=int, help="Max search depth", default=None
+    )
     parser_find.add_argument(
         "--exclude-dirs",
         metavar="DIR",
@@ -75,12 +78,6 @@ def main():
         default=None,
         help="List of directory names or glob patterns to exclude from search.",
     )
-    parser_find.add_argument(
-        "--follow-symlinks",
-        "-L",
-        action="store_true",
-        help="Follow symbolic links.",
-    )
     parser_find.add_argument(
         "--verbose",
         "-v",
@@ -157,7 +154,7 @@ def _find_command(args: argparse.Namespace):
     for dataset in b2t2.find_bids_datasets(
         args.root,
         exclude=args.exclude_dirs,
-        follow_symlinks=args.follow_symlinks,
+        maxdepth=args.maxdepth,
     ):
         print(dataset)
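
The `find` subcommand now exposes `--maxdepth` in place of `--follow-symlinks`/`-L`. It takes an `int` and defaults to `None`, meaning no depth limit. A trimmed-down sketch of just that subparser shows how the option parses.

This is not the real `bids2table` parser, and `build_find_parser` is hypothetical. Two details the hunks do not show are assumptions here:
- `root` is positional.
- `--exclude-dirs` uses `nargs="+"`.

```python
import argparse


def build_find_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down version of the `find` subcommand parser."""
    parser = argparse.ArgumentParser(prog="bids2table")
    subparsers = parser.add_subparsers(dest="command", required=True)

    parser_find = subparsers.add_parser("find", help="Find BIDS datasets.")
    # Assumption: the search root is positional (the diff only shows that it
    # is read back as `args.root`).
    parser_find.add_argument("root")
    # Mirrors the option added in this commit: None means "no depth limit".
    parser_find.add_argument(
        "--maxdepth", type=int, help="Max search depth", default=None
    )
    # Assumption: nargs="+" so several names/globs can follow the flag.
    parser_find.add_argument(
        "--exclude-dirs",
        metavar="DIR",
        nargs="+",
        default=None,
        help="List of directory names or glob patterns to exclude from search.",
    )
    return parser


args = build_find_parser().parse_args(
    ["find", "/data/bids", "--maxdepth", "2", "--exclude-dirs", "code", "sourcedata"]
)
print(args.root, args.maxdepth, args.exclude_dirs)
```

With this sketch, passing the removed `--follow-symlinks` flag makes argparse exit with a usage error. Scripts that relied on `-L` would need the same update against the real CLI.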

bids2table/_indexing.py

Lines changed: 51 additions & 34 deletions

@@ -129,51 +129,66 @@ def get_column_names() -> enum.StrEnum:
 def find_bids_datasets(
     root: str | PathT,
     exclude: str | list[str] | None = None,
-    follow_symlinks: bool = True,
-    log_frequency: int = 100,
+    maxdepth: int | None = None,
 ) -> Generator[PathT, None, None]:
     """Find all BIDS datasets under a root directory.
 
     Args:
         root: Root path to begin search.
         exclude: Glob pattern or list of patterns matching sub-directory names to
             exclude from the search.
-        follow_symlinks: Search into symlinks that point to directories.
+        maxdepth: Maximum depth to search.
 
     Yields:
         Root paths of all BIDS datasets under `root`.
     """
     root = as_path(root)
 
-    dir_count = 0
+    if isinstance(exclude, str):
+        exclude = [exclude]
+    elif exclude is None:
+        exclude = []
+    exclude = [re.compile(fnmatch.translate(pat)) for pat in exclude]
+
+    entry_count = 1
     ds_count = 0
 
-    # NOTE: Path.walk was introduced in 3.12. Otherwise, could use an older python.
-    for dirpath, dirnames, _ in root.walk(follow_symlinks=follow_symlinks):
-        dir_count += 1
+    if _is_bids_dataset(root):
+        ds_count += 1
+        yield root
 
-        if _is_bids_dataset(dirpath):
-            ds_count += 1
-            yield dirpath
+    # Tuple of path, depth
+    stack = [(root, 0)]
 
-            # Only descend into specific sub-directories that are allowed to contain
-            # sub-datasets.
-            _filter_dirnames(dirnames, _BIDS_NESTED_PARENT_DIRNAMES)
+    while stack:
+        top, depth = stack.pop()
 
-        # Filter sub-directories to descend into.
-        if exclude:
-            matches = _filter_exclude(dirnames, exclude)
-            _filter_dirnames(dirnames, matches)
+        inside_bids = _is_bids_dataset(top)
+        depth += 1
 
-        if log_frequency and dir_count % log_frequency == 0:
-            _logger.info(
-                "Searched %d directories; found %d BIDS datasets.", dir_count, ds_count
-            )
+        for entry in top.iterdir():
+            entry_count += 1
 
-    if log_frequency:
-        _logger.info(
-            "Searched %d directories; found %d BIDS datasets.", dir_count, ds_count
-        )
+            if any(re.fullmatch(pat, entry.name) for pat in exclude):
+                continue
+
+            if _is_bids_dataset(entry):
+                ds_count += 1
+                yield entry
+
+            # Checks if we should descend into this directory.
+            # Check not reached final depth.
+            descend = maxdepth is None or depth < maxdepth
+            # Heuristic checks whether the filename looks like a (visible) directory.
+            descend = descend and not (entry.suffix or entry.name.startswith("."))
+            # Only descend into specific subdirectories of BIDS directories.
+            descend = descend and (
+                not inside_bids or entry.name in _BIDS_NESTED_PARENT_DIRNAMES
+            )
+            # Finally, check if actually a directory (which is slow so we want to
+            # short-circuit as much as possible).
+            if descend and entry.is_dir():
+                stack.append((entry, depth))
 
 
 def index_dataset(
@@ -316,6 +331,17 @@ def _get_bids_dataset(path: str | PathT) -> tuple[str | None, PathT | None]:
 
 def _is_bids_dataset(path: PathT) -> bool:
     """Test if path is a BIDS dataset root directory."""
+    # Quick heuristic checks.
+    # BIDS datasets should not contain a file extension.
+    if path.suffix:
+        return False
+    # Path should not be hidden.
+    if path.name.startswith("."):
+        return False
+    # Subject dirs are not datasets.
+    if _is_bids_subject_dir(path):
+        return False
+
     # Check if contains a dataset_description.json or any subject directories. Note,
     # it's common for ppl to forget the dataset description, so let's not be too strict.
     description_exists = (path / "dataset_description.json").exists()
@@ -493,15 +519,6 @@ def _multi_pattern_filter(names: list[str], patterns: str | list[str]) -> set[st
     return matching_names
 
 
-def _filter_dirnames(dirnames: list[str], matches: set[str]) -> None:
-    """Remove dirnames matching `matches` in place."""
-    # Iterate in reversed order since we are modifying in place.
-    n_names = len(dirnames)
-    for ii, dirname in enumerate(reversed(dirnames)):
-        if dirname not in matches:
-            del dirnames[n_names - ii - 1]
-
-
 def _hfmt(n: int) -> str:
     if n < 10_000:
         n_fmt = str(n)

pyproject.toml

Lines changed: 6 additions & 15 deletions

@@ -5,13 +5,11 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "bids2table"
 dynamic = ["version"]
-authors = [
-    { name = "Connor Lane", email = "connor.lane858@gmail.com" },
-]
+authors = [{ name = "Connor Lane", email = "connor.lane858@gmail.com" }]
 description = "Index BIDS datasets fast, locally or in the cloud."
 readme = "README.md"
-requires-python = ">=3.12"
-license = {text = "MIT License"}
+requires-python = ">=3.11"
+license = { text = "MIT License" }
 classifiers = [
     "Development Status :: 3 - Alpha",
     "Intended Audience :: Developers",
@@ -20,24 +18,17 @@ classifiers = [
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
     "Programming Language :: Python :: 3.13",
-    "Programming Language :: Python :: 3.14",
     "License :: OSI Approved :: MIT License",
     "Operating System :: POSIX",
     "Operating System :: Unix",
     "Operating System :: MacOS",
     "Operating System :: Microsoft :: Windows",
 ]
 
-dependencies = [
-    "bidsschematools>=1.0",
-    "pyarrow>=14.0.2",
-    "tqdm>=4.66.2",
-]
+dependencies = ["bidsschematools>=1.0", "pyarrow>=20.0.0", "tqdm>=4.67.1"]
 
 [project.optional-dependencies]
-s3 = [
-    "cloudpathlib[s3]>=0.17.0",
-]
+s3 = ["cloudpathlib[s3]>=0.21.0"]
 
 [dependency-groups]
 dev = [
@@ -48,7 +39,7 @@ dev = [
     "pre-commit>=4.1.0",
     "pytest>=8.3.5",
     "pytest-cov>=6.0.0",
-    "ruff>=0.9.10",
+    "ruff>=0.11.9",
 ]
 
 [project.urls]

tests/test_indexing.py

Lines changed: 25 additions & 5 deletions

@@ -1,4 +1,5 @@
 import logging
+from itertools import islice
 from pathlib import Path
 
 import pyarrow as pa
@@ -25,24 +26,43 @@ def test_get_column_names():
 
 
 def test_find_bids_datasets():
-    datasets = sorted(indexing.find_bids_datasets(BIDS_EXAMPLES, log_frequency=100))
+    datasets = sorted(
+        indexing.find_bids_datasets(
+            BIDS_EXAMPLES,
+            exclude=["surfaces", "subjects", "code", "sourcedata"],
+        )
+    )
     expected_datasets = sorted(
         [p.parent for p in BIDS_EXAMPLES.rglob("dataset_description.json")]
     )
     # find_bids_datasets finds a few extra derivative datasets that are missing a
     # dataset_description.json.
     assert set(expected_datasets).issubset(datasets)
-    assert len(datasets) == len(expected_datasets) + 6
+    assert len(datasets) == len(expected_datasets) + 3
 
     datasets_no_derivatives = sorted(
-        indexing.find_bids_datasets(BIDS_EXAMPLES, exclude="derivatives")
+        indexing.find_bids_datasets(
+            BIDS_EXAMPLES,
+            exclude=["derivatives", "code", "sourcedata"],
+        )
     )
     expected_datasets_no_derivatives = sorted(
         [p.parent for p in BIDS_EXAMPLES.glob("*/dataset_description.json")]
     )
     assert datasets_no_derivatives == expected_datasets_no_derivatives
 
 
+def test_find_bids_datasets_s3():
+    root = "s3://openneuro.org"
+    datasets = list(islice(indexing.find_bids_datasets(root, maxdepth=2), 10))
+    names = sorted([ds.name for ds in datasets])
+    expected_names = [
+        "ds000001", "ds000002", "ds000003", "ds000005", "ds000006",
+        "ds000007", "ds000008", "ds000009", "ds000011", "ds000017",
+    ]  # fmt: skip
+    assert names == expected_names
+
+
 @pytest.mark.parametrize(
     "root,expected_count",
     [
@@ -92,12 +112,12 @@ def test_index_dataset_warns(path: str, msg: str, caplog: LogCaptureFixture):
 
 @pytest.mark.parametrize("max_workers", [0, 2])
 def test_batch_index_dataset(max_workers: int):
-    datasets = list(indexing.find_bids_datasets(BIDS_EXAMPLES))
+    datasets = list(BIDS_EXAMPLES.glob("*"))
     tables = indexing.batch_index_dataset(
         datasets, max_workers=max_workers, show_progress=False
     )
     table = pa.concat_tables(tables)
-    assert len(table) == 10133
+    assert len(table) == 9727
 
 
 @pytest.mark.parametrize(
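
`test_find_bids_datasets_s3` only needs ten results from the whole `s3://openneuro.org` bucket, so it wraps the generator in `itertools.islice`. That bounds the work only because `find_bids_datasets` yields datasets as it finds them. Once its limit is reached, `islice` stops pulling from the generator. Below is a small stand-alone illustration, where `slow_listing` is a made-up generator that records how many items it actually produced.

```python
from collections.abc import Iterator
from itertools import islice


def slow_listing(n: int, visited: list[int]) -> Iterator[str]:
    """Hypothetical lazy producer that records every item it yields."""
    for i in range(n):
        visited.append(i)
        yield f"ds{i:06d}"


visited: list[int] = []
# Ask for a million names but keep only the first ten.
first = list(islice(slow_listing(1_000_000, visited), 10))
print(first[0], first[-1], len(visited))  # ds000000 ds000009 10
```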

tests/test_main.py

Lines changed: 10 additions & 9 deletions

@@ -10,13 +10,6 @@
 
 BIDS_EXAMPLES = Path(__file__).parents[1] / "bids-examples"
 
-COMMANDS = [
-    "find {examples}",
-    "index -o {out_dir}/ds102.parquet {examples}/ds102",
-    "index -o {out_dir}/ds101_ds102.parquet {examples}/ds101 {examples}/ds102",
-    "index -o {out_dir}/ds10N.parquet '{examples}/ds10?'",
-]
-
 
 @contextmanager
 def patch_argv(argv: List[str]):
@@ -31,7 +24,6 @@ def patch_argv(argv: List[str]):
 @pytest.mark.parametrize(
     "cmd,output",
     [
-        ("find {examples}", None),
         ("index -o {out_dir}/ds102.parquet {examples}/ds102", "ds102.parquet"),
         (
             "index -o {out_dir}/ds101_ds102.parquet {examples}/ds101 {examples}/ds102",
@@ -40,7 +32,7 @@ def patch_argv(argv: List[str]):
         ("index -o {out_dir}/ds10N.parquet '{examples}/ds10?'", "ds10N.parquet"),
     ],
 )
-def test_main(cmd: str, output: str | None, tmp_path: Path):
+def test_main_index(cmd: str, output: str | None, tmp_path: Path):
     cmd_fmt = cmd.format(out_dir=tmp_path, examples=BIDS_EXAMPLES)
     prog = str(Path(cli.__file__).absolute())
     argv = [prog] + shlex.split(cmd_fmt)
@@ -49,3 +41,12 @@ def test_main(cmd: str, output: str | None, tmp_path: Path):
 
     if output:
         assert (tmp_path / output).exists()
+
+
+@pytest.mark.parametrize("cmd", ["find {examples}"])
+def test_main_find(cmd: str):
+    cmd_fmt = cmd.format(examples=BIDS_EXAMPLES)
+    prog = str(Path(cli.__file__).absolute())
+    argv = [prog] + shlex.split(cmd_fmt)
+    with patch_argv(argv):
+        cli.main()
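
Both `test_main_index` and the new `test_main_find` drive the CLI by calling `cli.main()` inside `patch_argv`. The hunk shows only that helper's signature, a `@contextmanager` taking `List[str]`, not its body. The following is therefore a hypothetical sketch of how such a helper can work. It swaps `sys.argv` for the duration of the block and restores it in `finally`, so a failing command cannot leak a patched argv into later tests.

```python
import sys
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def patch_argv(argv: list[str]) -> Iterator[None]:
    """Hypothetical helper: temporarily replace sys.argv."""
    original = sys.argv
    sys.argv = argv
    try:
        yield
    finally:
        # Restore even if the CLI raised (argparse errors raise SystemExit).
        sys.argv = original


with patch_argv(["bids2table", "find", "/data/bids"]):
    # Code that reads sys.argv (e.g. argparse's parse_args()) sees the fake argv.
    print(sys.argv[1:])  # ['find', '/data/bids']
```

The standard library's `unittest.mock.patch.object(sys, "argv", argv)` gives the same swap-and-restore behaviour without a custom helper.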
