Commit aa8ba86

Add documentation (#49)
* Add docs
* Update docs workflow
* Update docs workflow
1 parent ecb8a57 commit aa8ba86

7 files changed

Lines changed: 232 additions & 15 deletions

.github/workflows/docs.yaml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+name: docs
+
+on:
+  workflow_run:
+    workflows: [CI]
+    types:
+      - completed
+    branches:
+      - main
+
+env:
+  UV_FROZEN: true
+
+jobs:
+  # Build the documentation and upload the static HTML files as an artifact.
+  build:
+    if: ${{ github.event.workflow_run.conclusion == 'success' }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          pyproject-file: pyproject.toml
+      - name: Install the project
+        run: uv sync --all-extras
+      - name: Build docs
+        run: pdoc -o docs/ -d google bids2table
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: docs/
+
+  # Deploy the artifact to GitHub pages.
+  # This is a separate job so that only actions/deploy-pages has the necessary permissions.
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    permissions:
+      pages: write
+      id-token: write
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - id: deployment
+        uses: actions/deploy-pages@v4
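
The `Build docs` step in this workflow calls the pdoc command line. To reproduce the same build from a script, the following is a rough Python equivalent. It assumes pdoc's library entry points `pdoc.pdoc` and `pdoc.render.configure` behave as described in pdoc's own documentation; `build_docs` is a hypothetical helper and is not part of this commit.

```python
from pathlib import Path


def build_docs(module: str = "bids2table", out_dir: str = "docs") -> Path:
    """Render HTML docs for `module` into `out_dir` (hypothetical helper)."""
    # Imported lazily so this file can be imported without the dev dependency.
    import pdoc
    import pdoc.render

    # Counterpart of the `-d google` flag: parse docstrings as Google style.
    pdoc.render.configure(docformat="google")

    out = Path(out_dir)
    # Counterpart of `-o docs/`: write static HTML instead of starting a server.
    pdoc.pdoc(module, output_directory=out)
    return out
```

Calling `build_docs()` from the repository root, inside the environment created by `uv sync --all-extras`, should leave the rendered pages under `docs/`, which is the directory the workflow uploads as the Pages artifact.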

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ __pycache__/
 *.egg-info/
 dist
 build
+docs
 */_version.py
 
 # Unit test / coverage reports

README.md

Lines changed: 11 additions & 2 deletions
@@ -1,11 +1,12 @@
 # bids2table
 [![CI](https://github.com/childmindresearch/bids2table/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/childmindresearch/bids2table/actions/workflows/ci.yaml?query=branch%3Amain)
+[![Docs](https://github.com/childmindresearch/bids2table/actions/workflows/docs.yaml/badge.svg?branch=main)](https://childmindresearch.github.io/bids2table/bids2table)
 [![codecov](https://codecov.io/gh/childmindresearch/bids2table/branch/main/graph/badge.svg?token=22HWWFWPW5)](https://codecov.io/gh/childmindresearch/bids2table)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 ![Python3](https://img.shields.io/badge/python->=3.12-blue.svg)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 
-Index BIDS datasets fast, locally or in the cloud.
+Index [BIDS](https://bids-specification.readthedocs.io/en/stable/) datasets fast, locally or in the cloud.
 
 ## Installation
 
@@ -102,8 +103,10 @@ Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less
 You can also index datasets using the Python API.
 
 ```python
-import pyarrow as pa
 import bids2table as b2t2
+import pandas as pd
+import pyarrow as pa
+import pyarrow.parquet as pq
 
 # Index a single dataset.
 tab = b2t2.index_dataset("bids-examples/ds102")
@@ -116,4 +119,10 @@ tab = pa.concat_tables(tabs)
 
 # Index a dataset on S3.
 tab = b2t2.index_dataset("s3://openneuro.org/ds000224")
+
+# Save as parquet.
+pq.write_table(tab, "ds000224.parquet")
+
+# Convert to a pandas dataframe.
+df = tab.to_pandas(types_mapper=pd.ArrowDtype)
 ```

bids2table/__init__.py

Lines changed: 151 additions & 8 deletions
@@ -1,6 +1,156 @@
 # ruff: noqa: I001
-"""Index BIDS datasets fast, locally or in the cloud."""
+r"""
+[![CI](https://github.com/childmindresearch/bids2table/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/childmindresearch/bids2table/actions/workflows/ci.yaml?query=branch%3Amain)
+[![Docs](https://github.com/childmindresearch/bids2table/actions/workflows/docs.yaml/badge.svg?branch=main)](https://childmindresearch.github.io/bids2table/bids2table)
+[![codecov](https://codecov.io/gh/childmindresearch/bids2table/branch/main/graph/badge.svg?token=22HWWFWPW5)](https://codecov.io/gh/childmindresearch/bids2table)
+[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+![Python3](https://img.shields.io/badge/python->=3.12-blue.svg)
+[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 
+Index [BIDS](https://bids-specification.readthedocs.io/en/stable/) datasets fast, locally or in the cloud.
+
+## Installation
+
+To install the latest release from PyPI, you can run
+
+```sh
+pip install bids2table
+```
+
+To install with S3 support, include the `s3` extra
+
+```sh
+pip install bids2table[s3]
+```
+
+The latest development version can be installed with
+
+```sh
+pip install "bids2table[s3] @ git+https://github.com/childmindresearch/bids2table.git"
+```
+
+## Usage
+
+To run these examples, you will need to clone the [bids-examples](https://github.com/bids-standard/bids-examples) repo.
+
+```sh
+git clone -b 1.9.0 https://github.com/bids-standard/bids-examples.git
+```
+
+### Finding BIDS datasets
+
+You can search a directory for valid BIDS datasets using `b2t2 find`
+
+```
+(bids2table) clane$ b2t2 find bids-examples | head -n 10
+bids-examples/asl002
+bids-examples/ds002
+bids-examples/ds005
+bids-examples/asl005
+bids-examples/ds051
+bids-examples/eeg_rishikesh
+bids-examples/asl004
+bids-examples/asl003
+bids-examples/ds003
+bids-examples/eeg_cbm
+```
+
+### Indexing datasets from the command line
+
+Indexing datasets is done with `b2t2 index`. Here we index a single example dataset, saving the output as a parquet file.
+
+```
+(bids2table) clane$ b2t2 index -o ds102.parquet bids-examples/ds102
+ds102: 100%|███████████████████████████████████████| 26/26 [00:00<00:00, 154.12it/s, sub=26, N=130]
+```
+
+You can also index a list of datasets. Note that each iteration in the progress bar represents one dataset.
+
+```
+(bids2table) clane$ b2t2 index -o bids-examples.parquet bids-examples/*
+100%|████████████████████████████████████████████| 87/87 [00:00<00:00, 113.59it/s, ds=None, N=9727]
+```
+
+You can pipe the output of `b2t2 find` to `b2t2 index` to create an index of all datasets under a root directory.
+
+```
+(bids2table) clane$ b2t2 find bids-examples | b2t2 index -o bids-examples.parquet
+97it [00:01, 96.05it/s, ds=ieeg_filtered_speech, N=10K]
+```
+
+The resulting index will include both top-level datasets (as in the previous command) as well as nested derivatives datasets.
+
+### Indexing datasets hosted on S3
+
+bids2table supports indexing datasets hosted on S3 via [cloudpathlib](https://github.com/drivendataorg/cloudpathlib). To use this functionality, make sure to install bids2table with the `s3` extra. Or you can also just install cloudpathlib directly
+
+```sh
+pip install cloudpathlib[s3]
+```
+
+As an example, here we index all datasets on [OpenNeuro](https://openneuro.org/)
+
+```
+(bids2table) clane$ b2t2 index -o openneuro.parquet \
+    -j 8 --use-threads s3://openneuro.org/ds*
+100%|█████████████████████████████████████| 1408/1408 [12:25<00:00, 1.89it/s, ds=ds006193, N=1.2M]
+```
+
+Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less than 15 minutes.
+
+
+### Indexing datasets from Python
+
+You can also index datasets using the Python API.
+
+```python
+import bids2table as b2t2
+import pandas as pd
+import pyarrow as pa
+import pyarrow.parquet as pq
+
+# Index a single dataset.
+tab = b2t2.index_dataset("bids-examples/ds102")
+
+# Find and index a batch of datasets.
+tabs = b2t2.batch_index_dataset(
+    b2t2.find_bids_datasets("bids-examples"),
+)
+tab = pa.concat_tables(tabs)
+
+# Index a dataset on S3.
+tab = b2t2.index_dataset("s3://openneuro.org/ds000224")
+
+# Save as parquet.
+pq.write_table(tab, "ds000224.parquet")
+
+# Convert to a pandas dataframe.
+df = tab.to_pandas(types_mapper=pd.ArrowDtype)
+```
+"""
+
+__all__ = [
+    "index_dataset",
+    "batch_index_dataset",
+    "find_bids_datasets",
+    "get_arrow_schema",
+    "get_column_names",
+    "parse_bids_entities",
+    "validate_bids_entities",
+    "set_bids_schema",
+    "get_bids_schema",
+    "get_bids_entity_arrow_schema",
+    "format_bids_path",
+    "cloudpathlib_is_available",
+]
+
+from ._indexing import (
+    index_dataset,
+    batch_index_dataset,
+    find_bids_datasets,
+    get_arrow_schema,
+    get_column_names,
+)
 from ._entities import (
     parse_bids_entities,
     validate_bids_entities,
@@ -9,12 +159,5 @@
     get_bids_entity_arrow_schema,
     format_bids_path,
 )
-from ._indexing import (
-    find_bids_datasets,
-    index_dataset,
-    batch_index_dataset,
-    get_arrow_schema,
-    get_column_names,
-)
 from ._pathlib import Path, cloudpathlib_is_available
 from ._version import *
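
One detail of the new module docstring is its `r` prefix. A plausible reason (an assumption; the commit message does not say) is the backslash line continuation in the OpenNeuro shell example: in a non-raw string literal, Python consumes a backslash followed by a newline, so the documented command would collapse onto a single line. A minimal sketch of the difference:

```python
# Raw string: the backslash and the newline both survive.
raw_doc = r"""b2t2 index -o openneuro.parquet \
    -j 8 --use-threads s3://openneuro.org/ds*"""

# Non-raw string: backslash + newline is a line continuation and disappears.
plain_doc = """b2t2 index -o openneuro.parquet \
    -j 8 --use-threads s3://openneuro.org/ds*"""

print(len(raw_doc.splitlines()))    # 2
print(len(plain_doc.splitlines()))  # 1
```

With the raw prefix, the rendered documentation shows the command exactly as it would be typed in a shell.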

bids2table/_entities.py

Lines changed: 6 additions & 5 deletions
@@ -144,7 +144,7 @@ def parse_bids_entities(path: str | Path) -> dict[str, str]:
         path: BIDS path to parse.
 
     Returns:
-        entities: dict mapping BIDS entity keys to values.
+        A dict mapping BIDS entity keys to values.
     """
     if isinstance(path, str):
         path = Path(path)
@@ -207,9 +207,10 @@ def validate_bids_entities(
         entities: dict mapping BIDS keys to unvalidated entities
 
     Returns:
-        valid_entities: A mapping of valid BIDS keys to type-casted values.
-        extra_entities: A mapping of any leftover entity mappings that didn't match a
-            known entity or failed validation.
+        A tuple of `(valid_entities, extra_entities)`, where `valid_entities` is a
+        mapping of valid BIDS keys to type-casted values, and `extra_entities` a
+        mapping of any leftover entity mappings that didn't match a known entity or
+        failed validation.
     """
     valid_entities = {}
     extra_entities = {}
@@ -254,7 +255,7 @@ def format_bids_path(entities: dict[str, Any], int_format: str = "%d") -> Path:
         int_format: format string for integer (index) BIDS values.
 
     Returns:
-        path: formatted `Path` instance.
+        A formatted `Path` instance.
     """
     special = {"datatype", "suffix", "ext"}
 
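
The docstring edits in this file stop naming return values (`entities: ...`) and describe them instead, which is the form the Google Python style guide uses for `Returns:` sections and the style the docs workflow asks pdoc to parse. A small sketch of the tuple-returning pattern follows; `split_known` is a hypothetical function written for illustration, not the real `validate_bids_entities`.

```python
from typing import Any


def split_known(
    entities: dict[str, Any], known: set[str]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a mapping into known and leftover keys (hypothetical example).

    Args:
        entities: mapping of keys to values.
        known: keys considered valid.

    Returns:
        A tuple of `(valid, extra)`, where `valid` holds the entries whose key
        is in `known`, and `extra` holds everything else.
    """
    valid = {k: v for k, v in entities.items() if k in known}
    extra = {k: v for k, v in entities.items() if k not in known}
    return valid, extra


valid, extra = split_known({"sub": "01", "foo": "bar"}, known={"sub", "ses"})
print(valid)  # {'sub': '01'}
print(extra)  # {'foo': 'bar'}
```

Describing the tuple as a whole keeps the section readable as a single return value, which is what the function actually produces.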

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ dev = [
     "ipython>=9.2.0",
     "jupyter>=1.1.1",
     "pandas==2.2.3",
+    "pdoc>=15.0.3",
     "pre-commit>=4.1.0",
     "pytest>=8.3.5",
     "pytest-cov>=6.0.0",

uv.lock

Lines changed: 16 additions & 0 deletions
Some generated files are not rendered by default.
