# bids2table
[![CI](https://github.com/childmindresearch/bids2table/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/childmindresearch/bids2table/actions/workflows/ci.yaml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/childmindresearch/bids2table/branch/main/graph/badge.svg?token=22HWWFWPW5)](https://codecov.io/gh/childmindresearch/bids2table)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
![Python3](https://img.shields.io/badge/python->=3.12-blue.svg)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

Index BIDS datasets fast, locally or in the cloud.

The latest development version can be installed with

```sh
pip install "bids2table @ git+https://github.com/childmindresearch/bids2table.git@develop/b2t2"
```

To install with S3 support, include the `s3` extra

```sh
pip install "bids2table[s3] @ git+https://github.com/childmindresearch/bids2table.git@develop/b2t2"
```

## Usage

### Finding BIDS datasets

You can search a directory for valid BIDS datasets using `b2t2 find`

```
(bids2table) clane$ b2t2 find bids-examples | head -n 10
bids-examples/asl002
bids-examples/ds002
bids-examples/ds005
bids-examples/asl005
bids-examples/ds051
bids-examples/eeg_rishikesh
bids-examples/asl004
bids-examples/asl003
bids-examples/ds003
bids-examples/eeg_cbm
```

### Indexing datasets from the command line

Use `b2t2 index` to index datasets. Here we index a single example dataset and save the output as a Parquet file.

```
(bids2table) clane$ b2t2 index -v -o ds102.parquet bids-examples/ds102
ds102: 100%|███████████████████████████████████████| 26/26 [00:00<00:00, 154.12it/s, sub=26, N=130]
```
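
Much of what an index can record about each file is encoded in its BIDS filename: a run of `key-value` entities (`sub-01`, `ses-02`, `task-rest`, ...) followed by a suffix and an extension. As a minimal sketch of that naming convention, the hypothetical `parse_bids_filename` below splits one filename into its parts. The output keys `suffix` and `extension` are names chosen for this sketch; bids2table's own parser and column names may differ.

```python
from pathlib import Path


def parse_bids_filename(path: str) -> dict[str, str]:
    """Split a BIDS filename into key-value entities, suffix, and extension.

    Illustrative sketch of the pattern `key-value[_key-value...]_suffix.ext`;
    this is not bids2table's parser.
    """
    name = Path(path).name
    # The extension starts at the first dot, so ".nii.gz" stays intact.
    stem, dot, ext = name.partition(".")
    # Everything before the last underscore is an entity; the rest is the suffix.
    *pairs, suffix = stem.split("_")
    entities: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep:
            raise ValueError(f"malformed entity {pair!r} in {name!r}")
        entities[key] = value
    entities["suffix"] = suffix
    entities["extension"] = dot + ext
    return entities
```

For example, `parse_bids_filename("sub-01/ses-02/func/sub-01_ses-02_task-rest_bold.nii.gz")` gives the entities `sub`, `ses`, and `task` plus suffix `bold` and extension `.nii.gz`. Files that do not follow the entity pattern, such as `dataset_description.json`, raise `ValueError` in this sketch.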

You can also index a list of datasets. Note that each iteration in the progress bar represents one dataset.

```
(bids2table) clane$ b2t2 index -v -o bids-examples.parquet bids-examples/*
100%|████████████████████████████████████████████| 87/87 [00:00<00:00, 113.59it/s, ds=None, N=9727]
```

You can pipe the output of `b2t2 find` to `b2t2 index` to create an index of all datasets under a root directory.

```
(bids2table) clane$ b2t2 find bids-examples | b2t2 index -v -o bids-examples.parquet
97it [00:01, 96.05it/s, ds=ieeg_filtered_speech, N=10K]
```

The resulting index includes both the top-level datasets (as in the previous command) and nested derivative datasets.

### Indexing datasets hosted on S3

bids2table supports indexing datasets hosted on S3 via [cloudpathlib](https://github.com/drivendataorg/cloudpathlib). To use this functionality, install cloudpathlib with S3 support

```sh
pip install "cloudpathlib[s3]"
```

Alternatively, install bids2table with the `s3` extra

```sh
pip install "bids2table[s3] @ git+https://github.com/childmindresearch/bids2table.git@develop/b2t2"
```

As an example, here we index all datasets on [OpenNeuro](https://openneuro.org/)

```
(bids2table) clane$ b2t2 index -v -o openneuro.parquet \
    -j 8 --use-threads s3://openneuro.org/ds*
100%|█████████████████████████████████████| 1408/1408 [12:25<00:00, 1.89it/s, ds=ds006193, N=1.2M]
```

Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less than 15 minutes.
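
As a quick check on these numbers, the elapsed time and totals from the progress bar imply the rates below. The file count is the progress bar's rounded `N=1.2M`, so the files-per-second figure is approximate.

```python
# Totals read off the OpenNeuro progress bar above.
n_datasets = 1408
n_files = 1.2e6  # displayed as N=1.2M, so this value is rounded
elapsed_s = 12 * 60 + 25  # 12:25 elapsed, i.e. 745 seconds

datasets_per_s = n_datasets / elapsed_s  # about 1.89, matching "1.89it/s"
files_per_s = n_files / elapsed_s  # roughly 1,600 files per second

print(f"{datasets_per_s:.2f} datasets/s, ~{files_per_s:,.0f} files/s")
```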

### Indexing datasets from Python

You can also index datasets using the Python API.

```python
import pyarrow as pa
import bids2table as b2t2

# Index a single dataset.
tab = b2t2.index_dataset("bids-examples/ds102")

# Find and index a batch of datasets.
tabs = b2t2.batch_index_dataset(
    b2t2.find_bids_datasets("bids-examples"),
)
tab = pa.concat_tables(tabs)

# Index a dataset on S3 (requires S3 support).
tab = b2t2.index_dataset("s3://openneuro.org/ds000224")
```