data-cli

Command-line tool for downloading datasets published by CZ Biohub. Resolves a collection ID to its constituent datasets and downloads files from S3 and HTTP, with progress bars, size estimates, and dry-run accounting.

Installation

To install the OPS data CLI, run:

pip install biohub-data-cli

Quick start

See what a collection contains without downloading:

ops-data download collection <collection-id> --dry-run

Download a collection to the current directory:

ops-data download collection <collection-id>

Download multiple collections to a specific directory, skipping the prompt:

ops-data download collection <id-a> <id-b> -o ./data -y

Download only specific datasets from a collection:

ops-data download collection <collection-id> --dataset dataset-1,dataset-2

Files land under <outdir>/<collection-slug>/<dataset-slug>/.
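A minimal sketch of that output layout using pathlib. `dataset_dir` is a hypothetical helper for illustration, not part of the tool's API.

```python
from __future__ import annotations

from pathlib import Path


def dataset_dir(outdir: str | Path, collection_slug: str, dataset_slug: str) -> Path:
    """Return the directory a dataset's files are written to (hypothetical helper)."""
    # Layout: <outdir>/<collection-slug>/<dataset-slug>/
    return Path(outdir) / collection_slug / dataset_slug
```

For example, `-o ./data` with collection `my-collection` and dataset `dataset-1` resolves to `data/my-collection/dataset-1`.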

Commands

`ops-data download collection IDS...`

Download one or more collections by ID.

Option	Description
`-o, --outdir PATH`	Output directory. Defaults to `.`.
`-y, --yes`	Skip the size-estimate confirmation prompt.
`--dataset SLUGS`	Comma-separated dataset slugs to download a subset of the collection. Only valid with a single collection.
`--dry-run`	Print per-dataset size statistics without downloading. Mutually exclusive with `-y`.
`--no-resume`	Ignore cached listing state and re-list/re-download from scratch.
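The two option constraints in the table (`--dry-run` excludes `-y`, and `--dataset` needs exactly one collection ID) can be sketched as a validation step. `validate_options` is a hypothetical name, and raising `ValueError` is an assumption; the real CLI may report these as usage errors in a different way.

```python
from __future__ import annotations


def validate_options(ids: list[str], dataset: str | None, dry_run: bool, yes: bool) -> None:
    """Reject option combinations the table marks as invalid (hypothetical helper)."""
    # --dry-run never prompts, so pairing it with -y/--yes is contradictory.
    if dry_run and yes:
        raise ValueError("--dry-run and -y/--yes are mutually exclusive")
    # Dataset slugs are scoped to one collection.
    if dataset is not None and len(ids) != 1:
        raise ValueError("--dataset is only valid with a single collection ID")
```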

A dry run resolves every S3 URI (listing prefixes and issuing HEAD requests for individual objects) to report exact byte totals per dataset. HTTP URLs are not sized during a dry run; they are flagged with a warning in the summary instead.
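A sketch of that dry-run accounting, assuming a boto3-style S3 client. Only two real boto3 calls are used: the `list_objects_v2` paginator (for prefixes) and `head_object` (for single objects). The helper names `s3_uri_size` and `dry_run_summary`, and the rule that a URI ending in "/" is a prefix, are assumptions for illustration, not the tool's actual implementation.

```python
from __future__ import annotations

from urllib.parse import urlparse


def s3_uri_size(client, uri: str) -> int:
    """Return the byte size of an S3 object, or of everything under an S3 prefix.

    Assumption for this sketch: a key that is empty or ends in "/" is a prefix.
    """
    parsed = urlparse(uri)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if key == "" or key.endswith("/"):
        # Prefix: list every object page by page and sum their sizes.
        total = 0
        paginator = client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=key):
            total += sum(obj["Size"] for obj in page.get("Contents", []))
        return total
    # Single object: a HEAD request returns its ContentLength.
    return client.head_object(Bucket=bucket, Key=key)["ContentLength"]


def dry_run_summary(client, datasets: dict[str, list[str]]) -> tuple[dict[str, int], list[str]]:
    """Sum S3 bytes per dataset; HTTP(S) URLs stay unsized and become warnings."""
    totals: dict[str, int] = {}
    warnings: list[str] = []
    for slug, uris in datasets.items():
        totals[slug] = 0
        for uri in uris:
            scheme = urlparse(uri).scheme
            if scheme == "s3":
                totals[slug] += s3_uri_size(client, uri)
            elif scheme in ("http", "https"):
                warnings.append(f"{slug}: {uri} not sized during dry run")
            else:
                raise ValueError(f"unsupported URI scheme: {uri}")
    return totals, warnings
```

In practice `client` would be `boto3.client("s3")`; injecting it keeps the accounting logic testable with a fake that implements the same two methods.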

Filtering datasets with --dataset downloads only the named datasets from a collection instead of all of them, e.g. --dataset dataset-1,dataset-2. Slugs are downloaded in the order given, duplicates are ignored, and an unknown slug causes an error that lists the available slugs. Run --dry-run first to see which slugs a collection contains. Filtering applies to a single collection, so --dataset can't be combined with multiple collection IDs.
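The slug rules (keep the given order, drop duplicates, fail on an unknown slug with the available list) can be sketched as a small parser. `select_datasets` is a hypothetical helper, and stripping whitespace around each slug is an assumption not stated in the docs.

```python
from __future__ import annotations


def select_datasets(raw: str, available: list[str]) -> list[str]:
    """Parse a --dataset value into an ordered, de-duplicated list of slugs."""
    selected: list[str] = []
    seen: set[str] = set()
    # Assumption: whitespace around each comma-separated slug is ignored.
    for slug in (part.strip() for part in raw.split(",")):
        if slug in seen:
            continue  # duplicates are ignored; the first occurrence keeps its position
        if slug not in available:
            raise ValueError(
                f"unknown dataset slug {slug!r}; available: {', '.join(available)}"
            )
        seen.add(slug)
        selected.append(slug)
    return selected
```

For example, `select_datasets("dataset-2,dataset-1,dataset-2", ["dataset-1", "dataset-2", "dataset-3"])` yields `["dataset-2", "dataset-1"]`.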

The confirmation prompt shows the aggregate size estimate before any bytes are transferred. Pass -y to skip it in scripts.
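A sketch of the prompt logic: `-y` short-circuits the question, otherwise the user must answer yes. The helper names, the `[y/N]` wording, and the use of binary (KiB/MiB/GiB) units are all assumptions for illustration; `ask` defaults to the built-in `input` so the function can be tested without a terminal.

```python
from __future__ import annotations

from typing import Callable


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units, e.g. 1536 -> '1.5 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    i = 0
    while size >= 1024 and i < len(units) - 1:
        size /= 1024
        i += 1
    return f"{num_bytes} B" if i == 0 else f"{size:.1f} {units[i]}"


def confirm_download(total_bytes: int, yes: bool, ask: Callable[[str], str] = input) -> bool:
    """Return True if the download should proceed; yes=True skips the prompt."""
    if yes:
        return True
    answer = ask(f"About to download {human_size(total_bytes)}. Continue? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```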

Failures are collected and reported at the end. One bad URL won't abort the run: the remaining downloads continue, and the process exits non-zero if any download failed.
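A sketch of this collect-then-report pattern. `download_all` and the per-URL `download` callable are hypothetical; the point is that each exception is recorded rather than re-raised, and the exit code is decided only after every URL has been attempted.

```python
from __future__ import annotations

import sys
from typing import Callable, Iterable


def download_all(urls: Iterable[str], download: Callable[[str], None]) -> int:
    """Attempt every URL, report failures at the end, and return an exit code."""
    failures: list[tuple[str, Exception]] = []
    for url in urls:
        try:
            download(url)
        except Exception as exc:  # record and keep going so one bad URL can't stop the run
            failures.append((url, exc))
    if failures:
        print(f"{len(failures)} download(s) failed:", file=sys.stderr)
        for url, exc in failures:
            print(f"  {url}: {exc}", file=sys.stderr)
        return 1  # non-zero exit status signals that at least one download failed
    return 0
```

A caller would typically finish with `sys.exit(download_all(urls, fetch))`, where `fetch` is whatever performs a single download.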

Development

This project uses uv for dependency management.

Install dependencies (including dev extras):

uv sync

Run tests:

uv run pytest

Run tests with coverage report:

uv run pytest --cov=biohub_data_cli --cov-report=term-missing

Run the CLI from a checkout:

uv run ops-data --help

Integration tests

Tests marked with the integration marker hit real S3 buckets and HTTP servers, so they are deselected by default. Run them explicitly:

uv run pytest -m integration

Code of Conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.

Reporting Security Issues

If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
