7 changes: 7 additions & 0 deletions .claude-plugin/plugin.json
@@ -0,0 +1,7 @@
{
"name": "cell-explorer",
"description": "Tools for querying cBioPortal cell-level zarr (AnnData) datasets",
"author": {
"name": "cBioPortal"
}
}
78 changes: 78 additions & 0 deletions README.md
@@ -8,6 +8,7 @@ Everything is changing, do not import. Unstable.

- `@cbioportal-cell-explorer/app` — Web application
- `@cbioportal-cell-explorer/zarrstore` — Zarr store library
- `@cbioportal-cell-explorer/cli` — CLI tool for querying zarr datasets
- `@cbioportal-cell-explorer/docs` — Documentation site

## Setup
@@ -21,3 +22,80 @@ pnpm install
```sh
pnpm --filter @cbioportal-cell-explorer/app dev
```

## CLI

A command-line tool for querying AnnData-formatted zarr datasets. It reuses the `@cbioportal-cell-explorer/zarrstore` library to query datasets over HTTP — no downloads or Python required.

### Quick start

```sh
pnpm install
pnpm exec cbioportal-cell-explorer info
```

### Commands

```
cbioportal-cell-explorer <command> [options]

Commands:
info Dataset overview (shape, columns, embeddings)
obs [column] List obs columns or show value counts
var [--search <pattern>] List or search gene names
expression <gene> Expression stats, optionally grouped
[--group-by <obs_column>]
embedding [key] [--limit N] List or dump embedding coordinates

Global options:
--url <zarr_url> Zarr dataset URL (default: MSK SPECTRUM TME)
--json Output as JSON
-h, --help Show help
```

### Examples

```sh
# Dataset overview
pnpm exec cbioportal-cell-explorer info

# What cell types are in the dataset?
pnpm exec cbioportal-cell-explorer obs cell_type

# Search for a gene by symbol
pnpm exec cbioportal-cell-explorer var --search EGFR

# EGFR expression grouped by cell type
pnpm exec cbioportal-cell-explorer expression EGFR --group-by cell_type

# Use a different dataset
pnpm exec cbioportal-cell-explorer info --url https://example.com/my-dataset.zarr/

# JSON output for scripting
pnpm exec cbioportal-cell-explorer obs cell_type --json
```

Gene names can be specified as symbols (e.g. `EGFR`) or Ensembl IDs (e.g. `ENSG00000146648`) — the CLI resolves symbols automatically via the `feature_name` var column.
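
A minimal sketch of how such a lookup could work, assuming the var index holds Ensembl IDs and the `feature_name` column holds symbols in the same order. The helper name `resolveGene` and the in-memory arrays are hypothetical and only illustrate the idea; the CLI's own resolver may behave differently.

```typescript
// Hypothetical helper: map user input to an ID from the var index.
// Assumes `ids` (var index) and `symbols` (feature_name) are parallel arrays.
function resolveGene(input: string, ids: string[], symbols: string[]): string {
  // An exact match against the var index wins (e.g. an Ensembl ID).
  if (ids.includes(input)) return input;
  // Otherwise treat the input as a symbol and return the ID at the same position.
  const i = symbols.indexOf(input);
  if (i === -1) throw new Error(`Gene not found: ${input}`);
  return ids[i];
}

// Example data taken from the gene names mentioned in this README.
const exampleIds = ["ENSG00000146648"];
const exampleSymbols = ["EGFR"];
console.log(resolveGene("EGFR", exampleIds, exampleSymbols));
console.log(resolveGene("ENSG00000146648", exampleIds, exampleSymbols));
```

Both calls print the same Ensembl ID, since the symbol and the ID refer to the same var entry.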

## Claude Code Skill

This repo includes a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin with a `/zarr` skill that lets you query zarr datasets using natural language.

### Setup

Run Claude Code from the repo root with `--plugin-dir`:

```sh
claude --plugin-dir .
```

Then use the `/zarr` slash command:

```
/zarr what cell types are in this dataset?
/zarr show EGFR expression across cell types
/zarr is BRCA1 in this dataset?
/zarr what metadata columns are available?
```

Claude will translate your question into CLI commands, run them, and summarize the results.
21 changes: 21 additions & 0 deletions packages/cli/package.json
@@ -0,0 +1,21 @@
{
"name": "@cbioportal-cell-explorer/cli",
"version": "0.1.0",
"type": "module",
"bin": {
"cbioportal-cell-explorer": "./src/main.ts"
},
"exports": "./src/main.ts",
"scripts": {
"typecheck": "tsc --noEmit"
},
"dependencies": {
"@cbioportal-cell-explorer/zarrstore": "workspace:*",
"@petamoriken/float16": "^3.9.3"
},
"devDependencies": {
"@types/node": "^25.3.3",
"tsx": "^4.21.0",
"typescript": "^5.9.3"
}
Comment on lines +5 to +20
Copilot AI Mar 25, 2026

The bin points directly at a TypeScript file (./src/main.ts) with a #!/usr/bin/env tsx shebang. When installed via npx @cbioportal-cell-explorer/cli, tsx won’t reliably be on the user’s PATH (devDependencies aren’t installed, and even a nested dependency’s bin typically isn’t exposed). To make npx/global installs work, publish a JS entrypoint (e.g. build to dist/ and point bin/exports there) or use a small JS shim that invokes an internal TS runner in a way that doesn’t depend on tsx being globally available.

}
80 changes: 80 additions & 0 deletions packages/cli/src/commands/embedding.ts
@@ -0,0 +1,80 @@
import { parseArgs } from "node:util";
import type { GlobalOpts } from "../util/args.ts";
import { openStore } from "../util/args.ts";
import { printTable, printJson } from "../util/format.ts";

export async function embedding(opts: GlobalOpts): Promise<void> {
const { values, positionals } = parseArgs({
args: opts.rest,
options: {
url: { type: "string" },
json: { type: "boolean" },
help: { type: "boolean", short: "h" },
limit: { type: "string", short: "n" },
},
allowPositionals: true,
strict: false,
});

const key = positionals[0];
const limit = values.limit ? parseInt(values.limit as string, 10) : undefined;
Copilot AI Mar 25, 2026

limit is parsed with parseInt(...) but not validated; invalid values can produce NaN, which makes display become NaN and results in no rows being printed/returned. Validate that limit is a finite positive integer and error out (or fall back to default) when it isn’t.

Suggested change
- const limit = values.limit ? parseInt(values.limit as string, 10) : undefined;
+ const rawLimit = values.limit as string | undefined;
+ let limit: number | undefined;
+ if (rawLimit !== undefined) {
+ const parsed = Number.parseInt(rawLimit, 10);
+ if (!Number.isFinite(parsed) || parsed <= 0) {
+ console.error("`--limit` must be a positive integer.");
+ process.exit(1);
+ }
+ limit = parsed;
+ }
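
One caveat with the suggested change: `Number.parseInt` stops at the first non-digit character, so an input such as `10abc` parses as 10 and passes the check. A stricter variant is sketched below, assuming only plain decimal digits should be accepted. `parseLimit` is a hypothetical helper name and is not part of this PR.

```typescript
// Hypothetical helper (not in the PR): parse --limit strictly.
// Returns undefined when the flag is absent and throws on invalid input.
function parseLimit(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  // Require the whole string to be decimal digits; rejects "10abc", "-3", "1.5" and "".
  if (!/^\d+$/.test(raw)) {
    throw new Error("`--limit` must be a positive integer.");
  }
  const parsed = Number(raw);
  // Reject zero and values too large to be represented exactly.
  if (parsed <= 0 || !Number.isSafeInteger(parsed)) {
    throw new Error("`--limit` must be a positive integer.");
  }
  return parsed;
}

console.log(parseLimit("25"));
console.log(Number.parseInt("10abc", 10)); // parseInt alone accepts the trailing garbage
```

Throwing instead of calling `process.exit(1)` keeps the helper easy to test; the command could catch the error, print the message, and exit.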

const adata = await openStore(opts.url);

if (!key) {
const keys = adata.obsmKeys();
if (opts.json) {
printJson(keys);
} else {
console.log(`Embeddings (${keys.length}):`);
for (const k of keys) {
console.log(` ${k}`);
}
}
return;
}

const result = await adata.obsm(key);

if (!("data" in result) || !("shape" in result)) {
console.error("Embedding data is not a dense array (sparse not supported for display).");
process.exit(1);
}

const { data, shape } = result;
const nRows = shape[0];
const nDims = shape.length > 1 ? shape[1] : 1;
const cap = opts.json ? (limit ?? nRows) : (limit ?? 10);
const display = Math.min(cap, nRows);

if (opts.json) {
const rows: number[][] = [];
for (let i = 0; i < display; i++) {
const row: number[] = [];
for (let d = 0; d < nDims; d++) {
row.push((data as ArrayLike<number>)[i * nDims + d]);
}
rows.push(row);
}
printJson({ key, shape, rows });
return;
}

console.log(`Embedding: ${key} | Shape: ${nRows.toLocaleString("en-US")} x ${nDims}`);
console.log();

const headers = ["Index", ...Array.from({ length: nDims }, (_, d) => `Dim${d + 1}`)];
const rows: (string | number)[][] = [];
for (let i = 0; i < display; i++) {
const row: (string | number)[] = [i];
for (let d = 0; d < nDims; d++) {
row.push((data as ArrayLike<number>)[i * nDims + d]);
}
rows.push(row);
}

printTable(headers, rows);
if (display < nRows) {
console.log(` ... ${(nRows - display).toLocaleString("en-US")} more rows (use --limit or --json)`);
}
}
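
The indexing `data[i * nDims + d]` in both loops of `embedding.ts` assumes the embedding arrives as a flat, row-major (C-order) buffer. A standalone sketch of that unpacking follows; the helper name `unflattenRows` and the coordinates are made up for illustration.

```typescript
// Hypothetical helper: split a flat row-major buffer into rows of nDims values.
function unflattenRows(
  data: ArrayLike<number>,
  nRows: number,
  nDims: number,
  limit: number = nRows,
): number[][] {
  const count = Math.min(limit, nRows);
  const rows: number[][] = [];
  for (let i = 0; i < count; i++) {
    const row: number[] = [];
    for (let d = 0; d < nDims; d++) {
      // Element (i, d) lives at offset i * nDims + d in C-order storage.
      row.push(data[i * nDims + d]);
    }
    rows.push(row);
  }
  return rows;
}

// Three 2-D points stored as [x0, y0, x1, y1, x2, y2] (made-up values).
const flat = new Float32Array([0, 1, 2, 3, 4, 5]);
// Only the first two rows are returned: [0, 1] and [2, 3].
console.log(unflattenRows(flat, 3, 2, 2));
```

If a store returned column-major data instead, the offset would be `d * nRows + i`, so this assumption is worth checking against the zarrstore library.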
94 changes: 94 additions & 0 deletions packages/cli/src/commands/expression.ts
@@ -0,0 +1,94 @@
import { parseArgs } from "node:util";
import type { GlobalOpts } from "../util/args.ts";
import { openStore } from "../util/args.ts";
import { printTable, printJson, formatNumber } from "../util/format.ts";
import { computeStats, computeGroupedStats } from "../util/stats.ts";
import { resolveGeneName } from "../util/gene.ts";

export async function expression(opts: GlobalOpts): Promise<void> {
const { values, positionals } = parseArgs({
args: opts.rest,
options: {
url: { type: "string" },
json: { type: "boolean" },
help: { type: "boolean", short: "h" },
"group-by": { type: "string", short: "g" },
sort: { type: "string" },
},
allowPositionals: true,
strict: false,
});

const geneInput = positionals[0];
if (!geneInput) {
console.error("Usage: cell-explorer expression <gene> [--group-by <obs_column>]");
Copilot AI Mar 25, 2026

The usage error message refers to cell-explorer instead of the actual binary name cbioportal-cell-explorer, which is confusing and inconsistent with the README/USAGE text. Update the message to match the real command name.

Suggested change
- console.error("Usage: cell-explorer expression <gene> [--group-by <obs_column>]");
+ console.error("Usage: cbioportal-cell-explorer expression <gene> [--group-by <obs_column>]");

process.exit(1);
}

const groupBy = values["group-by"] as string | undefined;
const sortBy = (values.sort as string | undefined) ?? "count";

const adata = await openStore(opts.url);

const geneName = await resolveGeneName(adata, geneInput);
const displayName = geneName === geneInput ? geneName : `${geneInput} (${geneName})`;
const expr = await adata.geneExpression(geneName);

if (!groupBy) {
const stats = computeStats(expr as ArrayLike<number>);
if (opts.json) {
printJson({ gene: geneInput, resolvedId: geneName, ...stats });
} else {
console.log(`Gene: ${displayName} | ${stats.count.toLocaleString("en-US")} cells`);
console.log();
printTable(
["Stat", "Value"],
[
["Count", stats.count],
["Mean", stats.mean],
["Median", stats.median],
["Std", stats.std],
["Min", stats.min],
["Max", stats.max],
["Non-zero", `${stats.nonzeroCount.toLocaleString("en-US")} (${formatNumber(stats.nonzeroPct)}%)`],
],
);
}
return;
}

const labels = await adata.obsColumn(groupBy);
const grouped = computeGroupedStats(expr as ArrayLike<number>, labels as ArrayLike<string | number | null>);

// Sort
const sortKey = sortBy as keyof (typeof grouped)[0];
if (sortKey && sortKey !== "count") {
grouped.sort((a, b) => {
const va = a[sortKey];
const vb = b[sortKey];
if (typeof va === "number" && typeof vb === "number") return vb - va;
return 0;
});
}

if (opts.json) {
printJson({ gene: geneInput, resolvedId: geneName, groupBy, groups: grouped });
} else {
const totalCells = grouped.reduce((s, g) => s + g.count, 0);
console.log(`Gene: ${displayName} | Grouped by: ${groupBy} | ${totalCells.toLocaleString("en-US")} cells`);
console.log();
printTable(
["Group", "Count", "Mean", "Median", "Std", "Min", "Max", "NonZero%"],
grouped.map((g) => [
g.group,
g.count,
g.mean,
g.median,
g.std,
g.min,
g.max,
`${g.nonzeroPct.toFixed(1)}%`,
]),
);
}
}
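
`computeStats` and `computeGroupedStats` are imported from `../util/stats.ts`, which does not appear in this part of the diff. As a rough sketch of what the ungrouped summary could compute, the code below assumes a population standard deviation and a median that averages the two middle values. Both choices, and the name `summarize`, are assumptions; only the field names are taken from how `expression.ts` reads the result.

```typescript
interface Summary {
  count: number;
  mean: number;
  median: number;
  std: number;
  min: number;
  max: number;
  nonzeroCount: number;
  nonzeroPct: number;
}

// Hypothetical stand-in for computeStats; the real implementation may differ.
function summarize(values: ArrayLike<number>): Summary {
  const n = values.length;
  if (n === 0) {
    return { count: 0, mean: NaN, median: NaN, std: NaN, min: NaN, max: NaN, nonzeroCount: 0, nonzeroPct: 0 };
  }
  let sum = 0;
  let min = Infinity;
  let max = -Infinity;
  let nonzeroCount = 0;
  for (let i = 0; i < n; i++) {
    const v = values[i];
    sum += v;
    if (v < min) min = v;
    if (v > max) max = v;
    if (v !== 0) nonzeroCount++;
  }
  const mean = sum / n;
  // Second pass: sum of squared deviations for the population variance.
  let sq = 0;
  for (let i = 0; i < n; i++) sq += (values[i] - mean) ** 2;
  // Median from a sorted copy, so the input is left untouched.
  const sorted = Array.from(values).sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    count: n,
    mean,
    median,
    std: Math.sqrt(sq / n),
    min,
    max,
    nonzeroCount,
    nonzeroPct: (100 * nonzeroCount) / n,
  };
}

// Made-up expression values: half of the cells are zero.
console.log(summarize([0, 0, 2, 4]));
```

A grouped variant could bucket values by label first and apply the same function per bucket. Expression vectors in single-cell data tend to be mostly zeros, which is why the non-zero count is reported next to the mean.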
38 changes: 38 additions & 0 deletions packages/cli/src/commands/info.ts
@@ -0,0 +1,38 @@
import type { GlobalOpts } from "../util/args.ts";
import { openStore } from "../util/args.ts";
import { printJson } from "../util/format.ts";

export async function info(opts: GlobalOpts): Promise<void> {
const adata = await openStore(opts.url);

const obsColumns = await adata.obsColumns();
const varColumns = await adata.varColumns();
const obsmKeys = adata.obsmKeys();

if (opts.json) {
printJson({
url: opts.url,
shape: adata.shape,
nObs: adata.nObs,
nVar: adata.nVar,
attrs: adata.attrs,
obsColumns,
varColumns,
obsmKeys,
});
return;
}

const urlName = opts.url.replace(/\/$/, "").split("/").pop() ?? opts.url;
console.log(`Dataset: ${urlName}`);
console.log(`Shape: ${adata.nObs.toLocaleString("en-US")} cells x ${adata.nVar.toLocaleString("en-US")} genes`);
console.log();
console.log(`Obs columns (${obsColumns.length}):`);
console.log(` ${obsColumns.join(", ")}`);
console.log();
console.log(`Var columns (${varColumns.length}):`);
console.log(` ${varColumns.join(", ")}`);
console.log();
console.log(`Embeddings (${obsmKeys.length}):`);
console.log(` ${obsmKeys.length > 0 ? obsmKeys.join(", ") : "(none)"}`);
}