feat: add CLI tool and Claude Code /zarr skill #203

Open
inodb wants to merge 1 commit into main from feat/cli-and-zarr-skill

inodb (Member) commented Mar 25, 2026

Summary

  • Adds @cbioportal-cell-explorer/cli package — a Node CLI tool for querying AnnData zarr datasets over HTTP, reusing the existing @cbioportal-cell-explorer/zarrstore library
  • Adds a Claude Code plugin with a /zarr slash skill for natural language queries against zarr datasets

CLI Commands

| Command | Description |
| --- | --- |
| `info` | Dataset overview (shape, columns, embeddings) |
| `obs [column]` | List obs columns or show value counts |
| `var [--search pattern]` | List or search genes by symbol or Ensembl ID |
| `expression <gene> [--group-by col]` | Expression stats, optionally grouped |
| `embedding [key]` | List or dump embedding coordinates |

All commands support --url <zarr_url> to override the default dataset and --json for machine-readable output. Gene symbols (e.g. EGFR) are resolved automatically via the feature_name var column.
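
The symbol resolution mentioned above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's implementation: `VarTable`, `ResolvedGene`, and `resolveGene` are hypothetical names, and it assumes the var index holds Ensembl IDs while `feature_name` holds gene symbols, as the example output suggests.

```typescript
// Hypothetical in-memory view of the var table; the real CLI reads this from zarr.
interface VarTable {
  index: string[];                   // assumed: Ensembl IDs, e.g. "ENSG00000146648"
  columns: Record<string, string[]>; // assumed: may contain a "feature_name" column
}

interface ResolvedGene {
  position: number;           // row position in var
  id: string;                 // value from the var index
  symbol: string | undefined; // value from feature_name, if that column exists
}

// Try the var index first, then fall back to the feature_name column.
function resolveGene(table: VarTable, input: string): ResolvedGene | undefined {
  const symbols: string[] | undefined = table.columns["feature_name"];
  let position = table.index.indexOf(input);
  if (position === -1 && symbols !== undefined) {
    position = symbols.indexOf(input);
  }
  if (position === -1) return undefined; // not found under either name
  return {
    position,
    id: table.index[position],
    symbol: symbols !== undefined ? symbols[position] : undefined,
  };
}
```

Returning `undefined` for an unknown gene leaves it to the caller to decide how to report the miss.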

Example

$ cbioportal-cell-explorer expression EGFR --group-by cell_type

Gene: EGFR (ENSG00000146648) | Grouped by: cell_type | 927,205 cells

Group                                       Count     Mean   Median     Std      Min      Max  NonZero%
T cell                                    250,334  -0.2980  -0.2991  0.0728  -0.2991   7.2461  100.0%
fallopian tube secretory epithelial cell  250,281   0.1697  -0.2991  1.1745  -0.2991       10  100.0%
monocyte                                  201,217  -0.2965  -0.2991  0.0884  -0.2991   7.4453  100.0%
fibroblast                                161,198   0.6696  -0.2991  1.6591  -0.2991   9.9219  100.0%
...

Claude Code Skill

The /zarr skill translates natural language into CLI commands:

/zarr what cell types are in this dataset?
/zarr show EGFR expression across cell types
/zarr is BRCA1 in this dataset?

Usage: run claude --plugin-dir . from the repo root.

Test plan

  • pnpm install succeeds
  • pnpm exec cbioportal-cell-explorer info returns dataset overview
  • pnpm exec cbioportal-cell-explorer obs cell_type shows value counts
  • pnpm exec cbioportal-cell-explorer var --search EGFR finds EGFR
  • pnpm exec cbioportal-cell-explorer expression EGFR --group-by cell_type shows grouped stats
  • pnpm exec cbioportal-cell-explorer embedding X_umap50 --limit 5 dumps coordinates
  • --json flag produces valid JSON for all commands
  • claude --plugin-dir . discovers the /zarr skill

🤖 Generated with Claude Code

…sets

Add `@cbioportal-cell-explorer/cli` package that reuses the zarrstore
library to query AnnData-formatted zarr datasets from the command line.

Commands:
- `info` — dataset overview (shape, columns, embeddings)
- `obs [column]` — list obs columns or show value counts
- `var [--search pattern]` — list or search gene names by symbol or ID
- `expression <gene> [--group-by col]` — expression stats per group
- `embedding [key]` — list or dump embedding coordinates

Also adds a Claude Code plugin with a `/zarr` skill that translates
natural language queries into CLI commands (e.g. "what is the expression
of EGFR across cell types?").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR introduces a new @cbioportal-cell-explorer/cli package to query AnnData zarr datasets over HTTP (via the existing @cbioportal-cell-explorer/zarrstore), and adds a Claude Code plugin skill (/zarr) that maps natural-language questions to CLI commands.

Changes:

  • Added a Node/TypeScript CLI (cbioportal-cell-explorer) with commands for dataset info, obs/var inspection, gene expression stats, and embeddings.
  • Added a Claude Code skill definition (/zarr) documenting how to run the CLI and how to map user intents to commands.
  • Updated repo docs and lockfile to include the new package and dependencies.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| skills/zarr/SKILL.md | Adds Claude Code /zarr skill instructions and command mappings |
| .claude-plugin/plugin.json | Declares the Claude plugin metadata |
| packages/cli/package.json | Defines the new CLI package, bin entry, and dependencies |
| packages/cli/tsconfig.json | Adds TS config for the CLI package |
| packages/cli/src/main.ts | CLI entrypoint and command dispatch |
| packages/cli/src/commands/*.ts | Implements info, obs, var, expression, embedding commands |
| packages/cli/src/util/*.ts | Adds argument parsing, formatting, gene resolution, and stats utilities |
| README.md | Documents CLI usage and the Claude /zarr skill |
| pnpm-lock.yaml | Locks new dependencies and workspace importer for packages/cli |

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

});

const search = values.search as string | undefined;
const limit = values.limit ? parseInt(values.limit as string, 10) : undefined;

Copilot AI Mar 25, 2026

limit is parsed with parseInt(...) but the result isn't validated. A non-numeric value yields NaN, and --limit 0 or a negative value passes through unchanged; slice(0, cap) can then return an empty or unexpectedly truncated list with no clear error. Validate limit (finite integer > 0) and fail with a helpful message when invalid.

Suggested change
- const limit = values.limit ? parseInt(values.limit as string, 10) : undefined;
+ const rawLimit = values.limit as string | undefined;
+ let limit: number | undefined;
+ if (rawLimit !== undefined) {
+   const parsedLimit = Number.parseInt(rawLimit, 10);
+   if (!Number.isFinite(parsedLimit) || parsedLimit <= 0) {
+     console.error(
+       `Invalid --limit value "${rawLimit}". Please provide a positive integer.`,
+     );
+     process.exitCode = 1;
+     return;
+   }
+   limit = parsedLimit;
+ }

Copilot uses AI. Check for mistakes.
});

const key = positionals[0];
const limit = values.limit ? parseInt(values.limit as string, 10) : undefined;

Copilot AI Mar 25, 2026

limit is parsed with parseInt(...) but not validated; invalid values can produce NaN, which makes display become NaN and results in no rows being printed/returned. Validate that limit is a finite positive integer and error out (or fall back to default) when it isn’t.

Suggested change
- const limit = values.limit ? parseInt(values.limit as string, 10) : undefined;
+ const rawLimit = values.limit as string | undefined;
+ let limit: number | undefined;
+ if (rawLimit !== undefined) {
+   const parsed = Number.parseInt(rawLimit, 10);
+   if (!Number.isFinite(parsed) || parsed <= 0) {
+     console.error("`--limit` must be a positive integer.");
+     process.exit(1);
+   }
+   limit = parsed;
+ }
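
Since two commands parse --limit the same way, the validation could live in one shared helper. The following is a minimal sketch under assumptions: `parsePositiveInt` and `ParseResult` are hypothetical names that do not appear in the PR. It is deliberately stricter than parseInt, which would accept "5abc" as 5, and it returns a result object instead of exiting so the caller decides how to report the error.

```typescript
// Hypothetical shared helper; the name and return shape are illustrative.
type ParseResult =
  | { ok: true; value: number | undefined }
  | { ok: false; error: string };

function parsePositiveInt(flag: string, raw: string | undefined): ParseResult {
  // Flag not supplied: no limit requested.
  if (raw === undefined) return { ok: true, value: undefined };

  const error = `Invalid ${flag} value "${raw}". Please provide a positive integer.`;

  // Require the whole string to be decimal digits, so "5abc", "1.5", "-3" and "" are rejected.
  if (!/^\d+$/.test(raw)) return { ok: false, error };

  const value = Number(raw);
  // Reject zero and values too large to represent exactly.
  if (!Number.isSafeInteger(value) || value <= 0) return { ok: false, error };

  return { ok: true, value };
}
```

A command handler would call `parsePositiveInt("--limit", values.limit)`, print `error` and set a non-zero exit code when `ok` is false, and otherwise use `value`.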

Comment on lines +63 to +87
export function computeGroupedStats(
  values: ArrayLike<number>,
  labels: ArrayLike<string | number | null>,
): GroupStats[] {
  const groups = new Map<string, number[]>();

  for (let i = 0; i < values.length; i++) {
    const label = String(labels[i] ?? "null");
    let arr = groups.get(label);
    if (!arr) {
      arr = [];
      groups.set(label, arr);
    }
    arr.push(values[i]);
  }

  const results: GroupStats[] = [];
  for (const [group, vals] of groups) {
    const s = computeStats(vals);
    results.push({ group, ...s });
  }

  // Sort by count descending
  results.sort((a, b) => b.count - a.count);
  return results;

Copilot AI Mar 25, 2026

computeGroupedStats builds per-group number[] arrays and then computeStats copies/sorts those values again to compute medians. For large datasets (hundreds of thousands of cells) this can be slow and memory-heavy. Consider computing per-group stats in a streaming/2-pass way (count/sum/sumSq/min/max/nonzero without storing all values) and making median optional/approximate for grouped stats to keep the CLI responsive.
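
One way to act on this, sketched under assumptions: keep a small running accumulator per group and update it with Welford's online algorithm, so no per-cell values are stored. `computeGroupedStatsStreaming` and the two interfaces are hypothetical names, and the sketch drops the median entirely rather than approximating it.

```typescript
// Running accumulator per group; no per-cell values are retained.
interface Accumulator {
  count: number;
  mean: number; // running mean (Welford)
  m2: number;   // running sum of squared deviations from the mean
  min: number;
  max: number;
  nonzero: number;
}

// Hypothetical result shape: like the grouped stats in the PR, minus the median.
interface StreamingGroupStats {
  group: string;
  count: number;
  mean: number;
  std: number;
  min: number;
  max: number;
  nonzeroPct: number;
}

function computeGroupedStatsStreaming(
  values: ArrayLike<number>,
  labels: ArrayLike<string | number | null>,
): StreamingGroupStats[] {
  const groups = new Map<string, Accumulator>();

  for (let i = 0; i < values.length; i++) {
    const label = String(labels[i] ?? "null");
    const v = values[i];
    let acc = groups.get(label);
    if (!acc) {
      acc = { count: 0, mean: 0, m2: 0, min: Infinity, max: -Infinity, nonzero: 0 };
      groups.set(label, acc);
    }
    // Welford update: numerically stable single-pass mean and variance.
    acc.count++;
    const delta = v - acc.mean;
    acc.mean += delta / acc.count;
    acc.m2 += delta * (v - acc.mean);
    if (v < acc.min) acc.min = v;
    if (v > acc.max) acc.max = v;
    if (v !== 0) acc.nonzero++;
  }

  const results: StreamingGroupStats[] = [];
  for (const [group, a] of groups) {
    results.push({
      group,
      count: a.count,
      mean: a.mean,
      std: Math.sqrt(a.m2 / a.count), // population std (divide by n), as computeStats does
      min: a.min,
      max: a.max,
      nonzeroPct: (a.nonzero / a.count) * 100,
    });
  }

  // Sort by count descending, matching the existing behavior.
  results.sort((x, y) => y.count - x.count);
  return results;
}
```

Memory use here grows with the number of groups rather than the number of cells. An exact median still needs the values themselves, so it would have to be computed separately, for example only when explicitly requested.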

Comment on lines +20 to +61
export function computeStats(values: ArrayLike<number>): Omit<GroupStats, "group"> {
  const n = values.length;
  if (n === 0) {
    return { count: 0, mean: 0, median: 0, std: 0, min: 0, max: 0, nonzeroCount: 0, nonzeroPct: 0 };
  }

  let sum = 0;
  let min = Infinity;
  let max = -Infinity;
  let nonzero = 0;

  for (let i = 0; i < n; i++) {
    const v = values[i];
    sum += v;
    if (v < min) min = v;
    if (v > max) max = v;
    if (v !== 0) nonzero++;
  }

  const mean = sum / n;

  let sumSqDiff = 0;
  for (let i = 0; i < n; i++) {
    const d = values[i] - mean;
    sumSqDiff += d * d;
  }
  const std = Math.sqrt(sumSqDiff / n);

  // Sort a copy for median
  const sorted = Array.from(values as ArrayLike<number>).sort((a, b) => a - b);

  return {
    count: n,
    mean,
    median: median(sorted),
    std,
    min,
    max,
    nonzeroCount: nonzero,
    nonzeroPct: (nonzero / n) * 100,
  };
}

Copilot AI Mar 25, 2026

This new CLI package adds non-trivial logic (arg parsing, gene resolution, stats computation) but there are currently no automated tests under packages/cli. Since the repo already uses Vitest elsewhere, consider adding basic unit tests for computeStats/computeGroupedStats and resolveGeneName (including not-found and case-insensitive column matching) to prevent regressions.
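
As a rough illustration of which behaviors such tests might pin down, the sketch below checks three properties of the stats code quoted above: empty input yields zeros rather than NaN, std is the population standard deviation (divides by n), and nonzeroPct counts negative values as nonzero. Everything here is an assumption for illustration: `computeStatsStandIn` is a condensed copy written for this sketch, and `expectClose` replaces a test framework so the example runs on its own. In the repo these checks would presumably be Vitest cases importing the real function.

```typescript
// Condensed stand-in for the computeStats under review (median, min, max omitted for brevity).
// In the repo, the tests would import the real function instead of redefining it.
function computeStatsStandIn(values: ArrayLike<number>) {
  const n = values.length;
  if (n === 0) return { count: 0, mean: 0, std: 0, nonzeroPct: 0 };
  let sum = 0;
  let nonzero = 0;
  for (let i = 0; i < n; i++) {
    sum += values[i];
    if (values[i] !== 0) nonzero++;
  }
  const mean = sum / n;
  let sumSqDiff = 0;
  for (let i = 0; i < n; i++) sumSqDiff += (values[i] - mean) ** 2;
  return { count: n, mean, std: Math.sqrt(sumSqDiff / n), nonzeroPct: (nonzero / n) * 100 };
}

// Minimal assertion helper so the sketch runs without a test framework.
function expectClose(actual: number, expected: number, label: string, eps = 1e-9): void {
  if (Math.abs(actual - expected) > eps) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

function runStatsChecks(): void {
  // Empty input returns zeros rather than NaN.
  const empty = computeStatsStandIn([]);
  expectClose(empty.count, 0, "empty count");
  expectClose(empty.mean, 0, "empty mean");
  expectClose(empty.std, 0, "empty std");

  // Population std: mean is 5 and the squared deviations sum to 32, so std = sqrt(32 / 8) = 2.
  const s = computeStatsStandIn([2, 4, 4, 4, 5, 5, 7, 9]);
  expectClose(s.mean, 5, "mean");
  expectClose(s.std, 2, "population std");

  // Negative values count as nonzero.
  const z = computeStatsStandIn([0, 0, 1, -1]);
  expectClose(z.nonzeroPct, 50, "nonzeroPct");
}
```

Calling `runStatsChecks()` throws on the first mismatch and returns silently otherwise.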

Comment thread packages/cli/src/main.ts
Comment on lines +38 to +55
async function main() {
  const opts = parseGlobalArgs(process.argv);

  if (opts.help && !opts.command) {
    console.log(USAGE);
    process.exit(0);
  }

  const handler = COMMANDS[opts.command];
  if (!handler) {
    if (opts.command) {
      console.error(`Unknown command: ${opts.command}`);
    }
    console.error(USAGE);
    process.exit(1);
  }

  await handler(opts);

Copilot AI Mar 25, 2026

--help only prints the top-level usage when no subcommand is provided. When users run cbioportal-cell-explorer <command> --help, the flag is parsed but ignored, so the command executes instead of showing help. Handle opts.help when a command is present (either in main() before dispatch, or inside each command handler) and exit 0 after printing the relevant usage.
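
A minimal sketch of the first option (handling help in main() before dispatch), under stated assumptions: `COMMAND_USAGE`, `HelpDecision`, and `decideDispatch` are hypothetical names, and the usage strings are assembled from the command table in the PR description rather than taken from the code. The decision is returned as data so it can be tested without touching process.exit.

```typescript
// Hypothetical per-command usage text. A Map is used so that names such as
// "toString" are not accidentally found on an object prototype.
const COMMAND_USAGE = new Map<string, string>([
  ["info", "Usage: cbioportal-cell-explorer info"],
  ["obs", "Usage: cbioportal-cell-explorer obs [column]"],
  ["var", "Usage: cbioportal-cell-explorer var [--search pattern]"],
  ["expression", "Usage: cbioportal-cell-explorer expression <gene> [--group-by col]"],
  ["embedding", "Usage: cbioportal-cell-explorer embedding [key]"],
]);

type HelpDecision =
  | { kind: "print"; text: string; exitCode: number }
  | { kind: "run"; command: string };

function decideDispatch(
  opts: { command?: string; help: boolean },
  topLevelUsage: string,
): HelpDecision {
  const { command, help } = opts;

  if (!command) {
    // No subcommand: --help succeeds, a bare invocation is an error.
    return { kind: "print", text: topLevelUsage, exitCode: help ? 0 : 1 };
  }

  const usage = COMMAND_USAGE.get(command);
  if (usage === undefined) {
    return { kind: "print", text: `Unknown command: ${command}\n${topLevelUsage}`, exitCode: 1 };
  }

  // <command> --help prints that command's usage instead of running it.
  if (help) return { kind: "print", text: usage, exitCode: 0 };

  return { kind: "run", command };
}
```

main() would then print `text` and exit with `exitCode` for a `print` decision, or look up and await the handler for a `run` decision.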

Comment thread packages/cli/src/main.ts
Comment on lines +8 to +13
import { parseGlobalArgs } from "./util/args.ts";
import { info } from "./commands/info.ts";
import { obs } from "./commands/obs.ts";
import { varCmd } from "./commands/var.ts";
import { expression } from "./commands/expression.ts";
import { embedding } from "./commands/embedding.ts";

Copilot AI Mar 25, 2026

Relative imports include the .ts extension (e.g. ./util/args.ts), but the rest of the repo uses extensionless TS imports. Aligning with the existing convention avoids inconsistent module specifiers and reduces friction if any code is later bundled/compiled with different resolution rules.

Suggested change
- import { parseGlobalArgs } from "./util/args.ts";
- import { info } from "./commands/info.ts";
- import { obs } from "./commands/obs.ts";
- import { varCmd } from "./commands/var.ts";
- import { expression } from "./commands/expression.ts";
- import { embedding } from "./commands/embedding.ts";
+ import { parseGlobalArgs } from "./util/args";
+ import { info } from "./commands/info";
+ import { obs } from "./commands/obs";
+ import { varCmd } from "./commands/var";
+ import { expression } from "./commands/expression";
+ import { embedding } from "./commands/embedding";


const geneInput = positionals[0];
if (!geneInput) {
console.error("Usage: cell-explorer expression <gene> [--group-by <obs_column>]");

Copilot AI Mar 25, 2026

The usage error message refers to cell-explorer instead of the actual binary name cbioportal-cell-explorer, which is confusing and inconsistent with the README/USAGE text. Update the message to match the real command name.

Suggested change
- console.error("Usage: cell-explorer expression <gene> [--group-by <obs_column>]");
+ console.error("Usage: cbioportal-cell-explorer expression <gene> [--group-by <obs_column>]");

Comment thread skills/zarr/SKILL.md
Comment on lines +21 to +24
2. **Via npx** (if the CLI is published to npm):
```bash
npx @cbioportal-cell-explorer/cli <command> [options]
```

Copilot AI Mar 25, 2026

The SKILL.md suggests trying npx @cbioportal-cell-explorer/cli ... as the second option, but the current CLI entrypoint requires the tsx runtime. Unless the published package ships compiled JS or includes tsx as a runtime dependency, this npx invocation will fail. Update the instructions to either use the tsx fallback directly or note the packaging requirement.

Comment thread packages/cli/package.json
Comment on lines +5 to +20
  "bin": {
    "cbioportal-cell-explorer": "./src/main.ts"
  },
  "exports": "./src/main.ts",
  "scripts": {
    "typecheck": "tsc --noEmit"
  },
  "dependencies": {
    "@cbioportal-cell-explorer/zarrstore": "workspace:*",
    "@petamoriken/float16": "^3.9.3"
  },
  "devDependencies": {
    "@types/node": "^25.3.3",
    "tsx": "^4.21.0",
    "typescript": "^5.9.3"
  }

Copilot AI Mar 25, 2026

The bin points directly at a TypeScript file (./src/main.ts) with a #!/usr/bin/env tsx shebang. When installed via npx @cbioportal-cell-explorer/cli, tsx won’t reliably be on the user’s PATH (devDependencies aren’t installed, and even a nested dependency’s bin typically isn’t exposed). To make npx/global installs work, publish a JS entrypoint (e.g. build to dist/ and point bin/exports there) or use a small JS shim that invokes an internal TS runner in a way that doesn’t depend on tsx being globally available.
