
Commit c57a39c

Deprecate python 310 and add instructions file
1 parent 5c6653f


4 files changed (+144 additions, -5 deletions)


.github/copilot-instructions.md

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# Copilot Instructions for CellAnnotator

## Important Notes
- Avoid drafting summary documents or endless markdown files. Instead, summarize in chat what you did, why, and any open questions.
- Don't update Jupyter notebooks; they are managed manually.
- When running terminal commands, activate the appropriate environment first (`mamba activate cell_annotator`).
- Rather than making assumptions, ask for clarification when uncertain.
- **GitHub workflows**: Use the GitHub CLI (`gh`) when possible. For GitHub MCP server tools, ensure Docker Desktop is running first (`open -a "Docker Desktop"`).
## Project Overview

**CellAnnotator** is an scverse ecosystem package for automated cell type annotation in scRNA-seq data using Large Language Models (LLMs). It is provider-agnostic and supports OpenAI, Google Gemini, and Anthropic Claude. The tool sends cluster marker genes (not expression values) to LLMs, which return structured cell type annotations with confidence scores.

### Domain Context (Brief)
- **AnnData**: Standard single-cell data structure. Contains `.X` (expression matrix), `.obs` (cell metadata), and `.var` (gene metadata).
- **Marker genes**: Differentially expressed genes that characterize cell types/clusters (computed via scanpy).
- **LLM providers**: OpenAI (GPT), Google (Gemini), Anthropic (Claude). Pydantic is used for structured outputs.
- **Workflow**: 1) Compute marker genes per cluster, 2) Send them to the LLM with biological context, 3) Get structured annotations, 4) Harmonize across samples.
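Steps 1 and 2 of the workflow can be sketched in plain Python. This is a minimal illustration under stated assumptions. `build_marker_prompt` is a hypothetical helper; the package's real templates live in `src/cell_annotator/_prompts.py`. The marker lists are assumed to be precomputed and already ranked. Only gene names enter the text, never expression values.

```python
def build_marker_prompt(
    markers: dict[str, list[str]],
    species: str,
    tissue: str,
    top_n: int = 5,
) -> str:
    """Format the top-ranked marker genes of each cluster into one prompt string.

    Hypothetical helper for illustration; not the package's prompt template.
    """
    lines = [f"Species: {species}. Tissue: {tissue}.", "Marker genes per cluster:"]
    for cluster, genes in markers.items():
        # Keep only the highest-ranked gene names; no expression values are included.
        lines.append(f"- Cluster {cluster}: {', '.join(genes[:top_n])}")
    lines.append("Suggest a cell type for each cluster.")
    return "\n".join(lines)


prompt = build_marker_prompt(
    {"0": ["CD3E", "CD3D", "IL7R"], "1": ["MS4A1", "CD79A"]},
    species="human",
    tissue="blood",
)
print(prompt)
```

Truncating to `top_n` genes keeps prompts short and comparable across clusters. The real package may choose a different cutoff.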
### Key Dependencies
- **Core**: scanpy, pydantic, python-dotenv, rich
- **LLM providers**: openai, anthropic, google-genai (all optional)
- **Optional**: rapids-singlecell (GPU), colorspacious (colors)
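Because the provider SDKs are optional extras, code that selects a provider can first check which ones are importable. This is a minimal sketch using only the standard library. The mapping from provider name to import path (`openai`, `anthropic`, `google.genai`) is an assumption based on the package names listed above, and `available_providers` is a hypothetical helper.

```python
import importlib.util

# Assumed mapping from provider name to the module its SDK installs.
PROVIDER_MODULES = {
    "openai": "openai",
    "anthropic": "anthropic",
    "gemini": "google.genai",
}


def available_providers() -> dict[str, bool]:
    """Report which optional provider SDKs can be imported in this environment."""
    status = {}
    for provider, module in PROVIDER_MODULES.items():
        try:
            status[provider] = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec("google.genai") imports the parent "google" first and
            # raises if that parent package is not installed.
            status[provider] = False
    return status


print(available_providers())
```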
## Architecture & Code Organization

### Module Structure (follows scverse conventions)
- Use `AnnData` objects as the primary data structure
- Type annotations use modern syntax: `str | None` instead of `Optional[str]`
- Supports Python 3.11, 3.12, and 3.13 (see `pyproject.toml`)
- Avoid local imports unless they are needed to resolve circular imports
### Core Components
1. **`src/cell_annotator/model/cell_annotator.py`**: Main `CellAnnotator` class
   - Orchestrates annotation across multiple samples
   - `annotate_clusters()`: Main entry point for annotation
2. **`src/cell_annotator/model/sample_annotator.py`**: `SampleAnnotator` class
   - Handles annotation for a single sample
   - Computes marker genes, queries the LLM, and stores the results
3. **`src/cell_annotator/model/base_annotator.py`**: `BaseAnnotator` abstract class
   - Shared LLM provider logic and validation
4. **`src/cell_annotator/_response_formats.py`**: Pydantic models for structured LLM outputs
5. **`src/cell_annotator/_prompts.py`**: LLM prompt templates
6. **`src/cell_annotator/utils.py`**: Helper functions (marker gene filtering, formatting)
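The layering above can be outlined with `abc`: a shared base class, a per-sample annotator, and an orchestrator over samples. This is a hypothetical skeleton, not the package's code. The `*Sketch` class names, the constructor arguments, and the stubbed LLM call are assumptions. Only the `annotate_clusters()` entry-point name comes from the description.

```python
from abc import ABC, abstractmethod

SUPPORTED_PROVIDERS = {"openai", "gemini", "anthropic"}


class BaseAnnotatorSketch(ABC):
    """Shared provider validation, as described for BaseAnnotator."""

    def __init__(self, species: str, tissue: str, provider: str = "openai") -> None:
        if provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"Unsupported provider: {provider!r}")
        self.species = species
        self.tissue = tissue
        self.provider = provider

    @abstractmethod
    def annotate_clusters(self) -> dict:
        """Run the annotation and return the results."""


class SampleAnnotatorSketch(BaseAnnotatorSketch):
    """Annotates the clusters of a single sample."""

    def __init__(self, markers: dict[str, list[str]], **kwargs: str) -> None:
        super().__init__(**kwargs)
        self.markers = markers

    def _query_llm(self, genes: list[str]) -> str:
        # Stub standing in for a real provider call.
        return f"unknown ({genes[0]}+)" if genes else "unknown"

    def annotate_clusters(self) -> dict[str, str]:
        return {cluster: self._query_llm(genes) for cluster, genes in self.markers.items()}


class CellAnnotatorSketch(BaseAnnotatorSketch):
    """Orchestrates one SampleAnnotatorSketch per sample."""

    def __init__(self, markers_per_sample: dict[str, dict[str, list[str]]], **kwargs: str) -> None:
        super().__init__(**kwargs)
        self.samples = {
            sample: SampleAnnotatorSketch(markers, **kwargs)
            for sample, markers in markers_per_sample.items()
        }

    def annotate_clusters(self) -> dict[str, dict[str, str]]:
        return {sample: ann.annotate_clusters() for sample, ann in self.samples.items()}
```

Keeping provider validation in the base class means both the per-sample and the multi-sample objects reject an unsupported provider the same way. The harmonization step across samples is not modeled here.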
## Development Workflow

### Environment Management (Hatch-based)
```bash
# Testing: NEVER run pytest directly
hatch test          # test with the highest Python version
hatch test --all    # test the full matrix: Python 3.11 & 3.13, plus 3.13 with pre-release deps

# Documentation
hatch run docs:build   # build Sphinx docs
hatch run docs:open    # open in browser
hatch run docs:clean   # clean build artifacts

# Environment inspection
hatch env show   # list environments
```
### Testing Strategy
- The test matrix is defined in `[[tool.hatch.envs.hatch-test.matrix]]` in `pyproject.toml`
- Tests run on Python 3.11 & 3.13 with stable deps, and on 3.13 with pre-release deps
- Tests live in `tests/` and use pytest; mark tests that make actual LLM calls with `@pytest.mark.real_llm_query`
- Run tests via `hatch test` to ensure proper environment isolation
- Optional dependencies are tested via `features = ["test"]`, which includes all providers
### Code Quality Tools
- **Ruff**: Linting and formatting (120-char line length)
- **Biome**: JSON/JSONC formatting with trailing commas
- **Pre-commit**: Auto-runs ruff and biome. Install with `pre-commit install`
- Use `git pull --rebase` if pre-commit.ci commits to your branch
## Key Configuration Files

### `pyproject.toml`
- **Build**: `hatchling` with `hatch-vcs` for git-based versioning
- **Dependencies**: Minimal core (scanpy, pydantic, python-dotenv, rich); provider packages are optional extras
- **Extras**: `[openai]`, `[anthropic]`, `[gemini]`, `[all-providers]`, `[test]`, `[doc]`
- **Ruff**: 120-char line length, NumPy docstring convention
- **Test matrix**: Python 3.11 & 3.13 (plus 3.13 with pre-release deps)
### Version Management
- The version is derived from git tags via `hatch-vcs`
- Release: Create a GitHub release with a tag of the form `vX.X.X`
- Follows **Semantic Versioning**
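Release tags follow the `vX.X.X` pattern. The sketch below shows a hypothetical helper that validates such a tag and turns it into a comparable tuple. In the actual release flow, `hatch-vcs` derives the version from the tag itself, so this only illustrates the naming rule. Pre-release suffixes such as `-rc1` are deliberately rejected here.

```python
import re

# vMAJOR.MINOR.PATCH with non-negative integers and no leading zeros (per SemVer).
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) for a tag like 'v1.2.3'."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"Not a vX.Y.Z release tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


# Integer tuples compare numerically, so v0.10.0 sorts after v0.9.1.
print(parse_release_tag("v0.10.0") > parse_release_tag("v0.9.1"))
```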
## Project-Specific Patterns

### Basic Usage
```python
from cell_annotator import CellAnnotator

# Annotate across multiple samples.
# `adata` is an AnnData object with clusters in adata.obs["leiden"] and samples in adata.obs["batch"].
cell_ann = CellAnnotator(
    adata,
    species="human",
    tissue="heart",
    cluster_key="leiden",
    sample_key="batch",
    provider="openai",  # or "gemini", "anthropic"
).annotate_clusters()

# Results are stored in adata.obs["cell_type_predicted"]
```
### LLM Provider Selection
- Providers: `"openai"` (default), `"gemini"`, `"anthropic"`
- API keys via environment variables or `.env` file (loaded with python-dotenv)
- Models: `gpt-4o-mini` (default), `gpt-4o`, `gemini-2.0-flash-exp`, `claude-3-5-sonnet-20241022`
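API keys are read from the environment. The sketch below shows a minimal pre-flight check, and the variable names in it are assumptions. `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the defaults read by the official SDKs, and `GEMINI_API_KEY`/`GOOGLE_API_KEY` are the names the Google Gen AI SDK accepts. `missing_api_key` is hypothetical. In the package, python-dotenv's `load_dotenv()` would first copy entries from a `.env` file into `os.environ`.

```python
import os
from collections.abc import Mapping

# Assumed environment variable names per provider.
API_KEY_VARS = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
}


def missing_api_key(provider: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if none of the provider's key variables holds a non-empty value."""
    if provider not in API_KEY_VARS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return not any(environ.get(var) for var in API_KEY_VARS[provider])


print(missing_api_key("gemini", {"GOOGLE_API_KEY": "test-key"}))  # False
```

Passing the environment as a mapping keeps the check testable without touching the real process environment.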
### Structured Outputs with Pydantic
- `CellTypeListOutput`: List of expected cell types
- `ExpectedMarkerGeneOutput`: Dict of cell type → marker genes
- Ensures reliable, parseable LLM responses
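Pydantic models turn the LLM's JSON reply into typed, validated objects. The sketch below substitutes the standard-library `dataclasses` and `json` modules for Pydantic so that it runs without extra packages. The field names `cell_types` and `expected_markers` are assumptions, since the actual fields of `CellTypeListOutput` and `ExpectedMarkerGeneOutput` are not shown here.

```python
import json
from dataclasses import dataclass


@dataclass
class CellTypeList:
    """Stand-in for CellTypeListOutput; the field name is assumed."""

    cell_types: list[str]


@dataclass
class ExpectedMarkers:
    """Stand-in for ExpectedMarkerGeneOutput; the field name is assumed."""

    expected_markers: dict[str, list[str]]


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def parse_cell_types(raw: str) -> CellTypeList:
    """Validate a reply shaped like {"cell_types": ["T cell", ...]}."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not _is_str_list(data.get("cell_types")):
        raise ValueError("expected {'cell_types': [str, ...]}")
    return CellTypeList(cell_types=data["cell_types"])


def parse_expected_markers(raw: str) -> ExpectedMarkers:
    """Validate a reply shaped like {"expected_markers": {"T cell": ["CD3E", ...]}}."""
    data = json.loads(raw)
    markers = data.get("expected_markers") if isinstance(data, dict) else None
    # JSON object keys are always strings, so only the values need checking.
    if not isinstance(markers, dict) or not all(_is_str_list(genes) for genes in markers.values()):
        raise ValueError("expected {'expected_markers': {str: [str, ...]}}")
    return ExpectedMarkers(expected_markers=markers)


reply = '{"expected_markers": {"T cell": ["CD3E", "CD3D"], "B cell": ["MS4A1"]}}'
print(parse_expected_markers(reply).expected_markers["T cell"])
```

Assuming Pydantic v2, the package's models can do the same validation in a single `model_validate_json(raw)` call, with more detailed error messages.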
### AnnData Conventions
- Marker genes computed via `scanpy.tl.rank_genes_groups()`
- Results stored in `adata.obs[cell_type_key]` (default: `"cell_type_predicted"`)
- Confidence scores in `adata.obs[f"{cell_type_key}_confidence"]`
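Filtering markers against `adata.var_names` and deriving the output column names can both be sketched with plain lists. `filter_markers` and `output_columns` are hypothetical helpers; the package's own filtering lives in `src/cell_annotator/utils.py` under a name not shown here. A real call would pass `list(adata.var_names)`, and `"FAKE1"` below stands for a gene that is absent from the dataset.

```python
def filter_markers(markers: dict[str, list[str]], var_names: list[str]) -> dict[str, list[str]]:
    """Drop marker genes that are absent from the dataset, preserving rank order."""
    present = set(var_names)  # set lookup keeps this linear in the number of genes
    return {cluster: [g for g in genes if g in present] for cluster, genes in markers.items()}


def output_columns(cell_type_key: str = "cell_type_predicted") -> tuple[str, str]:
    """Return the .obs column names for predictions and their confidence scores."""
    return cell_type_key, f"{cell_type_key}_confidence"


print(filter_markers({"0": ["CD3E", "FAKE1", "IL7R"]}, ["CD3E", "IL7R", "MS4A1"]))
print(output_columns())
```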
## Common Gotchas

1. **Hatch for testing**: Always use `hatch test`, never standalone `pytest`. CI runs the same hatch test matrix.
2. **API keys**: Must be set as env vars or in a `.env` file. The package auto-loads them via python-dotenv.
3. **Provider packages**: Install provider extras (`pip install "cell-annotator[openai]"`) to use specific LLMs.
4. **Real LLM tests**: Mark them with `@pytest.mark.real_llm_query` and keep them skipped in CI unless explicitly enabled.
5. **Marker gene filtering**: The package automatically filters marker genes to those present in `adata.var_names`.
6. **Pre-commit conflicts**: Use `git pull --rebase` to integrate pre-commit.ci fixes.
7. **Line length**: Ruff is set to 120 chars, but keep docstrings readable (~80 chars per line).
## Related Resources

- **Contributing guide**: `docs/contributing.md`
- **Tutorials**: `docs/notebooks/tutorials/`
- **OpenAI structured outputs**: https://platform.openai.com/docs/guides/structured-outputs
- **scanpy docs**: https://scanpy.readthedocs.io/
- **Pydantic docs**: https://docs.pydantic.dev/

.readthedocs.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ version: 2
 build:
   os: ubuntu-20.04
   tools:
-    python: "3.10"
+    python: "3.11"
 sphinx:
   configuration: docs/conf.py
   # disable this for more lenient docs builds

README.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 
 
 ## 📦 Installation
-You need to have 🐍 Python 3.10 or newer installed on your system.
+You need to have 🐍 Python 3.11 or newer installed on your system.
 If you don't have Python installed, we recommend installing [Mambaforge][].
 

pyproject.toml

Lines changed: 2 additions & 3 deletions
@@ -13,10 +13,9 @@ maintainers = [
 authors = [
   { name = "Marius Lange" },
 ]
-requires-python = ">=3.10"
+requires-python = ">=3.11"
 classifiers = [
   "Programming Language :: Python :: 3 :: Only",
-  "Programming Language :: Python :: 3.10",
   "Programming Language :: Python :: 3.11",
   "Programming Language :: Python :: 3.12",
   "Programming Language :: Python :: 3.13",
@@ -115,7 +114,7 @@ scripts.clean = "git clean -fdX -- {args:docs}"
 # Test the lowest and highest supported Python versions with normal deps
 [[tool.hatch.envs.hatch-test.matrix]]
 deps = [ "stable" ]
-python = [ "3.10", "3.13" ]
+python = [ "3.11", "3.13" ]
 
 # Test the newest supported Python version also with pre-release deps
 [[tool.hatch.envs.hatch-test.matrix]]
