Commit 2862055

slobentanzer and claude authored
docs(landing): align commands surface and link tutorial as canonical onboarding (#28)
- Add a tutorial admonition at the top of the manual quick-start and list it first in the reading order; surface the tutorial early in the README too.
- Rewrite the docs/index.md Commands section to reflect what is actually wired today: new Data acquisition + tracking group (get, add, mv, rm, queue, mark), move propose-alignment under semantic mapping, separate annotate/config from the standard build path, and split out a Stubs / not yet wired group (discover, benchmark, read, chat, search).
- Point readers from `biotope add` to the `.biotope.yaml` review + `annotate apply` loop for curated metadata that does not fit as CLI flags, and fix the annotate entry to describe each subcommand's actual scope (dataset/record-set, not field-level).
- README quick-start now mirrors the tutorial's spine (uvx init -> uv sync -> get -> add -> queue -> map -> build -> create_knowledge_graph.py -> view) and drops `annotate apply` from the minimum path.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 475fc3b commit 2862055

2 files changed

Lines changed: 75 additions & 45 deletions

File tree

README.md

Lines changed: 44 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -9,46 +9,54 @@ CLI for the BioCypher ecosystem: Croissant-described data → BioCypher knowledg
99

1010
**Status: pre-alpha, developer-facing.** APIs and CLI will change. Not yet suitable for end users.
1111

12+
## Start here
13+
14+
The fastest way in is the **[tutorial](https://biocypher.github.io/biotope/tutorial/)**
15+
— a 15-minute end-to-end walk-through that builds a real knowledge graph from
16+
public airport/flight data. It is the canonical, most up-to-date onboarding
17+
path; the snippet below is just a flavour preview.
18+
1219
## Install
1320

1421
```bash
15-
uv pip install biotope # or
16-
uv pip install -e ".[dev]" # editable, with test deps
22+
uv add biotope # in a uv-managed venv
23+
pipx install biotope # global install
24+
uvx biotope init my-kg # no install: ephemeral venv for the scaffolder
25+
uv pip install -e ".[dev]" # editable, with test deps (for biotope itself)
1726
```
1827

1928
## From `init` to a knowledge graph
2029

2130
```bash
22-
# 1. Scaffold a project.
23-
biotope init my-kg --purpose "What approved drugs target genes in T2D?"
24-
cd my-kg
31+
# 1. Scaffold a project (uvx works fine here — no local install needed).
32+
uvx biotope init my-kg --purpose "What approved drugs target genes in T2D?" --no-prompt
33+
cd my-kg && uv sync
2534

26-
# 2. Declare intent — what entities and relations the graph must contain.
27-
# Non-interactive (agent-friendly):
28-
biotope map --entity gene --entity disease --entity drug \
29-
--relation gene_associated_with_disease
35+
# 2. Bring in data — biotope get downloads + tracks; biotope add stages a
36+
# local file/folder and runs croissant-baker over it.
37+
uv run biotope get https://example.org/opentargets.parquet --output-dir data/ot --no-add
38+
uv run biotope add data/ot --license CC-BY-4.0 --creator "Open Targets"
3039

31-
# 3. Bring in data and its Croissant metadata.
32-
biotope add data/opentargets --license CC-BY-4.0 --creator "Open Targets"
33-
biotope annotate apply data/opentargets # after reviewing data/opentargets/.biotope.yaml
40+
# 3. Inspect the pipeline state at any time.
41+
uv run biotope queue # raw / processed / mapped, with provenance footer
3442

35-
# 4. Generate an unresolved mapping scaffold from your declared intent.
36-
# The file has one slot per entity/relation plus an inspector appendix
37-
# listing record sets, field kinds, identifier-like fields, and sample rows.
38-
biotope map scaffold .biotope/datasets/data/opentargets.jsonld
43+
# 4. Declare intent — what entities and relations the graph must contain.
44+
# Non-interactive (agent-friendly); without flags, `biotope map` opens
45+
# a wizard that captures intent and resolves slots in one flow.
46+
uv run biotope map --entity gene --entity disease --entity drug \
47+
--relation gene_associated_with_disease
3948

4049
# 5. Resolve the slots. Two equivalent paths:
41-
# a) Wizard (humans): biotope map
42-
# b) Edit `mappings/*.yaml` directly, then validate (agents):
43-
# biotope map inspect <croissant> --json # field catalogue
44-
# biotope map preview --json # status + projected schema + sample tuples
45-
46-
# 6. Optional: align entities across multiple mappings.
47-
biotope propose-alignment mappings/*.mapping.yaml --out alignment.yaml
48-
49-
# 7. Build a runnable BioCypher project. Strict: rejects unresolved slots.
50-
biotope build
51-
biotope view
50+
# a) Wizard (humans): uv run biotope map
51+
# b) Edit mappings/*.mapping.yaml directly, then validate (agents):
52+
# uv run biotope map scaffold .biotope/datasets/data/ot.jsonld
53+
# uv run biotope map inspect <croissant> --json # field catalogue
54+
# uv run biotope map preview --json # projected schema + tuples
55+
56+
# 6. Build a runnable BioCypher project. Strict: rejects unresolved slots.
57+
uv run biotope build
58+
uv run python build/create_knowledge_graph.py
59+
uv run biotope view
5260
```
5361

5462
`biotope init` is a pure scaffolder. All non-autogeneratable metadata is supplied as CLI flags — by a user or an agent reading `AGENTS.md`. Semantic decisions (which record set, which fields, which transforms) are made by the human or copilot; biotope only enumerates options, validates, and previews.
@@ -70,14 +78,17 @@ The agent surface is `AGENTS.md` (template lives at `biotope/templates/AGENTS.md
7078

7179
```
7280
init project scaffolding
81+
get add mv rm acquisition + tracking (baker writes croissants)
82+
queue mark pipeline-state dashboard + manual transitions
7383
map (inspect|scaffold|preview) semantic mapping (intent + wizard + agent path)
74-
add mv status commit log push pull git-like metadata VCS
75-
check-data checksum verification
76-
discover registry-aware source ranking
7784
propose-alignment cross-mapping same_node equivalences
78-
build view benchmark build + inspect a graph
79-
read chat NLP ingestion, conversation (promises)
80-
search annotate get config legacy / auxiliary
85+
build view build + inspect a graph
86+
status commit log push pull git-like metadata VCS
87+
check-data checksum verification
88+
annotate config field-level annotation + project config
89+
90+
discover benchmark scaffolded but not yet wired into the standard flow
91+
read chat search promises / auxiliary
8192
```
8293

8394
`biotope describe` and the heuristic `biotope propose-mapping` were removed/deprecated. Intent capture is now `biotope map --entity ... --relation ...`; scaffolding is `biotope map scaffold`. `propose-mapping` remains as a deprecated alias for the scaffold subcommand.

docs/index.md

Lines changed: 31 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -22,6 +22,13 @@ APIs, CLI flags, and config-file layouts will change. End-user docs come after t
2222

2323
## Quick start: without coding agent
2424

25+
!!! tip "Prefer a worked end-to-end example?"
26+
27+
The [**Tutorial**](tutorial.md) walks through building a real knowledge graph
28+
from public airport/flight data in ~15 minutes. It's the most up-to-date
29+
onboarding path and the source of truth for the recommended workflow
30+
(`init` → `get` → `add` → `queue` → `map` → `build`).
31+
2532
```bash
2633
uv pip install biotope
2734

@@ -61,17 +68,24 @@ All semantic decisions (which record set, which fields, which transforms) are ma
6168

6269
- `biotope init` — scaffold a project (`.biotope/`, `AGENTS.md`, `project.yaml`, `git init`).
6370

71+
### Data acquisition + tracking
72+
73+
- `biotope get <url>` — download a file (optionally into `--output-dir`) and, unless `--no-add`, track it.
74+
- `biotope add <path>` — stage data files or rooted directories; baker writes the Croissant entry under `.biotope/datasets/`. `--derived-from` records provenance for human/agent-extracted derivatives. For curated metadata that doesn't fit as CLI flags (descriptions, citations, per–record-set fields), `add` also drops a `.biotope.yaml` scaffold next to the dataset — review it, then run `biotope annotate apply <dir>` to merge it into the manifest.
75+
- `biotope mv` / `biotope rm` — move or untrack files and update metadata paths.
76+
- `biotope queue` — show every dataset grouped by pipeline state (`raw` / `processed` / `mapped`). The recommended dashboard during a build.
77+
- `biotope mark <dataset> <status>` — manually set a dataset's `biotope:status`.
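
The `queue` grouping described in this list can be pictured with a minimal Python sketch. It is not biotope's implementation: the record layout, the `group_by_status` helper, and the rule that a missing status counts as `raw` are all assumptions made for illustration. Only the three state names and the `biotope:status` key come from the text above.

```python
# Pipeline states named in the docs, in display order.
PIPELINE_STATES = ("raw", "processed", "mapped")


def group_by_status(datasets):
    """Group dataset names by their `biotope:status`, in pipeline order.

    `datasets` is a list of plain dicts; this shape is hypothetical and
    does not mirror the real Croissant manifests under .biotope/datasets/.
    """
    groups = {state: [] for state in PIPELINE_STATES}
    for ds in datasets:
        # Assumption: a dataset without an explicit status is treated as raw.
        status = ds.get("biotope:status", "raw")
        if status not in groups:
            raise ValueError(f"unknown status: {status!r}")
        groups[status].append(ds["name"])
    return groups


# Hypothetical example records.
datasets = [
    {"name": "data/ot", "biotope:status": "mapped"},
    {"name": "data/airports", "biotope:status": "raw"},
    {"name": "data/flights"},
]

for state, names in group_by_status(datasets).items():
    print(f"{state}: {', '.join(names) or '-'}")
```

In this picture, `biotope mark <dataset> <status>` would amount to rewriting one record's `biotope:status` value, after which the dataset appears under a different group.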
78+
6479
### Semantic mapping
6580

6681
- `biotope map` — bare command. If any intent flag (`--purpose`, `--entity`, `--relation`, `--source`, `--notes`, `--clear-*`, `--show`) is passed, it updates `project.yaml` non-interactively. Otherwise it launches the guided wizard.
6782
- `biotope map inspect <croissant>` — deterministic field catalogue + sample rows. `--json` for agents.
6883
- `biotope map scaffold <croissant>` — emit an unresolved mapping scaffold with an inspector comment appendix.
6984
- `biotope map preview [<mapping>]` — validate a (partial) mapping; show projected BioCypher schema + sample tuples. `--json` for agents.
85+
- `biotope propose-alignment` — propose cross-mapping `same_node` equivalences.
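
The dispatch rule for bare `biotope map` (any intent flag means a non-interactive update, otherwise the wizard) can be sketched as below. This is an illustrative sketch only: the `map_mode` helper and its return labels are hypothetical, and the real CLI may parse arguments quite differently. The flag names are taken from the `biotope map` entry above.

```python
# Intent flags listed in the docs; `--clear-*` is handled as a prefix match.
INTENT_FLAGS = ("--purpose", "--entity", "--relation", "--source", "--notes", "--show")


def map_mode(args):
    """Return "update-intent" if any intent flag is present, else "wizard".

    `args` is the argument list after `biotope map`. Subcommands
    (inspect, scaffold, preview) are out of scope for this sketch.
    """
    for arg in args:
        # Accept both `--flag value` and `--flag=value` spellings.
        flag = arg.split("=", 1)[0]
        if flag in INTENT_FLAGS or flag.startswith("--clear-"):
            return "update-intent"
    return "wizard"


print(map_mode([]))
print(map_mode(["--entity", "gene", "--relation", "gene_associated_with_disease"]))
```

The point of the rule is that an agent never hits an interactive prompt by accident: passing even a single intent flag keeps the command non-interactive.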
7086

7187
### Git-like metadata VCS
7288

73-
- `biotope add` — stage data files; baker enriches the Croissant entry under `.biotope/datasets/`.
74-
- `biotope mv` — move tracked files; updates metadata paths.
7589
- `biotope status` — show staged/modified files and validation state.
7690
- `biotope commit` — commit metadata changes.
7791
- `biotope log` — show metadata commit history.
@@ -80,25 +94,30 @@ All semantic decisions (which record set, which fields, which transforms) are ma
8094

8195
### Knowledge-graph construction
8296

83-
- `biotope discover` — rank registered adapters and local Croissant files against `required_entities`.
84-
- `biotope propose-alignment` — propose cross-mapping `same_node` equivalences.
8597
- `biotope build` — materialise a runnable BioCypher project from mappings + alignment. Emits `config/schema_config.yaml` (with `namespace` and autogenerated `input_label`) and per-mapping generated Python under `build/generated/<stem>/`.
86-
- `biotope view` — node/edge counts for the most recent build.
87-
- `biotope benchmark` — quality/coverage metrics (skeleton in v1).
98+
- `biotope view` — node/edge counts for the most recent build (or project competence questions if no build yet).
99+
100+
### Annotation + project config
101+
102+
- `biotope annotate`: `apply` (merge a curated `.biotope.yaml` scaffold into a dataset's Croissant manifest, with optional `--set dataset.<field>=…` / `--set record_set.<field>=…` overrides), `edit` (interactive annotation), `load` (sample records via the manifest), `validate` (mlcroissant validation).
103+
- `biotope config` — manage project-level validation rules, remote validation URLs, and project metadata.
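
As a rough picture of how a `--set dataset.<field>=…` or `--set record_set.<field>=…` override could be split into its parts, here is a small Python sketch. The `parse_set_override` helper and the example field name `description` are hypothetical; how biotope actually parses and applies these overrides is not shown in this commit.

```python
# The two scopes named in the docs for `annotate apply --set ...`.
ALLOWED_SCOPES = ("dataset", "record_set")


def parse_set_override(text):
    """Split 'scope.field=value' into a (scope, field, value) tuple.

    Only the first '=' separates key from value, so values may
    themselves contain '=' characters.
    """
    key, sep, value = text.partition("=")
    if not sep:
        raise ValueError(f"expected scope.field=value, got {text!r}")
    scope, dot, field = key.partition(".")
    if scope not in ALLOWED_SCOPES or not dot or not field:
        raise ValueError(f"expected dataset.<field> or record_set.<field>, got {key!r}")
    return scope, field, value


# Hypothetical override, as it might follow `--set` on the command line.
print(parse_set_override("dataset.description=Open Targets associations"))
```

Restricting the scope to `dataset` and `record_set` matches the commit message's note that the annotate subcommands work at dataset/record-set level, not field level.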
104+
105+
### Stubs / not yet wired
106+
107+
- `biotope discover` — rank registered adapters and local Croissant files against `required_entities`. Exists as a CLI entry but the registry surface is not yet wired into the recommended workflow; the tutorial does not use it.
108+
- `biotope benchmark` — quality/coverage metrics. v1 stub: emits a skeleton JSON object so downstream tooling can structure-test against it. Real metric implementations land iteratively.
109+
- `biotope read` — NLP ingestion + health-check entry. Promise.
110+
- `biotope chat` — provider-agnostic conversational interface (biochatter backend). Promise.
111+
- `biotope search` — registry search across MCP / biotools. Auxiliary; not used in the standard build path.
88112

89113
### Deprecated
90114

91115
- `biotope describe` — removed; folded into `biotope map` intent flags.
92116
- `biotope propose-mapping` — deprecated alias for `biotope map scaffold`. The old heuristic ("one RecordSet per node type, FK fields as edges") is gone; the alias now produces an unresolved scaffold for human/agent completion.
93117

94-
### Promises (not feature-complete)
95-
96-
- `biotope read` — NLP ingestion + health-check entry.
97-
- `biotope chat` — provider-agnostic conversational interface (biochatter backend ships first).
98-
- `biotope search` / `biotope get` / `biotope annotate` / `biotope config` — auxiliary surfaces.
99-
100118
## Reading order
101119

120+
1. [Tutorial](tutorial.md) — 15-minute end-to-end walk-through; the ground-truth onboarding path.
102121
1. [Architecture](architecture.md) — modules, data flow, config files.
103122
1. [Project context](project-context.md) — project layout and `.biotope/` files.
104123
1. [Commands](api-docs/init.md) — per-command reference, generated from docstrings.
