A local-first SNOMED CT toolchain that's 10-100x faster than IHTSDO Snowstorm. One binary — from raw RF2 release to NDJSON, then SQL, Parquet, Markdown, TUI, GUI, graphs and MCP/LLM tool use. All on your machine, no network calls, REST APIs, or external servers required.
This is very much a work in progress, but it's ready to use and I'd welcome feedback on how it performs for you.
```
RF2 Snapshot
     │
     ▼  sct ndjson (~10s for 831k concepts)
     │
canonical NDJSON artefact
     │
     ├── sct sqlite ──▶ snomed.db (SQL + FTS5, MCP backend)
     │         │
     │         ├── sct lexical ──▶ keyword search (FTS5)
     │         └── sct mcp ──▶ stdio MCP server (Claude Desktop / Claude Code)
     ├── sct parquet ──▶ snomed.parquet (DuckDB / analytics)
     ├── sct markdown ──▶ snomed-concepts/ (RAG / LLM file reading) (untested)
     └── sct embed ──▶ snomed-embeddings.arrow (semantic vector search)
               │
               └── sct semantic ──▶ cosine similarity search (requires Ollama)

sct info <file>                inspect any artefact
sct diff --old <f> --new <f>   compare two NDJSON releases (untested)
sct gui                        browser-based UI served over localhost, with graph
                               visualisation and point-and-click exploration
sct tui                        experimental terminal UI to explore concepts and relationships
sct completions <shell>        generate shell completions (optional)
```
The NDJSON artefact at the centre is a stable, versionable, greppable file. All other outputs are derived from it and can be regenerated at any time.
sct joins the relatively incomprehensible RF2 files into a single NDJSON artefact. For the UK Monolith Edition this NDJSON file is over 1 GB, yet it still loads in VS Code, so you can get a feel for the data structure, which is impossible with the original RF2 files. It also means you can use standard tools like jq or ripgrep to query the data without needing a custom server or API.
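As a sketch of that jq workflow, here is a query over a single hand-written NDJSON line (the field names are illustrative assumptions, not necessarily sct's actual schema):

```shell
# One hypothetical concept record; real files have one JSON object per line
printf '%s\n' '{"id":"22298006","preferred_term":"Myocardial infarction","active":true}' > sample.ndjson

# Select active concepts and print id and preferred term, tab-separated
jq -r 'select(.active) | [.id, .preferred_term] | @tsv' sample.ndjson
```

ripgrep works the same way: because each concept is one line, a plain `rg 22298006 snomed.ndjson` returns the whole record.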
SNOMED CT is distributed as RF2 — a set of tab-separated files that require joining across multiple tables to get anything useful. The entire healthcare industry relies on remote terminology servers for this, with the overhead of network calls and REST APIs. sct performs the join once, creating an NDJSON artefact, and produces standard files you can query locally with sqlite3, duckdb, jq, ripgrep, or an LLM. No server, no API key, no network.
| Operation | sct + SQLite | Snowstorm Lite | sct speedup |
|---|---|---|---|
| Import - Clinical Edition | 22s | 209s | ~10x faster |
| Import - Full UK Monolith | ~57s | Failed (OOM)* | ∞ |
| Single concept lookup (SCTID) | 6ms | 491ms | ~80x faster |
| Free-text search (10 results) | 2ms | 202ms | ~100x faster |
\* Snowstorm Lite running in Docker with 24 GB of Java heap allocation ran out of memory on the full UK Monolith, which has 831k concepts. sct handled it in under a minute.
For more detailed benchmarks, see docs/benchmarks.md. Feel free to run the benchmarks yourself and share your results, perhaps as an Issue.
```shell
# 1. Clone the repository
git clone https://github.com/pacharanero/sct

# 2. Install
cargo install --path .

# 3. Download a distribution of SNOMED CT
#    UK: https://isd.digital.nhs.uk/ → Monolith Edition, RF2: Snapshot
#    (free under NHS England national licence — access is immediate)
#    NB: you need to Subscribe to a release before you can see the Download option 🤯
#    International: https://mlds.ihtsdotools.org/ (allow up to a week for approval)

# 4. Convert RF2 → NDJSON (~10s for 831k concepts)
#    Pass the .zip directly — no manual extraction needed
sct ndjson --rf2 SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z.zip
# ✓ 831,487 concepts written → snomedct-monolithrf2-production-20260311t120000z.ndjson

# 5. Load into SQLite with FTS5
sct sqlite --input snomedct-monolithrf2-production-20260311t120000z.ndjson

# 6. Query with standard tools — no custom binary needed
sqlite3 snomed.db \
  "SELECT id, preferred_term FROM concepts_fts WHERE concepts_fts MATCH 'heart attack' LIMIT 5"

# 7. Start the MCP server for Claude Desktop
sct mcp --db snomed.db
```

For all further information, see the full documentation: explore the docs/ directory, run the docs site locally with s/docs, or visit the docs on the GitHub Pages site: https://pacharanero.github.io/sct/
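To wire the stdio MCP server into Claude Desktop, an entry in claude_desktop_config.json along these lines typically works (the server name "snomed" and the database path are illustrative assumptions; check the docs for the exact configuration sct expects):

```json
{
  "mcpServers": {
    "snomed": {
      "command": "sct",
      "args": ["mcp", "--db", "/path/to/snomed.db"]
    }
  }
}
```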
- sct ndjson — convert an RF2 Snapshot directory to a canonical NDJSON artefact
- sct sqlite — load NDJSON into a SQLite database with FTS5
- sct parquet — export NDJSON to a Parquet file for DuckDB / analytics
- sct markdown — export NDJSON to per-concept Markdown files (or per-hierarchy with --mode hierarchy)
- sct mcp — start a local MCP server over stdio backed by the SQLite database
- sct embed — generate Ollama vector embeddings and write an Arrow IPC file
- sct lexical — keyword (FTS5) search over the SQLite database
- sct semantic — semantic similarity search over the Arrow IPC embeddings file (requires Ollama)
- sct info <file> — inspect any .ndjson, .db, or .arrow artefact and print a summary
- sct diff --old <file> --new <file> — compare two NDJSON releases and report what changed
- sct completions — print shell completion scripts (bash, zsh, fish, powershell, elvish)
- sct tui — keyboard-driven terminal UI for interactive SNOMED CT exploration (optional feature)
- sct gui — browser-based UI served over localhost for point-and-click exploration (optional feature)
Run any subcommand with --help for full option reference.
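To illustrate the kind of FTS5 keyword search that sct lexical performs, here is a minimal self-contained sketch using Python's built-in sqlite3 module. The table layout and column names are assumptions for illustration, not sct's actual schema:

```python
import sqlite3

# In-memory database with an FTS5 virtual table; column names are illustrative
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE concepts_fts USING fts5(id, preferred_term)")
conn.executemany(
    "INSERT INTO concepts_fts VALUES (?, ?)",
    [
        ("22298006", "Myocardial infarction"),
        ("38341003", "Hypertensive disorder"),
        ("195967001", "Asthma"),
    ],
)

# FTS5 MATCH performs tokenised keyword search; ORDER BY rank sorts by relevance
rows = conn.execute(
    "SELECT id, preferred_term FROM concepts_fts "
    "WHERE concepts_fts MATCH 'myocardial' ORDER BY rank"
).fetchall()
print(rows)
```

The same MATCH syntax works from the sqlite3 CLI against the snomed.db that sct sqlite produces.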
| Goal | Command |
|---|---|
| Query with SQL / keyword search | sct sqlite then sct lexical |
| Analytics / DuckDB | sct parquet |
| RAG / LLM file ingestion | sct markdown |
| Semantic / meaning-based search | sct embed then sct semantic |
| Claude Desktop or Claude Code | sct sqlite then sct mcp |
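The sct embed → sct semantic path ranks concepts by cosine similarity between embedding vectors. A minimal sketch of that scoring in pure Python, with tiny made-up 3-dimensional vectors standing in for real Ollama embeddings (which have hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up "embeddings" — purely illustrative values
concepts = {
    "Myocardial infarction": [0.9, 0.1, 0.0],
    "Heart attack":          [0.8, 0.2, 0.1],
    "Asthma":                [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]

# Rank concepts by similarity to the query vector, most similar first
ranked = sorted(concepts, key=lambda c: cosine_similarity(query, concepts[c]), reverse=True)
print(ranked)
```

Unlike FTS5 keyword search, this ranks by meaning: a query embedded near "heart attack" scores cardiac concepts highly even when no words overlap.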
Requires Rust stable 1.70+: rustup.rs
```shell
git clone https://github.com/pacharanero/sct
cd sct
cargo install --path .
```

This installs the default binary (all subcommands except tui and gui). To include the optional interactive interfaces:

```shell
# Terminal UI (adds ratatui + crossterm)
cargo install --path . --features tui

# Browser UI (adds axum + tokio)
cargo install --path . --features gui

# Both
cargo install --path . --features full
```

Or build without installing:

```shell
cargo build --release
# Binary: target/release/sct
```

Pre-built binaries for Linux x86_64, macOS arm64, and macOS x86_64 are available on the Releases page.
SNOMED CT is licensed. Download the RF2 Snapshot for your region:
- UK: NHS Digital TRUD → SNOMED CT Monolith Edition, RF2: Snapshot. Covered by the NHS England national licence.
- International: MLDS or NLM.
Download the Monolith Snapshot if available — it bundles the international base, clinical extension, and drug extension in one directory.
Please try it out and let me know how it performs for you, especially if you have a use case that isn't well supported by the current subcommands. Open an Issue for anything you want to report, from bugs to feature requests to general feedback.
A devcontainer configuration is included in .devcontainer/. Open the project in VS Code and select "Reopen in Container" to get a ready-to-go environment with Rust, sqlite3, duckdb, jq, and ripgrep pre-installed. Also included are python3 and Ollama, for working with the embeddings and semantic search features.
Store SNOMED data files (zips, NDJSON, databases) in the data-volume/ directory inside the container — it's backed by a Docker volume for faster I/O than the default bind mount.
Please see CONTRIBUTING.md for guidelines on how to contribute, report issues, or request features.
See the ROADMAP for planned features, improvements, and long-term vision for the project.
SNOMED CT® is a registered trademark of SNOMED International. This project is an independent implementation and is not affiliated with SNOMED International. All SNOMED CT data is sourced from the official RF2 releases and remains copyright of SNOMED International. Please refer to the license terms for your use of SNOMED CT data. You must ensure you have an appropriate license to use SNOMED CT data in your jurisdiction.
sct is not trademarked. The source code and binaries are copyright Marcus Baw and Baw Medical Ltd, and provided to you under the terms of the AGPL-3.0 license.