berntpopp/phentrieve

Phentrieve

Phentrieve is an advanced AI-powered research system for mapping phenotype descriptions to Human Phenotype Ontology (HPO) terms using a Retrieval-Augmented Generation (RAG) approach. It supports multiple languages and offers robust tools for benchmarking, text processing, and HPO term retrieval.

Research use only: Phentrieve is not a medical device and must not be used for diagnosis, treatment selection, patient triage, or other clinical decision-making. See the Research Use Only guide and Privacy and LLM Processing.

For comprehensive documentation, please visit the Phentrieve Documentation Site.

Key Features

  • Multilingual HPO term mapping using state-of-the-art embedding models
  • Advanced text processing pipeline including semantic chunking and assertion detection
  • Optional adaptive re-chunking improves recall on multi-concept clinical sentences (--adaptive-rechunking). See docs/user-guide/adaptive-rechunking.md.
  • Extensive benchmarking framework for model evaluation and comparison
  • User-friendly interfaces: CLI, FastAPI backend, and Vue.js frontend

Benchmark Results

Performance on 570 German clinical terms (BioLORD-2023-M model):

Retrieval Mode          MRR    Hit@1  Hit@10  Ont Sim@1
Single-vector           0.695  55.8%  94.0%   79.9%
Multi-vector (all_max)  0.892  84.0%  97.4%   91.9%
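
For readers unfamiliar with the ranking metrics in the table, the following is a minimal sketch of how MRR and Hit@k are conventionally computed from ranked retrieval results. It is not Phentrieve's benchmarking code: the function names and the toy cases are hypothetical, and Ont Sim@1 is not covered because its definition is not given here.

```python
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], expected_id: str) -> float:
    """Return 1/rank of the expected term, or 0.0 if it was not retrieved."""
    for rank, term_id in enumerate(ranked_ids, start=1):
        if term_id == expected_id:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked_ids: Sequence[str], expected_id: str, k: int) -> bool:
    """Return True if the expected term appears in the top k results."""
    return expected_id in ranked_ids[:k]


def summarize(
    cases: List[Tuple[Sequence[str], str]], k_values: Sequence[int] = (1, 10)
) -> Dict[str, float]:
    """Average reciprocal rank and Hit@k over all (ranking, expected) cases."""
    n = len(cases)
    summary = {"mrr": sum(reciprocal_rank(r, e) for r, e in cases) / n}
    for k in k_values:
        summary[f"hit@{k}"] = sum(hit_at_k(r, e, k) for r, e in cases) / n
    return summary


# Toy data for illustration only (not taken from the benchmark).
toy_cases = [
    (["HP:0000252", "HP:0001250"], "HP:0000252"),  # expected term at rank 1
    (["HP:0000252", "HP:0001250"], "HP:0001250"),  # expected term at rank 2
    (["HP:0000252"], "HP:0001250"),                # expected term missing
]
summary = summarize(toy_cases)
print(summary)
```

On the toy data the reciprocal ranks are 1, 0.5, and 0, so the MRR is 0.5.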

Multi-vector retrieval, which uses label, synonym, and definition embeddings, improves MRR by about 28% relative to single-vector retrieval (0.695 to 0.892).
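
One plausible reading of the all_max mode is that each HPO term keeps separate vectors for its label, synonyms, and definition, and the term's score is the highest similarity among them. The sketch below illustrates that idea under this assumption only; the function names, the dictionary layout, and the two-dimensional toy vectors are hypothetical and do not reflect Phentrieve's actual implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def all_max_score(
    query_vec: Sequence[float],
    component_vecs: Dict[str, List[Sequence[float]]],
) -> float:
    """Score a term by the best-matching vector across all its components.

    Assumption: "all_max" means a maximum over label, synonym, and
    definition vectors. This is an illustration, not the library's code.
    """
    return max(
        (cosine(query_vec, vec) for vecs in component_vecs.values() for vec in vecs),
        default=0.0,
    )


# Toy term: the query matches a synonym exactly but not the label.
toy_term = {
    "label": [[1.0, 0.0]],
    "synonym": [[0.0, 1.0], [0.6, 0.8]],
    "definition": [[0.5, 0.5]],
}
query = [0.0, 1.0]

label_only = cosine(query, toy_term["label"][0])  # single-vector style score
multi = all_max_score(query, toy_term)            # multi-vector style score
print(label_only, multi)
```

In this toy case the label-only score is 0.0 while the multi-vector score is 1.0, which shows why a query phrased like a synonym can rank higher when synonym vectors are included.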

Quick Start

Install Phentrieve using pip:

pip install phentrieve

For detailed setup and usage instructions, including Docker deployment, please see our Getting Started Guide.

Basic Usage

# Launch interactive query mode
phentrieve query --interactive

# Process research text to extract HPO terms
phentrieve text process "The research note mentions microcephaly and frequent seizures."

Discover more commands and options in the User Guide.
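
If you prefer to drive the CLI from a script, a thin wrapper around the text process command shown above might look like the following. This is a sketch that assumes phentrieve is installed and on the PATH; the helper names are hypothetical, and the output format is not assumed, so the raw standard output is returned as-is.

```python
import shutil
import subprocess
from typing import List


def build_text_process_cmd(text: str) -> List[str]:
    """Build the argument list for `phentrieve text process "<text>"`."""
    # Passing a list (not a shell string) avoids quoting problems in the text.
    return ["phentrieve", "text", "process", text]


def run_text_process(text: str) -> str:
    """Run the command and return its raw standard output."""
    if shutil.which("phentrieve") is None:
        raise RuntimeError("phentrieve executable not found on PATH")
    completed = subprocess.run(
        build_text_process_cmd(text),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


cmd = build_text_process_cmd(
    "The research note mentions microcephaly and frequent seizures."
)
print(cmd)
```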

Configuration

Configuration profiles

Define named profiles in phentrieve.yaml to preset CLI options:

profiles:
  fast_query:
    command: query
    num_results: 5
    similarity_threshold: 0.5

Then apply the profile on the command line: phentrieve query "TEXT" --profile fast_query

See docs/user-guide/configuration-profiles.md for the full guide.
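
Conceptually, a profile acts as a set of default option values for one command. The sketch below models that with plain dictionaries. It assumes that options given explicitly on the command line override profile values and that a profile is only valid for the command it names; both are assumptions for illustration, and resolve_options is a hypothetical helper, so check the full guide for the actual rules.

```python
from typing import Any, Dict

# The fast_query profile from phentrieve.yaml, written as a Python dict.
PROFILES: Dict[str, Dict[str, Any]] = {
    "fast_query": {
        "command": "query",
        "num_results": 5,
        "similarity_threshold": 0.5,
    }
}


def resolve_options(
    command: str, profile_name: str, cli_options: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge profile defaults with explicitly supplied CLI options.

    Assumptions (not confirmed by the README): explicit CLI values win,
    and a profile applies only to the command named in its `command` key.
    """
    profile = PROFILES[profile_name]
    if profile.get("command") != command:
        raise ValueError(f"profile {profile_name!r} is not for {command!r}")
    merged = {key: value for key, value in profile.items() if key != "command"}
    # None stands for "option not given on the command line".
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged


resolved = resolve_options(
    "query", "fast_query", {"num_results": 10, "similarity_threshold": None}
)
print(resolved)
```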

Docker Deployment

Deploy Phentrieve using Docker Compose for self-hosted research environments:

# Linux: Setup volume permissions (required)
sudo ./scripts/setup-docker-volumes.sh

# macOS/Windows: No setup needed, skip to next step

# Start services
docker-compose up -d

# Access the application
# - API: http://localhost:8000
# - Frontend: http://localhost:8080

For detailed deployment instructions, security best practices, and troubleshooting, see the Docker Deployment Guide.
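
After starting the services, a quick way to confirm that both are reachable is to check whether the two ports listed above accept TCP connections. The following sketch uses only the Python standard library and assumes the default ports 8000 and 8080 on localhost; port_is_open is a hypothetical helper, and an open port shows only that something is listening, not that the application is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("API", 8000), ("Frontend", 8080)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```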


Full Documentation | Contributing Guide | License

About

AI-powered system for mapping clinical text to Human Phenotype Ontology (HPO) terms using Retrieval-Augmented Generation (RAG). Features Python CLI/library, FastAPI backend, and Vue.js frontend for interactive phenotype extraction from medical texts.
