KG-Alzheimers

KG-Alzheimers builds and distributes a biomedical knowledge graph focused on Alzheimer’s disease. It harmonizes data from the Monarch Initiative and partner sources into Biolink Model-compliant KGX exports, DuckDB and SQLite databases, JSONL feeds, and search indexes.

Highlights

  • Integrates dozens of genomics, phenomics, pathway, and literature resources behind a repeatable ETL pipeline.
  • Ships a Typer-powered CLI (ingest) that orchestrates download, transform, merge, QC, export, and packaging steps.
  • Produces denormalized TSV bundles, RDF/JSONL snapshots, and Solr-ready indexes suitable for analytics or downstream applications.
  • Uses Poetry for dependency management and reproducible environments; CI/CD via Jenkins keeps public releases up to date.

Getting Started

Requires Python 3.10+ and Poetry.
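
The requirement above can be checked before installing. The following is a minimal sketch, not part of the repository: the function names `version_at_least` and `check_prereqs` are hypothetical, and it assumes the interpreter is available as `python3` and that versions have the form MAJOR.MINOR[.PATCH].

```shell
# Returns 0 when version $1 (e.g. "3.11.4") is at least MAJOR.MINOR $2 (e.g. "3.10").
# Components are compared numerically, so 3.9 correctly sorts below 3.10.
version_at_least() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
    return
  fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Verifies that Python 3.10+ and Poetry are available on PATH.
# Assumption: the interpreter is invoked as "python3".
check_prereqs() {
  py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
  if ! version_at_least "$py_version" "3.10"; then
    echo "Python 3.10+ required, found $py_version" >&2
    return 1
  fi
  if ! command -v poetry >/dev/null 2>&1; then
    echo "Poetry not found on PATH" >&2
    return 1
  fi
  echo "Prerequisites OK (Python $py_version)"
}
```

After sourcing the snippet, run `check_prereqs` once before `poetry install`.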

git clone https://github.com/Knowledge-Graph-Hub/kg-alzheimers.git
cd kg-alzheimers
poetry install

# Optional: activate the virtual environment created by Poetry
poetry shell

To download all referenced datasets and run the full build pipeline locally:

# Retrieve source data declared in src/kg_alzheimers/download.yaml
poetry run ingest download --all --write-metadata

# Execute Phenio preprocessing plus all Koza ingests
poetry run ingest transform --all --log --rdf --write-metadata

# Merge transformed outputs into a unified KG bundle with QC checks
poetry run ingest merge

# (Optional) Generate closure-enriched denormalized tables
poetry run ingest closure

# Prepare release artifacts (gzipped DuckDB/TSV/JSONL bundles)
poetry run ingest prepare-release
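
For unattended builds, the steps above can be chained so that a failing stage stops the run. This is a sketch under stated assumptions rather than a script shipped with the repository: `PIPELINE_STAGES`, `run_pipeline`, and the `DRY_RUN` variable are hypothetical names, and the stage list simply repeats the commands shown above.

```shell
# Stage list copied from the commands above, one "ingest" subcommand per line.
# Remove the "closure" line to skip the optional closure step.
PIPELINE_STAGES='download --all --write-metadata
transform --all --log --rdf --write-metadata
merge
closure
prepare-release'

# Runs each stage in order and stops at the first failure.
# Set DRY_RUN=1 to print the commands without executing them.
run_pipeline() {
  printf '%s\n' "$PIPELINE_STAGES" | while IFS= read -r stage; do
    echo "poetry run ingest $stage"
    if [ "${DRY_RUN:-0}" != "1" ]; then
      # Word splitting of $stage is intentional: it holds the subcommand and its flags.
      # stdin is redirected so the command cannot consume the remaining stage list.
      poetry run ingest $stage < /dev/null || exit 1
    fi
  done
}
```

Running `DRY_RUN=1 run_pipeline` first prints the five commands for review; `run_pipeline` on its own executes them from the repository root.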

Additional commands are documented in docs/CLI.md and include:

  • poetry run ingest export — create filtered TSV/JSONL dumps defined in src/kg_alzheimers/data-dump-config.yaml.
  • poetry run ingest report — run DuckDB QC SQL scripts to audit the merge.
  • poetry run ingest sqlite / poetry run ingest solr — load artifacts into local SQLite or Solr instances for exploration.
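
Exported TSV files can also be inspected without loading them into a database. The sketch below counts rows per value of a named column; it relies on KGX edge TSVs having a header row with `subject`, `predicate`, and `object` columns. The helper name `count_by_column` and the file path in the usage note are hypothetical, not taken from the repository.

```shell
# Prints "count<TAB>value" for each distinct value of the named column in a
# tab-separated file with a header row, most frequent first.
count_by_column() {
  file=$1
  column=$2
  tab="$(printf '\t')"
  awk -F '\t' -v col="$column" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$idx]++ }
    END { for (v in counts) printf "%d\t%s\n", counts[v], v }
  ' "$file" | sort -t "$tab" -k1,1nr -k2,2
}
```

For example, `count_by_column path/to/edges.tsv predicate` summarizes edge predicates, where `path/to/edges.tsv` stands in for an uncompressed edge file produced by the export step.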

Note: The ingest release command is deprecated; releases are created by the Jenkins pipeline described in Jenkinsfile.

Documentation

Development

  • Run the unit test suite:

    poetry run pytest

  • Format and lint using the configured tooling (Black and Ruff):

    poetry run black src tests
    poetry run ruff check src tests

Issues and contributions are welcome via GitHub. To propose a new ingest, follow the workflow documented under docs/Create-an-Ingest/.
