monarch-initiative/alliance-ingest

alliance-disease-association-ingest

| Documentation |

Transformation of Alliance of Genome Resources gene, allele, and model/genotype to disease associations.

Requirements

  • Python >= 3.10

  • uv

Setting Up a New Project -- Delete this section when completed

Upon creating a new project from the cookiecutter-monarch-ingest template, you can install and test the project:

cd alliance-disease-association-ingest
make install
make test

There are a few additional steps to complete before the project is ready for use.

GitHub Repository

  1. Create a new repository on GitHub.

  2. Enable GitHub Actions to read and write to the repository (required to deploy the project to GitHub Pages).

    • In GitHub, go to Settings -> Actions -> General -> Workflow permissions and choose "Read and write permissions".
  3. Initialize the local repository and push the code to GitHub. For example:

    cd alliance-disease-association-ingest
    git init
    git remote add origin https://github.com/<username>/<repository>.git
    git add -A && git commit -m "Initial commit"
    git push -u origin main

Transform Code and Configuration

  1. Edit the download.yaml, transform.py, transform.yaml, and metadata.yaml files to suit your needs.
  2. Add any additional dependencies to the pyproject.toml file.
  3. Adjust the contents of the tests directory to test the functionality of your transform.

Documentation

  1. Update this README.md file with any additional information about the project.
  2. Add any appropriate documentation to the docs directory.

Note: After the GitHub Actions for deploying documentation runs, the documentation will be automatically deployed to GitHub Pages.
However, you will need to go to the repository settings and set the GitHub Pages source to the gh-pages branch, using the /docs directory.

Once you have completed these steps, you can remove the Setting Up a New Project section from this README.md file.

Data Sources

Update this section to describe the source of the data for the ingest. Include information about the projects and groups that create or curate the data, which data files are used, and the specific sources and/or versions of those files. It is also valuable to document what model is used for the ingest (generally the Biolink Model) and what types of nodes and edges are created. Here is an example of how you might document this:

Data files for YOUR_SOURCE_DATA_TYPE are available from GROUP_OR_PROJECT through their portal at (include links where possible).

Source Files

This ingest relies on N data files from GROUP_OR_PROJECT and one additional data file for FILE_USAGE (often mapping) from OTHER_GROUP_OR_PROJECT.

  • FILENAME_1 - Describe the data in the file and give a basic description of how it's used. It's helpful to include the URLs here as well as in the download.yaml file.

Nodes and Edges

Use this section to describe the nodes and edges generated by the ingest, for instance:

  • Gene Nodes - Description of which nodes are created and what data may be excluded from the ingest.
  • Gene → Disease - Similar description of the edges and which edges are created or how the data may be filtered.
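To make the edge description concrete, the following is a minimal sketch of how one source row might become a Gene → Disease edge, including a simple filter. It is plain Python and does not use the Koza API. The column names (`gene_id`, `disease_id`), the `GeneToDiseaseEdge` class, the `row_to_edge` function, and the predicate and knowledge-source values are all hypothetical placeholders, not taken from this project's transform.py.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder values for illustration only (assumptions); confirm the
# predicate and knowledge source against the Biolink Model and the real data.
PREDICATE = "biolink:related_to"
KNOWLEDGE_SOURCE = "infores:example-source"


@dataclass(frozen=True)
class GeneToDiseaseEdge:
    """Hypothetical edge record; field names loosely follow Biolink conventions."""
    subject: str
    predicate: str
    object: str
    primary_knowledge_source: str


def row_to_edge(row: dict) -> Optional[GeneToDiseaseEdge]:
    """Map one source row to an edge, or return None if the row is filtered out."""
    # "gene_id" and "disease_id" are hypothetical column names.
    gene_id = (row.get("gene_id") or "").strip()
    disease_id = (row.get("disease_id") or "").strip()
    # Filter: skip rows that do not identify both ends of the edge.
    if not gene_id or not disease_id:
        return None
    return GeneToDiseaseEdge(gene_id, PREDICATE, disease_id, KNOWLEDGE_SOURCE)


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    print(row_to_edge({"gene_id": "HGNC:0000", "disease_id": "DOID:0000"}))
```

Documenting the filter rule next to the edge description, as in the comment above, makes it easier for readers to see why some source rows do not appear in the output.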

Transform Code and Configuration

Metadata for the ingest is in the metadata.yaml file and may require some adjustment depending on your configuration. Data files and their locations are listed in the download.yaml file, which is used to download all of the data sources before the transform. The transform.yaml file and the Python file transform.py contain the configuration and transformation code, respectively.

For more information, see the Koza documentation and kghub-downloader.

Dependencies are listed in the pyproject.toml file. This project uses pytest for testing; the tests live in the tests directory and exercise the functionality of the transform.
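As a rough illustration of the shape such a test might take, here is a minimal pytest-style sketch. The `keep_row` function is a hypothetical stand-in defined inline so the example is self-contained; in a real test it would be replaced by an import from the project's transform code. The column names are assumptions as well.

```python
# Hypothetical test module, e.g. a file named test_example.py in the tests directory.


def keep_row(row: dict) -> bool:
    """Stand-in for a filtering rule; not this project's actual transform logic."""
    # "gene_id" and "disease_id" are hypothetical column names.
    return bool(row.get("gene_id")) and bool(row.get("disease_id"))


def test_complete_row_is_kept():
    # A row with both identifiers should pass the filter.
    assert keep_row({"gene_id": "HGNC:0000", "disease_id": "DOID:0000"})


def test_row_without_disease_is_dropped():
    # A row with an empty disease identifier should be filtered out.
    assert not keep_row({"gene_id": "HGNC:0000", "disease_id": ""})
```

pytest collects functions whose names start with `test_`, so tests written this way are picked up by `make test` without extra registration.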

Documentation

The documentation for this ingest is in this README.md file and additional documentation is in the docs directory.

Note: After the GitHub Actions for deploying documentation runs, the documentation will be automatically deployed to GitHub Pages.

GitHub Actions

This project is set up with several GitHub Actions workflows. You should not need to modify these workflows unless you want to change the behavior. The workflows are located in the .github/workflows directory:

  • test.yaml: Run the pytest suite.
  • create-release.yaml: Create a new release once a week, or manually.
  • deploy-docs.yaml: Deploy the documentation to GitHub Pages (on pushes to main).
  • update-docs.yaml: After a release, update the documentation with node/edge reports.

Installation

cd alliance-disease-association-ingest
make install
# or
uv sync --dev

Note that the make install command is just a convenience wrapper around uv sync --dev.

Once installed, you can check that everything is working as expected:

# Run the pytest suite
make test
# Download the data and run the Koza transform
make download
make run

Usage

This project is set up with a Makefile for common tasks.
To see available options:

make help

Download and Transform

Download the data for the alliance_disease_association_ingest transform:

uv run alliance_disease_association_ingest download

To run the Koza transform for alliance-disease-association-ingest:

uv run alliance_disease_association_ingest transform

To see available options:

uv run alliance_disease_association_ingest download --help
# or
uv run alliance_disease_association_ingest transform --help

Testing

To run the test suite:

make test

This project was generated using monarch-initiative/cookiecutter-monarch-ingest.
Keep this project up to date using cruft by occasionally running in the project directory:

cruft update

For more information, see the cruft documentation.

About

Biolink Model compliant KGX transformation of data from the Alliance of Genome Resources
