novaice

Chemical perturbation modeling in 24 hours.

Important

This model was developed during the Nucleate Hackathon 2025, Munich and does not represent a serious scientific project.

Getting started

Have a look at these overview slides.

Installation

You need to have Python 3.11 or newer installed on your system. If you don't have Python installed, we recommend installing uv.

You can install novaice as follows:

Install the latest development version:

```shell
pip install git+https://github.com/lucas-diedrich/novaice.git
```

Usage

novaice is a simple model that predicts gene expression across chemical perturbation conditions. It assumes that each observation is a (drug $d_i$, gene expression $X_i$) pair, and the task is to predict the gene expression from a vector representation of the drug. We implement an MLP that predicts the parameters ($\mu, \sigma$) of a normal distribution describing the log1p-normalized RNA-seq data.
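
To make this setup concrete, here is a minimal pure-Python sketch of an MLP with a Gaussian output head and the negative log-likelihood such a model would be trained on. It is not the novaice implementation (which is built on scvi-tools); the layer sizes, the ReLU activation, the softplus used to keep $\sigma$ positive, and the names `mlp_gaussian_head` and `gaussian_nll` are assumptions made for illustration.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + e^x), always > 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def linear(weights, bias, inputs):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def mlp_gaussian_head(drug_embedding, params):
    """Hypothetical one-hidden-layer MLP mapping a drug embedding to per-gene (mu, sigma)."""
    pre_activation = linear(params["w_hidden"], params["b_hidden"], drug_embedding)
    hidden = [max(0.0, h) for h in pre_activation]  # ReLU
    mu = linear(params["w_mu"], params["b_mu"], hidden)
    # softplus keeps the standard deviation positive; the 1e-6 floor avoids sigma == 0
    raw_sigma = linear(params["w_sigma"], params["b_sigma"], hidden)
    sigma = [softplus(s) + 1e-6 for s in raw_sigma]
    return mu, sigma


def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under independent N(mu_g, sigma_g^2), summed over genes g."""
    return sum(
        0.5 * math.log(2 * math.pi) + math.log(s) + 0.5 * ((xg - m) / s) ** 2
        for xg, m, s in zip(x, mu, sigma)
    )


# Usage with random weights: 4-dim drug embedding, 8 hidden units, 3 genes
rng = random.Random(0)
emb_dim, n_hidden, n_genes = 4, 8, 3


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {
    "w_hidden": rand_matrix(n_hidden, emb_dim), "b_hidden": [0.0] * n_hidden,
    "w_mu": rand_matrix(n_genes, n_hidden), "b_mu": [0.0] * n_genes,
    "w_sigma": rand_matrix(n_genes, n_hidden), "b_sigma": [0.0] * n_genes,
}
drug_embedding = [0.2, -1.0, 0.5, 0.3]
x = [math.log1p(c) for c in (0.0, 3.0, 10.0)]  # toy log1p-transformed values for 3 genes
mu, sigma = mlp_gaussian_head(drug_embedding, params)
loss = gaussian_nll(x, mu, sigma)
```

Minimizing this loss over many (drug, expression) pairs fits both the mean and the per-gene uncertainty at once, which is what makes the calibration check below meaningful.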

The .pp module implements several methods to embed chemical compounds from their SMILES strings.
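
As a toy illustration of what a drug embedding is (not one of the methods in `.pp`), the sketch below turns a SMILES string into a fixed-length vector by hashing its character bigrams into bins. It is far less informative than MoLFormer embeddings or RDKit fingerprints; `hashed_ngram_embedding` is a hypothetical name.

```python
import hashlib


def hashed_ngram_embedding(smiles: str, n: int = 2, dim: int = 64) -> list[float]:
    """Toy featurizer: count character n-grams of a SMILES string into `dim` hashed bins."""
    vec = [0.0] * dim
    for i in range(len(smiles) - n + 1):
        ngram = smiles[i : i + n]
        # md5 is stable across runs, unlike Python's salted built-in hash() for strings
        bucket = int(hashlib.md5(ngram.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


# One fixed-length vector per drug; each observation's row in adata.obsm["drug_embedding"]
# would hold the vector of the drug applied in that observation
smiles = {"aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O", "ethanol": "CCO"}
embeddings = {name: hashed_ngram_embedding(s) for name, s in smiles.items()}
```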

Evaluation is based on the feature-wise $R^2$ between the maximum likelihood estimate of gene abundance and the measured data. We also assess how well calibrated the model's predicted uncertainty is.
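
Both metrics can be sketched in a few lines. Because the mean $\mu$ of a normal distribution is also its mode, $\mu$ serves as the maximum likelihood point estimate. The sketch assumes that $R^2$ is the coefficient of determination $1 - SS_{res}/SS_{tot}$ computed per gene (rather than a squared correlation), and it measures calibration as the empirical coverage of central prediction intervals; `featurewise_r2` and `interval_coverage` are hypothetical helper names.

```python
from statistics import NormalDist


def featurewise_r2(y_true, y_pred):
    """Coefficient of determination per gene (column); rows are observations."""
    scores = []
    for g in range(len(y_true[0])):
        obs = [row[g] for row in y_true]
        pred = [row[g] for row in y_pred]
        mean_obs = sum(obs) / len(obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        # R^2 is undefined for a gene with constant measured values
        scores.append(1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"))
    return scores


def interval_coverage(y_true, mu, sigma, level=0.9):
    """Fraction of measurements inside the central `level` interval of N(mu, sigma^2).

    For a well-calibrated model this fraction should be close to `level`.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.645 for level=0.9
    hits = total = 0
    for obs_row, mu_row, sd_row in zip(y_true, mu, sigma):
        for o, m, s in zip(obs_row, mu_row, sd_row):
            hits += int(abs(o - m) <= z * s)
            total += 1
    return hits / total


# Usage on toy data: 3 observations x 2 genes
measured = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
predicted_mu = [[1.1, 2.2], [1.9, 3.7], [3.2, 6.1]]
predicted_sigma = [[0.3, 0.4]] * 3
r2 = featurewise_r2(measured, predicted_mu)
coverage = interval_coverage(measured, predicted_mu, predicted_sigma, level=0.9)
```

Coverage well below `level` indicates overconfident uncertainty estimates; coverage well above it indicates underconfident ones.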

Run your model

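```python
# Register the AnnData object; drug embeddings are read from adata.obsm["drug_embedding"]
ChemPertMLPModel.setup_anndata(adata, drug_embedding_key="drug_embedding")
model = ChemPertMLPModel(adata)

# Train for up to 50 epochs
model.train(max_epochs=50)
```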

You can inspect the training run with TensorBoard:

```shell
tensorboard --logdir=logs
```

To predict gene expression for unseen data, pass a new AnnData object that stores the SMILES embeddings in an .obsm slot under the same key as the training data:

```python
# Predict gene expression for the unseen observations
predictions = model.predict_gene_expression(adata=adata_test)
```

Release notes

See our final presentation for details on the model's structure, its performance, and its potential impact.

References

Built with scvi-tools

Gayoso, A., Lopez, R., Xing, G. et al. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol 40, 163–166 (2022). https://doi.org/10.1038/s41587-021-01206-w

Leverages MoLFormer embeddings, RDKit, and scverse libraries.
