Chemical perturbation modeling in 24 hours.
Important
This model was developed during the Nucleate Hackathon 2025, Munich and does not represent a serious scientific project.
Have a look at these overview slides
You need to have Python 3.11 or newer installed on your system. If you don't have Python installed, we recommend installing uv.
There are several alternative options to install novaice:
- Install the latest development version:
pip install git+https://github.com/lucas-diedrich/novaice.git
novaice is a simple model to predict gene expression across chemical perturbation conditions. It assumes that each observation is encoded by a (drug log1p normalized RNAseq data.
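The model expects log1p normalized expression values. The following is a minimal sketch of one common way to produce them from raw counts, using only NumPy. The library-size scaling step, the `target_sum` of 10,000, and the helper name `log1p_normalize` are assumptions for illustration and are not taken from novaice itself.

```python
import numpy as np


def log1p_normalize(counts, target_sum=1e4):
    """Scale each observation to `target_sum` total counts, then apply log1p.

    Hypothetical helper: the scaling convention is an assumption, not novaice's API.
    """
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against division by zero for observations without any counts
    totals = np.where(totals == 0, 1.0, totals)
    return np.log1p(counts / totals * target_sum)


# Two observations (rows) with three genes (columns) of raw counts
raw_counts = np.array([[0, 5, 15], [10, 0, 30]])
normalized = log1p_normalize(raw_counts)
```

If the data has already been normalized upstream, this step should be skipped, since applying log1p twice would distort the values.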
We implement various methods to embed chemical compounds from their SMILES strings in the .pp module.
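To illustrate what embedding a compound from its SMILES string can look like, here is a minimal sketch that computes a Morgan fingerprint directly with RDKit. It does not use the .pp module, whose functions are not shown here, and it is not the MolFormer embedding. The function name `morgan_embedding` and the chosen radius and vector length are assumptions.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator


def morgan_embedding(smiles, radius=2, n_bits=2048):
    """Return a binary Morgan fingerprint for a SMILES string as a float32 vector.

    Hypothetical helper for illustration; not part of novaice.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the SMILES string cannot be parsed
        raise ValueError(f"Could not parse SMILES string: {smiles!r}")
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return generator.GetFingerprintAsNumPy(mol).astype(np.float32)


# Ethanol as a small example compound
embedding = morgan_embedding("CCO")
```

Stacking one such vector per observation gives a matrix that could be stored in an .obsm slot of the AnnData object.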
Evaluation is based on the featurewise
# Setup and train model
ChemPertMLPModel.setup_anndata(adata, drug_embedding_key="drug_embedding")
model = ChemPertMLPModel(adata)
# Train
model.train(max_epochs=50)
You can inspect the training run with TensorBoard:
tensorboard --logdir=logs
To predict gene expression in unseen data, pass a new AnnData object with the SMILES embeddings in an .obsm slot with the same key as the training data.
# Predict gene expression
predictions = model.predict_gene_expression(adata=adata_test)
See our final presentation on model structure, model performance, and potential impact of the model.
Built with scvi-tools
Gayoso, A., Lopez, R., Xing, G. et al. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol 40, 163–166 (2022). https://doi.org/10.1038/s41587-021-01206-w
Leverages molformer embeddings, RDKit, and scverse libraries