Official implementation of ALIGNED from the paper:
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
Accepted by ICLR 2026 | [arXiv](https://arxiv.org/abs/2510.00512)
Yuanfang Xiang, Lun Ai
arXiv preprint arXiv:2510.00512, 2025
ALIGNED is a neuro-symbolic framework for predicting genetic perturbation responses that adaptively aligns data-driven learning with biological knowledge. Built on the Abductive Learning (ABL) paradigm, ALIGNED:
- Handles inconsistencies between data and knowledge bases by balancing the trade-off between two simultaneously imperfect sources
- Performs systematic knowledge refinement to improve biological networks, enabling domain knowledge bases to evolve
- Achieves state-of-the-art performance while substantially improving biological interpretability
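To give a feel for the abductive learning idea that ALIGNED builds on, here is a toy sketch of a single abduction step: a learner's predicted expression changes are revised, with as few flips as possible, so that they satisfy a small rule set. The rules, the gene names, and the functions `consistent` and `abduce` are made up for illustration and do not come from the repository. This is not ALIGNED's algorithm, which additionally adapts how much to trust the data versus the knowledge base.

```python
from itertools import product

# Toy knowledge base (hypothetical): (regulator, target, sign),
# where sign=+1 means activation and sign=-1 means repression.
RULES = [("A", "B", +1), ("B", "C", -1)]
GENES = ["A", "B", "C"]


def consistent(labels, rules=RULES):
    """True if every target's change equals sign * its regulator's change."""
    return all(labels[t] == s * labels[r] for r, t, s in rules)


def abduce(predicted, fixed, genes=GENES, rules=RULES):
    """Return the rule-consistent labelling closest to `predicted`.

    predicted: gene -> +1 (up) or -1 (down), as output by the learner.
    fixed: genes whose label is known (e.g. the perturbed gene) and must not change.
    Returns (labelling, number_of_flips), or (None, None) if nothing is consistent.
    """
    best, best_cost = None, None
    for values in product([-1, +1], repeat=len(genes)):
        cand = dict(zip(genes, values))
        if any(cand[g] != predicted[g] for g in fixed):
            continue  # known labels must be preserved
        if not consistent(cand, rules):
            continue  # candidate violates the knowledge base
        cost = sum(cand[g] != predicted[g] for g in genes)  # Hamming distance
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


# Knock-down of A (label -1); the learner wrongly predicts that C goes down.
predicted = {"A": -1, "B": -1, "C": -1}
revised, flips = abduce(predicted, fixed={"A"})
print(revised, flips)
```

In a full abductive learning loop, the revised labels would be fed back as training targets for the learner. Exhaustive enumeration is only feasible for a handful of genes and is used here purely to keep the sketch short.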
- Python ≥ 3.8
- CUDA-compatible GPU (recommended)
# Clone the repository
git clone https://github.com/yfxiang0112/Aligned.git
cd Aligned
# Create conda environment
conda env create -f environment.yml
conda activate aligned
# Install the package
pip install -e .

Run experiments for all three human datasets using the commands below.
# Norman dataset (non-random split)
python experiments/ex1_bench/run_benchmark.py \
--data_name='norman' \
--model_save_name='norman_gnn' \
--model_type='GNN' \
--device='cuda:0' \
--seed=42 \
--random_split=False
# Dixit dataset (random split)
python experiments/ex1_bench/run_benchmark.py \
--data_name='dixit' \
--model_save_name='dixit_gnn' \
--model_type='GNN' \
--device='cuda:0' \
--seed=42 \
--random_split=True
# Adamson dataset (random split)
python experiments/ex1_bench/run_benchmark.py \
--data_name='adamson' \
--model_save_name='adamson_gnn' \
--model_type='GNN' \
--device='cuda:0' \
--seed=42 \
--random_split=True

Arguments:
- --data_name: Dataset to use, one of ['norman', 'dixit', 'adamson']
- --model_save_name: File name under ./models for saving the trained model
- --model_type: Neural component architecture, one of ['GNN', 'MLP']
- --device: Device string (e.g., 'cuda:0')
- --seed: Random seed for reproducibility
- --random_split: Use a random test split (see paper Section 4.1 for details)
Results, figures, and trained models are in results/ex1_aligned/; baseline comparison results are in results/ex1_baselines/.
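The three benchmark commands above differ only in the dataset name, the save name, and the split flag. As a convenience, the following is a minimal sketch of a Python driver that assembles the same command lines. The helper `build_benchmark_cmd` and the `DATASETS` table are hypothetical names introduced here, not part of the repository. By default the script only prints the commands; it executes them only when called with `--run`.

```python
import subprocess
import sys

# Dataset name -> whether the commands above use a random test split for it.
DATASETS = {"norman": False, "dixit": True, "adamson": True}


def build_benchmark_cmd(data_name, random_split, model_type="GNN",
                        device="cuda:0", seed=42):
    """Return the argv list for one run of run_benchmark.py (hypothetical helper)."""
    return [
        sys.executable, "experiments/ex1_bench/run_benchmark.py",
        f"--data_name={data_name}",
        f"--model_save_name={data_name}_{model_type.lower()}",
        f"--model_type={model_type}",
        f"--device={device}",
        f"--seed={seed}",
        f"--random_split={random_split}",
    ]


def main(run=False):
    """Print the three benchmark commands and optionally execute them in order."""
    cmds = [build_benchmark_cmd(name, split) for name, split in DATASETS.items()]
    for cmd in cmds:
        print(" ".join(cmd))
        if run:
            # check=True stops at the first failing run instead of continuing.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    main(run="--run" in sys.argv[1:])
```

Run it from the repository root so that the relative script path resolves. Passing `model_type="MLP"` to `build_benchmark_cmd` switches the neural component and changes the save name to match.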
Baseline comparisons: For state-of-the-art baseline methods in Section 4.1, we use baseline method implementations and results from:
Constantin Ahlmann-Eltze, Wolfgang Huber and Simon Anders. "Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines" Nature Methods (2025). DOI: 10.1038/s41592-025-02772-6
We thank the authors for making their implementations publicly available.
# Network refinement with incompleteness injection
python experiments/ex2_refinement/run_refinement.py
# Evaluate with gene set recovery
python experiments/ex2_refinement/eval_gene_set_recovery.py

Results are in results/ex2_refinement/.
python experiments/ex3_ecoli/run_ecoli.py \
--data_name='ncbi-sra' \
--model_save_name='ecoli_gnn' \
--model_type='GNN' \
--device='cuda:0' \
--seed=42

Results and trained models are in results/ex3_ecoli/.
Ablation results are located in results/ex4_ablations/.
Aligned/
├── aligned/ # Core package (renamed from egoal)
│ ├── abl.py # Abductive learning main loop
│ ├── learner_adap.py # Neural learner with adaptor (renamed from refl)
│ ├── reasoner.py # Symbolic reasoner (knowledge base)
│ └── utils.py # Utility functions
│
├── experiments/ # Experimental scripts
│ ├── ex1_bench/ # Section 4.1: Benchmark experiments
│ ├── ex2_refinement/ # Section 4.2: Knowledge refinement
│ ├── ex3_ecoli/ # Section 4.3: E. coli experiments
│ └── ex4_ablation/ # Section 4.4: Ablation studies
│
├── dataset/ # Data files
│ ├── human/ # Human benchmark datasets
│ ├── ncbi-sra/ # E. coli RNA-seq data
│ └── precise1k/ # E. coli PRECISE-1K data
│
├── rules/ # Knowledge bases
│ ├── ecoli/ # E. coli regulatory knowledge base
│ └── human/ # Human regulatory knowledge base
│
├── results/ # Experiment results and figure generation scripts
│ ├── ex1_aligned/ # Main benchmark results
│ ├── ex1_baselines/ # Baseline comparisons
│ ├── ex2_refinement/ # Refinement experiment results
│ ├── fig1_incons/ # Inconsistency visualization
│ ├── fig3_radar/ # Radar plots
│ ├── fig4_line/ # Performance curves
│ └── fig5_refine/ # Refinement results
│
├── models/ # Trained models
├── log/ # Training logs
└── scripts/ # Utility and preprocessing scripts
If you use ALIGNED in your research, please cite:
@article{xiang2024aligned,
title={Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction},
author={Xiang, Yuanfang and Ai, Lun},
journal={arXiv preprint arXiv:2510.00512},
year={2025}
}

This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for details.
For questions or issues, please open an issue on GitHub or find my contact information on my GitHub profile.