ALIGNED: Adaptive aLignment for Inconsistent Genetic kNowledgE and Data

Official implementation of ALIGNED from the paper:

Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
Yuanfang Xiang, Lun Ai
arXiv preprint arXiv:2510.00512, 2025
Accepted by ICLR 2026 | [arXiv](https://arxiv.org/abs/2510.00512)

Overview

ALIGNED is a neuro-symbolic framework for predicting genetic perturbation responses that adaptively aligns data-driven learning with biological knowledge. Built on the Abductive Learning (ABL) paradigm, ALIGNED:

Handles inconsistencies between data and knowledge bases by balancing two sources that are both imperfect
Performs systematic knowledge refinement to improve biological networks and enables the evolution of domain knowledge bases
Achieves state-of-the-art performance while substantially improving biological interpretability

Installation

Requirements

Python ≥ 3.8
CUDA-compatible GPU (recommended)

Setup

# Clone the repository
git clone https://github.com/yfxiang0112/Aligned.git
cd Aligned

# Create conda environment
conda env create -f environment.yml
conda activate aligned

# Install the package
pip install -e .
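
After installation, it can help to confirm which device string to pass to the experiment scripts. The snippet below is a minimal sketch, assuming the environment provides PyTorch (suggested by the 'cuda:0' device strings used in this README, but not stated here) and assuming the scripts also accept 'cpu'. The helper name pick_device is hypothetical and not part of the package.

```python
import importlib.util


def pick_device():
    """Return 'cuda:0' when PyTorch reports a usable GPU, otherwise 'cpu'.

    Hypothetical helper: assumes the project runs on PyTorch and that
    'cpu' is an acceptable fallback for the --device argument.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints cpu on a machine with a GPU, the CUDA build of the deep learning backend is probably missing from the environment.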

Reproducibility

Section 4.1: Perturbation Prediction on Benchmark Datasets

Run experiments for all three human datasets using the commands below.

# Norman dataset (non-random split)
python experiments/ex1_bench/run_benchmark.py \
    --data_name='norman' \
    --model_save_name='norman_gnn' \
    --model_type='GNN' \
    --device='cuda:0' \
    --seed=42 \
    --random_split=False

# Dixit dataset (random split)
python experiments/ex1_bench/run_benchmark.py \
    --data_name='dixit' \
    --model_save_name='dixit_gnn' \
    --model_type='GNN' \
    --device='cuda:0' \
    --seed=42 \
    --random_split=True

# Adamson dataset (random split)
python experiments/ex1_bench/run_benchmark.py \
    --data_name='adamson' \
    --model_save_name='adamson_gnn' \
    --model_type='GNN' \
    --device='cuda:0' \
    --seed=42 \
    --random_split=True

Arguments:

--data_name: Dataset selection from ['norman', 'dixit', 'adamson']
--model_save_name: File name to save trained models in ./models
--model_type: Neural component architecture: ['GNN', 'MLP']
--device: Device to run on, given as a device string (e.g., 'cuda:0')
--seed: Random seed for reproducibility
--random_split: Use random test split (see paper Section 4.1 for details)
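
The three benchmark commands differ only in the dataset name and the split setting, so they can be generated from a small table. The following is a sketch rather than part of the repository: benchmark_command and SPLITS are hypothetical names, and the argument values are copied from the commands above.

```python
import shlex
import sys

# Dataset -> random_split value, copied from the three commands above.
SPLITS = {"norman": False, "dixit": True, "adamson": True}


def benchmark_command(data_name, model_type="GNN", device="cuda:0", seed=42):
    """Build the argument list for one benchmark run (hypothetical helper)."""
    if data_name not in SPLITS:
        raise ValueError(f"unknown dataset: {data_name!r}")
    return [
        sys.executable,
        "experiments/ex1_bench/run_benchmark.py",
        f"--data_name={data_name}",
        f"--model_save_name={data_name}_{model_type.lower()}",
        f"--model_type={model_type}",
        f"--device={device}",
        f"--seed={seed}",
        f"--random_split={SPLITS[data_name]}",
    ]


# Print the commands for inspection; nothing is executed here.
for name in SPLITS:
    print(shlex.join(benchmark_command(name)))
```

To launch the runs instead of printing them, pass each list to subprocess.run(cmd, check=True) from the repository root.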

Results, figures, and trained models are in results/ex1_aligned/; baseline comparison results are in results/ex1_baselines/.

Baseline comparisons: For state-of-the-art baseline methods in Section 4.1, we use baseline method implementations and results from:

Constantin Ahlmann-Eltze, Wolfgang Huber and Simon Anders. "Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines" Nature Methods (2025). DOI: 10.1038/s41592-025-02772-6

Code: const-ae/linear_perturbation_prediction-Paper

We thank the authors for making their implementations publicly available.

Section 4.2: Knowledge Refinement of Gene Regulatory Networks

# Network refinement with incompleteness injection
python experiments/ex2_refinement/run_refinement.py

# Evaluate with gene set recovery
python experiments/ex2_refinement/eval_gene_set_recovery.py

Results in results/ex2_refinement/.

Section 4.3: Perturbation Prediction on Bacterial Genome

python experiments/ex3_ecoli/run_ecoli.py \
    --data_name='ncbi-sra' \
    --model_save_name='ecoli_gnn' \
    --model_type='GNN' \
    --device='cuda:0' \
    --seed=42

Results and trained models in results/ex3_ecoli/.

Section 4.4: Ablation Studies

Ablation results are located in results/ex4_ablations/.

Project Structure

Aligned/
├── aligned/                  # Core package (renamed from egoal)
│   ├── abl.py                 # Abductive learning main loop
│   ├── learner_adap.py        # Neural learner with adaptor (renamed from refl)
│   ├── reasoner.py            # Symbolic reasoner (knowledge base)
│   └── utils.py               # Utility functions
│
├── experiments/              # Experimental scripts
│   ├── ex1_bench/             # Section 4.1: Benchmark experiments
│   ├── ex2_refinement/        # Section 4.2: Knowledge refinement
│   ├── ex3_ecoli/             # Section 4.3: E. coli experiments
│   └── ex4_ablation/          # Section 4.4: Ablation studies
│
├── dataset/                  # Data files
│   ├── human/                 # Human benchmark datasets
│   ├── ncbi-sra/              # E. coli RNA-seq data
│   └── precise1k/             # E. coli PRECISE-1K data
│
├── rules/                    # Knowledge bases
│   ├── ecoli/                 # E. coli regulatory knowledge base
│   └── human/                 # Human regulatory knowledge base
│
├── results/                  # Experiment results and figure generation scripts
│   ├── ex1_aligned/           # Main benchmark results
│   ├── ex1_baselines/         # Baseline comparisons
│   ├── ex2_refinement/        # Refinement experiment results
│   ├── fig1_incons/           # Inconsistency visualization
│   ├── fig3_radar/            # Radar plots
│   ├── fig4_line/             # Performance curves
│   └── fig5_refine/           # Refinement results
│
├── models/                    # Trained models
├── log/                       # Training logs
└── scripts/                   # Utility and preprocessing scripts
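
A quick way to compare a local checkout against the layout above is to list which top-level directories are absent. This is a minimal sketch with hypothetical names (EXPECTED_DIRS, missing_dirs). Some directories, such as models/ and log/, may only appear after the first training run, so a missing entry is not necessarily an error.

```python
from pathlib import Path

# Top-level directories taken from the tree above.
EXPECTED_DIRS = [
    "aligned",
    "experiments",
    "dataset",
    "rules",
    "results",
    "models",
    "log",
    "scripts",
]


def missing_dirs(repo_root="."):
    """Return the expected top-level directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


print(missing_dirs("."))
```

Run it from the repository root; an empty list means every directory in the tree is present.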

Citation

If you use ALIGNED in your research, please cite:

@article{xiang2024aligned,
  title={Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction},
  author={Xiang, Yuanfang and Ai, Lun},
  journal={arXiv preprint arXiv:2510.00512},
  year={2025}
}

License

This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See LICENSE file for details.

Contact

For questions or issues, please open an issue on GitHub or find my contact information on my GitHub profile.
