PepGlider

We present PepGlider, a continuous property regularization framework that enables direct control over the specific values of peptide properties. The method yields a structured latent space with superior disentanglement quality and smooth property gradients along each regularized dimension. In silico experiments demonstrate that PepGlider enables independent control of naturally correlated properties and supports both de novo generation and targeted optimization of existing peptides. Applied to antimicrobial peptide design, PepGlider generated candidates with the desired antibacterial activity profile and low toxicity. Unlike existing approaches, PepGlider provides precise control over continuous property distributions while maintaining generation quality, thus offering a generalizable solution for peptide design.
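To make the idea of regularizing a latent dimension against a property more concrete, here is a minimal sketch of an AR-VAE-style attribute regularization term. It is not taken from this repository: the function name, the `delta` argument, and the `use_signum` switch are hypothetical, and the continuous variant (comparing against raw, normalized property differences instead of their signs) is only an assumption about how a continuous regularizer could look.

```python
import math


def attribute_regularization(z_dim, attr, delta=1.0, use_signum=True):
    """Mean absolute error between latent and property distance matrices.

    z_dim: values of one regularized latent dimension for a batch.
    attr:  property values for the same batch (assumed normalized).
    delta: hypothetical sharpness factor applied inside tanh.
    use_signum: if True, compare against sign(a_i - a_j) as in AR-VAE;
                if False, compare against the raw difference (assumption).
    """
    n = len(z_dim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d_z = math.tanh(delta * (z_dim[i] - z_dim[j]))
            d_a = attr[i] - attr[j]
            if use_signum:
                d_a = (d_a > 0) - (d_a < 0)  # sign of the difference
            total += abs(d_z - d_a)
    return total / (n * n)


# Latent values ordered like the property give a low loss,
# latent values ordered the opposite way give a high loss.
aligned = attribute_regularization([-2.0, 0.0, 2.0], [0.1, 0.5, 0.9])
reversed_order = attribute_regularization([2.0, 0.0, -2.0], [0.1, 0.5, 0.9])
print(aligned < reversed_order)
```

The loss is small when the ordering of the latent values along the regularized dimension matches the ordering of the property values, which is what produces a monotonic property gradient along that dimension.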

Package requirements

All required packages and their versions are listed in the requirements.txt file in the repository. You can install them using:

pip install -r requirements.txt

Model training

You can train a model using the train.py file. All default parameters are defined in the params_setting.py file. The current configuration trains a PepGlider model with MIC E. coli, MIC S. aureus, and nontoxicity regularization. Feel free to change the configuration to train your own model! The table below describes the parameters:

Parameter	Default value	Description
`num_heads`	`4`	The number of attention heads in the transformer layers.
`num_layers`	`6`	The number of transformer layers in the model's encoder/decoder.
`layer_norm`	`True`	Flag specifying whether Layer Normalization should be used.
`latent_dim`	`56`	The dimension of the latent space of the VAE model.
`encoding`	`"add"`	The method of combining input embedding vectors with positional ones in the architecture.
`dropout`	`0.1`	The dropout coefficient used for regularization during training.
`batch_size`	`512`	The batch size used during model training.
`lr`	`0.001`	The learning rate for the optimizer.
`kl_beta_schedule`	`(0.00001, 0.1, 8000)`	The weight growth schedule for the KL-divergence term (e.g., start, target, steps to reach the target).
`train_size`	`None`	Optional limit on the size of the training set.
`epochs`	`10000`	The number of training epochs.
`iwae_samples`	`10`	The number of decoder training iterations used in the IWAE (Importance Weighted Autoencoder) model.
`model_name`	`os.getenv(...)`	The name of the model for saving files; retrieved from the `CLEARML_PROJECT_NAME` environment variable.
`use_clearml`	`False`	Flag specifying whether to use the ClearML platform for experiment tracking.
`task_name`	`os.getenv(...)`	The task name in ClearML; retrieved from the `CLEARML_TASK_NAME` environment variable.
`device`	`"cuda"`	The device for computation (`cuda` for GPU or `cpu`).
`deeper_eval_every`	`20`	How often (in epochs) to perform a more detailed evaluation.
`save_model_every`	`100`	How often (in epochs) to save the model state.
`ar_vae_flg`	`True`	Flag enabling AR-VAE or PepGlider model training.
`reg_dim`	`[3, 4, 5]`	The dimensions (indices) in the latent space that should be subjected to regularization.
`gamma_schedule`	`(0.00001, 20, 8000)`	The weight growth schedule for the $\gamma$ hyperparameter.
`gamma_multiplier`	`[1, 1, 1, 1, 1, 1]`	Multipliers applied to $\gamma$ for individual dimensions.
`factor_multiplier`	`[0.1, 0.1, 0.1, 0.6, 0.6, 0.6]`	Multipliers for the `factor` hyperparameter.
`factor_schedule`	`(1, 1, 8000)`	The weight growth schedule for factor regularization.
`scale_factor_flg`	`False`	Flag specifying whether to scale the `factor` hyperparameter.
`mic_flg`	`True`	Flag enabling the addition of MIC values to the training dataset.
`toxicity_flg`	`True`	Flag enabling the generation of non-toxicity scores for the training dataset.
`normalize_properties_flg`	`True`	Flag specifying whether to normalize the input data properties.
`signum_modification_of_dist_matrix_flg`	`False`	Flag enabling modification of the distance matrix with the `signum` function.
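The schedule parameters above (`kl_beta_schedule`, `gamma_schedule`, `factor_schedule`) are described as `(start, target, steps to reach the target)` tuples. The sketch below shows one way such a tuple could be turned into a per-step weight. It assumes a linear ramp that is held constant once the target is reached; the actual shape used in training_functions.py may differ, and the `linear_schedule` helper is hypothetical.

```python
def linear_schedule(step, start, target, steps_to_target):
    """Linearly ramp a weight from `start` to `target`, then hold it.

    Assumption: the ramp is linear in the training step. The repository
    may use a different shape (e.g. exponential or per-epoch updates).
    """
    if steps_to_target <= 0:
        return target
    progress = min(max(step / steps_to_target, 0.0), 1.0)
    return start + (target - start) * progress


# Default tuples copied from the parameter table above.
kl_beta_schedule = (0.00001, 0.1, 8000)
gamma_schedule = (0.00001, 20, 8000)

for step in (0, 4000, 8000, 12000):
    beta = linear_schedule(step, *kl_beta_schedule)
    gamma = linear_schedule(step, *gamma_schedule)
    print(f"step={step:>5}  beta={beta:.5f}  gamma={gamma:.5f}")
```

With the default `factor_schedule` of `(1, 1, 8000)`, start and target are equal, so under this reading the factor weight stays constant at 1 throughout training.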

Model evaluation

All final numeric results, tables and figures were produced using two notebooks:

figures_notebook_1.ipynb
figures_notebook_2.ipynb

Feel free to duplicate them and analyse the models in this repo! Model files ready for loading are provided in the first_working_models directory.
