BLenDeR: Blended Text Embeddings and Diffusion Residuals for Intra-Class Image Synthesis in Deep Metric Learning
Jan Niklas Kolf 2,3,∗ Ozan Tezcan 1 Justin Theiss 1 Hyung Jun Kim 1 Wentao Bao 1 Bhargav Bhushanam 1 Khushi Gupta 1 Arun Kejariwal 1 Naser Damer 2,3 Fadi Boutros 2
1 Meta Reality Labs 2 Fraunhofer IGD 3 Technical University of Darmstadt
*Work done while interning at Meta Reality Labs
This repository provides code for BLenDeR, an inference-time sampling method for personalized diffusion models. It combines text embedding interpolation with set-theoretic operations on denoising residuals (union, intersection, and difference) to synthesize class-specific concepts with novel attributes.
In the paper, BLenDeR is used to increase intra-class diversity and thereby improve the downstream performance of deep metric learning models.
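To make the idea of text embedding interpolation concrete, here is a minimal sketch that blends two embedding vectors linearly. It is an illustration only: the function name `lerp_embeddings`, the weight `alpha`, and the choice of plain linear interpolation are assumptions, not the repository's implementation, and the residual set operations are not covered here.

```python
from typing import List


def lerp_embeddings(a: List[float], b: List[float], alpha: float) -> List[float]:
    """Linearly blend two embedding vectors: (1 - alpha) * a + alpha * b.

    Hypothetical helper for illustration; the blending used by BLenDeR
    may differ from plain linear interpolation.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


# Toy 3-dimensional vectors standing in for real text embeddings.
class_embedding = [1.0, 0.0, 2.0]
attribute_embedding = [0.0, 1.0, 4.0]

# alpha = 0.25 keeps the result closer to the class embedding.
blended = lerp_embeddings(class_embedding, attribute_embedding, 0.25)
print(blended)
```

With `alpha = 0` the result equals the first embedding and with `alpha = 1` the second, so the weight controls how strongly the second prompt's attributes are mixed in.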
We use conda for Python package management.
Run the provided env_setup.sh to set up the environment:
bash env_setup.sh
conda activate blender
We provide pretrained LoRA weights and Textual Inversion embeddings for Stable Diffusion 1.5 to synthesize bird images.
Our example script generate_with_blendr.py shows how to load the pretrained weights and Textual Inversion embeddings.
The required text token to use in a prompt can be obtained from learned_embeds-steps-20000.bin:
import torch
# Load the Textual Inversion embeddings; the dictionary keys are the token strings.
ti_embeddings = torch.load("learned_embeds-steps-20000.bin")
ti_token_strs = list(ti_embeddings["learned_embeds_dict"].keys())
print(ti_token_strs)  # e.g. ['<bird_001>', '<bird_002>', ...]
The base prompt used to synthesize an image is "a photo of a <TI Token Str> bird."
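As a small illustration of how the token strings slot into the base prompt, the sketch below builds one prompt per token. The helper `build_prompts` and the example token list are hypothetical; in practice the tokens would come from `learned_embeds-steps-20000.bin` as shown above.

```python
from typing import Dict, List

# Base prompt from this README, with a placeholder for the TI token string.
PROMPT_TEMPLATE = "a photo of a {token} bird."


def build_prompts(token_strs: List[str]) -> Dict[str, str]:
    """Map each Textual Inversion token string to its base prompt."""
    return {tok: PROMPT_TEMPLATE.format(token=tok) for tok in token_strs}


# Example tokens only; real ones are read from the embeddings file.
example_tokens = ["<bird_001>", "<bird_002>"]
prompts = build_prompts(example_tokens)
for token, prompt in prompts.items():
    print(token, "->", prompt)
```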
The expected structure of the dataset is <dataset-folder>/{train, test}/<class-folder>, where each <class-folder> contains a metadata.jsonl file (required keys: file_name, class_name, class_label) and the corresponding image files, following the Hugging Face dataset format.
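Before running preprocessing, it can help to check that every metadata.jsonl row carries the required keys. The following sketch writes a toy class folder to a temporary directory and validates it; the function `find_invalid_rows`, the folder name `class_a`, and the file names are made up for illustration and are not part of the repository.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Required keys as listed in this README.
REQUIRED_KEYS = {"file_name", "class_name", "class_label"}


def find_invalid_rows(metadata_path: Path) -> List[int]:
    """Return 1-based line numbers of rows missing a required key."""
    invalid = []
    with metadata_path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            row = json.loads(line)
            if not REQUIRED_KEYS.issubset(row):
                invalid.append(line_no)
    return invalid


with tempfile.TemporaryDirectory() as tmp:
    # Mimic <dataset-folder>/train/<class-folder>/metadata.jsonl
    class_dir = Path(tmp) / "train" / "class_a"
    class_dir.mkdir(parents=True)
    rows = [
        {"file_name": "img_0001.jpg", "class_name": "class_a", "class_label": 0},
        {"file_name": "img_0002.jpg", "class_name": "class_a"},  # class_label missing
    ]
    metadata_path = class_dir / "metadata.jsonl"
    metadata_path.write_text(
        "\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8"
    )
    invalid_rows = find_invalid_rows(metadata_path)

print(invalid_rows)
```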
First, extract image annotations using utils/preprocessing/get_llava_detailed_description.py.
Then, calculate the pairwise image annotation similarities, which are required for data generation, using utils/preprocessing/compute_llava_prompt_similarities.py.
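This README does not describe how the similarity script scores annotation pairs. As a rough illustration of what a pairwise similarity matrix looks like, the sketch below assumes each annotation has already been turned into a numeric vector and uses cosine similarity; both the vector representation and the choice of cosine similarity are assumptions, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the full N x N matrix of pairwise cosine similarities."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors standing in for embedded image annotations.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = pairwise_similarities(vectors)
for row in sim:
    print([round(value, 3) for value in row])
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle would need to be stored for a large dataset.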
After setup, run generate_with_blendr.py to generate data with either the union or the difference residual set operation.
This project is licensed under CC-BY-NC 4.0.
If you find this repository useful, please consider giving a ⭐ and citing:
@article{kolf2026blender,
title={BLenDeR: Blended Text Embeddings and Diffusion Residuals for Intra-Class Image Synthesis in Deep Metric Learning},
author={Kolf, Jan Niklas and Tezcan, Ozan and Theiss, Justin and Kim, Hyung Jun and Bao, Wentao and Bhushanam, Bhargav and Gupta, Khushi and Kejariwal, Arun and Damer, Naser and Boutros, Fadi},
journal={arXiv preprint arXiv:2601.20246},
year={2026}
}