
We propose a novel modular framework that learns to dynamically mix low-rank adapters (LoRAs) to improve visual analogy learning, enabling flexible and generalizable image edits based on example transformations.


LoRWeB: Spanning the Visual Analogy Space with a Weight Basis of LoRAs

arXiv Project Website Evaluation Dataset (Coming Soon) Model (Coming Soon)

👥 Authors

Hila Manor<sup>1,2</sup>,  Rinon Gal<sup>2</sup>,  Haggai Maron<sup>1,2</sup>,  Tomer Michaeli<sup>1</sup>,  Gal Chechik<sup>2,3</sup>

<sup>1</sup>Technion - Israel Institute of Technology    <sup>2</sup>NVIDIA    <sup>3</sup>Bar-Ilan University


LoRWeB Teaser

Given a prompt and an image triplet ${a,a',b}$ that visually describe a desired transformation, LoRWeB dynamically constructs a single LoRA from a learnable basis of LoRA modules, and produces an editing result $b'$ that applies the same analogy to the new image.

📄 Abstract

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations difficult to articulate in words. Given a triplet ${a,a',b}$, the goal is to generate $b'$ such that $a$ : $a'$ :: $b$ : $b'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization capabilities. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives, informally, choosing a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules, to span the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation.
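
To make the mechanism concrete, the sketch below shows a linear layer with a learnable basis of LoRA modules whose low-rank updates are mixed per sample by encoder-predicted coefficients. This is an illustrative PyTorch sketch (a plain weighted sum of the basis updates, with hypothetical names and dimensions), not necessarily the exact composition rule used in this repository:

```python
import torch
import torch.nn as nn

class LoRABasisLinear(nn.Module):
    """Linear layer with a learnable basis of K LoRA modules.

    Illustrative sketch: a per-sample coefficient vector (predicted by a
    lightweight encoder from the {a, a', b} analogy inputs) mixes the
    basis updates into a single effective LoRA.
    """

    def __init__(self, in_features, out_features, num_loras=32, rank=4, alpha=4.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Basis of K low-rank updates: delta_W_k = B_k @ A_k
        self.A = nn.Parameter(torch.randn(num_loras, rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_loras, out_features, rank))
        self.scale = alpha / rank

    def forward(self, x, coeffs):
        # x: (batch, in_features); coeffs: (batch, K) mixing weights
        y = self.base(x)
        # Weighted sum of basis updates, applied to x in one einsum:
        # delta[b] = sum_k coeffs[b, k] * (B_k @ A_k) @ x[b]
        delta = torch.einsum("bk,kor,kri,bi->bo", coeffs, self.B, self.A, x)
        return y + self.scale * delta

# Stand-in for the encoder output over the analogy inputs:
layer = LoRABasisLinear(64, 64, num_loras=32, rank=4)
coeffs = torch.softmax(torch.randn(8, 32), dim=-1)
out = layer(torch.randn(8, 64), coeffs)  # (8, 64)
```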

📋 Table of Contents

- 🔨 Setup
- 🚀 Usage
  - 💻 Training
  - 📊 Training Data Format
  - 🎨 Inference
- ℹ️ Additional Information
- 📚 Citation
- 🙏 Acknowledgements

🔨 Setup

```bash
conda env create -f environment.yml
conda activate lorweb
```

🚀 Usage

💻 Training

Train a LoRWeB model on your visual analogy dataset:

```bash
python run.py config/your_config.yaml
```

You can override the main options with command-line arguments to the run.py script, e.g.:

```bash
python run.py LoRWeB_default_PROMPTS.yaml --name "lorweb_model" --linear 4 --linear_alpha 4 --loras_num 32 --lora_softmax true --query_mode "cat-aa'b"
```
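
For reference, a config in this format might look like the sketch below. The key names simply mirror the CLI overrides above, and the comments are our assumptions about their meaning; the actual schema of the shipped configs may differ:

```yaml
# Hypothetical config sketch; keys mirror the CLI overrides above.
name: lorweb_model
linear: 4               # assumed: LoRA rank
linear_alpha: 4         # assumed: LoRA alpha scaling
loras_num: 32           # assumed: number of LoRAs in the basis
lora_softmax: true      # assumed: softmax over the mixing weights
query_mode: "cat-aa'b"  # assumed: how the analogy inputs form the encoder query
```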

📊 Training Data Format

We trained on Relation252k. The training script expects two folders: control, containing images of the ${a,a',b}$ triplets, and target, containing the corresponding $b'$ images. Use preprocess_data.py to preprocess a pre-downloaded dataset.
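
For example, a preprocessed dataset might be laid out as follows (filenames are illustrative; matching names pair each control triplet with its target):

```text
dataset/
├── control/        # {a, a', b} triplet images (the analogy inputs)
│   ├── 000001.jpg
│   └── ...
└── target/         # corresponding b' images (the desired edit results)
    ├── 000001.jpg
    └── ...
```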

🎨 Inference

You can test our model checkpoint (to be released on HuggingFace) using inference.py:

```bash
python inference.py -w "output/your_model/your_model.safetensors" -c "output/your_model/config.yaml" -a "data/path_to_a_img.jpg" -t "data/path_to_atag_img.jpg" -b "data/path_to_b_img.jpg" -o "outputs/generated_btag_img_path.jpg"
```
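
To apply the script to many triplets, a small driver like the one below can loop over a folder. It uses only the flags shown above; the data/triplets/<id>/{a,atag,b}.jpg layout is a hypothetical example:

```python
import subprocess
from pathlib import Path

# Hypothetical batch driver around inference.py; assumes one folder per
# triplet containing a.jpg, atag.jpg and b.jpg (layout is illustrative).
WEIGHTS = "output/your_model/your_model.safetensors"
CONFIG = "output/your_model/config.yaml"

for triplet in sorted(Path("data/triplets").iterdir()):
    out_path = Path("outputs") / f"{triplet.name}_btag.jpg"
    subprocess.run([
        "python", "inference.py",
        "-w", WEIGHTS, "-c", CONFIG,
        "-a", str(triplet / "a.jpg"),
        "-t", str(triplet / "atag.jpg"),
        "-b", str(triplet / "b.jpg"),
        "-o", str(out_path),
    ], check=True)
```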

ℹ️ Additional Information

Our complementary custom evaluation set will be available on HuggingFace (coming soon).

📚 Citation

If you use this code in your research, please cite:

```bibtex
@article{manor2026lorweb,
    title={Spanning the Visual Analogy Space with a Weight Basis of LoRAs},
    author={Manor, Hila and Gal, Rinon and Maron, Haggai and Michaeli, Tomer and Chechik, Gal},
    journal={arXiv preprint arXiv:2602.15727},
    year={2026}
}
```

🙏 Acknowledgements

This project builds upon:


⭐ Star this repo if you find it useful! ⭐
