
Task-conditioned LoRA + MoE#324

Open
ryspark wants to merge 33 commits into main from ryanp/lora

Conversation


@ryspark ryspark commented Aug 14, 2025

This PR adds two new pieces of functionality: LoRA and MoE adapters conditioned on task-specific embeddings. These are meant to be learned during finetuning.

LoRA adapters: a drop-in replacement for the attention output projection weights, conditioned on the downstream task and intended for use during finetuning. From this paper, it seems that finetuning only these projection weights recovers most of the quality of full finetuning.

  • Given embeddings E for each downstream task, the TaskLoRALinear layer computes a LoRA update (i.e., two matrices A: D x r and B: r x D) whose product is added to the original projection weights. Both A and B are computed via an MLP on top of E. This MLP is shared across tasks but separate per layer, and is learned directly during finetuning.
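A minimal PyTorch sketch of the idea above (class and argument names match the PR description; the hypernetwork width and the choice to freeze the base weight are assumptions, not the actual helios implementation):

```python
import torch
import torch.nn as nn

class TaskLoRALinear(nn.Module):
    """Sketch: a linear projection whose weight receives a task-conditioned
    low-rank (LoRA) update. A and B are emitted by an MLP over the task
    embedding, so only that MLP (plus the embeddings) is trained."""

    def __init__(self, dim: int, rank: int, task_emb_dim: int):
        super().__init__()
        self.dim, self.rank = dim, rank
        self.base = nn.Linear(dim, dim)          # pretrained projection, frozen
        self.base.weight.requires_grad_(False)
        # One MLP emits both A (dim x r) and B (r x dim), flattened together.
        self.hyper = nn.Sequential(
            nn.Linear(task_emb_dim, 4 * task_emb_dim),
            nn.GELU(),
            nn.Linear(4 * task_emb_dim, 2 * dim * rank),
        )

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        # task_emb: (task_emb_dim,) for the current task
        ab = self.hyper(task_emb)
        A = ab[: self.dim * self.rank].view(self.dim, self.rank)
        B = ab[self.dim * self.rank :].view(self.rank, self.dim)
        w = self.base.weight + A @ B             # W + delta-W, rank-r update
        return nn.functional.linear(x, w, self.base.bias)
```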

MoE adapters: instead of computing FFN(x) in each Transformer block, adds a soft MoE adapter so that the pre-LayerScale output is Linear(FFN(x) + MoE(x)).

  • Optionally, MoE adapters compute expert combine weights (i.e., how much each expert contributes per token) by conditioning on batch-level task embeddings instead of on token-level embeddings.

  • The actual MoE implementation is mostly from this reference implementation and can be found in helios.nn.moe.

To accommodate these changes, a few extra arguments are added to EncoderConfig, Encoder, FlexiHeliosBase, etc., all the way down to the base helios.nn.attention.Attention layers. Additionally, there is a new argument task_emb added to the forward pass of Encoder. I considered subclassing Encoder (e.g., something like EncoderWithTaskEmbeds) but decided it was simpler to just add the extra arguments directly.
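To illustrate the plumbing, here is a toy block showing how a task_emb keyword could thread through to produce the pre-LayerScale output Linear(FFN(x) + MoE(x)) described above. This is a self-contained assumption-laden sketch, not the actual Encoder or block code:

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Sketch: a Transformer-block fragment where the pre-LayerScale output
    is Linear(FFN(x) + MoE(x)), with MoE combine weights conditioned on a
    batch-level task embedding passed down as a forward argument."""

    def __init__(self, dim: int, task_emb_dim: int, num_experts: int = 2):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.moe_router = nn.Linear(task_emb_dim, num_experts)
        self.moe_experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); task_emb: (batch, task_emb_dim)
        w = self.moe_router(task_emb).softmax(-1)            # (batch, E)
        moe = sum(
            w[:, None, i : i + 1] * expert(x)                # broadcast over tokens
            for i, expert in enumerate(self.moe_experts)
        )
        return self.out(self.ffn(x) + moe)                   # Linear(FFN(x) + MoE(x))
```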

@ryspark ryspark marked this pull request as ready for review August 14, 2025 19:57
@ryspark ryspark requested a review from Hgherzog August 14, 2025 19:58
@ryspark ryspark marked this pull request as draft August 14, 2025 20:19
@ryspark ryspark changed the title Task-conditioned LoRA Task-conditioned LoRA + MoE Aug 18, 2025
@ryspark ryspark marked this pull request as ready for review September 10, 2025 18:29
