Hello,
Thank you for sharing your excellent paper, "Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards". I have some questions about the Directional Preference Alignment method, specifically about how the user preference vector v is incorporated during model training.
To be more specific, I would like to understand how the user preference vector v is actually integrated into the model during the training phase. From my reading, the attribute weights are concatenated directly onto the prompt (via the system prompt, as you mention in the paper); is that correct?
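To make my current understanding concrete, here is a rough Python sketch of what I imagine the prompt construction might look like during training. The template wording, the attribute names (helpfulness, verbosity), and the function names are my own guesses for illustration, not taken from your paper:

```python
import math


def build_system_prompt(v_helpfulness: float, v_verbosity: float) -> str:
    # Hypothetical template: embed the unit-norm preference vector
    # v = (v1, v2) as plain text inside the system prompt.
    return (
        "You are a helpful assistant. Your response should maximize the reward "
        f"{v_helpfulness:.2f} * helpfulness + {v_verbosity:.2f} * verbosity."
    )


def build_training_input(user_prompt: str, angle_deg: float) -> str:
    # Sample a direction on the unit circle (my assumption of how v is drawn)
    # and prepend the resulting system prompt to the user prompt.
    v1 = math.cos(math.radians(angle_deg))
    v2 = math.sin(math.radians(angle_deg))
    return build_system_prompt(v1, v2) + "\n\n" + user_prompt


print(build_training_input("Explain LoRA in two sentences.", angle_deg=30.0))
```

Does this roughly match how v enters the model during training, or is it incorporated through some other mechanism?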
I look forward to your response.
Thanks!