Hello, thank you for the excellent research and for sharing the code.
I understand that TPO (Task Preference Optimization) has been applied to InternVideo 2.5, and as mentioned in the related paper, it includes three task-specific heads: region, temporal, and mask.
I have two questions regarding this:
- Are these three heads already implemented and integrated into the current InternVideo 2.5 codebase?
- The paper describes a detailed multi-stage training process, but the repository currently provides only inference scripts. Will the training scripts for these heads be released in the future? If not, is there any guidance or reference for performing supervised fine-tuning (SFT) with these task heads?
Any support or clarification would be greatly appreciated. Thank you again for your valuable contribution!