[Q] InternVideo 2.5, TPO #61

Hello, thank you for the excellent research and for sharing the code.

I understand that TPO (Task Preference Optimization) has been applied to InternVideo 2.5, and as mentioned in the related paper, it includes three task-specific heads: region, temporal, and mask.

I have two questions regarding this:

1. Are these three heads already implemented and integrated into the current InternVideo 2.5 codebase?
2. The paper describes a detailed multi-stage training process, but the repository currently provides only inference scripts. Will the training scripts for these heads be released in the future? Alternatively, is there any guidance or reference available for performing supervised fine-tuning (SFT) with these task heads?

Any support or clarification would be greatly appreciated. Thank you again for your valuable contribution!
