Questions about the role of mixture of vision encoders.

Hello, thank you very much for your excellent work. I have two questions to ask.

1. Your paper mentions that the mixture of vision encoders improves performance on 12-14 benchmarks, but I could not find an ablation experiment for it. Is there specific data you can share? How much improvement does it actually bring?
2. The tasks where the mixture of vision encoders brings the greatest improvement appear to be OCR and Chart/Document VQA, which involve recognizing and understanding small targets. Is this because tiling is used for the input, which retains more image detail?

Questions about the role of mixture of vision encoders. #31
