Mimic In-Context Learning (MimIC) is a novel framework for adapting vision-language models by approximating the shift effects induced by in-context demonstrations. By integrating lightweight learnable modules into the model, it achieves superior performance compared to previous shift-vector methods and LoRA.
The following commands set up an environment for testing idefics1 and idefics2.
```bash
conda create -y -n mimic python=3.10
conda activate mimic
pip install -r requirements.txt
```

For models, we currently support idefics1, idefics2, and llava-next-interleave. For datasets, VQAv2, OK-VQA, COCO, Flickr30k, MME, and SEED-Bench are available.

```bash
cd ./script
# select a bash file to run
bash run_*.sh
```

We would like to introduce some key files to help you understand how MimIC works.
In `shift_encoder.py`, we implement the MimIC attention heads (`AttnApproximator`) and another vector-based method, LIVE (`AttnFFNShift`).
As described in the paper, self-attention layers are substituted with MimIC attention heads. This integration is achieved by replacing the forward method of those self-attention layers (see `*_attn_forward` and `register_shift_hooks`). For example, in `idefics_attn_forward`, we apply a shift to the regular attention output based on the keys and queries of idefics.
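To make the mechanism concrete, here is a minimal sketch of how a self-attention forward can be swapped out to add such a shift. It assumes a HuggingFace-style attention module whose forward returns a tuple; `register_shift` and `patched_attn_forward` are illustrative names, not the repository's actual functions.

```python
import types

# Sketch only: assumes a HuggingFace-style attention module whose forward
# returns a tuple with the attention output first.
def patched_attn_forward(self, hidden_states, **kwargs):
    out = self._orig_forward(hidden_states, **kwargs)
    attn_out = out[0] if isinstance(out, tuple) else out
    # add a learnable, query-dependent shift to the regular attention output
    attn_out = self.shift_module.do_shift(attn_out, hidden_states)
    return ((attn_out,) + out[1:]) if isinstance(out, tuple) else attn_out

def register_shift(attn_layer, shift_module):
    attn_layer._orig_forward = attn_layer.forward
    attn_layer.shift_module = shift_module
    attn_layer.forward = types.MethodType(patched_attn_forward, attn_layer)
```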
In `do_shift` of `AttnApproximator`, we implement the computation of the query-dependent shift itself.
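Since the repository's exact formulation lives in `do_shift`, the following is only a toy stand-in, assuming the shift is produced by a learnable key/value pair attended to by the query; shapes and names are illustrative, not the real `AttnApproximator`.

```python
import torch
import torch.nn as nn

class ToyAttnShift(nn.Module):
    """Toy stand-in for a MimIC-style attention shift (not the real AttnApproximator)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # learnable "pseudo-demonstration" key and value
        self.k = nn.Parameter(torch.randn(hidden_dim))
        self.v = nn.Parameter(torch.randn(hidden_dim))
        self.gate = nn.Parameter(torch.zeros(1))  # start with no shift

    def do_shift(self, attn_out: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # attention-like weight between each query position and the learned key
        w = torch.sigmoid(query @ self.k / query.shape[-1] ** 0.5)  # (B, T)
        shift = w.unsqueeze(-1) * self.v                            # (B, T, D)
        return attn_out + self.gate * shift
```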
In `shift_model.py`, we implement the training framework of MimIC, as illustrated in Figure 3 of the paper. `ShiftModel` feeds contexts prepared by `data_module.py` to the model and calculates losses depending on `model_strategy`, which describes which types of losses should be computed. For example, MimIC uses `Strategy.LAYER_WISE_MSE` and `Strategy.LM_LOSS`; for LIVE, `Strategy.LOGITS_KL_DIV` and `Strategy.LM_LOSS` are used. As for LoRA, only `Strategy.LM_LOSS` should be applied.
During training, we first feed the in-context demonstrations together with the query to the model and capture its hidden states with forward hooks (see `register_record_hook` in `shift_encoder.py` for details). Then we enable the shift hooks (introduced in the previous section) and feed only the query to the model to obtain the shifted hidden states, which are aligned with the recorded ones by the layer-wise loss.
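Put together, one training step might look like the sketch below, assuming PyTorch forward hooks and a list of decoder layers; `mimic_training_step` and the position alignment are illustrative, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

def mimic_training_step(model, layers, demos_and_query, query_only):
    """Sketch of the two-pass step: record in-context hidden states, then
    align the shifted, query-only hidden states to them layer by layer."""
    recorded, shifted = [], []

    def make_hook(store):
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            store.append(hidden)
        return hook

    # Pass 1: demonstrations + query, record the reference hidden states.
    handles = [layer.register_forward_hook(make_hook(recorded)) for layer in layers]
    with torch.no_grad():
        model(**demos_and_query)
    for h in handles:
        h.remove()

    # Pass 2: query only, with the shift modules active.
    handles = [layer.register_forward_hook(make_hook(shifted)) for layer in layers]
    out = model(**query_only)  # assumes labels are included, so out.loss is the LM loss
    for h in handles:
        h.remove()

    # Layer-wise MSE between shifted states and the query positions of the
    # recorded states (the recorded sequence is longer, so take its tail).
    mse = sum(
        F.mse_loss(s, r[:, -s.shape[1]:, :].detach())
        for s, r in zip(shifted, recorded)
    )
    return mse + out.loss
```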
- You may need to add your dataset path to `src/paths.py` first.
- Create a new Python script in `src/dataset_utils`.
- Create a new class named `Dataset` that inherits from `src.dataset_utils.iterface.DatasetBase`.
- Implement all abstract methods and some special required attributes (see the docstring of `DatasetBase`).
Then you are able to use the `-d` option to specify the new dataset in the `run_*` bash scripts.
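For orientation only, a new dataset script might start like the hypothetical skeleton below; the actual abstract methods and required attributes are dictated by `DatasetBase`, so follow its docstring rather than this sketch.

```python
# hypothetical file: src/dataset_utils/my_dataset.py
from src.dataset_utils.iterface import DatasetBase

class Dataset(DatasetBase):
    """Skeleton only: implement every abstract method and required
    attribute declared by DatasetBase (see its docstring)."""
    ...
```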
This could be a bit more complicated.
0. You may need to add your model path to `src/paths.py` first.
1. Create your new model in `testbed/models`, following the ICLTestbed guides here.
2. Specify the method of loading the model in `build_models` from `src/utils.py` (see the sketch after this list).
3. Global-search `idefics` in `shift_model.py` and implement the corresponding methods.
4. Determine how many epochs to run and when to save checkpoints in `src/train.py`.
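As a rough illustration of step 2, a new branch in `build_models` could look like the following; the real signature and existing branches in `src/utils.py` may differ, and `MyNewModel` is a placeholder.

```python
# Hypothetical sketch; the actual build_models in src/utils.py may differ.
def build_models(model_name: str, model_path: str):
    if model_name == "idefics":
        ...  # existing loading logic
    elif model_name == "my_new_model":  # placeholder name
        from testbed.models import MyNewModel  # hypothetical class in testbed/models
        return MyNewModel(model_path)
    else:
        raise ValueError(f"unsupported model: {model_name}")
```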
```bibtex
@InProceedings{Jiang_2025_CVPR,
    author    = {Jiang, Yuchu and Fu, Jiale and Hao, Chenduo and Hu, Xinting and Peng, Yingzhe and Geng, Xin and Yang, Xu},
    title     = {Mimic In-Context Learning for Multimodal Tasks},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {29825-29835}
}
```
