I've noticed the recent CLIP-related updates in the repo. Could you provide documentation or examples showing how to use ViT Prisma with CLIP, specifically for the logit lens and activation patching?
I'm also working on a project where I'm trying to apply activation patching to CLIP's vision encoder (ViT) using ViT Prisma while keeping the text encoder unchanged. Should I hook the entire CLIP model, or would hooking just the visual encoder be sufficient?
Any guidance would be appreciated!