ProtoPNet has been shown to improve classification results for classifier heads trained on top of pre-trained models; we could potentially use a similar trick here.
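This is not without difficulty, though. We currently store averaged embeddings rather than 'spatial' embeddings, and much of the ProtoPNet head's advantage seems to derive from its access to the spatial embeddings: each prototype is compared against every spatial position and keeps only its best match, which an averaged embedding cannot provide.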
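To make the averaging problem concrete, here is a minimal NumPy sketch of a ProtoPNet-style prototype score: squared L2 distance to every position, ProtoPNet's log activation, then a max over positions. It assumes spatial embeddings shaped `[positions, dim]`; the function name, shapes, and toy data are hypothetical and not taken from any existing codebase.

```python
import numpy as np


def prototype_similarity(
    spatial_emb: np.ndarray, prototypes: np.ndarray, eps: float = 1e-4
) -> np.ndarray:
    """ProtoPNet-style similarity of each prototype to its closest position.

    spatial_emb: [positions, dim], e.g. one embedding per time frame (assumed).
    prototypes:  [num_prototypes, dim] learned prototype vectors.
    Returns:     [num_prototypes] similarity scores.
    """
    # Squared L2 distance from every prototype to every position: [P, T].
    dists = np.sum((prototypes[:, None, :] - spatial_emb[None, :, :]) ** 2, axis=-1)
    # ProtoPNet's log activation: large when a distance is near zero.
    sims = np.log((dists + 1.0) / (dists + eps))
    # Max-pool over positions: each prototype keeps only its best match.
    return sims.max(axis=1)


# A pattern localized to one of ten positions matches the prototype exactly...
proto = np.array([[2.0, 0.0, 0.0, 0.0]])
spatial = np.zeros((10, 4))
spatial[3] = proto[0]
score_spatial = prototype_similarity(spatial, proto)
# ...but after averaging, only a diluted copy of it remains.
averaged = spatial.mean(axis=0, keepdims=True)
score_pooled = prototype_similarity(averaged, proto)
```

With the spatial embedding the prototype finds an exact match and scores near the activation's maximum, while the averaged embedding sits far from the prototype and scores much lower, even though both come from the same input.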
Good support for a ProtoPNet-style head could then take one of two forms:
a) Optionally insert finer-grained spatial embeddings, or
b) Use the coarse (averaged) embeddings for a first round of classification, then 'rerank' the top candidates by re-embedding them and evaluating the head on their spatial embeddings.
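Option (b) could look roughly like the following two-stage sketch. It assumes a linear scorer over the stored averaged embeddings for the coarse round, and a hypothetical `embed_spatial` callable that re-runs the model on a clip and returns `[positions, dim]` embeddings for the rerank. Summing the per-prototype best-match scores stands in for ProtoPNet's final linear layer (unit weights for a single class); all names here are illustrative, not an existing API.

```python
from typing import Callable

import numpy as np


def spatial_score(spatial_emb: np.ndarray, prototypes: np.ndarray, eps: float = 1e-4) -> float:
    """ProtoPNet-style class score: sum over prototypes of best-match similarity."""
    dists = np.sum((prototypes[:, None, :] - spatial_emb[None, :, :]) ** 2, axis=-1)
    sims = np.log((dists + 1.0) / (dists + eps))
    return float(sims.max(axis=1).sum())


def coarse_then_rerank(
    mean_embs: np.ndarray,  # [num_clips, dim] stored averaged embeddings
    linear_w: np.ndarray,  # [dim] coarse linear classifier weights (one class)
    prototypes: np.ndarray,  # [num_prototypes, dim] prototypes for that class
    embed_spatial: Callable[[int], np.ndarray],  # hypothetical: clip idx -> [positions, dim]
    top_k: int = 100,
) -> list[int]:
    # Stage 1: cheap scores on the averaged embeddings already in the database.
    coarse = mean_embs @ linear_w
    candidates = np.argsort(-coarse)[: min(top_k, len(coarse))]
    # Stage 2: re-embed only the shortlist and score it on spatial embeddings.
    fine = np.array([spatial_score(embed_spatial(int(i)), prototypes) for i in candidates])
    return [int(candidates[j]) for j in np.argsort(-fine)]


# Toy usage: clip 2 holds an exact prototype match in one frame, which averaging
# dilutes, so the coarse pass ranks clip 0 above it and the rerank flips them.
proto = np.array([[3.0, 0.0, 0.0, 0.0]])
spatial_db = {i: np.zeros((5, 4)) for i in range(4)}
spatial_db[0][:] = [1.0, 0.0, 0.0, 0.0]
spatial_db[2][1] = proto[0]
mean_embs = np.stack([spatial_db[i].mean(axis=0) for i in range(4)])
# Here "re-embedding" is just a lookup; in practice it would re-run the model.
ranked = coarse_then_rerank(
    mean_embs, np.array([1.0, 0.0, 0.0, 0.0]), proto, lambda i: spatial_db[i], top_k=2
)
```

The design choice is that only `top_k` clips pay the re-embedding cost, so the database can keep storing averaged embeddings; the trade-off is that anything the coarse pass drops below `top_k` is never reconsidered.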