Question about handling single-channel (non-RGB) inputs in MatchAnything

Hi, thanks for your great work on MatchAnything!
I saw that your model builds on RoMa, which uses DINOv2. Since DINOv2 expects RGB images, **I’m curious how you handle single-channel inputs (e.g., depth or infrared).**
Do you replicate the single channel three times, or apply some projection/adaptation layer before the encoder?
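
To make the two options concrete, here is a minimal NumPy sketch of what I mean. It is only my guess at the possibilities, not MatchAnything's actual preprocessing. The function names `replicate_to_rgb` and `project_to_rgb` and the weight values are hypothetical, chosen just for illustration.

```python
import numpy as np


def replicate_to_rgb(img):
    """Option 1: copy a single-channel (H, W) image into 3 identical channels."""
    if img.ndim != 2:
        raise ValueError("expected a 2-D (H, W) array")
    # Add a trailing channel axis, then repeat it 3 times -> (H, W, 3).
    return np.repeat(img[..., None], 3, axis=-1)


def project_to_rgb(img, weight, bias):
    """Option 2: per-pixel affine map from 1 to 3 channels (like a 1x1 conv).

    `weight` and `bias` have shape (3,). In a real model they would be
    learned; here they are fixed placeholder values.
    """
    if img.ndim != 2:
        raise ValueError("expected a 2-D (H, W) array")
    # Broadcasting: (H, W, 1) * (3,) + (3,) -> (H, W, 3).
    return img[..., None] * weight + bias


# Toy "depth map" with values in [0, 1].
depth = np.linspace(0.0, 1.0, 12, dtype=np.float32).reshape(3, 4)

rgb_a = replicate_to_rgb(depth)
rgb_b = project_to_rgb(
    depth,
    weight=np.array([0.5, 1.0, 2.0], dtype=np.float32),  # hypothetical values
    bias=np.zeros(3, dtype=np.float32),
)
print(rgb_a.shape, rgb_b.shape)
```

With option 1 the encoder sees a gray image, so no new parameters are needed. Option 2 would add a small layer that has to be trained together with (or before) the rest of the model.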

Thanks a lot for your time!