Just remembered that CTranslate2 can substantially improve inference speed compared to marian-decoder for OpusMT models, or for any model for which we do not have a Bergamot student. It shouldn't be difficult to integrate. Writing this here so I don't forget about it.
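
For reference, a rough sketch of what the integration could look like. It assumes the OpusMT model has already been converted to the CTranslate2 format (CTranslate2 ships a `ct2-opus-mt-converter` tool for this) and that the model uses separate SentencePiece files for source and target. The function names `translate_lines` and `ctranslate2_backend`, the paths, and the batch size are all made up for illustration and are not part of any existing pipeline code.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def translate_lines(
    lines: Sequence[str],
    encode: Callable[[str], Tokens],
    translate_batch: Callable[[List[Tokens]], List[Tokens]],
    decode: Callable[[Tokens], str],
    batch_size: int = 32,
) -> List[str]:
    """Translate lines in fixed-size batches with any token-level backend."""
    translated: List[str] = []
    for start in range(0, len(lines), batch_size):
        batch = [encode(line) for line in lines[start:start + batch_size]]
        translated.extend(decode(tokens) for tokens in translate_batch(batch))
    return translated


def ctranslate2_backend(
    model_dir: str, source_spm: str, target_spm: str, device: str = "cpu"
) -> Tuple[Callable, Callable, Callable]:
    """Build (encode, translate_batch, decode) on top of CTranslate2.

    Hypothetical helper: imports are local so the rest of the module
    works even when ctranslate2 and sentencepiece are not installed.
    """
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir, device=device)
    sp_source = spm.SentencePieceProcessor(model_file=source_spm)
    sp_target = spm.SentencePieceProcessor(model_file=target_spm)

    def encode(text: str) -> Tokens:
        return sp_source.encode(text, out_type=str)

    def translate_batch(batch: List[Tokens]) -> List[Tokens]:
        # Keep only the best hypothesis for each input sentence.
        return [r.hypotheses[0] for r in translator.translate_batch(batch)]

    def decode(tokens: Tokens) -> str:
        return sp_target.decode_pieces(tokens)

    return encode, translate_batch, decode
```

Usage would be something like `translate_lines(lines, *ctranslate2_backend("ct2_model", "source.spm", "target.spm"))`, with the paths pointing at the converted model. Keeping the backend behind three plain callables means marian-decoder and CTranslate2 could be swapped without touching the batching code, though whether that fits the current pipeline is an open question.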