Which arch should we test?
- Llama 3B using AR (autoregressive) loss
- ModernBERT as the backbone