Released in July 2023, BTLM quickly became the most downloaded model of its size on Hugging Face, amassing over 1 million downloads in three weeks. BTLM's architecture is very similar to GPT-2's, except that it uses Maximal Update Parameterization (μP) and adds SwiGLU activation and ALiBi for performance improvements. More details can be found in the BTLM paper.
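To give a rough sense of two of the components named above, the sketch below shows ALiBi's per-head linear attention bias and SwiGLU's gating step in plain Python. It is an illustration only, not BTLM's actual implementation: the function names are hypothetical, the slope formula is the one commonly used for a power-of-two number of heads, and the learned linear projections that normally surround SwiGLU are omitted.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes, assuming num_heads is a power of two (illustrative)."""
    # Geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Causal bias added to attention scores for one head.

    Each query position q penalizes key position k by slope * (q - k);
    future positions (k > q) are masked with -inf.
    """
    return [
        [-slope * (q - k) if k <= q else float("-inf") for k in range(seq_len)]
        for q in range(seq_len)
    ]


def silu(x):
    """SiLU (Swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate, value):
    """Elementwise SwiGLU gating: SiLU(gate) * value.

    In a real feed-forward block, `gate` and `value` would be two separate
    linear projections of the same input; those projections are left out here.
    """
    return [silu(g) * v for g, v in zip(gate, value)]
```

Because ALiBi encodes position as a distance penalty on attention scores rather than as a learned embedding table, the same bias function can be evaluated for any `seq_len`.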
For more information on using BTLM, visit its model page in our documentation.