
# AMD-Hybrid-Models

License: Apache 2.0 · arXiv:2503.11132 · arXiv:2505.17272

## 🔍 Overview: Efficient Hybrid Language Models on AMD GPUs

Official Repository for X-EcoMLA and Zebra-Llama

Welcome! This repo hosts two complementary projects focused on memory-efficient, high-performance large language models (LLMs). LLMs often face major memory bottlenecks during inference due to large key-value (KV) caches (a back-of-envelope sketch of the cost follows the table below). This repository introduces two solutions:

| Folder | Description |
|--------|-------------|
| `x-eco-mla/` | Implements **X-EcoMLA**: a method for upcycling pre-trained attention into Multi-head Latent Attention (MLA) for extreme KV cache compression. |
| `zebra-llama/` | Implements **Zebra-Llama**: a family of hybrid MLA + Mamba2 models with minimal retraining and maximum efficiency. |
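To see why the KV cache dominates inference memory, here is a rough back-of-envelope calculation. The model shape (32 layers, 8 KV heads, head dim 128, fp16) is an assumed Llama-3-8B-like configuration for illustration, not a number taken from this repo:

```python
# Back-of-envelope KV-cache size for an assumed Llama-3-8B-like config:
# 32 layers, 8 KV heads (GQA), head_dim 128, fp16 (2 bytes per element).
layers, kv_heads, head_dim, bytes_per_elem = 32, 8, 128, 2
seq_len, batch = 8192, 8

# K and V each store (seq_len, kv_heads, head_dim) per layer per sequence.
cache_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch
print(f"KV cache: {cache_bytes / 2**30:.1f} GiB")  # -> 8.0 GiB
```

At 8K context and batch size 8, the cache alone already consumes about 8 GiB, which is why both projects attack KV cache size directly.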
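Below is a minimal sketch of the latent-KV idea that MLA-style methods such as X-EcoMLA build on: cache one small latent vector per token and reconstruct keys and values from it at attention time. This is an illustrative single-head module with assumed dimensions (`d_model`, `d_latent`), not the repo's actual implementation; causal masking and RoPE are omitted for brevity:

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Single-head attention that caches a small latent c_kv instead of
    full K/V tensors -- an illustrative sketch of the MLA-style idea."""

    def __init__(self, d_model=4096, d_latent=512):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Down-project hidden states to a shared latent; only this is cached.
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the cached latent back into keys and values on the fly.
        self.k_up = nn.Linear(d_latent, d_model, bias=False)
        self.v_up = nn.Linear(d_latent, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        # x: (batch, seq, d_model)
        q = self.q_proj(x)
        c_kv = self.kv_down(x)                       # (batch, seq, d_latent)
        if latent_cache is not None:
            c_kv = torch.cat([latent_cache, c_kv], dim=1)
        k, v = self.k_up(c_kv), self.v_up(c_kv)      # reconstructed K and V
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return self.o_proj(attn @ v), c_kv           # cache c_kv, not K and V
```

Caching `c_kv` (`d_latent` floats per token) instead of full K and V (`2 * d_model` floats per token) is where the compression comes from; with the assumed sizes above, that is a 16x smaller cache.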

## Citation

If you find this repository useful in your research or applications, please cite our papers:

@article{li2025x_ecomla,
  title={{X-EcoMLA}: Upcycling Pre-Trained Attention into {MLA} for Efficient and Extreme {KV} Compression},
  author={Li, Guihong and Rezagholizadeh, Mehdi and Yang, Mingyu and Appia, Vikram and Barsoum, Emad},
  journal={arXiv preprint arXiv:2503.11132},
  year={2025},
  url={https://arxiv.org/abs/2503.11132}
}

@article{yang2025zebra,
  title={{Zebra-Llama}: Towards Extremely Efficient Hybrid Models},
  author={Yang, Mingyu and Rezagholizadeh, Mehdi and Li, Guihong and Appia, Vikram and Barsoum, Emad},
  journal={arXiv preprint arXiv:2505.17272},
  year={2025},
  url={https://arxiv.org/abs/2505.17272}
}

## 🤝 Contributing

We welcome contributions! Please open an issue to ask questions or to discuss major changes.

## License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
