Hi Authors,
Thank you for the amazing work and for open-sourcing this project!
I had a quick question: do you think X-EcoMLA could be applied to diffusion language models (DLMs)?
I'm currently exploring ways to reduce KV-cache memory during inference in diffusion-based language and vision-language models.
Since X-EcoMLA provides a theoretical and practical framework for converting MHA-based architectures into MLA with compressed KV caches, I was wondering whether a similar idea could be used for the iterative denoising steps in diffusion models.
Would love to hear your thoughts on whether X-EcoMLA's low-rank latent compression or RoRoPE decoupling could extend to diffusion-style attention or cross-attention blocks.
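To make the question concrete, here is a toy numpy sketch of the kind of low-rank latent KV caching I have in mind (shapes, matrix names, and the single-head setup are my own simplifications, not the repo's implementation): cache one small latent vector per token instead of full K and V, and up-project at attention time. In a diffusion model, each denoising step could reuse the same latent cache for tokens whose hidden states have not changed.

```python
import numpy as np

# Toy illustration (my own, not the authors' code): MLA-style low-rank KV
# compression. Full caching stores K and V (2 * d_model floats per token);
# here we store one latent vector of size d_latent per token instead.
rng = np.random.default_rng(0)
d_model, d_latent, n_tokens = 64, 8, 16

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # latent -> K
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # latent -> V

h = rng.standard_normal((n_tokens, d_model))  # hidden states at one denoising step
latent_cache = h @ W_down                     # (n_tokens, d_latent): all we store

# At attention time, reconstruct K and V from the latent cache.
K = latent_cache @ W_up_k
V = latent_cache @ W_up_v

q = rng.standard_normal((1, d_model))         # a single query vector
scores = (q @ K.T) / np.sqrt(d_model)
attn = np.exp(scores - scores.max())
attn /= attn.sum()
out = attn @ V                                # (1, d_model) attention output

full_cache_size = 2 * n_tokens * d_model      # floats if caching K and V directly
mla_cache_size = n_tokens * d_latent          # floats when caching only the latent
print(full_cache_size, mla_cache_size, out.shape)
```

With these toy sizes the latent cache is 16x smaller than a full KV cache, which is why I'm curious whether the same trick survives the repeated attention passes of iterative denoising.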
Thanks again for this great contribution!