Cool work, and a clean implementation! I saw the authors recently released a newer version called Gaussian Gated Linear Networks (https://arxiv.org/pdf/2006.05964.pdf). I'd be curious to know how hard it would be to implement from your codebase!