To update this document:
- Add bib entries to columnformers.bib.
- Make edits to RELATED_WORK_pandoc.md.
- Generate RELATED_WORK.md with compiled citations using pandoc by running `make`.
- [1] is a major inspiration for this work. The authors introduce the All-TNN architecture, which is essentially a CNN without weight sharing.
- [2], [3] are other important works studying the emergence of topography in neural networks.
- [4] discusses the biological implausibility of weight sharing and proposes some strategies for training locally connected networks without weight sharing.
- [5] shows that imposing topographic constraints on the hidden units of a CNN results in emergent processing “streams” similar to the primate dorsal and ventral streams.
- Attention free transformers (AFT) [6]. The idea of the additive bias in place of the multiplicative query is especially relevant.
- RWKV, which builds on AFT [7].
- Graph attention networks [8].
- Capsule networks [9], which have a similar inspiration to what we’re exploring.
- The perspective in [10] viewing the cortex as a uniform sheet of computational modules, and thinking of attention as communication.
- Geoff Hinton’s discussion of weight sharing and local contrastive distillation in [11].
- The discussion of geometry constraining brain function in [12].
- Spatially embedded recurrent networks in [13].
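
Several items above turn on weight sharing, so a toy illustration of "a CNN without weight sharing" may help. The sketch below is a 1D, pure-Python example under our own assumptions: the function names `conv1d` and `locally_connected1d` are hypothetical, and it is not the All-TNN implementation from [1]. It only shows that a locally connected layer keeps the same local receptive fields as a convolution but gives each output position its own kernel.

```python
# Toy 1D illustration; helper names are hypothetical, not taken from [1].

def conv1d(x, kernel):
    """Valid 1D cross-correlation: one kernel shared by every position."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def locally_connected1d(x, kernels):
    """Same local receptive fields, but one untied kernel per output position."""
    k = len(kernels[0])
    assert len(kernels) == len(x) - k + 1, "need one kernel per output position"
    return [sum(kernels[i][j] * x[i + j] for j in range(k))
            for i in range(len(kernels))]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
shared = [1.0, 0.0, -1.0]
# Three output positions, each with its own weights.
untied = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]

conv_out = conv1d(x, shared)                 # same filter everywhere
local_out = locally_connected1d(x, untied)   # a different filter per position

# Parameter count grows with the number of positions once weights are untied.
conv_params = len(shared)
local_params = sum(len(kernel) for kernel in untied)
```

Tying all kernels to the same values (`[shared] * 3`) recovers the convolution exactly, which is one way to read the relationship between the two layer types.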
[1] Z. Lu et al., “End-to-end topographic networks as models of cortical map formation and human visual behaviour: Moving beyond convolutions,” arXiv preprint arXiv:2308.09431, 2023, doi: 10.48550/arXiv.2308.09431.
[2] F. R. Doshi and T. Konkle, “Cortical topographic motifs emerge in a self-organized map of object space,” Science Advances, 2023, doi: 10.1126/sciadv.ade8187.
[3] E. Margalit et al., “A unifying principle for the functional organization of visual cortex,” bioRxiv, 2023, doi: 10.1101/2023.05.18.541361.
[4] R. Pogodin, Y. Mehta, T. Lillicrap, and P. E. Latham, “Towards biologically plausible convolutional networks,” Advances in Neural Information Processing Systems, 2021.
[5] D. Finzi et al., “A single computational objective drives specialization of streams in visual cortex,” bioRxiv, 2023, doi: 10.1101/2023.12.19.572460.
[6] S. Zhai et al., “An attention free transformer,” arXiv preprint arXiv:2105.14103, 2021.
[7] B. Peng et al., “RWKV: Reinventing RNNs for the transformer era,” arXiv preprint arXiv:2305.13048, 2023.
[8] P. Veličković et al., “Graph attention networks,” in International conference on learning representations, 2018.
[9] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” Advances in neural information processing systems, 2017.
[10] A. Karpathy, “Introduction to transformers.” https://youtu.be/XfpMkf4rD6E?si=AM9AWDegUaFB7KCe, 2023.
[11] G. Hinton, “The robot brains season 2 episode 22.” https://www.therobotbrains.ai/who-is-geoff-hinton-part-two, 2022.
[12] J. C. Pang et al., “Geometric constraints on human brain function,” Nature, 2023.
[13] J. Achterberg et al., “Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings,” Nature Machine Intelligence, 2023.