This document compiles all references, citations, and insights used to build NexaVisualize.
It serves as both a bibliography and a reflection on the project.
The following resources were directly referenced while implementing different architectures in NexaVisualize:
**Feedforward Neural Networks (FNNs)**
- Goodfellow, I., Bengio, Y., and Courville, A. Deep Learning. MIT Press (2016).
- Introduction to Feedforward Neural Networks (FNNs).
- Feedforward Neural Networks: The Backbone of Deep Learning.
- Stanford CS231n: Neural Networks Part 1.
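As a companion to these references, here is a minimal, illustrative sketch of a single fully connected layer's forward pass in pure Python. This is a conceptual example only; NexaVisualize's own implementation may be structured differently.

```python
def dense_forward(x, weights, bias):
    """One fully connected layer: y = relu(W·x + b)."""
    out = []
    for w_row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(max(0.0, z))  # ReLU activation
    return out

# Tiny example: 3 inputs -> 2 hidden units
x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2, 0.3], [-0.5, 0.0, 0.5]]
b = [0.0, 0.1]
h = dense_forward(x, W, b)  # approximately [1.4, 1.1]
```

Stacking several such layers, each feeding its output forward to the next, is exactly the block-by-block flow the FNN visualization shows.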
**Convolutional Neural Networks (CNNs)**
- LeCun, Y., et al. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE (1998).
- An Introduction to Convolutional Neural Networks (CNNs).
- Demystifying CNNs: A Deep Dive into Convolutional Neural Network Fundamentals.
- Datacamp: Convolutional Neural Networks Explained.
**Transformers**
- Vaswani, A., et al. Attention Is All You Need. NeurIPS (2017).
- VitalFlux: Transformer Neural Network Architecture.
- Hugging Face: How Do Transformers Work.
- Datacamp: Transformers Explained Visually.
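The core operation these references build on is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal pure-Python sketch of that single formula (illustrative only, not NexaVisualize's actual code):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q·K^T / sqrt(d_k)) · V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs
result = attention(Q=[[1.0, 0.0]], K=[[0.0, 0.0], [0.0, 0.0]], V=[[1.0, 2.0], [3.0, 4.0]])
```

With identical keys the attention weights are uniform, so the output is simply the mean of the value vectors, which is a handy sanity check when visualizing attention maps.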
**Mixture of Experts (MoE)**
- Shazeer, N., et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538 (2017).
- How Do MoEs Work.
- A Visual Guide to Mixture of Experts (MoE).
- Hugging Face: Visual Guide to Mixture of Experts (MoE).
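The router + expert structure these references describe can be sketched in a few lines: a gate scores each expert, and with top-1 routing only the highest-scoring expert processes the input. This is a conceptual sketch, not NexaVisualize's implementation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts):
    """Top-1 routing: the gate scores each expert, and only the
    highest-scoring expert processes the input (scaled by its gate prob)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    return top, [probs[top] * y for y in experts[top](x)]

# Two toy "experts": one doubles the input, one negates it
experts = [lambda x: [2 * v for v in x], lambda x: [-v for v in x]]
gate = [[1.0, 0.0], [0.0, 1.0]]  # gate weights: one row per expert
idx, out = moe_forward([3.0, 1.0], gate, experts)  # routes to expert 0
```

Because only the selected expert runs, the network's total parameter count can grow far beyond its per-input compute cost, which is the sparsity idea the visualization highlights.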
Architectures implemented in V1:
- Feedforward Neural Network (FNN) – fully customizable
- Convolutional Neural Networks (CNNs) – base + variants
- Transformers – vanilla encoder-decoder, extendable for variants
- Mixture of Experts (MoE) – router + expert visualization
- (Stretch goals, left for community): Autoencoder, Variational Autoencoder (VAE)
This project wasn’t about breaking new ground in ML theory — it was about testing and solidifying my own understanding.
Key takeaways:
- Visualization matters. Most ML work is hidden in math or code. Seeing the flow of data across blocks and layers helps build intuition and makes architectures less abstract.
- Refresher on fundamentals. Re-implementing CNNs, Transformers, and MoEs from scratch was a great way to confirm I actually understood them at a structural level.
- Educational potential. Visualizations combined with citations allow learners to both see the architecture and read deeper from the sources.
- Scope discipline. By keeping V1 focused (FNN, CNN, Transformer, MoE + quality-of-life features like light/dark mode), the project reached a natural “feature complete” state instead of drifting endlessly.
NexaVisualize is feature complete for me.
- Community contributions are welcome via PRs.
- If you’d like to extend it (e.g., add ResNets, LSTMs, or VAEs), the modular base classes are designed to be extendable.
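To give a flavor of what such an extension might look like, here is a hypothetical sketch. The class names (`ArchitectureVisualizer`, `build_blocks`) are invented for illustration and are not NexaVisualize's actual API; consult the codebase's real base classes before contributing.

```python
class ArchitectureVisualizer:
    """Hypothetical base class; NexaVisualize's actual names will differ."""
    def build_blocks(self):
        raise NotImplementedError  # subclasses return an ordered list of blocks

    def block_count(self):
        return len(self.build_blocks())

class ResNetVisualizer(ArchitectureVisualizer):
    """Sketch of a community extension: a ResNet as repeated residual blocks."""
    def __init__(self, num_blocks=4):
        self.num_blocks = num_blocks

    def build_blocks(self):
        blocks = ["input", "conv_stem"]
        for i in range(self.num_blocks):
            blocks.append(f"residual_block_{i}")  # conv -> conv -> skip connection
        blocks.append("global_pool")
        blocks.append("classifier")
        return blocks

viz = ResNetVisualizer(num_blocks=2)
```

The appeal of this shape is that a new architecture only has to declare its block sequence; rendering, theming, and layout stay in the shared base.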
For me, this project is done. For the community, it’s a playground.