A replication and interpretability study of Frank Rosenblatt's 1957 paper: 'The Perceptron: A Perceiving and Recognizing Automaton' (Project PARA).
This project explores the origins of neural networks, focusing on the original probabilistic logic rather than modern backpropagation and gradient descent.
Unlike a modern Perceptron (which is just a single linear layer), the 1957 model is a three-stage system of $S$-, $A$-, and $R$-units (see the sketch after this list):
- Connections (weights) are fixed and random. The model cannot "learn" its own concepts; it must rely on a large number of random connections.
- Learning happens by incrementing a value ($v$) in the $A$-units. There is no error signal flowing backward; only a "reward" for units that were active during a correct prediction.
- The parameters $\theta$, $x$, and $y$ were originally optimized to be implemented in a physical circuit (the Mark I Perceptron).
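To make the three stages concrete, here is a minimal NumPy sketch of the whole loop. It is an illustration, not this repo's code: the names (forward, reinforce), the wiring scheme (exactly $x$ excitatory and $y$ inhibitory connections per $A$-unit), and the reinforcement rule are a simplified reading of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_s, N_a, N_r = 400, 1000, 2   # S-points, A-units, R-units
x, y, theta = 5, 5, 2          # excitatory/inhibitory fan-in per A-unit, firing threshold

# Fixed random wiring: each A-unit reads x excitatory and y inhibitory S-points,
# and feeds exactly one R-unit. None of this is ever changed by learning.
exc = rng.integers(0, N_s, size=(N_a, x))
inh = rng.integers(0, N_s, size=(N_a, y))
r_of = rng.integers(0, N_r, size=N_a)

# The value v of each A-unit is the only learned quantity.
v = np.zeros(N_a)

def forward(stimulus):
    """stimulus: binary NumPy array of length N_s. Returns (chosen R-unit, active A-units)."""
    net = stimulus[exc].sum(axis=1) - stimulus[inh].sum(axis=1)
    active = net >= theta
    scores = np.array([v[active & (r_of == r)].sum() for r in range(N_r)])
    return int(scores.argmax()), active

def reinforce(stimulus, label, gain=1.0):
    """Reward the A-units that were active during a correct prediction."""
    pred, active = forward(stimulus)
    if pred == label:
        v[active & (r_of == label)] += gain
```

Note that learning only ever increments $v$; the random $S$-to-$A$ wiring is never touched, which is exactly why the model is at the mercy of its random connections.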
This repo replicates the original Perceptron architecture from 1957. It contains:
- perceptron.py: Implements the $S$-, $A$-, and $R$-systems and the value-accumulation logic.
- train.py: Trains the model to differentiate shapes (Square vs. Cross) on a grid.
- opt.py: An implementation of Appendix I from the paper, using statistical equations to find the optimal connectivity constraints.
- interp.py: Tools for analysing the model to see what it has actually learned.
For example:

```
uv run train.py -N_s 400 -N_a 1000 -N_r 2 --train_samples 500
```

trains a perceptron model with 400 $S$-units, 1000 $A$-units, and 2 $R$-units on 500 training samples.
You can also mess around with the optimization itself using uv run opt.py. Use uv run opt.py --help for parameter options.
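For intuition about what that optimization works with: if a stimulus lights up a fraction $R$ of the $S$-points and each connection lands independently at random, the counts of active excitatory and inhibitory inputs to an $A$-unit are binomially distributed, so the probability that the unit fires has a closed form. The sketch below is one reading of that analysis (p_active is a hypothetical name, not opt.py's interface):

```python
from math import comb

def p_active(x, y, theta, R):
    """P(net input >= theta) for an A-unit with x excitatory and y inhibitory
    random connections, when a fraction R of the S-points is illuminated."""
    def binom_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p) ** (n - k)
    return sum(
        binom_pmf(e, x, R) * binom_pmf(i, y, R)
        for e in range(x + 1)
        for i in range(y + 1)
        if e - i >= theta
    )

# Sweeping theta shows how quickly the active A-unit population thins out:
for theta in range(1, 6):
    print(theta, round(p_active(x=5, y=5, theta=theta, R=0.3), 4))
```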
If you are interested in finding out what the model has actually learned, run uv run interp.py. Use uv run interp.py --help for parameter options.
- If shapes are in fixed positions during training, the model ignores the "concept" of a shape. Instead, it identifies a square using only 2-3 specific pixels. In this mode, most $A$-units remain "dead", with only a tiny population achieving perfect performance.
- When shapes move randomly, the model cannot rely on specific pixels. It does not learn edges or the actual shapes; it learns a statistical mess of probabilities: which random connections are statistically more likely to hit a Square than a Cross across the whole grid.
- Removing just a few pixels from a shape can flip the model's decision instantly. This confirms that the model finds simple statistical patterns, which are much easier to learn than any geometric patterns (see the probe sketch below).
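The kinds of probes behind these observations are easy to sketch. Building on the toy forward from the earlier sketch (again, hypothetical names, not interp.py's actual API), one can count dead $A$-units and search for single pixels whose removal flips the decision:

```python
def dead_fraction(stimuli):
    """Fraction of A-units that never fire across a batch of stimuli."""
    fired = np.zeros(N_a, dtype=bool)
    for s in stimuli:
        fired |= forward(s)[1]
    return 1.0 - fired.mean()

def fragile_pixels(stimulus):
    """Lit pixels whose removal, on its own, flips the model's decision."""
    base = forward(stimulus)[0]
    flips = []
    for p in np.flatnonzero(stimulus):
        probe = stimulus.copy()
        probe[p] = 0                      # erase a single pixel
        if forward(probe)[0] != base:
            flips.append(int(p))
    return flips
```

If fragile_pixels returns even one index for a clean square, the "decision" rests on a handful of pixels rather than on the shape itself.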