Repo and experiments conducted for DAFx25 paper https://arxiv.org/abs/2505.04082
-
First, follow the directions to install NAM locally on your device: https://neural-amp-modeler.readthedocs.io/en/latest/installation.html
-
Navigate to the installed NAM source code (typically in C:\Users\$USER\anaconda3\Lib\site-packages\nam)
-
Delete the source code and replace it by pulling from this repo.
Training is done with the following command:

    nam-full "C:\Users\ryota\anaconda3\Lib\site-packages\nam\config\data.json" "C:\Users\ryota\anaconda3\Lib\site-packages\nam\config\model.json" "C:\Users\ryota\anaconda3\Lib\site-packages\nam\config\learning.json" "C:\Users\ryota\anaconda3\Lib\site-packages\nam\output"

where the first string is the path to the data config file, followed by the model config file, the learning config file, and the output directory. All of these exist in this repo under the config folder and the output folder.
Make sure to change the location of training_data in data.json.
To change the training hyperparameters, edit model.json.
Note: every time training finishes, it automatically runs inference on test_data/1245Hz.wav.
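If you prefer not to type the four paths by hand, the command above can be assembled from a single base directory. This is only a sketch: `build_nam_full_command` is a hypothetical helper that is not part of this repo or of NAM, and it assumes the `config` and `output` folders sit directly under the directory you pass in.

```python
from pathlib import Path


def build_nam_full_command(nam_dir, output_dir=None):
    """Return the nam-full argument list: data, model, learning configs, then output dir.

    Hypothetical helper; assumes the configs live in <nam_dir>/config.
    """
    nam_dir = Path(nam_dir)
    config_dir = nam_dir / "config"
    output_dir = Path(output_dir) if output_dir is not None else nam_dir / "output"
    return [
        "nam-full",
        str(config_dir / "data.json"),
        str(config_dir / "model.json"),
        str(config_dir / "learning.json"),
        str(output_dir),
    ]


# Example with a placeholder directory; substitute your own site-packages path.
command = build_nam_full_command("nam")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once NAM is installed and `nam-full` is on your PATH.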
Batch training is done with the following command:

    bash run_nam_activations.sh 1 10

where the first number is the first seed of the run and the second number is the last seed of the run (inclusive). Make sure to set all the directories accordingly.
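To make the inclusive seed convention concrete, here is a tiny illustration of which seeds the two numbers cover. It only mirrors the argument convention described above; `inclusive_seeds` is a hypothetical name and this is not what run_nam_activations.sh contains.

```python
def inclusive_seeds(first_seed, last_seed):
    """Return every seed from first_seed to last_seed, including both ends."""
    if last_seed < first_seed:
        raise ValueError("last_seed must be greater than or equal to first_seed")
    # range() excludes its stop value, so add 1 to include last_seed.
    return list(range(first_seed, last_seed + 1))


# Arguments "1 10" therefore correspond to ten runs, seeds 1 through 10.
print(inclusive_seeds(1, 10))
```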
Inference is done with the following command:

    python inference.py --exp_dir output/2025-01-30-16-42-32 --input_path test_data/1245Hz.wav --output_name output.wav

where --exp_dir is the experiment directory in the output folder, --input_path is the audio file to run inference on, and --output_name is the name of the output wav file.
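For reference, the three flags above could be parsed along the following lines. This is a sketch of the command-line interface only, not the contents of inference.py; `build_parser` is a hypothetical name, and the default value for `--output_name` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for the three inference flags described in this README."""
    parser = argparse.ArgumentParser(description="Run a trained model on an audio file.")
    parser.add_argument("--exp_dir", required=True,
                        help="experiment directory inside the output folder")
    parser.add_argument("--input_path", required=True,
                        help="audio file to run inference on")
    # Assumed default; the real script may require this flag instead.
    parser.add_argument("--output_name", default="output.wav",
                        help="name of the output wav file")
    return parser


# Parse the example invocation from this README (explicit list, not sys.argv).
args = build_parser().parse_args([
    "--exp_dir", "output/2025-01-30-16-42-32",
    "--input_path", "test_data/1245Hz.wav",
    "--output_name", "output.wav",
])
print(args.exp_dir, args.input_path, args.output_name)
```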
Visualization of batch training for the experiment section is done with this command:

    python other/visualize.py

Visualization for the analysis portion of the paper is done with this command:

    python other/visualize_2.py

Finding the most unbiased seed is done with this command:

    python other/get_seed.py

Check out both activation_test.ipynb and aliasing_test.ipynb for additional functions and testing done throughout this research.
Finished writing up the README.md, created a GitHub repo, and edited _activations.py to include the snake and squared ReLU activations. The model is training pretty smoothly, and I will write a script to analyze aliasing tomorrow.
Added more activation functions and created two Jupyter notebooks: one that tests the activation functions and another that tests the aliasing.
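As a quick reference for the two activations named above, here is a NumPy-only sketch using their usual definitions: snake as x + sin²(αx)/α and squared ReLU as max(0, x)². It is not the code in _activations.py, and the function names and the default `alpha` are assumptions made for illustration.

```python
import numpy as np


def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha."""
    x = np.asarray(x, dtype=float)
    return x + np.sin(alpha * x) ** 2 / alpha


def relu_squared(x):
    """Squared ReLU: zero for negative inputs, x^2 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x, 0.0) ** 2


inputs = np.linspace(-2.0, 2.0, 5)
print(snake(inputs))
print(relu_squared(inputs))
```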
Finished reading the paper about activation functions. Figured out a way to run inference on any audio file. Also figured out a way to visualize aliasing (getting the top n_peak frequencies).
Instead of getting the top n frequencies, I went with the first n harmonics. Also made an inference.py script to run inference on any audio file. I updated the aliasing analysis to use a heavily zero-padded FFT, and also computed the signal-to-noise ratio.
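The measurement described in this entry can be sketched roughly as follows: take a zero-padded FFT of the model output for a sine input, sum the power near the first n harmonics, and compare it with the power everywhere else. This is an illustration under stated assumptions, not the repo's analysis code: the Hann window, `pad_factor`, `half_width_hz`, and the function name `harmonic_snr` are all choices made here for the example. The second return value is the reciprocal ratio, which corresponds to the ASR (aliasing over harmonic signal) mentioned in the task list further down.

```python
import numpy as np


def harmonic_snr(signal, sample_rate, f0, n_harmonics=10, pad_factor=16, half_width_hz=20.0):
    """Return (snr, asr) as energy ratios from a zero-padded power spectrum.

    snr = harmonic energy / residual energy, asr = residual energy / harmonic energy.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(len(signal))          # assumed window choice
    n_fft = pad_factor * len(signal)          # zero padding interpolates the spectrum
    power = np.abs(np.fft.rfft(signal * window, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    # Mark every bin within half_width_hz of one of the first n harmonics of f0.
    harmonic_mask = np.zeros(len(freqs), dtype=bool)
    for k in range(1, n_harmonics + 1):
        harmonic_freq = k * f0
        if harmonic_freq >= sample_rate / 2:
            break  # harmonics above Nyquist cannot appear at their true frequency
        harmonic_mask |= np.abs(freqs - harmonic_freq) <= half_width_hz

    harmonic_energy = power[harmonic_mask].sum()
    residual_energy = power[~harmonic_mask].sum()
    snr = harmonic_energy / residual_energy
    return snr, 1.0 / snr


# Synthetic check: a 1245 Hz sine plus a weaker inharmonic tone standing in for aliasing.
sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
test_signal = np.sin(2 * np.pi * 1245.0 * t) + 0.1 * np.sin(2 * np.pi * 7000.0 * t)
snr, asr = harmonic_snr(test_signal, sample_rate, f0=1245.0)
print(f"SNR: {10 * np.log10(snr):.1f} dB, ASR: {10 * np.log10(asr):.1f} dB")
```

Everything outside the harmonic bands counts as residual here, so broadband noise and DC offset are lumped together with aliasing; a stricter analysis would separate them.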
Created the model config json files for each activation function, and added a batch script that trains the model for each activation. The results text file lists the signal-to-noise ratios for each activation (for three different seeds), but there seems to be a lot of variability. Next, I might look at constructing the most "soft" activation functions to reduce aliasing.
Made scripts to run multiple seeds and get the average. There seems to be a strong correlation between stretch factor and aliasing. Made a script to visualize the results.
Finished running the script for tanh and snake (100 log-spaced runs). The results show a strong correlation between stretch factor and both aliasing and error.
- DONE: calculate aliasing as a signal to noise ratio = (sum of squared harmonic magnitudes, i.e. harmonic energy) / (sum of squared magnitudes of the noise left after filtering out the harmonics, i.e. noise energy)
- DONE: finalize squared relu with dip (squaredSwish ==> Squish, SquaredGeLu ==> Squiglu)
- DONE: create a batch script to run all activation functions
- DONE: change from SER to ASR = (aliasing)/(harmonic signal), both computed as sums of squared magnitudes
- DONE: add tanh and snake with different stretch factor
- DONE: create gated vs non gated config files
- DONE: run approximately 100 runs (different deterministic seeds) on all of them
- DONE: compute the argmax and average of the 100 runs for all the different activation functions
- DONE: Look specifically into the Tanh (non gated) and zoom in by adding more horizontal stretch factors (run about 100 seeds)
- DONE: Qualitative Analysis: Export the best ASR ESR model and audibly compare with a sine sweep (play around with how high the frequency goes), create before and after audio clips.
- DONE: Conjecture: For the best stretch factor model, increase the hidden layer and compute power
- DONE: Theme: Alias reduction through smoothness of activation functions
- Look into learnable parameters (for the whole network, per layer, or per neuron), look at kernel size, additional audio signal processing intuitions
- Lowpass before capturing and capture with higher sampling rate (adding headroom, bandlimit activation, cool with Snake)