Is your feature request related to a problem? Please describe.
I think there are performance gains to be had by not densifying minibatch inputs when possible, and instead running backprop directly on the sparse matrix yielded by the loader. At the sparsity level we often see (~2%), in RNA-seq data at least, the notebook linked below shows a 2X speedup for an MLP classifier. IIUC, the same trick applies to the loss function and the ELBO, i.e., use the sparse matrix directly instead of densifying it.
It's possible (likely) that at higher densities this benefit shrinks or disappears entirely.
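To make the mechanism concrete, here is a minimal pure-Python sketch of a linear layer and a loss term that work on a CSR minibatch without densifying it. It is not the pytorch_sparse API, and the `CSR` helper and function names are hypothetical. The forward product and the weight gradient both iterate only over stored nonzeros, so their cost scales with nnz rather than rows × columns, which is also why the gain should fade as density rises. The Poisson log-likelihood is an assumed stand-in for whatever likelihood the ELBO actually uses: only its data-dependent terms can skip the zeros, while the `-mu` term still needs the full decoder output.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class CSR:
    """Minimal compressed-sparse-row matrix (hypothetical helper, not a library type)."""
    n_rows: int
    n_cols: int
    indptr: List[int]   # row i's entries live in data[indptr[i]:indptr[i+1]]
    indices: List[int]  # column index of each stored entry
    data: List[float]   # stored nonzero values

    def density(self) -> float:
        return len(self.data) / (self.n_rows * self.n_cols)


def sparse_linear_forward(x: CSR, w: List[List[float]], b: List[float]) -> List[List[float]]:
    """y = x @ w + b, touching only the stored nonzeros of x (cost ~ nnz * out_features)."""
    n_out = len(b)
    y = [list(b) for _ in range(x.n_rows)]
    for i in range(x.n_rows):
        for k in range(x.indptr[i], x.indptr[i + 1]):
            j, v = x.indices[k], x.data[k]
            row_w = w[j]
            for o in range(n_out):
                y[i][o] += v * row_w[o]
    return y


def sparse_linear_backward_w(x: CSR, grad_y: List[List[float]]) -> List[List[float]]:
    """dL/dw = x^T @ grad_y; only rows of w hit by a nonzero input get a gradient."""
    n_out = len(grad_y[0])
    grad_w = [[0.0] * n_out for _ in range(x.n_cols)]
    for i in range(x.n_rows):
        for k in range(x.indptr[i], x.indptr[i + 1]):
            j, v = x.indices[k], x.data[k]
            for o in range(n_out):
                grad_w[j][o] += v * grad_y[i][o]
    return grad_w


def poisson_log_lik(x: CSR, mu: List[List[float]]) -> float:
    """Sum over all entries of x*log(mu) - mu - lgamma(x+1).

    Both x-dependent terms are zero where x == 0, so they are summed over the
    nonzeros only; the -mu term still needs the full (dense) decoder output.
    """
    total = -sum(sum(row) for row in mu)
    for i in range(x.n_rows):
        for k in range(x.indptr[i], x.indptr[i + 1]):
            j, v = x.indices[k], x.data[k]
            total += v * math.log(mu[i][j]) - math.lgamma(v + 1.0)
    return total
```

For example, the 2×3 batch `[[0, 0, 3], [1, 0, 0]]` is `CSR(2, 3, [0, 1, 2], [2, 0], [3.0, 1.0])`, and only its two stored entries are visited in each function. In PyTorch the analogous dense-weight product on a sparse input would be something like `torch.sparse.mm`, which also supports backward.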
See https://colab.research.google.com/drive/14cjQkQ2lO9wT7BpcfLfYCUY40GncXesh?authuser=3 for something runnable, though the timings may be dated given Colab's antiquated GPU.
The implementation is based on https://github.com/rusty1s/pytorch_sparse
Describe the solution you'd like
I think the answer is a runtime setting (an enum) with three options:
- auto: tries to detect sparsity and, once it passes some cutoff, uses the sparse accelerator
- sparse_direct: replaces the first layer with a sparse-linear layer (i.e., one that does matmul on a sparse input), and the other applicable locations with their respective sparse ops
- densify: densifies all inputs
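The three-way setting could look roughly like the sketch below. `SparseMode`, `resolve_mode`, and the 10% `DEFAULT_DENSITY_CUTOFF` are hypothetical placeholders, not an existing API, and the real cutoff would have to come from benchmarking.

```python
from enum import Enum


class SparseMode(Enum):
    """Hypothetical runtime setting mirroring the three proposed options."""
    AUTO = "auto"
    SPARSE_DIRECT = "sparse_direct"
    DENSIFY = "densify"


# Hypothetical cutoff: take the sparse path only when at most 10% of entries are
# nonzero. The gain likely shrinks as density grows, so this needs benchmarking.
DEFAULT_DENSITY_CUTOFF = 0.10


def resolve_mode(mode: SparseMode, nnz: int, n_rows: int, n_cols: int,
                 cutoff: float = DEFAULT_DENSITY_CUTOFF) -> SparseMode:
    """Map the user-facing setting to the concrete path used for one minibatch."""
    if mode is not SparseMode.AUTO:
        return mode  # explicit choices are honored as-is
    # Treat an empty shape as fully dense so we fall back to the safe path.
    density = nnz / (n_rows * n_cols) if n_rows and n_cols else 1.0
    return SparseMode.SPARSE_DIRECT if density <= cutoff else SparseMode.DENSIFY
```

For instance, `resolve_mode(SparseMode("auto"), nnz=2_000, n_rows=100, n_cols=1_000)` sees a density of 0.02 and picks `SPARSE_DIRECT`. Resolving per minibatch lets auto adapt if density varies across batches; resolving once from the whole dataset's density would instead avoid switching code paths mid-training.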