Private model training (DP-SGD) with sparse features #1370
Description
Hello,
Private model training was recently mentioned here. One of the privacy considerations is incorporating DP into the training loop via DP-SGD.
There are cases where DP-SGD makes training considerably slower because it destroys the sparsity of the gradients computed during backprop, making it impossible to use optimization techniques that rely on that sparsity. This typically happens when some features are categorical, or when working with embedding tables as in LLMs. I am aware there is research aimed at remedying this, but it is not clear from the explainer linked above whether it has been considered in the context of the Protected Audience API.
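To make the concern concrete, here is a minimal sketch (using NumPy, with made-up shapes and hyperparameters) of why DP-SGD densifies an embedding-table gradient: only a few rows are non-zero before clipping, but adding isotropic Gaussian noise to every coordinate leaves essentially no zeros afterwards.

```python
import numpy as np

# Hypothetical example: gradient of an embedding table after one training
# step. Only the rows for the categorical IDs seen in this example are
# non-zero, so the gradient is sparse.
vocab_size, dim = 10, 4
grad = np.zeros((vocab_size, dim))
grad[[2, 7]] = 1.0  # only two embedding rows were touched

sparsity_before = np.mean(grad == 0)

# DP-SGD step (illustrative values): clip the per-example gradient to a
# fixed L2 norm, then add Gaussian noise to EVERY coordinate. The noise
# is dense, so the resulting update is dense too.
clip_norm, noise_multiplier = 1.0, 1.1
grad = grad * min(1.0, clip_norm / np.linalg.norm(grad))
rng = np.random.default_rng(0)
noisy_grad = grad + rng.normal(0.0, noise_multiplier * clip_norm, grad.shape)

sparsity_after = np.mean(noisy_grad == 0)
print(sparsity_before, sparsity_after)  # 0.8 before, 0.0 after
```

In a real workload the table has millions of rows, so losing the ability to apply a sparse (rows-only) update turns a cheap scatter into a full dense write every step.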
Are there any techniques under consideration to address this, or any thoughts on the topic?
Thanks