Question about the randomness of Gumbel-Softmax in Differentiable Reward Optimization (DiffRO)

I am currently reproducing the Cosy3 model and have a question regarding the DiffRO process described in your paper.

When I apply Gumbel-Softmax, the results change from run to run because of the random Gumbel noise, even though the input features are identical. Could you share any tips on how to handle this randomness? For example, would adjusting the Gumbel-Softmax temperature help?
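
For reference, here is a minimal sketch of the behavior in question, written in plain Python rather than taken from the DiffRO code. The helper `gumbel_softmax`, the example logits, and the seed values are all illustrative assumptions. It shows that the output varies with the sampled noise, that seeding the random generator makes a draw repeatable, and that the temperature only changes how sharp the sample is; it does not remove the randomness.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return one relaxed sample: softmax((logits + g) / tau), g ~ Gumbel(0, 1).

    Hypothetical helper for illustration only, not the paper's implementation.
    """
    noisy = []
    for x in logits:
        # Keep u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))  # inverse-transform Gumbel(0, 1) sample
        noisy.append((x + g) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative values

# Same seed -> same noise -> identical output.
same_seed_a = gumbel_softmax(logits, tau=1.0, rng=random.Random(0))
same_seed_b = gumbel_softmax(logits, tau=1.0, rng=random.Random(0))

# Different seed -> different noise -> different output for the same input.
other_seed = gumbel_softmax(logits, tau=1.0, rng=random.Random(1))

# Same noise, different temperature: low tau is close to one-hot,
# high tau is close to uniform.
low_tau = gumbel_softmax(logits, tau=0.1, rng=random.Random(0))
high_tau = gumbel_softmax(logits, tau=10.0, rng=random.Random(0))

print(same_seed_a == same_seed_b)
print(same_seed_a, other_seed)
print(low_tau, high_tau)
```

In PyTorch, the analogous steps would presumably be calling `torch.manual_seed` before sampling and passing `tau` to `torch.nn.functional.gumbel_softmax`; whether fixing the seed or averaging over several samples is the better fit for DiffRO is exactly what I am unsure about.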