Description
I was following this demonstration to interpret my DistilBERT-based model. I found that, in a few cases, IG does not converge even with a high `n_steps` value. However, if I swap the input and the baseline (expecting the sum of IG across all dimensions to flip sign relative to the original run), it does converge.
Here is a part of my code:
```python
from captum.attr import LayerIntegratedGradients

lig = LayerIntegratedGradients(forward_pass, model.distilbert.embeddings)

# Original direction: real input attributed against the reference baseline.
attributions_1, delta_1 = lig.attribute(
    inputs=(input_ids_1, token_type_ids_1, attention_mask_1),
    baselines=(ref_input_ids, ref_token_type_ids, ref_attention_mask),
    internal_batch_size=15,
    n_steps=n_steps,
    return_convergence_delta=True,
)

# Swapped direction: reference as the input, real input as the baseline.
attributions_ref, delta_ref = lig.attribute(
    inputs=(ref_input_ids, ref_token_type_ids, ref_attention_mask),
    baselines=(input_ids_1, token_type_ids_1, attention_mask_1),
    internal_batch_size=15,
    n_steps=n_steps,
    return_convergence_delta=True,
)
```
The reference (baseline) input consists only of the starting, ending, and padding tokens.
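For completeness, this is roughly how such a baseline can be built (a simplified sketch following the tutorial's pattern; the tokenizer attribute names are assumptions and may differ from my actual code):

```python
import torch

# Reference ids: keep the starting/ending tokens, fill the rest with padding.
cls_token_id = tokenizer.cls_token_id   # starting token
sep_token_id = tokenizer.sep_token_id   # ending token
pad_token_id = tokenizer.pad_token_id   # padding / reference token

def build_ref_input_ids(input_ids):
    seq_len = input_ids.size(1)
    ref = [cls_token_id] + [pad_token_id] * (seq_len - 2) + [sep_token_id]
    return torch.tensor([ref], device=input_ids.device)

ref_input_ids = build_ref_input_ids(input_ids_1)
```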
I found that they converge to different values. Specifically, for `n_steps = 300`, `delta_1 = -0.001` and `delta_ref = 0.387`. Even after increasing to `n_steps = 900`, the deltas remain essentially the same. I would like to ask if there is an explanation for this.
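To double-check the reported deltas, I also compare the summed attributions against the difference of the forward outputs directly (a minimal sketch, assuming `forward_pass` returns one scalar score per example):

```python
import torch

# Completeness axiom: sum of IG attributions ~= F(input) - F(baseline);
# the leftover difference is the convergence delta that Captum reports.
with torch.no_grad():
    score_input = forward_pass(input_ids_1, token_type_ids_1, attention_mask_1)
    score_ref = forward_pass(ref_input_ids, ref_token_type_ids, ref_attention_mask)

diff = (score_input - score_ref).sum()
print("sum of attributions_1  :", attributions_1.sum().item())
print("F(input) - F(baseline) :", diff.item())
print("manual delta_1         :", (attributions_1.sum() - diff).item())

# With input and baseline swapped, the total attribution is expected to flip sign.
print("sum of attributions_ref:", attributions_ref.sum().item())
```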
*Note: my `predict` and `forward_pass` functions are defined in the same way as the tutorial's `squad_pos_forward_func`.
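For reference, they look roughly like this (a simplified sketch mirroring `squad_pos_forward_func`; the exact head and the scalar I attribute against are placeholders):

```python
# Simplified, illustrative versions of predict / forward_pass.
def predict(input_ids, token_type_ids=None, attention_mask=None):
    output = model(input_ids, attention_mask=attention_mask)
    return output.logits

def forward_pass(input_ids, token_type_ids=None, attention_mask=None):
    logits = predict(input_ids, token_type_ids, attention_mask)
    # Reduce to one scalar per example, e.g. the maximum logit.
    return logits.max(dim=1).values
```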