
Convergence of LayerIntegratedGradients #1413

Open
@pakaphholbig

Description

I was following this demonstration to interpret my DistilBert-based model. I found that, in a few cases, IG does not converge even with a high n_steps value. However, if I swap the input and the baseline, expecting the sum of IG across all dimensions to have the opposite sign of the original, it does converge.
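My expectation about the sign comes from the completeness property of IG: the attributions summed over all dimensions should equal the difference in model outputs between the input and the baseline, so swapping the two should only flip the sign:

$$\sum_i \mathrm{IG}_i(x, x') = F(x) - F(x'), \qquad \sum_i \mathrm{IG}_i(x', x) = F(x') - F(x) = -\bigl(F(x) - F(x')\bigr)$$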

Here is a part of my code:

from captum.attr import LayerIntegratedGradients

# Attribute with respect to the DistilBert embedding layer
lig = LayerIntegratedGradients(forward_pass, model.distilbert.embeddings)

# Original direction: actual input vs. reference baseline
attributions_1, delta_1 = lig.attribute(
    inputs=(input_ids_1, token_type_ids_1, attention_mask_1),
    baselines=(ref_input_ids, ref_token_type_ids, ref_attention_mask),
    internal_batch_size=15,
    n_steps=n_steps,
    return_convergence_delta=True)

# Swapped direction: reference as input, actual input as baseline
attributions_ref, delta_ref = lig.attribute(
    inputs=(ref_input_ids, ref_token_type_ids, ref_attention_mask),
    baselines=(input_ids_1, token_type_ids_1, attention_mask_1),
    internal_batch_size=15,
    n_steps=n_steps,
    return_convergence_delta=True)

The reference (baseline) input consists only of the start, end, and padding tokens.

I found that they converge to different values: for n_steps = 300, delta_1 = -0.001 and delta_ref = 0.387. Even after increasing n_steps to 900, the deltas remain essentially unchanged. Is there an explanation for this?
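For context, the convergence delta reported by Captum is the gap between the summed attributions and the output difference from the completeness relation above, so it can be checked by hand. A minimal sketch, assuming forward_pass accepts the same three tensors positionally, returns one scalar score per example, and the batch holds a single example:

# Manual completeness check: the summed attributions should approximate
# forward_pass(input) - forward_pass(baseline); the gap is the convergence delta.
attr_sum = attributions_1.sum()

out_input = forward_pass(input_ids_1, token_type_ids_1, attention_mask_1)
out_baseline = forward_pass(ref_input_ids, ref_token_type_ids, ref_attention_mask)

# .item() assumes a single example in the batch
manual_delta = attr_sum - (out_input - out_baseline)
print(f"sum of attributions: {attr_sum.item():.4f}")
print(f"output difference:   {(out_input - out_baseline).item():.4f}")
print(f"manual delta:        {manual_delta.item():.4f}")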

*Note: my predict and forward_pass functions are defined in the same way as squad_pos_forward_func in the tutorial.
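A rough sketch of this kind of forward function (the sequence-classification head, the target class index 1, and the use of softmax here are only illustrative, not my exact code):

import torch

def predict(input_ids, token_type_ids=None, attention_mask=None):
    # DistilBert does not use token_type_ids, so only the ids and mask are passed
    return model(input_ids, attention_mask=attention_mask).logits

def forward_pass(input_ids, token_type_ids=None, attention_mask=None):
    # Reduce the logits to a single scalar per example (here: probability of
    # class 1, chosen only for illustration) so IG attributes that score
    logits = predict(input_ids, token_type_ids, attention_mask)
    return torch.softmax(logits, dim=-1)[:, 1]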
