Commit b795e83

aobo-y authored and facebook-github-bot committed
update influence docstring (#881)
Summary: as title
Pull Request resolved: #881
Reviewed By: 99warriors
Differential Revision: D34537725
Pulled By: aobo-y
fbshipit-source-id: fd4671032ac7824be66ea42194d1468718ba04d3
1 parent 8cc99fe commit b795e83

3 files changed, +102 -95 lines changed

captum/influence/_core/similarity_influence.py

+22 -22
@@ -109,16 +109,15 @@ def __init__(
  we pass in a Tensor with 3 examples, i.e. batch_size_2 = 3. Also,
  suppose that our inputs and intermediate activations throughout the
  model will have dimension (N, C, H, W). Then, the feature dimensions
- should be flattened within this function. For example, let
+ should be flattened within this function. For example::

- ```
- av_test.shape = torch.Size([3, N, C, H, W])
- av_src.shape = torch.Size([16, N, C, H, W])
- ```
-
- Then, using `torch.view(av_test.shape[0], -1)` we have
-
- `av_test.shape = torch.Size([3, N x C x H x W])
+ >>> av_test.shape
+ torch.Size([3, N, C, H, W])
+ >>> av_src.shape
+ torch.Size([16, N, C, H, W])
+ >>> av_test = torch.view(av_test.shape[0], -1)
+ >>> av_test.shape
+ torch.Size([3, N x C x H x W])

  and similarly for av_src. The similarity_metric should then use
  these flattened tensors to return the pairwise similarity matrix.
@@ -172,7 +171,7 @@ def influence( # type: ignore[override]
  first dimension in `inputs` tensor or tuple of tensors corresponds
  to the batch size. A tuple of tensors is only passed in if this
  is the input form that `module` accepts.
- top_k (int): The number of top-matchinig activations to return
+ top_k (int): The number of top-matching activations to return
  additional_forward_args (optional): Additional arguments that will be
  passed to forward_func after inputs.
  load_src_from_disk (bool): Loads activations for `influence_src_dataset`
@@ -190,18 +189,19 @@ def influence( # type: ignore[override]
  implementation of `DataInfluence` abstract class.

  Returns:
- influences (dictionary): Returns the influential instances retrieved from
- `influence_src_dataset` for each test example represented through a
- tensor or a tuple of tensor in `inputs`. Returned influential
- examples are represented as dict, with keys corresponding to
- the layer names passed in `layers`. Each value in the dict is a
- tuple containing the indices and values for the top k similarities
- from `influence_src_dataset` by the chosen metric. The first value
- in the tuple corresponds to the indices corresponding to the top k
- most similar examples, and the second value is the similarity score.
- The batch dimension corresponds to the batch dimension of `inputs`.
- If inputs.shape[0] == 5, then dict[`layer_name`][0].shape[0] == 5.
- These tensors will be of shape (inputs.shape[0], top_k).
+
+ influences (dict): Returns the influential instances retrieved from
+ `influence_src_dataset` for each test example represented through a
+ tensor or a tuple of tensor in `inputs`. Returned influential
+ examples are represented as dict, with keys corresponding to
+ the layer names passed in `layers`. Each value in the dict is a
+ tuple containing the indices and values for the top k similarities
+ from `influence_src_dataset` by the chosen metric. The first value
+ in the tuple corresponds to the indices corresponding to the top k
+ most similar examples, and the second value is the similarity score.
+ The batch dimension corresponds to the batch dimension of `inputs`.
+ If inputs.shape[0] == 5, then dict[`layer_name`][0].shape[0] == 5.
+ These tensors will be of shape (inputs.shape[0], top_k).
  """
  inputs_batch_size = (
  inputs[0].shape[0] if isinstance(inputs, tuple) else inputs.shape[0]
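
Reviewer note: the flattening step and the top-k return format described in the docstrings above can be sketched roughly as follows. This is a minimal illustration in plain Python, with nested lists standing in for tensors so that it runs without PyTorch. The helper names `flatten`, `cosine`, and `top_k_similar`, the layer name `layer1`, and the choice of cosine similarity are assumptions made for the example and are not taken from Captum. The docstring's `torch.view(av_test.shape[0], -1)` reads as shorthand; on real tensors the equivalent call would be the tensor method `av_test.view(av_test.shape[0], -1)`.

```python
import math
from typing import List, Sequence, Tuple


def flatten(example) -> List[float]:
    """Recursively flatten one example's nested feature dimensions into a 1D list."""
    if isinstance(example, (int, float)):
        return [float(example)]
    flat: List[float] = []
    for item in example:
        flat.extend(flatten(item))
    return flat


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two flattened activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(av_test, av_src, top_k: int) -> Tuple[List[List[int]], List[List[float]]]:
    """Return (indices, values), each with one row per test example and top_k columns."""
    test_flat = [flatten(x) for x in av_test]  # batch dimension kept, features flattened
    src_flat = [flatten(x) for x in av_src]
    indices, values = [], []
    for t in test_flat:
        scores = [cosine(t, s) for s in src_flat]
        # Source indices ordered from most to least similar, truncated to top_k.
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
        indices.append(order)
        values.append([scores[j] for j in order])
    return indices, values


# Two test examples and three source examples, each with 2 x 2 feature dimensions.
av_test = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
av_src = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

# Keys are layer names; each value is the (indices, values) tuple for that layer.
influences = {"layer1": top_k_similar(av_test, av_src, top_k=2)}
```

Here `influences["layer1"][0]` has one row per test example and `top_k` entries per row, matching the `(inputs.shape[0], top_k)` shape the docstring states.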

captum/influence/_core/tracincp.py

+43 -41
@@ -249,25 +249,26 @@ def influence( # type: ignore[override]
  ) -> Union[Tensor, KMostInfluentialResults]:
  r"""
  This is the key method of this class, and can be run in 3 different modes,
- where the mode that is run depends on the arguments passed to this method.
+ where the mode that is run depends on the arguments passed to this method:
+
  - self influence mode: This mode is used if `inputs` is None. This mode
- computes the self influence scores for every example in
- the training dataset `influence_src_dataset`.
+ computes the self influence scores for every example in
+ the training dataset `influence_src_dataset`.
  - influence score mode: This mode is used if `inputs` is not None, and `k` is
- None. This mode computes the influence score of every example in
- training dataset `influence_src_dataset` on every example in the test
- batch represented by `inputs` and `targets`.
+ None. This mode computes the influence score of every example in
+ training dataset `influence_src_dataset` on every example in the test
+ batch represented by `inputs` and `targets`.
  - k-most influential mode: This mode is used if `inputs` is not None, and
- `k` is not None, and an int. This mode computes the proponents or
- opponents of every example in the test batch represented by `inputs`
- and `targets`. In particular, for each test example in the test batch,
- this mode computes its proponents (resp. opponents), which are the
- indices in the training dataset `influence_src_dataset` of the training
- examples with the `k` highest (resp. lowest) influence scores on the
- test example. Proponents are computed if `proponents` is True.
- Otherwise, opponents are computed. For each test example, this method
- also returns the actual influence score of each proponent (resp.
- opponent) on the test example.
+ `k` is not None, and an int. This mode computes the proponents or
+ opponents of every example in the test batch represented by `inputs`
+ and `targets`. In particular, for each test example in the test batch,
+ this mode computes its proponents (resp. opponents), which are the
+ indices in the training dataset `influence_src_dataset` of the training
+ examples with the `k` highest (resp. lowest) influence scores on the
+ test example. Proponents are computed if `proponents` is True.
+ Otherwise, opponents are computed. For each test example, this method
+ also returns the actual influence score of each proponent (resp.
+ opponent) on the test example.

  Args:
  inputs (Any, optional): If not provided or `None`, the self influence mode
@@ -300,33 +301,34 @@ def influence( # type: ignore[override]

  Returns:
  The return value of this method depends on which mode is run.
+
  - self influence mode: if this mode is run (`inputs` is None), returns a 1D
- tensor of self influence scores over training dataset
- `influence_src_dataset`. The length of this tensor is the number of
- examples in `influence_src_dataset`, regardless of whether it is a
- Dataset or DataLoader.
+ tensor of self influence scores over training dataset
+ `influence_src_dataset`. The length of this tensor is the number of
+ examples in `influence_src_dataset`, regardless of whether it is a
+ Dataset or DataLoader.
  - influence score mode: if this mode is run (`inputs is not None, `k` is
- None), returns a 2D tensor `influence_scores` of shape
- `(input_size, influence_src_dataset_size)`, where `input_size` is
- the number of examples in the test batch, and
- `influence_src_dataset_size` is the number of examples in
- training dataset `influence_src_dataset`. In other words,
- `influence_scores[i][j]` is the influence score of the `j`-th
- example in `influence_src_dataset` on the `i`-th example in the
- test batch.
- - k-most influential mode: if this mode is run (`inputs` is not None,
- `k` is an int), returns a namedtuple `(indices, influence_scores)`.
- `indices` is a 2D tensor of shape `(input_size, k)`, where
- `input_size` is the number of examples in the test batch. If
- computing proponents (resp. opponents), `indices[i][j]` is the
- index in training dataset `influence_src_dataset` of the example
- with the `j`-th highest (resp. lowest) influence score (out of the
- examples in `influence_src_dataset`) on the `i`-th example in the
- test batch. `influence_scores` contains the corresponding influence
- scores. In particular, `influence_scores[i][j]` is the influence
- score of example `indices[i][j]` in `influence_src_dataset` on
- example `i` in the test batch represented by `inputs` and
- `targets`.
+ None), returns a 2D tensor `influence_scores` of shape
+ `(input_size, influence_src_dataset_size)`, where `input_size` is
+ the number of examples in the test batch, and
+ `influence_src_dataset_size` is the number of examples in
+ training dataset `influence_src_dataset`. In other words,
+ `influence_scores[i][j]` is the influence score of the `j`-th
+ example in `influence_src_dataset` on the `i`-th example in the
+ test batch.
+ - k-most influential mode: if this mode is run (`inputs` is not None,
+ `k` is an int), returns a namedtuple `(indices, influence_scores)`.
+ `indices` is a 2D tensor of shape `(input_size, k)`, where
+ `input_size` is the number of examples in the test batch. If
+ computing proponents (resp. opponents), `indices[i][j]` is the
+ index in training dataset `influence_src_dataset` of the example
+ with the `j`-th highest (resp. lowest) influence score (out of the
+ examples in `influence_src_dataset`) on the `i`-th example in the
+ test batch. `influence_scores` contains the corresponding influence
+ scores. In particular, `influence_scores[i][j]` is the influence
+ score of example `indices[i][j]` in `influence_src_dataset` on
+ example `i` in the test batch represented by `inputs` and
+ `targets`.
  """
  _inputs = _format_inputs(inputs, unpack_inputs)

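
Reviewer note: the three modes and the k-most influential return value described in the docstring above can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not Captum's implementation: `select_mode`, `k_most_influential`, and the `KMostInfluential` namedtuple are hypothetical names, and nested lists stand in for the 2D `influence_scores` tensor.

```python
from typing import Any, List, NamedTuple, Optional


class KMostInfluential(NamedTuple):
    # Hypothetical stand-in for the namedtuple `(indices, influence_scores)`
    # that the docstring describes for k-most influential mode.
    indices: List[List[int]]
    influence_scores: List[List[float]]


def select_mode(inputs: Any, k: Optional[int]) -> str:
    """Name the mode implied by the arguments, following the docstring's rules."""
    if inputs is None:
        return "self influence"
    if k is None:
        return "influence score"
    return "k-most influential"


def k_most_influential(
    scores: List[List[float]], k: int, proponents: bool = True
) -> KMostInfluential:
    """For each test example (row), pick the k highest (proponents) or
    k lowest (opponents) scoring training examples."""
    indices, top_scores = [], []
    for row in scores:
        # Descending order for proponents, ascending order for opponents.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=proponents)[:k]
        indices.append(order)
        top_scores.append([row[j] for j in order])
    return KMostInfluential(indices, top_scores)


# influence_scores[i][j]: influence of training example j on test example i,
# so the shape here is (input_size=2, influence_src_dataset_size=4).
influence_scores = [[0.2, -0.5, 0.9, 0.1], [-0.3, 0.4, 0.0, -0.8]]
top_proponents = k_most_influential(influence_scores, k=2)
top_opponents = k_most_influential(influence_scores, k=2, proponents=False)
```

In this sketch `top_proponents.indices[i][j]` is the training index with the `j`-th highest score for test example `i`, and `top_proponents.influence_scores[i][j]` is the score of that same training example, which mirrors the pairing the docstring spells out.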

captum/influence/_core/tracincp_fast_rand_proj.py

+37 -32
@@ -146,7 +146,7 @@ def __init__(

  self.vectorize = vectorize

- "TODO: restore prior state"
+ # TODO: restore prior state
  self.final_fc_layer = final_fc_layer
  if isinstance(self.final_fc_layer, str):
  self.final_fc_layer = _get_module_from_name(model, self.final_fc_layer)
@@ -673,22 +673,25 @@ def influence( # type: ignore[override]
  ) -> Union[Tensor, KMostInfluentialResults]:
  r"""
  This is the key method of this class, and can be run in 2 different modes,
- where the mode that is run depends on the arguments passed to this method.
+ where the mode that is run depends on the arguments passed to this method
+
  - influence score mode: This mode is used if `inputs` is not None, and `k` is
- None. This mode computes the influence score of every example in
- training dataset `influence_src_dataset` on every example in the test
- batch represented by `inputs` and `targets`.
+ None. This mode computes the influence score of every example in
+ training dataset `influence_src_dataset` on every example in the test
+ batch represented by `inputs` and `targets`.
+
  - k-most influential mode: This mode is used if `inputs` is not None, and
- `k` is not None, and an int. This mode computes the proponents or
- opponents of every example in the test batch represented by `inputs`
- and `targets`. In particular, for each test example in the test batch,
- this mode computes its proponents (resp. opponents), which are the
- indices in the training dataset `influence_src_dataset` of the training
- examples with the `k` highest (resp. lowest) influence scores on the
- test example. Proponents are computed if `proponents` is True.
- Otherwise, opponents are computed. For each test example, this method
- also returns the actual influence score of each proponent (resp.
- opponent) on the test example.
+ `k` is not None, and an int. This mode computes the proponents or
+ opponents of every example in the test batch represented by `inputs`
+ and `targets`. In particular, for each test example in the test batch,
+ this mode computes its proponents (resp. opponents), which are the
+ indices in the training dataset `influence_src_dataset` of the training
+ examples with the `k` highest (resp. lowest) influence scores on the
+ test example. Proponents are computed if `proponents` is True.
+ Otherwise, opponents are computed. For each test example, this method
+ also returns the actual influence score of each proponent (resp.
+ opponent) on the test example.
+
  Note that unlike `TracInCPFast`, this class should *not* be run in self
  influence mode. To compute self influence scores when only considering
  gradients in the last fully-connected layer, please use `TracInCPFast` instead.
@@ -723,24 +726,26 @@ def influence( # type: ignore[override]
  Default: True

  Returns:
- The return value of this method depends on which mode is run.
+
+ The return value of this method depends on which mode is run
+
  - influence score mode: if this mode is run (`inputs is not None, `k` is
- None), returns a 2D tensor `influence_scores` of shape
- `(input_size, influence_src_dataset_size)`, where `input_size` is
- the number of examples in the test batch, and
- `influence_src_dataset_size` is the number of examples in
- training dataset `influence_src_dataset`. In other words,
- `influence_scores[i][j]` is the influence score of the `j`-th
- example in `influence_src_dataset` on the `i`-th example in the
- test batch.
- - k-most influential mode: if this mode is run (`inputs` is not None,
- `k` is an int), returns `indices`, which is a 2D tensor of shape
- `(input_size, k)`, where `input_size` is the number of examples
- in the test batch. If computing proponents (resp. opponents),
- `indices[i][j]` is the index in training dataset
- `influence_src_dataset` of the example with the `j`-th highest
- (resp. lowest) influence score (out of the examples in
- `influence_src_dataset`) on the `i`-th example in the test batch.
+ None), returns a 2D tensor `influence_scores` of shape
+ `(input_size, influence_src_dataset_size)`, where `input_size` is
+ the number of examples in the test batch, and
+ `influence_src_dataset_size` is the number of examples in
+ training dataset `influence_src_dataset`. In other words,
+ `influence_scores[i][j]` is the influence score of the `j`-th
+ example in `influence_src_dataset` on the `i`-th example in the
+ test batch.
+ - most influential mode: if this mode is run (`inputs` is not None,
+ `k` is an int), returns `indices`, which is a 2D tensor of shape
+ `(input_size, k)`, where `input_size` is the number of examples
+ in the test batch. If computing proponents (resp. opponents),
+ `indices[i][j]` is the index in training dataset
+ `influence_src_dataset` of the example with the `j`-th highest
+ (resp. lowest) influence score (out of the examples in
+ `influence_src_dataset`) on the `i`-th example in the test batch.
  """
  msg = (
  "Since `inputs` is None, this suggests `TracInCPFastRandProj` is being "

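Reviewer note: unlike `TracInCP`, the docstring for `TracInCPFastRandProj.influence` lists only two modes and says self influence should not be run on this class. A rough sketch of that argument check is below. The function name `check_mode`, the use of `ValueError`, and the error text are assumptions for illustration; the diff shows only the start of the real message and does not show how it is raised.

```python
from typing import Any, Optional


def check_mode(inputs: Any, k: Optional[int]) -> str:
    """Return the mode implied by the arguments, rejecting self influence mode."""
    if inputs is None:
        # Assumed behavior: the docstring says this class should not be run in
        # self influence mode and points to `TracInCPFast` instead.
        raise ValueError(
            "`inputs` is None, which would imply self influence mode; "
            "use `TracInCPFast` for self influence scores."
        )
    return "influence score" if k is None else "k-most influential"


# A non-None test batch with k=None selects influence score mode.
mode_without_k = check_mode(inputs=[[0.1, 0.2]], k=None)
# The same batch with an integer k selects k-most influential mode.
mode_with_k = check_mode(inputs=[[0.1, 0.2]], k=5)
```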