Description
Current status
Currently, tabular data in methods such as `AnchorTabular` and `CounterfactualProto` are expected to be in one of a set of restricted formats, e.g.:

- `np.ndarray` of homogeneous number types where numerical columns are `float` and categorical columns are integer-encoded (to be pedantic, `float` that can be cast to `int` without loss of precision, e.g. `0.0` denoting the first category of a categorical feature).
- `np.ndarray` of homogeneous number types where numerical columns are `float` and categorical features have been expanded into one-hot-encoded columns (i.e. each categorical feature now occupies `n_categories` columns which are populated with `0` and `1` entries).
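For illustration, a minimal sketch of the two compliant representations for a toy dataset with one numerical feature and one categorical feature with two categories (the feature values are made up):

```python
import numpy as np

# integer-encoded representation: the categorical column holds the category index as a float
Z_ordinal = np.array([[49.5, 0.0],    # 0.0 denotes the first category
                      [31.0, 1.0]])   # 1.0 denotes the second category

# one-hot-encoded representation: the categorical feature occupies n_categories (= 2)
# columns populated with 0/1 entries
Z_ohe = np.array([[49.5, 1.0, 0.0],
                  [31.0, 0.0, 1.0]])
```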
Problem
If a user model is not trained on a data representation that is one of the above, then Alibi tabular explainers cannot be used out-of-the-box, which is undesirable (as found out by @FarrandTom).
For concreteness, denote by `X` an input data point that is non-compliant with the Alibi API, e.g. it could be an `np.ndarray` but with unsupported column types, for example `array([49.5, 'Male'], dtype=object)` representing a numerical feature and a string-encoded categorical variable. Further, denote by `Z` an input data point that is compliant with the Alibi API, e.g. `array([49.5, 0.])` representing the same numerical feature and the same but integer-encoded categorical variable.
A client may have a model `M` that's trained on non-compliant data, i.e. it would be of type `Callable[[X], np.ndarray]`, whereas Alibi expects a model `M_hat` (prediction function) of type `Callable[[Z], np.ndarray]`. How can we go from a non-compliant model to a compliant one?
The key is being able to map back and forth between `X` and `Z`. Let `f: X -> Z` be such an invertible mapping; for the example above it would be something like:
```python
def f(X: np.ndarray, **kwargs) -> np.ndarray:  # use **kwargs for any other information needed to do the conversion
    Z_num = extract_numeric(X, **kwargs)  # extract columns like 49.5, Z_num is now a homogeneous array of numbers
    Z_cat = extract_cat(X, **kwargs)      # take columns like 'Male' and convert to 0, Z_cat is now a homogeneous array of numbers
    Z = combine(Z_num, Z_cat, **kwargs)   # concatenate columns in the right order
    return Z

def f_inv(Z: np.ndarray, **kwargs) -> np.ndarray:
    X = ...  # do similar operations as above, but in reverse (e.g. map 0 back to 'Male')
    return X
```
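As a concrete (hypothetical) instance of the above for the two-feature example, assuming the category-to-integer mapping for the single categorical column is known up front and dropping the `**kwargs` for brevity:

```python
import numpy as np

CAT_MAP = {'Male': 0, 'Female': 1}                  # assumed encoding of the categorical column
INV_CAT_MAP = {v: k for k, v in CAT_MAP.items()}

def f(X: np.ndarray) -> np.ndarray:
    # X is e.g. array([49.5, 'Male'], dtype=object)
    Z_num = np.array([float(X[0])])
    Z_cat = np.array([float(CAT_MAP[X[1]])])        # 'Male' -> 0.0
    return np.concatenate([Z_num, Z_cat])           # array([49.5, 0.])

def f_inv(Z: np.ndarray) -> np.ndarray:
    # Z is e.g. array([49.5, 0.])
    X_num = np.array([Z[0]], dtype=object)
    X_cat = np.array([INV_CAT_MAP[int(Z[1])]], dtype=object)  # 0.0 -> 'Male'
    return np.concatenate([X_num, X_cat])           # array([49.5, 'Male'], dtype=object)
```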
With this extra information we can define an Alibi-compliant model in terms of the client model `M` and the inverse mapping `f_inv` as follows: `M_hat(Z) = M(f_inv(Z))`, or in Python:
```python
def M_hat(Z: np.ndarray) -> np.ndarray:
    X = f_inv(Z)  # map compliant data back to the representation the client model was trained on
    pred = M(X)   # call the client model on its native data representation
    return pred
```
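The wrapped prediction function can then be passed to an explainer as usual. A rough sketch (the exact `AnchorTabular` arguments depend on the Alibi version and the dataset; `Z_train`, the feature names and the category names below are assumptions for the running example):

```python
from alibi.explainers import AnchorTabular

explainer = AnchorTabular(M_hat,                             # Alibi-compliant prediction function
                          feature_names=['age', 'sex'],      # assumed feature names
                          categorical_names={1: ['Male', 'Female']})
explainer.fit(Z_train)                 # Z_train: training data already mapped through f
explanation = explainer.explain(f(X))  # explain a non-compliant instance by mapping it first
```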
What we can do
- (Minimum effort) Document the process of manually creating the `f`, `f_inv` transformations to make an Alibi-non-compliant model into a compliant one; specific examples are needed. This is in spirit the same as the discussion on white-box vs black-box models. This has short term gains, demonstrating that in principle it should always be possible to use Alibi explainers if the user is prepared to do a bit more work. It is unclear, however, if this translates well to the deployment setting.
- (Medium effort) Extend the types of data that Alibi can handle natively. This likely requires some design considerations and may require the user to provide more information if they pass heterogeneous `np.ndarray` data (see the sketch after this list). It may be useful to extend this to also take in `pd.DataFrame` and/or `pd.Series` objects as necessary. This has longer term gains as the user would no longer need to do extra work in case their model is trained on a different data representation.
- (Medium-high effort?) Extend capabilities on the deployment side. See below for an explanation. This has longer term gains, making explainer deployment more flexible with respect to pointing to a component within an inference graph.
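To make the second option slightly more concrete, here is a sketch of the kind of metadata Alibi would need to infer (or be given) to handle a `pd.DataFrame` natively; the helper name is hypothetical:

```python
import pandas as pd

def infer_category_map(df: pd.DataFrame) -> dict:
    # map column index -> list of categories for every non-numeric column;
    # this is the information Alibi currently expects the user to supply by hand
    return {i: sorted(df[col].dropna().unique().tolist())
            for i, col in enumerate(df.columns)
            if df[col].dtype == object or str(df[col].dtype) == 'category'}

df = pd.DataFrame({'age': [49.5, 31.0], 'sex': ['Male', 'Female']})
infer_category_map(df)  # {1: ['Female', 'Male']}
```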
What about deployment?
In deployment we may have the following situation, where the inference graph consists of a transformer mapping Alibi-non-compliant data `X` to compliant data `Z`, which is then passed into an Alibi-compliant model:
How could we add an explainer to this inference graph?
Point directly to the model component
If we know the model component is Alibi-compliant, we could point the explainer to that instead of the whole inference graph (which is non-compliant):
However, note that in this scenario the explainer expects the compliant data type `Z` whilst the inference graph operates on the original data type `X`. To obtain `Z` from `X` we would need to leverage the existing transformer, so we could extend the inference graph like this (conceptually, implementation details may vary):
The only job of the `Explainer-Transformer` component is to call an existing transformer that is known (by the user) to transform non-compliant data `X` into compliant data `Z`.
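Conceptually, such a component could be as thin as the following sketch (the class and method names are illustrative and not tied to a particular serving framework; the existing transformer is assumed to expose a `transform` method):

```python
class ExplainerTransformer:
    def __init__(self, transformer, explainer):
        self.transformer = transformer  # existing X -> Z transformer from the inference graph
        self.explainer = explainer      # Alibi explainer pointed at the compliant model

    def explain(self, X):
        Z = self.transformer.transform(X)  # reuse the existing X -> Z mapping
        return self.explainer.explain(Z)   # explain on the compliant representation
```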
Non-compliant models within an inference graph
Not all inference graphs contain a model node that would be Alibi-compliant, so in the general case the above would not work and it would be necessary to either:
- Extend Alibi-compliant data types to support a wide variety of use cases / inference graphs
- Have the user add extra nodes to the inference graph, transforming non-compliant data types to Alibi-compliant ones. This is essentially the same as defining the transformation steps `f` and `f_inv`, but also packaging them as inference graph components:
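For instance (a sketch only, using a Seldon-Core-style `transform_input` hook as an example; the exact wrapper API depends on the serving framework), `f` and `f_inv` could each be packaged as a transformer node:

```python
class ToCompliantTransformer:
    """Graph node mapping non-compliant X to Alibi-compliant Z."""
    def transform_input(self, X, features_names=None):
        return f(X)

class ToClientTransformer:
    """Graph node mapping compliant Z back to the client model's representation."""
    def transform_input(self, Z, features_names=None):
        return f_inv(Z)
```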
Tagging a few people who may be interested in the discussion: @FarrandTom @cliveseldon @axsaucedo @SachinVarghese @arnaudvl .