
Support more diverse input types for methods working on tabular data #516

Open
@jklaise

Description


Current status

Currently, tabular data passed to methods such as AnchorTabular and CounterfactualProto is expected to be in one of a restricted set of formats, e.g.

  • np.ndarray of homogeneous number types where numerical columns are float and categorical columns are integer-encoded (to be pedantic, floats that can be cast to int without loss of precision, e.g. 0.0 denoting the first category of a categorical feature).
  • np.ndarray of homogeneous number types where numerical columns are float and categorical features have been expanded into one-hot-encoded columns (i.e. each categorical feature now occupies n_categories columns populated with 0 and 1 entries).
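For illustration, the two representations (and the conversion from one to the other) can be sketched with NumPy; the records and category count below are hypothetical:

```python
import numpy as np

# A hypothetical batch of two records: a numeric feature followed by a
# categorical feature with n_categories = 3, integer-encoded as 0, 1 or 2.
Z_ordinal = np.array([[49.5, 0.0],
                      [31.0, 2.0]])

# The same batch with the categorical feature expanded into one-hot
# columns, the second compliant representation described above:
n_categories = 3
ohe = np.eye(n_categories)[Z_ordinal[:, 1].astype(int)]
Z_ohe = np.hstack([Z_ordinal[:, [0]], ohe])
# Z_ohe:
# [[49.5, 1., 0., 0.],
#  [31. , 0., 0., 1.]]
```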

Problem

If a user's model is not trained on one of the above data representations, then Alibi tabular explainers cannot be used out-of-the-box, which is undesirable (as found out by @FarrandTom).

For concreteness, denote by X an input data point that is non-compliant with the Alibi API, e.g. it could be np.ndarray but with unsupported column types, for example array([49.5, 'Male'], dtype=object) representing a numerical feature and a string-encoded categorical variable.

Further, denote by Z an input data point that is compliant with the Alibi API, e.g. array([49.5, 0. ]) representing the same numerical feature and the same but integer-encoded categorical variable.

A client may have a model M that's trained on non-compliant data, i.e. it would be of type Callable[[X], np.ndarray], whereas Alibi expects a model M_hat (prediction function) of type Callable[[Z], np.ndarray]. How can we go from a non-compliant model to a compliant one?

The key is being able to map back and forth between X and Z. Let f: X -> Z be such an invertible mapping; for the example above it would look something like:

def f(X: np.ndarray, **kwargs) -> np.ndarray:  # use **kwargs for any other information needed to do the conversion
    Z_num = extract_numeric(X, **kwargs)  # extract columns like 49.5, Z_num is now a homogeneous array of numbers
    Z_cat = extract_cat(X, **kwargs)  # take columns like 'Male' and convert to 0, Z_cat is now a homogeneous array of numbers
    Z = combine(Z_num, Z_cat, **kwargs)  # concatenate columns in the right order
    return Z

def f_inv(Z: np.ndarray, **kwargs) -> np.ndarray:
    ...  # invert the operations above to recover the original representation
    return X
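
A minimal concrete implementation of this pair for the single-record example above could look like the following; the category ordering `['Male', 'Female']` is an assumption made purely for illustration:

```python
import numpy as np

# Assumed category ordering for the string-encoded categorical feature
# (hypothetical; in practice this comes from the user or the training data).
CATEGORIES = ['Male', 'Female']

def f(X: np.ndarray) -> np.ndarray:
    """Map a non-compliant object array X to a compliant float array Z."""
    X = np.atleast_2d(X)
    Z_num = X[:, [0]].astype(np.float64)  # numeric columns pass through as float
    Z_cat = np.array([[CATEGORIES.index(v)] for v in X[:, 1]],
                     dtype=np.float64)    # 'Male' -> 0.0, 'Female' -> 1.0
    return np.hstack([Z_num, Z_cat])

def f_inv(Z: np.ndarray) -> np.ndarray:
    """Map a compliant float array Z back to the original representation X."""
    Z = np.atleast_2d(Z)
    X_num = Z[:, [0]].astype(object)
    X_cat = np.array([[CATEGORIES[int(v)]] for v in Z[:, 1]], dtype=object)
    return np.hstack([X_num, X_cat])

X = np.array([[49.5, 'Male']], dtype=object)
Z = f(X)           # array([[49.5, 0.]])
X_back = f_inv(Z)  # round-trips to the original representation
```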

With this extra information we can define an Alibi-compliant model in terms of the client model M and the inverse mapping f_inv as the composition M_hat = M ∘ f_inv, i.e. M_hat(Z) = M(f_inv(Z)). In Python:

def M_hat(Z: np.ndarray) -> np.ndarray:
    X = f_inv(Z)
    pred = M(X)
    return pred

What we can do

  • (Minimum effort) Document the process of manually creating the f, f_inv transformations to turn an Alibi-non-compliant model into a compliant one; specific examples are needed. This is in spirit the same as the discussion on white-box vs black-box models. This has short-term gains, demonstrating that in principle it should always be possible to use Alibi explainers if the user is prepared to do a bit more work. It is unclear, however, how well this translates to the deployment setting.
  • (Medium effort) Extend the types of data that Alibi can handle natively. This likely requires some design considerations and may require the user to provide more information if they pass heterogeneous np.ndarray data. It may also be useful to accept pd.DataFrame and/or pd.Series objects as necessary. This has longer-term gains as the user would no longer need to do extra work if their model is trained on a different data representation.
  • (Medium-high effort?) Extend capabilities on the deployment side (see below for an explanation). This has longer-term gains, making explainer deployment more flexible with respect to pointing to a component within an inference graph.
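
As a sketch of what native heterogeneous support could build on, pandas already carries the metadata needed to derive a compliant array plus a category map from a DataFrame's dtypes. The DataFrame below is hypothetical, and note that pandas assigns category codes in lexicographic order, so 'Female' maps to 0 here:

```python
import numpy as np
import pandas as pd

# Hypothetical heterogeneous input: one numeric and one string column.
df = pd.DataFrame({'age': [49.5, 31.0], 'gender': ['Male', 'Female']})
df['gender'] = df['gender'].astype('category')

# Compliant array: numeric columns as float, categoricals as their codes
# (lexicographic ordering: 'Female' -> 0, 'Male' -> 1).
Z = np.column_stack([
    df['age'].to_numpy(dtype=float),
    df['gender'].cat.codes.to_numpy(dtype=float),
])

# Metadata a method could use to interpret the categorical column:
category_map = {1: list(df['gender'].cat.categories)}
# Z            -> [[49.5, 1.], [31. , 0.]]
# category_map -> {1: ['Female', 'Male']}
```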

What about deployment?

In deployment we may have the following situation, where the inference graph consists of a transformer mapping Alibi-non-compliant data X to compliant data Z, which is then passed into an Alibi-compliant model:
[diagram: inference graph — Transformer (X → Z) → Model]

How could we add an explainer to this inference graph?

Point directly to the model component

If we know the model component is Alibi-compliant, we could point the explainer to that instead of the whole inference graph (which is non-compliant):
[diagram: Explainer pointing directly at the Model component]
However, note that in this scenario the explainer expects the compliant data type Z whilst the inference graph operates on the original data type X. To obtain Z from X we would need to leverage the existing transformer so we could extend the inference graph like this (conceptually, implementation details may vary):
[diagram: inference graph extended with an Explainer-Transformer component]
The only job of the Explainer-Transformer component is to call an existing transformer that is known (by the user) to transform non-compliant data X into compliant data Z.

Non-compliant models within an inference graph

Not all inference graphs contain a model node that is Alibi-compliant, so in the general case the above would not work, and it would be necessary to either:

  • Extend Alibi compliant data types to support a wide variety of use cases / inference graphs
  • Have the user add extra nodes to the inference graph, transforming non-compliant data types to Alibi-compliant ones. This is essentially the same as defining the transformation steps f and f_inv but also packaging them as inference graph components:

[diagram: inference graph with Compliant-Transformer and Compliant-Inverse-Transformer components alongside the Explainer]

Here the shaded `Compliant-Transformer` and `Compliant-Inverse-Transformer` components correspond exactly to the functions `f` and `f_inv` defined above, but explicitly included in the inference graph (implementation details may differ, e.g. these could live inside the `Explainer` as two Python `Callable`s). Effectively we point the explainer to the whole non-compliant inference graph (equivalently, we could point to the non-compliant model, but there is no advantage in doing so) and perform the conversions to and from compliant data on the fly using the new transformer components. In particular, the inverse transformer intercepts prediction requests and puts them into a format that the non-compliant inference graph can handle.
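
A minimal sketch of the variant where the two transformers live inside the `Explainer` as Python `Callable`s; the class name, interfaces, and the toy transformations below are all hypothetical:

```python
from typing import Callable

import numpy as np

class CompliantExplainerWrapper:
    """Wraps a non-compliant inference graph so that the explainer only
    ever sees compliant data (a sketch; names and interfaces are assumed)."""

    def __init__(self, graph_predict: Callable, f: Callable, f_inv: Callable):
        self.graph_predict = graph_predict  # non-compliant inference graph endpoint
        self.f = f          # Compliant-Transformer: X -> Z
        self.f_inv = f_inv  # Compliant-Inverse-Transformer: Z -> X

    def predictor(self, Z: np.ndarray) -> np.ndarray:
        # Intercept the explainer's prediction requests and restore the
        # representation the non-compliant inference graph expects.
        return self.graph_predict(self.f_inv(Z))

    def to_compliant(self, X: np.ndarray) -> np.ndarray:
        # Map an incoming explanation request to the compliant representation.
        return self.f(X)
```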

Tagging a few people who may be interested in the discussion: @FarrandTom @cliveseldon @axsaucedo @SachinVarghese @arnaudvl .
