# CIU-Py
*Explainable Machine Learning through Contextual Importance and Utility*
**NOTE: This python implementation is currently a work in progress. As such some of the functionality present in the original R version is not quite yet available.**
The *CIU-Python* library provides methods to generate post-hoc explanations for
machine learning-based classifiers.
# What is CIU?
**Remark**: It seems like Github Markdown doesn’t show correctly the “{”
and “}” characters in Latex equations, whereas they are shown correctly
in Rstudio. Therefore, in most cases where there is an $i$ shown in
Github, it actually signifies `{i}` and where there is an $I$ it
signifies `{I}`.
CIU is a model-agnostic method for producing outcome explanations of
results of any “black-box” model `y=f(x)`. CIU directly estimates two
elements of explanation by observing the behaviour of the black-box
model (without creating any “surrogate” model `g` of `f(x)`).
**Contextual Importance (CI)** answers the question: *how much can the
result (or the utility of it) change as a function of feature* $i$ or a
set of features $\{i\}$ jointly, in the context $x$?
**Contextual Utility (CU)** answers the question: *how favorable is the
value of feature* $i$ (or a set of features $\{i\}$ jointly) for a good
(high-utility) result, in the context $x$?
CI of one feature or a set of features (jointly) $\{i\}$ compared to a
superset of features $\{I\}$ is defined as
$$
\omega_{j,\{i\},\{I\}}(x)=\frac{umax_{j}(x,\{i\})-umin_{j}(x,\{i\})}{umax_{j}(x,\{I\})-umin_{j}(x,\{I\})},
$$
where $\{i\} \subseteq \{I\}$ and $\{I\} \subseteq \{1,\dots,n\}$. $x$
is the instance/context to be explained and defines the values of input
features that do not belong to $\{i\}$ or $\{I\}$. In practice, CI is
calculated as:
$$
\omega_{j,\{i\},\{I\}}(x)= \frac{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}{ ymax_{j,\{I\}}(x)-ymin_{j,\{I\}}(x)},
$$
where $ymin_{j}()$ and $ymax_{j}()$ are the minimal and maximal $y_{j}$
values observed for output $j$.
CU is defined as
$$
CU_{j,\{i\}}(x)=\frac{u_{j}(x)-umin_{j,\{i\}}(x)}{umax_{j,\{i\}}(x)-umin_{j,\{i\}}(x)}.
$$
When $u_{j}(y_{j})=Ay_{j}+b$, this can be written as:
$$
CU_{j,\{i\}}(x)=\left|\frac{ y_{j}(x)-yumin_{j,\{i\}}(x)}{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}\right|,
$$
where $yumin=ymin$ if $A$ is positive and $yumin=ymax$ if $A$ is
negative.
## Usage
First, install the required dependencies. NOTE: this is to be run in your environment's terminal; some environments such as Google Colab might require an exclamation mark before the command, such as `!pip install`.
```
pip install CIU_Py
```
Import the library:
```python
from ciu import determine_ciu
```
Now, we can call the ``determine_ciu`` function which takes the following parameters:
* ``case``: A dictionary that contains the data of the case.
* ``predictor``: The prediction function of the black-box model *py-ciu* should
call.
* ```dataset```: Dataset to deduct min_maxs from (dictionary).
Defaults to ``None``.
* ``min_maxs`` (optional): dictionary (``'feature_name': [min, max, is_int]`` for each feature),
or infered from dataset. Defaults to ``None``
*
* ``samples`` (optional): The number of samples *py-ciu* will generate. Defaults
to ``1000``.
* ``prediction_index`` (optional): In case the model returns several
predictions, it is possible to provide the
index of the relevant prediction. Defaults to
``None``.
* ``category_mapping`` (optional): A mapping of one-hot encoded categorical
variables to lists of categories and category
names. Defaults to ``None``.
* ``feature_interactions`` (optional): A list of ``{key: list}`` tuples of
features whose interactions should be
evaluated. Defaults to ``[]``.
Here we can use a simple example with the well known Iris flower dataset
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
iris=datasets.load_iris()
df = pd.DataFrame(data = np.c_[iris['data'], iris['target']],
columns = iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.columns = ['s_length', 's_width', 'p_length', 'p_width', 'target', 'species']
X = df[['s_length', 's_width', 'p_length', 'p_width']]
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
```
Then create and train a model, in this case an `LDA` model
```python
model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)
```
Now simply use our Iris flower data and the model, following the parameter descriptions above
```python
iris_df = df.apply(pd.to_numeric, errors='ignore')
iris_ciu = determine_ciu(
X_test.iloc[[42]],
model.predict_proba,
iris_df.to_dict('list'),
samples = 1000,
prediction_index = 2
)
```
## Example Output
Let's import a test from the ciu_tests file
```python
from ciu_tests.ciu_tests import get_boston_gbm_test
```
The ```get_boston_gbm_test``` function returns a CIU Object that we can simply store and use as such
```python
boston_ciu = get_boston_gbm_test()
boston_ciu.explain_tabular()
```
Now we can also plot the CI/CU values using the CIU Object's ``plot_ciu`` function
```python
boston_ciu.plot_ciu()
```

Likewise there are also several options available using the following parameters:
* ``plot_mode``: defines the type plot to use between 'default', 'overlap' and 'combined'.
* ``include``: defines whether to include interactions or not.
* ``sort``: defines the order of the plot bars by the 'ci' (default), 'cu' values or unsorted if None.
* ``color_blind``: defines accessible color maps to use for the plots, such as 'protanopia',
'deuteranopia' and 'tritanopia'.
* ``color_edge_cu``: defines the hex or named color for the CU edge in the overlap plot mode.
* ``color_fill_cu``: defines the hex or named color for the CU fill in the overlap plot mode.
* ``color_edge_ci``: defines the hex or named color for the CI edge in the overlap plot mode.
* ``color_fill_ci``: defines the hex or named color for the CI fill in the overlap plot mode.
Here's quick example using some of these parameters to create a modified version of the above plot
```python
boston_ciu.plot_ciu(plot_mode="combined", color_blind='tritanopia', sort='cu')
```

## Authors
* [Vlad Apopei](https://github.com/vladapopei/)
* [Kary Främling](https://github.com/KaryFramling)
This implementation replaces an older one, available at [https://github.com/TimKam/py-ciu](https://github.com/TimKam/py-ciu)