Transparency, auditability, and explainability of interpretable machine learning models

# The classic wolf-husky classifier

.center[<img src="imgs/husky_wolf_sbs.jpg" style="width: 60%">]
.center[Is it a husky or a wolf? ]
.center[<img src="https://c.tenor.com/5v0B7L4LZzMAAAAM/confused-huskies.gif">]

---
class: center, middle

### Let's take a look at what the model is actually looking at by using

## **I**nterpretable **M**achine **L**earning methods

---
class: center, middle

# It's actually a snow detector 🤯

.footnote[ Source: [Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant Professor of Computer Science, UC Irvine](https://papers.nips.cc/paper/2019/hash/bb836c01cdc9120a9c984c525e4b1a4a-Abstract.html)]

---
class: center,middle

# But who will watch over the watchers?

.footnote[ Source: [_Explanations can be manipulated and geometry is to blame_ A.K. Dombrowski NeurIPS 2019](https://papers.nips.cc/paper/2019/hash/bb836c01cdc9120a9c984c525e4b1a4a-Abstract.html)]

---
class: middle, center

### We are reluctant to use IML methods 
### because of these drawbacks
 
<hr style="width:50%">

### But what would happen 
### if we didnt keep on using them?
 
<hr style="width:40%">
 
---
### The Apple Card algorithm fiasco

### And more issues...

.pull-left[
.center[
<img src="imgs/COMPAS_bias_updated.png" style="width:90%">
]
].pull-right[
.center[
<img src="imgs/face_detect_bias.png" style="width:100%">
]
]
---
class: center, middle

### What if we explored the 
### explainability power of IML methods 
### to understand them better?

---
class: center, middle, first-slide, inverse

Transparency, auditability, and explainability of interpretable machine learning models

---
class: center, middle

# How are we going to do this?

We are going to explore the land of IML methods and test out their explainability powers.

---
class: center,middle

.center[
## But wait,
### If we are going to test out IML methods 
### Then we need a way to quantify the structure of the data
]

---
## Can you interpret the relationships?
.center[
<img src="imgs/scag_splom_2var.png" style="width:20%">
]
---
## How about now?
.center[
<img src="imgs/scag_splom_5var.png" style="width:40%">
]
---
### What if we could look at a scatterplot in a different way?

The late Leland Wilkinson developed graph theory based scagnostics that quantifies the features of a scatterplot into measurements ranging from 0 to 1

]

### The first Scatterplot in the matrix

![](index_files/figure-html/unnamed-chunk-1-1.png)
]
.pull-right[

### The scagnostics

![](index_files/figure-html/unnamed-chunk-2-1.png)

]

---
class: center, middle

## How do we generate data from this?

### We need to reverse this!

Earlier given `\(N\)` number of `\((X,Y)\)` coordinate pairs

we got a `\(9 \times 1\)` vector of scagnostic values

Now when we give a `\(9\times 1\)` vector of scagnostic values

we need to get `\(N\)` number of `\((X,Y)\)` data points!

## But the equations aren't one to one functions!

---
class: center, middle, first-slide
<img src="imgs/ice_cream_sprinkles.jpg" style="position:absolute;top:0px;left:0px;object-fit:fill;scale: 140%;object-position: 0px 90px;">

---
# Inspiration can come at the hungriest moments

The idea actually came to me while having desert

>Why don't I first sprinkle a little bit of data points on a 2D plot,

>making sure that they land in the right places

> and add on top of those sprinkles (data points)

> keep on adding more sprinkles

> so that the final set of sprinkles (data points) looks good!

---

# Simulated Annealing

The name of the algorithm comes from annealing in material sciences, a technique involving heating and controlled cooling of a material to alter its physical properties.
]

The algorithm works by setting an initial temperature value and decreasing the temperature gradually towards zero.

As the temperature is decreased the algorithm becomes greedier in selecting the optimal solution.

In each time step, the algorithm selects a solution closer to the current solution and would accept the new solution based on the quality of the solution and the temperature dependent acceptance probabilities.

<img src="https://upload.wikimedia.org/wikipedia/commons/d/d5/Hill_Climbing_with_Simulated_Annealing.gif" />
]

---

# The algorithm

---

# Introducing

---

## Data Generating Method ✅

```r
library(scatteR)
library(tidyverse)

df <- scatteR(measurements = c("Monotonic" = 0.9),n_points = 200,error_var = 9)
qplot(data=df,x=x,y=y)
```
]

<img src="imgs/scatter_example_plot_1.png" />
]

```r
scatteR(c("Convex" = 0.9),n_points = 250,verbose=FALSE) %>% # data generation
  mutate(label = ifelse(y > x,"Upper","Lower")) %>% # data preprocessing
  ggplot(aes(x = x,y = y,color=label))+
  geom_point()+
  theme_minimal()+
  theme(legend.position = "bottom")
```

]

]

---

# IML methods come in different flavors

Gives an overall bird's eye view of the entire model

* Partial Dependency Plots
* Inidividual Conditional Expectation plot
* Accumulated Local Effects plot

]

Explains the reasoning behind a single instance

* LIME
* SHAP

]

### Data Generating Method ✅
### Background idea on IML Methods ✅
<hr style="width:50%">
 
# Let's dive in!

---
class: middle

### The experimental design

---
# PDP Curves

The partial dependence plot (short PDP or PD plot) shows the marginal effect one or two features have on the predicted outcome of a machine learning model. A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex.

---

# Let the two middle men talk

The surrogate model based on the black box model says:

*"This is the function I think the model is approximating"*

The surrogate model based on the pdp model says:

*"This is the function I think the PDP curve is explaining"*
]

`\(\text{Pr}(\hat{Y} = 1)\)`, `\(\text{Pr}(\hat{Y} = 0)\)`

We can build a surrogate logistic regression model that gives

`\(ln\Big(\frac{Pr(\hat{Y} = 1)}{Pr(\hat{Y} = 0)}\Big) = \mathbf{\beta}X\)`

What if we divided the two PDP curves

and took the logarithm of it

and performed linear regression?

After all logistic regression is simply linear regression for log odds ratio right?
]]

---

# Automatic IML method evaluation using Surrogate models and PDP curves

---

## Now onto local interpretability methods
]

---

* Interpretability
* Local fidelity

<img src="imgs/lime_intro.png" />
]

SHAP is based on game theoretic Shapley values. The Shapley value is the average marginal contribution of a feature value across all possible combinations of features. The idea is to distribute the difference of the prediction and the average prediction among the features.

<img src="imgs/shap_intro.png" />
]

---

# Good Old MNIST

---

## Random Forest vs Convolutional Neural Network
### Which to choose?

Both models performed really well yet one incorporated spatial information while the other did not

---

### SHAP explanations from both parties

Random Forest
<img src="imgs/mnist_shap_rf.png">

Convolutional Neural Network
<img src="imgs/mnist_shap_cnn.png">
]

---

### LIME explanations from both parties

Random Forest
<img src="imgs/mnist_lime_rf.png">

Convolutional Neural Network
<img src="imgs/mnist_lime_cnn.png">
]

---

# But what about the real world?

* We selected a dataset of 100 X-ray images of body parts with 5 classes.

* The images were resized to fit the memory constraints

* Based on our computing budget a pretrained Mobilenet-v3 model was fine tuned on the X-ray images

* The model and a sample of images was used to calculate LIME and SHAP values

* The results gave us a lot of insights to further extend our research

---
class: center, middle

# We made it back home!

## What did we learn from our journey together?

---

.pull-left[
### LIME limitations
* Perturbation is not good when there is a model misspecification
* If there are correlated features then the explanations might not be sensible
* Needs GPU support
]

.pull-right[
### SHAP limitations
* Doesn't consider groupings of features in the way LIME does
* Exact values are hard to calculate and approximations require a lot of computing power
* Needs GPU support
]

### Overlall conclusions

* IML methods should not be used simply because the model performed well. 
* Pay attention to the structure of your data and the specification of your model that is where everything lies. 
* Be careful around IML methods that use perturbation underneath and inspect some of those perturbations before you send them out.
* Look at the overall interpretation given from your IML method per class and see whether they match up. 
* Also take a look at how certain the model is of your predictions, if the model is not sure then the IML method won’t be of any use.
* It’s still a growing field and with good advancements in the right directions, perhaps we can make ML models not so alien to us in the future!

---

# Thank you for listening!

### A huge thank you to my supervisor

Dr. Thiyanga Talagala

Check out her work on Github [@thiyangt](https://github.com/thiyangt/)

---
class: center, middle

<h2 style="color:#5b85aa">Have any follow up questions?</h2>

Email: [janithcwanni@gmail.com](#)

Twitter: [@janithcwanni](https://twitter.com/janithcwanni)

Github: [@janithwanni](https://github.com/janithwanni)

Linkedin: [Janith Wanniarachchi](https://www.linkedin.com/in/janith-wanniarachchi-462851117/)

Try scatteR at [https://github.com/janithwanni/scatteR](https://github.com/janithwanni/scatteR)

---
class: center, middle

# Glad you asked

---

# How does scagnostics work?

---

---
# Acknowledgements

The following creators are accredited to the images and content that are referenced in these slides.

* Image of Ice cream sprinkles (Slide #17): Photo by <a href="https://unsplash.com/@calavera?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">David Calavera</a> on <a href="https://unsplash.com/@calavera?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
* Image of Annealing (Slide #19): [http://www.turingfinance.com/simulated-annealing-for-portfolio-optimization/](http://www.turingfinance.com/simulated-annealing-for-portfolio-optimization/)
* Global and Local interpretations (Slide #23): [https://www.kdnuggets.com/2018/06/human-interpretable-machine-learning-need-importance-model-interpretation.html](https://www.kdnuggets.com/2018/06/human-interpretable-machine-learning-need-importance-model-interpretation.html)
* PDP curves (Slide #26): [https://christophm.github.io/interpretable-ml-book/pdp.html](https://christophm.github.io/interpretable-ml-book/pdp.html)
* SHAP diagram (Slide #30):
[https://github.com/slundberg/shap](https://github.com/slundberg/shap)