
The classic wolf-husky classifier

Is it a husky or a wolf?

1 / 43

Let's take a look at what the model is actually looking at by using

Interpretable Machine Learning methods

2 / 43

We are reluctant to use IML methods

because of these drawbacks




But what would happen

if we didn't keep on using them?




5 / 43

The Apple Card algorithm fiasco

And more issues...

6 / 43

What if we explored the

explainability power of IML methods

to understand them better?

7 / 43

Transparency, auditability, and explainability of interpretable machine learning models


Janith Wanniarachchi
BSc. Statistics (Hons.)
University of Sri Jayewardenepura

Dr. Thiyanga Talagala
Supervisor
PhD, Monash University, Australia

8 / 43

How are we going to do this?

We are going to explore the land of IML methods and test out their explainability powers.

9 / 43

But wait,

If we are going to test out IML methods

Then we need a way to quantify the structure of the data

And then

We need a way to generate data by quantifying the structure of the data

10 / 43

Can you interpret the relationships?

11 / 43

How about now?

12 / 43

What if we could look at a scatterplot in a different way?

13 / 43

The answer? Scagnostics!

The late Leland Wilkinson developed graph-theory-based scagnostics, which quantify the features of a scatterplot into nine measurements, each ranging from 0 to 1
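
For instance, the scagnostics package on CRAN (assuming it and its Java dependency are installed) computes these measures straight from a pair of coordinate vectors:

library(scagnostics)

x <- runif(100)
y <- x + rnorm(100, sd = 0.1)
scagnostics(x, y)  # the nine scagnostic measures for this scatterplot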

14 / 43

The first Scatterplot in the matrix

The scagnostics

15 / 43

How do we generate data from this?

We need to reverse this!

Earlier, given N (X, Y) coordinate pairs,

we got a 9×1 vector of scagnostic values

Now, given a 9×1 vector of scagnostic values,

we need to get back N (X, Y) data points!

But the equations aren't one-to-one functions!

16 / 43

17 / 43

Inspiration can come at the hungriest moments

The idea actually came to me while having dessert

Why don't I first sprinkle a few data points onto a 2D plot,

making sure that they land in the right places,

and then keep adding more sprinkles (data points) on top,

so that the final set of sprinkles (data points) looks good!

18 / 43

Simulated Annealing

The name of the algorithm comes from annealing in materials science, a technique that involves heating and controlled cooling of a material to alter its physical properties.

The algorithm works by setting an initial temperature value and decreasing the temperature gradually towards zero.

As the temperature decreases, the algorithm becomes greedier in selecting the optimal solution.

At each time step, the algorithm picks a candidate solution close to the current one and accepts it based on its quality and a temperature-dependent acceptance probability.
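
As a rough sketch of how this looks in code (a minimal illustration in R, not the exact routine scatteR uses), minimizing a one-dimensional function:

simulated_annealing <- function(f, x0, temp = 10, cooling = 0.95, n_iter = 1000) {
  x <- x0
  fx <- f(x)
  for (i in seq_len(n_iter)) {
    candidate <- x + rnorm(1, sd = 0.5)  # propose a solution close to the current one
    f_cand <- f(candidate)
    # always accept improvements; accept worse moves with probability exp(-delta / temp),
    # which shrinks as the temperature drops, making the search greedier
    if (f_cand < fx || runif(1) < exp((fx - f_cand) / temp)) {
      x <- candidate
      fx <- f_cand
    }
    temp <- temp * cooling  # gradual cooling towards zero
  }
  x
}

simulated_annealing(function(x) (x - 3)^2, x0 = 0)  # lands near the minimum at 3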

19 / 43

The algorithm

20 / 43

Introducing scatteR

Install and try it out for yourself from https://github.com/janithwanni/scatteR

21 / 43

Data Generating Method ✅

library(scatteR)
library(tidyverse)

df <- scatteR(measurements = c("Monotonic" = 0.9), n_points = 200, error_var = 9)
qplot(data = df, x = x, y = y)

scatteR(c("Convex" = 0.9), n_points = 250, verbose = FALSE) %>% # data generation
  mutate(label = ifelse(y > x, "Upper", "Lower")) %>%           # data preprocessing
  ggplot(aes(x = x, y = y, color = label)) +
  geom_point() +
  theme_minimal() +
  theme(legend.position = "bottom")

22 / 43

IML methods come in different flavors

Global interpretability methods

Give an overall bird's-eye view of the entire model

  • Partial Dependence Plots
  • Individual Conditional Expectation plots
  • Accumulated Local Effects plots

Local interpretability methods

Explain the reasoning behind a single instance

  • LIME
  • SHAP

23 / 43

Data Generating Method ✅

Background idea on IML Methods ✅




Let's dive in!

24 / 43

The experimental design

25 / 43

PDP Curves

The partial dependence plot (short PDP or PD plot) shows the marginal effect one or two features have on the predicted outcome of a machine learning model. A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex.
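
As a sketch of how a single PDP curve can be computed (assuming a fitted model fit with a predict() method and a training data frame df; the names here are illustrative):

pdp_curve <- function(fit, df, feature, grid_size = 20) {
  grid_vals <- seq(min(df[[feature]]), max(df[[feature]]), length.out = grid_size)
  sapply(grid_vals, function(v) {
    df_mod <- df
    df_mod[[feature]] <- v                # fix the feature at v for every observation
    mean(predict(fit, newdata = df_mod))  # average prediction = partial dependence at v
  })
}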

26 / 43

Let the two middlemen talk

The surrogate model based on the black box model says:

"This is the function I think the model is approximating"

The surrogate model based on the PDP curve says:

"This is the function I think the PDP curve is explaining"

We build a PDP curve on top of the model to get two curves for the two classes

$\Pr(\hat{Y}=1), \quad \Pr(\hat{Y}=0)$

We can build a surrogate logistic regression model that gives

$\ln\left(\frac{\Pr(\hat{Y}=1)}{\Pr(\hat{Y}=0)}\right) = \beta X$

What if we divided the two PDP curves,

took the logarithm of the ratio,

and performed linear regression?

After all, logistic regression is simply linear regression on the log odds ratio, right?
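
A hedged sketch of that idea with stand-in PDP values (every name here is illustrative):

grid_vals <- seq(0, 1, length.out = 20)  # feature grid the PDP curves were evaluated on
pdp_1 <- plogis(2 * grid_vals - 1)       # stand-in PDP curve for Pr(Y hat = 1)
pdp_0 <- 1 - pdp_1                       # complementary PDP curve for Pr(Y hat = 0)
log_odds <- log(pdp_1 / pdp_0)           # divide the two curves and take the logarithm
surrogate <- lm(log_odds ~ grid_vals)    # linear regression on the log odds
coef(surrogate)                          # the slope plays the role of beta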

27 / 43

Automatic IML method evaluation using surrogate models and PDP curves

28 / 43

Now onto local interpretability methods

29 / 43

LIME

A good explanation should satisfy two criteria

  • Interpretability
  • Local fidelity
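
A minimal sketch of the LIME recipe (illustrative only; this is not the lime package API): perturb samples around the instance being explained, weight them by proximity, and fit an interpretable weighted linear model.

f  <- function(X) sin(3 * X[, 1]) + X[, 2]^2                 # stand-in black-box model
x0 <- c(0.5, 0.5)                                            # the instance to explain
Z  <- cbind(rnorm(200, x0[1], 0.3), rnorm(200, x0[2], 0.3))  # perturbed samples around x0
w  <- exp(-((Z[, 1] - x0[1])^2 + (Z[, 2] - x0[2])^2))        # proximity kernel weights
local_fit <- lm(f(Z) ~ Z[, 1] + Z[, 2], weights = w)         # weighted local surrogate
coef(local_fit)                                              # local feature effects at x0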

SHAP

SHAP is based on game theoretic Shapley values. The Shapley value is the average marginal contribution of a feature value across all possible combinations of features. The idea is to distribute the difference of the prediction and the average prediction among the features.
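
A toy illustration of that averaging for a two-feature model (the model and baseline below are made up purely for illustration):

f <- function(x1, x2) 2 * x1 + 3 * x2  # toy model to explain
x <- c(1, 1)                           # the instance being explained
b <- c(0, 0)                           # baseline standing in for the average instance
# marginal contribution of feature 1, averaged over both orders in which it can join:
phi_1 <- mean(c(
  f(x[1], b[2]) - f(b[1], b[2]),       # feature 1 joins an empty coalition
  f(x[1], x[2]) - f(b[1], x[2])        # feature 1 joins after feature 2
))
phi_1  # 2: the share of the prediction difference attributed to feature 1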

30 / 43

Good Old MNIST

31 / 43

Random Forest vs Convolutional Neural Network

Which to choose?

Both models performed really well, yet one incorporated spatial information while the other did not

32 / 43

SHAP explanations from both parties

Random Forest

Convolutional Neural Network

33 / 43

LIME explanations from both parties

Random Forest

Convolutional Neural Network

34 / 43

But what about the real world?

  • We selected a dataset of 100 X-ray images of body parts, spanning 5 classes.

  • The images were resized to fit our memory constraints.

  • Given our computing budget, a pretrained MobileNetV3 model was fine-tuned on the X-ray images.

  • The model and a sample of images were used to calculate LIME and SHAP values.

  • The results gave us a lot of insights to further extend our research.

35 / 43

We made it back home!

36 / 43

We made it back home!

What did we learn from our journey together?

36 / 43

LIME limitations

  • Perturbation does not work well when there is model misspecification
  • If there are correlated features, the explanations might not be sensible
  • Needs GPU support

SHAP limitations

  • Doesn't consider groupings of features in the way LIME does
  • Exact values are hard to calculate and approximations require a lot of computing power
  • Needs GPU support

Overall conclusions

  • IML methods should not be used simply because the model performed well.
  • Pay attention to the structure of your data and the specification of your model; that is where everything lies.
  • Be careful around IML methods that use perturbation underneath, and inspect some of those perturbations before you send them out.
  • Look at the overall interpretation given by your IML method per class and see whether they match up.
  • Also take a look at how certain the model is of your predictions; if the model is not sure, then the IML method won't be of any use.
  • It's still a growing field, and with good advancements in the right directions, perhaps we can make ML models not so alien to us in the future!
37 / 43


Thank you for listening!


A huge thank you to my supervisor

Dr. Thiyanga Talagala

Check out her work on Github @thiyangt


38 / 43

Have any follow up questions?

Email: janithcwanni@gmail.com

Twitter: @janithcwanni

Github: @janithwanni

Linkedin: Janith Wanniarachchi


Try scatteR at https://github.com/janithwanni/scatteR

39 / 43

Glad you asked

40 / 43

How does scagnostics work?

41 / 43

42 / 43

Acknowledgements

The following creators are credited for the images and content referenced in these slides.

43 / 43
