Title: Interpretable Machine Learning
Description: Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: feature importance described by Fisher et al. (2018) <doi:10.48550/arXiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arXiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (a variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. (2008) <doi:10.1214/07-AOAS148>, and tree surrogate models.
Authors: Giuseppe Casalicchio [aut, cre], Christoph Molnar [aut], Patrick Schratz [aut]
Maintainer: Giuseppe Casalicchio <[email protected]>
License: MIT + file LICENSE
Version: 0.11.3
Built: 2024-11-17 06:14:53 UTC
Source: https://github.com/giuseppec/iml
The iml package provides tools to analyze machine learning models and predictions.
Maintainer: Giuseppe Casalicchio [email protected]
Authors:
Christoph Molnar [email protected]
Patrick Schratz [email protected]
See also: the Interpretable Machine Learning book (https://christophm.github.io/interpretable-ml-book/)
Extract glmnet effects

extract.glmnet.effects(betas, best.index, x.recoded, x.original)

betas: the beta matrix from the glmnet fit (glmnet$beta)
best.index: the index k of the best fit
x.recoded: the recoded version of x
x.original: the original x; the first row is assumed to be x.interest
FeatureEffect computes and plots (individual) feature effects of prediction models.
The FeatureEffect class computes the effect a feature has on the prediction. Different methods are implemented:
Accumulated Local Effect (ALE) plots
Partial Dependence Plots (PDPs)
Individual Conditional Expectation (ICE) curves
Accumulated local effects and partial dependence plots both show the average model prediction over the feature. The difference is that ALE plots are computed as accumulated differences over the conditional distribution, while partial dependence plots average over the marginal distribution. ALE plots are preferable to PDPs because they are faster to compute and unbiased when features are correlated.
For categorical features, ALE plots automatically order the categories by their similarity, based on the distribution of the other features for instances in each category. When the feature is an ordered factor, the ALE plot leaves the order as is.
Individual conditional expectation (ICE) curves describe how, for a single observation, the prediction changes when the feature changes; they can be combined with partial dependence plots.
To learn more about accumulated local effects, read the Interpretable Machine Learning book.
For the partial dependence plots: https://christophm.github.io/interpretable-ml-book/pdp.html
For individual conditional expectation: https://christophm.github.io/interpretable-ml-book/ice.html
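As a quick, hedged sketch (assuming mod is a Predictor fitted on the Boston data, as in the examples below), ICE curves can be centered at a reference value so that all curves start at zero there:

# A minimal sketch, assuming `mod` is a Predictor on the Boston data
# (see the examples below): ICE curves centered at the smallest observed value
eff <- FeatureEffect$new(mod, feature = "rm", method = "ice", center.at = min(Boston$rm))
eff$plot()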
iml::InterpretationMethod -> FeatureEffect
grid.size
(numeric(1) | numeric(2))
The size of the grid.
feature.name
(character(1) | character(2))
The names of the features for which the partial dependence was computed.
n.features
(numeric(1))
The number of features (either 1 or 2).
feature.type
(character(1) | character(2))
The detected types of the features, either "categorical" or "numerical".
method
(character(1))
The effect method: "ale", "pdp", "ice", or "pdp+ice".
center.at
(numeric(1))
Value at which the plot was centered. Ignored in the case of two features.
new()
Create a FeatureEffect object.
FeatureEffect$new(
  predictor,
  feature,
  method = "ale",
  center.at = NULL,
  grid.size = 20,
  grid.points = NULL
)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
feature
(character(1) | character(2) | numeric(1) | numeric(2))
The feature name or index for which to compute the effects.
method
(character(1))
'ale' for accumulated local effects, 'pdp' for partial dependence plots, 'ice' for individual conditional expectation curves, and 'pdp+ice' for partial dependence plot and ICE curves within the same plot.
center.at
(numeric(1))
Value at which the plot should be centered. Ignored in the case of two features.
grid.size
(numeric(1) | numeric(2))
The size of the grid for evaluating the predictions.
grid.points
(numeric() | list(numeric(), numeric()))
An optional grid along the feature. If grid.points is set, the grid.size argument is ignored. To compute a PDP/ALE for two features with a user-defined grid, provide a list of two vectors in the same order as in the 'feature' argument.
set.feature()
Get/set the feature(s) (by name or index) for which to compute the effects.
FeatureEffect$set.feature(feature)
feature
(character(1))
Feature name.
center()
Set the value at which the ICE computations are centered.
FeatureEffect$center(center.at)
center.at
(numeric(1))
Value at which the plot should be centered. Ignored in the case of two features.
predict()
Predict the marginal outcome given a feature.
FeatureEffect$predict(data, extrapolate = FALSE)
data
(data.frame)
A data.frame with the feature, or a vector of feature values.
extrapolate
(logical(1))
If TRUE, predict returns NA for values outside the observed feature range. If FALSE, predict returns the closest effect value for values outside the range. Ignored for categorical features.
Values of the effect curves at the given values.
clone()
The objects of this class are cloneable with this method.
FeatureEffect$clone(deep = FALSE)
deep
Whether to make a deep clone.
Apley, D. W. (2016). Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. arXiv preprint.
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29, 1189-1232.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2013). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation, 1-22. https://doi.org/10.1080/10618600.2014.907095
# We train a random forest on the Boston dataset:
data("Boston", package = "MASS")
library("rpart")
rf <- rpart(medv ~ ., data = Boston)
mod <- Predictor$new(rf, data = Boston)

# Compute the accumulated local effects for the first feature
eff <- FeatureEffect$new(mod, feature = "rm", grid.size = 30)
eff$plot()

# Again, but this time with a partial dependence plot and ice curves
eff <- FeatureEffect$new(mod,
  feature = "rm", method = "pdp+ice",
  grid.size = 30
)
plot(eff)

# Since the result is a ggplot object, you can extend it:
library("ggplot2")
plot(eff) +
  # Adds a title
  ggtitle("Partial dependence") +
  # Adds original predictions
  geom_point(
    data = Boston,
    aes(y = mod$predict(Boston)[[1]], x = rm),
    color = "pink", size = 0.5
  )

# If you want to do your own thing, just extract the data:
eff.dat <- eff$results
head(eff.dat)

# You can also use the object to "predict" the marginal values.
eff$predict(Boston[1:3, ])

# Instead of the entire data.frame, you can also use feature values:
eff$predict(c(5, 6, 7))

# You can reuse the pdp object for other features:
eff$set.feature("lstat")
plot(eff)

# Only plotting the aggregated partial dependence:
eff <- FeatureEffect$new(mod, feature = "crim", method = "pdp")
eff$plot()

# Only plotting the individual conditional expectation:
eff <- FeatureEffect$new(mod, feature = "crim", method = "ice")
eff$plot()

# Accumulated local effects and partial dependence plots support up to two
# features:
eff <- FeatureEffect$new(mod, feature = c("crim", "lstat"))
plot(eff)

# FeatureEffect plots also work with multiclass classification
rf <- rpart(Species ~ ., data = iris)
mod <- Predictor$new(rf, data = iris, type = "prob")

# For some models we have to specify additional arguments for the predict
# function
plot(FeatureEffect$new(mod, feature = "Petal.Width"))

# FeatureEffect plots support up to two features:
eff <- FeatureEffect$new(mod, feature = c("Sepal.Length", "Petal.Length"))
eff$plot()

# Show where the actual data lies
eff$plot(show.data = TRUE)

# For multiclass classification models, you can choose to only show one class:
mod <- Predictor$new(rf, data = iris, type = "prob", class = "setosa")
plot(FeatureEffect$new(mod, feature = "Sepal.Length"))
FeatureEffects computes and plots feature effects of multiple features at once.
FeatureEffects computes the effects for all given features on the model prediction. FeatureEffects is a convenience class that calls FeatureEffect multiple times; see ?FeatureEffect for details on what is actually computed.
Only first-order effects can be computed with the FeatureEffects interface. If you are interested in the visualization of interactions between two features, use FeatureEffect directly.
Parallelization is supported via the future package. To initialize future-based parallelization, select an appropriate backend and specify the number of workers. For example, to use a PSOCK based cluster backend, do (see also the concrete sketch below):
future::plan(multisession, workers = 2)
<iml function here>
Consult the resources of the future package for more parallel backend options.
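As a concrete, hedged sketch of the pattern above (assuming mod is a Predictor as created in the examples below):

# A minimal sketch, assuming `mod` is a Predictor as created in the
# examples below: run FeatureEffects on two parallel workers
library("future")
plan("multisession", workers = 2)
effs <- FeatureEffects$new(mod, method = "ale")
plan("sequential") # revert to sequential processing afterwards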
iml::InterpretationMethod -> FeatureEffects
grid.size
(numeric(1) | numeric(2))
The size of the grid.
method
(character(1))
"ale" for accumulated local effects, "pdp" for partial dependence plots, "ice" for individual conditional expectation curves, and "pdp+ice" for partial dependence plot and ICE curves within the same plot.
effects
(list)
Named list of FeatureEffects.
features
(character())
The names of the features for which the effects were computed.
center.at
(numeric(1))
Value at which the plot was centered. Ignored in the case of two features.
new()
Create a FeatureEffects object.
FeatureEffects$new(
  predictor,
  features = NULL,
  method = "ale",
  center.at = NULL,
  grid.size = 20
)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
features
(character())
The names of the features for which to compute the feature effects.
method
(character(1))
'ale' for accumulated local effects, 'pdp' for partial dependence plots, 'ice' for individual conditional expectation curves, and 'pdp+ice' for partial dependence plot and ICE curves within the same plot.
center.at
(numeric(1))
Value at which the plot should be centered. Ignored in the case of two features.
grid.size
(numeric(1) | numeric(2))
The size of the grid for evaluating the predictions.
feature
(character(1) | character(2) | numeric(1) | numeric(2))
The feature name or index for which to compute the effects.
clone()
The objects of this class are cloneable with this method.
FeatureEffects$clone(deep = FALSE)
deep
Whether to make a deep clone.
Apley, D. W. (2016). Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. arXiv preprint.
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29, 1189-1232.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2013). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation, 1-22. https://doi.org/10.1080/10618600.2014.907095
# We train a random forest on the Boston dataset:
library("rpart")
data("Boston", package = "MASS")
rf <- rpart(medv ~ ., data = Boston)
mod <- Predictor$new(rf, data = Boston)

# Compute the accumulated local effects for all features
eff <- FeatureEffects$new(mod)
eff$plot()

## Not run:
# Again, but this time with a partial dependence plot
eff <- FeatureEffects$new(mod, method = "pdp")
eff$plot()

# Only a subset of features
eff <- FeatureEffects$new(mod, features = c("nox", "crim"))
eff$plot()

# You can access each FeatureEffect individually
eff.nox <- eff$effects[["nox"]]
eff.nox$plot()

# FeatureEffects also works with multiclass classification
rf <- rpart(Species ~ ., data = iris)
mod <- Predictor$new(rf, data = iris, type = "prob")
FeatureEffects$new(mod)$plot(ncol = 2)

## End(Not run)
FeatureImp computes feature importance for prediction models. The importance is measured as the factor by which the model's prediction error increases when the feature is shuffled.
To compute the feature importance for a single feature, the model prediction loss (error) is measured before and after shuffling the values of the feature. By shuffling the feature values, the association between the outcome and the feature is destroyed. The larger the increase in prediction error, the more important the feature was. The shuffling is repeated to get more accurate results, since the permutation feature importance tends to be quite unstable. Read the Interpretable Machine Learning book to learn about feature importance in detail: https://christophm.github.io/interpretable-ml-book/feature-importance.html
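To make the algorithm concrete, here is an illustrative base-R sketch - not the package's implementation - assuming a model with a standard predict() method and MAE as the loss:

# Illustrative sketch of permutation importance (not the package's
# implementation): importance as the ratio of MAE after/before shuffling
perm_importance <- function(model, X, y, feature, n.repetitions = 5) {
  mae <- function(actual, predicted) mean(abs(actual - predicted))
  err_orig <- mae(y, predict(model, X))
  ratios <- replicate(n.repetitions, {
    X_perm <- X
    X_perm[[feature]] <- sample(X_perm[[feature]]) # destroy the association
    mae(y, predict(model, X_perm)) / err_orig
  })
  median(ratios) # median over repetitions, as reported by FeatureImp
}

FeatureImp additionally reports the 5% and 95% quantiles over the repetitions alongside the median (see Value below).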
The loss function can be either specified via a string, or by handing a function to FeatureImp(). If you want to use your own loss function, it should have this signature: function(actual, predicted). Using a string is a shortcut to using loss functions from the Metrics package. Only use functions that return a single performance value, not a vector. Allowed losses are: "ce", "f1", "logLoss", "mae", "mse", "rmse", "mape", "mdae", "msle", "percent_bias", "rae", "rmsle", "rse", "rrse", and "smape". See library(help = "Metrics") to get a list of functions.
Parallelization is supported via the future package. To initialize future-based parallelization, select an appropriate backend and specify the number of workers. For example, to use a PSOCK based cluster backend, do:
future::plan(multisession, workers = 2)
<iml function here>
Consult the resources of the future package for more parallel backend options.
iml::InterpretationMethod -> FeatureImp
loss
(character(1) | function)
The loss function. Either the name of a loss (e.g. "ce" for classification or "mse" for regression) or a function.
original.error
(numeric(1))
The loss of the model before perturbing features.
n.repetitions
(integer(1))
Number of repetitions.
compare
(character(1))
Either "ratio" or "difference", depending on whether the importance was calculated as the ratio of, or the difference between, the original model error and the model error after permutation.
features
(list)
Features for which importance scores are to be calculated. The names are the feature/group names, while the contents specify which feature(s) are to be permuted.
new()
Create a FeatureImp object.
FeatureImp$new(
  predictor,
  loss,
  compare = "ratio",
  n.repetitions = 5,
  features = NULL
)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
loss
(character(1) | function)
The loss function. Either the name of a loss (e.g. "ce" for classification or "mse" for regression) or a function. See Details for allowed losses.
compare
(character(1))
Either "ratio" or "difference". Should importance be measured as the difference or as the ratio of original model error and model error after permutation?
Ratio: error.permutation / error.orig
Difference: error.permutation - error.orig
n.repetitions
(numeric(1))
How often should the shuffling of the feature be repeated? The higher the number of repetitions, the more stable and accurate the results become.
features
(character() | list)
The features for which to compute importance scores. The default value of NULL implies all features. Use a named list of character vectors to define groups of features for which joint importance will be calculated. See examples.
(data.frame)
A data.frame with the results of the feature importance computation, one row per feature, with the following columns:
importance.05 (5% quantile of importance values from the repetitions)
importance (median importance)
importance.95 (95% quantile)
permutation.error (median error over all repetitions)
The distribution of the importance is also visualized as a bar in the plots, and the median importance over the repetitions as a point.
clone()
The objects of this class are cloneable with this method.
FeatureImp$clone(deep = FALSE)
deep
Whether to make a deep clone.
Fisher, A., Rudin, C., and Dominici, F. (2018). Model Class Reliance: Variable Importance Measures for any Machine Learning Model Class, from the "Rashomon" Perspective. Retrieved from http://arxiv.org/abs/1801.01489
library("rpart")

# We train a tree on the Boston dataset:
data("Boston", package = "MASS")
tree <- rpart(medv ~ ., data = Boston)
y <- Boston$medv
X <- Boston[-which(names(Boston) == "medv")]
mod <- Predictor$new(tree, data = X, y = y)

# Compute feature importances as the performance drop in mean absolute error
imp <- FeatureImp$new(mod, loss = "mae")

# Plot the results directly
plot(imp)

# Since the result is a ggplot object, you can extend it:
library("ggplot2")
plot(imp) + theme_bw()

# If you want to do your own thing, just extract the data:
imp.dat <- imp$results
head(imp.dat)
ggplot(imp.dat, aes(x = feature, y = importance)) +
  geom_point() +
  theme_bw()

# We can also look at the difference in model error instead of the ratio
imp <- FeatureImp$new(mod, loss = "mae", compare = "difference")

# Plot the results directly
plot(imp)

# We can calculate feature importance for a subset of features
imp <- FeatureImp$new(mod, loss = "mae", features = c("crim", "zn", "indus"))
plot(imp)

# We can calculate joint importance of groups of features
groups <- list(
  grp1 = c("crim", "zn", "indus", "chas"),
  grp2 = c("nox", "rm", "age", "dis"),
  grp3 = c("rad", "tax", "ptratio", "black", "lstat")
)
imp <- FeatureImp$new(mod, loss = "mae", features = groups)
plot(imp)

# FeatureImp also works with multiclass classification.
# In this case, the importance measurement regards all classes
tree <- rpart(Species ~ ., data = iris)
X <- iris[-which(names(iris) == "Species")]
y <- iris$Species
mod <- Predictor$new(tree, data = X, y = y, type = "prob")

# For some models we have to specify additional arguments for the predict function
imp <- FeatureImp$new(mod, loss = "ce")
plot(imp)

# For multiclass classification models, you can choose to only compute
# performance for one class. Make sure to adapt y.
mod <- Predictor$new(tree,
  data = X, y = y == "virginica",
  type = "prob", class = "virginica"
)
imp <- FeatureImp$new(mod, loss = "ce")
plot(imp)
has.predict() returns TRUE if the object has a predict function.

has.predict(object)

object: the object to check
Feature interactions
Interaction estimates the feature interactions in a prediction model.
Interactions between features are measured via the decomposition of the prediction function: if a feature j has no interaction with any other feature, the prediction function can be expressed as the sum of the partial function that depends only on j and the partial function that depends only on the features other than j. If the variance of the full function is completely explained by the sum of the partial functions, there is no interaction between feature j and the other features. Any variance that is not explained can be attributed to the interaction and is used as a measure of interaction strength.
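In formula form (a standard formulation following Friedman and Popescu; the notation here is ours, not the package's), the squared one-vs-all statistic for feature j is

H_j^2 = \frac{\sum_{i=1}^{n} \left[ \hat{f}(x^{(i)}) - PD_j(x_j^{(i)}) - PD_{-j}(x_{-j}^{(i)}) \right]^2}{\sum_{i=1}^{n} \hat{f}(x^{(i)})^2}

where PD_j and PD_{-j} are the (centered) partial dependence functions of feature j and of all remaining features; Interaction reports H_j, the square root.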
The interaction strength between two features is the proportion of the variance of the 2-dimensional partial dependence function that is not explained by the sum of the two 1-dimensional partial dependence functions.
The interaction is measured by Friedman's H-statistic (the square root of the H-squared test statistic) and takes values between 0 (no interaction) and 1 (100% of the standard deviation of f(x) is due to interaction).
To learn more about interaction effects, read the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/interaction.html
Parallelization is supported via the future package. To initialize future-based parallelization, select an appropriate backend and specify the number of workers. For example, to use a PSOCK based cluster backend, do:
future::plan(multisession, workers = 2)
<iml function here>
Consult the resources of the future package for more parallel backend options.
iml::InterpretationMethod -> Interaction
grid.size
(numeric(1))
The number of values per feature that should be used to estimate the interaction strength.
new()
Create an Interaction object.
Interaction$new(predictor, feature = NULL, grid.size = 30)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
feature
(character(1) | character(2) | numeric(1) | numeric(2))
The feature name or index for which to compute the interaction strength.
grid.size
(numeric(1) | numeric(2))
The size of the grid for evaluating the predictions.
A data.frame with the interaction strength (column .interaction) per feature, calculated as Friedman's H-statistic, and - in the case of a multi-dimensional outcome - per class.
clone()
The objects of this class are cloneable with this method.
Interaction$clone(deep = FALSE)
deep
Whether to make a deep clone.
Friedman, Jerome H., and Bogdan E. Popescu. "Predictive learning via rule ensembles." The Annals of Applied Statistics 2.3 (2008): 916-954.
## Not run:
library("rpart")
set.seed(42)

# Fit a CART on the Boston housing data set
data("Boston", package = "MASS")
rf <- rpart(medv ~ ., data = Boston)

# Create a model object
mod <- Predictor$new(rf, data = Boston[-which(names(Boston) == "medv")])

# Measure the interaction strength
ia <- Interaction$new(mod)

# Plot the resulting interaction strengths
plot(ia)

# Extract the results
dat <- ia$results
head(dat)

# Interaction also works with multiclass classification
rf <- rpart(Species ~ ., data = iris)
mod <- Predictor$new(rf, data = iris, type = "prob")

# For some models we have to specify additional arguments for the
# predict function
ia <- Interaction$new(mod)
ia$plot()

# For multiclass classification models, you can choose to only show one class:
mod <- Predictor$new(rf, data = iris, type = "prob", class = "virginica")
plot(Interaction$new(mod))

## End(Not run)
Superclass container for InterpretationMethod objects.
results
(data.frame)
The aggregated results of the experiment.
predictor
(Predictor)
The Predictor object.
new()
Create an InterpretationMethod object.
InterpretationMethod$new(predictor)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
plot()
Plot function. Calls private$generatePlot() of the respective subclass.
InterpretationMethod$plot(...)
...
Passed to private$generatePlot().
print()
Printer for InterpretationMethod objects
InterpretationMethod$print()
clone()
The objects of this class are cloneable with this method.
InterpretationMethod$clone(deep = FALSE)
deep
Whether to make a deep clone.
LocalModel fits locally weighted linear regression models (logistic regression for classification) to explain single predictions of a prediction model.
A weighted glm is fitted with the machine learning model's prediction as target. Data points are weighted by their proximity to the instance to be explained, using the Gower proximity measure. L1 regularization is used to make the results sparse.
The resulting model can be seen as a surrogate for the machine learning model, which is only valid for that one point. Categorical features are binarized, depending on the category of the instance to be explained: 1 if the category is the same, 0 otherwise.
Please note that scaling continuous features in the machine learning method might be advisable when using LIME as an interpretation technique. LIME uses a distance measure to compute proximity weights for the weighted glm, so the original scale of the features may influence the distance measure and thereby the LIME results.
To learn more about local models, read the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/lime.html
The approach is similar to LIME, but has the following differences:
Distance measure: By default uses the Gower proximity (= 1 - Gower distance) instead of a kernel based on the Euclidean distance (see the sketch after this list). This has the advantage of a meaningful neighborhood and no kernel width to tune. When the distance is not "gower", the stats::dist() function with the chosen method is used and turned into a similarity measure: sqrt(exp(-(distance^2) / (kernel.width^2))).
Sampling: Uses the original data instead of sampling from normal distributions. This has the advantage of following the original data distribution.
Visualization: Plots effects instead of betas. Both are the same for binary features, but differ for numerical features. For numerical features, plotting the betas makes no sense, because a negative beta might still increase the prediction when the feature value is also negative.
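A hedged sketch of the default Gower weighting mentioned above (assuming the gower package; proximity_weights is our illustrative helper, not a package function):

# Illustrative sketch, not the package implementation: proximity weights
# as (1 - Gower distance)^gower.power, using the gower package
library("gower")
proximity_weights <- function(x.interest, X, gower.power = 1) {
  d <- gower_dist(x.interest, X) # Gower distance of x.interest to each row of X
  (1 - d)^gower.power            # similarity in [0, 1], used as glm weights
}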
To learn more about local surrogate models, read the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/lime.html
iml::InterpretationMethod -> LocalModel
x.interest
(data.frame)
Single row with the instance to be explained.
k
(numeric(1))
The number of features as set by the user.
model
(glmnet)
The fitted local model.
best.fit.index
(numeric(1))
The index of the best glmnet fit.
new()
Create a LocalModel object.
LocalModel$new(
  predictor,
  x.interest,
  dist.fun = "gower",
  gower.power = 1,
  kernel.width = NULL,
  k = 3
)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
x.interest
(data.frame)
Single row with the instance to be explained.
dist.fun
(character(1))
The name of the distance function for computing proximities (weights in the linear model). Defaults to "gower". Otherwise it will be forwarded to stats::dist.
gower.power
(numeric(1))
The calculated Gower proximity will be raised to the power of this value. Can be used to specify the size of the neighborhood for the LocalModel (similar to kernel.width for the Euclidean distance).
kernel.width
(numeric(1))
The width of the kernel for the proximity computation. Only used if dist.fun is not "gower".
k
(numeric(1))
The number of features.
(data.frame)
Results with the feature names (feature) and contributions to the prediction.
predict()
Method to predict new data with the local model. See also predict.LocalModel.
LocalModel$predict(newdata = NULL, ...)
newdata
(data.frame)
Data to predict on.
...
Not used.
explain()
Set a new data point to explain.
LocalModel$explain(x.interest)
x.interest
(data.frame)
Single row with the instance to be explained.
clone()
The objects of this class are cloneable with this method.
LocalModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Retrieved from http://arxiv.org/abs/1602.04938
Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". Biometrics, 27, 623–637.
plot.LocalModel and predict.LocalModel
Shapley can also be used to explain single predictions.
The lime package, which contains the original implementation.
library("randomForest")

# First we fit a machine learning model on the Boston housing data
data("Boston", package = "MASS")
X <- Boston[-which(names(Boston) == "medv")]
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)
mod <- Predictor$new(rf, data = X)

# Explain the first instance of the dataset with the LocalModel method:
x.interest <- X[1, ]
lemon <- LocalModel$new(mod, x.interest = x.interest, k = 2)
lemon

# Look at the results in a table
lemon$results

# Or as a plot
plot(lemon)

# Reuse the object with a new instance to explain
lemon$x.interest
lemon$explain(X[2, ])
lemon$x.interest
plot(lemon)

# LocalModel also works with multiclass classification
rf <- randomForest(Species ~ ., data = iris, ntree = 50)
X <- iris[-which(names(iris) == "Species")]
mod <- Predictor$new(rf, data = X, type = "prob", class = "setosa")

# Then we explain the first instance of the dataset with the LocalModel method:
lemon <- LocalModel$new(mod, x.interest = X[1, ], k = 2)
lemon$results
plot(lemon)
order_levels() orders the levels of a categorical feature by their similarity in the other features: it computes a distance per feature, sums up all distances, and applies multi-dimensional scaling.

order_levels(dat, feature.name)

dat: data.frame with the training data
feature.name: the name of the categorical feature
Goal: Compute the distances between two categories. Input: instances from category 1 and category 2.
For all features (excluding the categorical feature for which we are computing the order), do:
If the feature is numerical: take the instances from category 1 and calculate the empirical cumulative distribution function (ecdf) of the feature. The ecdf tells us, for a given feature value, how many values are smaller. Do the same for category 2. The distance is the maximum absolute point-wise distance between the two ecdfs. In practice, this value is high when the distribution of one category is strongly shifted away from the other. This measure is also known as the Kolmogorov-Smirnov distance (https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
If the feature is categorical: take the instances from category 1 and calculate a table with the relative frequency of each category of the other feature. Do the same for the instances from category 2. The distance is the sum of the absolute differences of the two relative frequency tables.
Sum up the distances over all features.
This algorithm is run for all pairs of categories, yielding a k x k matrix, where k is the number of categories and each entry is the distance between two categories. This is still not enough for a single order, because pairwise (dis)similarities do not give a one-dimensional ordering of the classes. To force the distances into a single dimension, we apply multi-dimensional scaling, which takes a distance matrix and returns coordinates in a space of reduced dimension - here one dimension, so that we obtain a single ordering of the categories and can compute the accumulated local effects (see the sketch below). This is not the holy grail of ordering factors, but one possibility.
Returns the order of the levels (not the levels themselves).
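A minimal illustrative sketch of the final scaling step (not the package code), using stats::cmdscale():

# Illustrative sketch: reduce a k x k category distance matrix to a
# single ordering via classical multi-dimensional scaling
dists <- matrix(c(0, 2, 5,
                  2, 0, 3,
                  5, 3, 0), nrow = 3) # toy pairwise category distances
coords <- cmdscale(dists, k = 1)      # project onto one dimension
order(coords)                         # the implied ordering of the categories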
Effect of one or two feature(s) on the model predictions (deprecated)
Deprecated, please use FeatureEffect.
iml::InterpretationMethod -> iml::FeatureEffect -> Partial
new()
Effect of one or two feature(s) on the model predictions.
Partial$new(
  predictor,
  feature,
  aggregation = "pdp",
  ice = TRUE,
  center.at = NULL,
  grid.size = 20
)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
feature
(character(1) | character(2) | numeric(1) | numeric(2))
The feature name or index for which to compute the effects.
aggregation
(character(1))
The aggregation approach to use. Possible values are "pdp", "ale" or "none".
ice
(logical(1))
Whether to compute ICE plots.
center.at
(numeric(1))
Value at which the plot should be centered. Ignored in the case of two features.
grid.size
(numeric(1) | numeric(2))
The size of the grid for evaluating the predictions.
clone()
The objects of this class are cloneable with this method.
Partial$clone(deep = FALSE)
deep
Whether to make a deep clone.
plot.FeatureEffect() plots the results of a FeatureEffect object.

## S3 method for class 'FeatureEffect'
plot(x, rug = TRUE, show.data = FALSE, ylim = NULL)

x: a FeatureEffect object
rug: (logical) whether to add a rug (indicating the feature distribution) to the plot
show.data: (logical) whether to show the data points
ylim: (numeric(2)) the limits of the y-axis
ggplot2 plot object
# We train a random forest on the Boston dataset:
if (require("randomForest")) {
  data("Boston", package = "MASS")
  rf <- randomForest(medv ~ ., data = Boston, ntree = 50)
  mod <- Predictor$new(rf, data = Boston)

  # Compute the ALE for the first feature
  eff <- FeatureEffect$new(mod, feature = "crim")

  # Plot the results directly
  plot(eff)
}
plot.FeatureEffects() plots the results of a FeatureEffects object.

## S3 method for class 'FeatureEffects'
plot(x, features = NULL, nrows = NULL, ncols = NULL, fixed_y = TRUE, ...)

x: a FeatureEffects object
features: (character) for which features should the effects be plotted? Default is all features. You can also sort the order of the plots with this argument.
nrows: the number of rows in the table of graphics
ncols: the number of columns in the table of graphics
fixed_y: (logical) should the y-axis range be the same for all effects? Defaults to TRUE.
...: further arguments for the individual effect plots
In contrast to other plot methods, for FeatureEffects the returned plot is not a ggplot2 object, but a grid object, a collection of multiple ggplot2 plots.
grid object
FeatureEffects plot.FeatureEffect
# We train a random forest on the Boston dataset:
library("randomForest")
data("Boston", package = "MASS")
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)
mod <- Predictor$new(rf, data = Boston)

# Compute the feature effects for all features
eff <- FeatureEffects$new(mod)

# Plot the results directly
eff$plot()

# For a subset of features
eff$plot(features = c("lstat", "crim"))

# With a different layout
eff$plot(nrows = 2)
plot.FeatureImp() plots the feature importance results of a FeatureImp object.

## S3 method for class 'FeatureImp'
plot(x, sort = TRUE, ...)

x: a FeatureImp object
sort: (logical) should the features be sorted in descending order? Defaults to TRUE.
...: further arguments for the object's plot function
The plot shows the importance per feature.
When n.repetitions in FeatureImp$new was larger than 1, we get multiple importance estimates per feature. The importances are aggregated, and the plot shows the median importance per feature (as points), as well as the range from the 5% to the 95% quantile, which helps to understand how much variance the computation has per feature.
ggplot2 plot object
library("rpart")

# We train a tree on the Boston dataset:
data("Boston", package = "MASS")
tree <- rpart(medv ~ ., data = Boston)
y <- Boston$medv
X <- Boston[-which(names(Boston) == "medv")]
mod <- Predictor$new(tree, data = X, y = y)

# Compute feature importances as the performance drop in mean absolute error
imp <- FeatureImp$new(mod, loss = "mae")

# Plot the results directly
plot(imp)
plot.Interaction() plots the results of an Interaction object.

## S3 method for class 'Interaction'
plot(x, sort = TRUE)

x: an Interaction R6 object
sort: (logical) should the features be sorted in descending order? Defaults to TRUE.
ggplot2 plot object
# We train a tree on the Boston dataset:
## Not run:
library("rpart")
data("Boston", package = "MASS")
rf <- rpart(medv ~ ., data = Boston)
mod <- Predictor$new(rf, data = Boston)

# Compute the interactions
ia <- Interaction$new(mod)

# Plot the results directly
plot(ia)

## End(Not run)
plot.LocalModel() plots the feature effects of a LocalModel object.

## S3 method for class 'LocalModel'
plot(object)

object: a LocalModel R6 object
ggplot2 plot object
library("randomForest")

# First we fit a machine learning model on the Boston housing data
data("Boston", package = "MASS")
X <- Boston[-which(names(Boston) == "medv")]
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)
mod <- Predictor$new(rf, data = X)

# Explain the first instance of the dataset with the LocalModel method:
x.interest <- X[1, ]
lemon <- LocalModel$new(mod, x.interest = x.interest, k = 2)
plot(lemon)
plot.Shapley() plots the Shapley values - the contributions of feature values to the prediction.

## S3 method for class 'Shapley'
plot(object, sort = TRUE)

object: a Shapley R6 object
sort: (logical) should the feature values be sorted in descending order? Defaults to TRUE.
ggplot2 plot object
## Not run:
library("rpart")

# First we fit a machine learning model on the Boston housing data
data("Boston", package = "MASS")
rf <- rpart(medv ~ ., data = Boston)
X <- Boston[-which(names(Boston) == "medv")]
mod <- Predictor$new(rf, data = X)

# Then we explain the first instance of the dataset with the Shapley method:
x.interest <- X[1, ]
shapley <- Shapley$new(mod, x.interest = x.interest)
plot(shapley)

## End(Not run)
Plot the response for newdata of a TreeSurrogate object.
Each plot facet is one leaf node and visualizes the distribution of the predicted outcome from the machine learning model.

## S3 method for class 'TreeSurrogate'
plot(object)

object: a TreeSurrogate object
ggplot2 plot object
library("randomForest")

# Fit a Random Forest on the Boston housing data set
data("Boston", package = "MASS")
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)

# Create a model object
mod <- Predictor$new(rf, data = Boston[-which(names(Boston) == "medv")])

# Fit a decision tree as a surrogate for the whole random forest
dt <- TreeSurrogate$new(mod)

# Plot the resulting leaf nodes
plot(dt)
Predict the response for newdata with the LocalModel model.

## S3 method for class 'LocalModel'
predict(object, newdata = NULL, ...)

object: a LocalModel R6 object
newdata: a data.frame for which to predict
...: further arguments for the object's predict function
A data.frame with the predicted outcome.
library("randomForest")

# First we fit a machine learning model on the Boston housing data
data("Boston", package = "MASS")
X <- Boston[-which(names(Boston) == "medv")]
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)
mod <- Predictor$new(rf, data = X)

# Explain the first instance of the dataset with the LocalModel method:
x.interest <- X[1, ]
lemon <- LocalModel$new(mod, x.interest = x.interest, k = 2)
predict(lemon, newdata = x.interest)
Predict the response for newdata of a TreeSurrogate object.
This function makes the TreeSurrogate object call its internal $predict() method.

## S3 method for class 'TreeSurrogate'
predict(object, newdata, type = "prob", ...)

object: the surrogate tree, a TreeSurrogate object
newdata: a data.frame for which to predict
type: either "prob" or "class". Ignored if the surrogate tree does regression.
...: further arguments passed on to the tree's predict function
A data.frame with the predicted outcome. In case of regression it is the predicted value. In case of classification it is either the class probabilities (for type "prob") or the class label (for type "class").
library("randomForest")

# Fit a Random Forest on the Boston housing data set
data("Boston", package = "MASS")
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)

# Create a model object
mod <- Predictor$new(rf, data = Boston[-which(names(Boston) == "medv")])

# Fit a decision tree as a surrogate for the whole random forest
dt <- TreeSurrogate$new(mod)

# Use the tree to predict new data
predict(dt, newdata = Boston)
A Predictor object holds any machine learning model (mlr, caret, randomForest, ...) and the data to be used for analyzing the model. The interpretation methods in the iml package need the machine learning model to be wrapped in a Predictor object.
A Predictor object is a container for the prediction model and the data. This ensures that the machine learning model can be analyzed in a robust way.
Note: In case of classification, the model should return one column per class with the class probability.
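A hedged sketch of such a wrapper via the predict.function argument documented below (predict_prob is our illustrative helper; the type = "prob" convention depends on the model package):

# Illustrative sketch: a custom predict.function returning one
# probability column per class, as required for classification
predict_prob <- function(model, newdata) {
  as.data.frame(predict(model, newdata, type = "prob"))
}
# predictor <- Predictor$new(model, data = dat, predict.function = predict_prob)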
data
(data.frame)
Data object with the data for the model interpretation.
model
(any)
The machine learning model.
batch.size
(numeric(1))
The number of rows to be input to the model for prediction at once.
class
(character(1))
The class column to be returned.
prediction.colnames
(character)
The column names of the predictions.
prediction.function
(function)
The function to predict newdata.
task
(character(1))
The inferred prediction task: "classification" or "regression".
new()
Create a Predictor object.
Predictor$new(
  model = NULL,
  data = NULL,
  predict.function = NULL,
  y = NULL,
  class = NULL,
  type = NULL,
  batch.size = 1000
)
model
(any)
The machine learning model. Recommended are models from mlr and caret. Other machine learning models with an S3 predict function work as well, but less robustly (e.g. randomForest).
data
(data.frame)
The data to be used for analyzing the prediction model. Allowed column classes are: numeric, factor, integer, ordered and character. For some models the data can be extracted automatically; Predictor$new() throws an error when it can't extract the data automatically.
predict.function
(function)
The function to predict newdata. Only needed if model is not a model from the mlr or caret package. The first argument of predict.function has to be the model, the second the newdata: function(model, newdata)
y
(character(1) | numeric | factor)
The target vector or (preferably) the name of the target column in the data argument. Predictor tries to infer the target automatically from the model.
class
(character(1))
The class column to be returned. You should use the column name of the predicted class, e.g. class = "setosa".
type
(character(1))
This argument is passed to the prediction function of the model. For regression models you usually don't have to provide the type argument. The classic use case is to say type = "prob" for classification models. Consult the documentation of the machine learning package you use to find out which type options you have. If both predict.function and type are used, then type is passed as an argument to predict.function.
batch.size
(numeric(1))
The maximum number of rows to be input to the model for prediction at once. Currently only respected for FeatureImp, Partial and Interaction.
predict()
Predict new data with the machine learning model.
Predictor$predict(newdata)
newdata
data.frame
Data to predict on.
print()
Print the Predictor object.
Predictor$print()
clone()
The objects of this class are cloneable with this method.
Predictor$clone(deep = FALSE)
deep
Whether to make a deep clone.
library("mlr")
task <- makeClassifTask(data = iris, target = "Species")
learner <- makeLearner("classif.rpart", minsplit = 7, predict.type = "prob")
mod.mlr <- train(learner, task)
mod <- Predictor$new(mod.mlr, data = iris)
mod$predict(iris[1:5, ])

mod <- Predictor$new(mod.mlr, data = iris, class = "setosa")
mod$predict(iris[1:5, ])

library("randomForest")
rf <- randomForest(Species ~ ., data = iris, ntree = 20)
mod <- Predictor$new(rf, data = iris, type = "prob")
mod$predict(iris[50:55, ])

# Feature importance needs the target vector, which needs to be supplied:
mod <- Predictor$new(rf, data = iris, y = "Species", type = "prob")
Turn class probabilities into class labels.

probs.to.labels(prediction)

prediction: the prediction object
Shapley computes feature contributions for single predictions with the Shapley value, an approach from cooperative game theory. The feature values of an instance cooperate to achieve the prediction. The Shapley value fairly distributes the difference between the instance's prediction and the dataset's average prediction among the features.
For more details on the algorithm see https://christophm.github.io/interpretable-ml-book/shapley.html
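In formula form (our notation, not the package's), the Monte Carlo estimate from Strumbelj and Kononenko (2014) that this class approximates is

\hat{\phi}_j = \frac{1}{M} \sum_{m=1}^{M} \left( \hat{f}(x_{+j}^{m}) - \hat{f}(x_{-j}^{m}) \right)

where M is sample.size and x_{+j}^{m}, x_{-j}^{m} are hybrid instances built from the instance of interest and a randomly drawn data point, identical except that in x_{+j}^{m} feature j takes its value from the instance of interest.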
iml::InterpretationMethod -> Shapley
x.interest
(data.frame)
Single row with the instance to be explained.
y.hat.interest
(numeric)
Predicted value for the instance of interest.
y.hat.average
(numeric(1))
Average predicted value for data X.
sample.size
(numeric(1))
The number of times coalitions/marginals are sampled from data X. The higher, the more accurate the explanations become.
new()
Create a Shapley object.
Shapley$new(predictor, x.interest = NULL, sample.size = 100)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
x.interest
(data.frame)
Single row with the instance to be explained.
sample.size
(numeric(1))
The number of Monte Carlo samples for estimating the Shapley value.
(data.frame)
A data.frame with the Shapley values (phi) per feature.
explain()
Set a new data point to explain.
Shapley$explain(x.interest)
x.interest
(data.frame)
Single row with the instance to be explained.
clone()
The objects of this class are cloneable with this method.
Shapley$clone(deep = FALSE)
deep
Whether to make a deep clone.
Strumbelj, E., Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647-665. https://doi.org/10.1007/s10115-013-0679-x
A different way to explain predictions: LocalModel
library("rpart")

# First we fit a machine learning model on the Boston housing data
data("Boston", package = "MASS")
rf <- rpart(medv ~ ., data = Boston)
X <- Boston[-which(names(Boston) == "medv")]
mod <- Predictor$new(rf, data = X)

# Then we explain the first instance of the dataset with the Shapley method:
x.interest <- X[1, ]
shapley <- Shapley$new(mod, x.interest = x.interest)
shapley

# Look at the results in a table
shapley$results

# Or as a plot
plot(shapley)

# Explain another instance
shapley$explain(X[2, ])
plot(shapley)

## Not run:
# Shapley() also works with multiclass classification
rf <- rpart(Species ~ ., data = iris)
X <- iris[-which(names(iris) == "Species")]
mod <- Predictor$new(rf, data = X, type = "prob")

# Then we explain the first instance of the dataset with the Shapley() method:
shapley <- Shapley$new(mod, x.interest = X[1, ])
shapley$results
plot(shapley)

# You can also focus on one class
mod <- Predictor$new(rf, data = X, type = "prob", class = "setosa")
shapley <- Shapley$new(mod, x.interest = X[1, ])
shapley$results
plot(shapley)

## End(Not run)
TreeSurrogate fits a decision tree on the predictions of a prediction model.
A conditional inference tree is fitted on the predicted values from the machine learning model and the data. The ctree() function from the partykit package is used to fit the tree. By default, a tree of maximum depth 2 is fitted to improve interpretability.
To learn more about global surrogate models, read the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/global.html
iml::InterpretationMethod -> TreeSurrogate
tree
(party)
The fitted tree. See also partykit::ctree.
maxdepth
(numeric(1))
The maximum tree depth.
r.squared
(numeric(1) | numeric(n.classes))
R squared measures how well the decision tree approximates the underlying model. It is calculated as 1 - (variance of the prediction differences / variance of the black box model predictions). For the multi-class case, r.squared contains one measure per class.
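Written out (our notation, with \hat{y}_{bb} the black box predictions and \hat{y}_{tree} the surrogate tree predictions), this is

R^2 = 1 - \frac{\mathrm{Var}(\hat{y}_{tree} - \hat{y}_{bb})}{\mathrm{Var}(\hat{y}_{bb})}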
new()
Create a TreeSurrogate object.
TreeSurrogate$new(predictor, maxdepth = 2, tree.args = NULL)
predictor
(Predictor)
The object (created with Predictor$new()) holding the machine learning model and the data.
maxdepth
(numeric(1))
The maximum depth of the tree. Default is 2.
tree.args
(named list)
Further arguments for partykit::ctree().
predict()
Predict new data with the tree. See also predict.TreeSurrogate.
TreeSurrogate$predict(newdata, type = "prob", ...)
newdata
(data.frame)
Data to predict on.
type
(character(1))
Prediction type: "prob" or "class".
...
Further arguments passed to predict().
clone()
The objects of this class are cloneable with this method.
TreeSurrogate$clone(deep = FALSE)
deep
Whether to make a deep clone.
Craven, M., & Shavlik, J. W. (1996). Extracting tree-structured representations of trained networks. In Advances in neural information processing systems (pp. 24-30).
predict.TreeSurrogate and plot.TreeSurrogate
For the tree implementation: partykit::ctree()
library("randomForest")

# Fit a Random Forest on the Boston housing data set
data("Boston", package = "MASS")
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)

# Create a model object
mod <- Predictor$new(rf, data = Boston[-which(names(Boston) == "medv")])

# Fit a decision tree as a surrogate for the whole random forest
dt <- TreeSurrogate$new(mod)

# Plot the resulting leaf nodes
plot(dt)

# Use the tree to predict new data
predict(dt, Boston[1:10, ])

# Extract the results
dat <- dt$results
head(dat)

# It also works for classification
rf <- randomForest(Species ~ ., data = iris, ntree = 50)
X <- iris[-which(names(iris) == "Species")]
mod <- Predictor$new(rf, data = X, type = "prob")

# Fit a decision tree as a surrogate for the whole random forest
dt <- TreeSurrogate$new(mod, maxdepth = 2)

# Plot the resulting leaf nodes
plot(dt)

# If you want to visualize the tree directly:
plot(dt$tree)

# Use the tree to predict new data
set.seed(42)
iris.sample <- X[sample(1:nrow(X), 10), ]
predict(dt, iris.sample)
predict(dt, iris.sample, type = "class")

# Extract the dataset
dat <- dt$results
head(dat)