Title: | The Residual-Based Predictiveness Curve |
---|---|
Description: | The RBP curve is a visual tool to assess the performance of prediction models. |
Authors: | Giuseppe Casalicchio, Bernd Bischl |
Maintainer: | Giuseppe Casalicchio <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2025-03-08 02:44:51 UTC |
Source: | https://github.com/giuseppec/rbpcurve |
The integral of the RBP curve is a measure of good calibration. If the two integrals (below and above the RBP curve) sum to approximately 0, good calibration is satisfied, that is, the prevalence is close to the average of the predicted probabilities.
addGoodCalib(obj, plot.values = TRUE, show.info = TRUE, col = grDevices::rgb(0, 0, 0, 0.25), border = NA, ...)
obj |
[ |
plot.values |
[ |
show.info |
[ |
col |
[ |
border |
[ |
... |
[any] |
[invisible(NULL)].
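The calibration statement above can be checked numerically. The following is a minimal base R sketch, not the package's implementation: it assumes the RBP curve plots the sorted residuals `y - pred` against the cumulative percentage, so that each observation contributes a strip of width `1/n` to the integral. The toy vectors `pred` and `y` are made up for illustration.

```r
# Toy data (illustrative only): predicted probabilities and 0/1 targets
pred <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)
y <- c(0, 0, 1, 1, 1, 0)
n <- length(y)

# Assumption: the curve's y-values are the sorted residuals y - pred
res <- sort(y - pred)

# Signed areas between the curve and the zero line (rectangle rule, width 1/n)
area.neg <- sum(res[res < 0]) / n  # part of the curve below zero
area.pos <- sum(res[res > 0]) / n  # part of the curve above zero

# Their sum equals prevalence minus average predicted probability
calib <- area.neg + area.pos
prev.minus.mean.pred <- mean(y) - mean(pred)
```

Under these assumptions, `calib` close to 0 means the prevalence and the average predicted probability nearly coincide.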
The PEV measure is the difference between the conditional expectations of the predicted
probabilities, conditional on the two groups determined by the target variable.
The PEV measure can be read off the RBP curve as the difference of the
two areas that are highlighted with addPEV.
addPEV(obj, plot.values = TRUE, show.info = TRUE, text.col = "black", col = rgb(0, 0, 0, 0.25))
obj |
[ |
plot.values |
[ |
show.info |
[ |
text.col |
[ |
col |
[ |
[invisible(NULL)].
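As a rough illustration of the PEV definition, the sketch below computes the two conditional averages and their difference in base R. The variable names mirror the object members `e0`, `e1` and `pev`, but the code is only an assumption-based example on made-up data, not the package source.

```r
# Toy data (illustrative only)
pred <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)
y <- c(0, 0, 1, 1, 1, 0)

# Average predicted probability within each group of the target variable
e0 <- mean(pred[y == 0])  # conditional on y = 0
e1 <- mean(pred[y == 1])  # conditional on y = 1

# PEV as the difference of the two conditional averages
pev <- e1 - e0
```

A larger `pev` indicates that the model separates the two groups more strongly on average.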
The prevalence is the proportion of a population having a specific condition.
In binary classification, the condition refers to whether the target variable has the value 1,
that is, whether the target variable corresponds to the positive class.
addPrevalence(obj, plot.values = TRUE, digits = 3L, col = "grey")
obj |
[ |
plot.values |
[ |
digits |
[ |
col |
[ |
[invisible(NULL)].
For a given threshold thresh, the false positive rate (FPR)
and the true positive rate (TPR) can be visually assessed
by the intersection of the RBP curve with the horizontal lines at
-thresh and 1 - thresh, respectively.
addRates(obj, plot.values = TRUE, digits = 3L, col = "black", thresh = obj$prev, thresh.label = "thresh")
obj |
[ |
plot.values |
[ |
digits |
[ |
col |
[ |
thresh |
[ |
thresh.label |
[ |
[invisible(NULL)].
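The link between a threshold and the two horizontal lines can be sketched in base R. The example assumes residuals are defined as `y - pred`, so a prediction above `thresh` corresponds to a residual below `-thresh` in the negative class and below `1 - thresh` in the positive class. The data and the choice `thresh = 0.5` are made up for illustration; this is not the package's code.

```r
# Toy data (illustrative only)
pred <- c(0.1, 0.6, 0.35, 0.8, 0.7, 0.2)
y <- c(0, 1, 1, 1, 1, 0)
y <- c(0, 0, 1, 1, 1, 0)  # final targets used below
thresh <- 0.5

# Rates computed directly from the predicted probabilities
tpr <- mean(pred[y == 1] > thresh)  # share of positives predicted positive
fpr <- mean(pred[y == 0] > thresh)  # share of negatives predicted positive

# The same rates computed from the residuals y - pred
res <- y - pred
tpr.res <- mean(res[y == 1] < 1 - thresh)  # positives below the line 1 - thresh
fpr.res <- mean(res[y == 0] < -thresh)     # negatives below the line -thresh
```

Both routes give the same rates, which is why the rates can be read off the curve at those two heights under the stated residual definition.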
A measure of how well calibrated a model is can be obtained by grouping the predicted probabilities by deciles, yielding 10 groups. Areas of the same color belong to the same group. When the two equally colored areas of each group are similar in size, the model is well calibrated.
addWellCalib(obj, plot.values = TRUE, subplot.control = list(diff = TRUE), col = shape::greycol(10L, interval = c(0.3, 1)), pos = NULL)
obj |
[ |
plot.values |
[ |
subplot.control |
[ |
col |
[ |
pos |
[ |
A matrix that contains the average of the “probabilities within deciles” conditional on Y.
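The decile grouping described above can be approximated with base R. This is a sketch under assumptions, not the package's implementation: groups are formed with `quantile` and `cut`, residuals are taken as `y - pred`, and for each decile the area contributed by the negatives (sum of `pred`) is compared with the area contributed by the positives (sum of `1 - pred`), each scaled by `1/n`. The simulated data are illustrative only.

```r
set.seed(1)
n <- 1000
pred <- runif(n)                        # simulated predicted probabilities
y <- rbinom(n, size = 1, prob = pred)   # targets drawn from those probabilities

# Assign each prediction to one of 10 decile groups
breaks <- quantile(pred, probs = seq(0, 1, by = 0.1))
decile <- cut(pred, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Area per decile below the zero line (negatives) and above it (positives)
area.neg <- vapply(1:10, function(g) sum(pred[decile == g & y == 0]), numeric(1)) / n
area.pos <- vapply(1:10, function(g) sum(1 - pred[decile == g & y == 1]), numeric(1)) / n

# One row per decile; similar values within a row suggest good calibration there
areas <- cbind(negative = area.neg, positive = area.pos)
```

Summed over all deciles, the difference of the two columns equals the average residual, which ties this per-group view back to the overall calibration integral.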
Must be created for all subsequent plot function calls.
makeRBPObj(pred, y, positive = NULL)
pred |
[ |
y |
[ |
positive |
[ |
Object members:
n [numeric(1)] Number of observations.
pred [numeric(n)] Predicted probabilities.
y [numeric(n)] Target variable having the values 0 and 1.
positive [character(1)] Positive class label of the target variable. Only present when y is a factor.
e0 [numeric(1)] Average of the predicted probabilities conditional on y=0.
e1 [numeric(1)] Average of the predicted probabilities conditional on y=1.
pev [numeric(1)] Proportion of explained variation measure. Computed as e1-e0.
tpr [numeric(1)] True positive rate.
fpr [numeric(1)] False positive rate.
prev [numeric(1)] Prevalence.
one.min.prev [numeric(1)] One minus the value of the prevalence.
axis.x [numeric(n)] Values for the X-Axis of the RBP curve.
axis.y [numeric(n)] Values for the Y-Axis of the RBP curve.
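To make some of these members concrete, here is a minimal base R sketch of how such an object could be assembled. The helper name `sketchRBPMembers` is hypothetical and not part of the package. It assumes that a factor target is converted to 0/1 via an explicitly supplied `positive` label, that `axis.x` is the cumulative percentage `(1:n)/n`, and that `axis.y` holds the sorted residuals `y - pred`; the package may differ in these details.

```r
# Hypothetical helper (not part of the package): builds a subset of the members
sketchRBPMembers <- function(pred, y, positive = NULL) {
  if (is.factor(y)) {
    # Assumption: the positive class label must be given for factor targets
    stopifnot(!is.null(positive), positive %in% levels(y))
    y <- as.numeric(y == positive)
  }
  stopifnot(length(pred) == length(y), all(y %in% c(0, 1)))
  n <- length(y)
  prev <- mean(y)
  list(
    n = n, pred = pred, y = y,
    prev = prev, one.min.prev = 1 - prev,
    axis.x = seq_len(n) / n,   # assumed cumulative percentage
    axis.y = sort(y - pred)    # assumed sorted residuals
  )
}

obj <- sketchRBPMembers(
  pred = c(0.9, 0.2, 0.6, 0.4),
  y = factor(c("pos", "neg", "neg", "pos")),
  positive = "pos"
)
```

Plotting `obj$axis.y` against `obj$axis.x` with `plot(obj$axis.x, obj$axis.y, type = "l")` would give a curve of the kind described in this manual, under the assumptions stated above.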
Plots the RBP curve.
plotRBPCurve(obj, main = "RBP Curve", xlab = "Cumulative Percentage", ylab = "Estimated Residuals", type = "l", ylim = c(-1, 1.2), x.adj = c(NA, -0.5), y.adj = c(NA, NA), cond.axis = FALSE, title.line = ifelse(cond.axis, 3, 2), add = FALSE, ...)
obj |
[ |
main |
[ |
xlab |
[ |
ylab |
[ |
type |
[ |
ylim |
[ |
x.adj |
[ |
y.adj |
[ |
cond.axis |
[ |
title.line |
[ |
add |
[ |
... |
# Get the data of the example task pid.task (getTaskData and pid.task come from mlr)
library(mlr)
mydata = getTaskData(pid.task)
head(mydata)

# Build logit model and plot RBP curve
mylogit <- glm(diabetes ~ ., data = mydata, family = "binomial")
y = mydata$diabetes
pred1 = predict(mylogit, type = "response")
obj1 = makeRBPObj(pred1, y)
plotRBPCurve(obj1, cond.axis = TRUE, type = "b")

## Not run:
# Build logit model using mlr and plot RBP curve
task = pid.task
lrn = makeLearner("classif.logreg", predict.type = "prob")
tr = train(lrn, task)
pred2 = getPredictionProbabilities(predict(tr, task))
obj2 = makeRBPObj(pred2, y)
plotRBPCurve(obj2, cond.axis = TRUE, type = "b", col = 2)
## End(Not run)