Calculates the Expected Calibration Error (ECE), which measures how closely predicted probabilities match observed outcome frequencies. Lower values indicate better calibration.

Usage

compute_ece(probs, labels, n_bins = 10, weighting = c("samples", "uniform"))

Arguments

probs

Numeric vector of predicted probabilities (0-1)

labels

Binary labels (0/1 or logical)

n_bins

Number of bins for grouping probabilities (default: 10)

weighting

How to weight bins: "samples" (default) or "uniform"

Value

A list containing:

- ece: Expected Calibration Error (0-1, lower is better)
- mce: Maximum Calibration Error
- bin_data: Data frame with per-bin statistics

Details

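With sample weighting (each bin weighted by its share of the samples), ECE is computed as: $$\mathrm{ECE} = \sum_{b=1}^{B} \frac{n_b}{N} \left| \mathrm{accuracy}_b - \mathrm{confidence}_b \right|$$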

where B is the number of bins, n_b is the number of samples in bin b, N is the total number of samples, accuracy_b is the fraction of positive labels in bin b, and confidence_b is the mean predicted probability in bin b.
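The formula above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: `ece_sketch` is a hypothetical helper, it assumes equal-width bins on [0, 1] and skips empty bins, and it takes the Maximum Calibration Error to be the largest per-bin gap (the usual definition, assumed here).

```r
# Hypothetical helper for illustration only; not the package's compute_ece().
ece_sketch <- function(probs, labels, n_bins = 10) {
  stopifnot(length(probs) == length(labels),
            all(probs >= 0 & probs <= 1))
  labels <- as.numeric(labels)  # accepts 0/1 or logical labels

  # Assumption: equal-width bins on [0, 1]; the first bin includes 0.
  breaks <- seq(0, 1, length.out = n_bins + 1)
  bin <- cut(probs, breaks = breaks, include.lowest = TRUE, labels = FALSE)

  # Per-bin statistics (only non-empty bins appear in the results).
  n_b          <- tapply(probs, bin, length)  # samples in bin b
  confidence_b <- tapply(probs, bin, mean)    # mean predicted probability
  accuracy_b   <- tapply(labels, bin, mean)   # fraction of positives
  gap_b        <- abs(accuracy_b - confidence_b)

  list(
    ece = sum(n_b / length(probs) * gap_b),  # sample-weighted mean gap
    mce = max(gap_b)                         # assumed: largest per-bin gap
  )
}

# Two bins, each off by 0.1: both ECE and MCE come out at about 0.1.
res <- ece_sketch(c(0.1, 0.1, 0.9, 0.9), c(0, 0, 1, 1), n_bins = 2)
```

Because every bin's gap is weighted by n_b / N, sparsely populated bins contribute little to ECE, while the maximum gap is sensitive to a single poorly calibrated bin.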

Examples

if (FALSE) { # \dontrun{
# Simulated predictions (seed fixed so the result is reproducible)
set.seed(42)
probs <- runif(100)
labels <- rbinom(100, 1, probs)  # Well-calibrated by construction

result <- compute_ece(probs, labels)
print(result$ece)  # Should be low
} # }