BenchmarkService: Nested Cross-Validation with Zero Leakage
Source: R/BenchmarkService.R
BenchmarkService.Rd
R6 class that enforces proper nested cross-validation for biomarker discovery. Implements the outer loop (evaluation) and inner loop (selection) pattern required for unbiased performance estimation.
Details
The BenchmarkService guarantees scientific validity by:
- Enforcing that feature selection occurs in the inner loop only
- Computing the Nogueira Stability Index across outer folds
- Tracking which features are selected in each fold for consensus analysis
- Preventing any access to test data during training/selection
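As an illustration of the stability metric named above, the following is a minimal sketch of the Nogueira Stability Index computed from a binary selection matrix (one row per outer fold, one column per candidate feature). The helper name `nogueira_stability` and the example matrices are hypothetical and are not taken from the package; how BenchmarkService computes the index internally is not shown in this page.

```r
# Hypothetical helper, not part of the package's documented API.
# Z is a binary matrix: Z[i, f] == 1 when feature f was selected in fold i.
nogueira_stability <- function(Z) {
  Z <- as.matrix(Z)
  M <- nrow(Z)                      # number of folds (feature sets)
  d <- ncol(Z)                      # number of candidate features
  stopifnot(M >= 2, d >= 2)
  p_hat <- colMeans(Z)              # selection frequency of each feature
  k_bar <- mean(rowSums(Z))         # average number of selected features
  # Unbiased sample variance of each feature's selection indicator
  s2 <- (M / (M - 1)) * p_hat * (1 - p_hat)
  # Variance expected if k_bar features were picked at random in each fold
  denom <- (k_bar / d) * (1 - k_bar / d)
  if (denom == 0) return(NA_real_)  # undefined when no or all features are selected
  1 - mean(s2) / denom
}

# Same two features selected in all five folds: perfectly stable
Z_same <- matrix(rep(c(1, 1, 0, 0, 0, 0), times = 5), nrow = 5, byrow = TRUE)
stable <- nogueira_stability(Z_same)

# Three folds selecting disjoint feature pairs: less stable than chance
Z_disjoint <- rbind(c(1, 1, 0, 0, 0, 0),
                    c(0, 0, 1, 1, 0, 0),
                    c(0, 0, 0, 0, 1, 1))
unstable <- nogueira_stability(Z_disjoint)
```

Values near 1 indicate that the same features are chosen in every fold, while values at or below 0 indicate selections no more consistent than random draws of the same size.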
Methods
Method new()
Create a new BenchmarkService
Usage
BenchmarkService$new(
task,
outer_folds = 5,
inner_folds = 3,
stratify = TRUE,
groups = NULL,
seed = NULL
)
Arguments
task
An mlr3 Task or OmicPipeline object
outer_folds
Number of outer CV folds (evaluation)
inner_folds
Number of inner CV folds (selection/tuning)
stratify
Logical, whether to stratify by outcome
groups
Optional column name for grouped CV (e.g., patient_id)
seed
Random seed for reproducibility
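To make the purpose of the groups argument concrete, here is a minimal base R sketch of grouped fold assignment, in which every row belonging to the same group (for example, the same patient) lands in the same fold. The function `assign_group_folds` and the `patient_id` vector are hypothetical names used for illustration only; the page does not state how BenchmarkService builds its grouped folds.

```r
# Hypothetical helper, not part of the package's documented API.
# Folds are assigned to whole groups, then mapped back to rows.
assign_group_folds <- function(groups, folds, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ids <- unique(groups)
  stopifnot(length(ids) >= folds)   # need at least one group per fold
  # Shuffle a balanced vector of fold labels across the distinct groups
  fold_of_id <- sample(rep(seq_len(folds), length.out = length(ids)))
  names(fold_of_id) <- as.character(ids)
  # Look up each row's fold through its group label
  unname(fold_of_id[as.character(groups)])
}

# Ten patients with three samples each, split into five folds
patient_id <- rep(paste0("P", 1:10), each = 3)
fold <- assign_group_folds(patient_id, folds = 5, seed = 1)
```

Because folds are drawn at the group level, repeated measurements from one patient can never appear on both sides of a train/test split.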
Method run()
Run the nested cross-validation benchmark
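The nested pattern that this method is described as enforcing can be sketched in base R as follows. This is an illustrative sketch on simulated data, not the package's implementation: the helpers `top_k` and `fit_mse`, the correlation-based ranking, the linear model, and the `k_grid` values are all assumptions chosen to keep the example small.

```r
set.seed(42)
n <- 120; p <- 20
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", seq_len(p))))
y <- X[, 1] - X[, 2] + rnorm(n)          # only f1 and f2 carry signal

# Rank features by absolute correlation with y, using only the rows supplied
top_k <- function(X, y, k) {
  scores <- abs(cor(X, y))[, 1]
  names(sort(scores, decreasing = TRUE))[seq_len(k)]
}

# Fit a linear model on `train` rows and return the MSE on `test` rows
fit_mse <- function(X, y, feats, train, test) {
  df <- data.frame(y = y, X[, feats, drop = FALSE])
  fit <- lm(y ~ ., data = df[train, , drop = FALSE])
  mean((df$y[test] - predict(fit, newdata = df[test, , drop = FALSE]))^2)
}

outer_folds <- 5; inner_folds <- 3; k_grid <- c(1, 2, 5)
outer_id <- sample(rep(seq_len(outer_folds), length.out = n))
selected <- vector("list", outer_folds)
outer_mse <- numeric(outer_folds)

for (i in seq_len(outer_folds)) {
  train <- which(outer_id != i)
  test <- which(outer_id == i)

  # Inner loop: choose k using the outer-training rows only
  inner_id <- sample(rep(seq_len(inner_folds), length.out = length(train)))
  inner_mse <- sapply(k_grid, function(k) {
    mean(sapply(seq_len(inner_folds), function(j) {
      in_train <- train[inner_id != j]
      in_val <- train[inner_id == j]
      feats <- top_k(X[in_train, , drop = FALSE], y[in_train], k)
      fit_mse(X, y, feats, in_train, in_val)
    }))
  })
  best_k <- k_grid[which.min(inner_mse)]

  # Re-select on the full outer-training set, then score once on the test fold
  selected[[i]] <- top_k(X[train, , drop = FALSE], y[train], best_k)
  outer_mse[i] <- fit_mse(X, y, selected[[i]], train, test)
}
```

The outer test rows are touched only in the final scoring call of each iteration, so both the choice of k and the feature ranking are made without seeing them. The per-fold entries of `selected` are the kind of record a stability or consensus analysis would consume.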
Examples
if (FALSE) { # \dontrun{
  # Create benchmark service
  service <- BenchmarkService$new(
    task = my_task,
    outer_folds = 5,
    inner_folds = 3
  )
  # Add learners with embedded feature selection
  service$add_learner(my_graph_learner)
  # Run nested CV
  result <- service$run()
  # Get stability metrics
  stability <- result$get_stability()
} # }