Apply Frozen ComBat Within Cross-Validation Folds
Source: R/frozen-combat.R
Applies FrozenComBat batch correction within each fold of cross-validation. The batch-effect parameters are fitted only on the training indices and then applied, unchanged, to the test indices. This is the CORRECT way to use batch correction in ML pipelines, because it prevents information from the test samples from leaking into the preprocessing.
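To make the fit-then-freeze idea concrete, here is a minimal base-R sketch of a simplified per-batch location/scale adjustment. It is an illustration under stated assumptions, not this package's implementation: `fit_frozen_ls()` and `apply_frozen_ls()` are hypothetical helpers, ComBat's empirical Bayes shrinkage of the batch parameters is omitted, and samples are assumed to be rows with features in columns.

```r
# Hypothetical helpers, NOT part of this package: a simplified per-batch
# location/scale adjustment (no empirical Bayes shrinkage, unlike ComBat).
# x: numeric matrix with samples in rows and features in columns.
fit_frozen_ls <- function(x, batch) {
  batch <- as.character(batch)
  groups <- split(seq_len(nrow(x)), batch)
  # Per-batch mean and SD of every feature, estimated on the rows given here
  params <- lapply(groups, function(idx) {
    xb <- x[idx, , drop = FALSE]
    list(mean = colMeans(xb), sd = apply(xb, 2, sd))
  })
  # Pooled within-batch SD: residuals after removing each batch's mean
  resid <- x
  for (b in names(groups)) {
    idx <- groups[[b]]
    resid[idx, ] <- sweep(x[idx, , drop = FALSE], 2, params[[b]]$mean, "-")
  }
  pooled_sd <- sqrt(colSums(resid^2) / (nrow(x) - length(groups)))
  list(grand_mean = colMeans(x), pooled_sd = pooled_sd, batch = params)
}

apply_frozen_ls <- function(fit, x, batch) {
  batch <- as.character(batch)
  unseen <- setdiff(unique(batch), names(fit$batch))
  if (length(unseen) > 0) {
    stop("batches not present in the training data: ", paste(unseen, collapse = ", "))
  }
  out <- x
  for (b in unique(batch)) {
    idx <- which(batch == b)
    p <- fit$batch[[b]]
    # Standardise with the frozen batch parameters, then map onto the
    # frozen pooled scale and grand mean
    z <- sweep(sweep(x[idx, , drop = FALSE], 2, p$mean, "-"), 2, p$sd, "/")
    out[idx, ] <- sweep(sweep(z, 2, fit$pooled_sd, "*"), 2, fit$grand_mean, "+")
  }
  out
}

set.seed(1)
x <- matrix(rnorm(40 * 3), nrow = 40)
batch <- rep(c("A", "B"), each = 20)
x[batch == "B", ] <- x[batch == "B", ] + 5  # simulated additive batch shift
train_idx <- c(1:15, 21:35)
test_idx <- setdiff(1:40, train_idx)

fit <- fit_frozen_ls(x[train_idx, ], batch[train_idx])  # training rows only
corrected_train <- apply_frozen_ls(fit, x[train_idx, ], batch[train_idx])
corrected_test <- apply_frozen_ls(fit, x[test_idx, ], batch[test_idx])  # reuse frozen fit
```

Because `apply_frozen_ls()` only reads the frozen `fit`, correcting a single test sample gives the same result as correcting the whole test set, and a batch that was absent from training raises an error instead of being silently re-estimated.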
Usage
apply_frozen_combat_cv(
data,
batch,
train_indices,
test_indices = NULL,
covariates = NULL,
parametric = TRUE
)
Value
A list with:
- corrected_train: batch-corrected training data.
- corrected_test: batch-corrected test data (if test_indices is provided).
- frozen_combat: the fitted FrozenComBat object, kept for applying the same correction to external validation data.
Details
## IMPORTANT: Proper Usage in Nested CV
For nested cross-validation, use either this function or the mlr3pipelines PipeOp created by create_frozen_combat_pipeop():
```r
# Option 1: use the PipeOp in mlr3pipelines (recommended)
po_combat <- create_frozen_combat_pipeop(batch_col = "batch")
graph <- po_combat  # chain a learner after it, e.g. po_combat %>>% lrn("classif.rpart")

# Option 2: manual application in a custom CV loop
for (fold in folds) {
  result <- apply_frozen_combat_cv(
    data = features,
    batch = batch_vector,
    train_indices = fold$train,
    test_indices = fold$test
  )
  # Use result$corrected_train and result$corrected_test
}
```
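In nested CV the correction has to be refit at both levels: on each inner training split during tuning, and on each outer training split for the final assessment. The base-R sketch below only builds the index sets; `nested_cv_splits()` is a hypothetical helper, and the `apply_frozen_combat_cv()` call in the comments follows the Usage section.

```r
# Hypothetical helper, NOT part of this package: random outer/inner index sets
# for nested CV. Each `train` set is where batch correction would be (re)fit.
nested_cv_splits <- function(n, outer_k = 5, inner_k = 3) {
  outer_id <- sample(rep_len(seq_len(outer_k), n))
  lapply(seq_len(outer_k), function(i) {
    outer_train <- which(outer_id != i)
    inner_id <- sample(rep_len(seq_len(inner_k), length(outer_train)))
    list(
      train = outer_train,
      test = which(outer_id == i),
      # Inner splits are drawn from the outer training rows only
      inner = lapply(seq_len(inner_k), function(j) list(
        train = outer_train[inner_id != j],
        test = outer_train[inner_id == j]
      ))
    )
  })
}

set.seed(42)
splits <- nested_cv_splits(40, outer_k = 5, inner_k = 3)
s <- splits[[1]]
# Inner loop (tuning): fit the correction on s$inner[[j]]$train only, e.g.
#   apply_frozen_combat_cv(data, batch, train_indices = s$inner[[j]]$train,
#                          test_indices = s$inner[[j]]$test)
# Outer loop (assessment): refit on s$train and apply to s$test.
```

The outer test rows never appear in any inner split, so neither tuning nor correction sees them before the final evaluation.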
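## WRONG: Do NOT do this!
Fitting ComBat on all samples before splitting lets the future test samples influence the batch parameters used to correct the training samples, so the cross-validated performance is optimistically biased.
```r
# WRONG: applying ComBat to all data before CV causes leakage!
corrected_all <- sva::ComBat(all_data, batch)  # LEAKAGE!
cv_result <- run_cv(corrected_all)             # inflated performance
```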
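The leakage can be demonstrated without sva. In this base-R sketch, plain per-batch mean-centering (a hypothetical stand-in for ComBat's location adjustment, defined here as `center_by_batch()`) is run twice: once estimating the batch means from all rows, as in the wrong pattern, and once from the training rows only. Only the test rows are then perturbed.

```r
# Hypothetical stand-in for ComBat, NOT part of this package: subtract each
# batch's column means, estimated from the rows listed in `fit_rows`.
center_by_batch <- function(x, batch, fit_rows = seq_len(nrow(x))) {
  out <- x
  for (b in unique(batch)) {
    rows <- which(batch == b)
    fit_idx <- intersect(fit_rows, rows)
    if (length(fit_idx) == 0) stop("no fitting rows for batch ", b)
    mu <- colMeans(x[fit_idx, , drop = FALSE])
    out[rows, ] <- sweep(x[rows, , drop = FALSE], 2, mu, "-")
  }
  out
}

set.seed(42)
x <- matrix(rnorm(40 * 5), nrow = 40)
batch <- rep(c("A", "B"), each = 20)
test_idx <- c(1:4, 21:24)
train_idx <- setdiff(1:40, test_idx)

x_perturbed <- x
x_perturbed[test_idx, ] <- x_perturbed[test_idx, ] + 10  # only test rows change

# WRONG: batch means estimated from all rows, including the test rows
leaky_1 <- center_by_batch(x, batch)[train_idx, ]
leaky_2 <- center_by_batch(x_perturbed, batch)[train_idx, ]
# Frozen: batch means estimated from training rows only
frozen_1 <- center_by_batch(x, batch, fit_rows = train_idx)[train_idx, ]
frozen_2 <- center_by_batch(x_perturbed, batch, fit_rows = train_idx)[train_idx, ]

isTRUE(all.equal(leaky_1, leaky_2))  # FALSE: training features depend on test rows
identical(frozen_1, frozen_2)        # TRUE: training features ignore test rows
```

In the leaky version, changing only the test samples shifts every corrected training value (here by 2 per feature, since 4 of the 20 rows in each batch moved by 10), which is exactly the dependence a fair CV estimate must exclude.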
Examples
if (FALSE) { # \dontrun{
set.seed(42)
data <- matrix(rnorm(200), nrow = 40, ncol = 5)
batch <- rep(c("A", "B"), each = 20)
# 5-fold CV
folds <- split(1:40, rep(1:5, each = 8))
for (i in seq_along(folds)) {
test_idx <- folds[[i]]
train_idx <- setdiff(1:40, test_idx)
result <- apply_frozen_combat_cv(
data = data,
batch = batch,
train_indices = train_idx,
test_indices = test_idx
)
# Train model on result$corrected_train
# Evaluate on result$corrected_test
}
} # }
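The example builds contiguous folds, so with `batch <- rep(c("A", "B"), each = 20)` folds 1 and 2 contain only batch A and folds 4 and 5 only batch B. Every training set still contains both batches, so the example runs, but shuffled batch-stratified folds keep each test fold's batch mix close to that of the full data. A minimal base-R sketch, with `stratified_folds()` as a hypothetical helper:

```r
# Hypothetical helper, NOT part of this package: assign samples to k folds
# separately within each batch, so every fold gets a share of every batch.
stratified_folds <- function(batch, k = 5) {
  fold_id <- integer(length(batch))
  for (b in unique(batch)) {
    idx <- which(batch == b)
    idx <- idx[sample.int(length(idx))]  # shuffle within the batch
    fold_id[idx] <- rep_len(seq_len(k), length(idx))
  }
  split(seq_along(batch), fold_id)
}

set.seed(42)
batch <- rep(c("A", "B"), each = 20)
folds <- stratified_folds(batch, k = 5)
sapply(folds, function(f) table(batch[f]))  # 4 samples of each batch per fold
```

These folds can replace the `split()` call in the example loop unchanged, since each element is still a vector of test indices.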