
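Applies SMOTE (Synthetic Minority Over-sampling Technique) to balance class distributions in omics data. Synthetic minority-class samples are generated by interpolating between existing minority samples and their k nearest neighbors.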
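The interpolation step can be sketched in a few lines of base R. This is a minimal illustration of the textbook SMOTE algorithm under stated assumptions (Euclidean distance, neighbors searched among minority samples only, uniform interpolation weights). `smote_sketch()` is a hypothetical helper and not the source of `smote_augment()`.

```r
# Minimal SMOTE sketch in base R (hypothetical helper, not smote_augment()'s source).
# X_min: numeric matrix of minority-class samples (rows) x features (columns).
smote_sketch <- function(X_min, n_new, k = 5L) {
  X_min <- as.matrix(X_min)
  n <- nrow(X_min)
  stopifnot(n >= 2L, n_new >= 0L)
  if (n_new == 0L) return(X_min[0, , drop = FALSE])

  k <- min(k, n - 1L)                  # at most n - 1 neighbors exist
  d <- as.matrix(dist(X_min))          # Euclidean distances between minority samples
  diag(d) <- Inf                       # exclude each sample as its own neighbor
  # n x k matrix: row i holds the indices of the k nearest neighbors of sample i
  nn <- matrix(apply(d, 1L, function(row) order(row)[seq_len(k)]),
               nrow = n, byrow = TRUE)

  base <- sample.int(n, n_new, replace = TRUE)                    # seed samples
  nbr  <- nn[cbind(base, sample.int(k, n_new, replace = TRUE))]   # one random neighbor each
  gap  <- runif(n_new)                                            # interpolation weight in [0, 1)
  # Each synthetic point lies on the segment between its seed and the chosen neighbor
  X_min[base, , drop = FALSE] +
    gap * (X_min[nbr, , drop = FALSE] - X_min[base, , drop = FALSE])
}

set.seed(1)
X_min <- matrix(rnorm(20), ncol = 2)   # 10 toy minority samples, 2 features
synthetic <- smote_sketch(X_min, n_new = 50, k = 3)
dim(synthetic)                          # 50 2
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class; the pairwise distance matrix is also why the cost grows with the number of features.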

Usage

smote_augment(task, ratio = 1, k = 5L)

Arguments

task

An mlr3 classification task

ratio

Target ratio of minority-class to majority-class sample counts after augmentation (default: 1, i.e. balanced classes)
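Read as a count ratio, `ratio` determines how many synthetic samples are needed. The helper below is hypothetical and assumes the minority class is topped up to `ceiling(ratio * n_majority)` samples, with nothing added when the minority class already meets the target; `smote_augment()` may round differently.

```r
# Hypothetical helper: number of synthetic minority samples implied by a target ratio.
# Assumes ratio = n_minority_after / n_majority and rounding up.
n_synthetic <- function(n_min, n_maj, ratio = 1) {
  stopifnot(ratio > 0, n_min >= 0, n_maj >= 0)
  max(0L, as.integer(ceiling(ratio * n_maj)) - as.integer(n_min))
}

n_synthetic(n_min = 30, n_maj = 100, ratio = 1)    # 70  -> 100 vs 100
n_synthetic(n_min = 30, n_maj = 100, ratio = 0.5)  # 20  -> 50 vs 100
n_synthetic(n_min = 60, n_maj = 100, ratio = 0.5)  # 0   (already above target)
```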

k

Number of nearest neighbors for SMOTE (default: 5)

Value

A new mlr3 task containing the original data plus the synthetic minority-class samples

Details

For omics data, apply SMOTE with caution:

- Apply it only within cross-validation folds, to the training portion of each fold, to prevent data leakage.
- Consider feature selection before SMOTE: it is faster, and neighbor search and interpolation behave better with fewer features.
- In high-dimensional space, SMOTE may create unrealistic expression profiles.
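The leakage point can be illustrated with a stratified 5-fold loop in base R: synthetic samples are generated only from the training rows of each fold, and held-out rows are never touched. Generating them before the split would let interpolated copies of test samples enter training and inflate performance estimates. The data are simulated, and `smote_rows()` is a compact hypothetical SMOTE helper, not the package's API; with mlr3, the analogous step would presumably be calling `smote_augment()` on a task filtered to each fold's training rows.

```r
# Hypothetical compact SMOTE helper (not the package's API).
smote_rows <- function(X, n_new, k = 5L) {
  k <- min(k, nrow(X) - 1L)
  d <- as.matrix(dist(X)); diag(d) <- Inf
  nn <- matrix(apply(d, 1L, function(r) order(r)[seq_len(k)]), nrow = nrow(X), byrow = TRUE)
  i <- sample.int(nrow(X), n_new, replace = TRUE)
  j <- nn[cbind(i, sample.int(k, n_new, replace = TRUE))]
  X[i, , drop = FALSE] + runif(n_new) * (X[j, , drop = FALSE] - X[i, , drop = FALSE])
}

set.seed(42)
X <- matrix(rnorm(120 * 10), ncol = 10)               # toy data: 120 samples x 10 features
y <- factor(c(rep("case", 20), rep("control", 100)))  # imbalanced classes

# Stratified fold assignment: 4 cases and 20 controls per fold
folds <- integer(length(y))
folds[y == "case"]    <- sample(rep(1:5, length.out = 20))
folds[y == "control"] <- sample(rep(1:5, length.out = 100))

counts <- matrix(NA_integer_, nrow = 5, ncol = 2)     # class counts after balancing
test_sizes <- integer(5)
for (f in 1:5) {
  train <- which(folds != f)
  test  <- which(folds == f)                           # held-out rows: never oversampled
  Xtr <- X[train, , drop = FALSE]
  ytr <- y[train]
  min_idx <- which(ytr == "case")
  n_new <- sum(ytr == "control") - length(min_idx)     # balance to 1:1 inside the fold
  Xsyn <- smote_rows(Xtr[min_idx, , drop = FALSE], n_new)
  Xtr_bal <- rbind(Xtr, Xsyn)
  ytr_bal <- factor(c(as.character(ytr), rep("case", n_new)), levels = levels(y))
  counts[f, ] <- as.vector(table(ytr_bal))
  test_sizes[f] <- length(test)
  # ... fit a model on (Xtr_bal, ytr_bal) and evaluate it on X[test, ], y[test]
}
```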

Examples

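if (FALSE) { # \dontrun{
library(mlr3)
# Any mlr3 classification task can be used; "sonar" ships with mlr3
task <- tsk("sonar")
# Create a balanced task (minority:majority = 1:1)
task_balanced <- smote_augment(task, ratio = 1.0)
} # }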