OmicSelector: Zero-Leakage Biomarker Discovery Toolkit
Source: R/OmicSelector-package.R
OmicSelector 2.0 is a PhD-level toolkit for high-dimensional biomarker discovery that guarantees scientific validity through rigorous machine learning methodology.
Details
## Key Features
- **Zero Data Leakage**: All preprocessing, feature selection, and model training occur within proper cross-validation folds via mlr3 GraphLearners.
- **Nested Cross-Validation**: Inner loop for feature selection and hyperparameter tuning, outer loop for unbiased performance estimation.
- **Stability Metrics**: The Nogueira Stability Index checks that the selected features are robust across resamples, rather than merely producing high accuracy.
- **Reproducibility**: renv lockfiles, Docker containers, and deterministic pipelines.
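To illustrate the zero-leakage idea from the list above, here is a minimal base R sketch in which feature selection is repeated inside every cross-validation fold, so held-out rows never influence which features are chosen. It is not OmicSelector's implementation (the package delegates this to mlr3 GraphLearners); the helper `top_features()`, the simulated data, and the choice of a two-sample t statistic with logistic regression are all assumptions made for illustration.

```r
set.seed(42)

# Simulated pure-noise data (assumption): 60 samples, 200 features, no real signal
n <- 60
p <- 200
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", seq_len(p))))
y <- factor(rep(c("Case", "Control"), each = n / 2))

# Hypothetical helper: rank features by absolute pooled-variance t statistic
top_features <- function(x, y, k) {
  stat <- vapply(seq_len(ncol(x)), function(j) {
    abs(unname(t.test(x[, j] ~ y, var.equal = TRUE)$statistic))
  }, numeric(1))
  colnames(x)[order(stat, decreasing = TRUE)[seq_len(k)]]
}

# 5-fold CV: selection and model fitting both see training rows only
n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = n))
acc <- vapply(seq_len(n_folds), function(i) {
  train <- folds != i
  keep <- top_features(x[train, , drop = FALSE], y[train], k = 5)
  dat_train <- data.frame(y = y[train], x[train, keep, drop = FALSE])
  fit <- glm(y ~ ., data = dat_train, family = binomial())
  prob <- predict(fit,
                  newdata = data.frame(x[!train, keep, drop = FALSE]),
                  type = "response")
  # glm models the probability of the second factor level
  pred <- ifelse(prob > 0.5, levels(y)[2], levels(y)[1])
  mean(pred == y[!train])
}, numeric(1))

mean(acc)
```

Because the data contain no signal, a leakage-free estimate like this one should stay close to chance on average. Running `top_features()` once on all rows before splitting would typically inflate the estimate, which is the failure mode the fold-wise design is meant to prevent.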
## Core Classes
- [OmicPipeline]: Central R6 class for creating leakage-free pipelines
- [BenchmarkService]: Enforces proper nested cross-validation
## Quick Start
```r
library(OmicSelector)

# Create pipeline from data
pipeline <- OmicPipeline$new(
  data = my_data,
  target = "outcome",
  positive = "Case"
)

# Create graph learner with feature selection
learner <- pipeline$create_graph_learner(
  filter = "anova",
  model = "ranger",
  n_features = 20
)

# Run nested CV benchmark
service <- BenchmarkService$new(pipeline, outer_folds = 5, inner_folds = 3)
service$add_learner(learner)
result <- service$run()
```
## Philosophy
"Optimization without validation is hallucination."
OmicSelector 2.0 prioritizes **zero data leakage** and **feature stability** above raw accuracy metrics. High accuracy with unstable feature sets (the "Rashomon Effect") indicates overfitting, not real signal.
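As a rough illustration of what "feature stability" measures, the sketch below computes the Nogueira stability index from a binary selection matrix with one row per resample and one column per feature. It follows the published definition (one minus the average unbiased per-feature selection variance, divided by the variance expected when the same average number of features is picked at random). The function name `nogueira_stability()` is hypothetical and is not taken from the OmicSelector API.

```r
# Hypothetical helper, not part of OmicSelector.
# Z: binary matrix, rows = resamples (M), columns = features (d);
#    Z[i, f] == 1 means feature f was selected in resample i.
nogueira_stability <- function(Z) {
  Z <- as.matrix(Z)
  M <- nrow(Z)
  d <- ncol(Z)
  stopifnot(M >= 2, d >= 2, all(Z %in% c(0, 1)))

  p_hat <- colMeans(Z)                      # selection frequency per feature
  k_bar <- mean(rowSums(Z))                 # average number of selected features
  denom <- (k_bar / d) * (1 - k_bar / d)    # variance under random selection
  if (denom == 0) return(NA_real_)          # undefined if nothing or everything is selected

  s2 <- M / (M - 1) * p_hat * (1 - p_hat)   # unbiased per-feature variance
  1 - mean(s2) / denom
}

# Same two features chosen in every resample: perfectly stable
same <- rbind(c(1, 1, 0, 0), c(1, 1, 0, 0), c(1, 1, 0, 0))
nogueira_stability(same)      # 1

# Two resamples with completely disjoint selections: least stable case for M = 2
disjoint <- rbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
nogueira_stability(disjoint)  # -1
```

Values near 1 mean the resamples agree on which features matter. Values near 0 are what random selection would produce, so a model reporting high accuracy alongside a low index is selecting a different feature set on each resample.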
## Migration from v1.0
Legacy functions like `OmicSelector_iteratedRFE` are deprecated due to data leakage issues. See [list_deprecated_functions()] for the full list and migration guidance.
See also
- [OmicPipeline]: Main pipeline class
- [BenchmarkService]: Nested CV service
- [validate_no_leakage()]: Check for leakage risks
- [list_deprecated_functions()]: Deprecated function list
Author
Maintainer: Konrad Stawiski <konrad.stawiski@umed.lodz.pl>
Authors:
Marcin Kaszkowiak
Damian Mikulski