OmicSelector 2.0 is a toolkit for high-dimensional biomarker discovery that enforces leakage-free, rigorously validated machine learning workflows.

Details

## Key Features

- **Zero Data Leakage**: All preprocessing, feature selection, and model training occur within proper cross-validation folds via mlr3 GraphLearners.

- **Nested Cross-Validation**: Inner loop for feature selection and hyperparameter tuning, outer loop for unbiased performance estimation.

- **Stability Metrics**: The Nogueira Stability Index checks that the selected features are robust across resamples, rather than relying on high accuracy alone.

- **Reproducibility**: renv lockfiles, Docker containers, and deterministic pipelines.
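The leakage-free idea behind the first feature can be illustrated with plain mlr3, independently of OmicSelector. The following is a minimal sketch, assuming the `mlr3`, `mlr3pipelines`, and `mlr3filters` packages are installed; the built-in `sonar` task and the `classif.rpart` learner are stand-ins chosen for illustration, and how OmicSelector assembles its own GraphLearner internally is not shown on this page. Because scaling and filtering are part of the learner, they are refit on the training portion of every fold. Only the outer resampling loop is shown; inner-loop tuning is omitted.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)

set.seed(42)

# Built-in binary classification task with 60 numeric features (stand-in data)
task <- tsk("sonar")

# Scaling and ANOVA filtering are pipeline steps, so they are fitted
# only on the training rows of each cross-validation fold
graph <- po("scale") %>>%
  po("filter", filter = flt("anova"), filter.nfeat = 20) %>>%
  lrn("classif.rpart")

learner <- as_learner(graph)

# 5-fold cross-validation of the whole pipeline
rr <- resample(task, learner, rsmp("cv", folds = 5))

# Mean classification error across the held-out folds
score <- rr$aggregate(msr("classif.ce"))
print(score)
```

Filtering the full dataset before calling `resample()` would let the held-out rows influence which features are kept, which is the leakage pattern this design avoids.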

## Core Classes

- [OmicPipeline]: Central R6 class for creating leakage-free pipelines
- [BenchmarkService]: Enforces proper nested cross-validation

## Quick Start

```r
library(OmicSelector)

# Create pipeline from data
pipeline <- OmicPipeline$new(
  data = my_data,
  target = "outcome",
  positive = "Case"
)

# Create graph learner with feature selection
learner <- pipeline$create_graph_learner(
  filter = "anova",
  model = "ranger",
  n_features = 20
)

# Run nested CV benchmark
service <- BenchmarkService$new(pipeline, outer_folds = 5, inner_folds = 3)
service$add_learner(learner)
result <- service$run()
```

## Philosophy

"Optimization without validation is hallucination."

OmicSelector 2.0 prioritizes **zero data leakage** and **feature stability** above raw accuracy metrics. High accuracy with unstable feature sets (the "Rashomon Effect") indicates overfitting, not real signal.
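To make the notion of feature stability concrete, here is a minimal base-R sketch of the Nogueira stability index as defined by Nogueira, Sechidis, and Brown (2018). The function name `nogueira_stability` and the example feature names are hypothetical and are not part of the OmicSelector API; the package may compute the index differently. Given a binary matrix with one row per resample and one column per feature, the index is 1 minus the average per-feature selection variance divided by the variance expected under random selection of the same average size.

```r
# Hypothetical helper, not an OmicSelector function.
# Z: binary matrix, rows = resamples, columns = features (1 = selected)
nogueira_stability <- function(Z) {
  Z <- as.matrix(Z)
  M <- nrow(Z)  # number of resamples
  d <- ncol(Z)  # number of candidate features
  stopifnot(M >= 2, d >= 2)

  p_hat <- colMeans(Z)  # selection frequency of each feature
  k_bar <- sum(p_hat)   # average number of selected features
  stopifnot(k_bar > 0, k_bar < d)

  # Unbiased sample variance of each feature's selection indicator
  s2 <- M / (M - 1) * p_hat * (1 - p_hat)

  1 - mean(s2) / ((k_bar / d) * (1 - k_bar / d))
}

# Illustrative selections from three resamples (made-up feature names)
all_features <- c("f1", "f2", "f3", "f4", "f5", "f6")
selections <- list(
  c("f1", "f2", "f3"),
  c("f1", "f2", "f4"),
  c("f1", "f2", "f3")
)
Z <- t(sapply(selections, function(s) as.numeric(all_features %in% s)))

stability <- nogueira_stability(Z)
print(stability)  # 5/9, about 0.556
```

A value of 1 means every resample selected the same features. Values near 0 indicate selections no more consistent than random subsets of the same size.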

## Migration from v1.0

Legacy functions like `OmicSelector_iteratedRFE` are deprecated due to data leakage issues. See [list_deprecated_functions()] for the full list and migration guidance.

See also

- [OmicPipeline]: Main pipeline class
- [BenchmarkService]: Nested CV service
- [validate_no_leakage()]: Check for leakage risks
- [list_deprecated_functions()]: Deprecated function list

Author

Maintainer: Konrad Stawiski konrad.stawiski@umed.lodz.pl (ORCID)

Authors:

  • Marcin Kaszkowiak

  • Damian Mikulski