Builds an ensemble of models using stability-based feature selection. The approach:
1. Run N bootstrap resamples with feature selection
2. Compute feature selection frequencies
3. Create nested overlapping subsets based on stability tiers
4. Train K base models on different stability tiers
5. Combine predictions via stability-weighted voting
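Steps 1 and 2 can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `score_features()` and `selection_frequencies()` are hypothetical helpers, and an absolute two-sample t-statistic stands in for the package's "anova" filter, whose internals are not shown here.

```r
# Hypothetical filter: absolute two-sample t-statistic per feature
# (a stand-in for the "anova" filter; assumes a two-level factor y).
score_features <- function(x, y) {
  scores <- vapply(seq_len(ncol(x)), function(j) {
    abs(unname(t.test(x[, j] ~ y)$statistic))
  }, numeric(1))
  setNames(scores, colnames(x))
}

# Fraction of bootstrap resamples in which each feature lands in the top n_features.
selection_frequencies <- function(x, y, n_bootstrap = 100, n_features = 5) {
  counts <- setNames(numeric(ncol(x)), colnames(x))
  for (b in seq_len(n_bootstrap)) {
    idx <- sample(nrow(x), replace = TRUE)  # bootstrap resample of rows
    scores <- score_features(x[idx, , drop = FALSE], y[idx])
    top <- names(sort(scores, decreasing = TRUE))[seq_len(n_features)]
    counts[top] <- counts[top] + 1
  }
  counts / n_bootstrap
}

# Simulated data: 20 features, of which f1..f3 are made informative.
set.seed(1)
x <- matrix(rnorm(60 * 20), nrow = 60,
            dimnames = list(NULL, paste0("f", 1:20)))
y <- factor(rep(c("a", "b"), each = 30))
x[y == "b", 1:3] <- x[y == "b", 1:3] + 2
freq <- selection_frequencies(x, y, n_bootstrap = 50, n_features = 5)
```

In this simulation the informative features should be selected in nearly every resample, while the noise features share the remaining slots more or less at random. That contrast is what the stability tiers are meant to exploit.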
This approach addresses the "reproducibility crisis" in biomarker discovery by prioritizing stable, reproducible feature sets over single-run selections.
References
Meinshausen & Bühlmann (2010). Stability Selection. JRSS-B.
Nogueira et al. (2018). On the Stability of Feature Selection Algorithms.
Public fields
n_bootstrap: Number of bootstrap resamples
filter: Feature selection filter method
n_features: Number of features to select per bootstrap
base_learner: Base learner for ensemble members
threshold_mode: "adaptive" (target set size) or "fixed" (percentage)
target_set_size: Target number of stable features (for adaptive mode)
fixed_thresholds: Thresholds for fixed mode (creates nested subsets)
aggregation: "weighted" (stability-weighted) or "average" (simple)
verbose: Print progress messages
frequencies: Feature selection frequencies from last fit
stable_features: List of stable feature sets by tier
tier_weights: Weights for each tier in ensemble
fitted_models: List of fitted models per tier
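As a rough illustration of how `frequencies`, `stable_features`, and `tier_weights` could relate in fixed mode, the sketch below builds one feature set per threshold. The frequency values are made up, and the weighting rule (mean selection frequency per tier, normalised to sum to 1) is an assumption; the page does not state how the package computes the weights.

```r
# Made-up selection frequencies (not output from the package)
freq <- c(f1 = 0.98, f2 = 0.91, f3 = 0.74, f4 = 0.55, f5 = 0.20)
fixed_thresholds <- c(0.9, 0.7, 0.5)

# One tier per threshold; a lower threshold admits a superset of the
# features admitted by a higher one, so the tiers are nested.
stable_features <- lapply(fixed_thresholds,
                          function(th) names(freq)[freq >= th])
names(stable_features) <- paste0("tier_", seq_along(fixed_thresholds))

# Assumed weighting rule: mean frequency within each tier, normalised.
tier_means <- vapply(stable_features, function(f) mean(freq[f]), numeric(1))
tier_weights <- tier_means / sum(tier_means)
```

With these values the strictest tier holds `f1` and `f2`, and each looser tier adds one more feature, so the strictest tier receives the largest weight.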
Methods
Method new()
Create a new StabilityEnsemble
Usage
StabilityEnsemble$new(
  n_bootstrap = 100,
  filter = "anova",
  n_features = 50,
  base_learner = "glmnet",
  threshold_mode = "adaptive",
  target_set_size = c(10, 50),
  fixed_thresholds = c(0.9, 0.7, 0.5),
  aggregation = "weighted",
  verbose = TRUE
)
Arguments
n_bootstrap: Number of bootstrap resamples (default: 100)
filter: Feature selection filter: "anova", "mrmr", "importance"
n_features: Features to select per bootstrap (default: 50)
base_learner: Learner for ensemble: "glmnet", "ranger", "xgboost"
threshold_mode: "adaptive" or "fixed"
target_set_size: Target stable set size range for adaptive mode
fixed_thresholds: Thresholds for fixed mode
aggregation: "weighted" or "average"
verbose: Print progress
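The adaptive mode is described only as targeting a stable set size range, so the following is one plausible reading rather than the package's actual rule. The hypothetical `adaptive_threshold()` lowers the frequency cut-off until at least the minimum of `target_set_size` is reached, and warns if tied frequencies push the set past the maximum.

```r
# Hypothetical helper: pick the highest frequency cut-off that yields at
# least target_set_size[1] features.
adaptive_threshold <- function(freq, target_set_size = c(10, 50)) {
  candidates <- sort(unique(freq), decreasing = TRUE)
  for (th in candidates) {
    n_selected <- sum(freq >= th)
    if (n_selected >= target_set_size[1]) {
      if (n_selected > target_set_size[2]) {
        warning("tied frequencies push the stable set above the target maximum")
      }
      return(th)
    }
  }
  min(freq)  # fewer features than the target minimum: keep them all
}

# Made-up frequencies for 100 features: 1.00, 0.99, ..., 0.01
freq <- setNames(seq(1, 0.01, length.out = 100), paste0("f", 1:100))
th <- adaptive_threshold(freq, target_set_size = c(10, 50))
stable <- names(freq)[freq >= th]
```

Compared with fixed thresholds, a rule of this kind adapts to how concentrated the selection frequencies are, at the cost of a cut-off that varies from dataset to dataset.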
Method predict()
Predict on new data using the ensemble
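The combination step can be sketched as a weighted average of per-tier predictions. The probabilities and weights below are made up, and `combine_predictions()` is a hypothetical helper; the signature and return value of the package's own `predict()` are not shown on this page.

```r
# Made-up class probabilities for four new samples
# (rows = samples, columns = tiers); in practice each column would come
# from one fitted base model.
tier_probs <- cbind(
  tier_1 = c(0.90, 0.20, 0.60, 0.40),
  tier_2 = c(0.80, 0.30, 0.50, 0.45),
  tier_3 = c(0.70, 0.40, 0.40, 0.55)
)
tier_weights <- c(tier_1 = 0.5, tier_2 = 0.3, tier_3 = 0.2)

# Hypothetical helper mirroring the two documented aggregation options.
combine_predictions <- function(tier_probs, tier_weights,
                                aggregation = "weighted") {
  if (aggregation == "weighted") {
    w <- tier_weights / sum(tier_weights)  # stability-weighted
  } else {
    w <- rep(1 / ncol(tier_probs), ncol(tier_probs))  # simple average
  }
  drop(tier_probs %*% w)
}

weighted <- combine_predictions(tier_probs, tier_weights, "weighted")
averaged <- combine_predictions(tier_probs, tier_weights, "average")
```

With "weighted" aggregation the strictest tier dominates the combined prediction; "average" treats all tiers equally.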