Implements a multi-stage feature reduction pipeline to handle high-dimensional omics data efficiently. The pipeline chains:
1. Variance thresholding (remove near-zero variance)
2. Univariate filtering (ANOVA F-test)
3. RF importance filtering (single-pass approximation of RFE)
4. LASSO regularization for final selection
Note: The RF importance stage is a single-pass approximation using Random Forest variable importance, not true iterative Recursive Feature Elimination.
This hybrid approach is designed to outperform single-method selection on high-dimensional data.
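The first two stages can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the helper names `variance_filter` and `anova_filter` and the simulated matrix are hypothetical, and the RF importance and LASSO stages are left out because they need additional packages.

```r
# Illustrative sketch only; helper names and data are assumptions.
set.seed(1)

# Simulated data: 60 samples x 200 features, two classes.
n <- 60
p <- 200
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("f", seq_len(p))))
y <- factor(rep(c("A", "B"), each = n / 2))
X[y == "B", 1:5] <- X[y == "B", 1:5] + 3   # first five features carry signal
X[, 196:200] <- X[, 196:200] * 0.01        # last five features are near-constant

# Stage 1: keep features whose variance reaches the threshold.
variance_filter <- function(X, threshold = 0.01) {
  v <- apply(X, 2, var)
  colnames(X)[v >= threshold]
}

# Stage 2: rank features by one-way ANOVA F statistic, keep the top n_keep.
anova_filter <- function(X, y, n_keep) {
  f_stat <- apply(X, 2, function(col) {
    summary(aov(col ~ y))[[1]][["F value"]][1]
  })
  ranked <- names(sort(f_stat, decreasing = TRUE))
  ranked[seq_len(min(n_keep, length(ranked)))]
}

kept_var <- variance_filter(X, threshold = 0.01)
kept_uni <- anova_filter(X[, kept_var, drop = FALSE], y, n_keep = 20)
```

Each stage receives only the features that survived the previous one, so the more expensive model-based stages would run on a much smaller matrix.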
Public fields
stages: List of selection stages with parameters
selected_features: Features selected at each stage
verbose: Whether to print progress messages
Methods
Method new()
Create a new SequentialSelector
Usage
SequentialSelector$new(
  variance_threshold = 0.01,
  univariate_n = 5000,
  univariate_method = "anova",
  rfe_n = 1000,
  lasso_n = NULL,
  verbose = TRUE
)
Arguments
variance_threshold: Minimum variance to keep a feature (default: 0.01)
univariate_n: Number of features to keep after univariate filtering (default: 5000)
univariate_method: Univariate method, one of "anova", "kruskal", or "correlation" (default: "anova")
rfe_n: Number of features to keep after the RF importance stage, the single-pass approximation of RFE (default: 1000)
lasso_n: Target number of features after LASSO (default: NULL = auto)
verbose: Whether to print progress messages (default: TRUE)
Method create_learner()
Create a complete GraphLearner with HSFS and a final classifier