Radiomics Robustness Module¶
A built-in feature stability assessment module for rtpipeline that evaluates radiomics features under systematic perturbations following the NTCV methodology of Zwanenburg et al. (2019).1
Overview¶
The radiomics robustness module helps you identify stable, reproducible radiomics features for modeling by:
- Collecting segmentations from multiple sources:
- TotalSegmentator (RS_auto.dcm)
- Custom structures (RS_custom.dcm)
- Custom models (Segmentation_{model_name}/rtstruct.dcm)
- Generating systematic perturbations via the validated NTCV chain (Noise + Translation + Contour + Volume):1
- N: Image noise injection (Gaussian noise in HU)
- T: Rigid translations (±3-5 mm geometric shifts)
- C: Contour randomization (morphological boundary randomization via erosion/dilation smoothing)
- V: Volume adaptation (erosion/dilation ±15-30% volume change)
- Re-extracting radiomics features for each perturbation using PyRadiomics
- Computing robustness metrics: ICC(3,1) with analytical 95% CIs (via Pingouin4), CoV, QCD, and cohort-wide pass fractions.3
- Classifying features as "robust", "acceptable", or "poor" based on configurable thresholds informed by radiomics reproducibility literature.3
Scientific Background¶
This implementation follows the NTCV perturbation methodology from Zwanenburg et al. (2019)1 and related radiomics reproducibility research:
- Literature-validated perturbation chains: In the original Zwanenburg et al. (2019) test–retest validation, NTCV and RCV combinations detected ~98–99% of unstable features with <2% false positives in their specific datasets.1 Important: These operating characteristics were established on specific test–retest datasets with particular parameter configurations. rtpipeline currently implements an NTCV-like perturbation chain only (RCV is not implemented) and has not independently validated these exact figures—users should treat them as literature benchmarks, not performance guarantees.
- Volume adaptation: Iterative erosion/dilation (±15% volume change) provides a clinically plausible approximation of contour perturbations, as commonly used in CT radiomics reproducibility studies.
- Reliability statistics: ICC(3,1) with analytical confidence intervals and complementary CoV thresholds support conservative clinical adoption.3
- Standardization: The IBSI initiative2 provides standardized definitions for radiomics features and recommends reporting perturbation details and preprocessing provenance.
Robustness Thresholds (Configurable Defaults)¶
The default thresholds in rtpipeline are informed by published recommendations, particularly Koo & Li (2016)3 for ICC interpretation and common practices in CT radiomics reproducibility studies:
ICC (Intraclass Correlation Coefficient): Following Koo & Li (2016) qualitative descriptors: - ICC ≥ 0.90: Excellent — used as "robust" threshold in rtpipeline (conservative choice) - 0.75 ≤ ICC < 0.90: Good — used as "acceptable" threshold (standard) - 0.50 ≤ ICC < 0.75: Moderate — not recommended for clinical modeling - ICC < 0.50: Poor
CoV (Coefficient of Variation): CoV is computed as 100 × (standard deviation / mean) of perturbation-level feature values. rtpipeline defaults to CoV ≤10% as "robust" based on thresholds frequently reported in CT radiomics reproducibility studies: - CoV ≤ 10%: Robust (conservative default threshold) - 10% < CoV ≤ 20%: Acceptable - CoV > 20%: Poor
Note: For features with means close to zero, CoV can become numerically unstable; users should manually review features with extreme CoV values.
Important: These are configurable defaults inspired by published conventions, not consensus standards. Users should adapt thresholds to their specific clinical and statistical context.
For highly conservative applications (delta-radiomics, adaptive RT), users may configure a stricter CoV ≤5% threshold via config.yaml:
A feature is classified as "robust" if it meets both ICC and CoV thresholds. These are conventions commonly used in the radiomics literature, not universal standards; users should adapt thresholds to their specific clinical context.
Perturbation Intensity Levels¶
The module supports three intensity levels to balance computational cost and thoroughness. Typical use cases are annotated with recommended metrics:
- mild: ~10-15 perturbations (QA spot checks, contouring pilot studies)
- standard: 15-30 perturbations (recommended for most pelvic CT applications)
- aggressive: 30-60 perturbations (research-grade, adaptive RT / multi-centre validation)
Perturbation Parameter Defaults¶
The following table summarizes the default perturbation parameters used by rtpipeline:
| Parameter | Default Value | Description |
|---|---|---|
| N (Noise) | [0.0] HU |
Gaussian noise σ; set noise_levels: [0.0, 10.0, 20.0] to enable |
| T (Translation) | 0.0 mm |
Max shift; set max_translation_mm: 3.0 for ±3mm |
| C (Contour) | 0 realizations |
Boundary randomizations; set n_random_contour_realizations: 3 to enable |
| V (Volume) | [-0.15, 0.0, 0.15] |
Volume change ratios (±15% erosion/dilation) |
| Intensity | "standard" |
Controls perturbation count: mild/standard/aggressive |
Algorithm details:
- Volume adaptation: Iterative morphological erosion (τ < 0) or dilation (τ > 0) using ball structuring element until target volume change is achieved (max 20 iterations)
- Contour randomization: Random selection of erosion→dilation or dilation→erosion smoothing sequence with radius proportional to
max_translation_mm / 2 - Translation: Rigid shifts applied via SimpleITK
ResampleImageFilterwith nearest-neighbor interpolation
Reproducibility: Random Seed¶
For deterministic, reproducible perturbations, rtpipeline uses a fixed random seed:
This ensures that the same configuration produces identical perturbation sequences across runs. The seed is incremented per perturbation to ensure variation while maintaining reproducibility. Currently, this seed is not user-configurable; for different seed values, modify the source code directly.
Quick Start¶
1. Enable in config.yaml¶
Basic configuration (volume-only perturbations):
radiomics_robustness:
enabled: true
modes:
- segmentation_perturbation
segmentation_perturbation:
apply_to_structures:
- "GTV*"
- "CTV*"
- "BLADDER"
- "RECTUM"
small_volume_changes: [-0.15, 0.0, 0.15] # ±15% volume change
intensity: "standard" # Options: mild, standard, aggressive
Advanced NTCV configuration (comprehensive testing):
radiomics_robustness:
enabled: true
modes:
- segmentation_perturbation
segmentation_perturbation:
apply_to_structures:
- "GTV*"
- "CTV*"
- "BLADDER"
- "RECTUM"
# Volume perturbations (V in NTCV)
small_volume_changes: [-0.15, -0.10, -0.05, 0.0, 0.05, 0.10, 0.15]
large_volume_changes: [-0.30, -0.20, -0.10, 0.0, 0.10, 0.20, 0.30]
# Translation perturbations (T in NTCV)
max_translation_mm: 3.0 # ±3mm rigid shifts
# Contour randomization (C in NTCV)
n_random_contour_realizations: 3 # 3 boundary randomization variants
# Image noise injection (N in NTCV)
noise_levels: [0.0, 10.0, 20.0] # Gaussian noise std dev in HU
# Perturbation intensity
intensity: "aggressive" # Options: mild (10-15), standard (15-30), aggressive (30-60)
2. Install dependencies¶
pip install pingouin # Required for ICC computation
# or install with radiomics extras:
pip install -e ".[radiomics]"
3. Run with Snakemake¶
The robustness analysis runs after standard radiomics extraction and produces:
- Per-course results: Data_Snakemake/{patient}/{course}/radiomics_robustness_ct.parquet containing all perturbation-level feature values
- Aggregated summary: Data_Snakemake/_RESULTS/radiomics_robustness_summary.xlsx
4. Use robust features for modeling¶
The output Excel file contains multiple sheets:
global_summary: Features averaged across all structures/coursesrobust_features: Only features classified as "robust" (ICC ≥ 0.90, CoV ≤ 10%)acceptable_features: Features meeting "acceptable" thresholds (ICC ≥ 0.75, CoV ≤ 20%)per_source_summary(if available): Cohort metrics grouped by segmentation sourceper_structure_source: Detailed per-structure breakdown (preserving segmentation source)raw_values: All perturbation-level feature values used for the cohort statisticsrobust_features_per_source(if available): Robust features for each segmentation source
Example workflow:
import pandas as pd
# Load results
summary = pd.read_excel("Data_Snakemake/_RESULTS/radiomics_robustness_summary.xlsx",
sheet_name="robust_features")
# Filter radiomics data to robust features only
robust_feature_names = summary["feature_name"].tolist()
radiomics = pd.read_excel("Data_Snakemake/_RESULTS/radiomics_ct.xlsx")
radiomics_robust = radiomics[radiomics.columns.intersection(robust_feature_names)]
# Use radiomics_robust for modeling
CLI Commands¶
Per-course analysis¶
rtpipeline radiomics-robustness \
--course-dir Data_Snakemake/Patient001/Course001 \
--config config.yaml \
--output radiomics_robustness_ct.parquet
Aggregate results¶
rtpipeline radiomics-robustness-aggregate \
--inputs Data_Snakemake/*/*/radiomics_robustness_ct.parquet \
--output radiomics_robustness_summary.xlsx \
--config config.yaml
Configuration Reference¶
Full config.yaml example¶
radiomics_robustness:
enabled: true
modes:
- segmentation_perturbation # Currently supported
# - segmentation_method # Future: compare manual vs auto segmentation
# - scan_rescan # Future: test-retest reliability
segmentation_perturbation:
# Structures to analyze (wildcards supported)
apply_to_structures:
- "GTV*" # Matches GTV, GTV_primary, GTV_node, etc.
- "CTV*"
- "PTV*"
- "BLADDER"
- "RECTUM"
- "PROSTATE"
# V: Volume changes (τ parameter from Lo Iacono 2024)
small_volume_changes: [-0.15, 0.0, 0.15] # ±15% volume change (standard)
large_volume_changes: [-0.30, 0.0, 0.30] # ±30% for stress testing
# T: Translation perturbations (mm)
# Rigid shifts in x, y, z directions to simulate positioning uncertainty
max_translation_mm: 3.0 # Set to 0.0 to disable
# C: Contour randomization
# Simulates inter-observer variability in delineation
n_random_contour_realizations: 3 # Set to 0 to disable
# N: Noise injection (HU)
# Gaussian noise to simulate scanner variability
noise_levels: [0.0, 10.0, 20.0] # Standard deviations in HU
# Perturbation intensity (controls total perturbation count)
# - "mild": ~10-15 perturbations per ROI (quick testing)
# - "standard": 15-30 perturbations per ROI (recommended)
# - "aggressive": 30-60 perturbations per ROI (research-grade)
intensity: "standard"
metrics:
icc:
implementation: "pingouin" # ICC computation library
icc_type: "ICC3" # ICC(3,1): two-way mixed, consistency for fixed perturbations
ci: true # Compute 95% confidence intervals
cov:
enabled: true
qcd:
enabled: true # QCD = (Q3 - Q1) / (Q3 + Q1), a robust dispersion measure
thresholds:
icc:
robust: 0.90 # Conservative clinical threshold (2023-2025 research)
acceptable: 0.75 # Standard threshold (Zwanenburg 2019)
cov:
robust_pct: 10.0 # CoV ≤ 10%: "robust" (standard)
acceptable_pct: 20.0 # CoV ≤ 20%: "acceptable"
ICC Implementation Details¶
rtpipeline computes ICC using Pingouin's intraclass_corr function4 with the following configuration:
- ICC Type: ICC(3,1) — two-way mixed effects model, single measurement, consistency agreement
- Subject encoding: Each unique combination of
patient_id + course_id + structure + segmentation_sourceis treated as one "subject" (so repeated courses for the same patient are distinct subjects) - Rater encoding: Each
perturbation_id(e.g.,ntcv_n10_t1_0_0_c0_v-0.15) is treated as a "rater" - Confidence intervals: Pingouin's analytical 95% CIs (not bootstrap-based)
This design treats perturbations as fixed "raters" of the same underlying subject and uses ICC(3,1) (two-way mixed, single, consistency), following the fixed-rater rationale described by Koo & Li (2016).3 This is an adaptation of ICC to perturbation-based robustness analysis rather than a standard inter-rater scenario. The ICC(3,1) model was chosen because:
- Fixed perturbations: The perturbation set is chosen by the researcher, not randomly sampled from a population of possible raters
- Single measurements: Each perturbation produces one feature value per ROI
- Consistency: We measure relative agreement, not absolute agreement (small systematic offsets are acceptable)
Methodological caveat: This interpretation is one reasonable choice for perturbation-based robustness analysis, but it is not universally standardized. Other ICC formulations (e.g., absolute-agreement models or ICC(2,1)) could also be justified depending on study design. Users should verify that ICC(3,1) aligns with their specific reliability framework and may adjust the icc_type configuration if needed.
Conservative thresholding: When 95% confidence intervals are available, rtpipeline uses the lower CI bound for robustness classification rather than the point estimate. This means a feature is classified as "robust" only if ICC_CI95_lower ≥ 0.90. This conservative approach reduces false positives in robustness labeling but may exclude borderline features. If the CI is unavailable, the point estimate is used directly.
Sample size note: ICC estimates derived from small numbers of perturbations (e.g., <10) or small cohorts (<20 subjects) may be unstable with wide confidence intervals. Users should ensure sufficient perturbations (typically ≥10–15 per ROI) for reliable ICC estimation.
Example data structure for ICC computation:
| subject | rater | feature_value |
|----------------------------------|--------------------------|---------------|
| patient001_course1_bladder_auto | ntcv_n0_t0_0_0_c0_v0.0 | 0.456 |
| patient001_course1_bladder_auto | ntcv_n10_t0_0_0_c0_v0.0 | 0.461 |
| patient001_course1_bladder_auto | ntcv_n0_t1_0_0_c0_v-0.15 | 0.448 |
| ... | ... | ... |
NTCV Perturbation Chain Explanation¶
The NTCV chain follows the systematic perturbation methodology from Zwanenburg et al. (2019).1 The order (Noise → Translation → Contour → Volume) ensures proper propagation of uncertainty sources:
- N (Noise): Adds Gaussian noise to the CT image
- Simulates scanner variability and acquisition differences
- Applied to the image, not the mask
-
Typical values: 0, 10, 20 HU std dev
-
T (Translation): Applies rigid geometric shifts
- Simulates positioning uncertainty and registration errors
- Applied before contour randomization
-
Typical values: ±3-5 mm in x, y, z directions
-
C (Contour): Randomizes ROI boundaries
- Simulates inter-observer delineation variability
- Applied to the already-shifted ROI
-
Uses boundary noise simulation
-
V (Volume): Systematic erosion/dilation
- Final morphological adjustment
- Tests feature stability across ROI size variations
- Typical values: ±15% (standard) or ±30% (stress testing)
Total perturbations = N_noise × N_translation × N_contour × N_volume
Example:
- 3 noise levels × 3 translations × 2 contours × 5 volumes = 90 perturbations (too many)
- With intensity: "aggressive", the module intelligently selects subsets to reach 30-60 perturbations
Output Format¶
Per-course parquet file¶
Columns:
- structure: ROI name (e.g., "BLADDER", "GTV_primary")
- segmentation_source: Source of segmentation (e.g., "AutoRTS_total", "Custom", "CustomModel:cardiac_STOPSTORM")
- feature_name: PyRadiomics feature (e.g., "original_glcm_Correlation")
- n_perturbations: Number of perturbations tested
- icc: ICC point estimate
- icc_ci95_low, icc_ci95_high: 95% confidence interval
- cov_pct: Coefficient of Variation (%)
- qcd: Quartile Coefficient of Dispersion, defined as (Q3 − Q1) / (Q3 + Q1) from perturbation-level values; a robust alternative to CoV for skewed distributions
- robustness_label: "robust", "acceptable", or "poor"
- pass_seg_perturb: Boolean (True if robust or acceptable)
Aggregated Excel file¶
Multiple sheets for easy filtering: 1. global_summary: Features averaged across all structures/courses/sources 2. per_source_summary: Features averaged by segmentation source (TotalSegmentator vs Custom vs Custom Models) 3. per_structure_source: Detailed per-structure-source breakdown 4. robust_features: Only ICC ≥ 0.90 and CoV ≤ 10% (global) 5. acceptable_features: ICC ≥ 0.75 and CoV ≤ 20% (global) 6. robust_features_per_source: Robust features broken down by segmentation source
Best Practices¶
Feature Selection Strategy (Based on 2023-2025 Research)¶
- Primary recommendation: Use only "robust" features (ICC ≥ 0.90, CoV ≤ 10%) for predictive models
- Modern clinical applications increasingly demand ICC ≥0.90 (conservative threshold)
-
A substantial proportion of features can meet this threshold, though the exact fraction is highly dataset- and protocol-dependent
-
Multi-center studies: Consider "acceptable" features (ICC ≥ 0.75, CoV ≤ 20%) if data scarcity requires it
- Standard threshold from Zwanenburg 2019
-
Suitable for exploratory analysis or hypothesis generation
-
Cost-effective alternative: Perturbation-based stability analysis provides a practical alternative to expensive test-retest imaging
- In the original Zwanenburg et al. (2019) study, NTCV-like chains detected ~98–99% of unstable features with <2% false positives on their specific test-retest datasets
-
These are literature benchmarks; users should not assume identical performance on their data
-
Perturbation intensity selection:
- mild: Quick screening, pilot studies
- standard: Production use, clinical applications (recommended)
-
aggressive: Research-grade, publication-quality analysis, multi-center studies
-
Recommended NTCV configuration for clinical RT:
This generates ~25-30 perturbations per ROI - sufficient for robust stability assessment.
Structure and Modality Considerations¶
- Structure selection: Focus on clinically relevant structures (GTV, CTV, organs at risk)
- Volume thresholds: Features from very small ROIs are often unstable regardless of type
- Multi-source analysis: Compare robustness across segmentation sources (TotalSegmentator vs Custom structures)
- CT-specific: Current implementation optimized for CT-based radiomics in bladder, prostate, and rectal cancer
- Validation: If possible, validate feature stability on independent test-retest data
Advanced Considerations¶
- Harmonization: For multi-scanner studies, consider CovBat harmonization (outperforms traditional ComBat)
-
Note: CovBat implementation not included in this module - apply as preprocessing
-
Feature selection methods: Consider stability-aware approaches like Graph-FS that maintain performance across institutions
-
Combines stability metrics with predictive performance
-
Discretization: Ensure consistent bin width (HU) or bin count across all perturbations
-
PyRadiomics settings controlled via radiomics_params.yaml
-
Preprocessing chains: Robustness to resampling and discretization variations important for multi-center studies
- Test with different voxel sizes if data will come from multiple scanners
Example Methods Paragraph for Publications¶
When describing rtpipeline's robustness analysis in a manuscript, consider using language similar to:
Radiomics Feature Stability Assessment
Radiomics feature stability was assessed using rtpipeline's perturbation-based robustness module, which implements a perturbation chain methodology inspired by Zwanenburg et al.[1] For each ROI, [N] systematic perturbations were generated combining Gaussian noise injection (σ = 0, 10, 20 HU), rigid translations (±3 mm), contour randomization, and volume adaptation (±15% erosion/dilation). Features were re-extracted for each perturbation using PyRadiomics [version].
Feature stability was quantified using the intraclass correlation coefficient ICC(3,1) computed via Pingouin[2], with each perturbation treated as a fixed rater measuring the same underlying subject (patient-course-structure combination). This interpretation follows the fixed-rater rationale described by Koo & Li[3] but represents an adaptation of ICC to perturbation-based analysis rather than a standard inter-rater scenario. Features with ICC ≥ 0.90 and coefficient of variation (CoV) ≤ 10% were classified as "robust" following commonly used thresholds in the radiomics literature.[3] Only robust features were retained for subsequent modeling.
[1] Zwanenburg A, et al. Assessing robustness of radiomic features by image perturbation. Sci Rep. 2019;9:614. DOI: 10.1038/s41598-018-36938-4 [2] Vallat R. Pingouin: statistics in Python. JOSS. 2018;3(31):1026. DOI: 10.21105/joss.01026 [3] Koo TK, Li MY. A guideline of selecting and reporting ICC for reliability research. J Chiropr Med. 2016;15(2):155-163. DOI: 10.1016/j.jcm.2016.02.012
Important: Adjust the specific parameter values (noise levels, translation distances, volume changes, thresholds) to match your actual configuration. If using the CTV1 D95 heuristic for Rx estimation, explicitly note this limitation. If using Fast Mode for segmentation, note that lower-resolution segmentations were used.
Typical Feature Families by Robustness¶
Based on radiomics reproducibility literature (feature-type patterns are generally consistent across studies, though specific results vary):
Generally Robust: - Shape features (Volume, SurfaceArea, Sphericity) - First-order statistics (Mean, Median, Energy) - Some GLDM features
Moderately Robust: - GLCM features (with appropriate normalization) - GLRLM features
Often Fragile: - High-order texture features without careful preprocessing - Features from very small ROIs - Features sensitive to discretization
Troubleshooting¶
"Radiomics robustness is disabled"¶
Enable it in config.yaml: radiomics_robustness.enabled: true
"Pingouin not available"¶
Install: pip install pingouin>=0.5.3
"No structures matched robustness patterns"¶
Check apply_to_structures patterns in config. Verify structure names match your patterns in:
- RS_auto.dcm (TotalSegmentator)
- RS_custom.dcm (Custom structures)
- Segmentation_{model_name}/rtstruct.dcm (Custom models)
"Insufficient perturbations for X; skipping"¶
Some small structures may fail erosion/dilation. This is expected; they'll be skipped automatically.
Key References¶
- Zwanenburg A, et al. (2019). "Assessing robustness of radiomic features by image perturbation." Scientific Reports 9, 614. DOI: 10.1038/s41598-018-36938-4
- Zwanenburg A, et al. (2020). "The Image Biomarker Standardization Initiative (IBSI)." Radiology 295(2):328-338. DOI: 10.1148/radiol.2020191145
- Koo TK, Li MY. (2016). "A guideline of selecting and reporting intraclass correlation coefficients for reliability research." Journal of Chiropractic Medicine 15(2):155-163. DOI: 10.1016/j.jcm.2016.02.012
- Vallat R. (2018). "Pingouin: statistics in Python." Journal of Open Source Software 3(31):1026. DOI: 10.21105/joss.01026
Note: Additional references cited in the literature review sections represent reported findings from the radiomics reproducibility literature. Users should verify specific citations for their own publications.
What's New (2025)¶
NTCV Perturbation Chain Implementation¶
Based on 2023-2025 radiomics stability research:
✅ Implemented: - NTCV (Noise + Translation + Contour + Volume) perturbation chains - Image noise injection (Gaussian noise in HU) - Rigid translation perturbations (±3-5 mm shifts) - Contour randomization (boundary noise simulation) - Configurable perturbation intensity (mild/standard/aggressive) - Research-grade testing: 30-60 perturbations per ROI - Conservative clinical thresholds: ICC >0.90 and CoV <10%
Key improvements over basic volume-only perturbations: - Comprehensive stability testing following Zwanenburg 2019 NTCV methodology - Literature-reported performance: Zwanenburg et al. achieved ~98–99% sensitivity with <2% false positives on their specific test-retest datasets—these serve as benchmarks, not guarantees - Cost-effective alternative to expensive test-retest imaging - Multi-axis perturbations capture different sources of variability
Research Basis¶
The implementation is informed by radiomics reproducibility literature findings: 1. ICC ≥0.75 and CoV ≤10% are commonly used thresholds (ICC ≥0.90 for conservative clinical use)3 2. Systematic perturbation chains combining geometric and image-based variations improve robustness assessment 3. 30-60 perturbed versions per ROI are often recommended for comprehensive assessment 4. A substantial proportion of features (varies by study and structure) may meet stability thresholds 5. Perturbation-based methods provide a practical alternative when test-retest data is unavailable
Future Enhancements¶
The following methods are referenced in the literature sections above but are not yet implemented in rtpipeline. They are planned for future releases:
- CovBat harmonization: Advanced harmonization method (outperforms traditional ComBat) — must be applied as external preprocessing
- Graph-FS feature selection: Stability-aware feature selection maintaining multi-institution performance
- Segmentation-method robustness: Compare Manual vs TotalSegmentator vs custom models
- Scan-rescan ICC: Test-retest reliability for longitudinal studies
- Panel-averaged features: Zwanenburg-style feature averaging across perturbations
- Preprocessing variations: Resampling and discretization robustness testing
Footnote References¶
Note: Some citations in earlier sections reference findings from the broader radiomics reproducibility literature. Users should independently verify specific references for publication purposes.
-
A. Zwanenburg et al., "Assessing robustness of radiomic features by image perturbation," Scientific Reports 9, 614 (2019). DOI: 10.1038/s41598-018-36938-4 ↩↩↩↩↩
-
A. Zwanenburg et al., "The Image Biomarker Standardization Initiative," Radiology 295(2):328–338 (2020). DOI: 10.1148/radiol.2020191145 ↩
-
T. K. Koo and M. Y. Li, "A guideline of selecting and reporting intraclass correlation coefficients for reliability research," Journal of Chiropractic Medicine 15(2):155–163 (2016). DOI: 10.1016/j.jcm.2016.02.012 ↩↩↩↩↩↩
-
R. Vallat, "Pingouin: statistics in Python," Journal of Open Source Software 3(31):1026 (2018). DOI: 10.21105/joss.01026 ↩↩