Reproducible Research with RTpipeline

A guide to ensuring methodological rigor and reproducibility in radiotherapy data science

This document provides guidance for researchers, particularly PhD students and postdocs, on using RTpipeline's features to produce reproducible, publication-ready analyses.

Why Reproducibility Matters

In radiotherapy research, reproducibility failures often stem from:

Undocumented preprocessing - DVH interpolation methods, dose grid resolutions, and structure definitions are not specified
Environment drift - Package versions change between analysis runs
Configuration ambiguity - Key parameters (e.g., radiomics bin width) are buried in code
Data provenance gaps - No clear link between raw DICOM and final features

RTpipeline addresses these by encoding all preprocessing decisions in version-controlled configuration files and providing complete audit trails.

The Reproducibility Checklist

Use this checklist before submitting a manuscript or thesis:

Essential Elements

[ ] RTpipeline version recorded (git commit hash or release tag)
[ ] Complete config.yaml saved with results
[ ] Conda environments exported (conda env export --no-builds > environment.yml, one file per environment)
[ ] QC reports reviewed and saved
[ ] Structure mapping dictionary documented
[ ] Random seeds fixed (where applicable)
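
The version and seed items in this checklist can be captured in a small run manifest saved next to the results. The following is a minimal sketch rather than an RTpipeline feature: the function names, the manifest fields, and the assumption that `repo_dir` points at an RTpipeline git checkout are all illustrative.

```python
import json
import platform
import random
import subprocess
import sys
from datetime import datetime, timezone


def git_commit(repo_dir="."):
    """Return the current commit hash, or None when git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def fix_seeds(seed):
    """Seed Python's RNG, and NumPy's legacy global RNG if NumPy is installed."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return
    np.random.seed(seed)


def build_run_manifest(seed, repo_dir="."):
    """Fix the seeds and describe the run (hypothetical manifest layout)."""
    fix_seeds(seed)
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "rtpipeline_commit": git_commit(repo_dir),  # None outside a git checkout
        "random_seed": seed,
    }


if __name__ == "__main__":
    print(json.dumps(build_run_manifest(seed=42), indent=2))
```

Libraries that keep their own random state (for example deep-learning frameworks) would need to be seeded separately.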

For Radiomics Studies

[ ] PyRadiomics parameters documented (radiomics_params.yaml)
[ ] IBSI feature definitions referenced
[ ] Robustness analysis (ICC, CoV) completed
[ ] Perturbation chain parameters recorded

For Multi-Center Studies

[ ] Shared configuration bundle created and versioned
[ ] All centers using identical RTpipeline version
[ ] Structure mapping harmonized across sites
[ ] QC criteria applied consistently
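
One way to check the "identical version" and "shared configuration" items is to have each center report its RTpipeline version and a checksum of its configuration, then compare the reports against a reference site. The sketch below assumes a hypothetical report format; the field names `rtpipeline_version` and `config_sha256`, the site names, and the example values are illustrative and not RTpipeline outputs.

```python
def find_deviating_sites(site_reports, reference_site):
    """Return {site: {field: (expected, found)}} for sites that differ from the reference."""
    reference = site_reports[reference_site]
    deviations = {}
    for site, report in site_reports.items():
        diffs = {
            field: (expected, report.get(field))
            for field, expected in reference.items()
            if report.get(field) != expected
        }
        if diffs:
            deviations[site] = diffs
    return deviations


# Hypothetical reports collected from three centers.
reports = {
    "center_a": {"rtpipeline_version": "v2.1.0", "config_sha256": "aa11"},
    "center_b": {"rtpipeline_version": "v2.1.0", "config_sha256": "aa11"},
    "center_c": {"rtpipeline_version": "v2.0.3", "config_sha256": "aa11"},
}
print(find_deviating_sites(reports, reference_site="center_a"))
```

Comparing against a single reference keeps the report short: a site appears only if at least one field deviates, together with the expected and found values.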

Configuration as Protocol

The Key Principle

Your config.yaml IS your preprocessing protocol. It should be:

Version-controlled (git)
Published as supplementary material
Citable (via DOI if possible)

Example Configuration Header

# RTpipeline Configuration for Prostate Toxicity Study
# Protocol Version: 1.2
# Date: 2025-01-15
# DOI: 10.5281/zenodo.XXXXXXX
#
# This configuration defines the complete preprocessing pipeline
# for the PROTECT-01 multi-center prostate radiotherapy study.
#
# Modifications from v1.1:
# - Updated ICC threshold from 0.75 to 0.90 per reviewer feedback
# - Added V75Gy metric for rectal dose-response analysis

version: "1.2"
study_id: "PROTECT-01"

Key Configuration Sections to Document

# Structure harmonization
structure_mapping:
  RECTUM:
    patterns: ["Rectum*", "RECT*", "rectum*"]
    required: true
    # Documentation: Maps all local rectum names to canonical RECTUM

# DVH computation parameters
dvh:
  interpolation: "linear"  # Method for DVH curve interpolation
  dose_units: "Gy"         # Absolute dose units
  volume_type: "relative"  # Relative (%) or absolute (cc)
  metrics:
    - Dmean
    - Dmax
    - D2cc
    - V70Gy

# Radiomics extraction
radiomics:
  binWidth: 25              # Fixed bin width in HU (IBSI fixed-bin-size discretisation)
  resampling_mm: [1, 1, 3]  # Anisotropic: 1 mm in-plane (x/y), 3 mm in z
  interpolator: "sitkBSpline"

# Robustness assessment
radiomics_robustness:
  enabled: true
  thresholds:
    icc:
      robust: 0.90      # Features with ICC >= 0.90 classified as robust
      acceptable: 0.75
    cov:
      robust_pct: 10.0  # Features with CoV <= 10% classified as robust
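
To make the effect of the `structure_mapping` patterns concrete, the sketch below resolves local contour names against them using shell-style wildcards. It assumes case-sensitive glob matching, which may not be how RTpipeline interprets the patterns; the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the structure_mapping section above; the matching rules are assumed.
STRUCTURE_MAPPING = {
    "RECTUM": {"patterns": ["Rectum*", "RECT*", "rectum*"], "required": True},
}


def canonical_name(local_name, mapping=STRUCTURE_MAPPING):
    """Return the first canonical label whose pattern matches, else None."""
    for canonical, rule in mapping.items():
        if any(fnmatchcase(local_name, p) for p in rule["patterns"]):
            return canonical
    return None


def missing_required(local_names, mapping=STRUCTURE_MAPPING):
    """List required canonical structures that no local name maps to."""
    found = {canonical_name(name, mapping) for name in local_names}
    return sorted(
        canonical
        for canonical, rule in mapping.items()
        if rule.get("required") and canonical not in found
    )


for name in ["Rectum_v2", "RECTAL_WALL", "Bladder"]:
    print(name, "->", canonical_name(name))
```

Under these assumptions a broad pattern such as `RECT*` also captures names like `RECTAL_WALL`, so the documented dictionary is worth reviewing against the actual contour names found at each site.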

Environment Reproducibility

Dual Environment Architecture

RTpipeline uses two conda environments to resolve dependency conflicts:

Environment	Purpose	Key Dependencies
`rtpipeline`	Main processing	NumPy 2.x, TotalSegmentator, SimpleITK
`rtpipeline-radiomics`	Radiomics extraction	NumPy 1.x, PyRadiomics

Exporting Environments

# Export main environment
conda activate rtpipeline
conda env export --no-builds > rtpipeline-environment.yml

# Export radiomics environment
conda activate rtpipeline-radiomics
conda env export --no-builds > rtpipeline-radiomics-environment.yml
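
As a complement to the exported YAML files, the versions of the key dependencies can be recorded from inside each environment at run time. This is a minimal sketch using the standard library; the distribution names in `KEY_PACKAGES` are assumptions based on the table above and may need adjusting to the names used in your installation.

```python
import json
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names; run once in each of the two environments.
KEY_PACKAGES = ["numpy", "SimpleITK", "TotalSegmentator", "pyradiomics"]
print(json.dumps(installed_versions(KEY_PACKAGES), indent=2))
```

A `None` entry simply means the package is not installed in the active environment, which is expected given the split between the two environments.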

Docker for Complete Reproducibility

For maximum reproducibility, use the Docker image with a specific tag:

# Use a specific version, not :latest
docker pull kstawiski/rtpipeline:v2.1.0

# Record the image digest
docker inspect --format='{{.RepoDigests}}' kstawiski/rtpipeline:v2.1.0

Document in your methods:

"All preprocessing was performed using RTpipeline Docker image kstawiski/rtpipeline:v2.1.0 (SHA256: abc123...)."

For v2.1.0, also record whether radiomics extraction and robustness analysis were executed through the default dual-environment setup (rtpipeline + rtpipeline-radiomics) or through a custom local installation.

Methods Section Templates

Template 1: DVH Extraction

**Dose-Volume Histogram Analysis**

DVH metrics were extracted using RTpipeline (version 2.1.0, git commit: XXXXXX; replace if using another tag) [1].
DICOM RTSTRUCT, RTDOSE, and RTPLAN files were exported from [TPS name/version]
and processed using the following standardized protocol:

1. **Structure Harmonization:** Contour names were mapped to canonical nomenclature
   using a predefined dictionary (Supplementary Table S1). Structures named
   [list variations] were mapped to the canonical label [STRUCTURE_NAME].

2. **DVH Computation:** Cumulative DVH curves were computed using [interpolation
   method] interpolation with a dose resolution of [X] Gy. The following metrics
   were derived:
   - Mean dose (D_mean)
   - Maximum dose (D_max)
   - Dose to hottest 2cc (D_2cc)
   - Volume receiving ≥X Gy (V_XGy)

3. **Quality Control:** Cases with [QC criteria] were flagged and reviewed.
   [N] cases were excluded due to [reasons].

The complete preprocessing configuration is available as Supplementary File S2.

[1] RTpipeline. https://github.com/kstawiski/rtpipeline
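
When filling in step 2 of this template, it helps to be explicit about what each metric means. The sketch below derives the four metrics directly from per-voxel doses. It is a simplified illustration and not RTpipeline's DVH algorithm: it assumes equal-volume voxels lying fully inside the structure, with no interpolation or partial-volume weighting, and the function and key names are hypothetical.

```python
import math


def dvh_metrics(doses_gy, voxel_volume_cc, d_cc=2.0, v_gy=70.0):
    """Summarise per-voxel doses (Gy) of one structure with equal-volume voxels."""
    if not doses_gy:
        raise ValueError("structure contains no voxels")
    ordered = sorted(doses_gy, reverse=True)  # hottest voxel first
    n = len(ordered)
    # Voxels needed to cover d_cc, capped at the structure volume.
    n_hot = min(n, max(1, math.ceil(d_cc / voxel_volume_cc - 1e-9)))
    return {
        "Dmean": sum(ordered) / n,
        "Dmax": ordered[0],
        # Minimum dose received by the hottest d_cc of the structure.
        f"D{d_cc:g}cc": ordered[n_hot - 1],
        # Relative volume (%) receiving at least v_gy.
        f"V{v_gy:g}Gy_pct": 100.0 * sum(d >= v_gy for d in ordered) / n,
    }


example = dvh_metrics([80, 75, 72, 60, 50, 40, 30, 20, 10, 0], voxel_volume_cc=1.0)
print(example)
```

Differences between this naive voxel count and an interpolated DVH are exactly the kind of detail the methods text should pin down through the interpolation method and dose resolution placeholders.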

Template 2: Radiomics with Robustness Assessment

**Radiomic Feature Extraction and Stability Assessment**

CT radiomic features were extracted using RTpipeline (version 2.1.0; replace if using another tag) [1] with
PyRadiomics (version 3.0.1) [2] following Image Biomarker Standardisation
Initiative (IBSI) recommendations [3].

**Preprocessing:**
- Images were resampled to [X × Y × Z mm] voxels using
  [interpolation method]
- Intensity discretization used a fixed bin width of [X] HU
- [CT cropping method, if applicable]

**Feature Extraction:**
A total of [N] features were extracted from [structure(s)], comprising:
- Shape features (N = X)
- First-order statistics (N = X)
- Texture features: GLCM (N = X), GLRLM (N = X), GLSZM (N = X), GLDM (N = X)
- [Filtered features if applicable]

**Robustness Assessment:**
Feature stability was assessed using NTCV perturbation chains following
Zwanenburg et al. [4]. For each ROI, [N] perturbations were generated:
- Gaussian noise injection (σ = 0, 10, 20 HU)
- Rigid translations (±3 mm in each axis)
- Contour randomization (N = X realizations)
- Volume adaptation (±15% erosion/dilation)

Intraclass correlation coefficients ICC(3,1) were computed using Pingouin [5].
Features were classified as:
- **Robust:** ICC ≥ 0.90 and CoV ≤ 10%
- **Acceptable:** ICC ≥ 0.75 and CoV ≤ 20%, without meeting the robust criteria
- **Poor:** ICC < 0.75 or CoV > 20%

[N] features ([X]%) met robustness criteria and were retained for modeling.

**References:**
[1] RTpipeline. https://github.com/kstawiski/rtpipeline
[2] van Griethuysen JJM, et al. Cancer Res. 2017;77(21):e104-e107.
[3] Zwanenburg A, et al. Radiology. 2020;295(2):328-338.
[4] Zwanenburg A, et al. Sci Rep. 2019;9:614.
[5] Vallat R. JOSS. 2018;3(31):1026.
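
The classification rule in this template can be made explicit with a small reference implementation. The sketch below computes ICC(3,1) from the two-way ANOVA mean squares and applies the ICC and CoV thresholds from the template. It is a plain-Python illustration, not the Pingouin call used by RTpipeline; it assumes one row per ROI and one column per perturbation, and the CoV definition (sample standard deviation over absolute mean) is an assumption.

```python
from statistics import mean, stdev


def icc_3_1(table):
    """ICC(3,1) for a table with one row per ROI and one column per perturbation."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = mean(x for row in table for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in table)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in zip(*table))
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("feature has no variance; ICC is undefined")
    return (ms_rows - ms_error) / denominator


def cov_percent(values):
    """Coefficient of variation in percent (undefined for a zero mean)."""
    return 100.0 * stdev(values) / abs(mean(values))


def classify(icc, cov_pct):
    """Apply the thresholds from the template above."""
    if icc >= 0.90 and cov_pct <= 10.0:
        return "robust"
    if icc >= 0.75 and cov_pct <= 20.0:
        return "acceptable"
    return "poor"
```

For example, `classify(0.95, 15.0)` returns `"acceptable"`: the ICC alone would qualify as robust, but the CoV exceeds 10%.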

Template 3: Multi-Center Harmonization

**Multi-Center Data Harmonization**

Data from [N] centers were harmonized using RTpipeline (version 2.1.0; replace if using another tag) [1] as a
standardized ETL framework. Each center independently processed local DICOM
exports using an identical configuration bundle (Supplementary File S3).

**Harmonization Protocol:**
1. **Structure Mapping:** A consortium-wide structure dictionary was defined
   (Supplementary Table S1) mapping site-specific nomenclature to [N] canonical
   structure labels.

2. **Preprocessing Standardization:**
   - CT resampling: [X × X × X mm]
   - Dose grid interpolation: [method]
   - Field-of-view cropping: [region]-based using TotalSegmentator landmarks

3. **Quality Control:**
   - All centers applied identical QC criteria (Supplementary Table S2)
   - [N] cases across all sites were excluded due to [reasons]

4. **Feature Extraction:**
   - DVH metrics: [list]
   - Radiomics features: [list or reference to table]

**Data Governance:**
Raw DICOM data remained at each institution. Only [harmonized feature
tables / model weights / aggregated statistics] were shared with the
coordinating center.

The RTpipeline configuration bundle is available at [DOI/URL].

[1] RTpipeline. https://github.com/kstawiski/rtpipeline

Provenance Tracking

RTpipeline Output Structure

Each processed course includes provenance metadata:

Output/{PatientID}/{CourseID}/
├── metadata/
│   ├── case_metadata.xlsx       # DICOM tag extraction
│   ├── processing_log.txt       # Complete processing log
│   ├── config_snapshot.yaml     # Config used for this case
│   └── cropping_metadata.json   # CT cropping boundaries
├── qc_reports/
│   └── qc_summary.json          # Quality control flags
└── ...
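
Before archiving results, it can be useful to confirm that every processed course actually contains these provenance files. The sketch below walks an output directory laid out as in the tree above. Treating these four files as mandatory, leaving out `cropping_metadata.json` (on the assumption that cropping may be disabled), and skipping top-level directories that start with an underscore are all assumptions, and the function name is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the tree above; treating them as mandatory is an assumption.
EXPECTED_FILES = [
    "metadata/case_metadata.xlsx",
    "metadata/processing_log.txt",
    "metadata/config_snapshot.yaml",
    "qc_reports/qc_summary.json",
]


def missing_provenance(output_root):
    """Map each {PatientID}/{CourseID} directory to the expected files it lacks."""
    root = Path(output_root)
    report = {}
    for course_dir in sorted(root.glob("*/*")):
        # Skip plain files and aggregate folders such as _RESULTS.
        if not course_dir.is_dir() or course_dir.parent.name.startswith("_"):
            continue
        missing = [rel for rel in EXPECTED_FILES if not (course_dir / rel).is_file()]
        if missing:
            report[course_dir.relative_to(root).as_posix()] = missing
    return report
```

Calling `missing_provenance("Output")` returns an empty dictionary when every course is complete, which makes it easy to use as a gate in an archiving script.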

Linking Outputs to Inputs

import pandas as pd

# Load metadata with DICOM provenance
metadata = pd.read_excel("Output/Patient001/Course001/metadata/case_metadata.xlsx")

# Key provenance fields
print("Source DICOM UIDs:")
print(f"  CT Series: {metadata['ct_series_uid'].iloc[0]}")
print(f"  RTSTRUCT: {metadata['rtstruct_sop_uid'].iloc[0]}")
print(f"  RTDOSE: {metadata['dose_sop_uid'].iloc[0]}")
print(f"  RTPLAN: {metadata['plan_sop_uid'].iloc[0]}")

Reproducibility Validation

Regression Testing

When updating RTpipeline versions or configurations:

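# Run on reference dataset
rtpipeline run --config reference_config.yaml --input test_data/ --output test_output_v2/

# Compare outputs
python -c "
import sys
import numpy as np
import pandas as pd

v1 = pd.read_excel('test_output_v1/_RESULTS/dvh_metrics.xlsx')
v2 = pd.read_excel('test_output_v2/_RESULTS/dvh_metrics.xlsx')

# Subtraction aligns on index and columns, so a changed row count or column
# set would yield NaN (which max() skips) and hide real differences.
if v1.shape != v2.shape or list(v1.columns) != list(v2.columns):
    sys.exit('ERROR: outputs differ in shape or columns')

# Numerical columns (assumes both files list cases in the same row order)
numeric_cols = v1.select_dtypes(include=[np.number]).columns

# Compare
diff = (v1[numeric_cols] - v2[numeric_cols]).abs()
max_diff = diff.max().max()
print(f'Maximum absolute difference: {max_diff}')

if max_diff > 1e-6:
    print('WARNING: outputs differ')
    print(diff[diff > 1e-6].dropna(how='all'))
"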

Hash Verification

For critical analyses, compute checksums:

import hashlib
import json

import pandas as pd

def compute_df_hash(filepath):
    """Compute a SHA256 digest over the row hashes of a DataFrame.

    Covers cell values and the index; column names are not included.
    """
    df = pd.read_excel(filepath)
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()

# Record hashes
hashes = {
    "dvh_metrics": compute_df_hash("_RESULTS/dvh_metrics.xlsx"),
    "radiomics_ct": compute_df_hash("_RESULTS/radiomics_ct.xlsx")
}

# Save for reproducibility verification
with open("output_hashes.json", "w") as f:
    json.dump(hashes, f, indent=2)
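
Recorded hashes are only useful if they are checked again later. The sketch below reloads `output_hashes.json` and reports any entry whose recomputed hash differs. The function names are hypothetical, and the default hash is a byte-level SHA256 of the file so that the example stays dependency-free. Byte-level hashes of `.xlsx` files can change between otherwise identical runs because the container stores metadata, so for spreadsheets it is safer to pass a content-based function such as `compute_df_hash` as `hash_func`.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Byte-level SHA256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hashes(recorded_json, files, hash_func=sha256_file):
    """Return {name: (recorded, recomputed)} for every entry that differs."""
    recorded = json.loads(Path(recorded_json).read_text())
    mismatches = {}
    for name, path in files.items():
        current = hash_func(path)
        if recorded.get(name) != current:
            mismatches[name] = (recorded.get(name), current)
    return mismatches


# Example call (paths as used above):
# verify_hashes("output_hashes.json",
#               {"dvh_metrics": "_RESULTS/dvh_metrics.xlsx"},
#               hash_func=compute_df_hash)
```

An empty result means every listed output still matches its recorded hash.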

Publication Checklist

Before submitting your manuscript:

Data Availability Statement

**Data Availability**

The RTpipeline configuration files used in this study are available at
[DOI/GitHub URL]. Due to [privacy/IRB/GDPR] restrictions, the raw DICOM
data cannot be shared. Aggregated results tables are provided as
Supplementary Data.

Code Availability Statement

**Code Availability**

All data preprocessing was performed using RTpipeline (version 2.1.0,
https://github.com/kstawiski/rtpipeline, DOI: XXX). The complete
configuration bundle, including structure mapping dictionaries and
radiomics parameters, is available at [DOI]. Analysis scripts for
statistical modeling are available at [GitHub URL].

Supplementary Materials to Include

config.yaml - Complete preprocessing configuration
structure_mapping.yaml - Structure name harmonization dictionary
radiomics_params.yaml - PyRadiomics extraction settings
environment.yml - Conda environment specification (one file per environment when using the dual-environment setup)
qc_criteria.md - Quality control inclusion/exclusion criteria

Best Practices Summary

Area	Recommendation
Version Control	Git-track all configuration files
Environment	Use Docker or export conda environments
Configuration	Document all parameters in config files, not code
Provenance	Save config snapshots with each analysis
QC	Review and document all exclusions
Methods	Use templates with specific parameter values
Sharing	Publish config bundles with DOI