RTpipeline
The Big Data Radiotherapy Pipeline
From raw clinical exports to research-ready datasets in one command.
  • Automated Workflow: from DICOM to analysis in one step
  • AI Segmentation: built-in TotalSegmentator and nnU-Net
  • Standardized Data: comparable metrics across cohorts

RTpipeline v2.1.0

The current release centers on three manuscript-critical capabilities: a first-class radiomics robustness stage with per-course Parquet plus cohort-level Excel summaries, a dual-environment execution model that separates TotalSegmentator from PyRadiomics, and configurable NTCV perturbation chains for robustness screening.


Introduction

Radiation oncology is undergoing a rapid transformation driven by the increasing availability of large-scale, routinely collected clinical and imaging data. Modern radiotherapy departments generate comprehensive digital records for each treated patient, including planning computed tomography (CT) scans, delineated target volumes and organs at risk (OARs), three-dimensional dose distributions, and detailed treatment plans stored in standardized Digital Imaging and Communications in Medicine (DICOM) objects. Together with electronic health records and cancer registries, these data sources constitute a rich substrate for developing normal tissue complication probability (NTCP) models, radiomics-based biomarkers, and data-driven clinical decision support systems.

However, the realization of this potential hinges critically on the ability to transform heterogeneous, clinically oriented data into reproducible, analysis-ready research datasets.

The Technical Gap

Despite substantial efforts in standardization, several technical barriers impede large-scale radiotherapy data analysis:

| Challenge | Impact |
|---|---|
| Vendor heterogeneity | DICOM RT objects exhibit non-trivial variability across TPS vendors (Eclipse, RayStation, Monaco), software versions, and local conventions |
| Structure naming chaos | Identical anatomical regions are named Heart, heart, hrt, or Coeur across patients and institutions |
| Bespoke scripts | Individual researchers re-implement DICOM parsing for each project, producing fragile, undocumented code |
| Scale limitations | Manual QC and structure mapping become infeasible beyond a few hundred patients |
| Reproducibility collapse | Each student maintains their own version of the preprocessing scripts |

RTpipeline: An ETL Framework for Radiotherapy

Within health informatics, the Extract-Transform-Load (ETL) paradigm has emerged as a foundational concept for managing complex data flows. RTpipeline is a dedicated, research-grade ETL framework specifically tailored for radiotherapy DICOM data.

┌─────────────────┐     ┌──────────────────────────────────┐     ┌─────────────────┐
│     EXTRACT     │     │           TRANSFORM              │     │      LOAD       │
│                 │     │                                  │     │                 │
│  • DICOM CT     │     │  • Structure harmonization       │     │  • DVH tables   │
│  • RTSTRUCT     │ ──► │  • TotalSegmentator              │ ──► │  • Radiomics    │
│  • RTDOSE       │     │  • Systematic cropping           │     │  • Metadata     │
│  • RTPLAN       │     │  • Robustness analysis           │     │  • QC reports   │
│                 │     │                                  │     │                 │
└─────────────────┘     └──────────────────────────────────┘     └─────────────────┘

Key Capabilities

1. Automated Data Engineering

Problem: Clinical TPS exports produce messy, unstructured DICOM files with scattered series, inconsistent naming, and locked binary data.

Solution: RTpipeline's Organization Engine automatically:

  • Groups thousands of DICOM files into patient courses (e.g., Patient123/Course_2023-01)
  • Links Plans, Doses, and Images even across different folders using DICOM UIDs
  • Reconciles frame-of-reference mismatches and dose grid transformations
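
As a rough illustration of the grouping step, here is a pure-Python sketch that buckets pre-parsed DICOM headers into per-patient courses keyed by study month. The header dicts and course-naming scheme are simplified stand-ins; the real engine parses DICOM tags directly and links RTPLAN, RTDOSE, and CT objects via Referenced SOP and FrameOfReference UIDs:

```python
from collections import defaultdict

def group_into_courses(headers):
    """Bucket parsed DICOM headers into Patient/Course groups.

    `headers` is a list of plain dicts standing in for parsed DICOM
    datasets (in practice these would come from a DICOM reader such
    as pydicom).
    """
    courses = defaultdict(lambda: defaultdict(list))
    for h in headers:
        # Course key derived from StudyDate, e.g. "Course_2023-01"
        course = f"Course_{h['StudyDate'][:4]}-{h['StudyDate'][4:6]}"
        courses[h["PatientID"]][course].append(h)
    return courses

headers = [
    {"PatientID": "Patient123", "Modality": "CT",     "StudyDate": "20230115"},
    {"PatientID": "Patient123", "Modality": "RTDOSE", "StudyDate": "20230115"},
    {"PatientID": "Patient123", "Modality": "RTPLAN", "StudyDate": "20230115"},
]
grouped = group_into_courses(headers)
print(list(grouped["Patient123"]))  # ['Course_2023-01']
```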

2. Standardized Anatomy via AI Segmentation

Problem: Physicians only contour clinically relevant structures. A toxicity study on splenic dose requires manual re-contouring of hundreds of patients.

Solution: RTpipeline integrates TotalSegmentator to automatically generate 100+ standardized anatomical structures (version- and task-dependent) for every patient:

| Structure Category | Examples |
|---|---|
| Cardiovascular | heart, aorta, pulmonary_artery |
| Respiratory | lung_left, lung_right, trachea |
| Gastrointestinal | esophagus, stomach, small_bowel, colon |
| Genitourinary | kidney_left, kidney_right, urinary_bladder |
| Musculoskeletal | all vertebrae, ribs, pelvis, femurs |

Research Impact: Every patient now has a heart contour named exactly heart, regardless of what the physician drew.
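
TotalSegmentator ships a command-line interface; a small helper like the one below can assemble the per-patient call. The -i/-o/--task/--fast flags come from the TotalSegmentator documentation, but the paths are illustrative and the way RTpipeline invokes the tool internally may differ:

```python
import subprocess

def build_totalseg_cmd(ct_path, out_dir, task="total", fast=False):
    # Assemble a TotalSegmentator CLI call; -i/-o/--task/--fast are
    # the documented options for input, output, task, and fast mode.
    cmd = ["TotalSegmentator", "-i", ct_path, "-o", out_dir, "--task", task]
    if fast:
        cmd.append("--fast")
    return cmd

cmd = build_totalseg_cmd("Patient123/CT.nii.gz",
                         "Patient123/segmentations", fast=True)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run the tool
```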

3. Systematic Field-of-View Standardization

Problem: Relative metrics such as \(V_{20\,\mathrm{Gy}}\) expressed as a percentage of total body volume are meaningless when the denominators differ between patients due to variable scan lengths.

Solution: Systematic CT Cropping uses anatomical landmarks (vertebrae, femoral heads) to crop every CT to consistent physical boundaries:

Before cropping:
  Patient A (long scan): V20Gy = 500cc / 18,000cc = 2.8%
  Patient B (short scan): V20Gy = 500cc / 15,000cc = 3.3%

After cropping to L1 → Femoral heads:
  Patient A: V20Gy = 500cc / 12,000cc = 4.2%
  Patient B: V20Gy = 500cc / 11,500cc = 4.3%

This makes cohort-level comparison statistically defensible.
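
The arithmetic behind the example above is simply a change of denominator; a tiny helper makes the effect explicit:

```python
def v_rel(v_abs_cc, fov_cc):
    # Relative volume metric: absolute V20Gy volume divided by the
    # field-of-view (denominator) volume, in percent.
    return round(100.0 * v_abs_cc / fov_cc, 1)

# Before cropping: denominators reflect scan length, not anatomy
print(v_rel(500, 18_000), v_rel(500, 15_000))  # 2.8 3.3
# After cropping to a common L1 -> femoral-heads FOV
print(v_rel(500, 12_000), v_rel(500, 11_500))  # 4.2 4.3
```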

4. Robustness-Aware Radiomics

Problem: Radiomics features can be unstable under minor variations in image noise, contour delineation, or scanner settings—leading to non-reproducible signatures.

Solution: RTpipeline implements NTCV perturbation chains (Zwanenburg et al., 2019):

  • Noise: Gaussian noise injection simulating scanner variability
  • Translation: Rigid geometric shifts simulating positioning uncertainty
  • Contour: Boundary randomization simulating inter-observer variability
  • Volume: Erosion/dilation simulating segmentation uncertainty

The quick-start container profile can run a conservative, volume-first robustness configuration, while manuscript-grade N/T/C/V chains are enabled explicitly in config.yaml.
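
To make the perturbation idea concrete, here is a deliberately simplified 1-D sketch of the noise (N), translation (T), and volume (V) components; the contour chain is omitted for brevity, and the actual implementation operates on 3-D images and masks:

```python
import random

def perturb_noise(image, sigma, seed=0):
    # N: add zero-mean Gaussian noise (simulated scanner variability)
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in image]

def perturb_translate(image, shift):
    # T: rigid 1-D shift with edge replication (positioning uncertainty)
    if shift > 0:
        return [image[0]] * shift + image[:-shift]
    if shift < 0:
        return image[-shift:] + [image[-1]] * (-shift)
    return list(image)

def perturb_volume(mask, mode):
    # V: one-voxel dilation ("grow") or erosion ("shrink") of a binary mask
    out = []
    for i in range(len(mask)):
        nbrs = mask[max(i - 1, 0):i + 2]
        out.append(1 if (mode == "grow" and any(nbrs)) else
                   1 if (mode == "shrink" and all(nbrs)) else 0)
    return out

print(perturb_translate([1, 2, 3, 4], 1))          # [1, 1, 2, 3]
print(perturb_volume([0, 1, 1, 1, 0], "shrink"))   # [0, 0, 1, 0, 0]
```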

Features are classified by robustness:

| ICC Threshold | Classification | Recommendation |
|---|---|---|
| ICC ≥ 0.90 | Robust | Use for predictive modeling |
| 0.75 ≤ ICC < 0.90 | Acceptable | Use with caution |
| ICC < 0.75 | Poor | Exclude from analysis |
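
Applying these thresholds is a one-liner per feature; the feature names and ICC values below are hypothetical examples:

```python
def classify_icc(icc):
    # Thresholds as in the robustness classification table
    if icc >= 0.90:
        return "robust"
    if icc >= 0.75:
        return "acceptable"
    return "poor"

# Hypothetical per-feature ICC values from a robustness run
features = {"firstorder_Mean": 0.97, "glcm_Contrast": 0.81, "glrlm_GLNU": 0.42}
keep = [name for name, icc in features.items()
        if classify_icc(icc) == "robust"]
print(keep)  # ['firstorder_Mean']
```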

5. Analysis-Ready Outputs

Forget parsing DICOM tags. RTpipeline produces tidy, standardized data tables:

_RESULTS/
├── dvh_metrics.xlsx      # Dmean, D95%, V20Gy for every structure
├── radiomics_ct.xlsx     # 1000+ IBSI-aligned features (via PyRadiomics)
├── radiomics_robustness_summary.xlsx  # ICC/CoV/QCD summary across perturbations
├── case_metadata.xlsx    # Clinical tags, scanner info, kernels
└── qc_reports.xlsx       # Quality control flags and warnings
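
In an analysis script you would typically load these tables (e.g. with pandas' read_excel) and join DVH rows to per-course metadata. The sketch below performs the same join on hard-coded stand-in rows; all column names and values here are illustrative:

```python
# Stand-in rows as they might look after loading the Excel outputs
dvh = [
    {"course_id": "Patient123/Course_2023-01", "structure": "heart",
     "Dmean_Gy": 4.1},
    {"course_id": "Patient123/Course_2023-01", "structure": "lung_left",
     "Dmean_Gy": 9.8},
]
meta = {"Patient123/Course_2023-01": {"kernel": "Br40"}}

# Join each DVH row with its course-level metadata
merged = [{**row, **meta.get(row["course_id"], {})} for row in dvh]
print(merged[0]["structure"], merged[0]["kernel"])  # heart Br40
```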

6. High-Performance Architecture

Problem: Segmenting and analyzing 1,000 patients with deep-learning models can take weeks of compute time in a naive, serial setup.

Solution: RTpipeline is built for speed:

  • GPU Acceleration: Uses CUDA for TotalSegmentator and custom models
  • Parallel Optimization: Automatically saturates available CPU cores for DVH and Radiomics
  • Smart Scaling: Prevents crashes by adapting to available RAM
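
A common way to implement this kind of smart scaling is to cap the worker count by both CPU cores and available RAM. The heuristic below is an illustrative sketch, not the pipeline's exact policy:

```python
import os

def plan_workers(ram_gb, ram_per_job_gb=4.0, cpus=None):
    # Cap parallel jobs by both CPU count and available RAM so a
    # large cohort saturates cores without exhausting memory.
    # (Illustrative heuristic; per-job RAM needs vary by stage.)
    cpus = cpus or os.cpu_count() or 1
    by_ram = max(1, int(ram_gb // ram_per_job_gb))
    return min(cpus, by_ram)

print(plan_workers(ram_gb=32, cpus=16))  # 8 (RAM-limited)
print(plan_workers(ram_gb=256, cpus=16))  # 16 (CPU-limited)
```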

Technical Details →


Quick Start

Option 1: Setup Wizard

curl -sSL https://raw.githubusercontent.com/kstawiski/rtpipeline/main/setup_docker_project.sh | bash

Follow the wizard to generate your docker-compose.yml and start the Web UI.

Option 2: Manual Docker Start

# Create project structure
mkdir -p Input Output Logs

# Start pipeline
docker-compose up -d

# Open Web UI at http://localhost:8080

Option 3: Google Colab

Try RTpipeline in the cloud with free GPU access.


Who Is RTpipeline For?

PhD Students & Postdocs

"Spend your PhD on science, not on reinventing DICOM parsing."

  • Accelerate from data collection to analysis in days, not months
  • Focus thesis time on methods and hypotheses, not infrastructure
  • Built-in tools for methodological rigor (ICC, perturbations, IBSI-aligned features)

PhD Quick Start Guide →

Clinical Researchers

"A bridge between your TPS and your statistician."

  • Minimal coding required—use prepared configs and the Web UI
  • Turn routine clinical plans into analyzable datasets
  • Excel/CSV outputs with clinically meaningful variable names

Web UI Guide →

Multi-Center Consortia

"Same code and settings at every center."

  • Shared configuration files ensure methodological consistency
  • Federated analysis—raw data never needs to leave the institution
  • Publish config bundles as supplementary materials with DOI

Multi-Center Case Study →


Data Flow Architecture

graph LR
    A[TPS Export\nRaw DICOM] --> B(RTpipeline\nOrchestrator);
    B --> C{Organization\nEngine};
    C --> D[AI Segmentation\nTotalSegmentator];
    C --> E[Custom Models\nnnU-Net];
    C --> F[CT Cropping\nFOV Standardization];
    D --> G[Analysis Engine];
    E --> G;
    F --> G;
    G --> H[DVH Calculator];
    G --> I[Radiomics Extractor\n+ NTCV Robustness];
    H --> J[Tidy Tables\n.xlsx / .csv];
    I --> J;

Case Studies

Learn how RTpipeline is used in real research scenarios:

| Case Study | Description |
|---|---|
| NTCP Modeling | Build rectal toxicity models from DVH metrics |
| Radiomics Signatures | Develop robust imaging biomarkers with NTCV perturbations |
| Multi-Center AI | Federated learning with harmonized data |

Technical Innovations

NTCV Perturbation Chains

Implements the published methodology from Zwanenburg et al. (2019) for radiomics feature robustness assessment. Learn more →

Systematic CT Cropping

Anatomical landmark-based cropping for standardized FOV across cohorts. Learn more →

Dual Environment Architecture

Resolves NumPy 1.x vs 2.x incompatibility between PyRadiomics and TotalSegmentator via isolated rtpipeline and rtpipeline-radiomics conda environments. Learn more →

Custom nnU-Net Models

Plug-and-play support for institution-specific segmentation models. Learn more →


Limitations & Disclaimers

Research Use Only

RTpipeline is a research tool and is not a medical device. It has not been validated for clinical decision-making and should not be used for patient care without independent validation by qualified professionals.

Important limitations:

  • Segmentation accuracy: TotalSegmentator and custom models may produce errors. Always review AI-generated contours before clinical use.
  • Robustness benchmarks: The 98-99% sensitivity figures cited in documentation are literature benchmarks from Zwanenburg et al. (2019), not performance guarantees for this implementation.
  • IBSI alignment: Features are extracted via PyRadiomics with IBSI-informed settings, but full IBSI compliance requires independent validation against digital phantoms.
  • Multi-center use: Site-specific validation is essential before deploying across institutions.

Citation

If you use RTpipeline for research, please cite the repository and the underlying tools:

@software{rtpipeline,
  title = {RTpipeline: Automated Radiotherapy DICOM Processing Pipeline},
  author = {Stawiski, Konrad},
  url = {https://github.com/kstawiski/rtpipeline},
  year = {2025}
}

Additionally, please cite:

  • TotalSegmentator: Wasserthal et al., Radiology: Artificial Intelligence (2023)
  • PyRadiomics: van Griethuysen et al., Cancer Research (2017)
  • IBSI: Zwanenburg et al., Radiology (2020)

License

This project is licensed under the MIT License. See LICENSE for details.