RTpipeline
The Big Data Radiotherapy Pipeline
From raw clinical exports to research-ready datasets in one command.
  • Automated Workflow: from DICOM to analysis in one step
  • AI Segmentation: built-in TotalSegmentator and nnU-Net
  • Standardized Data: comparable metrics across cohorts

RTpipeline v2.1.0

The current release centers on three manuscript-critical capabilities: a first-class radiomics robustness stage with per-course Parquet plus cohort-level Excel summaries, a dual-environment execution model that separates TotalSegmentator from PyRadiomics, and configurable NTCV perturbation chains for robustness screening.


Introduction

Radiation oncology is undergoing a rapid transformation driven by the increasing availability of large-scale, routinely collected clinical and imaging data. Modern radiotherapy departments generate comprehensive digital records for each treated patient, including planning computed tomography (CT) scans, delineated target volumes and organs at risk (OARs), three-dimensional dose distributions, and detailed treatment plans stored in standardized Digital Imaging and Communications in Medicine (DICOM) objects. Together with electronic health records and cancer registries, these data sources constitute a rich substrate for developing normal tissue complication probability (NTCP) models, radiomics-based biomarkers, and data-driven clinical decision support systems.

However, the realization of this potential hinges critically on the ability to transform heterogeneous, clinically oriented data into reproducible, analysis-ready research datasets.

The Technical Gap

Despite substantial efforts in standardization, several technical barriers impede large-scale radiotherapy data analysis:

| Challenge | Impact |
|---|---|
| Vendor heterogeneity | DICOM RT objects exhibit non-trivial variability across TPS vendors (Eclipse, RayStation, Monaco), software versions, and local conventions |
| Structure naming chaos | Identical anatomical regions are named Heart, heart, hrt, or Coeur across patients and institutions |
| Bespoke scripts | Individual researchers re-implement DICOM parsing for each project, producing fragile, undocumented code |
| Scale limitations | Manual QC and structure mapping become infeasible beyond a few hundred patients |
| Reproducibility collapse | Each student maintains their own version of the preprocessing scripts |

RTpipeline: An ETL Framework for Radiotherapy

Within health informatics, the Extract-Transform-Load (ETL) paradigm has emerged as a foundational concept for managing complex data flows. RTpipeline is a dedicated, research-grade ETL framework specifically tailored for radiotherapy DICOM data.

┌─────────────────┐     ┌──────────────────────────────────┐     ┌─────────────────┐
│     EXTRACT     │     │           TRANSFORM              │     │      LOAD       │
│                 │     │                                  │     │                 │
│  • DICOM CT     │     │  • Structure harmonization       │     │  • DVH tables   │
│  • RTSTRUCT     │ ──► │  • TotalSegmentator              │ ──► │  • Radiomics    │
│  • RTDOSE       │     │  • Systematic cropping           │     │  • Metadata     │
│  • RTPLAN       │     │  • Robustness analysis           │     │  • QC reports   │
│                 │     │                                  │     │                 │
└─────────────────┘     └──────────────────────────────────┘     └─────────────────┘

Key Capabilities

1. Automated Data Engineering

Problem: Clinical TPS exports produce messy, unstructured DICOM files with scattered series, inconsistent naming, and locked binary data.

Solution: RTpipeline's Organization Engine automatically:

  • Groups thousands of DICOM files into patient courses (e.g., Patient123/Course_2023-01)
  • Links Plans, Doses, and Images even across different folders using DICOM UIDs
  • Reconciles frame-of-reference mismatches and dose grid transformations
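
As a rough illustration of the grouping step, here is a pure-Python sketch that buckets pre-parsed DICOM headers into per-patient courses keyed by study month. The header dicts and course-naming scheme are simplified stand-ins; the real engine parses DICOM tags directly and links RTPLAN, RTDOSE, and CT objects via Referenced SOP and FrameOfReference UIDs:

```python
from collections import defaultdict

def group_into_courses(headers):
    """Bucket parsed DICOM headers into Patient/Course groups.

    `headers` is a list of plain dicts standing in for parsed DICOM
    datasets (in practice these would come from a DICOM reader such
    as pydicom).
    """
    courses = defaultdict(lambda: defaultdict(list))
    for h in headers:
        # Course key derived from StudyDate, e.g. "Course_2023-01"
        course = f"Course_{h['StudyDate'][:4]}-{h['StudyDate'][4:6]}"
        courses[h["PatientID"]][course].append(h)
    return courses

headers = [
    {"PatientID": "Patient123", "Modality": "CT",     "StudyDate": "20230115"},
    {"PatientID": "Patient123", "Modality": "RTDOSE", "StudyDate": "20230115"},
    {"PatientID": "Patient123", "Modality": "RTPLAN", "StudyDate": "20230115"},
]
grouped = group_into_courses(headers)
print(list(grouped["Patient123"]))  # ['Course_2023-01']
```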

2. Standardized Anatomy via AI Segmentation

Problem: Physicians only contour clinically relevant structures. A toxicity study on splenic dose requires manual re-contouring of hundreds of patients.

Solution: RTpipeline integrates TotalSegmentator to automatically generate 100+ standardized anatomical structures (version- and task-dependent) for every patient:

| Structure Category | Examples |
|---|---|
| Cardiovascular | heart, aorta, pulmonary_artery |
| Respiratory | lung_left, lung_right, trachea |
| Gastrointestinal | esophagus, stomach, small_bowel, colon |
| Genitourinary | kidney_left, kidney_right, urinary_bladder |
| Musculoskeletal | all vertebrae, ribs, pelvis, femurs |

Research Impact: Every patient now has a heart contour named exactly heart, regardless of what the physician drew.
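
TotalSegmentator ships a command-line interface; a small helper like the one below can assemble the per-patient call. The -i/-o/--task/--fast flags come from the TotalSegmentator documentation, but the paths are illustrative and the way RTpipeline invokes the tool internally may differ:

```python
import subprocess

def build_totalseg_cmd(ct_path, out_dir, task="total", fast=False):
    # Assemble a TotalSegmentator CLI call; -i/-o/--task/--fast are
    # the documented options for input, output, task, and fast mode.
    cmd = ["TotalSegmentator", "-i", ct_path, "-o", out_dir, "--task", task]
    if fast:
        cmd.append("--fast")
    return cmd

cmd = build_totalseg_cmd("Patient123/CT.nii.gz",
                         "Patient123/segmentations", fast=True)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run the tool
```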

3. Systematic Field-of-View Standardization

Problem: Relative metrics such as \(V_{20\,\mathrm{Gy}}\) expressed as a percentage of total body volume are meaningless when the denominators differ between patients due to variable scan lengths.

Solution: Systematic CT Cropping uses anatomical landmarks (vertebrae, femoral heads) to crop every CT to consistent physical boundaries:

Before cropping:
  Patient A (long scan): V20Gy = 500cc / 18,000cc = 2.8%
  Patient B (short scan): V20Gy = 500cc / 15,000cc = 3.3%

After cropping to L1 → Femoral heads:
  Patient A: V20Gy = 500cc / 12,000cc = 4.2%
  Patient B: V20Gy = 500cc / 11,500cc = 4.3%

This makes cohort-level comparison statistically defensible.
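
The arithmetic behind the example above is simply a change of denominator; a tiny helper makes the effect explicit:

```python
def v_rel(v_abs_cc, fov_cc):
    # Relative volume metric: absolute V20Gy volume divided by the
    # field-of-view (denominator) volume, in percent.
    return round(100.0 * v_abs_cc / fov_cc, 1)

# Before cropping: denominators reflect scan length, not anatomy
print(v_rel(500, 18_000), v_rel(500, 15_000))  # 2.8 3.3
# After cropping to a common L1 -> femoral-heads FOV
print(v_rel(500, 12_000), v_rel(500, 11_500))  # 4.2 4.3
```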

4. Robustness-Aware Radiomics

Problem: Radiomics features can be unstable under minor variations in image noise, contour delineation, or scanner settings—leading to non-reproducible signatures.

Solution: RTpipeline implements NTCV perturbation chains (Zwanenburg et al., 2019):

  • Noise: Gaussian noise injection simulating scanner variability
  • Translation: Rigid geometric shifts simulating positioning uncertainty
  • Contour: Boundary randomization simulating inter-observer variability
  • Volume: Erosion/dilation simulating segmentation uncertainty

The quick-start container profile can run a conservative, volume-first robustness configuration, while manuscript-grade N/T/C/V chains are enabled explicitly in config.yaml.
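
To make the perturbation idea concrete, here is a deliberately simplified 1-D sketch of the noise (N), translation (T), and volume (V) components; the contour chain is omitted for brevity, and the actual implementation operates on 3-D images and masks:

```python
import random

def perturb_noise(image, sigma, seed=0):
    # N: add zero-mean Gaussian noise (simulated scanner variability)
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in image]

def perturb_translate(image, shift):
    # T: rigid 1-D shift with edge replication (positioning uncertainty)
    if shift > 0:
        return [image[0]] * shift + image[:-shift]
    if shift < 0:
        return image[-shift:] + [image[-1]] * (-shift)
    return list(image)

def perturb_volume(mask, mode):
    # V: one-voxel dilation ("grow") or erosion ("shrink") of a binary mask
    out = []
    for i in range(len(mask)):
        nbrs = mask[max(i - 1, 0):i + 2]
        out.append(1 if (mode == "grow" and any(nbrs)) else
                   1 if (mode == "shrink" and all(nbrs)) else 0)
    return out

print(perturb_translate([1, 2, 3, 4], 1))          # [1, 1, 2, 3]
print(perturb_volume([0, 1, 1, 1, 0], "shrink"))   # [0, 0, 1, 0, 0]
```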

Features are classified by robustness:

| ICC Threshold | Classification | Recommendation |
|---|---|---|
| ICC ≥ 0.90 | Robust | Use for predictive modeling |
| 0.75 ≤ ICC < 0.90 | Acceptable | Use with caution |
| ICC < 0.75 | Poor | Exclude from analysis |
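
Applying these thresholds is a one-liner per feature; the feature names and ICC values below are hypothetical examples:

```python
def classify_icc(icc):
    # Thresholds as in the robustness classification table
    if icc >= 0.90:
        return "robust"
    if icc >= 0.75:
        return "acceptable"
    return "poor"

# Hypothetical per-feature ICC values from a robustness run
features = {"firstorder_Mean": 0.97, "glcm_Contrast": 0.81, "glrlm_GLNU": 0.42}
keep = [name for name, icc in features.items()
        if classify_icc(icc) == "robust"]
print(keep)  # ['firstorder_Mean']
```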

5. Analysis-Ready Outputs

Forget parsing DICOM tags. RTpipeline produces tidy, standardized data tables:

_RESULTS/
├── dvh_metrics.xlsx      # Dmean, D95%, V20Gy for every structure
├── radiomics_ct.xlsx     # 1000+ IBSI-aligned features (via PyRadiomics)
├── radiomics_robustness_summary.xlsx  # ICC/CoV/QCD summary across perturbations
├── case_metadata.xlsx    # Clinical tags, scanner info, kernels
└── qc_reports.xlsx       # Quality control flags and warnings
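
In an analysis script you would typically load these tables (e.g. with pandas' read_excel) and join DVH rows to per-course metadata. The sketch below performs the same join on hard-coded stand-in rows; all column names and values here are illustrative:

```python
# Stand-in rows as they might look after loading the Excel outputs
dvh = [
    {"course_id": "Patient123/Course_2023-01", "structure": "heart",
     "Dmean_Gy": 4.1},
    {"course_id": "Patient123/Course_2023-01", "structure": "lung_left",
     "Dmean_Gy": 9.8},
]
meta = {"Patient123/Course_2023-01": {"kernel": "Br40"}}

# Join each DVH row with its course-level metadata
merged = [{**row, **meta.get(row["course_id"], {})} for row in dvh]
print(merged[0]["structure"], merged[0]["kernel"])  # heart Br40
```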

6. High-Performance Architecture

Problem: Segmenting and analyzing 1,000 patients with deep-learning models can take weeks of compute time in a naive, serial setup.

Solution: RTpipeline is built for speed:

  • GPU Acceleration: Uses CUDA for TotalSegmentator and custom models
  • Parallel Optimization: Automatically saturates available CPU cores for DVH and Radiomics
  • Smart Scaling: Prevents crashes by adapting to available RAM
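
A common way to implement this kind of smart scaling is to cap the worker count by both CPU cores and available RAM. The heuristic below is an illustrative sketch, not the pipeline's exact policy:

```python
import os

def plan_workers(ram_gb, ram_per_job_gb=4.0, cpus=None):
    # Cap parallel jobs by both CPU count and available RAM so a
    # large cohort saturates cores without exhausting memory.
    # (Illustrative heuristic; per-job RAM needs vary by stage.)
    cpus = cpus or os.cpu_count() or 1
    by_ram = max(1, int(ram_gb // ram_per_job_gb))
    return min(cpus, by_ram)

print(plan_workers(ram_gb=32, cpus=16))  # 8 (RAM-limited)
print(plan_workers(ram_gb=256, cpus=16))  # 16 (CPU-limited)
```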

Technical Details →


Quick Start

Option 1: Setup Wizard

curl -sSL https://raw.githubusercontent.com/kstawiski/rtpipeline/main/setup_docker_project.sh | bash

Follow the wizard to generate your docker-compose.yml and start the Web UI.

Option 2: Manual Docker Start

# Create project structure
mkdir -p Input Output Logs

# Start pipeline
docker-compose up -d

# Open Web UI at http://localhost:8080

Option 3: Google Colab

Try RTpipeline in the cloud with free GPU access.


Who Is RTpipeline For?

PhD Students & Postdocs

"Spend your PhD on science, not on reinventing DICOM parsing."

  • Accelerate from data collection to analysis in days, not months
  • Focus thesis time on methods and hypotheses, not infrastructure
  • Built-in tools for methodological rigor (ICC, perturbations, IBSI-aligned features)

PhD Quick Start Guide →

Clinical Researchers

"A bridge between your TPS and your statistician."

  • Minimal coding required—use prepared configs and the Web UI
  • Turn routine clinical plans into analyzable datasets
  • Excel/CSV outputs with clinically meaningful variable names

Web UI Guide →

Multi-Center Consortia

"Same code and settings at every center."

  • Shared configuration files ensure methodological consistency
  • Federated analysis—raw data never needs to leave the institution
  • Publish config bundles as supplementary materials with DOI

Multi-Center Case Study →


Data Flow Architecture

graph LR
    A[TPS Export\nRaw DICOM] --> B(RTpipeline\nOrchestrator);
    B --> C{Organization\nEngine};
    C --> D[AI Segmentation\nTotalSegmentator];
    C --> E[Custom Models\nnnU-Net];
    C --> F[CT Cropping\nFOV Standardization];
    D --> G[Analysis Engine];
    E --> G;
    F --> G;
    G --> H[DVH Calculator];
    G --> I[Radiomics Extractor\n+ NTCV Robustness];
    H --> J[Tidy Tables\n.xlsx / .csv];
    I --> J;

Case Studies

Learn how RTpipeline is used in real research scenarios:

| Case Study | Description |
|---|---|
| NTCP Modeling | Build rectal toxicity models from DVH metrics |
| Radiomics Signatures | Develop robust imaging biomarkers with NTCV perturbations |
| Multi-Center AI | Federated learning with harmonized data |

Technical Innovations

NTCV Perturbation Chains

Implements the published methodology from Zwanenburg et al. (2019) for radiomics feature robustness assessment. Learn more →

Systematic CT Cropping

Anatomical landmark-based cropping for standardized FOV across cohorts. Learn more →

Dual Environment Architecture

Resolves NumPy 1.x vs 2.x incompatibility between PyRadiomics and TotalSegmentator via isolated rtpipeline and rtpipeline-radiomics conda environments. Learn more →

Custom nnU-Net Models

Plug-and-play support for institution-specific segmentation models. Learn more →


Limitations & Disclaimers

Research Use Only

RTpipeline is a research tool and is not a medical device. It has not been validated for clinical decision-making and should not be used for patient care without independent validation by qualified professionals.

Important limitations:

  • Segmentation accuracy: TotalSegmentator and custom models may produce errors. Always review AI-generated contours before clinical use.
  • Robustness benchmarks: The 98-99% sensitivity figures cited in documentation are literature benchmarks from Zwanenburg et al. (2019), not performance guarantees for this implementation.
  • IBSI alignment: Features are extracted via PyRadiomics with IBSI-informed settings, but full IBSI compliance requires independent validation against digital phantoms.
  • Multi-center use: Site-specific validation is essential before deploying across institutions.

Citation

If you use RTpipeline for research, please cite the repository and the underlying tools:

@software{rtpipeline,
  title = {RTpipeline: Automated Radiotherapy DICOM Processing Pipeline},
  author = {Stawiski, Konrad},
  url = {https://github.com/kstawiski/rtpipeline},
  year = {2025}
}

Additionally, please cite:

  • TotalSegmentator: Wasserthal et al., Radiology: Artificial Intelligence (2023)
  • PyRadiomics: van Griethuysen et al., Cancer Research (2017)
  • IBSI: Zwanenburg et al., Radiology (2020)

License

This project is licensed under the MIT License. See LICENSE for details.