Home
From raw clinical exports to research-ready datasets in one command.
- Automated Workflow From DICOM to Analysis in one step
- AI Segmentation Built-in TotalSegmentator & nnU-Net
- Standardized Data Comparable metrics across cohorts
RTpipeline v2.1.0
The current release centers on three manuscript-critical capabilities: a first-class radiomics robustness stage with per-course Parquet plus cohort-level Excel summaries, a dual-environment execution model that separates TotalSegmentator from PyRadiomics, and configurable NTCV (noise/translation/contour/volume) perturbation chains for robustness screening.
Introduction¶
Radiation oncology is undergoing a rapid transformation driven by the increasing availability of large-scale, routinely collected clinical and imaging data. Modern radiotherapy departments generate comprehensive digital records for each treated patient, including planning computed tomography (CT) scans, delineated target volumes and organs at risk (OARs), three-dimensional dose distributions, and detailed treatment plans stored in standardized Digital Imaging and Communications in Medicine (DICOM) objects. Together with electronic health records and cancer registries, these data sources constitute a rich substrate for developing normal tissue complication probability (NTCP) models, radiomics-based biomarkers, and data-driven clinical decision support systems.
However, the realization of this potential hinges critically on the ability to transform heterogeneous, clinically oriented data into reproducible, analysis-ready research datasets.
The Technical Gap¶
Despite substantial efforts in standardization, several technical barriers impede large-scale radiotherapy data analysis:
| Challenge | Impact |
|---|---|
| Vendor heterogeneity | DICOM RT objects exhibit non-trivial variability across TPS vendors (Eclipse, RayStation, Monaco), software versions, and local conventions |
| Structure naming chaos | Identical anatomical regions named Heart, heart, hrt, Coeur across patients and institutions |
| Bespoke scripts | Individual researchers re-implement DICOM parsing for each project, creating fragile, undocumented code |
| Scale limitations | Manual QC and structure mapping become infeasible beyond a few hundred patients |
| Reproducibility collapse | Each student maintains their own version of preprocessing scripts |
RTpipeline: An ETL Framework for Radiotherapy¶
Within health informatics, the Extract-Transform-Load (ETL) paradigm has emerged as a foundational concept for managing complex data flows. RTpipeline is a dedicated, research-grade ETL framework specifically tailored for radiotherapy DICOM data.
┌─────────────────┐ ┌──────────────────────────────────┐ ┌─────────────────┐
│ EXTRACT │ │ TRANSFORM │ │ LOAD │
│ │ │ │ │ │
│ • DICOM CT │ │ • Structure harmonization │ │ • DVH tables │
│ • RTSTRUCT │ ──► │ • TotalSegmentator │ ──► │ • Radiomics │
│ • RTDOSE │ │ • Systematic cropping │ │ • Metadata │
│ • RTPLAN │ │ • Robustness analysis │ │ • QC reports │
│ │ │ │ │ │
└─────────────────┘ └──────────────────────────────────┘ └─────────────────┘
Key Capabilities¶
1. Automated Data Engineering¶
Problem: Clinical TPS exports produce messy, unstructured DICOM files with scattered series, inconsistent naming, and locked binary data.
Solution: RTpipeline's Organization Engine automatically:
- Groups thousands of DICOM files into patient courses (e.g., Patient123/Course_2023-01)
- Links plans, doses, and images across folders using DICOM UIDs
- Reconciles frame-of-reference mismatches and dose grid transformations
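Conceptually, course grouping and object linking reduce to bucketing by patient and study period, then resolving referenced SOP instance UIDs. The sketch below is a hypothetical illustration using plain dicts that mimic parsed DICOM headers (in practice these fields would come from `pydicom.dcmread`, and RTpipeline's Organization Engine handles many more edge cases):

```python
from collections import defaultdict

# Hypothetical headers mimicking parsed DICOM metadata (illustrative only).
headers = [
    {"PatientID": "Patient123", "StudyDate": "20230115", "Modality": "RTPLAN",
     "SOPInstanceUID": "1.2.3.plan"},
    {"PatientID": "Patient123", "StudyDate": "20230115", "Modality": "RTDOSE",
     "SOPInstanceUID": "1.2.3.dose", "ReferencedSOPInstanceUID": "1.2.3.plan"},
]

def group_into_courses(headers):
    """Bucket files into per-patient courses keyed by study year-month."""
    courses = defaultdict(list)
    for h in headers:
        course = f"{h['PatientID']}/Course_{h['StudyDate'][:4]}-{h['StudyDate'][4:6]}"
        courses[course].append(h)
    return dict(courses)

def link_dose_to_plan(course_files):
    """Resolve RTDOSE -> RTPLAN links via referenced SOP instance UIDs."""
    plans = {h["SOPInstanceUID"]: h for h in course_files
             if h["Modality"] == "RTPLAN"}
    return {h["SOPInstanceUID"]: plans.get(h.get("ReferencedSOPInstanceUID"))
            for h in course_files if h["Modality"] == "RTDOSE"}

courses = group_into_courses(headers)
links = link_dose_to_plan(courses["Patient123/Course_2023-01"])
```

UID-based linking is what makes grouping robust across scattered export folders: folder layout carries no reliable meaning, but referenced UIDs do.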
2. Standardized Anatomy via AI Segmentation¶
Problem: Physicians only contour clinically relevant structures. A toxicity study on splenic dose requires manual re-contouring of hundreds of patients.
Solution: RTpipeline integrates TotalSegmentator to automatically generate 100+ standardized anatomical structures (version- and task-dependent) for every patient:
| Structure Category | Examples |
|---|---|
| Cardiovascular | heart, aorta, pulmonary_artery |
| Respiratory | lung_left, lung_right, trachea |
| Gastrointestinal | esophagus, stomach, small_bowel, colon |
| Genitourinary | kidney_left, kidney_right, urinary_bladder |
| Musculoskeletal | All vertebrae, ribs, pelvis, femurs |
Research Impact: Every patient now has a heart contour named exactly heart, regardless of what the physician drew.
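For manually drawn structures, the same standardization can be framed as a synonym lookup. The sketch below is a hypothetical illustration of that idea, with an invented `SYNONYMS` map; RTpipeline's actual mapping tables and matching logic are more extensive:

```python
# Hypothetical synonym map: canonical name -> known physician variants.
SYNONYMS = {
    "heart": {"heart", "hrt", "coeur", "cor"},
    "lung_left": {"lung_left", "left lung", "lt lung"},
}

def harmonize(name):
    """Map a physician-drawn structure name to its canonical form.

    Returns None for unmapped names so they can be flagged for review
    rather than silently passed through.
    """
    key = name.strip().lower()
    for canonical, variants in SYNONYMS.items():
        if key in variants:
            return canonical
    return None

canonical = harmonize("Coeur")  # French variant maps to "heart"
```

Combined with AI segmentation, this is what lets a splenic-dose study query one column name across the whole cohort instead of curating per-patient aliases.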
3. Systematic Field-of-View Standardization¶
Problem: Relative volume metrics such as \(V_{20\,\mathrm{Gy}}\) expressed as a percentage of total body volume are meaningless when the denominator differs due to variable scan lengths.
Solution: Systematic CT Cropping uses anatomical landmarks (vertebrae, femoral heads) to crop every CT to consistent physical boundaries:
Before cropping:
Patient A (long scan): V20Gy = 500cc / 18,000cc = 2.8%
Patient B (short scan): V20Gy = 500cc / 15,000cc = 3.3%
After cropping to L1 → Femoral heads:
Patient A: V20Gy = 500cc / 12,000cc = 4.2%
Patient B: V20Gy = 500cc / 11,500cc = 4.3%
This makes cohort-level comparison statistically defensible.
4. Robustness-Aware Radiomics¶
Problem: Radiomics features can be unstable under minor variations in image noise, contour delineation, or scanner settings—leading to non-reproducible signatures.
Solution: RTpipeline implements NTCV perturbation chains (Zwanenburg et al., 2019):
- Noise: Gaussian noise injection simulating scanner variability
- Translation: Rigid geometric shifts simulating positioning uncertainty
- Contour: Boundary randomization simulating inter-observer variability
- Volume: Erosion/dilation simulating segmentation uncertainty
The quick-start container profile can run a conservative, volume-first robustness configuration, while manuscript-grade N/T/C/V chains are enabled explicitly in config.yaml.
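To make the chain idea concrete, here is a minimal 1-D sketch of the N and T steps on a toy intensity profile. It is a deliberately simplified stand-in (pure Python, fixed seed, edge padding), not RTpipeline's implementation; the C and V steps act on the segmentation mask rather than the image and are omitted:

```python
import random

def add_noise(profile, sigma, seed=0):
    """N: inject Gaussian noise, simulating scanner variability."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in profile]

def translate(profile, shift):
    """T: rigid shift simulating positioning uncertainty (edge-padded)."""
    if shift >= 0:
        return [profile[0]] * shift + profile[:len(profile) - shift]
    return profile[-shift:] + [profile[-1]] * (-shift)

def perturbation_chain(profile, sigma, shift, seed=0):
    """Apply N then T; features are re-extracted after each chain
    and compared against the unperturbed baseline."""
    return translate(add_noise(profile, sigma, seed), shift)

base = [0.0, 10.0, 50.0, 50.0, 10.0, 0.0]
perturbed = perturbation_chain(base, sigma=1.0, shift=1)
```

Each configured chain yields one perturbed copy of the image/mask pair; feature stability is then scored across all copies (ICC, as in the table below).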
Features are classified by robustness:
| ICC Threshold | Classification | Recommendation |
|---|---|---|
| ICC ≥ 0.90 | Robust | Use for predictive modeling |
| 0.75 ≤ ICC < 0.90 | Acceptable | Use with caution |
| ICC < 0.75 | Poor | Exclude from analysis |
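The thresholds in the table translate directly into a screening step. A minimal sketch (feature names and ICC values are invented for illustration):

```python
def classify_icc(icc):
    """Map an ICC value to the robustness tiers from the table above."""
    if icc >= 0.90:
        return "robust"
    if icc >= 0.75:
        return "acceptable"
    return "poor"

# Hypothetical per-feature ICCs from a robustness summary.
features = {
    "shape_Volume": 0.98,
    "glcm_Contrast": 0.82,
    "firstorder_Skewness": 0.41,
}

# Keep only robust features for predictive modeling.
keep = [name for name, icc in features.items()
        if classify_icc(icc) == "robust"]
```

Applying this filter before model fitting is what keeps unstable features from inflating apparent performance.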
5. Analysis-Ready Outputs¶
Forget parsing DICOM tags. RTpipeline produces tidy, standardized data tables:
_RESULTS/
├── dvh_metrics.xlsx # Dmean, D95%, V20Gy for every structure
├── radiomics_ct.xlsx # 1000+ IBSI-aligned features (via PyRadiomics)
├── radiomics_robustness_summary.xlsx # ICC/CoV/QCD summary across perturbations
├── case_metadata.xlsx # Clinical tags, scanner info, kernels
└── qc_reports.xlsx # Quality control flags and warnings
6. High-Performance Architecture¶
Problem: Analyzing 1,000 patients with deep-learning segmentation models can take weeks of compute time.
Solution: RTpipeline is built for speed:
- GPU Acceleration: Uses CUDA for TotalSegmentator and custom models
- Parallel Optimization: Automatically saturates available CPU cores for DVH and Radiomics
- Smart Scaling: Prevents crashes by adapting to available RAM
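The parallelization pattern is standard fan-out over independent per-structure work. The sketch below uses Python's `concurrent.futures` with a stand-in workload; RTpipeline's actual worker scheduling (including RAM-aware scaling) is more involved:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def compute_dvh(structure):
    """Stand-in for a per-structure DVH computation.

    Returns (name, fake Dmean) so the fan-out/collect pattern is visible;
    the real calculation samples the dose grid inside the structure mask.
    """
    return structure, len(structure) * 1.0

structures = ["heart", "lung_left", "lung_right", "esophagus"]

# Fan out across available cores; per-structure DVHs are independent,
# so results can be collected in any order.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = dict(pool.map(compute_dvh, structures))
```

Because DVH and radiomics jobs are embarrassingly parallel per structure and per course, throughput scales close to linearly with core count until memory becomes the bottleneck.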
Quick Start¶
Option 1: Interactive Docker Setup (Recommended)¶
curl -sSL https://raw.githubusercontent.com/kstawiski/rtpipeline/main/setup_docker_project.sh | bash
Follow the wizard to generate your docker-compose.yml and start the Web UI.
Option 2: Manual Docker Start¶
# Create project structure
mkdir -p Input Output Logs
# Start pipeline
docker-compose up -d
# Open Web UI at http://localhost:8080
Option 3: Google Colab¶
Try RTpipeline in the cloud with free GPU access.
Who Is RTpipeline For?¶
PhD Students & Postdocs¶
"Spend your PhD on science, not on reinventing DICOM parsing."
- Accelerate from data collection to analysis in days, not months
- Focus thesis time on methods and hypotheses, not infrastructure
- Built-in tools for methodological rigor (ICC, perturbations, IBSI-aligned features)
Clinical Researchers¶
"A bridge between your TPS and your statistician."
- Minimal coding required—use prepared configs and the Web UI
- Turn routine clinical plans into analyzable datasets
- Excel/CSV outputs with clinically meaningful variable names
Multi-Center Consortia¶
"Same code and settings at every center."
- Shared configuration files ensure methodological consistency
- Federated analysis—raw data never needs to leave the institution
- Publish config bundles as supplementary materials with DOI
Data Flow Architecture¶
graph LR
A[TPS Export\nRaw DICOM] --> B(RTpipeline\nOrchestrator);
B --> C{Organization\nEngine};
C --> D[AI Segmentation\nTotalSegmentator];
C --> E[Custom Models\nnnU-Net];
C --> F[CT Cropping\nFOV Standardization];
D --> G[Analysis Engine];
E --> G;
F --> G;
G --> H[DVH Calculator];
G --> I[Radiomics Extractor\n+ NTCV Robustness];
H --> J[Tidy Tables\n.xlsx / .csv];
I --> J;
Case Studies¶
Learn how RTpipeline is used in real research scenarios:
| Case Study | Description |
|---|---|
| NTCP Modeling | Build rectal toxicity models from DVH metrics |
| Radiomics Signatures | Develop robust imaging biomarkers with NTCV |
| Multi-Center AI | Federated learning with harmonized data |
Technical Innovations¶
NTCV Perturbation Chains¶
Implements the published methodology from Zwanenburg et al. (2019) for radiomics feature robustness assessment. Learn more →
Systematic CT Cropping¶
Anatomical landmark-based cropping for standardized FOV across cohorts. Learn more →
Dual Environment Architecture¶
Resolves NumPy 1.x vs 2.x incompatibility between PyRadiomics and TotalSegmentator via isolated rtpipeline and rtpipeline-radiomics conda environments. Learn more →
Custom nnU-Net Models¶
Plug-and-play support for institution-specific segmentation models. Learn more →
Documentation Sections¶
- Getting Started: from zero to your first analyzed patient
- User Guide: output formats, interpretation, troubleshooting
- Features: CT cropping, radiomics robustness, custom models
- Technical: architecture, parallelization, security
- Case Studies: real-world research applications
Limitations & Disclaimers¶
Research Use Only
RTpipeline is a research tool and is not a medical device. It has not been validated for clinical decision-making and should not be used for patient care without independent validation by qualified professionals.
Important limitations:
- Segmentation accuracy: TotalSegmentator and custom models may produce errors. Always review AI-generated contours before clinical use.
- Robustness benchmarks: The 98-99% sensitivity figures cited in documentation are literature benchmarks from Zwanenburg et al. (2019), not performance guarantees for this implementation.
- IBSI alignment: Features are extracted via PyRadiomics with IBSI-informed settings, but full IBSI compliance requires independent validation against digital phantoms.
- Multi-center use: Site-specific validation is essential before deploying across institutions.
Citation¶
If you use RTpipeline for research, please cite the repository and the underlying tools:
@software{rtpipeline,
title = {RTpipeline: Automated Radiotherapy DICOM Processing Pipeline},
author = {Stawiski, Konrad},
url = {https://github.com/kstawiski/rtpipeline},
year = {2025}
}
Additionally, please cite:
- TotalSegmentator: Wasserthal et al., Radiology: Artificial Intelligence (2023)
- PyRadiomics: van Griethuysen et al., Cancer Research (2017)
- IBSI: Zwanenburg et al., Radiology (2020)
License¶
This project is licensed under the MIT License. See LICENSE for details.