RTpipeline v2.1.0 Architecture¶
RTpipeline is a radiotherapy ETL pipeline that turns raw DICOM exports into research-ready tables, derived RTSTRUCTs, QC artifacts, and robustness summaries. In v2.1.0, the architecture centers on three design choices:
- Course-first orchestration: organize DICOM into patient/course units, then run every downstream stage on those units.
- Dual-environment execution: keep TotalSegmentator and the rest of the pipeline on a modern NumPy 2.x stack while routing PyRadiomics and robustness analysis through a compatible NumPy 1.26 environment.
- Robustness as a first-class stage: radiomics stability screening is integrated into the main workflow, not bolted on afterward.
High-Level Flow¶
graph LR
A[Raw DICOM export] --> B[Organize and hydrate courses]
B --> C[Segmentation and RTSTRUCT synthesis]
B --> D[CT cropping]
C --> E[DVH and metadata]
C --> F[Radiomics extraction]
D --> E
D --> F
F --> G[Robustness perturbations]
E --> H[_RESULTS tables]
F --> H
G --> H
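The same dependency structure can be written down as data and ordered automatically. The following is a minimal sketch, not the pipeline's actual scheduler: the stage names are shorthand invented here for the boxes in the diagram, and only the edges are taken from it.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# Stage names are illustrative shorthand; the edges mirror the diagram.
STAGE_DEPENDENCIES = {
    "organize": {"raw_dicom"},
    "segmentation": {"organize"},
    "ct_cropping": {"organize"},
    "dvh_metadata": {"segmentation", "ct_cropping"},
    "radiomics": {"segmentation", "ct_cropping"},
    "robustness": {"radiomics"},
    "results": {"dvh_metadata", "radiomics", "robustness"},
}


def execution_order(dependencies):
    """Return one valid run order in which every stage follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STAGE_DEPENDENCIES)
print(order)
```

Segmentation and CT cropping have no edge between them, so any order that places both after organization is valid; this is what lets them run independently per course.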
Core Runtime Components¶
| Layer | Primary modules | What it does | Main outputs |
|---|---|---|---|
| Organization | rtpipeline.cli, rtpipeline.organize | Groups scattered DICOM into coherent patient/course folders and hydrates existing manifests | Course directories under output_dir/{patient}/{course} |
| Segmentation | rtpipeline.segmentation, rtpipeline.auto_rtstruct, rtpipeline.custom_models | Runs TotalSegmentator and optional custom nnU-Net models, then emits standardized RTSTRUCTs | RS_auto.dcm, custom-model RTSTRUCTs, segmentation NIfTIs |
| CT standardization | rtpipeline.anatomical_cropping | Applies anatomy-aware FOV normalization when enabled | Cropped CT/RTSTRUCT variants |
| Dose/QC/metadata | rtpipeline.dvh, rtpipeline.quality_control, rtpipeline.metadata | Computes DVH metrics, QC reports, and case-level metadata | dvh_metrics.xlsx, qc_reports.xlsx, case_metadata.xlsx |
| Radiomics | rtpipeline.radiomics, rtpipeline.radiomics_conda, rtpipeline.radiomics_parallel | Extracts IBSI-oriented CT/MR radiomics with process isolation and thread caps | radiomics_ct.xlsx, radiomics_mr.xlsx |
| Robustness | rtpipeline.radiomics_robustness, CLI subcommands radiomics-robustness and radiomics-robustness-aggregate | Runs perturbation-based feature stability analysis and cohort aggregation | Per-course radiomics_robustness_ct.parquet, aggregate radiomics_robustness_summary.xlsx |
Dual-Environment Design¶
v2.1.0 deliberately separates the pipeline into two conda environments:
| Environment | Defined in | Main purpose | Key packages |
|---|---|---|---|
| rtpipeline | envs/rtpipeline.yaml | Organization, segmentation, DVH, QC, orchestration | Python 3.11, NumPy 2.x, TotalSegmentator 2.12.0, PyTorch 2.3 |
| rtpipeline-radiomics | envs/rtpipeline-radiomics.yaml | Radiomics extraction and robustness statistics | Python 3.10, NumPy 1.26, PyRadiomics 3.0.1, Pingouin, PyArrow |
Why this exists¶
TotalSegmentator and the modern imaging toolchain work best on newer NumPy and Python versions, while PyRadiomics remains pinned to an older compatibility window. Rather than forcing a single compromise environment, RTpipeline detects when radiomics work must be delegated and launches it in rtpipeline-radiomics.
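A rough sketch of what such delegation could look like is shown below. It assumes the decision is made from the running NumPy major version and that the work is handed off with `conda run`; the module name `rtpipeline.radiomics_worker` and the `--course` flag are hypothetical placeholders, not the pipeline's real entry point.

```python
import shlex

RADIOMICS_ENV = "rtpipeline-radiomics"  # environment name from the table above


def needs_radiomics_env(numpy_version: str) -> bool:
    """True when the running NumPy is 2.x or newer (assumed delegation rule)."""
    major = int(numpy_version.split(".")[0])
    return major >= 2


def radiomics_command(numpy_version: str, module_args: list) -> list:
    """Build the argv for a radiomics task, wrapped in `conda run` when needed."""
    # Hypothetical worker module; the real entry point may differ.
    base = ["python", "-m", "rtpipeline.radiomics_worker", *module_args]
    if needs_radiomics_env(numpy_version):
        return ["conda", "run", "-n", RADIOMICS_ENV, *base]
    return base


cmd = radiomics_command("2.1.0", ["--course", "patient01/course_A"])
print(shlex.join(cmd))
```

Building the command as a list rather than a shell string avoids quoting problems when course paths contain spaces.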
What this means operationally¶
- Docker builds both environments ahead of time in the image, so users do not need to manage them manually.
- Local/conda runs should preserve both YAMLs for reproducibility if radiomics or robustness are enabled.
- Radiomics and robustness inherit their own thread limits to avoid BLAS/OpenMP oversubscription.
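Thread caps of this kind are usually applied through environment variables that BLAS and OpenMP runtimes read at startup. The sketch below assumes that mechanism; the helper name `thread_capped_env` is illustrative, and the exact set of variables the pipeline sets is not stated here.

```python
import os

# Variables commonly honoured by OpenMP, OpenBLAS, MKL and NumExpr.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def thread_capped_env(thread_limit, base_env=None):
    """Return a copy of the environment with per-process thread caps applied.

    A thread_limit of None leaves the environment unchanged (no cap).
    """
    env = dict(os.environ if base_env is None else base_env)
    if thread_limit is not None:
        for name in THREAD_ENV_VARS:
            env[name] = str(thread_limit)
    return env


env = thread_capped_env(2, {"PATH": "/usr/bin"})
print(env["OMP_NUM_THREADS"])
```

The resulting mapping can be passed as `env=` to `subprocess.run`, so each worker process starts with its own cap instead of inheriting the parent's full core count.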
Robustness Module Architecture¶
The robustness module is a standard stage in the pipeline rather than a side workflow.
Perturbation model¶
RTpipeline implements a configurable NTCV-style perturbation chain:
- N: Gaussian noise injection
- T: rigid translations
- C: contour randomization
- V: volume adaptation via erosion/dilation
The shipped container profile keeps a conservative default with volume perturbations enabled and the other axes disabled unless explicitly configured. For manuscript-grade robustness studies, noise_levels, max_translation_mm, and n_random_contour_realizations should be set in config.yaml.
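To see how enabling each axis grows the number of perturbed extractions, the chain can be enumerated as a Cartesian product. This is a simplified sketch: `noise_levels`, `max_translation_mm`, and `n_random_contour_realizations` are the config keys named above, while `volume_changes`, its default values, and the reduction of a 3D shift to a single scalar offset are assumptions made purely for illustration.

```python
from itertools import product


def perturbation_plan(noise_levels=(), max_translation_mm=0.0,
                      n_random_contour_realizations=0,
                      volume_changes=(-0.15, 0.15)):
    """Enumerate every N x T x C x V combination, identity included on each axis."""
    noise_axis = [0.0, *noise_levels]
    # Simplification: one scalar offset stands in for a full 3D translation.
    if max_translation_mm == 0:
        shift_axis = [0.0]
    else:
        shift_axis = [0.0, -max_translation_mm, max_translation_mm]
    # None means "use the original contour"; integers act as random seeds.
    contour_axis = [None, *range(n_random_contour_realizations)]
    volume_axis = [0.0, *volume_changes]  # hypothetical fractional volume change
    return [
        {"noise": n, "shift_mm": t, "contour_seed": c, "volume_change": v}
        for n, t, c, v in product(noise_axis, shift_axis, contour_axis, volume_axis)
    ]


conservative = perturbation_plan()
full = perturbation_plan(noise_levels=[10.0, 20.0], max_translation_mm=3.0,
                         n_random_contour_realizations=2)
print(len(conservative), len(full))
```

Under these assumptions a volume-only profile yields 3 extractions per structure, while enabling all four axes with two or three levels each yields 81, which is why the full chain is reserved for dedicated robustness studies.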
Outputs¶
- Per course: radiomics_robustness_ct.parquet
- Cohort aggregate: _RESULTS/radiomics_robustness_summary.xlsx
- Typical aggregate sheets: global_summary, robust_features, acceptable_features, and source-aware breakdowns when multiple segmentation sources are present
Failure model¶
Robustness is designed to be informative without making the entire ETL brittle:
- per-course sentinel files record success/failure
- aggregation skips missing or failed course-level outputs
- thread and worker limits mirror the same defensive scheduling used for radiomics
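The skip-on-failure behaviour can be illustrated with a small collector that only accepts a course when both its sentinel and its output exist. This is a sketch under assumptions: the sentinel filename `.radiomics_robustness.ok` is hypothetical, and the directory layout follows the output_dir/{patient}/{course} convention described earlier.

```python
import tempfile
from pathlib import Path

SENTINEL_OK = ".radiomics_robustness.ok"  # hypothetical sentinel filename
OUTPUT_NAME = "radiomics_robustness_ct.parquet"


def collect_course_outputs(output_dir):
    """Split course directories into usable outputs and skipped courses."""
    usable, skipped = [], []
    for course_dir in sorted(Path(output_dir).glob("*/*")):
        # Ignore files and aggregate folders such as _RESULTS.
        if not course_dir.is_dir() or course_dir.parent.name.startswith("_"):
            continue
        result = course_dir / OUTPUT_NAME
        if (course_dir / SENTINEL_OK).exists() and result.exists():
            usable.append(result)
        else:
            skipped.append(course_dir)
    return usable, skipped


# Demo: one successful course, one course with no outputs at all.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "P001", "course_1")
    good.mkdir(parents=True)
    (good / OUTPUT_NAME).touch()
    (good / SENTINEL_OK).touch()
    Path(tmp, "P002", "course_1").mkdir(parents=True)
    usable, skipped = collect_course_outputs(tmp)
    demo_counts = (len(usable), len(skipped))

print(demo_counts)
```

Returning the skipped courses alongside the usable ones keeps the aggregate honest: the cohort summary can report how many courses were left out rather than silently shrinking.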
Orchestration and Scheduling¶
RTpipeline can be launched directly through rtpipeline CLI commands or via Snakemake.
CLI surface¶
- rtpipeline doctor: environment and GPU sanity checks
- rtpipeline validate: config and environment validation
- rtpipeline radiomics-robustness: course-level robustness extraction
- rtpipeline radiomics-robustness-aggregate: cohort-level aggregation
Scheduling model¶
- Inter-course parallelism: adaptive worker pools fan out independent patient/course tasks
- Segmentation: typically serialized or lightly parallelized on GPU hosts to avoid memory contention
- Radiomics/robustness: process-isolated workloads with explicit thread caps
- Memory pressure handling: worker pools can back off automatically instead of hard-failing immediately
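The back-off idea can be sketched as a loop that reruns only the failed tasks with a smaller pool. This is an illustrative reduction, not the code in utils.py: it reacts only to `MemoryError`, halves the pool on each failure round, and the function name `run_with_backoff` is made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_backoff(func, items, max_workers=4, min_workers=1):
    """Run func over hashable items, halving the pool after memory failures."""
    pending = list(items)
    results = {}
    workers = max_workers
    while pending:
        failed = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {item: pool.submit(func, item) for item in pending}
            for item, future in futures.items():
                try:
                    results[item] = future.result()
                except MemoryError:
                    failed.append(item)  # retry later with fewer workers
        if failed and workers == min_workers:
            raise MemoryError(
                f"{len(failed)} task(s) still failing at {min_workers} worker(s)"
            )
        if failed:
            workers = max(min_workers, workers // 2)
        pending = failed
    return results, workers


# Demo: task 3 fails once with a simulated memory error, then succeeds.
attempts = {}


def flaky_square(n):
    attempts[n] = attempts.get(n, 0) + 1
    if n == 3 and attempts[n] == 1:
        raise MemoryError("simulated out of memory")
    return n * n


results, final_workers = run_with_backoff(flaky_square, [1, 2, 3, 4], max_workers=4)
print(results, final_workers)
```

Only failures at the minimum pool size propagate, so a transient spike costs one retry round instead of aborting the whole cohort.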
Versioned Artifacts¶
The architecture described here corresponds to RTpipeline 2.1.0, with the version declared in:
- pyproject.toml
- rtpipeline/__init__.py
- Dockerfile image label
If you publish results, cite the exact Docker tag or git commit alongside the configuration file used for the run.
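Because the version is declared in more than one place, a quick consistency check before tagging a release can catch drift. The sketch below assumes the conventional `version = "..."` line in pyproject.toml and a `__version__ = "..."` assignment in the package; both patterns are assumptions about file layout, and the Dockerfile label is left out because its format is not described here.

```python
import re


def declared_versions(pyproject_text, init_text):
    """Extract version strings from the two files, keyed by filename."""
    found = {}
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    if match:
        found["pyproject.toml"] = match.group(1)
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', init_text, re.MULTILINE)
    if match:
        found["rtpipeline/__init__.py"] = match.group(1)
    return found


def versions_agree(found):
    """True when both declarations were found and carry the same version."""
    return len(found) == 2 and len(set(found.values())) == 1


# Sample file contents for illustration only.
sample_pyproject = '[project]\nname = "rtpipeline"\nversion = "2.1.0"\n'
sample_init = '"""RTpipeline package."""\n__version__ = "2.1.0"\n'
found = declared_versions(sample_pyproject, sample_init)
print(found, versions_agree(found))
```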
Adaptive Worker Progress Logging¶
Function: _log_progress() (utils.py, lines 306-320)
Format: "Label: X/Y (Z%) elapsed As ETA Bs"
Updates: Per task completion (if show_progress=True)
Example: "Segmentation: 3/10 (30%) elapsed 12.5s ETA 29.2s"
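A formatter that reproduces this line could look as follows. It is a sketch rather than the body of `_log_progress()`: the linear ETA estimate (average time per finished task multiplied by the tasks remaining) is an assumption, chosen because it matches the example values above, and `total` is assumed to be greater than zero.

```python
def format_progress(label, done, total, elapsed_s):
    """Format a progress line with a linear ETA extrapolation."""
    percent = 100.0 * done / total
    # Average time per completed task times the number of tasks left.
    eta_s = elapsed_s / done * (total - done) if done else float("nan")
    return (
        f"{label}: {done}/{total} ({percent:.0f}%) "
        f"elapsed {elapsed_s:.1f}s ETA {eta_s:.1f}s"
    )


line = format_progress("Segmentation", 3, 10, 12.5)
print(line)
```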
Memory Pressure Detection¶
Pattern Matching: (utils.py, lines 286-303)
_MEMORY_PATTERNS = (
"out of memory",
"cuda out of memory",
"cublas status alloc failed",
"std::bad_alloc",
"cannot allocate memory",
"failed to allocate",
"not enough memory",
"mmap failed",
"oom",
)
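One plausible way to use such a tuple is a case-insensitive substring check on the exception message. The helper below is a sketch under that assumption; the name `looks_like_memory_error` is illustrative and the real matching logic in utils.py may differ.

```python
_MEMORY_PATTERNS = (
    "out of memory",
    "cuda out of memory",
    "cublas status alloc failed",
    "std::bad_alloc",
    "cannot allocate memory",
    "failed to allocate",
    "not enough memory",
    "mmap failed",
    "oom",
)


def looks_like_memory_error(exc):
    """Classify an exception as memory pressure by type or by message text."""
    if isinstance(exc, MemoryError):
        return True
    message = str(exc).lower()
    return any(pattern in message for pattern in _MEMORY_PATTERNS)


cuda_failure = RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")
other_failure = ValueError("missing DICOM tag")
print(looks_like_memory_error(cuda_failure), looks_like_memory_error(other_failure))
```

With plain substring matching, the short "oom" pattern also matches inside unrelated words such as "zoom", so a word-boundary check may be worth considering if false positives appear in practice.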
Key Configuration Defaults¶
Absolute Defaults (Hardcoded)¶
workers: min(--cores, CPU count) - 1 (auto)
segmentation_workers: auto (GPU sequential, CPU inherits workers) # override via config
segmentation_thread_limit: None (no limit per worker)
radiomics_thread_limit: 4 (if config specifies)
radiomics max_workers: _calculate_optimal_workers()
custom_models_workers: 1 (GPU constraint)
totalseg_device: "gpu"
totalseg_force_split: True
totalseg_nr_thr_resamp: 1
totalseg_nr_thr_saving: 1
totalseg_num_proc_pre: 1
totalseg_num_proc_export: 1
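The automatic `workers` default can be read as "leave one core free". A minimal sketch of that rule follows; the floor of one worker and the `cpu_count` override parameter are assumptions added so the function behaves sensibly on small machines and is easy to test.

```python
import os


def default_workers(cores=None, cpu_count=None):
    """Compute min(--cores, CPU count) - 1, never dropping below one worker."""
    available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    budget = available if cores is None else min(cores, available)
    return max(1, budget - 1)  # the floor of 1 is an assumption


print(default_workers(cores=8, cpu_count=16))
```

For example, requesting 8 cores on a 16-core host gives 7 workers, while requesting 32 cores on the same host is clamped to the hardware and gives 15.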
Conda Dependencies (GPU-enabled)¶
File: envs/rtpipeline.yaml
- pytorch=2.3.*
- pytorch-cuda=12.1 # CUDA 12.1 support
- torchvision=0.18.*
- torchaudio=2.3.*
- TotalSegmentator>=2.4.0
Current Optimization Opportunities¶
Identified in Codebase¶
- Thread Limit Per Worker: Can be set but defaults to unlimited
- GPU Memory Optimization: force_split already enabled
- Process Pool Caching: Weights pre-loading enabled
- Sequential Fallback: Available via --sequential-radiomics flag
Configuration Examples¶
Maximum GPU Utilization:
rtpipeline \
--dicom-root data/ \
--outdir output/ \
--seg-workers 4 \
--max-workers 8 \
--totalseg-device gpu
Memory-Constrained System:
rtpipeline \
--dicom-root data/ \
--outdir output/ \
--max-workers 2 \
--seg-workers 1 \
--seg-proc-threads 4 \
--radiomics-proc-threads 2 \
--sequential-radiomics
CPU-Only Mode:
rtpipeline \
--dicom-root data/ \
--outdir output/ \
--totalseg-device cpu \
--max-workers 8 \
--seg-workers 2
Summary Table¶
| Component | Type | Location | Key Parameters |
|---|---|---|---|
| Main Orchestrator | CLI | cli.py | --max-workers, --seg-workers |
| Inter-course Parallelism | ThreadPoolExecutor | utils.py:323 | max_workers, min_workers |
| Radiomics Parallelism | ProcessPoolExecutor | radiomics_parallel.py:311 | max_workers, thread_limit |
| GPU Segmentation | External (TotalSegmentator) | segmentation.py:243 | --totalseg-device |
| Memory Adaptation | Dynamic Scaling | utils.py:363-460 | Memory error detection |
| Thread Management | Environment Variables | Across modules | OMP_NUM_THREADS, etc. |
| Configuration | YAML + CLI | config.yaml, cli.py | All parameters |
| Container Orchestration | Docker Compose | docker-compose.yml | GPU/CPU profiles |
| Logging | File + Console | Logs/rtpipeline.log | Progress tracking |
File Reference Summary¶
Core Pipeline Files¶
- config.py (89 lines) - Configuration dataclass
- cli.py (924 lines) - Command-line interface & orchestration
- utils.py (466 lines) - Adaptive worker pool implementation
- segmentation.py (836 lines) - TotalSegmentator integration
- radiomics_parallel.py (590 lines) - Process-based radiomics
- radiomics.py (first 150+ lines) - Feature extraction
- custom_models.py (100+ lines) - Custom model execution
Configuration Files¶
- config.yaml - Runtime configuration
- docker-compose.yml - Container orchestration
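- envs/rtpipeline.yaml - Conda environment definition (main pipeline)
- envs/rtpipeline-radiomics.yaml - Conda environment definition (radiomics and robustness)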
- custom_structures_*.yaml - Structure definitions