# Docker and Singularity Guide for rtpipeline
## Overview
This document provides comprehensive guidance for running rtpipeline in Docker and Singularity containers, including compatibility information for optimization features, hang prevention, web UI support, and the latest radiomics robustness module.
## ✅ Container Features
All pipeline features work correctly in Docker and Singularity containers:
### Latest Updates (v2.1.0)
- Radiomics Robustness Module: Comprehensive feature stability assessment with ICC, CoV, and QCD metrics
- Web UI: Browser-based interface for drag-and-drop DICOM upload and processing
- Enhanced Configuration: Container-specific config includes all recent pipeline features
### Core Features
#### 1. Subprocess Timeouts ✅

- `subprocess.run()` with the `timeout` parameter works in Docker
- SIGTERM/SIGKILL signals are properly handled by the tini init system
- Tested with TotalSegmentator and dcm2niix operations
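As a minimal illustration of the pattern (a sketch, not the pipeline's actual wrapper code), a timed-out command is killed and the timeout surfaces as an error:

```python
import subprocess

try:
    # Run an external tool with a hard wall-clock limit (seconds).
    # On timeout, the child process is killed and TimeoutExpired is raised.
    subprocess.run(["dcm2niix", "-h"], timeout=300, check=True)
except subprocess.TimeoutExpired as exc:
    print(f"ERROR: Command timed out after {exc.timeout}s")
```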
#### 2. Process Spawning ✅

- multiprocessing with the `spawn` context works in containers
- No fork-related issues (spawn is safer than fork in Docker)
- Process pool handling is container-safe
#### 3. CPU Detection ✅

- `os.cpu_count()` correctly detects available CPUs
- Works with Docker `--cpus` limits (when using psutil)
- Respects cgroup CPU quotas in Docker/Kubernetes
#### 4. Signal Handling ✅

- Tini init system properly forwards signals
- Graceful shutdown on SIGTERM from `docker stop`
- Prevents zombie processes from multiprocessing
#### 5. GPU Support ✅

- `CUDA_VISIBLE_DEVICES` and `NVIDIA_VISIBLE_DEVICES` work
- Timeout mechanisms don't interfere with GPU operations
- GPU memory management works as expected
## Docker-Specific Enhancements

### Tini Init System
Added to Dockerfile:
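The relevant lines look like this (the install step may differ by base image; the ENTRYPOINT matches the one shown under Common Issues and Solutions):

```dockerfile
# Install tini and run it as PID 1 so signals are forwarded
# to children and zombie processes are reaped
RUN apt-get update && apt-get install -y --no-install-recommends tini
ENTRYPOINT ["/usr/bin/tini", "--"]
```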
Benefits:

- Properly reaps zombie processes from multiprocessing
- Forwards signals (SIGTERM, SIGINT) to child processes
- Essential for parallel processing in containers
### Psutil for Better CPU Detection
Added to Dockerfile:
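For example (exact version pinning may differ):

```dockerfile
# psutil lets the pipeline see effective CPU limits, not just host CPUs
RUN pip install --no-cache-dir psutil
```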
Benefits:
- Respects Docker CPU limits (`--cpus`)
- Detects actual available CPUs vs. host CPUs
- Works with Kubernetes CPU requests/limits
### Timeout Environment Variables
Added to Dockerfile:
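The defaults match the values shown under Common Issues and Solutions below:

```dockerfile
# Default timeouts in seconds; override at runtime as needed
ENV TOTALSEG_TIMEOUT=3600 \
    DCM2NIIX_TIMEOUT=300 \
    RTPIPELINE_RADIOMICS_TASK_TIMEOUT=600
```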
Benefits:

- Set reasonable defaults for containers
- Can be overridden in docker-compose.yml or at runtime
- Prevents indefinite hangs in containerized environments
## Usage Examples

### Running with Docker

#### Basic Usage (GPU)

```bash
docker run --gpus all \
  -v ./Input:/data/input:ro \
  -v ./Output:/data/output:rw \
  -v ./Logs:/data/logs:rw \
  kstawiski/rtpipeline:latest \
  rtpipeline \
    --dicom-root /data/input \
    --outdir /data/output \
    --logs /data/logs
```
#### Custom Timeouts

```bash
docker run --gpus all \
  -e TOTALSEG_TIMEOUT=7200 \
  -e DCM2NIIX_TIMEOUT=600 \
  -e RTPIPELINE_RADIOMICS_TASK_TIMEOUT=1200 \
  -v ./Input:/data/input:ro \
  -v ./Output:/data/output:rw \
  kstawiski/rtpipeline:latest \
  rtpipeline --dicom-root /data/input --outdir /data/output
```
#### CPU-Limited Container

```bash
docker run \
  --cpus 8 \
  --memory 16g \
  -e TOTALSEG_TIMEOUT=7200 \
  -v ./Input:/data/input:ro \
  -v ./Output:/data/output:rw \
  kstawiski/rtpipeline:latest \
  rtpipeline \
    --dicom-root /data/input \
    --outdir /data/output \
    --totalseg-device cpu \
    --max-workers 7
```
### Running with Docker Compose

#### GPU Mode (default)
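Assuming the repository's docker-compose.yml defines the `rtpipeline` service with a GPU reservation (as in the override example below):

```bash
docker-compose up -d
```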
#### CPU-Only Mode
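One way to force CPU-only processing (a sketch; the compose file may also ship a dedicated CPU service or profile) is to override the command:

```bash
docker-compose run --rm rtpipeline \
  rtpipeline --dicom-root /data/input --outdir /data/output \
    --totalseg-device cpu
```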
Custom Configuration¶
# docker-compose.override.yml
version: '3.8'
services:
rtpipeline:
environment:
- TOTALSEG_TIMEOUT=7200 # 2 hours
- RTPIPELINE_RADIOMICS_TASK_TIMEOUT=1800 # 30 min per ROI
deploy:
resources:
limits:
cpus: '12.0'
memory: 48G
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
## Testing Docker Compatibility
We provide a comprehensive test script to verify all features work in your container:
```bash
# Run inside container
docker run --rm -it kstawiski/rtpipeline:latest python docker_test.py

# Or with docker-compose
docker-compose run --rm rtpipeline python docker_test.py
```
Tests performed:

- ✅ CPU detection (`os.cpu_count()`, cgroup limits, psutil)
- ✅ Subprocess timeouts (`subprocess.run` with timeout)
- ✅ Signal handling (SIGTERM, SIGINT)
- ✅ Multiprocessing (spawn context with Pool)
- ✅ Environment variables (timeout config)
- ✅ GPU detection (CUDA availability)
## CPU Detection in Containers

### How It Works
Without Docker limits (for example, on a 16-CPU host):
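```python
import os

os.cpu_count()  # Returns 16 (the full host CPU count)
```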
With a Docker CPU limit (e.g. `--cpus 8` or `--cpuset-cpus 0-7`):

```python
import os
import psutil

os.cpu_count()  # Still returns 16 (the host count)

# psutil reports the CPUs the process may actually run on; this
# reflects cpuset pinning (--cpuset-cpus). Quota-style limits
# (--cpus) are visible in the cgroup files described below.
psutil.Process().cpu_affinity()  # [0, 1, 2, 3, 4, 5, 6, 7] (8 CPUs)
```
Recommendation: when using Docker CPU limits, set the worker count explicitly with `--max-workers` (the same value can also be supplied through an environment variable if your deployment defines one):
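```bash
docker run --cpus 8 \
  kstawiski/rtpipeline:latest \
  rtpipeline --dicom-root /data/input --outdir /data/output \
    --max-workers 7
```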
### Cgroup CPU Limits
The pipeline respects cgroup CPU quotas:
- cgroups v1: `/sys/fs/cgroup/cpu/cpu.cfs_quota_us`
- cgroups v2: `/sys/fs/cgroup/cpu.max`
To see your container's CPU limit:

```bash
# Inside container (cgroups v1)
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
# Actual CPUs = quota / period
```
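For instance, a one-liner to compute the effective CPU count under cgroups v1 (only meaningful when a quota is set; the quota reads -1 otherwise):

```bash
echo $(( $(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us) / $(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us) ))
# e.g. 800000 / 100000 = 8
```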
## Signal Handling and Graceful Shutdown

### With Tini (Recommended)
```bash
# Graceful stop (10 second timeout)
docker stop rtpipeline

# Tini forwards SIGTERM to the pipeline,
# which finishes its current task and exits cleanly
```
### Without Tini (Not Recommended)

```bash
# Without tini, signals may not propagate properly
# and zombie processes can be left behind;
# you may need a forceful `docker kill` (which sends SIGKILL)
```
Always use tini - it's included in our Dockerfile!
## Kubernetes Compatibility
The pipeline works in Kubernetes with proper resource limits:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rtpipeline
spec:
  containers:
    - name: rtpipeline
      image: kstawiski/rtpipeline:latest
      command: ["rtpipeline"]
      args:
        - "--dicom-root"
        - "/data/input"
        - "--outdir"
        - "/data/output"
        - "--max-workers"
        - "7"   # Set based on CPU limits
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "8"
          memory: "32Gi"
          nvidia.com/gpu: "1"
      env:
        - name: TOTALSEG_TIMEOUT
          value: "3600"
        - name: CUDA_VISIBLE_DEVICES
          value: "0"
```
## Common Issues and Solutions
### Issue: Container uses all host CPUs despite `--cpus` limit

Solution: manually set `--max-workers` to respect the limit:
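```bash
# With --cpus 8, leave one CPU for overhead
docker run --cpus 8 kstawiski/rtpipeline:latest \
  rtpipeline --dicom-root /data/input --outdir /data/output --max-workers 7
```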
### Issue: Pipeline doesn't shut down gracefully

Solution: ensure tini is being used:

```bash
# Check ENTRYPOINT
docker inspect kstawiski/rtpipeline:latest | grep -A1 Entrypoint
# Should show: ["/usr/bin/tini", "--"]
```
### Issue: Timeouts not working in container

Solution: check that the timeout environment variables are set:

```bash
docker run --rm kstawiski/rtpipeline:latest env | grep TIMEOUT
# Should show:
# TOTALSEG_TIMEOUT=3600
# DCM2NIIX_TIMEOUT=300
# RTPIPELINE_RADIOMICS_TASK_TIMEOUT=600
```
### Issue: GPU not detected in container

Solution: check the NVIDIA runtime and environment variables:

```bash
# Verify the GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check the environment in our container
docker run --rm --gpus all kstawiski/rtpipeline:latest \
  bash -c "nvidia-smi && python -c 'import torch; print(torch.cuda.is_available())'"
```
### Issue: Out of shared memory errors

Solution: increase the shared memory size:
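```bash
# Docker's default /dev/shm is only 64 MB; give GPU workloads more
docker run --gpus all --shm-size=8g \
  kstawiski/rtpipeline:latest \
  rtpipeline --dicom-root /data/input --outdir /data/output
```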
## Performance Tuning for Docker

### Memory-Constrained Containers

```bash
docker run \
  --memory 16g \
  --memory-swap 16g \
  kstawiski/rtpipeline:latest \
  rtpipeline \
    --max-workers 4 \
    --seg-workers 1 \
    --sequential-radiomics
```
### CPU-Heavy Workload

```bash
docker run \
  --cpus 32 \
  kstawiski/rtpipeline:latest \
  rtpipeline \
    --max-workers 31 \
    --totalseg-device cpu \
    --seg-workers 2
```
### GPU-Optimized

```bash
docker run \
  --gpus all \
  --shm-size=8g \
  kstawiski/rtpipeline:latest \
  rtpipeline \
    --max-workers 15 \
    --seg-workers 1 \
    --totalseg-device gpu \
    --totalseg-force-split
```
## Monitoring

### Watch Pipeline Progress

```bash
# Stream logs
docker logs -f rtpipeline

# Watch for heartbeat messages
docker logs -f rtpipeline 2>&1 | grep "Still processing"

# Monitor resource usage
docker stats rtpipeline
```
### Check for Timeouts

```bash
# Look for timeout errors
docker logs rtpipeline 2>&1 | grep -i timeout

# Should see entries like:
# "ERROR: Command timed out after 3600s"
# "ERROR: Segmentation: task #5 timed out after 7200s"
```
## Best Practices

- Always use tini - included in our Dockerfile ✅
- Set explicit workers when using a `--cpus` limit
- Increase timeouts for large datasets or slow systems
- Monitor logs for heartbeat messages and timeouts
- Use `--shm-size` for GPU workloads (4-8 GB)
- Cache TotalSegmentator weights with a volume mount
- Use restart policies for production, for example:
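```yaml
# docker-compose.yml (excerpt; assumes the service name used in this guide)
services:
  rtpipeline:
    restart: unless-stopped
```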
## Singularity Support
rtpipeline fully supports Singularity for HPC and secure computing environments.
### Building Singularity Containers

#### Option 1: From Docker Hub (Recommended)
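```bash
singularity pull rtpipeline.sif docker://kstawiski/rtpipeline:latest
```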
#### Option 2: From Local Docker Image

```bash
# First build the Docker image
./build.sh

# Convert to Singularity
singularity build rtpipeline.sif docker-daemon://kstawiski/rtpipeline:latest
```
#### Option 3: From Definition File (Advanced)

```bash
# Build from rtpipeline.def
# Note: Requires repository files in build context
singularity build --fakeroot rtpipeline.sif rtpipeline.def
```
### Running with Singularity

#### Interactive Shell

```bash
singularity shell --nv \
  --bind /path/to/input:/data/input:ro \
  --bind /path/to/output:/data/output:rw \
  --bind /path/to/logs:/data/logs:rw \
  rtpipeline.sif
```
#### Execute Pipeline

```bash
singularity exec --nv \
  --bind /path/to/input:/data/input:ro \
  --bind /path/to/output:/data/output:rw \
  --bind /path/to/logs:/data/logs:rw \
  rtpipeline.sif \
  snakemake --cores all --use-conda --configfile /app/config.container.yaml
```
#### Web UI Mode

```bash
singularity run --nv \
  --bind /path/to/uploads:/data/uploads:rw \
  --bind /path/to/input:/data/input:rw \
  --bind /path/to/output:/data/output:rw \
  --bind /path/to/logs:/data/logs:rw \
  rtpipeline.sif

# Access at http://localhost:8080
```
### HPC/SLURM Integration
Example Job Script:
```bash
#!/bin/bash
#SBATCH --job-name=rtpipeline
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=48:00:00
#SBATCH --gres=gpu:1

module load singularity

# Set paths
INPUT_DIR=/scratch/$USER/dicom_input
OUTPUT_DIR=/scratch/$USER/rtpipeline_output
LOGS_DIR=/scratch/$USER/rtpipeline_logs

# Create output directories
mkdir -p $OUTPUT_DIR $LOGS_DIR

# Run pipeline
singularity exec --nv \
  --bind ${INPUT_DIR}:/data/input:ro \
  --bind ${OUTPUT_DIR}:/data/output:rw \
  --bind ${LOGS_DIR}:/data/logs:rw \
  rtpipeline.sif \
  snakemake --cores $SLURM_CPUS_PER_TASK --use-conda --configfile /app/config.container.yaml

echo "Pipeline completed at $(date)"
```
With Custom Configuration:
```bash
#!/bin/bash
#SBATCH --job-name=rtpipeline-robustness
#SBATCH --cpus-per-task=32
#SBATCH --mem=128G
#SBATCH --time=72:00:00

module load singularity

# Custom config with radiomics robustness enabled
cat > /tmp/config.custom.yaml << 'EOF'
dicom_root: "/data/input"
output_dir: "/data/output"
logs_dir: "/data/logs"
workers: 30

radiomics_robustness:
  enabled: true
  modes:
    - segmentation_perturbation
  segmentation_perturbation:
    apply_to_structures:
      - "GTV*"
      - "CTV*"
      - "PTV*"
    intensity: "aggressive"
EOF

singularity exec \
  --bind /tmp/config.custom.yaml:/tmp/config.custom.yaml:ro \
  --bind ${INPUT_DIR}:/data/input:ro \
  --bind ${OUTPUT_DIR}:/data/output:rw \
  rtpipeline.sif \
  snakemake --cores 30 --use-conda --configfile /tmp/config.custom.yaml
```
### Singularity-Specific Notes

- GPU Access: use the `--nv` flag for NVIDIA GPU support
- Writable Overlays: useful for caching TotalSegmentator weights (see the sketch below)
- Environment Variables: pass via `--env` or the `SINGULARITYENV_` prefix (see the sketch below)
- Conda Environments: all environments are pre-built in the container
- Web UI: requires binding the appropriate ports and directories
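A minimal sketch of both (bind mounts omitted for brevity; flag availability depends on your Singularity/Apptainer version):

```bash
# Writable overlay for caching model weights between runs
# (overlay create requires a recent Singularity/Apptainer release)
singularity overlay create --size 2048 weights_overlay.img
singularity exec --nv --overlay weights_overlay.img rtpipeline.sif \
  rtpipeline --dicom-root /data/input --outdir /data/output

# Environment variables: either via --env ...
singularity exec --nv --env TOTALSEG_TIMEOUT=7200 rtpipeline.sif \
  rtpipeline --dicom-root /data/input --outdir /data/output

# ... or via the SINGULARITYENV_ prefix on the host
export SINGULARITYENV_TOTALSEG_TIMEOUT=7200
singularity exec --nv rtpipeline.sif \
  rtpipeline --dicom-root /data/input --outdir /data/output
```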
## Summary
✅ All optimization and hang prevention features work correctly in Docker and Singularity
The pipeline has been enhanced with:

- Radiomics Robustness Module: ICC-based feature stability assessment (latest update)
- Web UI: browser-based upload and processing interface
- Tini init system for proper signal handling
- Timeout environment variables with sensible defaults
- Psutil for accurate CPU detection
- Singularity support for HPC environments
Container-specific features:

- Respects container CPU/memory limits
- Proper signal forwarding for graceful shutdown
- Zombie process reaping for parallel operations
- GPU support (NVIDIA Docker runtime / Singularity `--nv`)
- Pre-built conda environments
- Container-optimized configuration
Recommended deployment:
Docker (Development/Production):
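For example, with the provided compose file (restart policies and resource limits as described above):

```bash
docker-compose up -d
docker-compose logs -f rtpipeline
```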
Singularity (HPC/Secure Environments):

```bash
# Pull from Docker Hub
singularity pull rtpipeline.sif docker://kstawiski/rtpipeline:latest

# Run pipeline
singularity exec --nv \
  --bind /data:/data \
  rtpipeline.sif \
  snakemake --cores all --use-conda --configfile /app/config.container.yaml
```
For production, see docker-compose.yml configuration with restart policies, resource limits, and health checks.