Carbon-Aware ML Training on Kubernetes
Reduce your AI training carbon footprint by up to 30% without changing a single line of code. Intelligent scheduling for deferrable GPU workloads based on real-time grid carbon intensity.
The Hidden Carbon Cost of ML Training
Every day, thousands of ML engineers and data scientists submit training jobs that spin up GPUs immediately, regardless of grid carbon intensity. A single ResNet50 training run can consume ~0.475 kWh of energy. Run during a high-intensity window (300+ gCO2eq/kWh) instead of a low-intensity one (~100 gCO2eq/kWh), the same training job produces roughly three times the carbon emissions, simply because of when it runs.
- ~3x carbon variation by time of day
- ~0.475 kWh energy per ResNet50 training run
- Most training jobs run immediately
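The 3x figure is simple arithmetic: emissions are energy consumed multiplied by the grid's carbon intensity at run time. A quick check using the numbers above:

```python
# Emissions (gCO2eq) = energy consumed (kWh) x grid carbon intensity (gCO2eq/kWh).
ENERGY_KWH = 0.475          # one ResNet50 training run (figure from above)

dirty_intensity = 300.0     # gCO2eq/kWh, high-carbon window
clean_intensity = 100.0     # gCO2eq/kWh, low-carbon window

dirty_emissions = ENERGY_KWH * dirty_intensity   # ~142.5 gCO2eq
clean_emissions = ENERGY_KWH * clean_intensity   # ~47.5 gCO2eq

print(f"dirty window: {dirty_emissions:.1f} gCO2eq")
print(f"clean window: {clean_emissions:.1f} gCO2eq")
print(f"ratio: {dirty_emissions / clean_emissions:.1f}x")
```

Same job, same energy, three times the emissions. Time-shifting is the only variable.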
The Solution: Temporal Shifting for Deferrable Workloads
Carbon-Aware Scheduling
Automatically delay deferrable training jobs (batch retraining, experimental iterations, overnight pipelines) until grid carbon intensity is lower, without any code changes.
- Works with existing Kubernetes clusters
- Configurable maximum delay (default: 24 hours)
- Real-time carbon intensity from Electricity Maps
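Conceptually, the deferral decision is a trade-off between grid cleanliness and the maximum delay budget. A minimal sketch of that logic, with the understanding that the scheduler's actual threshold names and defaults are assumptions here (only the 24-hour max delay comes from the feature list above):

```python
from datetime import datetime, timedelta

INTENSITY_THRESHOLD = 200.0      # gCO2eq/kWh (assumed value for illustration)
MAX_DELAY = timedelta(hours=24)  # matches the documented default max delay

def should_defer(current_intensity: float,
                 submitted_at: datetime,
                 now: datetime) -> bool:
    """Defer while the grid is dirty, but never past the max delay."""
    if now - submitted_at >= MAX_DELAY:
        return False  # delay budget exhausted: schedule regardless
    return current_intensity > INTENSITY_THRESHOLD

now = datetime(2024, 6, 1, 3, 0)
# A job submitted an hour ago on a dirty grid keeps waiting:
print(should_defer(310.0, now - timedelta(hours=1), now))   # True
# After 24 hours it runs even if the grid is still dirty:
print(should_defer(310.0, now - timedelta(hours=25), now))  # False
```

The max-delay backstop is what makes this safe for real pipelines: a job is never deferred indefinitely, only within the window you configure.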
GPU Workload Classification
Accurate power profiling for different GPU workload types (inference, training, rendering) ensures precise carbon and cost estimation for your ML jobs.
- Hardware-specific power profiles
- Supports NVIDIA GPUs (A100, H100, RTX series)
- Datacenter PUE consideration
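The estimation itself combines the pieces listed above: a hardware-specific power draw, the job's runtime, the datacenter's PUE, and the grid intensity. A back-of-the-envelope sketch (the wattage figures below are illustrative placeholders, not the scheduler's actual profiles):

```python
# Illustrative per-GPU power draws in watts; real hardware-specific profiles
# (and workload-type adjustments for inference vs. training) will differ.
GPU_POWER_WATTS = {"A100": 400.0, "H100": 700.0}

def estimate_emissions_g(gpu: str, hours: float, pue: float,
                         intensity_g_per_kwh: float) -> float:
    """Estimated gCO2eq = GPU energy (kWh) x datacenter PUE x grid intensity."""
    energy_kwh = GPU_POWER_WATTS[gpu] / 1000.0 * hours
    return energy_kwh * pue * intensity_g_per_kwh

# A 2-hour A100 job at PUE 1.2 on a 150 gCO2eq/kWh grid:
print(round(estimate_emissions_g("A100", 2.0, 1.2, 150.0), 1))  # 144.0
```

Folding in PUE matters: the same job in a datacenter with PUE 1.5 instead of 1.2 carries 25% more estimated emissions before the GPU does any extra work.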
Real Results: 30% Carbon Reduction
In a week-long experiment with ResNet50 training on CIFAR-100 (4 jobs per day simulating typical ML workflows), we achieved significant carbon reductions by shifting jobs to cleaner energy windows.
Key finding: Overnight automation jobs (3-6 AM submissions) saw the largest savings, typically delayed 6-7 hours until morning when solar generation ramped up.
How It Works
Install Compute Gardener Scheduler
Deploy our free open-source scheduler to your Kubernetes cluster via Helm. Takes ~3 minutes. Requires an Electricity Maps API key (free tier works).
helm install compute-gardener-scheduler \
  compute-gardener/compute-gardener-scheduler \
  --set carbonAware.electricityMap.apiKey=YOUR_KEY
Annotate Your Training Jobs
Add a single line to your Kubernetes Job specs to opt into carbon-aware scheduling.
spec:
  schedulerName: compute-gardener-scheduler
Monitor Carbon & Cost Savings
View Prometheus metrics showing carbon intensity, scheduling delays, estimated emissions, and cost savings. Validate your sustainability impact with real data.
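If you record the grid intensity at submission and at actual start (both of which the metrics above expose), the avoided emissions per job are a one-line calculation. A sketch, reusing the ResNet50 figures from earlier:

```python
def avoided_emissions_g(energy_kwh: float,
                        intensity_at_submit: float,
                        intensity_at_start: float) -> float:
    """gCO2eq avoided by running later on a cleaner grid."""
    return energy_kwh * (intensity_at_submit - intensity_at_start)

# The ~0.475 kWh ResNet50 run, shifted from a 300 to a 100 gCO2eq/kWh window:
print(round(avoided_emissions_g(0.475, 300.0, 100.0), 1))  # 95.0
```

Summing this across all deferred jobs gives a defensible, data-backed number for sustainability reporting rather than a vendor estimate.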
Who Is This For?
ML Engineers
Running experimental training jobs, hyperparameter tuning, or batch retraining pipelines that can tolerate delays.
Perfect for: Non-critical training runs, overnight automation
Research Teams
Academic and industrial research labs with compute-intensive workloads and sustainability reporting requirements.
Perfect for: Grant compliance, carbon accounting
Platform Teams
Infrastructure teams managing multi-tenant ML platforms who want to provide sustainability as a service.
Perfect for: Enterprise ML platforms, shared GPU clusters
Ready to Reduce Your ML Carbon Footprint?
Start with our free open-source scheduler or get expert guidance from our team.
Get ML Sustainability Insights
Subscribe for case studies, benchmarks, and best practices for carbon-aware AI training.