ML/AI Use Case

Carbon-Aware ML Training on Kubernetes

Reduce your AI training carbon footprint by up to 30% without changing a single line of code. Intelligent scheduling for deferrable GPU workloads based on real-time grid carbon intensity.

The Hidden Carbon Cost of ML Training

Every day, thousands of ML engineers and data scientists submit training jobs that spin up GPUs immediately, regardless of grid carbon intensity. A single ResNet50 training run can consume ~0.475 kWh of energy. At high-intensity hours (300+ gCO2eq/kWh) versus low-intensity hours (~100 gCO2eq/kWh), the same training job produces 3x the carbon emissions simply because of when it runs.
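The 3x figure is simple arithmetic on the numbers above; a quick sketch:

```python
# Emissions (gCO2eq) = energy (kWh) x grid carbon intensity (gCO2eq/kWh)
ENERGY_KWH = 0.475  # one ResNet50 training run (figure from the text)

def emissions_g(intensity_g_per_kwh: float) -> float:
    """Grams of CO2eq for one training run at a given grid intensity."""
    return ENERGY_KWH * intensity_g_per_kwh

dirty = emissions_g(300)  # high-intensity window
clean = emissions_g(100)  # low-intensity window
print(f"{dirty:.1f} g vs {clean:.1f} g -> {dirty / clean:.0f}x")
```

Same job, same energy draw: only the submission time changes the footprint.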

  • 3x: Carbon variation by time of day
  • 0.475 kWh: Energy per ResNet50 training run
  • 24/7: Most training jobs run immediately

The Solution: Temporal Shifting for Deferrable Workloads

Carbon-Aware Scheduling

Automatically delay deferrable training jobs (batch retraining, experimental iterations, overnight pipelines) until grid carbon intensity is lower, without any code changes.

  • Works with existing Kubernetes clusters
  • Configurable max delay (default 24hrs)
  • Real-time carbon intensity from Electricity Maps

GPU Workload Classification

Accurate power profiling for different GPU workload types (inference, training, rendering) ensures precise carbon and cost estimation for your ML jobs.

  • Hardware-specific power profiles
  • Supports NVIDIA GPUs (A100, H100, RTX series)
  • Accounts for datacenter PUE (power usage effectiveness)
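How those pieces combine into an estimate, as a sketch. The numbers are illustrative, not the scheduler's profiles: ~400 W is roughly an A100's board power, and 1.2 is a hypothetical datacenter PUE:

```python
def estimate_emissions_g(power_w: float, hours: float,
                         pue: float, intensity_g_per_kwh: float) -> float:
    """Energy (kWh) = power x time x PUE; emissions = energy x intensity."""
    energy_kwh = power_w * hours * pue / 1000.0
    return energy_kwh * intensity_g_per_kwh

# 2-hour job on one ~400 W GPU, PUE 1.2, at 165 gCO2eq/kWh
print(round(estimate_emissions_g(400, 2, 1.2, 165), 1))
```

Workload type matters because a GPU rarely draws its full rated power; an accurate per-workload profile replaces the flat 400 W used here.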

Real Results: 29% Carbon Reduction

In a week-long experiment with ResNet50 training on CIFAR-100 (4 jobs per day simulating typical ML workflows), we achieved significant carbon reductions by shifting jobs to cleaner energy windows.

  • 29%: Carbon reduction
  • 231 → 165: Average grid carbon intensity (gCO2eq/kWh)
  • 3.6 hrs: Average scheduling delay
  • 100%: Jobs completed

Key finding: Overnight automation jobs (3-6 AM submissions) saw the largest savings, typically delayed 6-7 hours until morning when solar generation ramped up.

How It Works

1. Install Compute Gardener Scheduler

Deploy our free open-source scheduler to your Kubernetes cluster via Helm. Takes ~3 minutes. Requires an Electricity Maps API key (free tier works).

helm install compute-gardener-scheduler \
  compute-gardener/compute-gardener-scheduler \
  --set carbonAware.electricityMap.apiKey=YOUR_KEY

2. Annotate Your Training Jobs

Add a single line to your Kubernetes Job specs to opt into carbon-aware scheduling.

spec:
  schedulerName: compute-gardener-scheduler
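In a Job (as opposed to a bare Pod), the schedulerName field nests under spec.template.spec. A minimal sketch, with a placeholder image name:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-retrain
spec:
  template:
    spec:
      schedulerName: compute-gardener-scheduler  # the opt-in line
      restartPolicy: Never
      containers:
        - name: train
          image: my-registry/resnet50-train:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1
```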

3. Monitor Carbon & Cost Savings

View Prometheus metrics showing carbon intensity, scheduling delays, estimated emissions, and cost savings. Validate your sustainability impact with real data.
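For example, a query of the kind you might chart in Grafana, sketched in PromQL. The metric name here is hypothetical; substitute the series the scheduler actually exports:

```promql
# Hypothetical metric name for illustration only
sum by (namespace) (compute_gardener_estimated_emissions_grams_total)
```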

Who Is This For?

ML Engineers

Running experimental training jobs, hyperparameter tuning, or batch retraining pipelines that can tolerate delays.

Perfect for: Non-critical training runs, overnight automation

Research Teams

Academic and industrial research labs with compute-intensive workloads and sustainability reporting requirements.

Perfect for: Grant compliance, carbon accounting

Platform Teams

Infrastructure teams managing multi-tenant ML platforms who want to provide sustainability as a service.

Perfect for: Enterprise ML platforms, shared GPU clusters

Ready to Reduce Your ML Carbon Footprint?

Start with our free open-source scheduler or get expert guidance from our team.

Get ML Sustainability Insights

Subscribe for case studies, benchmarks, and best practices for carbon-aware AI training.