Snapshot and Migrate Containers Across Instances

Cedana brings live migration to your containerized, stateful CPU and GPU workloads.

Backed by
Supports all your cloud native tools, containers, and CPU and GPU workloads
NVIDIA · AMD · Intel · Docker · Kubernetes · Helm · Terraform · Kata Containers · Podman

Why Cedana?

Cedana (/ce'dana/) is a save/migrate/resume system for compute. We leverage deep insight into the Linux kernel (through CRIU and other methods) to checkpoint and restore workloads across instances and vendors.

Reduce compute costs by 20%-80%

Eliminate idle compute. Cedana automatically suspends and resumes your workloads based on activity, and automatically bin-packs containers across instances, freeing up resources at fine-grained resolution.

Never lose work — even if hardware fails

Upon hardware or OOM failure, automatically resume workload on a new instance without losing work.


3x your performance

Accelerate cold start and time to first token by resuming your CPU/GPU workload from its previous state. Eliminate boot time, initialization, and other startup steps.

"We reduced our cloud cost by 50% by integrating Cedana's Save, Migrate, and Resume capability into our product. If an instance fails, we can continue workloads without losing work, increasing reliability."
Debo Ray
CEO, DevZero

Use Cases

01

Reduce Cloud Cost by 20%-80%

Cedana automatically suspends and resumes your workloads based on idle or active status, and automatically bin-packs containers across instances, freeing up resources at very fine-grained resolution.

02

Improve Start Times 2x


Improve Performance and SLA Automatically

Save, Migrate, and Resume stateful GPU workloads. Live-migrate workloads dynamically based on user load and resource availability to improve performance and utilization. Policy-based automation enables workload-level SLAs to be enforced.

Speed Time to First Token 2x

Accelerate your inference time-to-first-token by eliminating the cold boot process, including library and model initialization and optimization. Leverage your existing model optimizations. Significantly reduce network bottlenecks.

01

Reduce Compute Cost by 20%-80%

Cedana automatically suspends and resumes your workloads based on idle or active status, and automatically bin-packs containers across instances, freeing up resources at very fine-grained resolution.

02

Increase Throughput 2x

Automation increases throughput and capacity. When a GPU fails, workloads are automatically resumed on a new instance without losing work, eliminating the significant manual overhead of restarting jobs on new instances. GPU utilization is automatically increased by treating the entire pool of available GPUs as a single, logical shared cluster, avoiding resource fragmentation and static reservation of capacity.

01

Minimize Downtime by 90%

Databases, analytics, webservers and other stateful workloads continue without losing work even through node (instance) failures. Avoid costly over-provisioning to meet SLA requirements.

02

Reduce Cloud Cost by 5x

Long-running jobs can be run on spot instances. When a spot instance is revoked, your job is automatically continued on a new instance without any manual intervention.

Maximize your CPU/GPU Utilization

See Cedana in Action
Request API Access

How It Works

Save a process or container using our API. Cedana captures the complete state of the workload, including process and filesystem state, open network connections, in-memory data (RAM and VRAM), namespaces, and everything in between.

Migrate the workload onto another instance.

Resume the workload as a new process/container on another instance, with real-time performance and no service disruption.

Use Save, Migrate, Resume (SMR) to implement policy-based automation. Cedana automatically suspends and resumes workloads based on activity, enabling fine-grained bin packing of containers. This saves up to 80% of compute costs.

Easy Integration

Use Cedana REST API to checkpoint your application’s state, transfer it to a new instance, cloud or resource, and resume operations. No code modifications needed.
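The checkpoint → transfer → resume flow described above could be sketched like this. This is a hypothetical illustration: the base URL, endpoint paths, and request fields below are assumptions for the sake of example, not Cedana's documented API.

```python
import json

# Hypothetical sketch of a Save / Migrate / Resume (SMR) flow over a REST
# API. All endpoint paths, field names, and IDs are illustrative assumptions.

API = "https://api.example.com/v1"  # placeholder base URL


def save_request(container_id: str) -> dict:
    # Checkpoint ("save") a running container, including GPU state.
    return {
        "method": "POST",
        "url": f"{API}/checkpoint",
        "body": {"container_id": container_id, "include_gpu": True},
    }


def resume_request(checkpoint_id: str, target_instance: str) -> dict:
    # Restore ("resume") the checkpoint as a new container on another instance.
    return {
        "method": "POST",
        "url": f"{API}/restore",
        "body": {"checkpoint_id": checkpoint_id, "instance": target_instance},
    }


# Save on the source instance, then resume on the target; the "migrate" step
# is moving the checkpoint image between instances (e.g. via shared storage).
save = save_request("job-42")
resume = resume_request("ckpt-001", "gpu-node-7")
print(json.dumps(save["body"], sort_keys=True))
```

Because the state capture happens at the kernel level, the application itself needs no code changes; the two requests above are the entire integration surface in this sketch.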

5x Your Utilization. Increase Availability. Reduce Cost.  

See Cedana in Action
Request API Access