Learn how NVIDIA is using Kata Containers and the CNCF Confidential Containers stack to deliver “trusted AI anywhere”: on-prem, in private clouds, across public CSPs, and out to the edge.


Running AI on public data is easy. The hard part is moving sensitive data and valuable models into production without risking leakage while operating at the scale modern GPU clusters demand. That was the blunt message from NVIDIA’s Zvonko Kaiser at the OpenInfra Summit Europe 2025, where he outlined how NVIDIA is using Kata Containers and the CNCF Confidential Containers stack to deliver “trusted AI anywhere”: on-prem, in private clouds, across public CSPs, and out to the edge.

“The real challenge is running AI pipelines on confidential data and protecting model IP,” Kaiser said, noting that for many enterprises, that trust gap is why “66% of enterprises leave >50% of private data unused.”

Below is a concise walkthrough of the problem space, the architecture NVIDIA is advancing with Kata Containers, and what it means for teams building secure AI on Kubernetes.

The trust problem (and why 2025 is different)

Kaiser framed the landscape as three pillars of security for AI:

  1. Cryptographic compute (e.g., homomorphic encryption, multi-party computation, zero-knowledge proofs): powerful, but often orders of magnitude too slow for deep learning.
  2. Software sandboxes (e.g., gVisor, Firecracker): reduce the blast radius but still assume trust in the host.
  3. Trusted Execution Environments (TEEs): hardware-backed isolation that flips the model, so the workload no longer has to trust the infrastructure.

The inflection point: modern CPU TEEs (AMD SEV-SNP, Intel TDX) now combine with GPU-level protections (Hopper and newer), and Kubernetes plumbing has matured. That alignment makes it practical to enforce confidentiality and integrity without rewriting your AI code.

“Scale is spelled GPU,” Kaiser reminded the audience. “Enterprises care that you can run pipelines across hundreds of nodes and thousands of GPUs.”

Why Kata Containers?

Containers are a great packaging and delivery mechanism, but they don’t provide a strong isolation boundary on their own. Kata Containers adds a lightweight VM boundary around each container, giving you:

  • Stronger isolation: the guest kernel and userspace are independent of the host; host changes are far less likely to break your workload stack.
  • OCI and Kubernetes compatibility: Kata integrates cleanly with containerd/CRI-O and Kubernetes primitives (e.g., RuntimeClass, sketched after this list), so you can keep your workflows.
  • A glide path to Confidential Containers: the same mechanics that make Kata useful for multi-tenant isolation also power Confidential Containers (Kata + guest components + attestation), where the VM memory is encrypted and measured.
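
As a rough illustration of how that RuntimeClass integration looks, a Kata runtime is typically exposed to Kubernetes as a RuntimeClass whose handler matches the runtime registered with containerd or CRI-O. The handler name, overhead values, and node label below are placeholders, not NVIDIA's reference configuration:

```yaml
# Illustrative only: the handler name, overhead values, and node label
# depend on how containerd/CRI-O and kata-deploy are set up on your nodes.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata            # must match the runtime handler registered with containerd/CRI-O
overhead:
  podFixed:
    memory: "160Mi"      # account for the Kata guest kernel and agent
    cpu: "250m"
scheduling:
  nodeSelector:
    katacontainers.io/kata-runtime: "true"   # example label; adjust to your node labeling
```

Pods that set this runtime class get the VM boundary; everything else in the manifest stays ordinary Kubernetes.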

Kaiser emphasized this “no surprises” posture: NVIDIA’s enablement patterns for bare-metal GPUs are replicated within the Kata guest, so the software experience is consistent across bare-metal, Kata, and Confidential Containers.

Kubernetes-native, lift-and-shift security

NVIDIA’s stack builds on familiar Kubernetes constructs:

  • RuntimeClass to select between bare metal, Kata, or Confidential Containers per pod.
  • DRA (Dynamic Resource Allocation) for fine-grained, policy-driven device assignment.
  • CDI (Container Device Interface) to surface GPUs into containers/Kata VMs with the right binaries, libraries, and device nodes.
  • NVIDIA GPU Operator to automate the cluster-level pieces (driver lifecycle, GPU feature discovery, networking, storage hooks).
  • Peer-pods to support hybrid cloud scenarios, bursting Confidential Containers to CSPs while keeping isolation boundaries intact.
  • “Rustifying” the stack to reduce memory-safety issues across critical components. 

The result is a Kubernetes-native path: annotate your pods, choose your RuntimeClass, and let the stack handle device plumbing, NUMA/topology awareness, and attestation. As Kaiser put it, it’s lift-and-shift for your AI pipelines, not just individual containers.
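
To make the "choose your RuntimeClass and go" point concrete, here is a minimal sketch of a pod that opts into a confidential Kata runtime and requests a GPU. The runtime class name (kata-qemu-snp) and the image are assumptions for illustration; use whatever runtime classes your cluster actually exposes:

```yaml
# Minimal sketch: runtimeClassName and image are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: confidential-inference
spec:
  runtimeClassName: kata-qemu-snp      # e.g. a SEV-SNP-backed Kata/Confidential Containers class
  containers:
  - name: inference
    image: registry.example.com/models/llm-inference:latest
    resources:
      limits:
        nvidia.com/gpu: 1              # GPU surfaced into the Kata guest via CDI
```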

Getting GPUs right inside VMs

When you put GPUs behind a VM boundary, topology matters. P2P transfers, GPUDirect RDMA, and NUMA constraints all care about PCIe placement and capabilities (ACS/ATS, switch hierarchies, etc.). NVIDIA addressed this with two complementary approaches in Kata:

  • Topology flattening when you don’t need strict host mirroring.
  • Host topology replication when you do, so drivers see the “right” layout and enable the fast paths automatically.

CDI metadata helps map which NIC belongs to which GPU for P2P and RDMA. Kata also supports PF/VF pass-through and lets you choose per-pod PCIe topology (e.g., one workload uses MIG, another uses time-sliced VFs, another uses GPUDirect RDMA). These are pragmatic features born from real customers pushing real scale.
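
For a sense of what the CDI side looks like, the sketch below shows the general shape of a CDI specification that surfaces a GPU's device nodes and driver libraries into a container or Kata guest. The device name, paths, and mount are illustrative placeholders, not the output of NVIDIA's tooling; in practice the generated spec also carries the metadata used to place NICs next to their GPUs for P2P and RDMA:

```yaml
# Illustrative CDI spec (e.g. /etc/cdi/nvidia.yaml); names and paths are placeholders.
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
- name: gpu0
  containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
    - path: /dev/nvidiactl
containerEdits:
  mounts:
  - hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
    containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
    options: ["ro", "bind"]
```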

From Hopper to Blackwell and toward TDISP

On the hardware side, NVIDIA started with Hopper (single-GPU pass-through for confidential compute) and is extending with Blackwell, which adds multi-GPU pass-through and scales out across multi-node jobs. Performance improves further with TDISP (the PCI-SIG TEE Device Interface Security Protocol), which encrypts and integrity-protects traffic on the PCIe link itself, reducing overhead compared to bounce buffers. The message: hardware is ready; now the race is software, standards, and ops.

Attestation, secrets, and the “data clean room”

Kaiser underscored that attestation isn’t just for CPUs anymore. GPU state must be part of the measured trust chain, and NVIDIA is working with the community on composite attestation across CPU, GPU, NIC/DPU, and storage. Once a workload proves it’s in the expected state, key release can unlock encrypted model weights, datasets, and storage volumes.

That unlocks new multi-party trust models. Imagine a data clean room where a data owner, model owner, and infrastructure provider each receive verifiable assurances, and where the client can confidently execute sensitive AI workloads, knowing that every layer, from silicon to container and service, is attested and verified before any data or keys are exposed.

What this means for you

If you’re running AI on Kubernetes and you care about protecting model IP, complying with data regulations, or just not trusting the infrastructure by default, here’s why Kata + Confidential Containers should be on your shortlist:

  • Familiar UX: keep your container images and Kubernetes workflows; select a RuntimeClass and go.
  • Operational consistency: NVIDIA’s GPU Operator and CDI make bare metal, Kata, and CC feel the same to your pipelines.
  • Scale with safety: VM isolation for noisy/malicious neighbors; confidential VMs for encrypted and attested execution.
  • Performance-aware: topology replication and per-pod PCIe decisions preserve the fast paths GPUs need.

“It’s the same image, the same attestation, different postcodes,” Kaiser said. “Run it anywhere.”

How to get started

  1. Test your current workloads with a Kata RuntimeClass on a small node pool. Validate that the GPU paths and drivers behave as expected.
  2. Turn on attestation with Confidential Containers for sensitive pipelines. Wire it to a key broker so secrets are only released to measured states.
  3. Adopt DRA and CDI to control device assignment and expose the right GPU/NIC topology per job (a rough sketch follows this list).
  4. Engage upstream: NVIDIA, Kata, and the Confidential Containers community are actively collaborating on topology, attestation, and reference architectures. Bring your use cases and performance traces.
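
For step 3, the shape of a DRA request looks roughly like the following. The API version, device class name, and runtime class are assumptions based on the upstream DRA API and NVIDIA's DRA driver; check what your Kubernetes release and installed driver actually support:

```yaml
# Rough sketch of a DRA claim template; apiVersion and deviceClassName
# depend on your Kubernetes release and the installed DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-test
spec:
  runtimeClassName: kata                 # illustrative; see the RuntimeClass sketch above
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: cuda-test
    image: registry.example.com/cuda/vectoradd:latest   # placeholder image
    resources:
      claims:
      - name: gpu                        # binds the container to the claimed GPU
```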

Kaiser’s call to action was simple: test, contribute, deploy. If your teams have been holding back high-value data because the trust wasn’t there, this is your opportunity to close that gap without rewriting your AI stack.

“With confidential compute, the workload doesn’t trust the infrastructure,” he said. “Kata and Confidential Containers make that model practical at GPU scale.”

Interested in sharing your results or learning more about NVIDIA’s reference architectures with Kata? Join the Kata Containers Slack and the CNCF Confidential Containers discussions. Your feedback directly shapes what ships next.