WTH is Pod Sandboxing for AKS?

This week saw the announcement of a new preview feature for AKS called pod sandboxing. It is an exciting feature that many people have been requesting for a while, and it will make life a lot easier for anyone trying to create a more secure environment for running containers. Let’s look at what pod sandboxing is, how it works and why you would want to use it.

What is Pod Sandboxing?

Containers and Virtual Machines differ in several ways, but one of the key areas is how the operating system kernel works. Each VM on a host gets its own copy of the kernel, whereas all containers on a host share the same kernel. This is part of what makes containers so fast, small and flexible, but it brings issues, especially around security. If an attacker manages to compromise one container running on a host and gain access to the kernel, they can likely get access to all the containers running on that host. This concerns anyone running untrusted or potentially hostile multi-tenant workloads in containers and Kubernetes.

Pod Sandboxing is a solution to this problem, bringing a way to run a container with its own copy of the kernel rather than sharing it with the rest of the host. A kernel-level attack on one container no longer automatically compromises every other container on the host.

How does Pod Sandboxing work?

Pod Sandboxing in AKS is based on a technology called Kata Containers. Kata Containers look and behave like containers, but wrap your container in a small, lightweight virtual machine. This virtual machine has its own kernel, separate from the host kernel, so an attacker who compromises one container can no longer use it to reach the host kernel.

[Figure: Kata Containers]

Enabling Pod Sandboxing

Pod Sandboxing is enabled on your AKS cluster on a per-node-pool basis. You can run sandboxed and non-sandboxed containers on the same host, so you do not need to dedicate a whole node pool to sandboxed workloads; you can mix and match.

You will need Azure CLI version 2.44.1 or later to set this up. Once you have that, you need to install the AKS preview extension.

az extension add --name aks-preview

Or update it if you already have it installed.

az extension update --name aks-preview
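Since these steps depend on a minimum Azure CLI version, a quick check before continuing can save some confusion. The following is a minimal sketch, assuming a `sort` that supports `-V` (as GNU coreutils does); the `version_ge` helper is a name invented for this example and is not part of the Azure CLI.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes a sort that supports -V (version sort), as GNU coreutils does.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.44.1"

# Only query the CLI when it is actually installed.
if command -v az >/dev/null 2>&1; then
  # "az version" prints JSON; the "azure-cli" key holds the core version.
  installed="$(az version --query '"azure-cli"' --output tsv)"
  if version_ge "$installed" "$required"; then
    echo "Azure CLI $installed is new enough"
  else
    echo "Azure CLI $installed is older than $required, run: az upgrade"
  fi
fi
```

Version sort is used rather than a plain string comparison because a string comparison would rank 2.9.0 above 2.44.1.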

Next, we need to register the preview feature flag:

az feature register --namespace "Microsoft.ContainerService" --name "KataVMIsolationPreview"
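Feature registration is not instant, so it can help to wait until the flag reports as registered and then refresh the resource provider. The sketch below assumes the state is exposed under `properties.state` in the `az feature show` output; the function names and the retry defaults are illustrative choices, not anything prescribed by Azure.

```shell
#!/bin/sh
# Read the current registration state of the preview feature.
feature_state() {
  az feature show \
    --namespace "Microsoft.ContainerService" \
    --name "KataVMIsolationPreview" \
    --query "properties.state" --output tsv
}

# Poll until the feature is Registered, then refresh the resource provider.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 20).
wait_until_registered() {
  attempts="${1:-30}"
  delay="${2:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(feature_state)"
    echo "KataVMIsolationPreview state: $state"
    if [ "$state" = "Registered" ]; then
      az provider register --namespace "Microsoft.ContainerService"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling `wait_until_registered` with no arguments polls for up to ten minutes with these defaults; pass a different attempt count and delay if that does not suit your environment.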

Next, we can deploy a new cluster with Sandboxing enabled or add a new node pool to an existing AKS cluster. There is no way to enable this on an existing node pool. There are a couple of pre-requisites for the resources you deploy in the node pool for this to work:

  1. The node pool must be running the Mariner OS, Microsoft’s internal Linux distribution tuned for AKS
  2. You must use a generation 2 VM that supports nested virtualisation

To deploy a new cluster that has sandboxing enabled, use this command:

az aks create --name myAKSCluster --resource-group myResourceGroup --os-sku mariner --workload-runtime KataMshvVmIsolation --node-vm-size Standard_D4s_v3 --node-count 1

The key parameters here are --os-sku set to mariner, --node-vm-size set to a D4s v3 VM (a generation 2 VM) and --workload-runtime set to KataMshvVmIsolation.

To add a sandboxing-enabled node pool to an existing cluster, run these two commands; the first adds the node pool and the second updates the cluster:

az aks nodepool add --cluster-name myAKSCluster --resource-group myResourceGroup --name nodepool2 --os-sku mariner --workload-runtime KataMshvVmIsolation --node-vm-size Standard_D4s_v3

az aks update --name myAKSCluster --resource-group myResourceGroup
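Once the node pool is up, one way to confirm the cluster is ready for sandboxed pods is to check that the runtime class exists, since Kubernetes rejects a pod that names a missing runtime class. This sketch assumes `kubectl` is installed and reuses the cluster and resource group names from the commands above; `runtime_class_ready` and `check_sandbox_support` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the named RuntimeClass exists in the cluster.
runtime_class_ready() {
  kubectl get runtimeclass "$1" >/dev/null 2>&1
}

# Fetch cluster credentials, then look for the Kata runtime class.
check_sandbox_support() {
  az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
  if runtime_class_ready "kata-mshv-vm-isolation"; then
    echo "sandboxing runtime class found"
  else
    echo "sandboxing runtime class not found"
    return 1
  fi
}
```

Running `check_sandbox_support` after either deployment path should report whether the runtime class is available.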

Running a Sandboxed Pod

Once the cluster is deployed, we can deploy a Pod to it that runs in sandboxed mode. This is very simple; all we need to do is set the runtimeClassName property in the spec section to kata-mshv-vm-isolation:

kind: Pod
apiVersion: v1
metadata:
  name: untrusted
spec:
  runtimeClassName: kata-mshv-vm-isolation
  containers:
  - name: untrusted
    image: mcr.microsoft.com/aks/fundamental/base-ubuntu:v0.0.11
    command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]

This is all that is needed to run your container in a sandbox. It should now behave like a standard container, but with stronger isolation from the host.
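One way to see the isolation in action is to compare the kernel version reported inside the pod with the one reported by the node it runs on; with a sandboxed pod the two would be expected to differ. The sketch below assumes the manifest above was saved as `untrusted.yaml` (a file name chosen for this example) and that `kubectl` points at the cluster. The function names are hypothetical.

```shell
#!/bin/sh
# Kernel version as seen from inside the pod's container.
pod_kernel() {
  kubectl exec "$1" -- uname -r
}

# Kernel version of the node that the pod was scheduled on.
node_kernel() {
  node="$(kubectl get pod "$1" -o jsonpath='{.spec.nodeName}')"
  kubectl get node "$node" -o jsonpath='{.status.nodeInfo.kernelVersion}'
}

# Deploy the pod, wait for it to be ready, then compare the two kernels.
verify_sandbox() {
  kubectl apply -f untrusted.yaml
  kubectl wait --for=condition=Ready pod/untrusted --timeout=120s
  inside="$(pod_kernel untrusted)"
  outside="$(node_kernel untrusted)"
  echo "pod kernel:  $inside"
  echo "node kernel: $outside"
  # Succeeds only when the pod reports a different kernel from its node.
  [ "$inside" != "$outside" ]
}
```

Treat a matching kernel string as a prompt to investigate rather than proof that sandboxing is off, since this check only compares version strings.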

Why would I want to use Pod Sandboxing?

It’s all about security and access to that host kernel. If you are running multiple workloads on a cluster, multi-tenant applications or applications that may be attacked, or you simply want your workloads to be as secure as possible, then you would want to use this feature. Sandboxing has some downsides (see the next section), so be aware of these, but balance them against the additional security it delivers.

If all your AKS-based applications are internal to your network, used by trusted users and never exposed to the outside world, then maybe you don’t need this feature. Still, if you are concerned about possible attacks and compromises of your hosts and containers, you will want to check this out.

What issues does Pod Sandboxing have?

The first is less an issue than a consequence of how this works: running your container in a VM adds some overhead. The VM consumes part of the CPU and memory allocated to your pod, so that share is not available to the container, and you will need to size your pods accordingly.

A more significant issue is that the I/O performance of sandboxed containers tends to be worse than that of non-isolated containers, and the difference can be considerable in some scenarios. Make sure you test the performance to see if it meets your needs.
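As a rough starting point for that testing, the same write workload can be run in a sandboxed pod and a regular pod and the reported throughput compared. This is only a crude sketch using `dd`; it assumes the container image ships GNU `dd`, and the pod name `trusted` used below for a non-sandboxed copy of the pod is hypothetical. A dedicated benchmark tool would give more representative numbers.

```shell
#!/bin/sh
# Write 256 MiB inside the pod, flush it to disk, and print dd's summary line.
io_write_test() {
  kubectl exec "$1" -- sh -c \
    'dd if=/dev/zero of=/tmp/ddtest bs=1M count=256 conv=fsync 2>&1 | tail -n 1; rm -f /tmp/ddtest'
}

# Run the same test against each pod named in the arguments.
compare_io() {
  for pod in "$@"; do
    echo "$pod: $(io_write_test "$pod")"
  done
}
```

For example, `compare_io untrusted trusted` prints one summary line per pod so the two throughput figures can be read side by side.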

Finally, this feature is in preview, so it shouldn’t be used in production. It also has some limitations that may go away before release but that you should be aware of:

  • Microsoft Defender for containers does not support assessing sandboxed pods
  • Container Insights doesn’t support monitoring of Kata runtime pods
  • Kata host network is not supported
  • Container Storage Interface (CSI) drivers and the Secrets Store CSI driver (for mounting Key Vault secrets) are not supported for sandboxed containers