Azure Spring Clean: Replacing Kubernetes Pod Security Policies with Azure Policy on AKS

This post is part of the Azure Spring Clean series being run by Joe Carlyle and Thomas Thornton, an event focusing on well-managed Azure tenants. Check out all the other articles at https://www.azurespringclean.com/.

If you want to lock down what sort of workloads can run in your Kubernetes clusters, you have probably looked at using Pod Security Policies, or PSPs. PSPs allow you to force your workloads to run in a certain way and disallow them from running if they don't. For example, you can enforce that pods run as a non-root user, allow only specific volume mounts, or deny access to the host network. PSPs have been around for a while now and have been in preview in AKS for a couple of years; however, they are not going to make it out of preview.
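For reference, a minimal sketch of a PSP enforcing the non-root example above might look like the following. The name and the exact set of allowed volume types are illustrative; `seLinux`, `runAsUser`, `supplementalGroups` and `fsGroup` are required fields in a PSP, so they are included with permissive rules.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-non-root        # illustrative name
spec:
  privileged: false                # no privileged containers
  allowPrivilegeEscalation: false  # block escalation to root privileges
  runAsUser:
    rule: MustRunAsNonRoot         # pods must not run as root
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  hostNetwork: false               # deny access to the host network
  volumes:                         # allow only specific volume types
  - configMap
  - emptyDir
  - secret
```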

PSP as a concept is being deprecated by Kubernetes for various reasons, in favour of Open Policy Agent and Gatekeeper. These provide a much more flexible approach to Kubernetes security than the relatively rigid, pod-only approach of PSPs. In AKS, PSPs will not leave preview and will be deprecated during 2021 (although this date keeps moving). PSPs in AKS are being replaced by Azure Policy for AKS.

Azure Policy for AKS is an extension of the Azure Policy tooling to allow you to apply policies to workloads running inside your AKS cluster. Under the hood, this uses a managed version of Gatekeeper with policies defined using Open Policy Agent. In the rest of this article, we will look at how you can migrate from PSP to Azure Policy.

Defining Policies

Mapping PSPs to Azure Policies

Before we can apply Azure Policy to our AKS cluster, we need to define the policies we want to use. We're going to focus here on the capabilities provided to replicate PSPs. There are actually several more AKS policies that provide features outside of PSPs, such as restricting which container registries can be used for containers. If you want to see all the policies available for AKS, go to the Policy section in the Azure portal, go to Definitions, then filter the category to "Kubernetes". Note that at the moment, only built-in policies are supported; you cannot create custom policies.

Policies
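If you prefer the CLI, you can get the same list of built-in definitions with a JMESPath filter on the category:

```shell
# List all built-in policy definitions in the Kubernetes category
az policy definition list \
  --query "[?metadata.category=='Kubernetes'].{Name:displayName, Mode:mode}" \
  --output table
```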

To migrate existing PSPs over to Azure Policy, you will need to map the settings in the PSP to the appropriate policy in Azure Policy. The table below provides a mapping between these to help you translate between them. Note that it is not always a one-to-one mapping; several of the Azure Policies group together multiple PSP settings, like host network and host ports, or run-as user, group and so on.

| Area | PSP Name | Azure Policy Name |
| --- | --- | --- |
| Running of privileged containers | privileged | Kubernetes cluster should not allow privileged containers |
| Restricting escalation to root privileges | allowPrivilegeEscalation | Kubernetes clusters should not allow container privilege escalation |
| Usage of host networking and ports | hostNetwork, hostPorts | Kubernetes cluster pods should only use approved host network and port range |
| Usage of host namespaces | hostPID, hostIPC | Kubernetes cluster containers should not share host process ID or host IPC namespace |
| Usage of the host filesystem | allowedHostPaths | Kubernetes cluster pod hostPath volumes should only use allowed host paths |
| The user and group IDs of the container | runAsUser, runAsGroup, supplementalGroups, fsGroup | Kubernetes cluster pods and containers should only run with approved user and group IDs |
| Usage of volume types | volumes | Kubernetes cluster pods should only use allowed volume types |
| Allow specific FlexVolume drivers | allowedFlexVolumes | Kubernetes cluster pod FlexVolume volumes should only use allowed drivers |
| Requiring the use of a read-only root file system | readOnlyRootFilesystem | Kubernetes cluster containers should run with a read only root file system |
| Linux capabilities | defaultAddCapabilities, requiredDropCapabilities, allowedCapabilities | Kubernetes cluster containers should only use allowed capabilities |
| The allowed ProcMount types for the container | allowedProcMountTypes | Kubernetes cluster containers should only use allowed ProcMountType |
| The seccomp profile used by containers | annotations | Kubernetes cluster containers should only use allowed seccomp profiles |
| The SELinux context of the container | seLinux | Kubernetes cluster pods and containers should only use allowed SELinux options |
| The sysctl profile used by containers | forbiddenSysctls | Kubernetes cluster containers should not use forbidden sysctl interfaces |
| The AppArmor profile used by containers | annotations | Kubernetes cluster containers should only use allowed AppArmor profiles |

Creating Initiatives

It's unlikely that you will want to assign these policies one at a time; more likely, you will want to create different "initiatives" for different security scenarios. Initiatives allow you to group policies together and provide default parameters for those policies to build a pre-defined configuration, much like a PSP for a specific workload. You are likely going to need multiple initiatives for different types of workloads. For example, I have found that I need a less restrictive set of policies for running Nginx ingress controllers, whereas I want to be more restrictive for my application workloads, so I have two different initiatives defining my security posture for those workloads.

There are two built-in initiatives for PSPs already in Azure Policy:

| Name | Description | Policies | Version |
| --- | --- | --- | --- |
| Kubernetes cluster pod security baseline standards for Linux-based workloads | This initiative includes the policies for the Kubernetes cluster pod security baseline standards. This policy is generally available for Kubernetes Service (AKS), and preview for AKS Engine and Azure Arc enabled Kubernetes. For instructions on using this policy, visit https://aka.ms/kubepolicydoc. | 5 | 1.1.1 |
| Kubernetes cluster pod security restricted standards for Linux-based workloads | This initiative includes the policies for the Kubernetes cluster pod security restricted standards. This policy is generally available for Kubernetes Service (AKS), and preview for AKS Engine and Azure Arc enabled Kubernetes. For instructions on using this policy, visit https://aka.ms/kubepolicydoc. | 8 | 2.1.1 |

You can look at how these initiatives are defined to determine if they fit your workloads. If they do, you're lucky, and you can just take those initiatives and use them. More likely, however, you will need to define some custom initiatives of your own. If you do, I would suggest using one of the built-in initiatives as your starting point. If you go to the Azure portal and locate them, you can click on the "Duplicate initiative" button. This will create a copy of the initiative, which you can then customise however you like.

When you’re creating an initiative, there are two key areas to look at:

  1. Policies - which policies do you actually want to use in your initiative
  2. Policy parameters - what values do you want to pass into the individual policies to define how they work; for example, what values go in "Allowed Group ID Ranges". It is possible to let the person assigning the initiative specify all these values through initiative parameters; however, this makes your initiatives very complex and makes it difficult to police what a cluster's security posture actually is. You are better off defining each initiative as providing a specific security posture and having the person assigning it pick an initiative.
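Under the hood, a custom initiative is just a list of policy definition references with parameter values baked in. A sketch of what that JSON looks like is below; the GUID and the parameter name are placeholders, not real definition IDs, so look them up for the policies you actually include:

```json
[
  {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/00000000-0000-0000-0000-000000000000",
    "parameters": {
      "effect": { "value": "audit" },
      "allowedGroupIDRanges": { "value": { "ranges": [ { "min": 1000, "max": 2000 } ] } }
    }
  }
]
```

You can then create the initiative from this file with `az policy set-definition create --name my-psp-initiative --definitions @definitions.json`.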

Policy Parameters

Once you go through this process, you should have a set of initiatives you can then apply. To give you some examples, I have the following initiatives in one of my environments:

  1. Nginx Ingress PSP initiative - Nginx Ingress needs some slightly elevated rights that I don't want to give other applications, so I have a separate initiative that only gets applied to Nginx namespaces (see the next section on how to do this)
  2. Prometheus PSP initiative - Prometheus needs access to several host resources, which I want to deny for other applications, so I have a specific initiative for this workload
  3. Standard workload restricted PSP - the rest of my workloads are relatively simple and don't need any elevated rights, so I have an initiative that is restrictive and locks down most things

Assigning Initiatives

Now that we have created the required initiatives, we need to assign them so they take effect. There are two things to look at here.

First, we need to assign the initiatives so that they are actually applied to our AKS clusters. To do this, we click the "Assign" button on the initiative in the portal, and we then need to define the scope at which this initiative is applied. This can be the management group, subscription or resource group level. Any AKS cluster under that scope will have the policy assigned. We can also add exclusions so the policy is not applied to specific sub-resources in our scope.

Scope
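The same assignment can be scripted with the CLI. The assignment and initiative names and the scope below are placeholders for your own values:

```shell
# Assign a custom initiative to a resource group containing AKS clusters
az policy assignment create \
  --name "nginx-psp-assignment" \
  --policy-set-definition "nginx-ingress-psp-initiative" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/myResourceGroup"
```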

This will apply the policy to our cluster, but recall that we want to scope some of our policies to specific namespaces. For example, we only want the Nginx Ingress policy applied to our Nginx namespace; we don't want it applied to the rest of our workloads, as it is too permissive. To do this, we can look at the assignment's Parameters tab, which allows us to define initiative parameters.

First, we can define what effect the policy has:

  1. Audit - this will log whether workloads are in compliance, but will not do anything to stop them from running. This is an excellent place to start when applying policies, so you can see which workloads are not in compliance and fix them without breaking anything
  2. Deny - this will prevent any workloads that do not meet the policy from running. It will not stop existing pods that are already running, but if they get restarted, they will not start
  3. Disabled - the policy does nothing

Once we select the effect, we then have two namespace parameters:

  1. Namespace exclusions - defines which namespaces this policy does not apply to. If you have a policy that you want to apply to all namespaces except a few, this is the one to use. This defaults to including some built-in namespaces that you should not apply policies to
  2. Namespace inclusions - defines which namespaces the initiative should be applied to, excluding everything else. Use this if you want a policy applied only to a few namespaces, like our Nginx Ingress policy. Here's what that would look like.

Namespaces
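When scripting the assignment, the same choices show up as parameter values passed with `--params`. A sketch for an audit-mode, Nginx-only assignment is below; the parameter names are assumptions based on the built-in AKS policies, so check your initiative's definition for the exact names it exposes:

```json
{
  "effect": { "value": "audit" },
  "namespaces": { "value": [ "ingress-nginx" ] }
}
```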

Once we do this, our policies are assigned, but if you've not enabled Azure Policy on your cluster, nothing will happen until you do that.

Enabling Azure Policy in your AKS Cluster

If you previously deployed your cluster either with PSPs enabled or with no security features, then Azure Policy will not be running in your cluster, so we need to enable it.

Firstly, if you had PSPs running previously, you need to disable them; you cannot have PSP and Azure Policy running at the same time. To disable PSPs, use the Azure CLI with this command:

az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --disable-pod-security-policy

You can also delete any PSPs in your cluster if you want to clean them up, but you don't have to.

Once you have disabled PSPs, you can enable Azure Policy. This can be done through the portal by selecting your AKS cluster, going to Policies and clicking Enable, or through the CLI with the command below.

az aks enable-addons --addons azure-policy --name MyAKSCluster --resource-group MyResourceGroup

If you look at your cluster, you should now see a new "gatekeeper-system" namespace with pods running in it. You will need to wait a while before things are fully running and compliance data is reported to the Azure portal.
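You can confirm the add-on is up by checking the pods in that namespace:

```shell
# The Gatekeeper components deployed by the Azure Policy add-on run here
kubectl get pods --namespace gatekeeper-system
```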

Review Compliance and Debug Policies

Once you have assigned your initiatives and enabled Azure Policy, you can review compliance data reported to the Azure Portal. This is particularly important if your policies are in Audit mode, so you can check whether they are working as expected before you turn on enforcement mode.

To view compliance data, you can use the Policy section of the Azure portal. Go to the “Compliance” section and then filter to see the initiatives you are interested in. This will give you a top-level view of whether your initiatives are compliant.

Compliance

We can then click on a specific initiative to see which policies are not in compliance.

Policies not in compliance

If you click on a specific policy, this will then show you which AKS clusters are not in compliance, and if you then click on the details for that, it will show you which specific pods are not in compliance.

Pod Compliance

Whilst you can see which pods are not in compliance, what you can't see is which part of the policy they are violating. If you have a policy with multiple rules, like the user and group ID policy, you cannot tell which of these the container is failing on. To diagnose this, you need to look at the details on the cluster itself.

Gatekeeper stores all its data in custom resources called constraints, so if you run kubectl get constraints you should see a list of all the constraints running in your cluster.

Constraints

You will notice that there are multiple instances of the same constraint; I'm not entirely clear at this point why this is the case. There is also no easy way to find the constraint you are looking for; you need to look through them all to find the data you need. To find the policy that is being violated, describe a specific constraint, and you will see details about why the audit failed.
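For example, the steps above look like this; the constraint kind and name are placeholders for whatever `kubectl get constraints` returns in your cluster:

```shell
# List every constraint the Azure Policy add-on has created
kubectl get constraints

# Describe one to see the "Violations" recorded against it
kubectl describe <constraint-kind> <constraint-name>
```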

Violations

This should allow you to find which specific item is causing the violation and look to resolve it.

Once you have all your policies working as you expect, you can edit your initiative assignment to "Deny" mode. This will prevent any pods that violate the policy from launching. You can see this happening in the events of the ReplicaSet that is trying to create them.

Denied Pod

Pod Security Contexts

I have noticed with Azure Policy that you really need to make sure the security context on your pod is correct, especially when using the audit functionality. With PSPs, validation was done at deploy time; so long as your pod complied with the policy (it didn't try to run as root, for example), it would pass. With Azure Policy, if your pod's security context section doesn't explicitly define the restrictions, you may fail the audit even if the pod is actually complying. So make sure that you define the security context on your containers correctly.

spec:
  securityContext:            # pod-level security context, applies to all containers
    runAsUser: 1000           # run as UID 1000, a non-root user
    runAsGroup: 3000          # primary group ID for container processes
    fsGroup: 2000             # group ID applied to mounted volumes
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  containers:
  - name: sec-ctx-demo
    image: busybox
    command: [ "sh", "-c", "sleep 1h" ]
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    securityContext:          # container-level settings override the pod level
      allowPrivilegeEscalation: false