AKS Finally Supports FQDN-Based Network Policies - At a Cost

Network policies in Kubernetes allow you to control ingress and egress traffic at the pod or namespace level. They cover both traffic between pods in your cluster and traffic from pods to the outside world.

There are various types of network policies depending on your Kubernetes network configuration. Standard Kubernetes network policies, which AKS has supported for a long time, allow defining traffic rules between pods using namespaces and labels, and can be fairly dynamic. Controlling traffic to external resources is trickier. When defining which external resources your pods can talk to, your only option is to provide IP addresses or CIDR ranges. This makes it challenging to work with external resources that have many IP addresses, that have IP addresses that change regularly, or that do not publish a fixed list of IPs.
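To illustrate the limitation, here is a minimal sketch of a standard Kubernetes NetworkPolicy that restricts egress by IP range. The policy name, the file name, and the CIDR 203.0.113.0/24 (a reserved documentation range) are hypothetical; the namespace and label are assumed to match the demo workload used in this post.

```shell
# Write a standard NetworkPolicy that allows egress only to one CIDR block.
# All names and the CIDR below are illustrative assumptions.
cat > ip-egress-policy.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ip-egress-policy
  namespace: demo
spec:
  podSelector:
    matchLabels:
      demo: "true"
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24   # must be kept in sync by hand if the service's IPs change
EOF

# Apply it with: kubectl apply -f ip-egress-policy.yaml
```

Note that there is no field for a hostname here: if the external service moves to a different address, the policy has to be edited.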

Certain more advanced network layers do offer FQDN-based policies; however, these are more complicated to implement, as the policy engine still has to translate FQDNs into IPs. Until now, AKS has not supported any of these more advanced policies. This is now possible with a new feature in Advanced Container Network Services, using the Cilium network provider. By using a DNS proxy, we can filter network traffic by FQDN rather than being restricted to IP addresses only.

How It Works

Advanced Container Network Services

Advanced Container Network Services (ACNS) is an add-on feature for AKS that provides additional network features around observability and security. One of these features is the ability to use FQDN filtering for network policies. ACNS is not a free service; it comes with a fee of about $18 per node per month. There are a good number of additional benefits you get from ACNS, but if the only one you are interested in is FQDN filtering, then this could be pretty expensive.

Enabling FQDN Filtering

To enable FQDN filtering, we first need to enable the preview feature.

az extension add --name aks-preview
az extension update --name aks-preview
az feature register --namespace "Microsoft.ContainerService" --name "AdvancedNetworkingPreview"

Registration can take a few minutes. We can check whether the feature has finished registering:

az feature show --namespace "Microsoft.ContainerService" --name "AdvancedNetworkingPreview"
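If you would rather not re-run that command by hand, the check can be wrapped in a small polling loop. This is a sketch only: the function name wait_for_feature and the POLL_SECONDS variable are made up for illustration, and it assumes the usual Azure preview-feature flow, in which the state becomes "Registered" and the resource provider registration is then refreshed.

```shell
# Poll until the preview feature reports "Registered", then refresh the
# resource provider. wait_for_feature and POLL_SECONDS are hypothetical names.
wait_for_feature() {
  feature_name="$1"
  while :; do
    state=$(az feature show \
      --namespace "Microsoft.ContainerService" \
      --name "$feature_name" \
      --query "properties.state" \
      --output tsv)
    [ "$state" = "Registered" ] && break
    echo "Current state: ${state}; checking again shortly..." >&2
    sleep "${POLL_SECONDS:-30}"
  done
  # Refresh the provider registration so the feature is picked up.
  az provider register --namespace "Microsoft.ContainerService"
}

# Usage:
#   wait_for_feature "AdvancedNetworkingPreview"
```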

Next, we need to set up the AKS cluster. If you're creating a new cluster, you can enable ACNS at deployment time.

az aks create \
    --name <cluster name> \
    --resource-group <resource group name> \
    --generate-ssh-keys \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --pod-cidr 192.168.0.0/16 \
    --network-dataplane cilium \
    --enable-acns

You can also enable ACNS on an existing cluster; however, the cluster must already be running the Cilium data plane for this to work. If it is not, you will need to recreate the cluster.

az aks update \
    --name <cluster name> \
    --resource-group <resource group name> \
    --enable-acns
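Before running the update, it may be worth confirming which data plane the cluster uses. The sketch below reads the networkProfile.networkDataplane property from az aks show and only enables ACNS when it reports cilium. The function names are hypothetical, and the resource group and cluster name are passed in as arguments.

```shell
# Print the data plane of an existing cluster (for example "cilium" or "azure").
cluster_dataplane() {
  az aks show \
    --resource-group "$1" \
    --name "$2" \
    --query "networkProfile.networkDataplane" \
    --output tsv
}

# Enable ACNS only when the cluster already runs the Cilium data plane.
# enable_acns_if_cilium is a hypothetical helper, not part of the Azure CLI.
enable_acns_if_cilium() {
  dataplane=$(cluster_dataplane "$1" "$2")
  if [ "$dataplane" = "cilium" ]; then
    az aks update --resource-group "$1" --name "$2" --enable-acns
  else
    echo "Data plane is '${dataplane}'; FQDN filtering requires cilium." >&2
    return 1
  fi
}

# Usage:
#   enable_acns_if_cilium "<resource group name>" "<cluster name>"
```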

Once your cluster is ready, you can create a Cilium policy that defines which traffic you allow out of your pods. In the policy below, we block all inbound and outbound traffic and then explicitly allow traffic to samcogan.com. Note that we also allow traffic to the cluster's DNS servers (kube-dns). This is needed so that the FQDN in the policy can be resolved.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: outbound-policy
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      demo: "true"
  ingress:
    - {}
  egress:
    - toEndpoints:
      - matchLabels:
          "k8s:io.kubernetes.pod.namespace": kube-system
          "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
      - matchName: "samcogan.com"

Once we deploy this policy to our cluster, any pod in the “demo” namespace with the label “demo=true” can reach samcogan.com but no other domains.
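One way to confirm this behaviour is to issue requests from inside a matching pod. The helper below is a rough sketch: check_egress and the pod name demo-pod are hypothetical, and it assumes the container image includes curl. A failure can also mean the site is down or slow, so treat "blocked" as a hint rather than proof.

```shell
# Try an HTTPS request from inside a pod in the demo namespace and report
# whether it succeeded. check_egress is a hypothetical helper name.
check_egress() {
  pod="$1"
  url="$2"
  if kubectl exec --namespace demo "$pod" -- \
       curl --silent --show-error --max-time 5 --output /dev/null "$url" 2>/dev/null; then
    echo "allowed: $url"
  else
    echo "blocked or unreachable: $url"
  fi
}

# Usage (pod name is illustrative):
#   check_egress demo-pod https://samcogan.com
#   check_egress demo-pod https://example.com
```

With the policy applied, the first call would be expected to report the request as allowed and the second to time out.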