Kubernetes Pod Allocation in a Multi-OS Cluster

This week saw the announcement of the public preview of Windows Server Containers on Azure Kubernetes Service (AKS). Windows support in AKS is something we've been waiting on for a while, and it is excellent news for those trying to lift and shift Windows applications into containers. As we've discussed previously, if you can use Linux containers, they will generally give you faster start-up times and fewer complications; however, not everyone can rewrite their apps in .NET Core. If you have applications that can't run on Linux but you still want the benefits of containerisation, then Windows containers are a good choice, and being able to deploy them in AKS makes it that bit easier.

Running Windows containers on Kubernetes (AKS or not) has some complications, however. One of these is ensuring that your pods end up running on nodes with a matching OS. In most scenarios where you are running Windows on AKS, you are also going to have some Linux nodes. With AKS and Windows, you must have a Linux node pool, even if it is just one node. You are also likely to want to run applications that require Linux, given that a lot of the Kubernetes ecosystem is based on this OS. By default, Kubernetes does not know what OS your pods need and will schedule them wherever it sees fit, which could be on the wrong OS. This issue applies both ways: Linux pods can be scheduled onto Windows nodes, and Windows pods onto Linux nodes. We need to proactively make sure this does not happen. Below are two methods that can be used to do this.

OS Node Selectors

Kubernetes has the concept of a node selector. A node selector is a set of key-value pairs in the pod spec that a node's labels must match before the pod can be scheduled onto it. These can be custom labels that you have added to your nodes (for example SSD, GPU etc.) or one of the built-in labels. One that comes on every node out of the box is the "kubernetes.io/os" label, which indicates the OS that the node is running; its value is lowercase, either "windows" or "linux". In our scenario, we can use this label in our pod spec's node selector to ensure that the pod is only deployed to nodes with the right OS.
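
To see what the scheduler will be matching against, it can help to look at the labels on a node. The excerpt below is a sketch of what the metadata of a Windows node might look like in the output of "kubectl get node <nodeName> -o yaml". The node name is hypothetical, and a real node will carry many more labels than shown here.

```yaml
# Illustrative excerpt of a Node object (the name is a made-up example)
apiVersion: v1
kind: Node
metadata:
  name: akswin000000               # hypothetical node name
  labels:
    kubernetes.io/os: windows      # "linux" on Linux nodes
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: akswin000000
```

Running "kubectl get nodes -L kubernetes.io/os" prints the same label as an extra column for every node, which is a quick way to check which nodes are which.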

For Windows pods, our pod spec YAML would look like this:

apiVersion: v1
kind: Pod
metadata:
  name: iisapp                 # object names must be lowercase
  labels:
    env: test
spec:
  containers:
  - name: iisapp
    image: iisapp              # image names must be lowercase too
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kubernetes.io/os: windows  # label value is lowercase

A Linux Pod would have a spec with the OS set to Linux.

apiVersion: v1
kind: Pod
metadata:
  name: nginxapp               # object names must be lowercase
  labels:
    env: test
spec:
  containers:
  - name: nginxapp
    image: nginxapp            # image names must be lowercase too
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kubernetes.io/os: linux    # label value is lowercase
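
In practice, pods are often created by a controller such as a Deployment rather than directly. In that case the node selector goes in the pod template. The following is a minimal sketch of this; the names, replica count and image are illustrative assumptions rather than anything from a real application.

```yaml
# Sketch: the same node selector inside a Deployment's pod template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app              # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-app
  template:
    metadata:
      labels:
        app: nginx-app         # must match spec.selector.matchLabels
    spec:
      containers:
      - name: nginx
        image: nginx           # example image; substitute your own
      nodeSelector:
        kubernetes.io/os: linux
```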

Taints and Tolerations

The above approach works well if you are in control of the pod spec for what you are deploying. However, if you are using a resource from, say, a public Helm chart, it is not as simple as this, as you generally won't have access to the pod spec unless you download the chart and edit it. Some well-written charts have a parameter you can pass that allows you to specify a node selector. If that is the case, then we can follow the same process as above. However, if the chart does not allow this, and you don't want to edit it manually, we can look at using taints to help with this.
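
Where a chart does expose such a parameter, it is usually set through a values file. The snippet below is a sketch only: it assumes the chart has a top-level "nodeSelector" value that it copies into its pod template, which is a common convention but not universal, so check the chart's own values.yaml for the real key. The file name is also hypothetical.

```yaml
# values-linux.yaml (hypothetical file name)
# Assumes the chart exposes a top-level "nodeSelector" value;
# the actual key depends on the chart.
nodeSelector:
  kubernetes.io/os: linux
```

The file would then be passed at install time with something like "helm install <release> <chart> -f values-linux.yaml".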

Taints allow us to mark a node so that no pods will be scheduled on it unless the pod has a toleration that indicates it will accept this taint. Using this, we can pick one OS and taint all the nodes with this OS, so that any pods that do not specify a toleration will use the non-tainted nodes. This effectively allows us to specify a default OS for the cluster, and if you want to use the other OS, you need to state this explicitly. Given that most Helm charts in the public repo are going to run on Linux, it makes sense to taint the Windows nodes and leave the Linux nodes as the default. We then only have to worry about setting the toleration in the pod specs of our Windows workloads.

This solution only works if you can leave one OS's nodes untainted. If you taint both OS types, then you are back to the same situation we had with node selectors, where you need to make a change to the pod spec for either OS.

To set this up, the first thing we would do is run a kubectl command to taint our Windows nodes. We would run this command for each Windows node:

kubectl taint nodes <nodeName> OS=Windows:NoSchedule

The key-value pair ("OS=Windows") can be whatever you wish. Now that this is applied, it will prevent any pods that do not tolerate the taint from being scheduled on that node.
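
Once applied, the taint is stored on the node object itself. As a rough illustration, the relevant part of the node's spec would look something like the excerpt below (also visible in the "Taints" field of "kubectl describe node <nodeName>"); everything else on the node is omitted.

```yaml
# Illustrative excerpt of a tainted Windows Node object
spec:
  taints:
  - key: OS
    value: Windows
    effect: NoSchedule   # new pods without a matching toleration are not scheduled here
```

To remove the taint later, run the same command with a trailing hyphen: "kubectl taint nodes <nodeName> OS=Windows:NoSchedule-".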

Now we need to update our pod spec for our Windows app to tolerate this.

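apiVersion: v1
kind: Pod
metadata:
  name: iisapp                 # object names must be lowercase
  labels:
    env: test
spec:
  containers:
  - name: iisapp
    image: iisapp              # image names must be lowercase too
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "OS"                  # key, value and effect must match the taint
    operator: "Equal"
    value: "Windows"
    effect: "NoSchedule"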

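Now, only pods with this toleration will be allowed on that node, and everything else will default to running on the Linux nodes. Keep in mind that a toleration only permits a pod to run on the tainted nodes; it does not require it. The Windows pod should therefore also keep its "kubernetes.io/os" node selector so that it cannot be scheduled onto a Linux node.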
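
Because a toleration on its own does not stop the scheduler from placing the pod on an untainted Linux node, it makes sense to combine both methods for Windows workloads. A sketch of the combined pod spec, using the same placeholder names as the earlier examples and assuming the "OS=Windows:NoSchedule" taint from the command above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: iisapp
  labels:
    env: test
spec:
  containers:
  - name: iisapp
    image: iisapp              # placeholder image name
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kubernetes.io/os: windows  # require a Windows node
  tolerations:
  - key: "OS"                  # must match the taint applied with kubectl
    operator: "Equal"
    value: "Windows"
    effect: "NoSchedule"       # allow scheduling despite the taint
```

The node selector keeps the pod off Linux nodes, and the toleration lets it onto the tainted Windows nodes.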

Image Attribution

this way or that flickr photo by Robert Couse-Baker shared under a Creative Commons (BY) license