Updating Windows Nodes in AKS Windows Preview
When you deploy an Azure Kubernetes Service (AKS) cluster, you deploy the management plane, which is a PaaS service, and then your worker nodes. These worker nodes appear in your subscription as virtual machines or virtual machine scale sets. Because these are IaaS VMs, the responsibility for patching them falls to you. For Linux nodes this is relatively easy: patches are applied automatically, and all you need to do is handle rebooting the nodes to apply them. This process can even be automated with Kured.
With the latest AKS preview, we can now deploy Windows nodes for running our Windows workloads. Updating these nodes is not as straightforward. If you look at the documentation, it states:
The process to keep Windows Server nodes (currently in preview in AKS) up to date is a little different. Windows Server nodes don't receive daily updates. Instead, you perform an AKS upgrade that deploys new nodes with the latest base Window Server image and patches.
When I initially read this I thought, "OK, so I just need to run an update process to update the nodes with the latest patches", but things are not as simple as that. Note that the text says "Instead, you perform an AKS upgrade". That's right: to apply patches to your Windows nodes, you have to upgrade the version of Kubernetes. You cannot just apply Windows updates and leave everything else alone. This means you must change the version of Kubernetes on your agent pool to a later version; an upgrade to the same version of Kubernetes does nothing. This leads to two questions:
- What if I don't want to upgrade the version of Kubernetes on my agent pool? Usually, people pick a specific supported version of Kubernetes and stay on it until they consciously choose to upgrade. You are not going to want to upgrade Kubernetes every Patch Tuesday.
- What if I am already on the latest supported version of Kubernetes and have nowhere to go?
After raising some issues on GitHub and with the AKS team directly, the answer currently is that you have two options:
- Update the Kubernetes version
- Delete the node pool and recreate it
To be very clear here, Windows node support is in preview in AKS, and the product team clearly understands that this situation is not suitable for production. You can see from the issue I raised on GitHub that this is being worked on. You can also track this item here.
So given all this, how do we go about doing one of the two options listed above?
Update Kubernetes Version
The first option we have to refresh our nodes with the latest updates is to update the Kubernetes version on the node pool. Note that this is only updating things on the node pool; we are not updating the management plane. If you want to update your whole cluster to a later version you need to do both node pool and management plane updates.
To upgrade the node pool, we are going to run an Azure CLI command:
az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --kubernetes-version 1.13.10 \
    --no-wait
Pay careful attention to the Kubernetes version: it must be later than the current version on that node pool.
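Because an upgrade to the same version does nothing, it can help to compare versions before starting. The following is a minimal sketch rather than an official procedure: `version_gt` and `upgrade_pool_if_newer` are hypothetical helper names, it assumes a `sort` that supports `-V` (as in GNU coreutils), and it assumes the pool's current version is exposed as `orchestratorVersion` in the output of `az aks nodepool show`.

```shell
# Sketch only: these helper names are hypothetical, not part of the Azure CLI.

# Succeeds only when version $1 is strictly later than version $2.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage: upgrade_pool_if_newer <resource-group> <cluster> <pool> <target-version>
upgrade_pool_if_newer() {
    # Assumption: the pool's current version is reported as orchestratorVersion.
    current="$(az aks nodepool show \
        --resource-group "$1" --cluster-name "$2" --name "$3" \
        --query orchestratorVersion --output tsv)"

    if [ -z "$current" ]; then
        echo "Could not read the current version of pool $3." >&2
        return 1
    fi

    if version_gt "$4" "$current"; then
        az aks nodepool upgrade \
            --resource-group "$1" --cluster-name "$2" --name "$3" \
            --kubernetes-version "$4" --no-wait
    else
        echo "Pool $3 is on $current; $4 is not a later version." >&2
        return 1
    fi
}
```

With these defined, a call such as `upgrade_pool_if_newer myResourceGroup myAKSCluster mynodepool 1.13.10` would refuse to start when the pool is already on 1.13.10 or later, instead of silently doing nothing.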
When the upgrade runs, it creates new nodes, transfers your workloads to them and removes the old ones.
Once the upgrade completes, the nodes should be running the latest version of the node OS.
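Because the upgrade command above was started with `--no-wait`, it returns before the new nodes exist. One way to wait and then check what the nodes are running is sketched below. `wait_for_pool` and `show_node_versions` are hypothetical helpers; the sketch assumes the pool reports `Succeeded` in `provisioningState` when the operation finishes and that `kubectl` is already pointed at the cluster. The 30-second default polling interval is arbitrary.

```shell
# Sketch only: helper names are hypothetical.

# Poll the node pool until its provisioning state settles.
# Usage: wait_for_pool <resource-group> <cluster> <pool> [interval-seconds]
wait_for_pool() {
    while :; do
        state="$(az aks nodepool show \
            --resource-group "$1" --cluster-name "$2" --name "$3" \
            --query provisioningState --output tsv)"
        case "$state" in
            Succeeded) return 0 ;;
            Failed|"") echo "Pool $3 state: ${state:-unknown}" >&2; return 1 ;;
        esac
        sleep "${4:-30}"
    done
}

# List each node with its OS image, OS build and kubelet version.
show_node_versions() {
    kubectl get nodes -o custom-columns='NAME:.metadata.name,OS:.status.nodeInfo.osImage,BUILD:.status.nodeInfo.kernelVersion,KUBELET:.status.nodeInfo.kubeletVersion'
}
```

For example, `wait_for_pool myResourceGroup myAKSCluster mynodepool && show_node_versions` would print the node list only after the pool has settled, so the OS build of the Windows nodes can be compared with what was there before.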
Delete and Recreate the Pool
If you can't upgrade to a new version of Kubernetes or don't want to, then the only other option is to delete and recreate the node pool. This will be disruptive and cause downtime, as there is no way to transition workloads. You will also need to re-deploy your workloads to the new pool once it is created.
Note that you cannot delete the first node pool created in a cluster. In a Windows cluster, you always have to have a Linux node pool as well, so you will want to make sure that this pool is the one created first.
To undertake this work, we are first going to destroy the current node pool:
az aks nodepool delete \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --resource-group myResourceGroup
Then we will create a new pool with the same details:
az aks nodepool add \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --resource-group myResourceGroup \
    --kubernetes-version 1.13.10 \
    --node-count 2 \
    --node-vm-size Standard_D4_v3 \
    --os-type Windows
Once completed, you have a new pool with nodes running the latest OS image. At this point, you will need to re-deploy your workloads to these empty nodes.
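Since the new pool should match the old one, one option is to read the old pool's settings before deleting it instead of retyping them. This is a sketch under assumptions, not a tested procedure: `pool_prop` and `recreate_pool` are hypothetical helpers, and the sketch assumes `az aks nodepool show` exposes the node count, VM size and Kubernetes version as `count`, `vmSize` and `orchestratorVersion`. It hard-codes `--os-type Windows` and does nothing about the workloads, which still need re-deploying afterwards.

```shell
# Sketch only: helper names and queried property names are assumptions.

# Print one property of a node pool.
# Usage: pool_prop <resource-group> <cluster> <pool> <property>
pool_prop() {
    az aks nodepool show --resource-group "$1" --cluster-name "$2" \
        --name "$3" --query "$4" --output tsv
}

# Usage: recreate_pool <resource-group> <cluster> <pool>
recreate_pool() {
    # Capture the settings first; they are gone once the pool is deleted.
    count="$(pool_prop "$1" "$2" "$3" count)"
    size="$(pool_prop "$1" "$2" "$3" vmSize)"
    version="$(pool_prop "$1" "$2" "$3" orchestratorVersion)"
    if [ -z "$count" ] || [ -z "$size" ] || [ -z "$version" ]; then
        echo "Could not read settings for pool $3; nothing deleted." >&2
        return 1
    fi

    # This step causes downtime for everything scheduled on the pool.
    az aks nodepool delete \
        --resource-group "$1" --cluster-name "$2" --name "$3" || return 1

    # Recreate the pool with the captured settings.
    az aks nodepool add \
        --resource-group "$1" --cluster-name "$2" --name "$3" \
        --kubernetes-version "$version" \
        --node-count "$count" \
        --node-vm-size "$size" \
        --os-type Windows
}
```

Reading the settings before the delete, and stopping if any of them comes back empty, is a deliberate choice here: it avoids removing a pool that the script then cannot recreate.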