If you are running your own bare-metal Kubernetes cluster, you may decide that you want to use Ansible to automate updating the hosts; specifically, making sure that all updates are installed and any required reboots completed. This post targets the Ubuntu operating system, but there's no reason it should not work for other distributions with appropriate changes (e.g. using the yum module instead of apt).
N.B. If you're in a hurry, just jump to the final document.
The first thing we're going to need is some way of updating the packages. For this we can use the ansible.builtin.apt module. Here are my Ansible tasks:
- name: Allow release info change
  lineinfile:
    path: /etc/apt/apt.conf.d/99releaseinfochange
    state: present
    create: true
    line: Acquire::AllowReleaseInfoChange::Suite "true";

- name: Run the equivalent of "apt-get update" as a separate step
  apt:
    update_cache: yes
  become: true
  register: apt

- name: Upgrade all packages to the latest version
  become: true
  apt:
    name: "*"
    state: latest
  register: appsupdated

- name: Remove useless packages from the cache
  apt:
    autoclean: yes
  become: true

- name: Remove dependencies that are no longer required
  apt:
    autoremove: yes
  become: true
The first task, 'Allow release info change', isn't essential, but I tend to find that it helps avoid a known bug (ansible/ansible#48352).
The second task just runs the equivalent of apt-get update. You can dispense with this and fold it into the next task, but I prefer to have it run on its own.
The third task does the heavy lifting and upgrades all packages to their latest versions.
I've also used the register keyword to save the output of the task to a variable so that we can leverage some conditionals later.
Then we're removing unnecessary packages and dependencies that are no longer required.
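To illustrate why registering the result is useful, here's a hypothetical task (not part of the final playbook) that acts on the appsupdated variable registered above:

```yaml
# Hypothetical example: act on the registered result of the upgrade task.
# "appsupdated" is the variable registered by the upgrade task above.
- name: Report whether any packages were upgraded
  ansible.builtin.debug:
    msg: "Packages were upgraded on {{ inventory_hostname }}"
  when: appsupdated.changed
```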
N.B. If you'd like to update the apt cache within this single task, simply add update_cache and set it to true:
- name: Upgrade all packages to the latest version
  become: true
  apt:
    name: "*"
    state: latest
    update_cache: true
  register: appsupdated
All of this is standard stuff, so next we need a way to check whether a reboot is required.
Fortunately, Ubuntu creates the file /var/run/reboot-required when a reboot is pending, so all we need to do is use the stat module to check whether it exists and register the result.
- name: check for reboot file
  stat:
    path: /var/run/reboot-required
  register: reboot_file
Unfortunately, at the time of writing, Ansible has no nice way to wait for reboots, so we're going to use the shell module (I'm pretty sure I got this task from somewhere else on the Internet, but I can't for the life of me find where).
The shell module task is going to reboot the Operating System (if required):
- name: Reboot system if required
  shell: ( /bin/sleep 5 ; shutdown -r now "Ansible updates triggered" ) &
  args:
    removes: /var/run/reboot-required
  ignore_errors: true
  async: 30
  poll: 0
  notify:
    - waiting for reboot
  when: reboot_file.stat.exists
Note the use of the when keyword to ensure that it runs only when the reboot file exists.
And finally, we’re notifying a handler named ‘waiting for reboot’, which looks like this:
handlers:
  - name: waiting for reboot
    local_action:
      module: wait_for
      host: "{{ inventory_hostname }}"
      port: 22
      delay: 10
      timeout: 120
Here we're using local_action in conjunction with the wait_for module against the SSH port. You can use any port you want, but SSH is probably the best choice, since Ansible connects over SSH and, without SSH up on the host, no further tasks will be able to execute. You will definitely want to tune the timeout value, as it determines how long the task/module waits for the machine to come back up before throwing an error.
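As an aside, Ansible 2.7 and later do ship an ansible.builtin.reboot module that reboots the host and waits for it to come back in a single task. If your version supports it, a sketch of the equivalent (replacing the shell task and handler above) might look like:

```yaml
# Sketch using ansible.builtin.reboot (Ansible >= 2.7) instead of
# the shell + wait_for handler approach.
- name: Reboot system if required
  ansible.builtin.reboot:
    reboot_timeout: 120   # analogous to the wait_for timeout above
  when: reboot_file.stat.exists
```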
Finally, we can get to the Kubernetes-specific stuff. We're going to leverage the kubernetes.core.k8s_drain module, which is part of the kubernetes.core collection. To install that collection, run this in a terminal:
ansible-galaxy collection install kubernetes.core
Further, the k8s_drain module requires the Kubernetes Python client, which itself requires Python/pip for installation. The collection docs linked above have all the steps, but if you want Ansible to handle this, use something like:
- name: install pip3
  apt:
    name: python3-pip
    state: present
And then, to install the actual client-side bits that talk to the Kubernetes API, you need the following to execute wherever you manage your cluster from. In my case, I run my kubectl commands from the same local machine that I run my ansible commands from.
- name: Install multi python packages
  pip:
    name:
      - openshift
      - pyyaml
    state: present
  delegate_to: localhost
Note that I'm using the delegate_to keyword to ensure that the installation happens only on my local machine, which doubles as my Ansible control node. As per the docs, you can delegate to any other host manageable by Ansible.
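For example, if you managed your cluster from a control-plane host instead, you could delegate the install there. The group name below is a hypothetical inventory group, not something defined in this post:

```yaml
# Hypothetical: delegate the pip install to the first control-plane host.
# "control_plane" is an assumed inventory group name.
- name: Install multi python packages
  pip:
    name:
      - openshift
      - pyyaml
    state: present
  delegate_to: "{{ groups['control_plane'][0] }}"
```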
We now need to put nodes into maintenance mode. For this we'll use the k8s_drain module:
- name: drain node
  become: false
  kubernetes.core.k8s_drain:
    state: drain
    name: "{{ inventory_hostname }}"
    delete_options:
      ignore_daemonsets: true
      delete_emptydir_data: true
  delegate_to: localhost
  when:
    - appsupdated.changed
    - reboot_file.stat.exists
  register: nodedrained
The first thing to note is that I have used 'become: false'. This is important because the module looks for your kubeconfig file under '~/.kube/config', a path resolved relative to the home directory of the connecting user; if you elevate with 'become: true', the drain module will instead look in the home directory of the root account, won't find the .kube/config file, and the task will fail.
The second thing to note is that we're draining the node; you can also merely cordon it, as per your preference. I'm ignoring DaemonSets, which is fairly normal, but I'm also allowing the deletion of pods that use emptyDir volumes, which would otherwise block a drain if, for example, you have a local storage provider.
The when clauses allow us to run this task only when we've actually updated packages and a reboot is indicated. The first condition might seem redundant, but I'd rather not bounce a machine if I don't know exactly why it's pending a reboot.
And finally we’re registering the output of the task.
There are a couple of things that we shall add, which I'll just describe:
- a pause after draining the node (optional, but a nice buffer to have as it gives you an opportunity to abort)
- a value of 'serial: 1' for the playbook so that we process one k8s node at a time
So, in the end, the whole thing will look something like this:
- hosts: rpi_cluster
  gather_facts: false
  become: true
  ignore_unreachable: true
  ignore_errors: false
  serial: 1
  tasks:
    - name: install pip3
      apt:
        name: python3-pip
        state: present
    - name: Install multi python packages
      pip:
        name:
          - openshift
          - pyyaml
        state: present
      delegate_to: localhost
    ##################################
    # Required to avoid the following bug:
    # https://github.com/ansible/ansible/issues/48352
    - name: Allow release info change
      lineinfile:
        path: /etc/apt/apt.conf.d/99releaseinfochange
        state: present
        create: true
        line: Acquire::AllowReleaseInfoChange::Suite "true";
    - name: Run the equivalent of "apt-get update" as a separate step
      apt:
        update_cache: yes
      become: true
      register: apt
    - name: Upgrade all packages to the latest version
      become: true
      apt:
        name: "*"
        state: latest
        update_cache: false
      register: appsupdated
    - name: Remove useless packages from the cache
      apt:
        autoclean: yes
      become: true
    - name: Remove dependencies that are no longer required
      apt:
        autoremove: yes
      become: true
    - name: check for reboot file
      stat:
        path: /var/run/reboot-required
      register: reboot_file
    ##################################
    - name: drain node
      become: false
      kubernetes.core.k8s_drain:
        state: drain
        name: "{{ inventory_hostname }}"
        delete_options:
          ignore_daemonsets: true
          delete_emptydir_data: true
      delegate_to: localhost
      when:
        - appsupdated.changed
        - reboot_file.stat.exists
      register: nodedrained
    - name: Pause for 1 minute
      ansible.builtin.pause:
        minutes: 1
      when: nodedrained.changed
    - name: Reboot system if required
      shell: ( /bin/sleep 5 ; shutdown -r now "Ansible updates triggered" ) &
      args:
        removes: /var/run/reboot-required
      ignore_errors: true
      async: 30
      poll: 0
      notify:
        - waiting for reboot
      when: reboot_file.stat.exists
    - name: Flush handlers
      meta: flush_handlers
    - name: Pause for 1 minute
      ansible.builtin.pause:
        minutes: 1
      when: nodedrained.changed
    - name: uncordon node
      become: false
      kubernetes.core.k8s_drain:
        state: uncordon
        name: "{{ inventory_hostname }}"
      delegate_to: localhost
      when: nodedrained.changed | default(false)
  handlers:
    - name: waiting for reboot
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        port: 22
        delay: 10
        timeout: 120
In a v2, I'd like to test the current state of the node before initiating the drain, as the playbook is not stateful at present.
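A rough sketch of what such a pre-check might look like, using the kubernetes.core.k8s_info module to read the node's spec.unschedulable flag before draining (this is an untested idea, not part of the playbook above):

```yaml
# Hypothetical v2 sketch: query the node's current state before draining.
- name: Get node state
  become: false
  kubernetes.core.k8s_info:
    kind: Node
    name: "{{ inventory_hostname }}"
  delegate_to: localhost
  register: node_info

# A cordoned/drained node has spec.unschedulable set to true.
- name: Report if node is already cordoned
  ansible.builtin.debug:
    msg: "Node already unschedulable; skipping drain"
  when: node_info.resources[0].spec.unschedulable | default(false)
```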