kubernetes: rolling update of hosts using ansible

If you are running your own bare-metal Kubernetes cluster you may decide that you want to use ansible to automate updating of the hosts; specifically, making sure that all updates are installed and any reboots completed. This post targets the Ubuntu Operating System but there’s no reason why it should not work for other distributions with appropriate changes (e.g. using the yum module instead of apt; a rough yum equivalent is sketched below).

N.B. If you’re in a hurry then just jump to the complete playbook at the end of this post.

The first thing that we’re going to need is some way of updating the applications. For this we can use the ansible.builtin.apt module. Here are my ansible tasks:

  - name: Allow release info change
    lineinfile:
      path: /etc/apt/apt.conf.d/99releaseinfochange
      state: present
      create: true
      line: Acquire::AllowReleaseInfoChange::Suite "true";

  - name: Run the equivalent of "apt-get update" as a separate step
    apt:
      update_cache: yes
    become: true
    register: apt

  - name: Upgrade all packages to the latest version
    become: true
    apt:
      name: "*"
      state: latest
    register: appsupdated

  - name: Remove useless packages from the cache
    apt:
      autoclean: yes
    become: true

  - name: Remove dependencies that are no longer required
    apt:
      autoremove: yes
    become: true

The first task, ‘Allow release info change’, isn’t essential but I tend to find that it helps avoid this bug (ansible/ansible#48352).
The second task just runs apt-get update. You can dispense with this and run it as part of the next task, but I prefer to have it run on its own.
The third task does the heavy lifting and brings all packages up to their latest versions.
I’ve also used the register keyword to save the output of that task to a variable so that we can leverage some conditionals later on.
Then we’re cleaning the apt cache and removing dependencies that are no longer required.
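
As a quick illustration of what that registered variable gives us, here’s a throwaway debug task (not part of the final playbook) that only fires when the upgrade actually changed something:

  - name: Report whether any packages were upgraded (illustrative only)
    debug:
      msg: "Packages were upgraded on {{ inventory_hostname }}"
    when: appsupdated.changed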

N.B. If you’d like to update the apt cache in this single task then simply add update_cache and set it to true:

  - name: Upgrade all packages to the latest version
    become: true
    apt:
      name: "*"
      state: latest
      update_cache: true
    register: appsupdated
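
And since I mentioned other distributions at the top, here is what the upgrade step might look like on a RHEL-family host using the yum module instead. This is only a sketch (I haven’t tested it as part of this playbook, which assumes apt throughout):

  - name: Upgrade all packages to the latest version
    become: true
    yum:
      name: "*"
      state: latest
      update_cache: true
    register: appsupdated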

All of this is standard stuff and so next we need a way to check if a reboot is required.
Fortunately, Ubuntu creates a file when a reboot is required, so all we need to do is use the stat module to check whether that file exists and register the result.

  - name: check for reboot file
    stat:
      path: /var/run/reboot-required
    register: reboot_file

Unfortunately, at the time of writing, ansible has no nice way to wait for reboots, so we’re going to use the shell module (I’m pretty sure I got this task from somewhere else on the Internet but can’t for the life of me find where).
The shell module task is going to reboot the Operating System (if required):

  - name: Reboot system if required
    shell: ( /bin/sleep 5 ; shutdown -r now "Ansible updates triggered" ) &
    args:
      removes: /var/run/reboot-required
    ignore_errors: true
    async: 30
    poll: 0
    notify:
      - waiting for reboot
    when: reboot_file.stat.exists

Note the use of the when keyword to ensure that it runs only when the reboot file exists.
And finally, we’re notifying a handler named ‘waiting for reboot’, which looks like this:

  handlers:
  - name: waiting for reboot
    local_action: wait_for
      host="{{ inventory_hostname }}"
      port=22
      delay=10
      timeout=120

Here we’re using local_action in conjunction with the wait_for module against an ssh port. You can use any port you want but ssh is probably the best choice since ansible uses ssh and without ssh being up on the hosts no further tasks will be able to execute. You will definitely want to play with the timeout value as that will determine how long the task/module waits for the machine to come back up before throwing an error.
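
If you prefer the newer YAML style over the inline key=value form, the same handler can be written with the wait_for module and delegate_to instead of local_action. This sketch should behave identically to the handler above:

  handlers:
  - name: waiting for reboot
    wait_for:
      host: "{{ inventory_hostname }}"
      port: 22
      delay: 10
      timeout: 120
    delegate_to: localhost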

Finally we can get to the kubernetes-specific stuff. We’re going to be leveraging the kubernetes.core.k8s_drain module which is part of the kubernetes.core collection. To install that collection you can run this in a terminal:

ansible-galaxy collection install kubernetes.core
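
If you keep your dependencies in source control, the same install can also be driven from a requirements file; this is just an alternative to the one-liner above. In a file named requirements.yml:

collections:
  - name: kubernetes.core

and then:

ansible-galaxy collection install -r requirements.yml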

Further, the k8s_drain module requires the Kubernetes Python client, which itself requires Python/pip to install. The kubernetes.core collection docs linked above cover all the steps, but if you want ansible to handle this then use something like this:

  - name: install pip3
    apt:
      name: python3-pip
      state: present

And then, to install the actual client-side bits that talk to the kubernetes API, you need the following to execute wherever you manage your cluster from. In my case, I run my kubectl commands from the same local machine that I run my ansible commands from.

  - name: Install multi python packages
    pip:
      name:
        - openshift
        - pyyaml
      state: present
    delegate_to: localhost

Note that I’m using the delegate_to keyword to ensure that the installation only happens on that local machine (the ansible controller). As per the docs, you can delegate to any other host manageable by ansible.
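
For example, if your kubeconfig lived on a dedicated management host rather than your local machine, you could delegate the same task there instead. The group name below is made up purely for illustration:

  - name: Install multi python packages
    pip:
      name:
        - openshift
        - pyyaml
      state: present
    # 'k8s_admin' is a hypothetical inventory group; point this at whichever host holds your kubeconfig
    delegate_to: "{{ groups['k8s_admin'][0] }}"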

We now need to put nodes into maintenance mode. For this we’ll use the k8s_drain module:

  - name: drain node
    become: false
    kubernetes.core.k8s_drain:
      state: drain
      name: "{{ inventory_hostname }}"
      delete_options:
        ignore_daemonsets: true
        delete_emptydir_data: true
    delegate_to: localhost
    when:
      - appsupdated.changed
      - reboot_file.stat.exists
    register: nodedrained

The first thing to note is that I have used ‘become: false’. This is important because the module will look for your kubeconfig file under ‘~/.kube/config’, which is a relative path in the home directory of your user account; if you elevate using ‘become: true’ then the drain module will look for that file in the home directory of the root account instead, won’t find it, and the task will fail.
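
Alternatively, rather than relying on the home-directory default, you can point the module at an explicit kubeconfig path (it accepts the same authentication options as the other kubernetes.core modules). A sketch, with a placeholder path that you would need to adjust for your own setup:

  - name: drain node
    kubernetes.core.k8s_drain:
      state: drain
      name: "{{ inventory_hostname }}"
      # placeholder path; point this at the kubeconfig for the account that normally runs kubectl
      kubeconfig: /home/youruser/.kube/config
      delete_options:
        ignore_daemonsets: true
        delete_emptydir_data: true
    delegate_to: localhost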

The second thing to note is that we’re draining the node. You can also just cordon the node, as per your preference. I’m ignoring DaemonSets, which is fairly normal, and I’m also allowing the eviction of pods that use emptyDir volumes (local data), which would otherwise block the drain if you run a local storage provider.
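
If you only want to cordon the node (stop new pods being scheduled, without evicting what’s already running), the task shrinks to something like this:

  - name: cordon node
    become: false
    kubernetes.core.k8s_drain:
      state: cordon
      name: "{{ inventory_hostname }}"
    delegate_to: localhost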

The when clauses allow us to run this task only when we’ve actually updated applications and a reboot is indicated. The first condition might seem redundant but I’d rather not bounce a machine if I don’t know exactly why it’s pending a reboot.
And finally we’re registering the output of the task.

There are a couple of things that we shall add but I’ll just describe them:

  1. pause after draining the node (optional, but a nice buffer to have as it gives you an opportunity to abort)
  2. set a value of ‘serial: 1‘ for the playbook so that we do this on one k8s node at a time.

So, in the end, the whole thing will look something like this:

- hosts: rpi_cluster
  gather_facts: false
  become: true
  ignore_unreachable: true
  ignore_errors: false
  serial: 1
  tasks:

  - name: install pip3
    apt:
      name: python3-pip
      state: present

  - name: Install multi python packages
    pip:
      name:
        - openshift
        - pyyaml
      state: present
    delegate_to: localhost

##################################

# Required to avoid the following bug:
# https://github.com/ansible/ansible/issues/48352
  - name: Allow release info change
    lineinfile:
      path: /etc/apt/apt.conf.d/99releaseinfochange
      state: present
      create: true
      line: Acquire::AllowReleaseInfoChange::Suite "true";

  - name: Run the equivalent of "apt-get update" as a separate step
    apt:
      update_cache: yes
    become: true
    register: apt

  - name: Upgrade all packages to the latest version
    become: true
    apt:
      name: "*"
      state: latest
      update_cache: false
    register: appsupdated

  - name: Remove useless packages from the cache
    apt:
      autoclean: yes
    become: true

  - name: Remove dependencies that are no longer required
    apt:
      autoremove: yes
    become: true

  - name: check for reboot file
    stat:
      path: /var/run/reboot-required
    register: reboot_file


##################################

  - name: drain node
    become: false
    kubernetes.core.k8s_drain:
      state: drain
      name: "{{ inventory_hostname }}"
      delete_options:
        ignore_daemonsets: true
        delete_emptydir_data: true
    delegate_to: localhost
    when:
      - appsupdated.changed
      - reboot_file.stat.exists
    register: nodedrained

  - name: Pause for 1 minute
    ansible.builtin.pause:
      minutes: 1
    when: nodedrained.changed

  - name: Reboot system if required
    shell: ( /bin/sleep 5 ; shutdown -r now "Ansible updates triggered" ) &
    args:
      removes: /var/run/reboot-required
    ignore_errors: true
    async: 30
    poll: 0
    notify:
      - waiting for reboot
    when: reboot_file.stat.exists

  - name: Flush handlers
    meta: flush_handlers

  - name: Pause for 1 minute
    ansible.builtin.pause:
      minutes: 1
    when: nodedrained.changed

  - name: uncordon node
    become: false
    kubernetes.core.k8s_drain:
      state: uncordon
      name: "{{ inventory_hostname }}"
      delete_options:
        ignore_daemonsets: true
        delete_emptydir_data: true
    delegate_to: localhost
    when: nodedrained.changed | default(false)

  handlers:
  - name: waiting for reboot
    local_action: wait_for
      host="{{ inventory_hostname }}"
      port=2222
      delay=10
      timeout=120

In a v2, I’d like to test the current state of the node before initiating the drain, as the playbook isn’t state-aware at present.
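
As a rough idea of what that check might look like (a sketch, untested): query the node with the kubernetes.core.k8s_info module and only drain if it isn’t already cordoned:

  - name: get current node state
    become: false
    kubernetes.core.k8s_info:
      kind: Node
      name: "{{ inventory_hostname }}"
    delegate_to: localhost
    register: node_info

  - name: drain node
    become: false
    kubernetes.core.k8s_drain:
      state: drain
      name: "{{ inventory_hostname }}"
    delegate_to: localhost
    # spec.unschedulable is only set once a node has been cordoned, hence the default(false)
    when: not (node_info.resources[0].spec.unschedulable | default(false))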
