Scale back the ovn-central application

Preamble

Clean downscaling of the ovn-central application is supported from release 23.03 onwards. Earlier versions of the charm will require some manual steps.

Think about the impact

OVN central is using the Raft consensus algorithm to facilitate HA. Raft

  • Tolerates up to (N-1)/2 node failures

  • Requires minimum quorum of (N/2)+1 members

Changes to the number of members in the OVN cluster affect its fault tolerance as well as its minimum requirements for quorum. Before you downscale your cluster, think about the impact it will have on both of these properties.

It is not recommended to downscale ovn-central application below 3 members.

Procedure for releases before 23.03

With older releases of ovn-central charm, the operator can run juju remove-unit command, but internally, OVN cluster will not perform reconfiguration and it will keep expecting servers from the removed unit to rejoin the cluster. To cleanly remove units, you have to complete a few manual steps.

Log into the unit that you intend to remove (using juju ssh) and execute the following commands as root:

ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/leave OVN_Southbound
ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/leave OVN_Northbound

This will cause OVN servers hosted on this unit to gracefully leave both Southbound and Northbound OVN clusters.

Perform unit removal with:

juju remove-unit <UNIT_NAME>

To verify that the downscaling completed successfully, log into one of the remaining units of ovn-central and check the state of both clusters (again as root).

ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound

If both clusters have an expected number of members, you are done. However if any of the clusters did not perform reconfiguration and removed servers are still hanging around, you can kick them manually using following command where CLUSTER_NAME is either “OVN_Southbound” or “OVN_Northbound” and SERVER_ID is a short hexadecimal number from “cluster/status” output.

ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/kick <CLUSTER_NAME> <SERVER_ID>

Note

The cluster/kick command does everything needed to decrease the number of cluster members by one. It both removes the targeted server and informs the remaining members so their view of the cluster can be automatically updated.

Procedure for release 23.03 and after

Starting with this release, the removed ovn-central unit will attempt to perform a graceful departure from the cluster so the operator should not need to do anything else than remove the unit with:

juju remove-unit <UNIT_NAME>

To verify that the unit departed cluster cleanly, wait for the ovn-central application to settle and run:

juju run-action --wait <OVN_CENTRAL_UNIT> cluster-status

This output will show yaml-formatted status of both Southbound and Northbound OVN clusters. Each cluster status will contain key “unit_map”, if this list does not contain any servers in category “UNKNOWN”, it means that downscaling completed successfully.

Example of “unit_map” after successful downscaling:

unit_map:
 ovn-central/3: 7ed2
 ovn-central/1: f1ca
 ovn-central/2: 92d5

However if there are “UNKNOWN” servers, for example like this:

unit_map:
  ovn-central/3: 7ed2
  ovn-central/1: f1ca
  ovn-central/2: 92d5
  UNKNOWN:
  - ba21

It means that downscaling did not complete successfully, and you’ll have to manually kick servers listed as “UNKNOWN” using the cluster-kick action provided by the charm.