Live migrate VMs from a running compute node¶
Introduction¶
A VM migration is the relocation of a VM from one hypervisor to another.
With a live migration, the VM is not shut down during the process. This is useful when the applications running on the VM must not be interrupted.
This article covers manual migrations (migrating an individual VM) and node evacuations (migrating all VMs on a compute node).
Warning
Migration involves disabling compute services on the source host, effectively removing the hypervisor from the cloud.
Network usage may be significantly impacted if block migration mode is used.
VMs with intensive memory workloads may require pausing for live migration to succeed.
Terminology¶
This article makes use of the following terms:
- block migration
A migration mode where a disk is copied over the network (source host to destination host). It works with local storage only.
- boot-from-volume
A root disk that is based on a Cinder volume.
- ephemeral disk
A non-root disk that is managed by Nova. It is based on local or shared storage.
- local storage
Storage that is local to the hypervisor. Disk images are typically found under /var/lib/nova/instances.
- nova-scheduler
The cloud’s nova-scheduler can be leveraged in the context of migrations to select destination hosts dynamically. It is compatible with both manual migrations and node evacuations.
- shared storage
Storage that is accessible to multiple hypervisors simultaneously (e.g. Ceph RBD, NFS, iSCSI).
Procedure¶
Ensure live migration is enabled¶
By default, live migration is disabled. Check the current configuration:
juju config nova-compute enable-live-migration
Enable the functionality if the command returns ‘false’:
juju config nova-compute enable-live-migration=true
See the nova-compute charm for information on all migration-related configuration options. Also see section SSH keys and VM migration for information on how multiple application groups can affect migrations.
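As an optional quick check, the migration-related options currently set on the charm can be skimmed like this (the exact option names vary with the charm revision):
juju config nova-compute | grep -i migration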
Gather relevant information¶
Display the nova-compute units:
juju status nova-compute
The examples in this article are based on the following partial output of that command:
Unit Workload Agent Machine Public address Ports Message
nova-compute/0* active idle 0 10.0.0.222 Unit is ready
ntp/0* active idle 10.0.0.222 123/udp chrony: Ready
ovn-chassis/0* active idle 10.0.0.222 Unit is ready
nova-compute/1 active idle 3 10.0.0.241 Unit is ready
ntp/1 active idle 10.0.0.241 123/udp chrony: Ready
ovn-chassis/1 active idle 10.0.0.241 Unit is ready
List the compute node hostnames:
openstack hypervisor list
+----+---------------------+-----------------+------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+----+---------------------+-----------------+------------+-------+
| 1 | node1.maas | QEMU | 10.0.0.222 | up |
| 2 | node4.maas | QEMU | 10.0.0.241 | up |
+----+---------------------+-----------------+------------+-------+
Based on the above, map units to node names. This information will be useful later on. The source host should also be clearly identified (this document will use ‘node1.maas’):
| Node name  | Unit           | Source host |
|------------|----------------|-------------|
| node1.maas | nova-compute/0 | ✔           |
| node4.maas | nova-compute/1 |             |
List the VMs hosted on the source host:
openstack server list --host node1.maas --all-projects
+--------------------------------------+---------+--------+----------------------------------+-------------+----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+---------+--------+----------------------------------+-------------+----------+
| 81df1304-f755-4ae6-9b8c-2f888f6ad623 | focal-2 | ACTIVE | int_net=192.168.0.144, 10.0.0.76 | focal-amd64 | m1.micro |
| 7e897540-a0aa-4031-9b7c-dd03ebc8ec5e | focal-3 | ACTIVE | int_net=192.168.0.73, 10.0.0.69 | focal-amd64 | m1.micro |
+--------------------------------------+---------+--------+----------------------------------+-------------+----------+
Ensure adequate capacity on the destination host¶
Oversubscribing the destination host can lead to service outages. This is an issue when a destination host is explicitly selected by the operator.
The following commands are useful for discovering a VM’s flavor, listing flavor parameters, and viewing the available capacity of a destination host:
openstack server show <vm-name> -c flavor
openstack flavor show <flavor> -c vcpus -c ram -c disk
openstack host show <destination-host>
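As an example, using the names from this article, checking VM ‘focal-2’ (flavor ‘m1.micro’, as shown in the server list above) against destination host ‘node4.maas’ would look like:
openstack server show focal-2 -c flavor
openstack flavor show m1.micro -c vcpus -c ram -c disk
openstack host show node4.maas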
Avoid expired Keystone tokens¶
Keystone token expiration times should be increased when dealing with oversized VMs, whose migration can outlast the token lifetime, as an expired token will prevent the cloud database from being updated. This will lead to migration failure and a corrupted database entry. See Keystone tokens in the OpenStack Charms Deployment Guide.
To set the token expiration time to three hours (from the default one hour):
juju config keystone token-expiration=10800
To ensure that the new expiration time is in effect, wait for the current tokens to expire (e.g. one hour) before continuing.
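If desired, the value currently in effect can be confirmed with:
juju config keystone token-expiration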
Disable the source host¶
Prior to migration or evacuation, disable the source host by referring to its corresponding unit:
juju run-action --wait nova-compute/0 disable
This will stop nova-compute services and inform nova-scheduler to no longer assign new VMs to the host.
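Optionally, confirm that the host’s nova-compute service is now reported as disabled:
openstack compute service list --host node1.maas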
Live migrate VMs¶
Live migrate VMs using either manual migration or node evacuation.
Manual migration¶
The command to use when live migrating VMs manually is:
openstack server migrate --live-migration [--block-migration] [--host <dest-host>] <vm>
Examples are provided for various scenarios.
Note
Depending on your client, Nova API microversion ‘2.30’ may need to be specified when combining live migration with a specified host (i.e. --os-compute-api-version 2.30).
Specifying a destination host will override any anti-affinity rules that may be in place.
Important
The migration of VMs using local storage will fail if the --block-migration option is not specified. However, the use of this option will also lead to a successful migration for a combination of local and non-local storage (e.g. local ephemeral disk and boot-from-volume).
To migrate VM ‘focal-2’, which is backed by local storage, using the scheduler:
openstack server migrate --live-migration --block-migration focal-2
To migrate VM ‘focal-3’, which is backed by non-local storage, using the scheduler:
openstack server migrate --live-migration focal-3
To migrate VM ‘focal-2’, which is backed by a combination of local and non-local storage:
openstack server migrate --live-migration --block-migration focal-2
To migrate VM ‘focal-2’, which is backed by local storage, to host ‘node4.maas’ specifically:
openstack --os-compute-api-version 2.30 server migrate \
    --live-migration --block-migration --host node4.maas focal-2
To migrate VM ‘focal-3’, which is backed by non-local storage, to host ‘node4.maas’ specifically:
openstack --os-compute-api-version 2.30 server migrate \
    --live-migration --host node4.maas focal-3
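Once a migration completes, it can be verified that the VM is active on the intended host. For example, for ‘focal-2’ (the OS-EXT-SRV-ATTR:host field requires admin credentials):
openstack server show focal-2 -c status -c OS-EXT-SRV-ATTR:host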
Node evacuation¶
The command to use when live evacuating a compute node is:
nova host-evacuate-live [--block-migrate] [--target-host <dest-host>] <source-host>
Examples are provided for various scenarios.
Note
The scheduler may send VMs to multiple destination hosts.
Block migration will be attempted on VMs by default.
Important
The migration of VMs using non-local storage will fail if the --block-migrate option is specified. However, the omission of this option will still lead to a successful migration for a combination of local and non-local storage (e.g. local ephemeral disk and boot-from-volume). This option therefore has no compelling use case.
To evacuate host ‘node1.maas’ of VMs, which are backed by local or non-local storage (or a combination thereof):
nova host-evacuate-live node1.maas
To evacuate host ‘node1.maas’ of VMs, which are backed by local or non-local storage (or a combination thereof), to host ‘node4.maas’ specifically:
nova host-evacuate-live --target-host node4.maas node1.maas
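When the evacuation has finished, the source host should no longer have any VMs. This can be confirmed by re-running the earlier listing command:
openstack server list --host node1.maas --all-projects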
Enable the source host¶
Provided the source host is not being retired, re-enable it by referring to its corresponding unit:
juju run-action --wait nova-compute/0 enable
This will start nova-compute services and allow nova-scheduler to run new VMs on this host.
Revert Keystone token expiration time¶
If the Keystone token expiration time was modified in an earlier step, change it back to its original value. Here it is reset to the default of one hour:
juju config keystone token-expiration=3600
Troubleshooting¶
Migration list and migration ID¶
To get a record of all past migrations on a per-VM basis, which includes the migration ID:
nova migration-list
In the output below, columns have been removed for legibility, and only a single, currently running migration is shown:
+----+-------------+------------+---------+--------------------------------------+----------------+
| Id | Source Node | Dest Node | Status | Instance UUID | Type |
+----+-------------+------------+---------+--------------------------------------+----------------+
| 29 | node4.maas | node1.maas | running | 81df1304-f755-4ae6-9b8c-2f888f6ad623 | live-migration |
+----+-------------+------------+---------+--------------------------------------+----------------+
The above migration has an ID of ‘29’.
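To follow an in-progress migration, the listing can simply be polled. For example:
watch -n 5 nova migration-list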
Forcing a migration¶
A VM with an intensive memory workload can be hard to live migrate. In such a case the migration can be forced by pausing the VM until the copying of memory is finished.
The openstack server show command output contains a progress field that normally displays a value of ‘0’, but for a busy VM undergoing live migration it will start to report percentages.
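For example, to check the progress of VM ‘focal-2’ while it is being migrated:
openstack server show focal-2 -c status -c progress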
Forcing a migration should be considered once its progress nears 90%:
nova live-migration-force-complete <vm> <migration-id>
Caution
Some applications are time sensitive and may not tolerate a forced migration due to the effect pausing can have on a VM’s clock.
Aborting a migration¶
A migration can be aborted like this:
nova live-migration-abort <vm> <migration-id>
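The migration list from the previous section can then be consulted again to confirm that the migration is no longer in the ‘running’ state:
nova migration-list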
Migration logs¶
A failed migration will result in log messages being appended to the nova-compute.log file on the source host.
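To inspect that log via Juju, for example on unit ‘nova-compute/0’ (assuming the typical location of /var/log/nova/nova-compute.log on the host):
juju ssh nova-compute/0 tail -n 100 /var/log/nova/nova-compute.log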