2023.2 Series Release Notes

28.3.0-11

Bug Fixes

  • Fixes a regression for live migration on shared storage that was removing the backing disk and instance folder during the cleanup of a virtual machine after live migration. See bug 2080436 for details.

28.2.0

Bug Fixes

  • The Ironic virt driver now uses the node cache and respects partition keys, such as conductor group, for list_instances and list_instance_uuids calls. This fix improves the performance of the periodic queries that use these driver methods and reduces API and DB load on the backing Ironic service.

28.1.0

Bug Fixes

  • Some OS platforms do not provide cpufreq resources in sysfs by default, so they have no CPU scaling governors. The governor strategy for CPU power management is therefore now optional.
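
    For example, such a platform can keep CPU power management enabled while avoiding governors entirely. A minimal nova.conf sketch, assuming the [libvirt]cpu_power_management_strategy option and its cpu_state strategy (which toggles CPUs online/offline instead of changing governors):

      [libvirt]
      cpu_power_management = True
      # cpu_state avoids the governor strategy on platforms without cpufreq
      cpu_power_management_strategy = cpu_state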

  • With the change in neutron from ml2/ovs DHCP agents to the OVN implementation, there is no longer a port with device_owner network:dhcp; DHCP is instead provided by a network:distributed port. The fix falls back to the enable_dhcp attribute provided by neutron-api when no port with the network:dhcp owner is found. See bug 2055245 for details.

  • Bug 2009280 has been fixed by no longer enabling the evmcs enlightenment in the libvirt driver. evmcs only works on Intel CPUs, and domains with that enlightenment cannot be started on AMD hosts. There is a possible future feature to enable support for generating this enlightenment only when running on Intel hosts.

28.0.1

Bug Fixes

  • Relaxed the config option checking of the cpu_power_management feature of the libvirt driver. The nova-compute service will now start with [libvirt]cpu_power_management=True and an empty [compute]cpu_dedicated_set configuration. Power management is still only applied to dedicated CPUs, so this relaxation simply allows cpu_power_management to be enabled independently of configuring cpu_dedicated_set during deployment.
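
    For illustration, the following nova.conf combination, which previously failed the startup check, is now accepted (power management only takes effect once cpu_dedicated_set is actually configured):

      [libvirt]
      cpu_power_management = True

      [compute]
      # intentionally left unset at this stage of the deployment
      # cpu_dedicated_set =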

  • Previously, switchdev capabilities had to be configured manually by a user with admin privileges using a port's binding profile. This blocked regular users from managing ports with Open vSwitch hardware offloading, as granting write access to a port's binding profile to non-admin users introduces security risks. For example, a binding profile may contain a pci_slot definition, which denotes the host PCI address of the device attached to the VM. A malicious user could use this parameter to pass through any host device to a guest, so it is impossible to grant write access to a binding profile to regular users in many scenarios.

    This patch fixes this situation by translating VF capabilities reported by Libvirt to Neutron port binding profiles. Other VF capabilities are translated as well for possible future use.

28.0.0

Prelude

The OpenStack 2023.2 (Nova 28.0.0) release includes many new features and bug fixes. Please be sure to read the upgrade section, which describes the required actions to upgrade your cloud from 27.0.0 (2023.1) to 28.0.0 (2023.2). As a reminder, OpenStack 2023.2 is a non-Skip-Level-Upgrade release (from now on, a non-SLURP release), meaning that you can only perform a rolling upgrade from 2023.1. The next SLURP release will be 2024.1, where you will be able to upgrade directly from 2023.1, skipping this release.

There are a few major changes worth mentioning. This is not an exhaustive list:

  • The latest Compute API microversion supported for 2023.2 is v2.95.

  • The Ironic driver [ironic]/peer_list configuration option has been deprecated. The Ironic driver now more closely models other Nova drivers by having a single compute service exercise exclusive control over its assigned nodes. If high availability of a single compute service is required, operators should use active/passive failover.

  • The legacy quota driver is now deprecated and a nova-manage limits command is provided in order to migrate the legacy limits into Keystone. We plan to change the default quota driver to the unified limits driver in an upcoming release (hopefully 2024.1 Caracal). It is recommended that you begin planning and executing a migration to unified limits as soon as possible.

  • QEMU in its TCG mode (i.e. full system emulation) uses a translation block (TB) cache as an optimization during dynamic code translation. The libvirt driver can now configure the tb-cache size when the virt type is qemu. This helps when running VMs with a small memory size. In order to use this feature, a configuration option [libvirt]/tb_cache_size has been introduced.

  • Two new scheduler weighers have been introduced. One sorts hosts by the number of active instances they run, the other by the hypervisor version each compute runs. Accordingly, you can place your instances with different strategies, e.g. by allocating them to more recent nodes or by reducing the number of noisy instance neighbors.

  • It is now possible to define different authorization policies for migration with and without a target host.

  • A couple of other improvements target reducing the number of bugs we have: one checks at reboot whether stale volume attachments remain, and another ensures a strict linkage between a compute, its service record and the instances it runs.

New Features

  • A new os_compute_api:os-migrate-server:migrate:host policy has been created, which is admin-only by default. This helps operators set different policies for cold migration depending on whether a target host is provided.
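
    An illustrative policy.yaml sketch (the member rule shown is a generic oslo.policy expression, not necessarily your deployment's default; check your existing policy before applying):

      # untargeted cold migration: open to project members
      "os_compute_api:os-migrate-server:migrate": "role:member and project_id:%(project_id)s"
      # targeted cold migration (host specified): keep admin-only
      "os_compute_api:os-migrate-server:migrate:host": "rule:context_is_admin"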

  • A new num_instances_weigher weigher has been added. This weigher compares the number of instances on each node and orders the list of filtered hosts by that count. By default, this weigher is enabled but with a multiplier of 0.0, which doesn't change the current behavior. In order to use it, change the value of the [filter_scheduler]/num_instances_weight_multiplier config option: a positive value favors hosts with a higher number of instances (i.e. a packing strategy), while a negative value spreads instances between hosts. As a side note, this weigher counts all of the existing instances on the host, even stopped or shelved ones.
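
    For example, a spreading configuration in nova.conf could look like this (the multiplier value is illustrative):

      [filter_scheduler]
      # negative value: prefer hosts with fewer instances (spreading)
      num_instances_weight_multiplier = -1.0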

  • This change ensures the synchronization of volume attachments between Nova and Cinder by deleting any dangling volume attachments and maintaining consistency between the two databases.

    The block device mapping (BDM) table in the Nova database stores information about volume attachments, image attachments and swap attachments. Similarly, each volume attachment has a corresponding entry in the Cinder database's volume attachment table.

    With this change, on instance reboot Nova checks all volume attachments associated with the instance and verifies that they exist in the Cinder database. Attachments not found in Cinder are deleted from the Nova database as well.

    After the Nova database cleanup, the Cinder database is similarly checked for attachments related to the instance. Attachments found in the Cinder database that are not present in the Nova database are deleted from Cinder.

    See spec for more details.

  • A new hypervisor version weigher has been added to prefer selecting hosts with newer hypervisors installed. For the libvirt driver, this is the version of libvirt on the compute node, not the version of QEMU. As with all weighers, it is enabled by default, and its behavior can be modified using the new hypervisor_version_weight_multiplier config option in the [filter_scheduler] section.
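
    A sketch of the relevant nova.conf setting (the value is illustrative; per the upgrade notes below, the default is 1, 0 disables the weigher and negative values prefer older hypervisors):

      [filter_scheduler]
      # raise above the default of 1 for a stronger preference for newer hypervisors
      hypervisor_version_weight_multiplier = 2.0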

  • QEMU >= 5.0.0 bumped the default tb-cache size from 32 MiB to 1 GiB, which made it difficult to run multiple guest VMs on systems with less memory. With libvirt >= 8.0.0 it is possible to configure a lower tb-cache size. A new config option is introduced:

    [libvirt]tb_cache_size

    This config option can be used to configure the tb-cache size for guest VMs; it is only applicable with virt_type=qemu.
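
    An illustrative nova.conf sketch (the value is arbitrary, and the option's unit is assumed to be MiB here; verify against the option's documentation):

      [libvirt]
      virt_type = qemu
      # shrink the per-guest TB cache well below the 1 GiB QEMU default
      tb_cache_size = 128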

  • A new command nova-manage limits migrate_to_unified_limits has been added to make migration from the nova.quota.DbQuotaDriver to the nova.quota.UnifiedLimitsDriver easier. This will enable operators to have their existing quota limits copied from the Nova database to Keystone automatically.
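
    Illustrative invocation (available flags vary by release, so consult nova-manage limits migrate_to_unified_limits --help before running):

      $ nova-manage limits migrate_to_unified_limits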

Upgrade Notes

  • The AvailabilityZoneFilter was deprecated for removal in 24.0.0 (Xena) and has now been removed. The functionality of the AvailabilityZoneFilter has been replaced by the map_az_to_placement_aggregate pre-filter. The pre-filter was introduced in 18.0.0 (Rocky) and enabled by default in 24.0.0 (Xena). This pre-filter is now always enabled and the [scheduler] query_placement_for_availability_zone config option has been removed.

  • The minimum required version of libvirt by the nova-compute service is now 7.0.0, and the minimum required version of QEMU is 5.2.0. Failing to meet these minimum versions when using the libvirt compute driver will result in the nova-compute service not starting.

    In a future release the minimum required version of libvirt will be raised to 8.0.0 and the minimum required version of QEMU to 6.2.0.

  • A new hypervisor version weigher has been added that prefers selecting hosts with a newer hypervisor installed. This can help simplify rolling upgrades by preferring the already upgraded hosts when moving workloads around using live or cold migration. To restore the old behavior, either remove the weigher from the list of enabled weighers or set [filter_scheduler] hypervisor_version_weight_multiplier=0. The default value of hypervisor_version_weight_multiplier is 1, so only a mild preference is given to newer hosts; higher values make the effect more pronounced and negative values prefer older hosts.

  • The legacy sqlalchemy-migrate migrations, which have been deprecated since Wallaby, have been removed. There should be no end-user impact.

  • Configuration of service user tokens is now required for all Nova services to ensure security of block-storage volume data.

    All Nova configuration files must configure the [service_user] section as described in the documentation.

    See https://bugs.launchpad.net/nova/+bug/2004555 for more details.
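
    A sketch of the required section, assuming the Keystone password auth plugin (all credential values below are placeholders; see the linked documentation for the authoritative option list):

      [service_user]
      send_service_user_token = true
      auth_type = password
      auth_url = https://keystone.example.com/identity/v3
      username = nova
      password = <secret>
      project_name = service
      user_domain_name = Default
      project_domain_name = Default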

Deprecation Notes

  • We have renamed [ironic]partition_key to [ironic]conductor_group. The config option is still used to specify which Ironic conductor group the ironic driver in the nova compute process should target.
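
    Illustrative nova.conf change (the group name is a placeholder):

      [ironic]
      # previously: partition_key = rack1
      conductor_group = rack1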

  • We have deprecated the configuration [ironic]peer_list, along with our support for a group of ironic nova-compute processes targeting a shared set of Ironic nodes. This support has proven buggy enough that we now prefer statically sharding the nodes between multiple nova-compute processes. Note that the ironic nova-compute process is stateless and the identity of the service is defined by the config option [DEFAULT]host. As such, you can use an active-passive HA solution to ensure at most one nova-compute process is running for each Ironic node shard.

  • The nova.quota.DbQuotaDriver is marked as deprecated and the default quota driver configuration is planned to be changed to the nova.quota.UnifiedLimitsDriver in the 29.0.0 (2024.1 Caracal) release.

  • The hyperv driver is marked as experimental and may be removed in a future release. The driver is not tested by the OpenStack project and does not have a clear maintainer.

  • The vmwareapi driver is marked as experimental and may be removed in a future release. The driver is not tested by the OpenStack project and does not have a clear maintainer.

Bug Fixes

  • The CPU power management feature has been fixed to use privsep to avoid a FileNotFound error when offlining CPUs.

  • Bug #2024258: Fixes an issue with performance degradation when archiving databases with large numbers of foreign key related records.

    Previously, deleted rows were archived in batches of max_rows parents plus their child rows in a single database transaction. This limited how high a value of max_rows the user could specify, because of the size of the database transaction it could generate. Symptoms of the behavior were exceeding the maximum configured packet size of the database or timing out due to a deadlock.

    The behavior has been changed to archive batches of complete parent + child row trees, limiting each batch once it has reached >= max_rows records. This allows the size of the database transaction to be controlled by the user and enables more rows to be archived per invocation of nova-manage db archive_deleted_rows when there are a large number of foreign key related records.
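
    Illustrative invocation (the max_rows value is arbitrary; --until-complete repeats batches until no archivable rows remain):

      $ nova-manage db archive_deleted_rows --max_rows 1000 --until-complete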

  • [bug 1983471] When offloading a shelved instance, the compute will now remove the binding so instance ports will appear as “unbound” in neutron.

  • Bug #2003991: Fixes an issue where quota was not properly enforced during unshelve of a SHELVED_OFFLOADED server when [quota]count_usage_from_placement = true or [quota]driver = nova.quota.UnifiedLimitsDriver are configured.