Xena Series Release Notes¶
24.2.1-22¶
Upgrade Notes¶
Configuration of service user tokens is now required for all Nova services to ensure security of block-storage volume data.
All Nova configuration files must configure the [service_user] section as described in the documentation. See https://bugs.launchpad.net/nova/+bug/2004555 for more details.
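As a rough illustration of the requirement above, the sketch below parses a sample configuration and checks that a [service_user] section exists and enables token sending. The option names follow the Nova configuration reference as far as we are aware; the credential values, the auth_url, and the helper name has_service_user are placeholders invented for this example.

```python
import configparser

# Sample nova.conf fragment. Every value here is a placeholder.
SAMPLE_CONF = """
[service_user]
send_service_user_token = true
auth_url = https://keystone.example.com/identity
auth_type = password
project_domain_name = Default
project_name = service
user_domain_name = Default
username = nova
password = change-me
"""


def has_service_user(conf_text):
    """Return True if [service_user] exists and enables token sending.

    Hypothetical helper for illustration; it is not part of Nova.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("service_user"):
        return False
    return parser.getboolean(
        "service_user", "send_service_user_token", fallback=False
    )


print(has_service_user(SAMPLE_CONF))
```

A check like this could be run against each service's configuration file before upgrading; it does not verify that the credentials themselves are valid.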
Bug Fixes¶
Bug #1970383: Fixes a permissions error when using the ‘query_placement_for_routed_network_aggregates’ scheduler variable, which caused a traceback on instance creation for non-admin users.
Bug #1941005 is fixed. During resize Nova now uses the PCI requests from the new flavor to select the destination host.
Fixed rescuing of volume-based instances by adding a check for the ‘hw_rescue_disk’ and ‘hw_rescue_device’ properties in image metadata before attempting to rescue the instance.
24.2.0¶
Bug Fixes¶
As a fix for bug 1942329, nova now updates the MAC address of direct-physical ports during move operations to reflect the MAC address of the physical device on the destination host. Servers that were created before this fix need to be moved, or the port needs to be detached and then re-attached, to synchronize the MAC address.
[bug 1958636] Explicitly check for and enable SMM when firmware requires it. Previously we assumed libvirt would do this for us but this is not true in all cases.
Bug #1978444: Nova now retries deleting a volume attachment if the Cinder API returns 504 Gateway Timeout. Also, 404 Not Found is now ignored and only results in a warning message.
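The behavior described for bug #1978444 can be pictured with a small sketch. This is not Nova's implementation: ApiError, delete_attachment_with_retry, and the attempts and delay parameters are hypothetical names chosen for the example, which only shows the general pattern of retrying on 504 and tolerating 404.

```python
import time


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def delete_attachment_with_retry(delete_call, attempts=3, delay=0.0):
    """Call delete_call(); retry on 504 and treat 404 as already deleted."""
    for attempt in range(1, attempts + 1):
        try:
            delete_call()
            return "deleted"
        except ApiError as exc:
            if exc.status == 404:
                # The attachment is already gone: warn and carry on.
                print("warning: attachment not found, ignoring")
                return "not_found"
            if exc.status == 504 and attempt < attempts:
                # Gateway timeout: wait briefly, then try again.
                time.sleep(delay)
                continue
            # Any other error, or retries exhausted: propagate.
            raise
```

Any other status code, or a 504 on the final attempt, is re-raised to the caller in this sketch.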
Bug #1981813: Nova now detects if the vnic_type of a bound port has been changed in neutron and leaves an ERROR message in the compute service log, as such a change on a bound port is not supported. Also, restarting the nova-compute service no longer crashes after such a port change. Nova will log an ERROR and skip the initialization of the instance with such a port during startup.
During the Havana cycle it was discovered that eventlet monkey patching of greendns broke IPv6 (see https://bugs.launchpad.net/nova/+bug/1164822). Since then nova has been disabling eventlet monkey patching of greendns. Eventlet addressed the IPv6 limitation in v0.17 with the introduction of Python 3 support in 2015. Nova, however, continued to disable it, which can result in slow DNS queries blocking the entire nova API or other binary because socket.getaddrinfo becomes a blocking call into glibc. See https://bugs.launchpad.net/nova/+bug/1964149 for more details.
If the compute service is down on the source node and a user tries to stop an instance, the instance gets stuck in the powering-off task state, and evacuation then fails with the message: Cannot ‘evacuate’ instance <instance-id> while it is in task_state powering-off. It is now possible for evacuation to ignore the VM task state. For more details see bug 1978983.
When vDPA was first introduced, move operations were implemented in the code but were untested, both in a real environment and in functional tests. Due to this gap nova elected to block move operations for instances with vDPA devices. All move operations except for live migration have now been tested and found to work, so the API blocks have been removed and functional tests introduced. Other operations such as suspend and live migration require code changes to support and will be enabled as new features in the future.
Other Notes¶
A workaround has been added to the libvirt driver to catch and pass migrations that were previously failing with the error:
libvirt.libvirtError: internal error: migration was active, but no RAM info was set
See bug 1982284 for more details.
24.1.1¶
Bug Fixes¶
Instances with hardware offloaded ovs ports no longer lose connectivity after failed live migrations. The driver.rollback_live_migration_at_source function is no longer called during pre_live_migration rollback, which previously resulted in connectivity loss following a failed live migration. See Bug 1944619 for more details.
Amended the guest resume operation to support mediated devices, as libvirt’s minimum required version (v6.0.0) supports the hot-plug/unplug of mediated devices, which was addressed in v4.3.0.
Fixed bug 1960230 that prevented resize of instances that had previously failed and not been cleaned up.
Bug 1960401, which could cause invalid BlockDeviceMappings to accumulate in the database, is fixed. This prevented the respective volumes from being attached again to the instance.
Fixes slow compute restart when using the nova.virt.ironic compute driver, where the driver was previously attempting to attach VIFs on start-up via the plug_vifs driver method. This method has grown otherwise unused since the introduction of the attach_interface method of attaching VIFs. As Ironic manages the attachment of VIFs to baremetal nodes in order to align with the security requirements of a physical baremetal node’s lifecycle, the ironic driver now ignores calls to the plug_vifs method.
24.1.0¶
New Features¶
Added a new configuration option [workarounds]/enable_qemu_monitor_announce_self that, when enabled, causes the Libvirt driver to send an announce_self QEMU monitor command post live-migration. Please see bug 1815989 for more details. Please note that this causes the domain to be considered tainted by libvirt.
Known Issues¶
The libvirt virt driver in Nova implements power on and hard reboot by destroying the domain first and unplugging the vifs, then recreating the domain and replugging the vifs. However, nova does not wait for the network-vif-plugged event before unpausing the domain. This can cause the domain to start running and requesting an IP via DHCP before the networking backend has finished plugging the vifs. The config option [workarounds]wait_for_vif_plugged_event_during_hard_reboot has been added, defaulting to an empty list, that can be used to ensure that the libvirt driver waits for the network-vif-plugged event for vifs with a specific vnic_type before it unpauses the domain during hard reboot. This should only be used if the deployment uses a networking backend that sends such an event for the given vif_type at vif plug time. The ml2/ovs and the networking-odl Neutron backends are known to send plug time events for ports with normal vnic_type. For more information see https://bugs.launchpad.net/nova/+bug/1946729
Bug Fixes¶
The POST /servers (create server) API will now reject attempts to create a server with the same port specified multiple times. This was previously accepted by the API but the instance would fail to spawn and would instead transition to the error state.
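The kind of validation described above can be sketched as follows. This is an illustrative check under our own assumptions, not the code Nova uses: find_duplicate_ports is a hypothetical helper, and the port and network identifiers are made up. The list shape mirrors the networks field of a create-server request, where each entry names either a port or a network.

```python
from collections import Counter


def find_duplicate_ports(networks):
    """Return port IDs that appear more than once in a networks list."""
    counts = Counter(net["port"] for net in networks if net.get("port"))
    return sorted(port for port, n in counts.items() if n > 1)


requested = [
    {"port": "port-a"},
    {"port": "port-b"},
    {"port": "port-a"},  # same port requested a second time
    {"uuid": "net-1"},   # network request without a pre-created port
]

print(find_duplicate_ports(requested))
```

A client could run a check like this before submitting the request to avoid the rejection.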
Bug #1829479: Now deleting a nova-compute service removes allocations of successfully evacuated instances. This allows the associated resource provider to be deleted automatically even if the nova-compute service cannot recover after all instances on the node have been successfully evacuated.
Bug 1952941 is fixed, where a pre-Victoria server with pinned CPUs could not be migrated or evacuated after the cloud was upgraded to Victoria or newer because scheduling failed with a NotImplementedError: Cannot load 'pcpuset' error.
Fixed bug 1950657, where nova-compute would not retry an image download after receiving a “Corrupt image download” error from glanceclient, even with the num_retries config option set.
24.0.0¶
Prelude¶
The 24.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 23.0.0 (Wallaby) to 24.0.0 (Xena).
There are a few major changes worth mentioning. This is not an exhaustive list:
The latest Compute API microversion supported for Xena is v2.90. Details on REST API microversions added since the 23.0.0 Wallaby release can be found in the REST API Version History page.
Support for accelerators in Nova servers has been improved. Now Cyborg-managed SmartNICs can be attached as SR-IOV devices.
Two new nova-manage CLI commands can be used for checking the volume attachment connection information and for refreshing it if the connection is stale (for example with a Ceph backing store and MON IP addresses). Documentation on how to use them is available in the nova-manage CLI reference.
Instance hostnames published by the metadata API service or config drives can be explicitly defined at instance creation time thanks to the new 2.90 API microversion. See the hostname field documentation in the API docs for further details.
The libvirt virt driver now supports any PCI device, not just virtual GPUs, that uses the VFIO-mdev virtualization framework, like network adapters or compute accelerators. See more in the spec.
New Features¶
Microversion 2.89 has been introduced. It adds the attachment_id of a volume attachment and the bdm_uuid of the block device mapping record to, and removes the duplicate id from, the responses for GET /servers/{server_id}/os-volume_attachments and GET /servers/{server_id}/os-volume_attachments/{volume_id}.
A number of commands have been added to nova-manage to help update stale volume attachment connection info for a given volume and instance.
The nova-manage volume_attachment show command can be used to show the current volume attachment information for a given volume and instance.
The nova-manage volume_attachment get_connector command can be used to get an updated host connector for the localhost.
Finally, the nova-manage volume_attachment refresh command can be used to update the volume attachment with this updated connection information.
A --sleep option has been added to the nova-manage db archive_deleted_rows CLI. When this command is run with the --until-complete option, the process will archive rows in batches in a tight loop, which can cause problems in busy environments where the aggressive archiving interferes with other requests trying to write to the database. The --sleep option can be used to specify a time to sleep between batches of rows while archiving with --until-complete, allowing the process to be throttled.
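The throttling idea behind --sleep can be sketched in a few lines. This is a simplified model rather than the nova-manage code: archive_until_complete and archive_batch are hypothetical names, and archive_batch stands in for whatever archives one batch and reports how many rows it moved.

```python
import time


def archive_until_complete(archive_batch, sleep_seconds=0.0):
    """Call archive_batch() until it reports 0 rows, pausing between batches."""
    total = 0
    while True:
        archived = archive_batch()
        if archived == 0:
            # Nothing left to archive.
            return total
        total += archived
        # Give other database clients room before the next batch.
        time.sleep(sleep_seconds)


# Simulated batches: three batches with rows, then an empty one.
batches = iter([1000, 1000, 250, 0])
print(archive_until_complete(lambda: next(batches)))
```

With sleep_seconds left at zero the loop behaves like the tight loop described above; a non-zero value spaces the batches out.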
A --task-log option has been added to the nova-manage db archive_deleted_rows CLI. When --task-log is specified, task_log table records will be archived while archiving the database. The --task-log option works in conjunction with --before if operators desire archiving only records that are older than <date>. The updated_at field is used by --task-log --before <date> to determine the age of a task_log record for archival.
The task_log database table contains instance usage audit records if nova-compute has been configured with [DEFAULT]instance_usage_audit = True. This will be the case if OpenStack Telemetry is being used in the deployment, as the option causes Nova to generate audit notifications that Telemetry consumes from the message bus.
Usage data can also be later retrieved by calling the /os-instance_usage_audit_log REST API [1].
Historically, there has been no way to delete task_log table records other than manual database modification. Because of this, task_log records could pile up over time and operators were forced to perform manual steps to periodically truncate the task_log table.
[1] https://docs.openstack.org/api-ref/compute/#server-usage-audit-log-os-instance-usage-audit-log
A new configuration option is now available for supporting PCI devices that use the VFIO-mdev kernel framework and are stateless. Instead of using the VGPU resource class for both the inventory and the related allocations, the operator can ask to use another custom resource class for a specific mdev type by using the dynamic mdev_class.
When using the libvirt virt driver with the QEMU or KVM backends, instances will now be created with the vmcoreinfo feature enabled by default. This creates a fw_cfg entry for a guest to store dump details, necessary to process kernel dump with KASLR enabled and providing additional kernel details. For more information, refer to the libvirt documentation.
The 2.90 microversion has been added. This microversion allows users to specify a requested hostname to be configured for the instance metadata when creating an instance (POST /servers), updating an instance (PUT /servers/{id}), or rebuilding an instance (POST /servers/{server_id}/action (rebuild)). When specified, this hostname replaces the hostname that nova auto-generates from the instance display name. As with the auto-generated hostnames, a service such as cloud-init can automatically configure the hostname in the guest OS using this information retrieved from the metadata service.
In addition, starting with the 2.90 microversion, the OS-EXT-SRV-ATTR:hostname field is now returned for all users. Previously this was restricted to admin users.
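To make the 2.90 behavior more concrete, the sketch below builds (but does not send) a create-server request that sets a hostname. It assumes the hostname field sits inside the server object and that the microversion is selected with the OpenStack-API-Version header; build_create_request and all identifier values are placeholders for illustration, so consult the API reference before relying on the exact shape.

```python
import json


def build_create_request(name, hostname, image_ref, flavor_ref):
    """Build headers and a JSON body for POST /servers at microversion 2.90."""
    headers = {
        "Content-Type": "application/json",
        # Select the compute microversion that accepts "hostname".
        "OpenStack-API-Version": "compute 2.90",
    }
    body = {
        "server": {
            "name": name,
            "hostname": hostname,  # requested guest hostname
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            "networks": "auto",
        }
    }
    return headers, json.dumps(body)


headers, payload = build_create_request(
    "Web Server 01", "web-01", "IMAGE_ID_PLACEHOLDER", "FLAVOR_ID_PLACEHOLDER"
)
print(headers["OpenStack-API-Version"])
print(payload)
```

Here the display name and the requested hostname differ, which is the case the new field is meant to cover.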
Add support for the bochs libvirt video model. This is a legacy-free video model that is best suited for UEFI guests. In limited cases (e.g. if the guest does not depend on direct VGA hardware access), it can be usable for BIOS guests as well.
Add support for SmartNICs via Cyborg device profiles in Neutron ports with vnic type accelerator-direct. When such a port is used, Cyborg will manage the SmartNIC and Nova will pass through the SmartNIC VF to the server. Note that while vnic type accelerator-direct-physical also exists in Neutron, it is not yet supported by Nova and the server create request will fail with such a port.
Known Issues¶
Linux guest images that have known kernel bugs related to virtualized apic initialization previously would sporadically hang. For images where the kernel cannot be upgraded, a [workarounds] config option has been introduced:
[workarounds]libvirt_disable_apic
This option is primarily intended for CI and development clouds as a bridge for operators to mitigate the issue while they work with their upstream image vendors.
Upgrade Notes¶
As part of the fix for bug 1910466, code that attempted to optimize VM CPU thread assignment based on the host CPU topology has been removed, as it was determined to be buggy and undocumented, and it rejected valid virtual CPU topologies while also producing different behavior when CPU pinning was enabled vs disabled. The optimization may be reintroduced in the future with a more generic implementation that works for both pinned and unpinned VMs.
The return codes of a few APIs were not consistent for operations or features that are not implemented or supported. They returned 403, 400, or 409 (for the Operation Not Supported For SEV and Operation Not Supported For VTPM cases). These APIs now consistently return 400 when an operation or feature is not implemented or supported.
Support for automatically retrying all database interactions by configuring the [database] use_db_reconnect config option has been removed. This behavior was only ever supported for interactions with the main database and was generally not necessary as a number of lookups were already explicitly wrapped in retries. The [database] use_db_reconnect option is provided by oslo.db and will now be ignored by nova.
Experimental support for thread pooling of DB API calls has been removed. This feature was first introduced in the 2014.2 (Juno) release but has not graduated to fully-supported status since, nor was it being used for any API DB calls. The [oslo_db] use_tpool config option used to enable this feature will now be ignored by nova.
The [workarounds]disable_native_luksv1 workaround configurable has been removed after previously being deprecated during the Wallaby (23.0.0) release.
The [workarounds]rbd_volume_local_attach workaround configurable has been removed after previously being deprecated in the Wallaby (23.0.0) release.
A number of scheduler-related config options were renamed during the 15.0.0 (Ocata) release. The deprecated aliases have now been removed. These are:
[DEFAULT] scheduler_max_attempts (now [scheduler] max_attempts)
[DEFAULT] scheduler_host_subset_size (now [scheduler] host_subset_size)
[DEFAULT] max_io_ops_per_host (now [scheduler] max_io_ops_per_host)
[DEFAULT] max_instances_per_host (now [scheduler] max_instances_per_host)
[DEFAULT] scheduler_tracks_instance_changes (now [scheduler] track_instance_changes)
[DEFAULT] scheduler_available_filters (now [scheduler] available_filters)
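Operators who want to check an existing configuration for the removed aliases listed above could use something along these lines. It is a minimal sketch: the mapping is copied from the list, while find_removed_aliases and the sample configuration are hypothetical, and Python's configparser only approximates how oslo.config reads nova.conf.

```python
import configparser

# Old [DEFAULT] option name -> new [scheduler] option name, from the list above.
RENAMED = {
    "scheduler_max_attempts": "max_attempts",
    "scheduler_host_subset_size": "host_subset_size",
    "max_io_ops_per_host": "max_io_ops_per_host",
    "max_instances_per_host": "max_instances_per_host",
    "scheduler_tracks_instance_changes": "track_instance_changes",
    "scheduler_available_filters": "available_filters",
}

SAMPLE_CONF = """
[DEFAULT]
scheduler_max_attempts = 5
max_instances_per_host = 50

[scheduler]
host_subset_size = 2
"""


def find_removed_aliases(conf_text):
    """Map each removed [DEFAULT] alias found to its [scheduler] replacement."""
    # strict=False tolerates repeated options, which nova.conf may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    defaults = parser.defaults()
    return {
        f"[DEFAULT] {old}": f"[scheduler] {new}"
        for old, new in RENAMED.items()
        if old in defaults
    }


for old, new in find_removed_aliases(SAMPLE_CONF).items():
    print(f"{old} -> {new}")
```

Options already set under [scheduler] are left alone; only leftovers in [DEFAULT] are reported.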
Nova now requires that the Placement API supports at least microversion 1.36, added in Train. The related nova-upgrade check has been modified to warn if this prerequisite is not fulfilled.
The database migration engine has changed from sqlalchemy-migrate to alembic. For most deployments, this should have minimal to no impact and the switch should be mostly transparent. The main user-facing impact is the change in schema versioning. While sqlalchemy-migrate used a linear, integer-based versioning scheme, which required placeholder migrations to allow for potential migration backports, alembic uses a distributed version control-like scheme where a migration’s ancestor is encoded in the file and branches are possible. The alembic migration files therefore use an arbitrary UUID-like naming scheme and the nova-manage db sync and nova-manage api_db sync commands now expect such a version when manually specifying the version that should be applied. For example:
$ nova-manage db sync 8f2f1571d55b
It is no longer possible to specify an sqlalchemy-migrate-based version. When the nova-manage db sync and nova-manage api_db sync commands are run, all remaining sqlalchemy-migrate-based migrations will be automatically applied. Attempting to specify an sqlalchemy-migrate-based version will result in an error.
Deprecation Notes¶
The AvailabilityZoneFilter scheduler filter is now deprecated for removal in a future release. The functionality of the AvailabilityZoneFilter has been replaced by the map_az_to_placement_aggregate pre-filter which was introduced in 18.0.0 (Rocky). This pre-filter is now enabled by default and will be mandatory in a future release.
The existing config options in the [devices] group for managing virtual GPUs are now renamed in order to be more generic, since the mediated devices framework from the Linux kernel can support other devices:
enabled_vgpu_types is now deprecated in favour of enabled_mdev_types
Dynamic configuration groups called [vgpu_*] are now deprecated in favour of [mdev_*]
Support for the deprecated options will be removed in a future release.
The os_compute_api:os-extended-server-attributes policy controls which users a number of server extended attributes are shown to. Configuring visibility of the OS-EXT-SRV-ATTR:hostname attribute via this policy has now been deprecated and will be removed in a future release. Upon removal, this attribute will be shown for all users regardless of policy configuration.
Security Issues¶
A vulnerability in the console proxies (novnc, serial, spice) that allowed open redirection has been patched. The novnc, serial, and spice console proxies are implemented as websockify servers and the request handler inherits from the python standard SimpleHTTPRequestHandler. There is a known issue in the SimpleHTTPRequestHandler which allows open redirects by way of URLs in the following format:
http://vncproxy.my.domain.com//example.com/%2F..
which if visited, will redirect a user to example.com.
The novnc, serial, and spice console proxies will now reject requests that pass a redirection URL beginning with “//” with a 400 Bad Request.
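The check described above amounts to inspecting the start of the request path. The following is a simplified sketch, not the console proxy's request handler: check_request_path is a hypothetical function, and a real handler would send the error response rather than return a status value.

```python
from http import HTTPStatus


def check_request_path(path):
    """Return the status a proxy might answer with for a request path."""
    # Browsers treat a "Location: //host/..." redirect as an absolute URL,
    # so paths that start with "//" are refused outright.
    if path.startswith("//"):
        return HTTPStatus.BAD_REQUEST
    return HTTPStatus.OK


# Path portion of the example URL given above.
print(int(check_request_path("//example.com/%2F..")))
print(int(check_request_path("/vnc_auto.html")))
```

The second path is just an ordinary single-slash path chosen for contrast.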
In this release OVS port creation has been delegated to os-vif when the noop or openvswitch security group firewall drivers are enabled in Neutron. Those options, and others that disable the hybrid_plug mechanism, will now use os-vif instead of libvirt to plug VIFs into the bridge. By delegating port plugging to os-vif we can use the isolate_vif config option to ensure VIFs are plugged securely, preventing guests from accessing other tenants’ networks before the neutron ovs agent can wire up the port. See bug #1734320 for details. Note that OVN, ODL and other SDN solutions also use hybrid_plug=false but they are not known to be affected by the security issue caused by the previous behavior. As such, the isolate_vif os-vif config option is only used when deploying with ml2/ovs.
Bug Fixes¶
Improved detection of anti-affinity policy violation when performing live and cold migrations. Most of the violations caused by race conditions due to performing concurrent live or cold migrations should now be addressed by extra checks in the compute service. Upon detection, cold migration operations are automatically rescheduled, while live migrations have two checks and will be rescheduled if detected by the first one, otherwise the live migration will fail cleanly and revert the instance state back to its previous value.
Bug 1851545, wherein unshelving an instance with SRIOV Neutron ports did not update the port binding’s pci_slot and could cause libvirt PCI conflicts, has been fixed.
Important: Constraints in the fix’s implementation mean that it only applies to instances booted after it has been applied. Existing instances will still experience bug 1851545 after being shelved and unshelved, even with the fix applied.
Fixes an issue with multiple nova-compute services used with Ironic, where a rebalance operation could result in a compute node being deleted from the database and not recreated. See bug 1853009 for details.
The nova libvirt driver supports two independent features, virtual CPU topologies and virtual NUMA topologies. Previously, when hw:cpu_max_sockets, hw:cpu_max_cores and hw:cpu_max_threads were specified for pinned instances (hw:cpu_policy=dedicated) without explicit hw:cpu_sockets, hw:cpu_cores, hw:cpu_threads extra specs or their image equivalent, nova failed to generate a valid virtual CPU topology. This has now been fixed and it is now possible to use max CPU constraints with pinned instances. For example, a combination of hw:numa_nodes=2, hw:cpu_max_sockets=2, hw:cpu_max_cores=2, hw:cpu_max_threads=8 and hw:cpu_policy=dedicated can now generate a valid topology using a flavor with 8 vCPUs.
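To see why the 8 vCPU example above has valid solutions, it helps to enumerate the socket, core, and thread combinations that fit within the maximums. The sketch below does only that arithmetic. It is not Nova's topology algorithm, ignores NUMA and pinning constraints entirely, and possible_topologies is a hypothetical helper.

```python
def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    """List (sockets, cores, threads) tuples whose product equals vcpus."""
    result = []
    for sockets in range(1, min(max_sockets, vcpus) + 1):
        for cores in range(1, min(max_cores, vcpus) + 1):
            for threads in range(1, min(max_threads, vcpus) + 1):
                if sockets * cores * threads == vcpus:
                    result.append((sockets, cores, threads))
    return result


# hw:cpu_max_sockets=2, hw:cpu_max_cores=2, hw:cpu_max_threads=8, 8 vCPUs
print(possible_topologies(8, 2, 2, 8))
# prints [(1, 1, 8), (1, 2, 4), (2, 1, 4), (2, 2, 2)]
```

Four combinations satisfy the limits, so a topology can be chosen as long as the remaining constraints also allow one of them.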
Addressed an issue that prevented instances with 1 vcpu using multiqueue feature from being created successfully when their vif_type is TAP.
On some hardware platforms, an SR-IOV virtual function for a NIC port may exist without being associated with a parent physical function that has an associated netdev. In such a case the PF interface name lookup will fail. As the PciDeviceNotFoundById exception was not handled, this would prevent the nova compute agent from starting on affected hardware. See https://bugs.launchpad.net/nova/+bug/1915255 for more details. This edge case has now been addressed; however, features that depend on the PF name, such as minimum bandwidth based QoS, cannot be supported on these platforms.
In this release we delegate port plugging to os-vif for all OVS interface types. This allows os-vif to create the OVS port before libvirt creates a tap device during a live migration therefore preventing the loss of the MAC learning frames generated by QEMU. This resolves a long-standing race condition between Libvirt creating the OVS port, Neutron wiring up the OVS port and QEMU generating RARP packets to populate the vswitch MAC learning table. As a result this reduces the interval during a live migration where packets can be lost. See bug #1815989 for details.
To fix device detach issues in the libvirt driver, the detach logic has been changed from a sleep based retry loop to waiting for libvirt domain events. During this change we also introduced two new config options to allow fine tuning the retry logic. For details see the description of the new [libvirt]device_detach_attempts and [libvirt]device_detach_timeout config options.
Minimizes a race condition window when using the ironic virt driver where the data generated for the Resource Tracker may attempt to compare potentially stale instance information with the latest known baremetal node information. While this doesn’t completely prevent nor resolve the underlying race condition identified in bug 1841481, this change allows Nova to have the latest state information, as opposed to state information which may be out of date due to the time which it may take to retrieve the status from Ironic. This issue was most observable on baremetal clusters with several thousand physical nodes.