2024.1 Series (23.1.0 - 24.1.x) Release Notes

24.1.3-10

Bug Fixes

  • The set of strings used to detect cipher suite version related errors in ipmitool command output has been expanded. If the output of a failed ipmitool command contains the string Error in open session response message : invalid role, the error is now also treated as caused by an inappropriate cipher suite, and the command is retried with another cipher suite version if Ironic is configured to do so. See bug 2085137 for more details.

  • Fixes an issue where operators executing a complex arrangement of steps that mixes out-of-band and in-band steps, for example a hardware RAID create_configuration step followed by in-band steps inside of the agent, would effectively get the agent stuck in a wait state in the Cleaning, Servicing, or Deploying workflows. This was related to the way out-of-band steps are executed and monitored. Before starting to execute a new step, Ironic now clears the polling lockout flag for the workflow being executed to prevent the agent from getting stuck. For more information, please see bug 2096938.

  • Some vendors insist that floppy images must be 1440 KiB in size and that the file name ends with .img. Ironic now generates floppy images that meet both requirements.

  • Includes the agent token parameter in get command status requests as the endpoint now requires authentication.

  • The fix for CVE-2024-47211 resulted in an image checksum being required in all cases. However, there is no checksum requirement for file:// based images. When the checksum is missing for a file:// based image_source, it is now calculated on-the-fly.

  • Fixes an error within the redfish session cache when no redfish_password is specified. See bug 2097019.
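The cipher suite error detection described in the first fix above amounts to substring matching on the output of a failed command. The following is a minimal sketch of that idea, not Ironic's actual code: the helper name is_cipher_suite_error and the marker tuple are hypothetical, and only the "invalid role" string is taken from this note.

```python
# Hypothetical marker list for illustration. Only the "invalid role" string
# comes from the release note; the first entry is an illustrative placeholder.
CIPHER_SUITE_ERROR_MARKERS = (
    "Unsupported cipher suite ID",
    "Error in open session response message : invalid role",
)


def is_cipher_suite_error(output: str) -> bool:
    """Return True if the output of a failed command looks cipher-suite related."""
    return any(marker in output for marker in CIPHER_SUITE_ERROR_MARKERS)
```

A caller would use the result to decide whether retrying with another cipher suite version is worthwhile, assuming retries are enabled in the configuration.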

24.1.3

Security Issues

  • An issue in Ironic has been resolved where image checksums would not be checked prior to the conversion of an image to a raw format image from another image format.

    With default settings, this would not normally take place. However, it could occur when the image_download_source option was set to local. This option can be set at the node level for a single deployment, as a default for that baremetal node in all cases, or globally via the [agent]image_download_source configuration option. By default, this setting is http.

    This occurred in concert with the [DEFAULT]force_raw_images option, when set to True, which caused Ironic to download and convert the file.

    In a fully integrated context of Ironic's use in a larger OpenStack deployment, where images are coming from the Glance image service, the previous pattern was not problematic. The overall issue was introduced as a result of the capability to supply, cache, and convert a disk image provided as a URL by an authenticated user.

    Ironic will now validate the user supplied checksum prior to image conversion on the conductor. This can be disabled using the [conductor]disable_file_checksum configuration option.
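The ordering described above (verify the supplied checksum first, convert only afterwards) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than Ironic's implementation: the function names file_digest and checksum_matches are hypothetical, and SHA-256 is assumed as the default algorithm.

```python
import hashlib
import hmac


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in 1 MiB chunks so large images are not loaded into memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def checksum_matches(path: str, expected: str, algorithm: str = "sha256") -> bool:
    """Compare the computed digest with the user-supplied hex checksum."""
    actual = file_digest(path, algorithm)
    # Normalize whitespace and case, then compare in constant time.
    return hmac.compare_digest(actual, expected.strip().lower())
```

In such a flow, any conversion step would run only when checksum_matches returns True, and the deployment would fail otherwise.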

Bug Fixes

  • Fixes inspection failure when bmc_address or bmc_v6address is null in the inventory received from the ramdisk.

  • Fixes a security issue where Ironic would fail to checksum disk image files it downloads when Ironic had been requested to download and convert the image to a raw image format. This required the image_download_source to be explicitly set to local, which is not the default.

    This fix can be disabled by setting [conductor]disable_file_checksum to True, however this option will be removed in new major Ironic releases.

    As a result of this, parity has been introduced to align Ironic with Ironic-Python-Agent's support for checksums used by standalone users of Ironic. This includes support for remote checksum files supplied by URL, in order to avoid breaking existing users who may have inadvertently been leveraging the prior code path. This support can be disabled by setting [conductor]disable_support_for_checksum_files to True.

  • Fixes aborting in-band inspection. Previously, it would fail with Can not transition from state 'inspect failed' on event 'abort'.

24.1.2

Upgrade Notes

  • When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.

  • When upgrading Ironic to address the qemu-img image conversion security issues, the [conductor]conductor_always_validates_images setting may be set to True as a short term remedy while ironic-python-agent ramdisks are being updated. Alternatively it may be advisable to also set the [agent]image_download_source setting to local to minimize redundant network data transfers.

  • As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats, with a default value of "raw,qcow2,iso". Raw and qcow2 format disk images are the image formats the Ironic community has consistently stated as supported and expected for use with Ironic. These formats also match the formats which the Ironic community tests. Operators who leverage other disk image formats may need to modify this setting further.
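As an illustration of the settings named in the upgrade notes above, the sketch below embeds a sample configuration fragment and reads it back. The option names and values are those mentioned in the notes; the use of Python's configparser is an assumption made only to keep the example self-contained, since Ironic itself reads its configuration through its own configuration machinery.

```python
import configparser

# Sample ironic.conf fragment using the option names from the notes above.
SAMPLE_CONF = """
[conductor]
# Short term remedy while ironic-python-agent ramdisks are being updated.
conductor_always_validates_images = True
# Default value per the release note; extend only if other formats are needed.
permitted_image_formats = raw,qcow2,iso

[agent]
# Optional: reduces redundant network data transfers.
image_download_source = local
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

# Split the comma-separated value into a list of format names.
permitted = [
    fmt.strip()
    for fmt in parser.get("conductor", "permitted_image_formats").split(",")
]
always_validate = parser.getboolean(
    "conductor", "conductor_always_validates_images"
)
download_source = parser.get("agent", "image_download_source")
```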

Security Issues

  • Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic can also inspect images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is not enabled by default as it will increase the overall network traffic and disk space utilization of the conductor. This level of inspection can be enabled by setting [conductor]conductor_always_validates_images to True. Once the ironic-python-agent ramdisk has been updated, it will perform similar image security checks independently, should an image conversion be required. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to "raw", "qcow2", and "iso". While the project has classically always declared permissible images as "qcow2" and "raw", it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The "iso" support is required for "boot from ISO" ramdisk support.

  • Ironic now explicitly passes the source input format to executions of qemu-img to limit the permitted qemu disk image drivers which may evaluate an image to prevent any mismatched format attacks against qemu-img.

  • The ansible deploy interface example playbooks now supply an input format to executions of qemu-img. If you are using customized playbooks, please add "-f {{ ironic.image.disk_format }}" to your invocations of qemu-img. If you do not do so, qemu-img will automatically try to guess the format, which can lead to known security issues with an incorrect source format driver.

  • Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot, should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.

  • Operators are reminded that they should utilize cleaning in their environments. Disabling any security feature, such as cleaning or image inspection, is at your own risk. Should you have any issues with security-related features, please don't hesitate to open a bug with the project.

  • The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.

  • The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.
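For operators reviewing custom code that invokes qemu-img, the pattern described above (always pass the source format explicitly, and refuse formats that are not permitted) might look like the following minimal sketch. The function name build_convert_command and the constant CONVERTIBLE_FORMATS are hypothetical; the sketch only covers the two disk image formats and builds the argument list without executing anything.

```python
# Hypothetical allow-list for this sketch, limited to the two disk image
# formats named in the notes above.
CONVERTIBLE_FORMATS = ("raw", "qcow2")


def build_convert_command(source, dest, source_format,
                          permitted=CONVERTIBLE_FORMATS):
    """Build a qemu-img argument list with an explicit input format."""
    if source_format not in permitted:
        raise ValueError(
            "image format %r is not permitted for conversion" % source_format
        )
    # "-f" pins the input format so qemu-img does not guess it;
    # "-O raw" selects the output format.
    return ["qemu-img", "convert", "-f", source_format, "-O", "raw",
            source, dest]
```

Passing the result to a process runner as a list, rather than as a shell string, also avoids quoting problems with unusual file names.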

Bug Fixes

  • Fixes an issue with unit tests that emitted the following DeprecationWarning: "The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future.", reported against cls = validator_for(schema). The warning about the deprecated schema has been removed by using a new template.

  • Fixes the issue of service steps not starting due to servicing states (states.SERVICING and states.SERVICEWAIT) missing from _FASTTRACK_HEARTBEAT_ALLOWED constant.

  • Fixes issue with configuring virtual media boot for executing service steps by adding missing entries for states.SERVICING and states.SERVICEWAIT in the whitelist of the states allowed by this method.

  • Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process's operating environment.

    Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.

  • Fixes usage of the redfish detach virtual media feature to conform to the general implementation. Previously, the detach virtual media API call using the redfish driver did not work as intended and caused the operation to fail.

  • Fixes an issue in redfish attach/detach generic virtual media where the attached devices are not correctly recognized causing the attach operation to fail.

  • Service step validation no longer requires a priority field, which is not supported for servicing.

  • Fixes service steps that rely on a reboot. Previously, the reboot was not properly recognized in the conductor logic.

  • Adds an ISO publisher value to ISO images which are mastered as part of cleaning/deployment/service operations in support of a fix for bug 2032377.

  • Fixes generated URL when using the virtual media attachment API. Previously, it missed the node UUID, causing conflicts between different nodes.
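The content fingerprinting mentioned in the qemu-img fix above, where the format is determined from the file's bytes rather than its extension, can be illustrated with a deliberately small sketch. It only recognizes the qcow2 magic bytes and treats everything else as raw, which is far less thorough than the inspection Ironic performs; the function names sniff_format and formats_match are hypothetical.

```python
# A qcow2 image starts with the bytes "QFI" followed by 0xfb.
QCOW2_MAGIC = b"QFI\xfb"


def sniff_format(path: str) -> str:
    """Guess the image format from the leading bytes, ignoring the file name."""
    with open(path, "rb") as image:
        header = image.read(len(QCOW2_MAGIC))
    return "qcow2" if header == QCOW2_MAGIC else "raw"


def formats_match(path: str, declared_format: str) -> bool:
    """Return True when the declared format agrees with the detected one."""
    return sniff_format(path) == declared_format
```

In this sketch a file named disk.raw that begins with the qcow2 magic bytes is reported as qcow2, so a declared format of raw would be flagged as a mismatch.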

24.1.0

Prelude

Ironic contributors are thrilled to present the release of 24.1.0, tested as part of OpenStack 2024.1 (Caracal) throughout the last six months. You can upgrade directly to this release from Ironic 21.4 as part of a SLURP upgrade from OpenStack 2023.1 (Antelope). Ironic's first release came during the 2014.1 (Icehouse) cycle -- a decade ago. In those ten years, redfish has been created, the default deploy driver has been replaced, and Ironic has expanded into the CNCF community with Metal3. Thanks for making us a part of your cloud!

New Features

  • Adds a http boot interface, based upon the pxe boot interface which informs the DHCP server of an HTTP URL to boot the machine from, and then requests the BMC boot the machine in UEFI HTTP mode.

  • Adds a http-ipxe boot interface, based upon the ipxe boot interface which informs the DHCP server of an HTTP URL to boot the machine from, and then requests the BMC boot the machine in UEFI HTTP mode.

  • Adds node auto-discovery support to the agent inspection implementation.

  • Add support for ovn vtep switches. Operators will be able to use logical and physical switches. Minimally tested in production.

  • Adds a new service ironic-pxe-filter that is designed to work with the agent inspect interface to conduct "unmanaged" inspection. It is adapted from the ironic-inspector's dnsmasq PXE filter and can be used as its replacement. See documentation for more details.

  • Adds implementation of attach/detach generic virtual media device to the Redfish driver.

Known Issues

  • Testing of the http boot interface with the Grub2 provided by Ubuntu 22.04 yielded some intermittent failures which appear to be environmental in nature: the signed Shim loader would start and load the GRUB loader, which would then attempt to access some of the expected files and fail due to an apparent transfer timeout. Grub developers who were consulted concur that this is likely environmental, meaning related to the specific grub build or to CI performance. If you encounter any issues, please do not hesitate to reach out to the Ironic developer community.

Upgrade Notes

  • Adds an online migration to the new inspection interface. If the agent inspection is enabled and the inspector inspection is disabled, the inspect_interface field will be updated for all nodes that use inspector and are not currently being inspected (i.e. not in the inspect wait or inspecting states).

    If some nodes may be inspecting during the upgrade, you may want to run the online migrations several times with a delay to finish migrating all nodes.

Deprecation Notes

  • The redfish vendor eject vmedia action is now deprecated and it will be removed during the next cycle in favor of the generic API.

Bug Fixes

  • Fixes Redfish virtual media boot on BMCs that only expose the VirtualMedia resource on Systems instead of Managers. For more information, see bug 2039458.

  • Fixes a vague error when attempting to use the ilo hardware type with iLO6 hardware, by returning a more specific error suggesting action to take in order to remedy the issue. Specifically, one of the API's used by the ilo hardware type is disabled in iLO6 BMCs in favor of users utilizing Redfish. Operators are advised to utilize the redfish hardware type for these machines.

  • Some of Ironic's API endpoints, when the new RBAC policy is being enforced, were previously emitting 500 error codes when insufficient access rights were being used, specifically because the policy required system scope. This has been corrected, and the endpoints should now properly signal a 403 error code if insufficient access rights are present for an authenticated requestor.

  • Increases the 32-character limit of the user column in the NodeHistory model to support up to 64-character-long values. For more information, see bug.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue "Boot Option Restoration" screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now does so on Lenovo hardware. More information and background on this issue can be found in bug 2053064.

  • Fixes an issue where the conductor service would fail to launch when the neutron network_interface setting was enabled, and no global cleaning_network or provisioning_network is set in ironic.conf. These settings have long been able to be applied on a per-node basis via the API. As such, the service can now be started and will error on node validation calls, as designed for drivers missing networking parameters.

  • Each conductor now reserves a small proportion of its worker threads (5% by default) for API requests and other critical tasks. This ensures that the API stays responsive even under extreme internal load.

  • Provides a fix for service role support to enable the use case where a dedicated service project is used for cloud service operation to facilitate actions as part of the operation of the cloud infrastructure.

    OpenStack clouds can take a variety of configuration models for service accounts. It is now possible to utilize the [DEFAULT] rbac_service_role_elevated_access setting to enable users with a service role in a dedicated service project to act upon the API similar to a "System" scoped "Member" where resources regardless of owner or lessee settings are available. This is needed to enable synchronization processes, such as nova-compute or the networking-baremetal ML2 plugin to perform actions across the whole of an Ironic deployment, if desirable where a "System" scoped user is also undesirable.

    This functionality can be tuned to utilize a customized project name aside from the default convention service, for example baremetal or admin, utilizing the [DEFAULT] rbac_service_project_name setting.

    Operators can alternatively entirely override the service_role RBAC policy rule, if so desired, however Ironic feels the default is both reasonable and delineates sufficiently for the variety of Role Based Access Control usage cases which can exist with a running Ironic deployment.

  • Query parameters in the API that expect lists now accept repeated arguments (param=value1&param=value2) in addition to comma-separated strings (param=value1,value2). The former seems to be more common and is actually (incorrectly) used in GopherCloud.

  • Fixes error handling in the virtual media attachment API when the image downloading fails. Now the last_error field is populated correctly and the error is logged.
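The two query string styles accepted for list parameters, as described in the fix above, can be normalized to the same list. The following is a minimal sketch of that behavior using the Python standard library, not Ironic's API code; the helper name list_param and the parameter name fields are used purely for illustration.

```python
from urllib.parse import parse_qs


def list_param(query: str, name: str) -> list:
    """Collect a list parameter given as repeated and/or comma-separated values."""
    result = []
    # parse_qs returns every occurrence of the parameter as a list of strings.
    for value in parse_qs(query).get(name, []):
        # Each occurrence may itself be a comma-separated string.
        result.extend(item for item in value.split(",") if item)
    return result
```

With this helper, both fields=uuid&fields=name and fields=uuid,name produce the same two-element list.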