Rocky Series (11.0.0 - 11.1.x) Release Notes
11.1.4-12
Bug Fixes
Fixes an ‘Invalid parameter value for SpanLength’ error when configuring RAID under Python 3. The wrong data type was passed to iDRAC: for example, 2.0 was passed instead of 2. See story 2004265.
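Under Python 3, the / operator always returns a float. That is one plausible way a value such as 2.0, rather than 2, ends up in a request. The illustration below uses a hypothetical span_length_for() helper and is not the actual driver code:

```python
def span_length_for(disk_count: int, span_depth: int) -> int:
    """Hypothetical helper: number of disks in each span of a nested RAID."""
    # In Python 3, disk_count / span_depth returns a float (4 / 2 == 2.0),
    # which a strictly typed field such as SpanLength may reject.
    # Floor division (//) keeps the result an int.
    return disk_count // span_depth


print(4 / 2)                  # 2.0
print(span_length_for(4, 2))  # 2
```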
Cleans up nodes stuck in the deleting state on conductor restart.
Fixes vague reporting in the node last_error field when a deploy step fails. The exception error message is now included in addition to the step that failed.
Kills the ipmitool process that ironic invokes to read a node’s power state if the process does not exit after the configured timeout expires. It is fairly common for ipmitool to run for five minutes (with current ironic defaults) once it hits a non-responsive bare metal node. This can slow down the management of other nodes because periodic task slots become exhausted. The new behaviour is enabled by default, but can be disabled via the [ipmi]kill_on_timeout ironic configuration option.
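Below is a minimal sketch of the kill-on-timeout idea, built on Python's standard subprocess module. It is not ironic's implementation. The helper name run_with_timeout() is hypothetical, and a sleeping Python process stands in for an unresponsive ipmitool call so the example runs without a BMC:

```python
import subprocess
import sys


def run_with_timeout(argv, timeout):
    """Run argv; kill it if it exceeds timeout seconds.

    Returns stdout on success, or None if the process had to be killed.
    """
    try:
        # On timeout, subprocess.run() kills the child, waits for it,
        # and then re-raises TimeoutExpired.
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Stand-in for a hung "ipmitool ... power status" call (assumption).
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hung, timeout=0.5))  # None
```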
Fixes a bug where rebooting a node managed by the idrac hardware type with the WS-MAN power interface sometimes failed with a "The command failed to set RequestedState" error. See bug 2007487 for details.
Adds the command_timeout and max_command_attempts configuration options for IPA, so that a command is executed again when a connection error occurs.
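Below is a sketch of retry-on-connection-error under stated assumptions. AgentConnectionError, send_command() and retry_delay are hypothetical names. Only max_command_attempts mirrors an option from the note, and command_timeout (a per-request timeout) is not modelled:

```python
import time


class AgentConnectionError(Exception):
    """Hypothetical stand-in for a connection failure talking to IPA."""


def send_command(call, max_command_attempts=3, retry_delay=0.0):
    """Invoke call(), retrying only when a connection error occurs."""
    for attempt in range(1, max_command_attempts + 1):
        try:
            return call()
        except AgentConnectionError:
            if attempt == max_command_attempts:
                raise  # give up after the last allowed attempt
            time.sleep(retry_delay)


# Usage: a fake command that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise AgentConnectionError("connection refused")
    return {"command_status": "SUCCEEDED"}


print(send_command(flaky))  # {'command_status': 'SUCCEEDED'}
print(len(attempts))        # 3
```

Other exceptions propagate immediately, so only connection problems trigger another attempt.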
Fixes an issue where ironic-conductor initialization could return a NodeNotLocked error for requests requiring locks while the conductor was starting. This happened because the conductor removed locks only after it had begun accepting new work. Lock removal now happens after database connectivity has been established but before the RPC bus is initialized.
11.1.4
Bug Fixes
Fixes an issue in the configdrive partition creation step of deployment. On some devices, such as NVMe drives, the created configdrive partition could not be correctly identified, and identifying it is required to write data onto it afterward. https://storyboard.openstack.org/#!/story/2005764
Fixes an issue with using a serial number as a root device hint with the ansible deploy interface.
Fixes an issue with the ansible deploy interface. Node deployment was broken for any image that was not public, because the original request context was no longer available when some image information was fetched.
Fixes an issue where the resource list API returned results containing only the requested fields up to the API MAX_LIMIT. Once MAX_LIMIT was reached, the API ignored the user-requested fields. The next URL generated by the pagination code now includes the user-requested fields as a query parameter.
Fixes an issue where the pagination marker was not set if uuid was not in the list of requested fields when executing a list query. The affected API endpoints were: port, portgroup, volume_target, volume_connector, node and chassis. See story 2003192 for more details.
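The two pagination fixes above can be pictured with a small URL builder. The marker always comes from the last item's uuid, so uuid has to be fetched even when the user did not request it. The requested fields are then carried into the next link. This is a sketch, not ironic's pagination code: next_page_url(), the host name and the sample values are hypothetical.

```python
from urllib.parse import urlencode


def next_page_url(base_url, resource, limit, last_item, fields=None):
    """Build a "next" link for a paginated list query.

    last_item must contain "uuid" even if the user did not request it,
    because the marker is taken from it. Requested fields are carried over.
    """
    params = {"limit": limit, "marker": last_item["uuid"]}
    if fields:
        params["fields"] = ",".join(fields)
    return "%s/v1/%s?%s" % (base_url, resource, urlencode(params))


url = next_page_url("http://ironic.example:6385", "ports", 2,
                    {"uuid": "1be26c0b", "address": "52:54:00:12:34:56"},
                    fields=["address"])
print(url)
# http://ironic.example:6385/v1/ports?limit=2&marker=1be26c0b&fields=address
```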
Fixes an issue where baremetal node deployment would fail on clouds with a high number of security groups. Listing the security groups took too long. Instead of listing all security groups, a query filter was added to list only the security groups to be used for the network. (See bug 2006256.)
Fixes a bug in the grub ramdisk boot template handling, so that the template now properly references the user-provided kernel and ramdisk. Previously, the deployment ramdisk and kernel were referenced in the template.
Fixes an issue when updating firmware using the update_firmware_sum clean step of the ilo hardware type's management interface. The step failed with an error stating that it was unable to connect to the iLO address due to an authentication failure. See story 2006223 for details.
11.1.3
Deprecation Notes
Using the fake management interface with the manual-management hardware type is deprecated; please use noop instead. Existing nodes will have to be updated after the upgrade.
Bug Fixes
Fixes an issue in the cleaning workflow of the ansible deploy interface. Handling an error in the driver and returning nothing caused the manager to consider the step done and move on to the next one, instead of interrupting the cleaning workflow.
Fixes an issue with the ansible deployment interface where raw images could not be streamed correctly to the host.
Fixes deployment with the ansible deploy interface and instance images with a GPT partition table.
Fixes an issue where the sensor data parsing method for the ipmitool interface could not handle ipmitool debugging information. That information is automatically included when the debug option is set to True in the ironic.conf file. Extra debugging information supplied by the underlying ipmitool command is now disregarded. More information can be found in story 2005331.
Fixes an issue where deployment fails during node preparation if the node capabilities are passed as a string.
Fixes an issue with checksum validation, where calculating the actual checksum failed with a UnicodeDecodeError. The fix uses the oslo_utils library to calculate the actual checksum.
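The usual cause of a UnicodeDecodeError during hashing is reading image data as text. The note says the fix relies on oslo_utils. This sketch instead uses only the standard library's hashlib to show the underlying idea of hashing raw bytes. The helper file_checksum() is hypothetical:

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file's raw bytes in fixed-size chunks.

    Opening the file in binary mode ("rb") means no text decoding takes
    place, so arbitrary image bytes cannot raise UnicodeDecodeError.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: bytes that are not valid UTF-8 hash without error.
data = b"\x89PNG\r\n\xff\xfe\x00"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
matches = file_checksum(tmp.name) == hashlib.sha256(data).hexdigest()
os.remove(tmp.name)
print(matches)  # True
```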
The manual-management hardware type now defaults to the noop management interface. Unlike the fake management interface, it does not fail when attempting to set the boot device to the local disk.
Fixes a bug where volumes from the cinder block storage service failed to attach because a mountpoint was expected to be a valid string. See story 2004864 for additional information.
Returns the correct error message when an invalid reference is provided in image_source. Previously, an internal error was raised.
Reverts the fix for the idrac hardware type creating port objects during inspection with pxe_enabled fields not set to reflect the configuration of the physical ports. That fix is inconsistent with the stable branch policy [1]. It requires python-dracclient version 1.5.0 or greater, whereas driver-requirements.txt specifies that version 1.3.0 or greater can be used on this branch.
[1] https://docs.openstack.org/project-team-guide/stable-branches.html
11.1.2
Bug Fixes
Fixes a bug in the node update code that could make nodes impossible to update if their driver is no longer available.
Fixes an issue where the master instance image cache could not be disabled. The configuration option [pxe]/instance_master_path may now be set to the empty string to disable the cache.
Fixes an issue where the master TFTP image cache could not be disabled. The configuration option [pxe]/tftp_master_path may now be set to the empty string to disable the cache. For more information, see story 2004608.
Fixes a bug where, for the idrac hardware type, the ironic port was not updated during node introspection to match the PXE enabled setting. See bug 2004340 for details.
11.1.1
New Features
Setting these configuration options to 0 will disable the periodic tasks:
[conductor]sync_power_state_interval: sync power states for the nodes
[conductor]check_provision_state_interval:
check deployments and time out if the deployment takes too long
check the status of cleaning a node and time out if it takes too long
check the status of inspecting a node and time out if it takes too long
check for and handle nodes that are taken over by new conductors (if an old conductor disappeared)
[conductor]send_sensor_data_interval: send sensor data to ceilometer
[conductor]sync_local_state_interval: refresh a conductor’s copy of the consistent hash ring. If any mappings have changed, determines which, if any, nodes need to be “taken over”. The ensuing actions could include preparing a PXE environment, updating the DHCP server, and so on.
[oneview]periodic_check_interval:
check for nodes taken over by OneView users
check for nodes freed by OneView users
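The rule in this list can be shown as a filter over configured intervals: a value of 0 means the task is never scheduled. This illustrative sketch uses a hypothetical enabled_periodic_tasks() helper rather than ironic's scheduler, and the interval values are examples, not ironic defaults:

```python
def enabled_periodic_tasks(intervals):
    """Return only the periodic tasks that should be scheduled.

    intervals maps an option name to its value in seconds; per the note,
    a value of 0 disables the corresponding periodic task.
    """
    return {option: seconds for option, seconds in intervals.items()
            if seconds != 0}


# Example values, not ironic defaults.
conf = {
    "[conductor]sync_power_state_interval": 60,
    "[conductor]send_sensor_data_interval": 0,
}
print(enabled_periodic_tasks(conf))
# {'[conductor]sync_power_state_interval': 60}
```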
Known Issues
Building RAID1 is known to not work with Dell BOSS cards using python-dracclient 1.4.0 or earlier. Upgrade to python-dracclient 1.5.0 to use this feature.
Upgrade Notes
The hash_ring_reset_interval configuration option was changed from 180 to 15 seconds. Previously, this option was essentially ignored on the API side, because the hash ring was reset on each API access. The lower value minimizes the probability of a request being routed to the wrong conductor when the ring needs rebalancing.
If you are doing a minor version upgrade, please re-run the ironic-dbsync online_data_migrations command to properly update the versions of the Objects in the database. Otherwise, the next major upgrade may fail.
Critical Issues
The ironic-dbsync online_data_migrations command was not updating the objects to their latest versions, which could prevent upgrades from working (that is, when running the next release’s ironic-dbsync upgrade). Objects are now updated to their latest versions when running that command. See story 2004174 for more information.
Bug Fixes
Fixes an issue with a baremetal node that times out during cleaning. The ironic-conductor was attempting to change the node’s provision state to ‘clean failed’ twice, resulting in the node’s last_error being set incorrectly. This no longer happens. For more information, see story 2004299.
Fixes an issue where setting the following configuration options to 0 caused a ValueError exception to be raised. You can now set them to 0 to disable the associated periodic tasks (for more information, see story 2002059):
[conductor]sync_power_state_interval: sync power states for the nodes
[conductor]check_provision_state_interval:
check deployments and time out if the deployment takes too long
check the status of cleaning a node and time out if it takes too long
check the status of inspecting a node and time out if it takes too long
check for and handle nodes that are taken over by new conductors (if an old conductor disappeared)
[conductor]send_sensor_data_interval: send sensor data to ceilometer
[conductor]sync_local_state_interval: refresh a conductor’s copy of the consistent hash ring. If any mappings have changed, determines which, if any, nodes need to be “taken over”. The ensuing actions could include preparing a PXE environment, updating the DHCP server, and so on.
[oneview]periodic_check_interval:
check for nodes taken over by OneView users
check for nodes freed by OneView users
Fixes an issue where Neutron ports would be left with a baremetal MAC address associated after an instance is deleted from a baremetal host. This caused MAC address conflicts in follow-up deployments to the same baremetal host. See bug 2004428.
Fixes an issue where a flat Neutron port would be left with a host ID associated with it after an instance is deleted from a baremetal host. This prevented the same port from being reused for a new instance, because it was still bound to the old instance.
Fixes a bug where the number of CPU sockets was being returned by the idrac hardware type during introspection, instead of the number of virtual CPUs. See bug 2004155 for details.
Fixes a race condition in the hash ring implementation that could cause an internal server error on any request. See story 2003966 for details.
Properly reports an error when the image cache and the image HTTP or TFTP location are on different file systems, which causes hard links to fail.
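Hard links cannot span file systems. On POSIX systems os.link() then fails with errno.EXDEV. The sketch below turns that low-level failure into an explicit, readable error. link_cached_image() and ImageLinkError are hypothetical names, not ironic's code:

```python
import errno
import os
import tempfile


class ImageLinkError(Exception):
    """Hypothetical error type carrying an operator-readable message."""


def link_cached_image(cache_path, dest_path):
    """Hard-link a cached image into the HTTP or TFTP root.

    If the two paths are on different file systems, os.link() fails with
    EXDEV; report that cause instead of letting a bare OSError escape.
    """
    try:
        os.link(cache_path, dest_path)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            raise ImageLinkError(
                "%s and %s are on different file systems; cannot hard link"
                % (cache_path, dest_path)) from exc
        raise


# Usage: on a single file system the link succeeds.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "image")
    open(src, "wb").close()
    link_cached_image(src, os.path.join(root, "linked"))
    linked_ok = os.path.exists(os.path.join(root, "linked"))
print(linked_ok)  # True
```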
Fixes an issue where iSCSI based deployments fail if the cpu_arch property is not specified on a node.
Fixes the redfish hardware type to reuse HTTP session tokens when talking to the BMC using session authentication. Before this fix, the redfish hardware type never tried to reuse the session token issued by the BMC during a previous connection. With some BMC implementations this could exhaust the session pool.
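Session reuse boils down to caching one session per BMC and user, and logging in only on a cache miss. The sketch below assumes a hypothetical BmcSessionCache class and a create_session callable standing in for the real login. It ignores session expiry, password changes and thread safety, which a real driver would need to handle:

```python
class BmcSessionCache:
    """Reuse one session per (BMC address, username) pair.

    create_session is a stand-in for whatever logs in to the BMC and
    returns a session object; it is only called on a cache miss.
    """

    def __init__(self, create_session):
        self._create = create_session
        self._sessions = {}

    def get(self, address, username, password):
        key = (address, username)
        if key not in self._sessions:
            self._sessions[key] = self._create(address, username, password)
        return self._sessions[key]


# Usage: count how many times we actually "log in".
logins = []


def fake_login(address, username, password):
    logins.append(address)
    return object()  # placeholder for a session token or handle


cache = BmcSessionCache(fake_login)
first = cache.get("10.0.0.5", "admin", "secret")
second = cache.get("10.0.0.5", "admin", "secret")
print(first is second, len(logins))  # True 1
```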
Fixes an issue wherein provisioning fails if an ironic node is configured with the ramdisk deploy interface. See bug 2003532 for more details.
The IPMI hardware type unconditionally instructed the BMC not to automatically clear the boot flag valid bit if a Chassis Control command is not received within the 60-second timeout (the countdown restarts when a Chassis Control command is received). Some BMCs do not support this setting; if it is sent, they abort the boot instead. For the IPMI hardware type, a new driver option node['driver_info']['ipmi_disable_boot_timeout'] can be specified. It is True by default; set it to False to bypass sending this command. See story 2004266 for additional information.
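A conductor needs to read the effective value of this option before deciding whether to send the command. Below is a sketch with a hypothetical wants_boot_timeout_disabled() helper. It assumes values set through the API or CLI may arrive as strings, and the IPMI command bytes themselves are deliberately left out:

```python
def wants_boot_timeout_disabled(driver_info):
    """Return the effective ipmi_disable_boot_timeout value (default True).

    Values set through the API or CLI may arrive as strings, so common
    true/false spellings are accepted.
    """
    value = driver_info.get("ipmi_disable_boot_timeout", True)
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("1", "true", "yes", "on")


print(wants_boot_timeout_disabled({}))  # True
print(wants_boot_timeout_disabled({"ipmi_disable_boot_timeout": "False"}))  # False
```

When the helper returns False, the conductor would simply skip sending the "do not clear the boot flag valid bit" request.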
11.1.0
Prelude
Ironic 11.1… Where the volume dial turned more!
While Pixie Boots has rocked out to Rock and Roll, the Bare Metal as a Service team has wrapped up our Rocky release with 11.1. This new release contains a number of major features that we hope will improve the lives of bare metal operators everywhere!
Conductor grouping enabling nodes to be assigned to groups of different conductors.
Deployment steps framework enabling greater flexibility for deployers to request specific steps.
BIOS setting interfaces for the ilo and irmc hardware types.
Ramdisk deployment interface for disk-less deployments.
Capability to reset nodes to their default interfaces via the API when resetting the node’s driver.
New Features
Added support for local booting a partition image on ppc64* hardware. If a PReP partition is detected when deploying to a ppc64* machine, the partition is passed to IPA so that the bootloader is installed there directly. This feature requires an ironic-python-agent ramdisk with ironic-lib >=2.14.
Adds new optional snmp_community_read and snmp_community_write properties to the snmp driver configuration (specified via a node’s driver_info field). If present, these values are used for SNMP reads and writes to the PDU, respectively. When not present, the snmp_community value is used instead.
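The fallback rule above fits in a few lines. This sketch uses a hypothetical snmp_communities() helper, and the community strings are placeholders:

```python
def snmp_communities(driver_info):
    """Return (read_community, write_community) for a PDU.

    snmp_community_read and snmp_community_write take precedence; each
    falls back to snmp_community when it is not present.
    """
    default = driver_info.get("snmp_community")
    read = driver_info.get("snmp_community_read", default)
    write = driver_info.get("snmp_community_write", default)
    return read, write


info = {"snmp_community": "public", "snmp_community_write": "private"}
print(snmp_communities(info))  # ('public', 'private')
```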
The iRMC driver can now automatically update the node.traits field with CUSTOM_CPU_FPGA value based on information provided by the node during node inspection.
Adds a ramdisk deploy interface for deployments that need to network boot to a ramdisk, instead of performing a complete traditional deployment to physical media. This may be useful in scientific use cases or where ephemeral baremetal machines are desired.
The ramdisk deploy interface is intended for advanced users. Users should be aware of its operational caveats before use, such as network access list requirements, configuration drive architectural restrictions and the inability to leverage configuration drives.
Adds a new configuration option [pxe]pxe_config_subdir, which lets operators define the directory inside /tftpboot or /httpboot where a boot loader looks for the node’s configuration file. This option defaults to pxelinux.cfg, the directory used by the Syslinux pxelinux.0 bootloader. Operators may wish to change the directory name if they use other boot loaders such as GRUB or iPXE.
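The option only changes the middle component of the path a boot loader looks up. This sketch uses a hypothetical pxe_config_path() helper. The "01-" plus dash-separated MAC file name follows the pxelinux convention, and ironic's own naming for other boot loaders may differ:

```python
import os


def pxe_config_path(root, mac, pxe_config_subdir="pxelinux.cfg"):
    """Path where a boot loader would look for a node's config file.

    root is /tftpboot or /httpboot; pxe_config_subdir mirrors the option.
    The file name uses the pxelinux "01-aa-bb-..." MAC convention.
    """
    name = "01-" + mac.lower().replace(":", "-")
    return os.path.join(root, pxe_config_subdir, name)


print(pxe_config_path("/tftpboot", "52:54:00:AB:CD:EF"))
# /tftpboot/pxelinux.cfg/01-52-54-00-ab-cd-ef
```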
Conductors and nodes may be arbitrarily grouped to provide a basic level of affinity between conductors and nodes. Conductors use the [conductor]/conductor_group configuration option to set the group to which they belong. The same value may be set on one or more nodes in the conductor_group field (available in API version 1.46). These are matched so that only conductors with a given group manage nodes with the same group.
A group name may be up to 255 characters containing a-z, 0-9, _, -, and . (period). The group is case-insensitive. The default group is the empty string ("").
The “node list” API endpoint (GET /v1/nodes) may also be filtered by conductor group in API version 1.46.
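The naming and matching rules above can be sketched as follows. normalize_conductor_group() and conductor_manages() are hypothetical helpers, and ironic's own validation may differ in detail, for example in where lower-casing happens:

```python
import re

# Allowed characters per the note: a-z, 0-9, "_", "-" and ".", max 255.
_GROUP_RE = re.compile(r"[a-z0-9_.\-]{0,255}")


def normalize_conductor_group(name):
    """Validate a conductor group name and return its canonical form.

    Groups are case-insensitive, so the name is lower-cased first; the
    empty string is the default group.
    """
    normalized = name.lower()
    if _GROUP_RE.fullmatch(normalized) is None:
        raise ValueError("invalid conductor group: %r" % name)
    return normalized


def conductor_manages(conductor_group, node_group):
    """A conductor manages a node only if their groups match."""
    return (normalize_conductor_group(conductor_group)
            == normalize_conductor_group(node_group))


print(conductor_manages("Rack-1", "rack-1"))  # True
print(conductor_manages("", "rack-1"))        # False
```

fullmatch() is used rather than a "^...$" pattern so that a trailing newline cannot slip through validation.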
The framework for deployment steps is in place. All in-tree drivers (DeployInterfaces) have one (big) deploy step; the conductor executes this step when deploying a node.
Starting with the Bare Metal REST API version 1.44, the deploy step currently being executed (if any) is available in a node’s deploy_step field in the responses for the following queries:
GET /v1/nodes/<node identifier>
GET /v1/nodes/detail
GET /v1/nodes?fields=deploy_step,...
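To see deploy_step, a client must request API version 1.44 or later. The sketch below prepares (but does not send) such a request with Python's standard urllib, setting the X-OpenStack-Ironic-API-Version header. The base URL and node name are placeholders, and build_node_request() is hypothetical:

```python
import urllib.request


def build_node_request(base_url, node_ident, api_version="1.44"):
    """Prepare (but do not send) a GET for one node.

    deploy_step only appears in responses at API version 1.44 or later,
    so the request pins at least that version via the version header.
    """
    req = urllib.request.Request("%s/v1/nodes/%s" % (base_url, node_ident))
    req.add_header("X-OpenStack-Ironic-API-Version", api_version)
    return req


req = build_node_request("http://ironic.example:6385", "node-0")
print(req.full_url)  # http://ironic.example:6385/v1/nodes/node-0
```

Note that urllib normalizes header-name capitalization internally. This is harmless, since HTTP header names are case-insensitive.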
Implements the bios interface for the ilo hardware type. Adds the list of supported bios interfaces for the ilo hardware type. Adds the manual cleaning steps apply_configuration and factory_reset, which support managing the BIOS settings of iLO servers using the ilo hardware type.
Adds support for the new noop interface to the ipmi hardware type. This interface targets hardware that does not correctly change boot mode via the IPMI protocol. Using it requires pre-configuring the boot order on a node to try PXE, then fall back to local booting.
Adds a new bios interface to the irmc hardware type. This provides an out-of-band BIOS configuration solution for the iRMC driver, making the functionality available via manual cleaning.
Adds an out-of-band RAID configuration solution for the iRMC driver, making the functionality available via manual cleaning. See the iRMC hardware type documentation for more details.
Starting with API version 1.45, PATCH requests to /v1/nodes/<NODE> accept the new query parameter reset_interfaces. It can be provided whenever the driver field is updated. If set to ‘true’, all hardware interfaces will be reset to their defaults, except for those updated in the same request.
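Ironic node updates use JSON Patch documents. The sketch below prepares (but does not send) a driver change with reset_interfaces=true, using only Python's standard library. build_driver_change(), the base URL, the node name and the "ipmi" driver value are placeholders:

```python
import json
import urllib.request


def build_driver_change(base_url, node_ident, new_driver, reset=True):
    """Prepare a PATCH that changes a node's driver (not sent here)."""
    body = [{"op": "replace", "path": "/driver", "value": new_driver}]
    url = "%s/v1/nodes/%s" % (base_url, node_ident)
    if reset:
        # Needs API 1.45+; interfaces not set in this same request are
        # reset to the new driver's defaults.
        url += "?reset_interfaces=true"
    req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                 method="PATCH")
    req.add_header("Content-Type", "application/json")
    req.add_header("X-OpenStack-Ironic-API-Version", "1.45")
    return req


req = build_driver_change("http://ironic.example:6385", "node-0", "ipmi")
print(req.get_method(), req.full_url)
# PATCH http://ironic.example:6385/v1/nodes/node-0?reset_interfaces=true
```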
Upgrade Notes
Operators using grub for PXE booting, typically with UEFI, should change the master PXE configuration file they deploy for nodes that PXE boot using grub. Ironic 11.1 now writes both MAC address and IP address based PXE configuration links for network booting via grub. The grub variable should be changed from $net_default_ip to $net_default_mac. IP address support is deprecated and will be removed in the Stein release.
The minimum required version of pysnmp has been bumped to 4.3. This pysnmp version introduces a simpler, faster and more functional high-level SNMP API, to which the ironic snmp driver has been migrated.
The minimum required version of the osprofiler library is now 1.5.0. This is not a new dependency: ironic has not been able to start with 1.4.0 since the Pike release, when this dependency was introduced.
The swift/endpoint_type configuration option has been removed. python-swiftclient 3.2.0 (Ocata) and above removed support for the native URL type used by radosgw. Since a swift/endpoint_type value of radosgw would fail anyway, the option is removed. Deployers must now configure ceph with rgw swift account in url = True. This must be set before upgrading to this release.
The snmp hardware type now uses the noop management interface instead of the fake interface used previously. Support for fake is kept for backward compatibility.
Deprecation Notes
All drivers must implement their deployment process using deploy steps. Out-of-tree drivers without deploy steps will be supported until the Stein release. For more details, see story 1753128.
The xclarity hardware type, as well as the supporting driver interfaces, has been deprecated and is scheduled to be removed from ironic in the Stein development cycle. This is due to the lack of operational Third Party testing to help ensure that the support for Lenovo XClarity is functional.
The xclarity hardware type was introduced at the end of the Queens development cycle. During implementation of Third Party CI, the Lenovo team encountered some unforeseen delays. Lenovo is continuing to work towards Third Party CI, and once functional Third Party CI is established and verified, this deprecation will be rescinded.
Support for ironic linking PXE boot configuration files via the assigned interface IP address has been deprecated. This linking was only used when [pxe]ipxe_enabled was set to false and the node was being deployed using UEFI.
Using the fake management interface with the snmp hardware type is now deprecated; please use noop instead.
Bug Fixes
Better handles the case when an operator attempts to upgrade from a release older than Pike directly to a release newer than Pike, skipping one or more releases in between (i.e. a “skip version upgrade”). Instead of crashing, ironic now informs the operator of two things. First, upgrading from a version older than the previous release is not supported (skip version upgrades). Second, as of Pike, all database migrations need to be performed using the previous releases for a fast-forward upgrade. [Bug 2002558]
Fixes support for grub-based UEFI PXE booting. Links to the PXE configuration files can now be written using the node’s MAC address in addition to the interface IP address. If the [dhcp]dhcp_provider option is set to none, only the MAC-based links will be created.
Fixes an issue that caused the integrated Dell Remote Access Controller (iDRAC) management hardware interface implementation, idrac, to fail to boot nodes in Unified Extensible Firmware Interface (UEFI) boot mode. That interface is supported by the idrac hardware type. The issue is resolved for Dell EMC PowerEdge 13th and 14th generation servers. It is not resolved for PowerEdge 12th generation and earlier servers. For more information, see story 1656841.
If a node gets stuck in one of the states deploying, cleaning, verifying, inspecting, adopting, rescuing or unrescuing for some reason (e.g. the conductor goes down while executing a task), it will be moved to an appropriate failure state the next time the conductor starts.
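At start-up, the conductor's recovery step amounts to a lookup from each transient state to a failure state. In the illustrative sketch below, the target states are an assumption based on the "fail" transitions of ironic's provision state machine. The node dictionaries and recover_stuck_nodes() are hypothetical:

```python
# Illustrative mapping from a transient provision state to the state a node
# could be moved to when the conductor starts. Assumption: the targets follow
# the "fail" transitions of ironic's provision state machine.
STUCK_STATE_ON_FAIL = {
    "deploying": "deploy failed",
    "cleaning": "clean failed",
    "verifying": "enroll",
    "inspecting": "inspect failed",
    "adopting": "adopt failed",
    "rescuing": "rescue failed",
    "unrescuing": "unrescue failed",
}


def recover_stuck_nodes(nodes):
    """Move nodes left in a transient state to the matching failure state.

    nodes is a list of dicts with "uuid" and "provision_state" keys (a
    hypothetical shape). Returns the uuids of the nodes that were changed.
    """
    changed = []
    for node in nodes:
        target = STUCK_STATE_ON_FAIL.get(node["provision_state"])
        if target is not None:
            node["provision_state"] = target
            changed.append(node["uuid"])
    return changed


nodes = [{"uuid": "a", "provision_state": "deploying"},
         {"uuid": "b", "provision_state": "active"}]
print(recover_stuck_nodes(nodes))   # ['a']
print(nodes[0]["provision_state"])  # deploy failed
```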
Changes the iPXE behavior to retry a total of 10 times, with an increasing backoff time between retries, to avoid creating a denial-of-service situation for the iPXE HTTP server. If the retries fail, a warning is displayed on the console for 30 seconds and the node is then powered off. For more information, see story.
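The retry behaviour lives in the iPXE boot script. The Python sketch below models only its control flow. The note says only that the backoff increases, so linear growth with a 1-second base is an assumption, and fetch_with_backoff() is hypothetical:

```python
import time


def fetch_with_backoff(fetch, attempts=10, base_delay=1, sleep=time.sleep):
    """Call fetch() up to `attempts` times with an increasing delay.

    Returns True on success, False once every attempt has failed. Per the
    release note, the node is then powered off after a 30-second console
    warning; that part is not modelled here.
    """
    for attempt in range(1, attempts + 1):
        if fetch():
            return True
        if attempt < attempts:
            sleep(base_delay * attempt)  # linear backoff: 1s, 2s, 3s, ...
    return False


slept = []
ok = fetch_with_backoff(lambda: False, sleep=slept.append)
print(ok, slept)  # False [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Injecting sleep makes the schedule observable without actually waiting.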
The cleaning operation could fail if an in-band clean step executed after the completion of an out-of-band clean step that reboots the node. The failure was caused by a race condition in which cleaning resumed before the Ironic Python Agent (IPA) was ready to execute clean steps. This has been fixed. For more information, see bug 2002731.
Other Notes
The deprecated configuration option [ipmi]retry_timeout was removed; use [ipmi]command_retry_timeout instead.