This section covers considerations that are equally important to all described architectures.
As explained in Bare Metal service overview, the Bare Metal service has three components.
The Bare Metal API service (ironic-api) should be deployed in a similar way as the control plane API services. The exact location will depend on the architecture used.
The Bare Metal conductor service (ironic-conductor) is where most of the provisioning logic lives. The following considerations are the most important when deciding on the way to deploy it:
If the nova-serialproxy service (part of the Compute service) is used, it has to be able to reach the conductors. Otherwise, they have to be directly accessible by the users.
The provisioning ramdisk which runs the ironic-python-agent service on start up.
Warning
The ironic-python-agent service is not intended to be used or executed anywhere other than a provisioning/cleaning/rescue ramdisk.
The Bare Metal service strives to provide the best support possible for a variety of hardware. However, not all hardware is supported equally well. It depends on both the capabilities of the hardware itself and the available drivers. This section covers various considerations related to the hardware interfaces. See Enabling drivers and hardware types for a detailed introduction to hardware types and interfaces before proceeding.
The minimum set of capabilities that the hardware has to provide and the driver has to support is as follows:
Note
Strictly speaking, it is possible to make the Bare Metal service provision nodes without some of these capabilities via some manual steps. It is not the recommended way of deployment, and thus it is not covered in this guide.
Once you make sure that the hardware supports these capabilities, you need to find a suitable driver. Most enterprise-grade hardware supports IPMI and thus can use the IPMI driver. Some newer hardware also supports the Redfish driver. Several vendors provide more specific drivers that usually offer additional capabilities. Check Drivers, Hardware Types and Hardware Interfaces to find the most suitable one.
The boot interface of a node manages booting of both the deploy ramdisk and the user instances on the bare metal node. The deploy interface orchestrates the deployment and defines how the image gets transferred to the target disk.
The main alternatives are to use PXE/iPXE or virtual media - see Boot interfaces for a detailed explanation. If a virtual media implementation is available for the hardware, it is recommended to use it for better scalability and security. Otherwise, it is recommended to use iPXE when it is supported by the target hardware.
There are two deploy interfaces in-tree, iscsi and direct. See Deploy Interfaces for an explanation of the difference.
With the iscsi deploy method, most of the deployment operations happen on the conductor. If the Object Storage service (swift) or RadosGW is present in the environment, it is recommended to use the direct deploy method for better scalability and reliability.
The Bare Metal service does not impose many restrictions on the characteristics of the hardware itself. However, keep in mind that:
By default, the Bare Metal service will pick the smallest hard drive that is larger than 4 GiB for deployment. Another hard drive can be used, but it requires setting root device hints.
Note
This device does not have to match the boot device set in BIOS (or similar firmware).
The machines should have enough RAM for the deployment/cleaning ramdisk to run. The minimum varies greatly depending on the way the ramdisk was built. For example, tinyipa, the TinyCoreLinux-based ramdisk used in the CI, only needs 400 MiB of RAM, while ramdisks built by diskimage-builder may require 3 GiB or more.
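The default disk selection rule described above can be illustrated with a short Python sketch. This is not the agent's actual code: the function name, the input format (a mapping of device name to size in bytes), and the strict "larger than" comparison are assumptions made for illustration, taken from the wording of this section.

```python
from typing import Dict, Optional

GIB = 1024 ** 3
MIN_ROOT_DISK_BYTES = 4 * GIB  # threshold stated in the text above


def pick_default_root_disk(disks: Dict[str, int]) -> Optional[str]:
    """Return the smallest disk that is larger than 4 GiB, or None.

    `disks` maps a device name to its size in bytes (hypothetical format).
    """
    candidates = {
        name: size for name, size in disks.items()
        if size > MIN_ROOT_DISK_BYTES
    }
    if not candidates:
        return None
    # Among the eligible disks, the smallest one wins.
    return min(candidates, key=candidates.get)


example_disks = {
    "/dev/sda": 500 * GIB,
    "/dev/sdb": 2 * GIB,    # too small, never selected
    "/dev/sdc": 120 * GIB,  # smallest eligible disk
}
chosen = pick_default_root_disk(example_disks)
```

In this example the 120 GiB device is chosen even though a larger disk exists, which is why root device hints are needed when a different disk should receive the image.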
The Bare Metal service can deploy two types of images:
Whole-disk images that contain a complete partitioning table with all necessary partitions and a bootloader. Such images are the most universal, but may be harder to build.
Partition images that contain only the root partition. The Bare Metal service will create the necessary partitions and install a boot loader, if needed.
Warning
Partition images are only supported with GNU/Linux operating systems.
Warning
If you plan on using local boot, your partition images must contain GRUB2 bootloader tools to enable ironic to set up the bootloader during deploy.
The Bare Metal service supports booting user instances either using a local bootloader or using the driver’s boot interface (e.g. via PXE or iPXE protocol in case of the pxe interface).
Network boot cannot be used with certain architectures (for example, when no tenant networks have access to the control plane).
Additional considerations are related to the pxe boot interface, and other boot interfaces based on it:
The default boot option for the cloud can be changed via the Bare Metal service configuration file, for example:
[deploy]
default_boot_option = local
This default can be overridden by setting the boot_option capability on a node. See Local boot with partition images for details.
Note
Currently, network boot is used by default. However, we plan on changing it in the future, so it’s safer to set the default_boot_option explicitly.
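The precedence between the cloud-wide default and the per-node capability can be sketched as follows. This is an illustration only, not the Bare Metal service's implementation: the function name is hypothetical, and the sketch assumes that node capabilities are stored as a comma-separated string of key:value pairs and that the two boot option values are named netboot and local.

```python
def effective_boot_option(node_properties, default_boot_option="netboot"):
    """Return the boot option a node would use in this simplified model.

    A `boot_option` entry in the node's capabilities string (assumed to
    look like "boot_option:local,boot_mode:uefi") overrides the default.
    """
    capabilities = node_properties.get("capabilities", "")
    for item in capabilities.split(","):
        key, separator, value = item.partition(":")
        if separator and key.strip() == "boot_option":
            return value.strip()
    # No per-node override: fall back to the configured default.
    return default_boot_option


node = {"capabilities": "boot_mode:uefi,boot_option:local"}
overridden = effective_boot_option(node)
fallback = effective_boot_option({}, default_boot_option="netboot")
```

Setting default_boot_option explicitly in the configuration corresponds to always passing the second argument here, so the result does not change if the built-in default changes later.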
There are several recommended network topologies to be used with the Bare Metal service. They are explained in depth in specific architecture documentation. However, several considerations are common for all of them:
Note
In the majority of cases, the same network should be used for cleaning, provisioning and rescue for simplicity.
Unless noted otherwise, everything in these sections applies to all three networks.
The baremetal nodes must have access to the Bare Metal API while connected to the provisioning/cleaning/rescuing network.
Note
Only two endpoints need to be exposed there:
GET /v1/lookup
POST /v1/heartbeat/[a-z0-9\-]+
You may want to limit access from this network to only these endpoints, and make these endpoints inaccessible from other networks.
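As a rough illustration of such a restriction, the following Python sketch checks whether a request matches one of the two endpoints listed above. It is only a model of the filtering rule: in practice the rule would live in a reverse proxy or firewall, and the function and variable names here are hypothetical. The path is assumed to be passed without its query string.

```python
import re

# (method, path pattern) pairs taken from the two endpoints listed above.
ALLOWED_ENDPOINTS = (
    ("GET", re.compile(r"/v1/lookup")),
    ("POST", re.compile(r"/v1/heartbeat/[a-z0-9\-]+")),
)


def is_allowed(method, path):
    """Return True if the request may pass from the provisioning network."""
    return any(
        method == allowed_method and pattern.fullmatch(path) is not None
        for allowed_method, pattern in ALLOWED_ENDPOINTS
    )


lookup_ok = is_allowed("GET", "/v1/lookup")
heartbeat_ok = is_allowed(
    "POST", "/v1/heartbeat/1be26c0b-03f2-4d2e-ae87-c02d7f33c123")
nodes_blocked = not is_allowed("GET", "/v1/nodes")
```

Using fullmatch rather than a prefix match keeps a path such as /v1/lookup/extra from slipping through the filter.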
If the pxe boot interface (or any boot interface based on it) is used, then the baremetal nodes should have untagged (access mode) connectivity to the provisioning/cleaning/rescuing networks. It allows PXE firmware, which does not support VLANs, to communicate with the services required for provisioning.
Note
It depends on the network interface whether the Bare Metal service will handle it automatically. Check the networking documentation for the specific architecture.
Sometimes it may be necessary to disable the spanning tree protocol delay on the switch - see DHCP during PXE or iPXE is inconsistent or unreliable.
The Baremetal nodes need to have access to any services required for provisioning/cleaning/rescue, while connected to the provisioning/cleaning/rescuing network. This may include:
direct deploy interface and some virtual media boot interfaces
The Baremetal Conductors need to have access to the booted baremetal nodes during provisioning/cleaning/rescue. A conductor communicates with an internal API, provided by ironic-python-agent, to conduct actions on nodes.
The Bare Metal API service is stateless, and thus can be easily scaled horizontally. It is recommended to deploy it as a WSGI application behind e.g. Apache or another WSGI container.
Note
This service accesses the ironic database for reading entities (e.g. in response to GET /v1/nodes request) and in rare cases for writing.
The Bare Metal conductor service utilizes the active/active HA model. Every conductor manages a certain subset of nodes. The nodes are organized in a hash ring that tries to keep the load spread more or less uniformly across the conductors. When a conductor is considered offline, its nodes are taken over by other conductors. As a result of this, you need at least 2 conductor hosts for an HA deployment.
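The takeover behaviour can be illustrated with a generic consistent hash ring. The sketch below is not the hash ring used by the Bare Metal service; the class name, the choice of SHA-256, and the number of points per conductor are assumptions made for illustration.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent hash ring mapping node UUIDs to conductors."""

    def __init__(self, conductors, points_per_conductor=64):
        if not conductors:
            raise ValueError("at least one conductor is required")
        # Each conductor owns several points to even out the distribution.
        self._ring = sorted(
            (_position("%s-%d" % (host, i)), host)
            for host in conductors
            for i in range(points_per_conductor)
        )
        self._positions = [position for position, _ in self._ring]

    def conductor_for(self, node_uuid):
        # The first point clockwise from the node's position owns the node.
        index = bisect.bisect(self._positions, _position(node_uuid))
        return self._ring[index % len(self._ring)][1]


nodes = ["node-%d" % n for n in range(1000)]
before = HashRing(["conductor-1", "conductor-2", "conductor-3"])
after = HashRing(["conductor-1", "conductor-3"])  # conductor-2 is offline
moved = [n for n in nodes
         if before.conductor_for(n) != after.conductor_for(n)]
```

With this construction, only nodes that belonged to the offline conductor change owner; nodes managed by the surviving conductors stay where they were.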
Conductors can be resource intensive, so it is recommended (but not required) to keep all conductors separate from other services in the cloud. The minimum required number of conductors in a deployment depends on several factors:
sync_power_state_interval option),
max_concurrent_builds option for the Compute service).
We recommend a target of 100 bare metal nodes per conductor for maximum reliability and performance. There is some tolerance for a larger number per conductor. However, it was reported [1] [2] that reliability degrades when handling approximately 300 bare metal nodes per conductor.
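As a starting point for sizing, the 100 nodes per conductor target and the two-conductor minimum for HA can be combined into a simple estimate. The helper below is a hypothetical sketch, not a tool shipped with the Bare Metal service, and it ignores the other factors mentioned in this section, such as hardware performance or polling frequency.

```python
import math

TARGET_NODES_PER_CONDUCTOR = 100  # recommended target from the text above
MIN_CONDUCTORS_FOR_HA = 2         # minimum for an HA deployment


def recommended_conductors(node_count,
                           nodes_per_conductor=TARGET_NODES_PER_CONDUCTOR,
                           ha=True):
    """Estimate the number of conductor hosts for `node_count` nodes."""
    if node_count < 0 or nodes_per_conductor <= 0:
        raise ValueError("counts must be non-negative and the ratio positive")
    needed = math.ceil(node_count / nodes_per_conductor)
    minimum = MIN_CONDUCTORS_FOR_HA if ha else 1
    return max(needed, minimum)


small_cloud = recommended_conductors(50)
medium_cloud = recommended_conductors(250)
large_cloud = recommended_conductors(1000)
```

The estimate does not reserve spare capacity for a conductor failure; adding one extra conductor keeps the per-conductor load near the target while a failed host's nodes are taken over.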
Each conductor needs enough free disk space to cache images it uses. Depending on the combination of the deploy interface and the boot option, the space requirements are different:
The deployment kernel and ramdisk are always cached during the deployment.
The iscsi deploy method requires caching of the whole instance image locally during the deployment. The image has to be converted to the raw format, which may increase the required amount of disk space, as well as the CPU load.
Note
This is not a concern for the direct deploy interface, as in this case the deployment ramdisk downloads the image and either streams it to the disk or caches it in memory.
When network boot is used, the instance image kernel and ramdisk are cached locally while the instance is active.
Note
All images may be stored for some time after they are no longer needed.
This is done to speed up simultaneous deployments of many similar images.
The caching can be configured via the image_cache_size and image_cache_ttl configuration options in the pxe group.
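To show how a size limit and a time-to-live can interact, here is a simplified Python model of cache cleanup. It is not the conductor's actual cleanup algorithm: the function name and the entry format are hypothetical, and the units (MiB for the size limit, minutes for the TTL) are assumptions made for this sketch.

```python
def select_evictions(entries, now, cache_size_mib, cache_ttl_minutes):
    """Return the names of cached images to delete, oldest first.

    `entries` is a list of (name, size_mib, last_used_epoch_seconds).
    """
    ttl_seconds = cache_ttl_minutes * 60
    expired = [e for e in entries if now - e[2] > ttl_seconds]
    # Remaining entries, least recently used first.
    kept = sorted(
        (e for e in entries if now - e[2] <= ttl_seconds),
        key=lambda e: e[2],
    )
    evicted = [name for name, _, _ in expired]
    total = sum(size for _, size, _ in kept)
    # Still over the limit: drop least recently used images until it fits.
    while kept and total > cache_size_mib:
        name, size, _ = kept.pop(0)
        total -= size
        evicted.append(name)
    return evicted


cache = [("a", 500, 0), ("b", 800, 900), ("c", 900, 1000)]
to_delete = select_evictions(cache, now=1000, cache_size_mib=1000,
                             cache_ttl_minutes=10)
```

In this example image a is removed because it is older than the TTL, and image b is removed because the remaining images would still exceed the size limit.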
[1] http://lists.openstack.org/pipermail/openstack-dev/2017-June/118033.html
[2] http://lists.openstack.org/pipermail/openstack-dev/2017-June/118327.html
When integrating with other OpenStack services, more considerations may need to be applied. This is covered in other parts of this guide.