Container-Based Steps
Overview
The Container Hardware Manager in ironic-python-agent (IPA) allows running OCI-compatible containers as steps on bare metal nodes. This enables operators to package arbitrary tools – firmware updaters, diagnostic suites, compliance scanners – as container images and execute them during any step-based workflow, such as cleaning, deployment, or servicing.
Basics
A workflow for implementing container-based steps with runbooks is:
1. The operator builds an IPA ramdisk with the ironic-python-agent-podman diskimage-builder element, which installs podman and the Container Hardware Manager into the ramdisk.
2. The Ironic conductor sends [agent_containers] configuration to IPA via the lookup/heartbeat endpoint. This allows conductor-side settings to override any build-time defaults in the ramdisk.
3. When a runbook triggers a container_clean_step, IPA uses podman (or docker) to pull and run the specified container image on the bare metal node.
4. The container runs with host networking by default, executes its task, and exits. IPA reports the result back to the conductor.
Prerequisites
IPA ramdisk with podman support
The IPA ramdisk must be built with the ironic-python-agent-podman
diskimage-builder (DIB) element. This element is currently Debian-based
only.
export DIB_ALLOW_ARBITRARY_CONTAINERS=true
export DIB_RUNNER=podman
disk-image-create ironic-python-agent-ramdisk \
    ironic-python-agent-podman \
    debian -o ipa-with-podman
Key DIB environment variables:
- DIB_ALLOW_ARBITRARY_CONTAINERS
  Set to true to allow any container image. Set to false (default) to restrict to a specific allowlist. Environments which permit non-admin roles to create and execute runbooks should not set this to true for security reasons.
- DIB_ALLOWED_CONTAINERS
  Comma-separated list of allowed container image URLs. Only used when DIB_ALLOW_ARBITRARY_CONTAINERS is false.
- DIB_RUNNER
  Container runtime: podman (default) or docker.
Container registry access
The container registry hosting your images must be accessible from the
cleaning network. If using a private registry, ensure credentials and TLS
certificates are configured in the ramdisk or passed via
pull_options.
Ironic Conductor Configuration
The [agent_containers] configuration group controls how the conductor
instructs IPA to handle containers. These settings are sent to IPA at
lookup time, so changes take effect without rebuilding the ramdisk.
[agent_containers]
# Allow any container image (default: false)
allow_arbitrary_containers = false
# Allowlist of container images (used when above is false)
allowed_containers = docker://registry.example.com/firmware-tool:latest,docker://registry.example.com/diag-suite:v2
# Container runtime (default: podman)
runner = podman
# Options passed to the pull command
pull_options = --tls-verify=false
# Options passed to the run command
run_options = --rm --network=host --tls-verify=false
Warning
Setting allow_arbitrary_containers = true allows any container
image to be pulled and executed with host-level network access on the
bare metal node. Only enable this in trusted environments. Prefer using
allowed_containers to maintain an explicit allowlist.
See also:
agent_containers.allow_arbitrary_containers,
agent_containers.allowed_containers,
agent_containers.runner,
agent_containers.pull_options,
agent_containers.run_options.
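The interaction between allow_arbitrary_containers and allowed_containers can be sketched as follows. This is only an illustration of the policy described above, not the agent's actual code: the function names are hypothetical, and exact string matching of allowlist entries is an assumption.

```python
def parse_allowlist(value):
    """Split a comma-separated allowed_containers value into a list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_container_allowed(container_url, allow_arbitrary, allowed_containers):
    """Return True if the image may run under the given policy.

    Assumption: entries are compared as exact strings. The real agent
    may normalize or match URLs differently.
    """
    if allow_arbitrary:
        return True
    return container_url in allowed_containers


# Same value as the allowed_containers example above.
allowed = parse_allowlist(
    "docker://registry.example.com/firmware-tool:latest,"
    "docker://registry.example.com/diag-suite:v2"
)

# Listed image, allowlist enforced: accepted.
print(is_container_allowed(
    "docker://registry.example.com/diag-suite:v2", False, allowed))
# Unlisted image, allowlist enforced: rejected.
print(is_container_allowed(
    "docker://registry.example.com/unknown:latest", False, allowed))
# Unlisted image, arbitrary containers allowed: accepted.
print(is_container_allowed(
    "docker://registry.example.com/unknown:latest", True, allowed))
```

The third call shows why allow_arbitrary_containers = true deserves caution: the allowlist is bypassed entirely.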
Example Container-based Runbooks
The built-in step
The Container Hardware Manager exposes a built-in cleaning step called
container_clean_step on the deploy interface. This step has a
default priority of 0, meaning it only runs when explicitly invoked
via manual cleaning, servicing, or a runbook.
The step accepts the following arguments:
- container_url (required)
  The full container image URL, e.g. docker://registry.example.com/firmware-tool:latest.
- pull_options (optional)
  Override the default pull options for this specific container.
- run_options (optional)
  Override the default run options for this specific container.
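When runbooks are generated by tooling rather than written by hand, the step arguments above map naturally onto a small helper. The sketch below is a hypothetical convenience function (container_step is not part of Ironic or its client); it only assembles the JSON structure and prints a command line for review.

```python
import json
import shlex


def container_step(container_url, order, pull_options=None, run_options=None):
    """Build one runbook step entry for container_clean_step."""
    if not container_url:
        raise ValueError("container_url is required")
    args = {"container_url": container_url}
    # Optional overrides are only included when explicitly provided, so the
    # configured defaults apply otherwise.
    if pull_options is not None:
        args["pull_options"] = pull_options
    if run_options is not None:
        args["run_options"] = run_options
    return {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": args,
        "order": order,
    }


steps = [
    container_step(
        "docker://registry.example.com/firmware-tool:latest",
        order=1,
        run_options="--rm --network=host --privileged",
    )
]
steps_json = json.dumps(steps)

# Print the command instead of running it, so it can be inspected first.
print(
    "baremetal runbook create --name CUSTOM_CONTAINER_FW_UPDATE --steps "
    + shlex.quote(steps_json)
)
```

Generating the JSON with json.dumps and quoting it with shlex.quote avoids hand-editing nested quotes in the shell.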
Single-container runbook
This example creates a runbook that runs a single firmware update container:
baremetal runbook create \
    --name CUSTOM_CONTAINER_FW_UPDATE \
    --steps '[
      {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": {
          "container_url": "docker://registry.example.com/firmware-tool:latest"
        },
        "order": 1
      }
    ]'
Multi-container runbook
Runbooks can combine multiple container steps with traditional steps. This example runs a diagnostic container, then a firmware updater, and finishes with a standard disk metadata erase:
baremetal runbook create \
    --name CUSTOM_CONTAINER_CLEAN \
    --steps '[
      {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": {
          "container_url": "docker://registry.example.com/diag-suite:v2"
        },
        "order": 1
      },
      {
        "interface": "deploy",
        "step": "container_clean_step",
        "args": {
          "container_url": "docker://registry.example.com/firmware-tool:latest",
          "run_options": "--rm --network=host --privileged"
        },
        "order": 2
      },
      {
        "interface": "deploy",
        "step": "erase_devices_metadata",
        "args": {},
        "order": 3
      }
    ]'
Adding traits to nodes
Runbooks are matched to nodes via traits. Add the matching trait to all nodes that should use the runbook:
baremetal node add trait <node> CUSTOM_CONTAINER_CLEAN
Using the Runbook
Manual cleaning
Trigger the runbook on a node in manageable state:
baremetal node clean <node> --runbook CUSTOM_CONTAINER_CLEAN
Automated cleaning
To use container-based steps for automated cleaning, configure the conductor to use runbook-based or hybrid cleaning and assign the runbook. See Configuring automated cleaning with runbooks for full details on the available configuration levels (per-node, per-resource-class, global).
A minimal example using the global default:
[conductor]
automated_clean = true
automated_cleaning_step_source = runbook
automated_cleaning_runbook = CUSTOM_CONTAINER_CLEAN
All nodes must have the matching trait (CUSTOM_CONTAINER_CLEAN) unless
trait validation is disabled via
conductor.automated_cleaning_runbook_validate_traits.
Servicing
Container steps also work with Node servicing. Trigger a container
runbook on an active node:
baremetal node service <node> --runbook CUSTOM_CONTAINER_CLEAN
Alternative Methods
Operators may also use container-based steps that are hardcoded in the ramdisk through configuration.
ironic-python-agent can be configured, via a YAML configuration file, to expose arbitrary container-backed steps for use in workflows, including automated cleaning.
For example:
steps:
  - name: manage_container_cleanup
    image: docker://172.24.4.1:5000/cleaning-image:latest
    interface: deploy
    reboot_requested: true
    pull_options:
      - --tls-verify=false
    run_options:
      - --rm
      - --network=host
      - --tls-verify=false
    abortable: true
    priority: 20
  - name: manage_container_cleanup2
    image: docker://172.24.4.1:5000/cleaning-image2:latest
    interface: deploy
    reboot_requested: true
    pull_options:
      - --tls-verify=false
    run_options:
      - --rm
      - --network=host
      - --tls-verify=false
    abortable: true
    priority: 10
Placing a file with these contents in the IPA ramdisk, at the path
indicated by agent_containers.container_steps_file, causes
manage_container_cleanup and manage_container_cleanup2 to be reported
as available cleaning steps at the indicated priorities.
This approach suits high-security environments that would rather rebuild a ramdisk than permit runtime decisions about which containers run during cleaning.
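Ironic runs automated cleaning steps from highest priority to lowest and skips steps whose priority is 0. The following minimal sketch applies that rule to the two steps from the example file. The third entry is hypothetical and is included only to show the priority-0 case. In a real deployment these steps are also interleaved with steps from other interfaces and hardware managers.

```python
def automated_cleaning_order(steps):
    """Return step names in the order automated cleaning would run them."""
    # Priority 0 disables a step for automated cleaning.
    enabled = [s for s in steps if s["priority"] > 0]
    # Higher priority runs first.
    ordered = sorted(enabled, key=lambda s: s["priority"], reverse=True)
    return [s["name"] for s in ordered]


steps = [
    {"name": "manage_container_cleanup2", "priority": 10},
    {"name": "manage_container_cleanup", "priority": 20},
    # Hypothetical extra step, not in the example file: it would only run
    # when requested explicitly.
    {"name": "manual_only_step", "priority": 0},
]

print(automated_cleaning_order(steps))
# ['manage_container_cleanup', 'manage_container_cleanup2']
```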
Security Considerations
- Prefer allowlisting over allow_arbitrary_containers = true. The allowlist (allowed_containers) restricts which images IPA will accept, reducing the risk of running untrusted code.
- TLS verification: the default pull_options and run_options include --tls-verify=false for development convenience. In production, remove this flag and ensure proper TLS certificates are available in the ramdisk.
- Container privileges: by default, containers run with --network=host, giving them full access to the node’s network stack. Review run_options and consider adding --read-only or dropping capabilities where possible.
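A review of pull_options and run_options can be partly automated before a configuration or runbook is deployed. The sketch below is a hypothetical pre-deployment check, not part of Ironic or IPA. The set of flagged options is an assumption drawn from the flags mentioned in this document and should be adapted to local policy.

```python
import shlex

# Flags worth a second look, with a short reason for each.
RISKY_FLAGS = {
    "--tls-verify=false": "registry TLS verification is disabled",
    "--privileged": "container gets extended privileges on the host",
    "--network=host": "container shares the node's network stack",
}


def audit_options(options):
    """Return (flag, reason) pairs for risky flags found in an option string."""
    findings = []
    for token in shlex.split(options):
        if token in RISKY_FLAGS:
            findings.append((token, RISKY_FLAGS[token]))
    return findings


# The example run_options value from the conductor configuration above.
for flag, reason in audit_options("--rm --network=host --tls-verify=false"):
    print(f"{flag}: {reason}")
```

A check like this only matches exact tokens, so it complements rather than replaces a manual review of the options.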
Troubleshooting
- Container pull failures
  Check that the container registry is accessible from the cleaning network. Verify the image URL in the runbook step. If using TLS, ensure certificates are configured correctly in the ramdisk or add --tls-verify=false to pull_options for testing.
- Step not found: container_clean_step
  The IPA ramdisk was not built with the ironic-python-agent-podman element. Rebuild the ramdisk with podman support as described in Prerequisites.
- Container rejected by allowlist
  The container URL does not match any entry in allowed_containers and allow_arbitrary_containers is false. Either add the image to the allowlist or set allow_arbitrary_containers = true in [agent_containers].
- Trait mismatch
  The node does not have a trait matching the runbook name. Add the trait with baremetal node add trait <node> <RUNBOOK_NAME>.