[ English | русский | Deutsch | Indonesia | English (United Kingdom) ]

Recover a compute host failure¶

The following procedure addresses Compute node failure if shared storage is used.

Note

If shared storage is not used, data can be copied from the /var/lib/nova/instances directory on the failed Compute node ${FAILED_NODE} to another node ${RECEIVING_NODE}before performing the following procedure. Please note this method is not supported.

Re-launch all instances on the failed node.
Invoke the MariaDB command line tool.

Generate a list of instance UUIDs hosted on the failed node:

mysql> select uuid from instances where host = '${FAILED_NODE}' and deleted = 0;

Set instances on the failed node to be hosted on a different node:

mysql> update instances set host ='${RECEIVING_NODE}' where host = '${FAILED_NODE}' \
and deleted = 0;

Reboot each instance on the failed node listed in the previous query to regenerate the XML files:
```
# nova reboot —hard $INSTANCE_UUID
```

Find the volumes to check the instance has successfully booted and is at the login:

mysql> select nova.instances.uuid as instance_uuid, cinder.volumes.id \
as voume_uuid, cinder.volumes.status, cinder.volumes.attach_status, \
cinder.volumes.mountpoint, cinder.volumes,display_name from \
cinder.volumes inner join nova.instances on cinder.volumes.instance_uuid=nova.instances.uuid \
where nova.instances.host = '${FAILED_NODE}';

If rows are found, detach and re-attach the volumes using the values listed in the previous query:

# nova volume-detach $INSTANCE_UUID $VOLUME_UUID && \
# nova volume-attach $INSTANCE_UUID $VOLUME_UUID $VOLUME_MOUNTPOINT

Rebuild or replace the failed node as described in Add a compute host.

Recover a compute host failure

Recover a compute host failure¶

openstack-ansible 31.1.0.dev59