The Cyborg agent interacts with each Cyborg driver on the compute node to discover the available devices. This spec defines how the agent-driver API is structured.
No change is proposed to the way the agent discovers the drivers on start or restart.
This spec is common to all accelerators, including GPUs, High Precision Time Synchronization (HPTS) cards, etc. Since FPGAs have more aspects to consider than other devices, some sections focus on FPGA-specific factors. The spec calls out the FPGA-specific aspects.
The scope of this spec is the Rocky release, but the API has been designed to be extensible for future releases. Accordingly, the spec calls out the Rocky-specific aspects.
Reference [1] specifies that devices are represented using Resource Providers (RPs), Resource Classes (RCs) and traits. The information needed to create them has to come from the Cyborg driver to the Cyborg agent, which in turn needs to push it to the Cyborg Conductor.
The main challenge is discovering the device topology for FPGAs. An FPGA may have one or more Partial Reconfiguration regions, and those regions may have one or more accelerators nested inside them. Further, it may have local memory that is either partitioned or shared among the regions.
Cyborg will assume and handle the following component relationships:
Today, the Cyborg agent invokes the discover() API for each driver that it finds. The discover() API returns a dictionary indexed by the PCI BDF of a device. The value element in the key-value pair of the dictionary contains the components and characteristics of the device with that BDF.
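As a rough illustration of the current model, the Python sketch below shows an agent calling discover() on each driver and merging the BDF-indexed results. The two driver classes, the `discover_all` helper, the BDF values, the placeholder device contents, and the duplicate-BDF check are all assumptions made for the example; they are not taken from the Cyborg code base.

```python
class ExampleFpgaDriver:
    """Hypothetical stand-in for a Cyborg driver (not a real Cyborg class)."""

    def discover(self):
        # Keys are PCI BDFs; the per-device content is only a placeholder.
        return {"0000:5e:00.0": {"regions": []}}


class ExampleGpuDriver:
    """Second hypothetical driver, reporting a different device."""

    def discover(self):
        return {"0000:86:00.0": {}}


def discover_all(drivers):
    """Call discover() on each driver and merge the results by BDF."""
    devices = {}
    for driver in drivers:
        for bdf, info in driver.discover().items():
            # Assumed policy: a BDF should be claimed by only one driver.
            if bdf in devices:
                raise ValueError(f"BDF {bdf} reported by more than one driver")
            devices[bdf] = info
    return devices


devices = discover_all([ExampleFpgaDriver(), ExampleGpuDriver()])
```

Keeping the merged result indexed by BDF means the agent can hand a single dictionary to later processing steps without tracking which driver produced each entry.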
We propose to retain the same model, but enhance the dictionary to include enough information to create the resource providers and traits needed to populate Placement. Here are the additional proposed keys in the device dictionary for each PF:
"type": <enum-string>  # One of GPU, FPGA, etc.
"vendor": <string>
"product": <string>
Also, in the regions entry for each PF, it is proposed to add the following keys:
"region-type-uuid": <uuid>  # Optional, default: NULL
"bitstream-id": <uuid>  # Glance/other UUID, optional, default: NULL
"function-uuid": <uuid>  # Optional, default: NULL
When the agent receives this dictionary for a device, it will do the following:
* Create a trait of the form CUSTOM_<type>_<vendor>_<product>. Apply it to the device RP (if nRP support exists) or the compute node RP.
* Create traits of the form CUSTOM_<type>_<vendor>_REGION_<type-uuid>. Apply them to the corresponding region RP (if nRP support exists) or the compute node RP.
* Create traits of the form CUSTOM_<type>_<vendor>_FUNCTION_<function-uuid>. Apply them to the corresponding region RP (if nRP support exists) or the compute node RP.
N/A
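The trait-naming steps the agent performs on a received device dictionary can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `to_trait` and `traits_for_device` helpers are hypothetical, the "regions" entry is assumed to be a list of dictionaries, and the upper-casing and replacement of characters outside A-Z, 0-9 and underscore is one possible way to satisfy Placement's custom trait naming rules, not something this spec prescribes. The vendor, product and UUID values are invented.

```python
import re


def to_trait(*parts):
    """Join parts into a CUSTOM_ trait name using only A-Z, 0-9 and '_'."""
    joined = "_".join(str(p) for p in parts).upper()
    # Assumed normalization: any other character (e.g. '-') becomes '_'.
    return "CUSTOM_" + re.sub(r"[^A-Z0-9_]", "_", joined)


def traits_for_device(device):
    """Return the device trait and one list of traits per region."""
    dev_type, vendor = device["type"], device["vendor"]
    device_trait = to_trait(dev_type, vendor, device["product"])
    region_traits = []
    for region in device.get("regions", []):
        traits = []
        # Both UUIDs are optional, so a trait is built only when one is set.
        if region.get("region-type-uuid"):
            traits.append(
                to_trait(dev_type, vendor, "REGION", region["region-type-uuid"]))
        if region.get("function-uuid"):
            traits.append(
                to_trait(dev_type, vendor, "FUNCTION", region["function-uuid"]))
        region_traits.append(traits)
    return device_trait, region_traits


example_device = {
    "type": "FPGA",
    "vendor": "examplevendor",
    "product": "card1",
    "regions": [
        {"region-type-uuid": "1c6c9033-560d-4a7a-bb8e-94455d1e7825",
         "function-uuid": None},
    ],
}
device_trait, region_traits = traits_for_device(example_device)
```

Whether each trait is then applied to a nested device or region RP or to the compute node RP depends on nRP support, as described in the steps above; that part is left out of the sketch.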
Add the new fields to the database under Deployables and Attributes.
None
None
None
None
None
None
None
None
Unit tests need to be updated to check the newly added fields.
None