https://blueprints.launchpad.net/cyborg/+spec/cyborg-nova-interaction
Cyborg provides a general management framework for accelerators, such as FPGAs, GPUs, etc. For scheduling an instance that needs accelerators, Cyborg needs to work with Nova on three levels:
This spec addresses the first two aspects. There is another spec to address the attachment of accelerators to instances [1]. Cyborg also needs to handle some aspects for FPGAs without involving Nova, specifically FPGA programming and bitstream management. They will be covered in other specs. This spec is independent of those specs.
This spec is common to all accelerators, including GPUs, High Precision Time Synchronization (HPTS) cards, etc. Since FPGAs have more aspects to be considered than other devices, some sections may focus on FPGA-specific factors. The spec calls out the FPGA-specific aspects.
Smart NICs based on FPGAs fall into two categories: those which expose the FPGA explicitly to the host, and those that do not. Cyborg’s scope includes the former. This spec includes such devices, though the Cyborg-Neutron interaction is out of scope.
The scope of this spec is the Rocky release.
Here is an example diagram for an FPGA with multiple regions, and multiple functions in a region:
      PCI A    PCI B
        |        |
+-------|--------|-------------------+
|       |        |                   |
|  +----|--------|---+   +--------+  |
|  | +--|--+ +---|-+ |   |        |  |
|  | | Fn A| | Fn B| |   |        |  |
|  | +-----+ +-----+ |   |        |  |
|  +-----------------+   +--------+  |
|  Region 1              Region 2    |
|                                    |
+------------------------------------+
Cyborg’s representation and handling of accelerators needs to be consistent with Nova’s Placement API. Specifically, they must be modeled in terms of Resource Providers (RPs), Resource Classes (RCs) and Traits.
Though PCI Express is entrenched in the data center, some accelerators may be exposed to the host via some other protocol. Even with PCI, the connections between accelerator components and PCI functions may vary across devices. Accordingly, Cyborg should not represent accelerators as PCI functions.
For instances that need accelerators, we need to define a way for Cyborg to be included seamlessly in the Nova scheduling workflow.
We need to satisfy the following use cases for the tenant role:
An operator must be able to provide both Device as a Service and Accelerated Function as a Service in the same cluster, to serve all kinds of users: those who are device-agnostic, those using 3rd party bitstreams, and those using their own bitstreams (incl. developers).
The goal for Cyborg is to provide the mechanisms to enable all these use cases.
In this spec, we do not consider bitstream developer or device developer roles. Also, we assume that each accelerator device is dedicated to a compute node, rather than shared among several nodes.
Cyborg will represent a generic accelerator for a device type as a custom Resource Class (RC) for that type, of the form CUSTOM_ACCELERATOR_<device-type>. E.g. CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA, etc. This helps in defining separate quotas for different device types.
Device-local memory is the memory available to the device alone, usually in the form of DDR, QDR or High Bandwidth Memory in the PCIe board along with the device. It can also be represented as an RC of the form CUSTOM_ACCELERATOR_MEMORY_<memory-type>. E.g. CUSTOM_ACCELERATOR_MEMORY_DDR. A single PCIe board may have more than one type of memory.
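As a minimal sketch of the naming scheme above, the helpers below build the resource class names from a device or memory type string. The function names are hypothetical and not part of Cyborg. The normalization step assumes that custom resource class names are limited to upper-case letters, digits and underscores after the CUSTOM_ prefix.

```python
import re


def accelerator_rc(device_type: str) -> str:
    """Return the custom resource class name for a device type such as 'fpga'."""
    if not device_type:
        raise ValueError("device_type must not be empty")
    # Assumption: map anything outside [A-Z0-9] to '_' so the name stays valid.
    token = re.sub(r"[^A-Z0-9]", "_", device_type.upper())
    return "CUSTOM_ACCELERATOR_" + token


def memory_rc(memory_type: str) -> str:
    """Return the resource class name for a device-local memory type such as 'ddr'."""
    if not memory_type:
        raise ValueError("memory_type must not be empty")
    return accelerator_rc("MEMORY_" + memory_type)
```

Keeping one RC per device type is what lets quotas be set separately for, say, GPUs and FPGAs.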
In addition, each device/region is represented as a Resource Provider (RP). This enables traits to be applied to it and other RPs/RCs to be contained within it. So, a device RP provides one or more instances of that device type’s RC. This depends on nested RP support in Nova [2].
- For FPGAs, both the device and the regions within it will be represented as RPs. This allows the hierarchy within an FPGA to be naturally modelled as an RP hierarchy.
- Using nested RPs is the preferred way. However, until Nova supports nested RPs, Cyborg shall associate the RCs and traits (described below) with the compute node RPs. This requires that all devices on a single host share the same traits. If nested RP support becomes usable after the Rocky release, the operator needs to handle the upgrade as follows:
- Terminate all instances using accelerators.
- Remove all Cyborg traits and inventory on all compute node RPs, perhaps by running a script.
- Perform the Cyborg upgrade. Post-upgrade, the new agent/driver(s) will create RPs for the devices and publish the traits and inventory.
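The difference between the nested layout and the compute-node fallback can be pictured with a small in-memory model. This is only an illustrative sketch: the `ResourceProvider` class and `flatten_onto_compute_node` function are hypothetical, they do not call the Placement API, and placing the inventory on the region RPs is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set, Tuple


@dataclass
class ResourceProvider:
    """Hypothetical stand-in for a Placement resource provider."""
    name: str
    inventory: Dict[str, int] = field(default_factory=dict)
    traits: Set[str] = field(default_factory=set)
    children: List["ResourceProvider"] = field(default_factory=list)

    def walk(self) -> Iterator["ResourceProvider"]:
        """Yield this RP and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def flatten_onto_compute_node(root: ResourceProvider) -> Tuple[Dict[str, int], Set[str]]:
    """Fallback without nested RPs: sum inventory and union traits on the root."""
    inventory: Dict[str, int] = {}
    traits: Set[str] = set()
    for rp in root.walk():
        for rc, total in rp.inventory.items():
            inventory[rc] = inventory.get(rc, 0) + total
        traits |= rp.traits
    return inventory, traits


# Nested layout: compute node -> FPGA device -> two regions.
regions = [
    ResourceProvider(f"region-{i}", {"CUSTOM_ACCELERATOR_FPGA": 1},
                     {"CUSTOM_FPGA_INTEL_ARRIA10", "CUSTOM_PROGRAMMABLE"})
    for i in (1, 2)
]
device = ResourceProvider("fpga-0", traits={"CUSTOM_FPGA_INTEL"}, children=regions)
compute_node = ResourceProvider("compute-1", children=[device])
```

Flattening loses the association between a trait and the device it came from, which is why the fallback only works when all devices on a host share the same traits.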
Cyborg will associate a Device Type trait with each device, of the form CUSTOM_<device-type>_<vendor>. E.g. CUSTOM_GPU_AMD or CUSTOM_FPGA_XILINX. This trait is intended to help match the software drivers/libraries in the instance image. It is meant to be used in a flavor when a single driver/library in the instance image can handle most or all of the device types from a vendor.
- For FPGAs, this trait and others will be applied to the region RPs which are children of the device RPs as well.
Cyborg will associate a Device Family trait with each device as needed, of the form CUSTOM_<device-type>_<vendor>_<family>. E.g. CUSTOM_FPGA_INTEL_ARRIA10. This is not a product name, but the name of a device family, used to match software in the instance image with the device family. This is a refinement of the Device Type Trait. It is meant to be used in a flavor when there are different drivers/libraries for different device families. Since it may be tough to forecast whether a new device family will need a new driver/library, it may make sense to associate both these traits with the same device RP.
For FPGAs, Cyborg will associate a region type trait with each region (or with the FPGA itself if there is no Partial Reconfiguration support), of the form CUSTOM_FPGA_REGION_<vendor>_<uuid>. E.g. CUSTOM_FPGA_REGION_INTEL_<uuid>. This is needed for Device as a Service with FPGAs.
For FPGAs, Cyborg may associate a function type trait with a region when the region gets programmed, of the form CUSTOM_FPGA_FUNCTION_<vendor>_<uuid>. E.g. CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>. This is needed for the AFaaS use case. The trait is updated when Cyborg reprograms a region as part of an AFaaS request.
For FPGAs, Cyborg should associate a CUSTOM_PROGRAMMABLE trait with every region. This is needed to lay the groundwork for multi-function accelerators in the future. Flavors should ask for this trait, except in the pre-programmed case.
For FPGAs, since they may implement a wide variety of functionality, we may also attach a Functionality Trait. E.g. CUSTOM_FPGA_COMPUTE, CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE.
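The trait naming rules above can be sketched as a handful of string builders. The function names are hypothetical. How a UUID is encoded into a trait name is not specified here, so replacing hyphens with underscores is purely an assumption made to keep the result within upper-case letters, digits and underscores.

```python
import re


def _token(value: str) -> str:
    """Upper-case a value and map every character outside [A-Z0-9] to '_'."""
    return re.sub(r"[^A-Z0-9]", "_", value.upper())


def device_type_trait(device_type: str, vendor: str) -> str:
    """CUSTOM_<device-type>_<vendor>, e.g. CUSTOM_GPU_AMD."""
    return f"CUSTOM_{_token(device_type)}_{_token(vendor)}"


def device_family_trait(device_type: str, vendor: str, family: str) -> str:
    """CUSTOM_<device-type>_<vendor>_<family>, e.g. CUSTOM_FPGA_INTEL_ARRIA10."""
    return f"{device_type_trait(device_type, vendor)}_{_token(family)}"


def region_type_trait(vendor: str, region_uuid: str) -> str:
    """CUSTOM_FPGA_REGION_<vendor>_<uuid> (UUID encoding is an assumption)."""
    return f"CUSTOM_FPGA_REGION_{_token(vendor)}_{_token(region_uuid)}"


def function_type_trait(vendor: str, function_uuid: str) -> str:
    """CUSTOM_FPGA_FUNCTION_<vendor>_<uuid> (UUID encoding is an assumption)."""
    return f"CUSTOM_FPGA_FUNCTION_{_token(vendor)}_{_token(function_uuid)}"
```

Because the family trait is built on top of the type trait, attaching both to the same RP, as suggested above, costs nothing extra in the driver.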
The Cyborg agent needs to get enough information from the Cyborg driver to create the RPs, RCs and traits. In particular, it needs to get the device type string, region IDs and function IDs from the driver. This requires the driver/agent interface to be enhanced [3].
The modeling in Placement represents generic virtual accelerators as resource classes, and devices/regions as RPs. This is PCI-agnostic. However, many FPGA implementations use PCI Express in general, and SR-IOV in particular. In those cases, it is expected that Cyborg will pass PCI VFs to instances via PCI Passthrough, and retain the PCI PF in the host for management.
For the sake of illustrating how the device representation in Nova can be used, and for completeness, we now show how to define flavors for various use cases. Please see [4] for more details.
A flavor that needs device access always asks for one or more instances of ‘resources:CUSTOM_ACCELERATOR_<device-type>’. In addition, it needs to specify the right traits.
Example flavor for DaaS:
resources:CUSTOM_ACCELERATOR_HPTS=1
trait:CUSTOM_HPTS_ZTE=required
NOTE: For FPGAs, the flavor should also include CUSTOM_PROGRAMMABLE trait.
Example flavor for AFaaS Pre-programmed:
resources:CUSTOM_ACCELERATOR_FPGA=1
trait:CUSTOM_FPGA_INTEL_ARRIA10=required
trait:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>=required
Example flavor for AFaaS Orchestration-Programmed:
resources:CUSTOM_ACCELERATOR_FPGA=1
trait:CUSTOM_FPGA_INTEL_ARRIA10=required
trait:CUSTOM_PROGRAMMABLE=required
function:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>=required
(Not interpreted by Nova.)
- NOTE: When Nova supports preferred traits, we can use that instead of ‘function’ keyword in extra specs.
- NOTE: For Cyborg to fetch the bitstream for this function, it is assumed that the operator has configured the function UUID as a property of the bitstream image in Glance.
Another example flavor for AFaaS Orchestration-Programmed which refers to a function by name instead of UUID for ease of use:
resources:CUSTOM_ACCELERATOR_FPGA=1
trait:CUSTOM_FPGA_INTEL_ARRIA10=required
trait:CUSTOM_PROGRAMMABLE=required
function_name:<string>=required
(Not interpreted by Nova.)
- NOTE: This assumes the operator has configured the function name as a property of the bitstream image in Glance. The FPGA hardware is not expected to expose function names, and so Cyborg will not represent function names as traits.
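To make the split between Nova-interpreted and Cyborg-only extra specs concrete, the sketch below separates the two groups for the orchestration-programmed flavors above. The helper name and the dictionary representation of the flavor are assumptions for illustration. The `<gzip-uuid>` placeholder is kept exactly as it appears in the examples.

```python
from typing import Dict, Tuple

# Key prefixes that Nova/Placement interpret in the flavors shown above.
NOVA_PREFIXES = ("resources", "trait")


def split_extra_specs(extra_specs: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split flavor extra specs into (Nova-interpreted, Cyborg-only) dicts."""
    for_nova: Dict[str, str] = {}
    for_cyborg: Dict[str, str] = {}
    for key, value in extra_specs.items():
        prefix = key.split(":", 1)[0]
        target = for_nova if prefix in NOVA_PREFIXES else for_cyborg
        target[key] = value
    return for_nova, for_cyborg


# AFaaS orchestration-programmed flavor from the example above.
flavor = {
    "resources:CUSTOM_ACCELERATOR_FPGA": "1",
    "trait:CUSTOM_FPGA_INTEL_ARRIA10": "required",
    "trait:CUSTOM_PROGRAMMABLE": "required",
    "function:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>": "required",
}
```

Only the first group influences the Placement query. The `function` and `function_name` keys travel with the request spec until Cyborg reads them.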
A flavor may ask for other RCs, such as local memory.
A flavor may ask for multiple accelerators, using the granular resource request syntax. Cyborg can tie function and bitstream fields in the extra_specs to resources/traits using an extension of the granular resource request syntax (see References) which is not interpreted by Nova.
resourcesN: CUSTOM_ACCELERATOR_FPGA=1
traitsN: CUSTOM_FPGA_INTEL_ARRIA10=required
othersN: function:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>=required
We now look at the scheduling flow when each device implements only one function. Devices with multiple functions are outside the scope for now.
A request spec with a flavor comes to Nova conductor/scheduler.
Placement API returns the list of RPs which contain the requested resources with matching traits. (With nested RP support, the returned RPs are device/region RPs. Without it, they are compute node RPs.)
FPGA-specific: For AFaaS orchestration-programmed use case, Placement will return matching devices but they may not have the requested function. So, Cyborg may provide a weigher which checks the allocation candidates to see which ones have the required function trait, and ranks them higher. This requires no change to Cyborg DB.
The request_spec goes to compute node (ignoring Cells for now).
NOTE: When one device/region implements multiple functions and orchestration-driven programming is desired, the inventory of that device needs to be adjusted. This can be addressed later and is not a priority for Rocky release. See References.
Nova compute calls os-acc/Cyborg [1].
FPGA-specific: If the request spec asks for a function X in extra specs, but X is not present in the selected region RP, Cyborg should program that region.
Cyborg should associate RPs/RCs and PFs/VFs with Deployables in its internal DB. It can use such mappings to associate the requested resource (device/function) with an attach handle (such as a PCI function) that can be used to attach the resource to an instance.
NOTE: This flow is PCI-agnostic: no PCI whitelists are involved.
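The two FPGA-specific steps in this flow, ranking allocation candidates by function trait and deciding whether a region must be programmed, can be sketched in plain Python. This is not Nova's weigher interface or Cyborg's actual code: the function names, the `(rp_name, traits)` candidate representation, and the error raised for a non-programmable region are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple

FUNCTION_PREFIX = "CUSTOM_FPGA_FUNCTION_"

Candidate = Tuple[str, Set[str]]  # (RP name, traits on that RP)


def rank_candidates(candidates: Iterable[Candidate], function_trait: str) -> List[Candidate]:
    """Put RPs that already carry the requested function trait first."""
    # sorted() is stable, so candidates that tie keep their incoming order.
    return sorted(candidates, key=lambda c: function_trait not in c[1])


def plan_region_update(region_traits: Set[str],
                       requested_function: Optional[str]) -> Tuple[bool, Set[str]]:
    """Return (must_program, traits the region should carry afterwards)."""
    traits = set(region_traits)
    if requested_function is None or requested_function in traits:
        return False, traits  # nothing requested, or already programmed
    if "CUSTOM_PROGRAMMABLE" not in traits:
        raise ValueError("region lacks the function and is not programmable")
    # Reprogramming replaces the old function trait with the new one.
    traits = {t for t in traits if not t.startswith(FUNCTION_PREFIX)}
    traits.add(requested_function)
    return True, traits
```

Ranking pre-programmed regions first avoids an unnecessary reprogramming step, and the trait update mirrors the rule that the function type trait changes whenever Cyborg reprograms a region.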
N/A
The following changes are needed in Cyborg.
None
This change allows tenants to initiate FPGA bitstream programming. To mitigate the security impact, it is proposed that only 2 methods are offered for programming (flavor asks for a bitstream, or the running instance asks for specific bitstreams) and both are handled through Cyborg. There is no direct access from an instance to an FPGA.
None
None
None
None
None
NOTE: the granular requests feature is needed to define a flavor that requests non-identical accelerators, but is not needed for Cyborg development in Rocky.
For each vendor driver supported in this release, we need to integrate the corresponding FPGA type(s) in the CI infrastructure.
None
[1] Specification for Compute Node
[2] Nested RPs in Rocky
[3] Specification for Cyborg Agent-Driver API
[4] Custom Resource Classes in Flavors
[5] Cyborg Nova Queens Spec
[6] Rocky PTG Etherpad for Cyborg Nova Interaction
[7] Detailed Cyborg/Nova scheduling
[8] Openstack-dev email discussion
Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License.