OVN L3 scheduler
Introduction
The OVN L3 scheduler assigns the router gateway ports to a list of chassis.
Having more than one chassis assigned allows the service to have high
availability: if the Logical_Router_Port acting as gateway is assigned
to a failed chassis, OVN will bind this port to the next chassis in the list.
This list of chassis is prioritized; the Logical_Router_Port will be bound
to the chassis in the defined order.
This is done by associating multiple Gateway_Chassis rows with a
Logical_Router_Port in the OVN Northbound database. A Gateway_Chassis
row is just a reference to a Chassis plus a priority. For the
same Logical_Router_Port, every assigned Gateway_Chassis has
a different priority, from 1 (the lowest priority) up to the number of
Gateway_Chassis assigned.
The maximum number of Gateway_Chassis that can be assigned to a
Logical_Router_Port is 5. This number is hardcoded. That means in Neutron
the highest priority a Gateway_Chassis will have is 5.
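The priority numbering described above can be sketched as follows. This is a minimal illustration, not the Neutron code: the function name, the constant name and the plain-tuple representation of a Gateway_Chassis row are assumptions made for the example; only the limit of 5 and the 1-to-N priority range come from the text.

```python
# Assumed constant name; the text only states that the limit is 5.
MAX_GATEWAY_CHASSIS = 5


def assign_priorities(ordered_chassis, max_gw_chassis=MAX_GATEWAY_CHASSIS):
    """Map an ordered chassis list (most preferred first) to priorities.

    Returns a list of (chassis_name, priority) tuples, a hypothetical
    stand-in for Gateway_Chassis rows. The first chassis receives the
    highest priority and the last one receives priority 1.
    """
    selected = list(ordered_chassis)[:max_gw_chassis]
    count = len(selected)
    return [(chassis, count - index) for index, chassis in enumerate(selected)]


rows = assign_priorities(["chassis-a", "chassis-b", "chassis-c"])
```

With three candidates the priorities are 3, 2 and 1; with more than five candidates only the first five are kept, so the highest priority is 5.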
If no gateway chassis is available when a Logical_Router_Port is
scheduled, no Gateway_Chassis is assigned and no value is set
in the “options” column of the Logical_Router row; this absence is used
to detect an unhosted router gateway port.
Types of schedulers
The OVN L3 scheduler is configurable and allows us to implement several types of algorithms. There are currently two implemented in the in-tree repository:
OVNGatewayChanceScheduler
OVNGatewayLeastLoadedScheduler
OVNGatewayChanceScheduler
The scheduler algorithm in this class is very simple: from the list of gateway chassis provided (candidate chassis), it shuffles and returns the list.
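A minimal sketch of this behaviour, assuming the candidates are plain chassis names. The function name and the optional `rng` parameter are hypothetical and are there only to keep the example testable; they are not part of the Neutron class.

```python
import random


def chance_select(candidates, rng=random):
    """Return a shuffled copy of the candidate chassis list.

    `rng` can be the `random` module or a `random.Random` instance;
    the input list is left untouched.
    """
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled


order = chance_select(["chassis-a", "chassis-b", "chassis-c"])
```

The position in the returned list then determines the priority each chassis receives.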
OVNGatewayLeastLoadedScheduler
The goal of this scheduler is to balance the load so that the available chassis host the same
number of Logical_Router_Port. Since [1], the scheduler retrieves the
list of available candidates and assigns, per priority, the least loaded
chassis. That means this scheduler considers not only the chassis with a
bound Logical_Router_Port (the highest priority gateway chassis), but also
balances the lower priority assignments. This is done by (1) iterating
over the list of priorities (from 1 to the number of chassis to schedule), (2)
building, for each Chassis, the list of Logical_Router_Port assigned to it at
the selected priority and (3) selecting the least loaded Chassis.
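The three steps can be sketched as below. This is a simplified illustration under stated assumptions, not the in-tree implementation: existing assignments are modelled as (chassis, priority) tuples, a chassis is used at most once per port, and ties are broken by candidate order. All names are hypothetical.

```python
from collections import Counter


def least_loaded_select(candidates, existing_bindings, max_gw_chassis=5):
    """Pick, for each priority, the least loaded remaining candidate.

    existing_bindings: list of (chassis, priority) tuples describing the
    Gateway_Chassis rows of the ports that are already scheduled.
    """
    remaining = list(candidates)
    slots = min(len(remaining), max_gw_chassis)
    result = []
    # (1) Iterate over the priorities, from 1 to the number of chassis.
    for priority in range(1, slots + 1):
        # (2) Count how many ports each chassis hosts at this priority.
        load = Counter(
            chassis for chassis, prio in existing_bindings if prio == priority
        )
        # (3) Select the least loaded chassis (first one wins on a tie).
        chosen = min(remaining, key=lambda chassis: load[chassis])
        remaining.remove(chosen)
        result.append((chosen, priority))
    return result


existing = [("a", 1), ("a", 2), ("b", 2), ("b", 3), ("c", 3)]
selection = least_loaded_select(["a", "b", "c"], existing)
```

In this example chassis "a" already holds a priority 1 assignment, so "b" is chosen for priority 1; "c" is then the least loaded remaining chassis at priority 2, and "a" takes priority 3.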
Re-schedule Logical_Router_Port if a Chassis is removed
When a gateway Chassis is removed from the environment, it leaves a “hole”
in the Gateway_Chassis assignment of a Logical_Router_Port. The
Gateway_Chassis row associated with the removed Chassis is deleted
and removed from the list of HA assigned Chassis. This event is captured
by Neutron, which re-schedules the Gateway_Chassis to create a balanced list
of assignments, in the same way as OVNGatewayLeastLoadedScheduler does. This was
implemented in [2].
This process only applies to the lower priority Gateway_Chassis rows,
never to the highest priority one, because the Logical_Router_Port is bound to
that Chassis and could be transmitting. If the highest priority Gateway_Chassis
were changed, the Logical_Router_Port would be bound to the new Chassis, which
could break any active sessions.
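The rule of never touching the highest priority assignment can be sketched as follows. This is an illustrative model only: the function, its arguments and the optional `rank_lower` callback (a placeholder for whatever load-based ordering is applied to the lower priorities) are assumptions, not the Neutron implementation.

```python
def reschedule_after_removal(bindings, removed, candidates, rank_lower=None):
    """Rebuild a port's (chassis, priority) list after a chassis removal.

    The surviving chassis with the highest priority stays on top, since
    the port is (or will be) bound to it. Only the lower priority slots
    are filled again from the candidate list.
    """
    survivors = sorted(
        (b for b in bindings if b[0] != removed),
        key=lambda b: b[1],
        reverse=True,
    )
    if not survivors:
        return []
    top = survivors[0][0]
    pool = [ch for ch in candidates if ch not in (top, removed)]
    if rank_lower is not None:
        # Hypothetical hook, e.g. an ordering from least to most loaded.
        pool = rank_lower(pool)
    # Try to restore the original number of HA chassis.
    ordered = [top] + pool[: len(bindings) - 1]
    count = len(ordered)
    return [(ch, count - index) for index, ch in enumerate(ordered)]


before = [("a", 3), ("b", 2), ("c", 1)]
after = reschedule_after_removal(before, "b", ["a", "c", "d"])
```

Here chassis "a" keeps the highest priority, while the hole left by "b" is closed by re-assigning the lower priorities to "c" and "d".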
Availability Zones (AZ) distribution
Both the OVNGatewayChanceScheduler and the
OVNGatewayLeastLoadedScheduler schedulers take Availability Zones (AZ)
into account. If a router has any AZ defined, the schedulers will select
only those chassis located in those AZs. If no chassis meets this condition, the
Logical_Router_Port won’t be assigned to any chassis and won’t be bound.
Once the list of candidate Chassis (depending on the scheduler selected)
is created, this list is reordered to prioritize Chassis from
different AZs. That spreads the allocation choices across all AZs; if the
current (and highest priority) Chassis binding fails, the next Chassis in the
list will belong to another AZ.
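One way to picture this reordering is a round-robin over the AZs, sketched below. The sketch assumes each chassis belongs to exactly one AZ and uses a hypothetical `az_of` mapping; the actual Neutron code may differ in both respects.

```python
from collections import deque


def reorder_by_az(candidates, az_of):
    """Interleave candidates so consecutive entries come from different AZs.

    candidates: chassis names in scheduler order (most preferred first).
    az_of: dict mapping each chassis name to its availability zone.
    The relative order inside each AZ is preserved.
    """
    groups = {}
    for chassis in candidates:
        groups.setdefault(az_of[chassis], deque()).append(chassis)
    ordered = []
    while groups:
        # Take one chassis from every AZ that still has candidates.
        for az in list(groups):
            ordered.append(groups[az].popleft())
            if not groups[az]:
                del groups[az]
    return ordered


zones = {"a1": "az1", "a2": "az1", "b1": "az2", "c1": "az3"}
spread = reorder_by_az(["a1", "a2", "b1", "c1"], zones)
```

In this example "a2" moves to the end of the list, so the first three entries each belong to a different AZ.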
This improvement was implemented in [3].
Soft Anti-Affinity for Logical_Router with multiple Logical_Router_Port
Support for multiple gateway ports [4] was implemented to support configurations that provide resiliency and load sharing across multiple router ports at the layer 3 level.
In addition to external dependencies such as BFD for liveness detection and
ECMP for load sharing across default routes, the feature required changes to
the scheduler, the goal being that each Logical_Router_Port record for a
Logical_Router would have a different set of Chassis for each priority.
The Anti-Affinity is accomplished by having the OVN driver provide the router
object subject to scheduling to the scheduler. The scheduler then checks
whether Logical_Router_Port records already exist for the target
router, and makes any Chassis involved in the already existing ports
appear as having higher load, making it less likely that an already used
Chassis gets picked for a new Logical_Router_Port.
Since the algorithm is based on load and priority, Anti-Affinity is only
supported for the OVNGatewayLeastLoadedScheduler.
This improvement was implemented in [5].
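The load adjustment can be sketched as a penalty added on top of the per-priority load count. This is a simplified model under stated assumptions: the penalty value, the function name and the (chassis, priority) tuple representation are all hypothetical, and the in-tree code may weight the load differently.

```python
from collections import Counter

# Hypothetical penalty; it only needs to outweigh realistic load counts.
ROUTER_PENALTY = 1000


def anti_affinity_select(candidates, all_bindings, router_bindings,
                         max_gw_chassis=5):
    """Least loaded selection with a soft anti-affinity penalty.

    all_bindings: (chassis, priority) tuples of every scheduled port.
    router_bindings: (chassis, priority) tuples of the ports that already
    belong to the target router.
    """
    used_by_router = {chassis for chassis, _prio in router_bindings}
    remaining = list(candidates)
    result = []
    for priority in range(1, min(len(remaining), max_gw_chassis) + 1):
        load = Counter(
            chassis for chassis, prio in all_bindings if prio == priority
        )
        # Make chassis already used by this router look more loaded.
        for chassis in used_by_router:
            load[chassis] += ROUTER_PENALTY
        chosen = min(remaining, key=lambda chassis: load[chassis])
        remaining.remove(chosen)
        result.append((chosen, priority))
    return result


first_port = [("a", 2), ("b", 1)]
second_port = anti_affinity_select(
    ["a", "b", "c", "d"], first_port, first_port, max_gw_chassis=2
)
```

The second port of the router lands on "c" and "d" because "a" and "b" are penalized. The anti-affinity is soft: if only "a" and "b" were available, they would still be selected.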