There are some VM deployment failures happening when multiple VMs are deployed at a time, failures mainly due to NetworkModel code that iterates over all the vlans in the pod. This causes each deployVM thread to hold the global lock on Network longer and cause delays. This delay in turn causes more threads to choose same host and fail since capacity is not available on that host.
Following are some changes required to be done to reduce delays during VM deployments which in turn causes some vm deployment failures when multiple VMs are launched at a time.
In Planner, remove the clusters that do not contain a host with matching service offering tag. This will save some iterations over clusters that dont have matching tagged host
In NetworkModel, do not query the vlans for the pod within the loop. Also optimized the logic to query the ip/ipv6
In DeploymentPlanningManagerImpl, do not process the affinity group if the plan has hostId provided.
Introduced a global configuration flag 'cluster.threshold.enabled'. By default the flag is true.
If the value is false, then a VM can be started in a cluster even if the cluster thresholds are
crossed. However, for a new VM deployment the cluster threshold will always be honoured.
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster,
cluster with most number of unique host tags will put at the end of list.
Hosts with GPU capability will get tagged with implicit tags defined by
global config param 'implicit.host.tags' at the time os host discovery.
Also added FirstFitPlannerTest unit test file.
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster,
cluster with most number of unique host tags will put at the end of list.
Hosts with GPU capability will get tagged with implicit tags defined by
global config param 'implicit.host.tags' at the time os host discovery.
Also added FirstFitPlannerTest unit test file.
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster,
cluster with most number of unique host tags will put at the end of list.
Hosts with GPU capability will get tagged with implicit tags defined by
global config param 'implicit.host.tags' at the time os host discovery.
Also added FirstFitPlannerTest unit test file.