The issue occurs when more than one thread processes connect for the same host simultaneously. VM full sync is not designed to handle this scenario, and as a result user VMs may be stopped incorrectly.
The direct agent scan task runs at regular intervals (direct.agent.scan.interval, default 90 seconds) and identifies hosts that need to be processed for connect. Normally hosts get connected within that interval and there are no issues. However, if for some reason the connect process takes longer and has not completed by the time the next agent scan runs, the same hosts may be picked up again based on the DB state. More than one thread can then end up processing connect for the same host.
The fix is to check whether a thread is already processing connect for a host; if so, all subsequent threads for that host simply bail out. There may also be a scenario where one thread has already completed processing connect, but another thread was scheduled before that completed and would repeat the same work. Appropriate checks prevent this as well.
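The bail-out check can be sketched with a concurrent set of host ids that are currently being connected. This is a minimal illustration under stated assumptions, not the actual fix: `HostConnectGuard`, `connectIfNeeded`, and the `isAlreadyConnected`/`doConnect` callbacks are hypothetical names, and the real code would consult host state in the DB rather than an in-memory set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

// Hypothetical sketch: at most one thread processes "connect" for a given host at a time.
final class HostConnectGuard {
    // Ids of hosts whose connect is currently being processed by some thread.
    private final Set<Long> inProgress = ConcurrentHashMap.newKeySet();

    // Returns true if the caller may connect this host; false means another thread already is.
    boolean tryBeginConnect(long hostId) {
        return inProgress.add(hostId); // atomic: only one concurrent caller sees true
    }

    void endConnect(long hostId) {
        inProgress.remove(hostId);
    }

    // Runs doConnect unless another thread is connecting the host, or an earlier
    // thread already finished connecting it (re-checked after winning the guard).
    boolean connectIfNeeded(long hostId, LongPredicate isAlreadyConnected, LongConsumer doConnect) {
        if (!tryBeginConnect(hostId)) {
            return false; // bail out: connect already in progress for this host
        }
        try {
            if (isAlreadyConnected.test(hostId)) {
                return false; // bail out: connect already completed, don't repeat it
            }
            doConnect.accept(hostId);
            return true;
        } finally {
            endConnect(hostId);
        }
    }
}
```

`Set.add` on a set from `ConcurrentHashMap.newKeySet()` is atomic, so exactly one thread wins the guard. Re-checking the connection state after winning covers the thread that was scheduled before an earlier connect finished.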
Changes:
- Passing the avoid set generated by the first pass of deployment to the second try.
- The second try, which searches over the entire zone again, is made when the first pass (which uses a reserved plan) fails to deploy on the reserved host.
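A simplified sketch of carrying the avoid set from the reserved-host pass into the zone-wide retry. `TwoPassPlanner`, `AvoidSet`, and the `canDeploy` predicate are hypothetical, and a real planner would also track avoided pods, clusters, and pools, not just hosts.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical two-pass planner sketch.
final class TwoPassPlanner {
    // Hosts ruled out so far; the same instance is shared by both passes.
    static final class AvoidSet {
        final Set<Long> hosts = new HashSet<>();
    }

    // First pass tries only the reserved host; if that fails, the second pass
    // searches the whole zone but reuses the avoid set built by the first pass.
    static Optional<Long> deploy(Long reservedHostId, List<Long> zoneHosts, Predicate<Long> canDeploy) {
        AvoidSet avoid = new AvoidSet();
        if (reservedHostId != null) {
            if (canDeploy.test(reservedHostId)) {
                return Optional.of(reservedHostId);
            }
            avoid.hosts.add(reservedHostId); // remember the failure for the second try
        }
        // Second try: entire zone, skipping anything already ruled out.
        for (Long host : zoneHosts) {
            if (avoid.hosts.contains(host)) {
                continue;
            }
            if (canDeploy.test(host)) {
                return Optional.of(host);
            }
            avoid.hosts.add(host);
        }
        return Optional.empty();
    }
}
```

Without the shared avoid set, the second try would re-evaluate the reserved host that has just failed.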
Changes:
- Added locking to the group-and-save reservation mechanism performed by DPM
- Added an admin operation to clean up VM reservations
- DPM also cleans up VM reservations on startup
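One way to picture the locked check-and-save step, assuming "group" means an affinity group with host anti-affinity semantics (an assumption; the message does not say which group type). `GroupReservations`, `reserve`, and `cleanup` are hypothetical names, and reservations are kept in memory here instead of in the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: per-group lock around "check existing reservations, then save".
final class GroupReservations {
    // group name -> (vmId -> reserved hostId); each inner map is guarded by its group's lock
    private final Map<String, Map<Long, Long>> byGroup = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String group) {
        return locks.computeIfAbsent(group, g -> new ReentrantLock());
    }

    // Saves the reservation only if no other VM in the group holds this host (assumed anti-affinity).
    boolean reserve(String group, long vmId, long hostId) {
        ReentrantLock lock = lockFor(group);
        lock.lock();
        try {
            Map<Long, Long> r = byGroup.computeIfAbsent(group, g -> new HashMap<>());
            if (r.containsValue(hostId)) {
                return false; // another VM of this group already reserved the host
            }
            r.put(vmId, hostId);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Admin operation / startup hook: drop every reservation, one group lock at a time.
    int cleanup() {
        int removed = 0;
        for (String group : byGroup.keySet()) {
            ReentrantLock lock = lockFor(group);
            lock.lock();
            try {
                Map<Long, Long> r = byGroup.get(group);
                if (r != null) {
                    removed += r.size();
                    r.clear();
                }
            } finally {
                lock.unlock();
            }
        }
        return removed;
    }
}
```

Without the lock, two concurrent deployments in the same group could both pass the check and then both save a reservation for the same host.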
This feature allows a user to deploy VMs only in the resources dedicated to the user's account or domain.
1. Resources (zones, pods, clusters, or hosts) can be dedicated to an account or domain.
Implemented 12 new APIs to dedicate/list/release resources:
- dedicateZone, listDedicatedZones, releaseDedicatedZone for a Zone.
- dedicatePod, listDedicatedPods, releaseDedicatedPod for a Pod.
- dedicateCluster, listDedicatedClusters, releaseDedicatedCluster for a Cluster.
- dedicateHost, listDedicatedHosts, releaseDedicatedHost for a Host.
2. Once a resource (e.g. a pod) is dedicated to an account, resources inside it (e.g. clusters or hosts) cannot be further dedicated.
3. Once a resource is dedicated to a domain, resources inside it can be further dedicated to its sub-domains or accounts.
4. If any resource (e.g. a cluster) is dedicated to an account or domain, then resources above it (e.g. its pod) cannot be dedicated to a different account or domain that does not belong to the same domain.
5. To use explicit dedication, the user needs to create an affinity group of type 'ExplicitDedication'.
6. A VM can then be deployed with this affinity group passed as an input parameter.
7. A new ExplicitDedicationProcessor has been added to process affinity groups of type 'ExplicitDedication' when deploying a VM that demands dedicated resources.
This processor implements the AffinityGroupProcessor adapter and updates the avoid list.
8. A VM requesting dedication will be deployed on resources dedicated to the user's account, if available.
9. A VM requesting dedication can also be deployed on dedicated resources available with the parent domains, but only if no dedicated resources are available with the current user's account or domain.
10. A VM without dedication can be deployed on shared hosts but not on dedicated hosts.
11. To modify the dedication, the resource has to be released first.
12. The existing private zone functionality has been redirected to explicit dedication of zones.
13. Updated the DB schema upgrade script: a new table, "dedicated_resources", has been added.
14. Added the required permissions in commands.properties.
15. Unit tests: added unit tests for the new APIs and service in plugins/dedicated-resources/test/org/apache/cloudstack/dedicated/DedicatedApiUnitTest.java
16. Marvin test: dedicate a host, create an affinity group, deploy a VM, and check that the VM is deployed on the dedicated host.
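The account, then domain, then parent-domain fallback in items 8 to 10 can be sketched as a function that computes the avoid list. This is an illustration under simplifying assumptions, not the actual ExplicitDedicationProcessor: each host has at most one dedication owner (an account id or a domain id), nested dedications (item 3) are not modeled, and `ExplicitDedicationSketch`, `Host`, and `avoidList` are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the explicit-dedication avoid-list computation (items 8 to 10).
final class ExplicitDedicationSketch {
    // Simplified host: at most one of dedicatedAccountId / dedicatedDomainId is non-null.
    static final class Host {
        final long id;
        final Long dedicatedAccountId;
        final Long dedicatedDomainId;

        Host(long id, Long dedicatedAccountId, Long dedicatedDomainId) {
            this.id = id;
            this.dedicatedAccountId = dedicatedAccountId;
            this.dedicatedDomainId = dedicatedDomainId;
        }

        boolean isDedicated() {
            return dedicatedAccountId != null || dedicatedDomainId != null;
        }
    }

    // domainChain: the user's own domain first, then its parent, grandparent, ... up to ROOT.
    static Set<Long> avoidList(List<Host> hosts, boolean wantsDedication, long accountId, List<Long> domainChain) {
        Set<Long> allowed = new HashSet<>();
        if (!wantsDedication) {
            // Item 10: a VM without dedication may only use shared hosts.
            for (Host h : hosts) {
                if (!h.isDedicated()) allowed.add(h.id);
            }
        } else {
            // Item 8: prefer hosts dedicated to the user's account.
            for (Host h : hosts) {
                if (Objects.equals(h.dedicatedAccountId, accountId)) allowed.add(h.id);
            }
            // Item 9: otherwise the user's domain, then each parent domain in turn.
            for (int i = 0; allowed.isEmpty() && i < domainChain.size(); i++) {
                Long domainId = domainChain.get(i);
                for (Host h : hosts) {
                    if (domainId.equals(h.dedicatedDomainId)) allowed.add(h.id);
                }
            }
        }
        // Everything not allowed goes on the avoid list.
        Set<Long> avoid = new HashSet<>();
        for (Host h : hosts) {
            if (!allowed.contains(h.id)) avoid.add(h.id);
        }
        return avoid;
    }
}
```

Expressing the result as an avoid list matches item 7: the processor narrows the planner's search rather than picking a host itself.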
Changes:
- In VolumeReservationVO, the getter method of a column had a typo, which caused a wrong search builder to be created: it searched over the 'id' column instead of 'vm_reservation_id'.
- This bug caused VM deployment to choose the wrong pool, since the search returned incorrect results
- This bug in GenericSearchBuilder is also fixed: if a getter method does not use the standard 'get' or 'is' prefix, annotate that method with @Column(name = "<column_name>") to indicate which column it refers to. This lets GenericSearchBuilder identify the field correctly.
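To illustrate the lookup rule, here is a small reflection-based sketch that maps a getter to a column name. `Column` below is a local stand-in for `javax.persistence.Column` so the example has no JPA dependency; `ColumnResolver` and `ReservationVO` are hypothetical, and this is not GenericSearchBuilder's actual code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Local stand-in for javax.persistence.Column (hypothetical, for a dependency-free sketch).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface Column {
    String name();
}

final class ColumnResolver {
    // Maps a getter to its column, preferring an explicit @Column on the method itself.
    static String columnFor(Method getter) {
        Column onMethod = getter.getAnnotation(Column.class);
        if (onMethod != null) {
            return onMethod.name(); // explicit mapping for non-standard getter names
        }
        String name = getter.getName();
        String prop;
        if (name.startsWith("get") && name.length() > 3) {
            prop = name.substring(3);
        } else if (name.startsWith("is") && name.length() > 2) {
            prop = name.substring(2);
        } else {
            throw new IllegalArgumentException(name + " has no get/is prefix; annotate it with @Column");
        }
        prop = Character.toLowerCase(prop.charAt(0)) + prop.substring(1);
        try {
            // Derived property name must match a declared field; use that field's column.
            Field field = getter.getDeclaringClass().getDeclaredField(prop);
            Column onField = field.getAnnotation(Column.class);
            return onField != null ? onField.name() : prop;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field '" + prop + "' for getter " + name, e);
        }
    }
}

// Hypothetical VO used only for illustration.
class ReservationVO {
    @Column(name = "id") long id;
    @Column(name = "vm_reservation_id") long vmReservationId;

    public long getId() { return id; }

    // Non-standard getter name: without the annotation its column could not be inferred.
    @Column(name = "vm_reservation_id")
    public long reservationOfVm() { return vmReservationId; }
}
```

Failing loudly on an unresolvable getter is a design choice for the sketch; silently falling back to some other column is exactly the kind of behavior that produced the wrong search described above.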
Changes:
- There is currently no good mechanism to determine whether a deployment failed solely because of affinity groups
- Instead, we hint to the user that the deployment might have failed due to the affinity groups and ask them to review the input
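A minimal sketch of adding the hint only when the request included affinity groups. `DeployFailureHint` and the message wording are hypothetical, not the text CloudStack actually returns.

```java
import java.util.List;

// Hypothetical sketch: append an affinity-group hint to a deployment failure message.
final class DeployFailureHint {
    // We cannot tell whether affinity groups alone caused the failure, so we only
    // add a hint (never a definite cause) when any affinity groups were requested.
    static String message(String vmName, List<String> affinityGroups) {
        StringBuilder sb = new StringBuilder("Unable to create a deployment for VM ").append(vmName);
        if (affinityGroups != null && !affinityGroups.isEmpty()) {
            sb.append(". The deployment may have failed due to the affinity groups ")
              .append(affinityGroups)
              .append("; please review the affinity group input and retry");
        }
        return sb.toString();
    }
}
```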
Changes:
- The cloud-engine two-step reserve-and-deploy flow was not retrying outside the volume's cluster when that cluster had no resources.
- Fixed this by letting the reservation step continue without erroring out, so that the deploy step can find resources outside the cluster.
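The change in control flow can be sketched as a reserve step that returns an empty result instead of throwing, followed by a deploy step that falls back to other clusters. `ReserveThenDeploy`, the cluster names, and the capacity map are hypothetical simplifications of the cloud-engine flow.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the two-step reserve-and-deploy flow.
final class ReserveThenDeploy {
    // Step 1: try to reserve in the volume's cluster. Returns empty instead of
    // throwing when that cluster has no capacity, so the flow can continue.
    static Optional<String> reserve(String volumeCluster, Map<String, Integer> freeCapacity) {
        return freeCapacity.getOrDefault(volumeCluster, 0) > 0
                ? Optional.of(volumeCluster)
                : Optional.empty();
    }

    // Step 2: use the reservation if present, otherwise search all clusters.
    static Optional<String> deploy(Optional<String> reservation, List<String> clusters,
                                   Map<String, Integer> freeCapacity) {
        if (reservation.isPresent()) {
            return reservation;
        }
        for (String cluster : clusters) {
            if (freeCapacity.getOrDefault(cluster, 0) > 0) {
                return Optional.of(cluster);
            }
        }
        return Optional.empty();
    }
}
```

Had `reserve` thrown on an empty cluster, `deploy` would never get the chance to look outside it, which is the behavior this change removes.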