4556 Commits

Author SHA1 Message Date
Rene Peinthor
001c769054
Linstor 4.19 fix selecting non enabled hosts (#8653)
* linstor: cleanup resource if copy from template failed

* linstor: do not use non enabled hosts for copy operations
2024-03-08 13:52:49 +05:30
Daan Hoogland
d99b1b9c2d Merge branch '4.18' into 4.19 2024-03-08 08:19:49 +01:00
Henrique Sato
223a9b8031
Quota tariff events (#8030)
Co-authored-by: Henrique Sato <henrique.sato@scclouds.com.br>
2024-03-06 17:33:39 +01:00
Wei Zhou
a7ec8738a2
kvm: fix NPE while import KVM VMs from other hosts (#8720) 2024-03-04 09:46:28 +01:00
Abhishek Kumar
9fd410be36 Merge remote-tracking branch 'apache/4.18' into 4.19 2024-03-01 17:34:27 +05:30
Harikrishna
c462be1412
New API "checkVolume" to check and repair any leaks or issues reported by qemu-img check (#8577)
* Introduced a new API checkVolumeAndRepair that allows users or admins to check and repair if any leaks observed.
Currently this is supported only for KVM

* some fixes

* Added unit tests

* addressed review comments

* add repair volume while granting access

* Changed repair parameter to accept both leaks/all

* Introduced new global setting volume.check.and.repair.before.use to do volume check and repair before VM start or volume attach operations

* Added volume check and repair changes only during VM start and volume attach operations

* Refactored the names to look similar across the code

* Some code fixes

* remove unused code

* Renamed repair values

* Fixed unit tests

* changed version

* Address review comments

* Code refactored

* used volume name in logs

* Changed the API to Async and the setting scope to storage pool

* Fixed exit value handling with check volume command

* Fixed storage scope to the setting

* Fix volume format issues

* Refactored the log messages

* Fix formatting
2024-02-29 14:41:49 +05:30
dahn
56e0450526
Logging improvements on migration in the VmwareResource (#8300) 2024-02-28 15:29:35 +05:30
Daan Hoogland
f4987bf8ee Merge release branch 4.18 to 4.19
* 4.18:
  Storage plugin support to check if volume on datastore requires access for migration (#8655)
  CKS: fix /opt/bin/deploy-cloudstack-secret in CKS control nodes (#8697)
2024-02-26 15:53:11 +01:00
Suresh Kumar Anaparti
f731fe882c
Storage plugin support to check if volume on datastore requires access for migration (#8655)
* Check if volume on datastore requires access for migration, and grant/revoke volume access if requires

* Updated default implementation for requiresAccessForMigration method in PrimaryDataStoreDriver
2024-02-26 20:16:31 +05:30
Wei Zhou
18c3d470c6
CKS: fix /opt/bin/deploy-cloudstack-secret in CKS control nodes (#8697) 2024-02-26 14:21:26 +01:00
Abhishek Kumar
2a56c61ade Merge remote-tracking branch 'apache/4.18' into 4.19 2024-02-26 12:01:26 +05:30
Wei Zhou
8d4b4dcec4
CKS: add kube config path in extra control nodes (#8658) 2024-02-16 15:01:27 +01:00
GaOrtiga
6f3e4e6302
fix_filter_and_pagination (#8306)
Co-authored-by: Gabriel <gabriel.fernandes@scclouds.com.br>
2024-02-16 11:15:55 +01:00
Rohit Yadav
bda49ab08f Merge remote-tracking branch 'shapeblue/merged-4-18' into 4.19 2024-02-13 12:54:24 +05:30
Vishesh
a8028eecbd
Merge remote-tracking branch 'origin/4.18' into 4.19 2024-02-13 11:44:20 +05:30
Vishesh
1955d8f3db
Add advance settings to fine tune DRS imbalance calculation (#8521)
* Use free/total instead of free metric to calculate imbalance

* Filter out hosts for condensed while checking imbalance

* Make DRS more configurable

* code refactor

* Add unit tests

* fixup

* Fix validation for drs.imbalance.condensed.skip.threshold

* Add logging and other minor changes for drs

* Add some logging for drs

* Change format for drs imbalance to string

* Show drs imbalance as percentage

* Fixup label for memorytotal in en.json
2024-02-13 11:18:53 +05:30
Rene Peinthor
70b634fff2
Linstor: add HA support and small cleanups (#8407)
* linstor: Outline get storagepools from resourcegroup into function

* linstor: move getHostname() to kvm/Pool and reimplement

* linstor: implement CloudStack HA support
2024-02-13 11:16:12 +05:30
dahn
672206c312
kvm: ITCO watchdog added (#8282)
* ITCO watchdog added

* add inject-nmi action

* Update plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtVMDef.java

Co-authored-by: Wei Zhou <weizhou@apache.org>

---------

Co-authored-by: Wei Zhou <weizhou@apache.org>
2024-02-12 08:54:39 +01:00
Wei Zhou
af2e277999 Merge remote-tracking branch 'apache/4.18' into 4.19 2024-02-09 11:53:39 +01:00
Rene Peinthor
393f3d7727
linstor: use relative hostname path (#8633)
As described in issue #8310 some older distributions don't have
hostname in /usr/bin so rely on PATH resolving
2024-02-09 11:49:20 +01:00
Rohit Yadav
a1f547a011 Merge remote-tracking branch 'origin/4.18' into 4.19
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

Conflicts:
	plugins/storage/volume/linstor/src/main/java/org/apache/cloudstack/storage/datastore/util/LinstorUtil.java
2024-02-09 00:10:34 +05:30
slavkap
1d1b332141
remove StorPool tags from detached volumes (#8377)
* remove tags from detached volumes

* Adress comments

* address comments

* Address comments
2024-02-09 00:05:34 +05:30
Rene Peinthor
56f0448f0d
Linstor fix migration while node offline (#8610)
* linstor: Add util method getBestErrorMessage from main

* linstor: failed remove of allow-two-primaries is no fatal error

* linstor: Fix failure if a Linstor node is down while migrating

If a Linstor node is down while migrating resource, allow-two-primaries
setting will fail because we can't reach the downed node. But it will
still set the property on the other nodes and migration should work.
We now just report an error instead of completely failing.
2024-02-08 23:57:38 +05:30
Rohit Yadav
0d36098c76 Merge remote-tracking branch 'origin/4.18' into 4.19 2024-02-07 14:20:39 +05:30
Wei Zhou
69e8ebc03f
CKS: retry if unable to drain node or unable to upgrade k8s node (#8402)
* CKS: retry if unable to drain node or unable to upgrade k8s node

I tried CKS upgrade 16 times, 11 of 16 upgrades succeeded.

2 of 16 upgrades failed due to
```
error: unable to drain node "testcluster-of7974-node-18c8c33c2c3" due to error:[error when evicting pods/"cloud-controller-manager-5b8fc87665-5nwlh" -n "kube-system": Post "https://10.0.66.18:6443/api/v1/namespaces/kube-system/pods/cloud-controller-manager-5b8fc87665-5nwlh/eviction": unexpected EOF, error when evicting pods/"coredns-5d78c9869d-h5nkz" -n "kube-system": Post "https://10.0.66.18:6443/api/v1/namespaces/kube-system/pods/coredns-5d78c9869d-h5nkz/eviction": unexpected EOF], continuing command...
```

3 of 16 upgrades failed due to
```
Error from server: error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=roles", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=Role"
Name: "kubernetes-dashboard", Namespace: "kubernetes-dashboard"
from server for: "/mnt/k8sdisk//dashboard.yaml": etcdserver: leader changed
```

* CKS: remove tests of creating/deleting HA clusters as they are covered by the upgrade test

* Update PR 8402 as suggested

* test: remove CKS cluster if fail to create or verify
2024-02-06 11:14:10 +01:00
Wei Zhou
54225ecd15 Veeam: fix incompatible types: String cannot be converted to Date 2024-02-05 10:50:16 +01:00
Wei Zhou
b8904f75dd Merge remote-tracking branch 'apache/4.18' into 4.19 2024-02-05 10:08:31 +01:00
slavkap
94c8b1da5c
Option to create StorPool primary storage with a valid URL (#8356)
* Option to create primary storage with a valid URL

* check if the scheme is valid
2024-02-05 14:21:13 +05:30
Marcus Sorensen
9f1b34aeb2
Fix libvirt domain event listener by properly processing events (#8437)
* Fix libvirt domain event listener by properly processing events

* Add javadoc for setupEventListener

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
2024-02-05 13:30:10 +05:30
Abhishek Kumar
a7b97ff3b0 Updating pom.xml version numbers for release 4.19.1.0-SNAPSHOT
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-02-02 18:06:04 +05:30
Lucas Martins
1c98b5a4e5 Change Cryptsetup validation (#8482)
Co-authored-by: lucas.martins.scclouds <lucas.martins@scclouds.com.br>
2024-02-01 09:43:28 +01:00
Wei Zhou
b34f093137
veeam: fix some issues with restoring volume from backup and attaching it to VM (#8570)
* veeam: detach only the restored volume during backup restore

Steps to reproduce the issue
1. create a VM (A) with ROOT and DATA disk
2. assign to a backup offering
3. create backup
4. create another VM (B)
5. restore the DATA disk of VM A, and attach to VM B
6. When operation is done, check the datastore

Without this change, the ROOT image is not removed and left over on the datastore.
```
[root@ref-trl-5933-v-Mr8-wei-zhou-esxi2:/vmfs/volumes/5f60667d-18d828eb] ls -l /vmfs/volumes/5f60667d-18d828eb/CS-RSTR-dfb6f21c-a941-49db-9963-4f0286a17dac
total 1784840
-rw-------    1 root     root     5242880000 Jan 24 09:23 ROOT-722_2-flat.vmdk
-rw-------    1 root     root           499 Jan 24 09:23 ROOT-722_2.vmdk
```

With this change, the whole temporary vm has been destroyed.
```
[root@ref-trl-5933-v-Mr8-wei-zhou-esxi2:/vmfs/volumes/5f60667d-18d828eb] ls -l /vmfs/volumes/5f60667d-18d828eb/CS-RSTR-734bee3b-640c-4ff0-a34b-bc45358565b2
ls: /vmfs/volumes/5f60667d-18d828eb/CS-RSTR-734bee3b-640c-4ff0-a34b-bc45358565b2: No such file or directory
```

* veeam: fix wrong disk size in debug message

* veeam: sync backup repository after operations are done

got exception of some operations which succeeds due to the following error
```
2024-01-19 10:59:52,846 DEBUG [o.a.c.b.v.VeeamClient] (API-Job-Executor-42:ctx-716501bb job-4373 ctx-2359b76d) (logid:b5e19a17) Veeam response for PowerShell commands [PowerShell Import-Module Veeam.Backup.PowerShell -WarningAction SilentlyContinue;$restorePoint = Get-VBRRestorePoint ^| Where-Object { $_.Id -eq '1d99106a-b5c8-4a1e-958d-066a987caa5f' };if ($restorePoint) { Remove-VBRRestorePoint -Oib $restorePoint -Confirm:$false;$repo = Get-VBRBackupRepository;Sync-VBRBackupRepository -Repository $repo;} else { ; Write-Output 'Failed to delete'; Exit 1;}] is: [^M
Restore Type       Job Name             State      Start Time             End Time               Description           ^M
------------       --------             -----      ----------             --------               -----------           ^M
ConfResynchronize  Configuration Dat... Starting   19/01/2024 10:59:52    01/01/1900 00:00:00                          ^M
^M
^M
Remove-VBRRestorePoint : Win32 internal error "Access is denied" 0x5 occurred while reading the console output buffer. ^M
Contact Microsoft Customer Support Services.^M
At line:1 char:196^M
+ ... orePoint) { Remove-VBRRestorePoint -Oib $restorePoint -Confirm:$false ...^M
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^M
    + CategoryInfo          : ReadError: (:) [Remove-VBRRestorePoint], HostException^M
    + FullyQualifiedErrorId : ReadConsoleOutput,Veeam.Backup.PowerShell.Cmdlets.RemoveVBRRestorePoint^M
 ^M
].
```

* veeam: fix unable to detach volume when restore backup and attach to vm then detach the volume

It also happened when destroy the original or backup VM

```
2024-01-24 10:10:03,401 ERROR [c.c.s.r.VmwareStorageProcessor] (DirectAgent-74:ctx-95b24ac7 10.0.35.53, job-25995/job-25996, cmd: DettachCommand) (logid:7260ffb8) Failed to detach volume!
java.lang.RuntimeException: Unable to access file [de52fdd3386b3d67b27b3960ecdb08f4] i-2-723-VM/7c2197c129464035bab062edec536a09-flat.vmdk
        at com.cloud.hypervisor.vmware.util.VmwareClient.waitForTask(VmwareClient.java:426)
        at com.cloud.hypervisor.vmware.mo.DatastoreMO.moveDatastoreFile(DatastoreMO.java:290)
        at com.cloud.storage.resource.VmwareStorageLayoutHelper.syncVolumeToRootFolder(VmwareStorageLayoutHelper.java:241)
        at com.cloud.storage.resource.VmwareStorageProcessor.attachVolume(VmwareStorageProcessor.java:2150)
        at com.cloud.storage.resource.VmwareStorageProcessor.dettachVolume(VmwareStorageProcessor.java:2408)
        at com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:174)
        at com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:71)
        at com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:589)
        at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2024-01-24 10:10:03,402 INFO  [c.c.h.v.u.VmwareHelper] (DirectAgent-74:ctx-95b24ac7 10.0.35.53, job-25995/job-25996, cmd: DettachCommand) (logid:7260ffb8) [ignored]failed to get message for exception: Unable to access file [de52fdd3386b3d67b27b3960ecdb08f4] i-2-723-VM/7c2197c129464035bab062edec536a09-flat.vmdk
```

* vmware: create restored volume with new UUID and attach to VM
2024-01-29 11:40:43 +01:00
Abhishek Kumar
2746225b99 Updating pom.xml version numbers for release 4.19.0.0
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-01-29 10:21:52 +05:30
Vishesh
fedcf66de0
Externalise a few timeouts & fix timeout for hostSupportsUefi in libvirt ready command wrapper (#8547)
This PR fixes bug introduced in #8502. Timeout for script execution was set to 60 ms instead of 60s which resulted in host not getting UEFI enabled. This is a blocker for 4.19 release.

We do this by introducing a new agent parameter `agent.script.timeout` (default - 60 seconds) to use as a timeout for the script checking host's UEFI status.

We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default - 60 seconds).

For ModifyStoragePoolCommand, we don't externalize the timeout to avoid confusion for the user. Since, the required timeout can vary depending on the provider in use and we are only setting the wait for default host listener for now. Instead, we reuse the global `wait` setting by dividing it by `5` making the default value of 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.

Note: the actual time, the MS waits is twice the wait set for a Command. Check reference code below.
19250403e6/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java (L406-L442)
2024-01-27 23:36:13 +05:30
Wei Zhou
33bb92acce
Veeam: Support Veeam 11 and 12 (#8241)
This PR fixes several issues in the testing of Veeam 11 and Veeam12
- Import Veeam.Backup.PowerShell and silently ignore the warning messages
- Fix issue when assign vm to backup offerings, which caused by separator (\r\n)
- Fix authorization failure in veeam 12a, which is because v1_4 is not supported in veeam 12a any more
- Fix exception if backup name has space
- Fix backup metrics in veeam12, which is because powershell command does not return the values needed
- Fix Incorrect datetime value, which is because powershell command returns a datetime which is not supported in Java
- Fix issue during backup restoration if VM has both ROOT and DATA disks.

This PR also has the following update
- Add integration test test/integration/smoke/test_backup_recovery_veeam.py
- Make some UI changes
- Add zone setting backup.plugin.veeam.version. If it is not set, CloudStack will get veeam version via powershell commands.
- Add zone setting backup.plugin.veeam.task.poll.interval and backup.plugin.veeam.task.poll.max.retry
2024-01-19 18:42:01 +01:00
Nicolas Vazquez
8d42ca8ccf
Use project version on pom dependencies (#8529)
This PR fixes the POM dependencies from a hardcoded value to the project.version property on dependencies
2024-01-18 20:16:06 +05:30
Vishesh
c3b77cb7b8
Fix host stuck in connecting state (#8502)
There are a lot of test failures due to test_vm_life_cycle.py in multiple PRs due to host not available for migration of VMs.
#8438 (comment)
#8433 (comment)
#7344 (comment)

While debugging I noticed that the hosts get stuck in Connecting state because MS is waiting for a response of the ReadyCommand from the agent. Since we take a lock on connection and disconnection, restarting the agent doesn't work. To fix this, we have to restart the MS or wait for ~1 hour (default timeout).

On the agent side, it gets stuck waiting for a response from the Script execution.

To reproduce, run smoke/test_vm_life_cycle.py (TestSecuredVmMigration test class to be specific). Once the tests are complete, you will notice that some hosts are stuck in Connecting state. And restarting the agent fails due to the named lock. Locks on DB can be checked using the below query.

SELECT *
FROM performance_schema.metadata_locks
INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
WHERE PROCESSLIST_ID <> CONNECTION_ID() \G;

This PR adds a wait for the ready command and a timeout to the Script execution to ensure that the thread doesn't get stuck and the named lock from database is released.
2024-01-15 13:56:34 +05:30
Nicolas Vazquez
a3a4833c3e
Fixes for KVM unmanaged instances import on advanced network and VNC password (#8492)
This PR fixes a regression caused by #8465 on advanced zones, import fails with:

2024-01-10 12:13:33,234 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Allocating nic for vm 142272e8-9e2e-407b-9d7e-e9a03b81653c in network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10} during import
2024-01-10 12:13:33,239 ERROR [o.a.c.v.UnmanagedVMsManagerImpl] (API-Job-Executor-3:ctx-991bbe9f job-128 ctx-f49517d4) (logid:d7b8e716) Failed to import NICs while importing vm: i-2-31-VM
com.cloud.exception.InsufficientVirtualNetworkCapacityException: Unable to acquire Guest IP  address for network Network {"id": 204, "name": "Isolated", "uuid": "9679fac5-e3ac-4694-a57b-beb635340f39", "networkofferingid": 10}Scope=interface com.cloud.dc.DataCenter; id=1
	at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.importNic(NetworkOrchestrator.java:4582)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importNic(UnmanagedVMsManagerImpl.java:859)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importVirtualMachineInternal(UnmanagedVMsManagerImpl.java:1198)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstanceFromHypervisor(UnmanagedVMsManagerImpl.java:1511)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.baseImportInstance(UnmanagedVMsManagerImpl.java:1342)
	at org.apache.cloudstack.vm.UnmanagedVMsManagerImpl.importUnmanagedInstance(UnmanagedVMsManagerImpl.java:1282)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Also, addresses the VNC password field set instead of a fixed string
2024-01-12 14:14:01 +05:30
Nicolas Vazquez
59e78cbc45
Fix KVM unmanage disks path (#8483)
This PR fixes the volumes path on KVM import unmanaged instances

Fixes: #8479
2024-01-11 14:45:57 +05:30
Vishesh
4f40eae1c4
DRS: Use free metrics insteado of used for computation (#8458)
This PR makes changes to use cluster's free metrics instead of used while computing imbalance for the cluster. This allows DRS to run for clusters where hosts doesn't have the same amount of metrics.
2024-01-10 17:52:46 +05:30
slavkap
c569fe9119
Fix KVM import and list unmanaged VMs (#8445)
VM import fixes

1 - Fix of VM insert for VMs with StorPool volumes
2 - Fix of list/insert unmanaged VMs with RBD volumes
2024-01-10 13:12:07 +05:30
Abhishek Kumar
d6ac91f2df
minio: fix store user creation (#8425)
To prevent errors during multi-user access, use account UUID to create/access user on the provider side. Also, update the existing secret key for a user that already exists.
2024-01-09 17:44:11 +05:30
Abhishek Kumar
2253a33c1e Merge remote-tracking branch 'apache/4.18' 2023-12-20 08:58:30 +05:30
Wei Zhou
ab70108f15
CKS: create Security Groups for CKS clusters of each account (#8316)
This PR fixes #7684

The security groups contain the same rules for port 22 and 6443, no need to recreate for each CKS cluster.
2023-12-20 08:57:27 +05:30
John Bampton
dda672503f
Remove unneeded duplicate words (#8358)
This PR removes some unneeded duplicate words.
2023-12-15 17:13:32 +05:30
kishankavala
ab20b1220f
KVM Ingestion - Import Instance (#7976)
This PR adds new functionality to import KVM instances from an external host or from disk images in local or shared storage.
Doc PR: https://github.com/apache/cloudstack-documentation/pull/356
2023-12-14 13:08:56 +05:30
Abhishek Kumar
82f7abddb3 Merge remote-tracking branch 'apache/4.18' 2023-12-13 11:24:15 +05:30
Bryan Lima
3bb318bab9
kvm: Add support for cgroupv2 (#8252)
1. Problem description

In Apache CloudStack (ACS), when a VM is deployed in a host with the KVM hypervisor, an XML file is created in the assigned host, which has a property shares that defines the weight of the VM to access the host CPU. The value of this property has no unit, and it is a relative measure to calculate how much CPU a given VM will have in the host. However, this value has a limit, which depends on the version of cgroup utilized by the host's kernel. The problem lies at the range value of shares that varies between both versions: [2, 264144] for cgroups version 1; and [1, 10000] for cgroups version 2. Currently, ACS calculates the value of shares using Equation 1, presented below, where CPU is the number of cores and speed is the CPU frequency; both specified in the VM's compute offering. Therefore, if a compute offering has, for example, 6 cores at 2 GHz, the shares value will be 12000 and an exception will be thrown by libvirt if the host utilizes cgroup v2. The second version is becoming the default one in current Linux distributions; thus, it is necessary to address this limitation.

    Equation 1
    shares = CPU * speed

Fixes: #6744
2. Proposed changes

To address the problem described, we propose to apply a scale conversion considering the max shares of the host. Using the same formula currently utilized by ACS, it is possible to calculate the maximum shares of a VM for a given host. In other words, using the number of cores and the nominal speed of the host's CPU as the upper limit of shares allowed to a VM. Then, this value will be scaled to the allowed interval of [1, 10000] of cgroup v2 by using a linear scale conversion.

The VM shares would be calculated as Equation 2, presented below, where VM requested shares is the requested shares value calculated using Equation 1, cgroup upper limit is fixed with a value of 10000 (cgroups v2 upper limit), and host max shares is the maximum shares value of the host, calculated using Equation 1. Using Equation 2, the only case where a VM passes the cgroup v2 limit is when the user requests more resources than the host has, which is not possible with the current implementation of ACS.

    Equation 2
    shares = (VM requested shares * cgroup upper limit)/host max shares

To implement the proposal, the following APIs will be updated: deployVirtualMachine, migrateVirtualMachine and scaleVirtualMachine. When a VM is being deployed, a new verification will be added to find a suitable host. The max shares of each host will be calculated, and the VM calculated shares will be verified if it does not surpass the host's value. Likewise, the migration of VMs will have a similar new verification. Lastly, the scale of VMs will also have the same verification for the VM's host.

To determine the max shares of a given host, we will use the same equation currently used in ACS for calculating the shares of VMs, presented in Section 1. When Equation 1 is used to determine the maximum shares of a host, CPU is the number of cores of the host, and speed is the nominal CPU speed, i.e., considering the CPU's base frequency.

It is important to note that these changes are only for hosts with the KVM hypervisor using cgroup v2 for now.
2023-12-13 10:51:24 +05:30
Nicolas Vazquez
27a3d61729
Fix unmanage VM marvin tests and small UI fixes for import (#8338)
This PR fixes the failing smoke test for test_vm_lifecycle_unmanage_import.py for Vmware and adds a small UI fix on the import wizard
2023-12-13 10:25:05 +05:30
Abhishek Kumar
080a5aee00 Merge remote-tracking branch 'apache/4.18' 2023-12-12 17:01:52 +05:30