* FR-248: Instance lease, WIP commit
* insert lease expiry into db and use that to filter expiring vms, add asyncjobmanager
* Add leaseDuration and leaseExpiryAction in Service offering create flow
* Update listVM cmd to allow listing only leased instances
* Add methods to fetch instances whose lease is expiring in the coming days
* Changes included:
config key setup and configured for alert email
lease options in create and update vm screen
handle delete protection, edit vm, create vm
validated stop and destroy, delete protection
* Update UI screens for leased properties coming from config and service offering
* use global lock before running scheduler
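A guard like this is commonly built on CloudStack's GlobalLock utility; a minimal sketch, assuming an illustrative lock name and timeout rather than the PR's actual values:

    import com.cloud.utils.db.GlobalLock;

    public class LeaseSchedulerGuard {
        // Only one management server node should run the lease scan at a time.
        // The lock name and the 30-second timeout are illustrative assumptions.
        void runGuarded(Runnable leaseScan) {
            GlobalLock scanLock = GlobalLock.getInternLock("vmLeaseScheduler");
            try {
                if (scanLock.lock(30)) {
                    try {
                        leaseScan.run();
                    } finally {
                        scanLock.unlock();
                    }
                }
            } finally {
                scanLock.releaseRef();
            }
        }
    }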
* Unit tests
* Flow changes done in UI based on discussion
* Include view changes in schema upgrade files and use feature in various UI elements
* Added integration test for vm deployment, UI enhancements for user persona, bug fixes
* validate integration tests, minor ui changes and log messages
* fix build: moving configkey from setup to test itself
* Disable testAlert to unblock build and trim whitespaces in integration tests
* Address review comments
* Minor changes in EditVM screen
* Use ExecutorService instead of Timer and TimerTask
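For context, the idiomatic replacement looks roughly like this (a generic sketch, not the PR's actual code); unlike Timer, where one uncaught exception kills the timer thread and every scheduled task with it, a scheduled executor cancels only the offending task:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SchedulerSketch {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // The task body and the one-hour period are placeholders.
            scheduler.scheduleAtFixedRate(
                    () -> System.out.println("lease scan tick"),
                    0, 1, TimeUnit.HOURS);
        }
    }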
* Additional review comments
* Incorporate the following changes:
1. Execute lease action once on the instance
2. Cancel lease on instance when feature is disabled
3. Relevant events when lease gets disabled, cancelled, executed
4. Disable associating lease after deployment
5. UI elements and flow changes
6. Changes based on feedback from demo
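Taken together, items 1-4 imply expiry handling along these lines; a minimal sketch with placeholder DAO and service types, not CloudStack's actual VMLeaseManager implementation:

    import java.util.List;

    public class LeaseExpirySketch {
        interface LeasedVm {
            long id();
            String expiryAction();     // "STOP" or "DESTROY"
            boolean actionExecuted();  // true once the action has already run
        }
        interface LeaseDao {
            List<LeasedVm> findExpired();  // disabled/cancelled leases are excluded here
            void markExecuted(long vmId);
        }
        interface VmService {
            void stop(long vmId);          // unforced stop, per the StopVmCmd change below
            void destroy(long vmId);
        }

        void runOnce(LeaseDao leaseDao, VmService vmService) {
            for (LeasedVm vm : leaseDao.findExpired()) {
                if (vm.actionExecuted()) {
                    continue; // the lease action is executed exactly once per instance
                }
                if ("DESTROY".equals(vm.expiryAction())) {
                    vmService.destroy(vm.id());
                } else {
                    vmService.stop(vm.id());
                }
                leaseDao.markExecuted(vm.id());
            }
        }
    }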
* Handle pr review comments
* address review comments
* move instance.lease.enabled config to VMLeaseManager interface
* bug fix in edit instance flow and reject API requests for invalid values
* max allowed lease is 100 years
* log instance ids for expired instances
* Fix config validation for value range and code coverage improvement
* fix lease expiry request failures in async
* don't use forced: true for StopVmCmd
* Update server/src/main/java/org/apache/cloudstack/vm/lease/VMLeaseManager.java
Co-authored-by: Vishesh <vishesh92@gmail.com>
* handle review comments
---------
Co-authored-by: Rohit Yadav <rohityadav89@gmail.com>
Co-authored-by: Vishesh <vishesh92@gmail.com>
A separate service account will be created and added to the project, if it
does not already exist, when a Kubernetes cluster is deployed in a project.
This account will have a role with limited API access.
Cleanup clusters on owner account cleanup, delete service account
if needed
When the owner account of k8s clusters is deleted, its node VMs get
expunged but the cluster entries remain in the DB. This fixes the issue by
cleaning up all clusters of the deleted account.
Project k8s service account will be deleted on account cleanup or when
there is no active k8s cluster remaining
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
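A minimal sketch of that cleanup path, with placeholder DAO and manager types rather than CloudStack's actual Kubernetes service classes:

    import java.util.List;

    public class KubernetesAccountCleanupSketch {
        interface ClusterDao {
            List<Long> listClusterIdsForAccount(long accountId);
            void remove(long clusterId);
            boolean projectHasActiveClusters(long projectId);
        }
        interface ServiceAccountManager {
            void deleteProjectServiceAccount(long projectId);
        }

        // Called when the owner account is deleted: remove the DB entries for
        // its clusters (the node VMs are expunged separately), then drop the
        // project service account if no active cluster still needs it.
        void onAccountDeleted(long accountId, long projectId,
                              ClusterDao clusters, ServiceAccountManager serviceAccounts) {
            for (Long clusterId : clusters.listClusterIdsForAccount(accountId)) {
                clusters.remove(clusterId);
            }
            if (!clusters.projectHasActiveClusters(projectId)) {
                serviceAccounts.deleteProjectServiceAccount(projectId);
            }
        }
    }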
* Introducing Storage Access Groups to define the host and storage pool connections
In CloudStack, when a primary storage is added at the Zone or Cluster scope, it is by default connected to all hosts within that scope. This default behavior can be refined using storage access groups, which allow operators to control and limit which hosts can access specific storage pools.
Storage access groups can be assigned to hosts, clusters, pods, zones, and primary storage pools. When a storage access group is set on a cluster/pod/zone, all hosts within that scope inherit the group. Connectivity between a host and a storage pool is then governed by whether they share the same storage access group.
A storage pool with a storage access group will connect only to hosts that have the same storage access group. A storage pool without a storage access group will connect to all hosts, including those with or without a storage access group.
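The connectivity rule reduces to a small set check; a minimal sketch with illustrative names (the real checks live in the storage allocation code), where hostGroups is the union of the host's own groups and those inherited from its cluster, pod, and zone:

    import java.util.Collections;
    import java.util.Set;

    public final class StorageAccessGroupsSketch {
        static boolean canConnect(Set<String> hostGroups, Set<String> poolGroups) {
            if (poolGroups.isEmpty()) {
                return true; // an ungrouped pool connects to every host
            }
            // a grouped pool connects only to hosts sharing at least one group
            return !Collections.disjoint(hostGroups, poolGroups);
        }
    }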
* Reset the pool id when create volume fails on the allocated pool
- the pool id is persisted while creating the volume, and when creation fails it is not reverted. On the next create-volume attempt, CloudStack couldn't find any suitable primary storage even though pools with enough capacity are available, because the pool is already assigned to the volume, which is in Allocated state (and the storage pool compatibility check fails). Ensure the volume is not assigned to any pool if create volume fails, so the next creation job can pick a suitable pool.
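A hedged sketch of the fix, with placeholder types standing in for CloudStack's volume classes:

    public class VolumeAllocationResetSketch {
        interface Volume {
            long getId();
            void setPoolId(Long poolId);
        }
        interface VolumeDao {
            void update(long id, Volume volume);
        }
        interface PrimaryStore {
            void createVolume(Volume volume) throws Exception;
        }

        // On failure, clear the persisted pool assignment so the volume stays
        // in Allocated state with no pool, letting the next attempt pick any
        // suitable pool instead of tripping the compatibility check.
        void createWithReset(PrimaryStore store, Volume volume, VolumeDao dao) throws Exception {
            try {
                store.createVolume(volume);
            } catch (Exception e) {
                volume.setPoolId(null);
                dao.update(volume.getId(), volume);
                throw e;
            }
        }
    }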
* endpoint check for resize
* update the resize error through callback result instead of exception
* Add & Remove PowerFlex/ScaleIO MDMs while preparing & unpreparing the storage SDC connections (instead of start & stop scini)
* Add/Remove MDM IP addresses during Host connection/disconnection to/from storage pool when powerflex.connect.on.demand is false
* unit test fixes
* Don't remove MDM IPs from SDC when any volumes mapped to SDC
* Don't remove MDM IPs when other pools of same ScaleIO/PowerFlex cluster are connected
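The two guard conditions above amount to a check like this (a sketch with illustrative names, not the actual ScaleIO client API):

    public class ScaleIoMdmGuardSketch {
        interface SdcState {
            boolean hasMappedVolumes();
            boolean otherPoolsOfSameClusterConnected();
        }

        static boolean canRemoveMdmIps(SdcState sdc) {
            if (sdc.hasMappedVolumes()) {
                return false; // volumes are still mapped to this SDC
            }
            if (sdc.otherPoolsOfSameClusterConnected()) {
                return false; // sibling pools of the same PowerFlex cluster still need the MDMs
            }
            return true;
        }
    }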
* rebase fixes
* update changes, to not remove/disconnect MDMs on maintenance
* import fixes after rebase
* Consider the clusters with allocation state 'Enabled' for EndPoint selection (in addition to Host state)
* logger fix
Sometimes hypervisor hosts (direct agents) get stuck in Disconnect state during agent rebalancing across multiple management server nodes. This issue was noticed during frequent restarts of the management server nodes in the cluster.
When there are multiple management server nodes in a cluster and one or more nodes are shut down, started, or restarted, CloudStack will rebalance the hosts among the remaining nodes or move them to the newly joined management server nodes. During the rebalancing period multiple operations can happen, including:
- DirectAgentScan at the configured direct.agent.scan.interval
- AgentRebalanceScan to identify and schedule rebalance agents
- TransferAgentScan to transfer the host from original owner to future owner
**Current Rebalance behavior**
1. For hosts that have an AgentAttache && not forForward but are in Disconnect state, CloudStack simply ignores them without trying to ping again or updating the host status.
2. For hosts that have an AgentAttache && forForward, CloudStack removes the agent but still tries to loadDirectlyConnectedHost.
**Improved Rebalance behavior**
During DirectAgentScan (scanDirectAgentToLoad()), identify self-managed hosts that are in Disconnect state (disconnected after ping timeout).
1. For hosts that have an AgentAttache and are forForward, CloudStack should remove the agent.
2. For hosts that have an AgentAttache and are not forForward but are in Disconnect state, CloudStack should try to investigate and update the status to Up if the host is pingable.
3. For hosts that don't have an AgentAttache, CloudStack should try to loadDirectlyConnectedHost.
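Put together, the improved scan behaves roughly like this sketch (placeholder names; the real logic lives in the agent manager):

    public class DirectAgentScanSketch {
        interface Host {
            boolean hasAttache();
            boolean isForwardAttache();
            boolean isDisconnected();
            boolean isPingable();
        }
        interface AgentOps {
            void removeAgent(Host host);
            void markUp(Host host);
            void loadDirectlyConnectedHost(Host host);
        }

        void scan(Host host, AgentOps ops) {
            if (!host.hasAttache()) {
                ops.loadDirectlyConnectedHost(host);  // case 3: (re)load the direct agent
            } else if (host.isForwardAttache()) {
                ops.removeAgent(host);                // case 1: drop the stale forward attache
            } else if (host.isDisconnected() && host.isPingable()) {
                ops.markUp(host);                     // case 2: investigate and mark Up if reachable
            }
        }
    }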
* KVM incremental snapshot feature
* fix log
* fix merge issues
* fix creation of folder
* fix snapshot update
* Check for hypervisor type during parent search
* fix some small bugs
* fix tests
* Address reviews
* do not remove StorPool snapshots
* add support for downloading diff snaps
* Add multiple zones support
* make copied snapshots have normal names
* address reviews
* Fix in progress
* continue fix
* Fix bulk delete
* change log to trace
* Start fix on multiple secondary storages for a single zone
* Fix multiple secondary storages for a single zone
* Fix tests
* fix log
* remove bitmaps when deleting snapshots
* minor fixes
* update sql to new file
* Fix merge issues
* Create new snap chain when changing configuration
* add verification
* Fix snapshot operation selector
* fix bitmap removal
* fix chain on different storages
* address reviews
* fix small issue
* fix test
---------
Co-authored-by: João Jandre <joao@scclouds.com.br>
* KVM: add Virtual TPM model and version
* KVM: add admin-only VM setting GUEST.CPU.MODE and GUEST.CPU.MODEL
* VMware: add vTPM
* vTPM: do not set Key due to 'Cannot add multiple devices using the same device key..'
* vTPM: add unit test testTpmModel
* engine/schema: remove user vm details for guest CPU mode/model
* vTPM: extra methods as per Daan's request
* vTPM: add unit tests in VmwareResourceTest
* vTPM: update unit tests in VmwareResourceTest
* vTPM: add unit test in LibvirtComputingResourceTest
* vTPM: use the default TPM version if an invalid version is passed
* vTPM: requires UEFI on VMware and does nothing if it is not enabled/disabled
* vTPM: let users add UEFI on VMware
* Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* vTPM: remove template details for guest CPU mode/model
* UI: boot vm from ISO into UEFI/SECURE mode
---------
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Re-add filename string on qemu-img create
* Remove empty object on the data pool details of storage pools with no data pool
* Only use the method createPhysicalDiskByLibVirt with RBD when the pool is of erasure code type. Also added javadoc for createPhysicalDisk method
* Change literal '/' string to File.separator
* Add support for erasure code pools
* Fix null on putAll
* Update last agents during ms maintenance, and some code improvements
* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]
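A minimal sketch of that gate (illustrative, not the actual ApiServlet code):

    import java.io.IOException;
    import javax.servlet.http.HttpServletResponse;

    public class MaintenanceGateSketch {
        private volatile boolean maintenanceOrShutdownInitiated;

        // Returns false and answers 503 once maintenance or shutdown has been
        // initiated, so a load balancer stops routing API requests to this node.
        boolean allowRequest(HttpServletResponse response) throws IOException {
            if (maintenanceOrShutdownInitiated) {
                response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
                return false;
            }
            return true;
        }
    }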
* Migrate systemvm agents before routing host agents, and some code improvements
* Added events for ms maintenance and shutdown operations
* Added the following ms maintenance and shutdown improvements
- block new agent connections during prepare for maintenance of ms
- maintain avoid ms list
- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents
- updated setup ms list and migrate agent connections to executor service
- migrate agent connection through executor, and send the answer to the ms host that initiated the migration
- re-initialize ssl handshake executor if it is shutdown
- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states
- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states
- stop agent connections monitor on ms maintenance
- update avoid ms list in ready command
- updated connected host from the client connection
- update last agents in ms metrics from the database
- updated some agent config descriptions
- update last management server in the hosts during shutdown
- added agents and lastagents in management server response
- updated management server maintenance & shutdown unit tests
- some code improvements
* refactored code / addressed comments
* removed shutdown testcase (maybe, calling System.exit)
* Revert "removed shutdown testcase (maybe, calling System.exit)"
This reverts commit e14b0717152ef6c8be102d61c80f42803a53172e.
* avoid system.exit during shutdown test
* code improvements
* testcase fix
* Fix cutoff time in agent connections monitor thread
Somehow deleteDatastore was never implemented, which meant:
templates were not cleaned up on datastore delete, and
agents were never informed about storage pool removal.
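A hedged sketch of what an implementation has to cover, per the description above (service and method names are placeholders):

    import java.util.List;

    public class DeleteDatastoreSketch {
        interface TemplateService {
            void cleanupTemplatesOnPool(long poolId);
        }
        interface AgentNotifier {
            List<Long> hostsConnectedTo(long poolId);
            void sendDeleteStoragePool(long hostId, long poolId);
        }

        void deleteDatastore(long poolId, TemplateService templates, AgentNotifier agents) {
            // 1. clean up templates cached on the datastore
            templates.cleanupTemplatesOnPool(poolId);
            // 2. inform every connected agent that the pool is gone
            for (Long hostId : agents.hostsConnectedTo(poolId)) {
                agents.sendDeleteStoragePool(hostId, poolId);
            }
        }
    }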