* storage: change storage pool to Up state when cancel storage migration
* Update 11773: connect host to shared pool after cancelling storage migration
* Update 11773: update db only
* Update 11773: skip capacity update for storpool
This PR introduces console access support for instances deployed using Orchestrator Extensions, available via either VNC or a direct URL.
- CloudStack queries the extension using the getconsole action.
- For VNC-based access, the extension must return host/port/ticket details. CloudStack then forwards these to the Console Proxy VM (CPVM) in the instance’s zone. It is assumed that the CPVM can reach the specified host and port.
- For direct URL access, the extension returns a console URL with the protocol set to `direct`. The URL is then provided directly to the user.
- The built-in Proxmox Orchestrator Extension now supports console access via VNC. The extension calls the Proxmox API to fetch console details and returns them in the required format.
Also, adds changes to send caller details to the extension payload.
```
# cat /var/lib/cloudstack/management/extensions/Proxmox/02b650f6-bb98-49cb-8cac-82b7a78f43a2.json | jq
{
"caller": {
"roleid": "6b86674b-7e61-11f0-ba77-1e00c8000158",
"rolename": "Root Admin",
"name": "admin",
"roletype": "Admin",
"id": "93567ed9-7e61-11f0-ba77-1e00c8000158",
"type": "ADMIN"
},
"virtualmachineid": "126f4562-1f0f-4313-875e-6150cabeb72f",
...
```
Documentation PR: https://github.com/apache/cloudstack-documentation/pull/560
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Remove allocated snapshots / vm snapshots on start
* Check and Cleanup snapshots / vm snapshots on MS start
* rebase fixes
* Update volume state (from Snapshotting) on MS start when its snapshot job not finished and snapshot in Creating state
* [routers] distiction between fatal failure and warning or unknown on healthchecks
* UI status for router health checks
* status from scripts varied
* automation signalled errors
* revert removal of update sql
* upgradeversion
* move config item and further cleanup
* handling services better
* backwards compatible response
---------
Co-authored-by: Daan Hoogland <dahn@apache.org>
This feature adds the ability to create a new instance from a VM backup for dummy, NAS and Veeam backup providers. It works even if the original instance used to create the backup was expunged or unmanaged. There are two parts to this functionality:
Saving all configuration details that the VM had at the time of taking the backup. And using them to create an instance from backup.
Enabling a user to expunge/unmanage an instance that has backups.
This PR allows attaching of GPU devices via PCI, mdev or VF to an Instance for KVM.
It allows the operator to discover the GPU devices on the KVM host and create a Compute Offering with GPU support based on the available GPU devices on the host. Once the operator has created the Compute offering, it can be used by users to launch Instances with GPU devices.
The Extensions Framework in Apache CloudStack is designed to provide a flexible and standardised mechanism for integrating external systems and custom workflows into CloudStack’s orchestration process. By defining structured hook points during key operations—such as virtual machine deployment, resource preparation, and lifecycle events—the framework allows administrators and developers to extend CloudStack’s behaviour without modifying its core codebase.
The Netris Plugin introduces Netris as a network service provider in CloudStack to be able to create and manage Virtual Private Clouds (VPCs) in CloudStack, being able to orchestrate the following network functionalities:
- Network segmentation with Netris-VXLAN isolation method
- Routing between "public" IP and network segments with an ACS ROUTED mode offering
- SourceNAT, DNAT, 1:1 NAT between "public" IP and network segments with an ACS NATTED mode offering
- Routing between VPC network segments (tiers in ACS nomenclature)
- Access Lists (ACLs) between VPC tiers and "public" network (TCP, UDP, ICMP) both as global egress rules and "public" IP specific ingress rules.
- ACLs between VPC network tiers (TCP, UDP, ICMP)
- External load balancing – between VPC network tiers and "public" IP
- Internal load balancing – between VPC network tiers
- CloudStack Virtual Router services (DHCP, DNS, UserData, Password Injection, etc…)
* CPU to Memory weight based algorithm to order cluster
host.capacityType.to.order.clusters config will support new algorithm: COMBINED
which will work with host.capacityType.to.order.clusters.cputomemoryweight and capacity will be
computed based on CPU and memory both and using weight factor
* minor changes
* add unit tests
* update desc and add validation
* handle copilot review comments
* add log indicating chosen capacityType for ordering
---------
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Management Server - Prepare for Maintenance and Cancel Maintenance improvements:
- Added new setting 'management.server.maintenance.ignore.maintenance.hosts' to ignore hosts in maintenance states while preparing management server for maintenance. This skips agent transfer and agents count check for hosts in maintenance.
- Rebalance indirect agents after cancel maintenance, using rebalance parameter in cancelMaintenance API
- Force maintenance after maintenance window timeout, using forced parameter in prepareForMaintenance API.
- Propagate 'indirect.agent.lb.check.interval' setting change to the host agents.
* rebases fixes
* code improvements, cleanup
* [UI] Set rebalance true by default in cancel maintenance dialog
* Update MS state after executing cluster cmd in the target MS, and some code improvements
* code improvements
* Ensure the host lb algorithm 'shuffle' is applied once before disabling the indirect agent lb check background task
* Introducing Storage Access Groups to define the host and storage pool connections
In CloudStack, when a primary storage is added at the Zone or Cluster scope, it is by default connected to all hosts within that scope. This default behavior can be refined using storage access groups, which allow operators to control and limit which hosts can access specific storage pools.
Storage access groups can be assigned to hosts, clusters, pods, zones, and primary storage pools. When a storage access group is set on a cluster/pod/zone, all hosts within that scope inherit the group. Connectivity between a host and a storage pool is then governed by whether they share the same storage access group.
A storage pool with a storage access group will connect only to hosts that have the same storage access group. A storage pool without a storage access group will connect to all hosts, including those with or without a storage access group.
* Add & Remove PowerFlex/ScaleIO MDMs while preparing & unpreparing the storage SDC connections (instead of start & stop scini)
* Add/Remove MDM IP addresses during Host connection/disconnection to/from storage pool when powerflex.connect.on.demand is false
* unit test fixes
* Don't remove MDM IPs from SDC when any volumes mapped to SDC
* Don't remove MDM IPs when other pools of same ScaleIO/PowerFlex cluster are connected
* rebase fixes
* update changes, to not remove/disconnect MDMs on maintenance
* import fixes after rebase
* Update last agents during ms maintenance, and some code improvements
* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]
* Migrate systemvm agents before routing host agents, and some code improvements
* Added events for ms maintenance and shutdown operations
* Added the following ms maintenance and shutdown improvements
- block new agent connections during prepare for maintenance of ms
- maintain avoids ms list
- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents
- updated setup ms list and migrate agent connections to executor service
- migrate agent connection through executor, and send the answer to the ms host that initiated the migration
- re-initialize ssl handshake executor if it is shutdown
- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states
- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states
- stop agent connections monitor on ms maintenance
- update avoid ms list in ready command
- updated connected host from the client connection
- update last agents in ms metrics from the database
- updated some agent config descriptions
- update last management server in the hosts during shutdown
- added agents and lastagents in management server response
- updated management server maintenance & shutdown unit tests
- some code improvements
* refactored code / addressed comments
* removed shutdown testcase (maybe, calling System.exit)
* Revert "removed shutdown testcase (maybe, calling System.exit)"
This reverts commit e14b0717152ef6c8be102d61c80f42803a53172e.
* avoid system.exit during shutdown test
* code improvements
* testcase fix
* Fix cutoff time in agent connections monitor thread
This PR introduces the concept of multi-scope configuration settings. In addition to the Global level, currently all configurations can be set at a single scope level.
It will be useful if a configuration can be set at multiple scopes. For example, a configuration set at the domain level
will apply for all accounts, but it can be set for an account as well. In which case the account level setting will override the domain level setting.
This is done by changing the column `scope` of table `configuration` from string (single scope) to bitmask (multiple scopes).
```
public enum Scope {
Global(null, 1),
Zone(Global, 1 << 1),
Cluster(Zone, 1 << 2),
StoragePool(Cluster, 1 << 3),
ManagementServer(Global, 1 << 4),
ImageStore(Zone, 1 << 5),
Domain(Global, 1 << 6),
Account(Domain, 1 << 7);
```
Each scope is also assigned a parent scope. When a configuration for a given scope is not defined but is available for multiple scope types, the value will be retrieved from the parent scope. If there is no parent scope or if the configuration is defined for a single scope only, the value will fall back to the global level.
Hierarchy for different scopes is defined as below :
- Global
- Zone
- Cluster
- Storage Pool
- Image Store
- Management Server
- Domain
- Account
This PR also updates the scope of the following configurations (Storage Pool scope is added in addition to the existing Zone scope):
- pool.storage.allocated.capacity.disablethreshold
- pool.storage.allocated.resize.capacity.disablethreshold
- pool.storage.capacity.disablethreshold
Doc PR : https://github.com/apache/cloudstack-documentation/pull/476
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* api,agent,server,engine-schema: scalability improvements
Following changes and improvements have been added:
- Improvements in handling of PingRoutingCommand
1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
3. Optimized scanning stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/use role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
1. CapacityManager - listing service offerings - beneficial in host capacity calculation
2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
3. DownloadListener - hypervisors for zone - beneficial for host joins
5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimize finding ready systemvm template for zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
1. Optimized the use of the executor service
2. Refactore Agent class to better handle connections.
3. Do SSL handshakes within worker threads
5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* test1, domaindetails, capacitymanager fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* test2 - agent tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* capacitymanagertest fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* change
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix missing changes
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert marvin/setup.py
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix indent
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* use space in sql
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address duplicate
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* update host logs
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert e36c6a5d07
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix npe in capacity calculation
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* move schema changes to 4.20.1 upgrade
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix build
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add some more tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* checkstyle fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* remove unnecessary mocks
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* replace statics
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* engine/orchestration,utils: limit number of concurrent new agent
connections
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* refactor - remove unused
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* unregister closed connections, monitor & cleanup
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add check for outdated vm filter in power sync
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* agent: synchronize sendRequest wait
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Support for Management Server Maintenance
- New APIs: prepareForMaintenance and cancelMaintenance, with required parameter - managementserverid.
- New management server states for maintenance: PreparingForMaintenance, Maintenance.
- listHosts API with optional parameter – managementserverid, to list the hosts connected to the management server.
- Support management server maintenance when more than one active management servers available.
- Triggers transfer agents to other available management servers for maintenance, new agent command MigrateAgentConnectionCommand to initiate transfer of indirect agents.
- New global config 'management.server.maintenance.timeout', to set the timeout (in mins) for the management server maintenance window, default: 60 mins.
- UI changes: Prepare and Cancel Maintenance in Management Server section, Connected Agents tab, New fields for hosts and management servers.
* Updated pending jobs check timer task with ScheduledExecutorService
* keep maintenance state on trigger shutdown call when ms is in maintenance
* add pending jobs count to ms response
* during ms heartbeat, update state to up only when it's down
* allow vm work jobs of async job created before prepare for maintenance
* Revert "keep maintenance state on trigger shutdown call when ms is in maintenance"
This reverts commit 607e13364679eac897f4d146bb3325ea7a61ba17.
* skip maintenance test when multiple management servers are not available, and not configured in host setting for kvm
For HA work items that are created for host state change, checks must be
done when execution is called in a new management server session.
A new column, reason, has been added in cloud.op_ha_work table to track
the reason for HA work.
When HighAvailabilityManager starts it finds and puts all pending HA
work items in Investigating state. During execution of the HA work if it
is found in investigating state, checks are done to verify if the work
is still valid. If the jobs is found to be invalid it is cancelled.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* 4.20:
merge errors fixed
Restrict the migration of volumes attached to VMs in Starting state (#9725)
server, plugin: enhance storage stats for IOPS (#10034)
Introducing granular command timeouts global setting (#9659)
Improve logging to include more identifiable information (#9873)
* Introducing granular command timeouts global setting
* fix marvin tests
* Fixed log messages
* some more log message fix
* Fix empty value setting
* Converted the global setting to non-dynamic
* set wait on command only when granular wait is defined. This is to keep the backward compatibility
* Improve error logging
* Improve logging to include more identifiable information for kvm plugin
* Update logging for scaleio plugin
* Improve logging to include more identifiable information for default volume storage plugin
* Improve logging to include more identifiable information for agent managers
* Improve logging to include more identifiable information for Listeners
* Replace ids with objects or uuids
* Improve logging to include more identifiable information for engine
* Improve logging to include more identifiable information for server
* Fixups in engine
* Improve logging to include more identifiable information for plugins
* Improve logging to include more identifiable information for Cmd classes
* Fix toString method for StorageFilterTO.java
* 4.20:
VR: fix site-2-site VPN if split connections is enabled (#10067)
UI: fix cannot open 'Edit tags' modal for static routes (#10065)
Update ownership selection component to be language independent (#10052)
Support to enable/disable VM High Availability manager and related alerts (#10118)
- Adds new config 'vm.ha.enabled' with Zone scope, to enable/disable VM High Availability manager. This is enable by default (for backward compatibilty).
When enabled, the VM HA WorkItems (for VM Stop, Restart, Migration, Destroy) can be created and the scheduled items are executed.
When disabled, new VM HA WorkItems are not allowed and the scheduled items are retried until max retries configured at 'vm.ha.migration.max.retries' (executed in case HA is re-enabled during retry attempts), and then purged after 'time.between.failures' by the cleanup thread that runs regularly at 'time.between.cleanup'.
- Adds new config 'vm.ha.alerts.enabled' with Zone scope, to enable/disable alerts for the VM HA operations. This is enabled by default.
* Fix `updateTemplatePermission` when the UI is set to a language other than English (#9766)
* Fix updateTemplatePermission UI in non-english language
* Improve fix
---------
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
* Prepend vpc name to vpc tier network name based on global setting
* Added UT for createVpcGuestNetwork
* rename connector to delimiter and add configKey.Category.Network
* Move setting the name to a new method
---------
Co-authored-by: Daan Hoogland <daan@onecht.net>
Co-authored-by: Lucas Martins <56271185+lucas-a-martins@users.noreply.github.com>
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>