This PR adds support for specifying user data (cloud-init) for system VMs via Zone Scoped global settings. This allows the operators to customize the System VMs and setup monitoring, logging or execute any custom commands.
We set the user data from the global setting in /var/cache/cloud/cmdline, and use the NoCloud datasource to process user data. cloud-init service is still disabled in the system VMs and it's executed as part of the cloud-postinit service which executes the postinit.sh script.
Added global settings:
systemvm.userdata.enabled - Disabled by default. Needs to be enabled to utilize the feature.
console.proxy.vm.userdata - UUID of the User data to be used for Console Proxy
secstorage.vm.userdata - UUID of the User data to be used for Secondary Storage VM
virtual.router.userdata - UUID of the User data to be used for Virtual Routers
This PR introduces console access support for instances deployed using Orchestrator Extensions, available via either VNC or a direct URL.
- CloudStack queries the extension using the getconsole action.
- For VNC-based access, the extension must return host/port/ticket details. CloudStack then forwards these to the Console Proxy VM (CPVM) in the instance’s zone. It is assumed that the CPVM can reach the specified host and port.
- For direct URL access, the extension returns a console URL with the protocol set to `direct`. The URL is then provided directly to the user.
- The built-in Proxmox Orchestrator Extension now supports console access via VNC. The extension calls the Proxmox API to fetch console details and returns them in the required format.
Also, adds changes to send caller details to the extension payload.
```
# cat /var/lib/cloudstack/management/extensions/Proxmox/02b650f6-bb98-49cb-8cac-82b7a78f43a2.json | jq
{
"caller": {
"roleid": "6b86674b-7e61-11f0-ba77-1e00c8000158",
"rolename": "Root Admin",
"name": "admin",
"roletype": "Admin",
"id": "93567ed9-7e61-11f0-ba77-1e00c8000158",
"type": "ADMIN"
},
"virtualmachineid": "126f4562-1f0f-4313-875e-6150cabeb72f",
...
```
Documentation PR: https://github.com/apache/cloudstack-documentation/pull/560
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Remove allocated snapshots / vm snapshots on start
* Check and Cleanup snapshots / vm snapshots on MS start
* rebase fixes
* Update volume state (from Snapshotting) on MS start when its snapshot job not finished and snapshot in Creating state
* [routers] distiction between fatal failure and warning or unknown on healthchecks
* UI status for router health checks
* status from scripts varied
* automation signalled errors
* revert removal of update sql
* upgradeversion
* move config item and further cleanup
* handling services better
* backwards compatible response
---------
Co-authored-by: Daan Hoogland <dahn@apache.org>
* api,server,extensions: allow updating extension resource map details
This PR makes changes for allowing updating details for an extension resource mapping.
Currently, extensions only support Cluster to be registered therefore changes has been added to updateCluster functionality.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
The Extensions Framework in Apache CloudStack is designed to provide a flexible and standardised mechanism for integrating external systems and custom workflows into CloudStack’s orchestration process. By defining structured hook points during key operations—such as virtual machine deployment, resource preparation, and lifecycle events—the framework allows administrators and developers to extend CloudStack’s behaviour without modifying its core codebase.
* Add first version
* Add guithemedetails join
* Update since and remove extra line
* Limit information on API response for non admin users
* Add base files for preset themes
* Add miising license
* Revert cookie check
* Fix imports
* Fix pre-commit
* Address log4j2 string to format review and add license to css files
* Fix infinite loading
* Move event details to service implementation
* Move view to a specific view file
* Refactoring gui theme classes
* Normalize package name
* Address Henrique review
* Fix create table SQL
* Add interface for Dao classes
* Remove extra tabs
* Address unauthorized call when 2FA is enabled
* Remove trailing whitespaces
* Apply suggestions from code review
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
---------
Co-authored-by: Henrique Sato <henriquesato2003@gmail.com>
Co-authored-by: Bernardo De Marco Gonçalves <bernardomg2004@gmail.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Management Server - Prepare for Maintenance and Cancel Maintenance improvements:
- Added new setting 'management.server.maintenance.ignore.maintenance.hosts' to ignore hosts in maintenance states while preparing management server for maintenance. This skips agent transfer and agents count check for hosts in maintenance.
- Rebalance indirect agents after cancel maintenance, using rebalance parameter in cancelMaintenance API
- Force maintenance after maintenance window timeout, using forced parameter in prepareForMaintenance API.
- Propagate 'indirect.agent.lb.check.interval' setting change to the host agents.
* rebases fixes
* code improvements, cleanup
* [UI] Set rebalance true by default in cancel maintenance dialog
* Update MS state after executing cluster cmd in the target MS, and some code improvements
* code improvements
* Ensure the host lb algorithm 'shuffle' is applied once before disabling the indirect agent lb check background task
Adds new interface for image selection (template/iso) for an instance in UI.
Old interface can still be used and it can be configured using UI configuration (config.json)
OS categories/Guest OS categories have been improved with ability to create new categories, delete an existing category, and marking a category as featured to allow it to show up in the UI in the image selection interface.
New APIs added:
- addOsCategory
- deleteOsCategory
- updateOsCategory
APIs updated:
- updateOsType
- listTemplates
- listOsCategories
Several improvements in UI especially related to forms - DeloyVM, ReinstallVM, CreateVnfAppliance, AddAutoscaleGroup.
DeployVM form can now be opened from template/ISO details view with query params.
Reorganized (removed and added some) OS categories to the following (in the same order):
```
1. Ubuntu
2. Debian
3. Fedora
4. CentOS
5. Rocky Linux
6. Alma Linux
7. Oracle
8. RedHat
9. SUSE
10. Windows
11. Other
```
Documentation PR: https://github.com/apache/cloudstack-documentation/pull/500
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* framework/cluster: fix NPE for ms-host status when mgr stops
This handles an NPE case for when management server host status is not
found in the DB, when stopping the cluster manager.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManagerImpl.java
Co-authored-by: dahn <daan.hoogland@gmail.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManagerImpl.java
---------
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
Using a simple hyphen as a delimiter for config cache key can lead to ambiguity if the “name” field itself contains hyphens. To address this, a Ternary object of configkey name, scope and scope ID is used as the config cache keys.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Update last agents during ms maintenance, and some code improvements
* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]
* Migrate systemvm agents before routing host agents, and some code improvements
* Added events for ms maintenance and shutdown operations
* Added the following ms maintenance and shutdown improvements
- block new agent connections during prepare for maintenance of ms
- maintain avoids ms list
- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents
- updated setup ms list and migrate agent connections to executor service
- migrate agent connection through executor, and send the answer to the ms host that initiated the migration
- re-initialize ssl handshake executor if it is shutdown
- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states
- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states
- stop agent connections monitor on ms maintenance
- update avoid ms list in ready command
- updated connected host from the client connection
- update last agents in ms metrics from the database
- updated some agent config descriptions
- update last management server in the hosts during shutdown
- added agents and lastagents in management server response
- updated management server maintenance & shutdown unit tests
- some code improvements
* refactored code / addressed comments
* removed shutdown testcase (maybe, calling System.exit)
* Revert "removed shutdown testcase (maybe, calling System.exit)"
This reverts commit e14b0717152ef6c8be102d61c80f42803a53172e.
* avoid system.exit during shutdown test
* code improvements
* testcase fix
* Fix cutoff time in agent connections monitor thread
* Add bytes and iops preset variables to volume usage type
* Add new line at the end of file
Co-authored-by: dahn <daan.hoogland@gmail.com>
* Change disk offering preset variable class name
---------
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
Co-authored-by: dahn <daan.hoogland@gmail.com>
This PR introduces the concept of multi-scope configuration settings. In addition to the Global level, currently all configurations can be set at a single scope level.
It will be useful if a configuration can be set at multiple scopes. For example, a configuration set at the domain level
will apply for all accounts, but it can be set for an account as well. In which case the account level setting will override the domain level setting.
This is done by changing the column `scope` of table `configuration` from string (single scope) to bitmask (multiple scopes).
```
public enum Scope {
Global(null, 1),
Zone(Global, 1 << 1),
Cluster(Zone, 1 << 2),
StoragePool(Cluster, 1 << 3),
ManagementServer(Global, 1 << 4),
ImageStore(Zone, 1 << 5),
Domain(Global, 1 << 6),
Account(Domain, 1 << 7);
```
Each scope is also assigned a parent scope. When a configuration for a given scope is not defined but is available for multiple scope types, the value will be retrieved from the parent scope. If there is no parent scope or if the configuration is defined for a single scope only, the value will fall back to the global level.
Hierarchy for different scopes is defined as below :
- Global
- Zone
- Cluster
- Storage Pool
- Image Store
- Management Server
- Domain
- Account
This PR also updates the scope of the following configurations (Storage Pool scope is added in addition to the existing Zone scope):
- pool.storage.allocated.capacity.disablethreshold
- pool.storage.allocated.resize.capacity.disablethreshold
- pool.storage.capacity.disablethreshold
Doc PR : https://github.com/apache/cloudstack-documentation/pull/476
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* api,agent,server,engine-schema: scalability improvements
Following changes and improvements have been added:
- Improvements in handling of PingRoutingCommand
1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
3. Optimized scanning stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/use role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
1. CapacityManager - listing service offerings - beneficial in host capacity calculation
2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
3. DownloadListener - hypervisors for zone - beneficial for host joins
5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimize finding ready systemvm template for zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
1. Optimized the use of the executor service
2. Refactore Agent class to better handle connections.
3. Do SSL handshakes within worker threads
5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* test1, domaindetails, capacitymanager fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* test2 - agent tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* capacitymanagertest fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* change
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix missing changes
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert marvin/setup.py
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix indent
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* use space in sql
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address duplicate
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* update host logs
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert e36c6a5d07
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix npe in capacity calculation
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* move schema changes to 4.20.1 upgrade
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix build
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add some more tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* checkstyle fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* remove unnecessary mocks
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* replace statics
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* engine/orchestration,utils: limit number of concurrent new agent
connections
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* refactor - remove unused
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* unregister closed connections, monitor & cleanup
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add check for outdated vm filter in power sync
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* agent: synchronize sendRequest wait
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Support for Management Server Maintenance
- New APIs: prepareForMaintenance and cancelMaintenance, with required parameter - managementserverid.
- New management server states for maintenance: PreparingForMaintenance, Maintenance.
- listHosts API with optional parameter – managementserverid, to list the hosts connected to the management server.
- Support management server maintenance when more than one active management servers available.
- Triggers transfer agents to other available management servers for maintenance, new agent command MigrateAgentConnectionCommand to initiate transfer of indirect agents.
- New global config 'management.server.maintenance.timeout', to set the timeout (in mins) for the management server maintenance window, default: 60 mins.
- UI changes: Prepare and Cancel Maintenance in Management Server section, Connected Agents tab, New fields for hosts and management servers.
* Updated pending jobs check timer task with ScheduledExecutorService
* keep maintenance state on trigger shutdown call when ms is in maintenance
* add pending jobs count to ms response
* during ms heartbeat, update state to up only when it's down
* allow vm work jobs of async job created before prepare for maintenance
* Revert "keep maintenance state on trigger shutdown call when ms is in maintenance"
This reverts commit 607e13364679eac897f4d146bb3325ea7a61ba17.
* skip maintenance test when multiple management servers are not available, and not configured in host setting for kvm
* 4.20:
merge errors fixed
Restrict the migration of volumes attached to VMs in Starting state (#9725)
server, plugin: enhance storage stats for IOPS (#10034)
Introducing granular command timeouts global setting (#9659)
Improve logging to include more identifiable information (#9873)
* Improve logging to include more identifiable information for kvm plugin
* Update logging for scaleio plugin
* Improve logging to include more identifiable information for default volume storage plugin
* Improve logging to include more identifiable information for agent managers
* Improve logging to include more identifiable information for Listeners
* Replace ids with objects or uuids
* Improve logging to include more identifiable information for engine
* Improve logging to include more identifiable information for server
* Fixups in engine
* Improve logging to include more identifiable information for plugins
* Improve logging to include more identifiable information for Cmd classes
* Fix toString method for StorageFilterTO.java
* 4.20:
UI: Fix userdata and load balancer selection (#10016)
Prevent password updates for SAML and LDAP users (#9999)
cloudstack-migrate-databases: sql AND added (#10033)
engine/schema: move SQLs to 4.20.0 to 4.20.1 upgrade (#10018)
Remove user from project before deletion (#10008)
Simplify validation for creating volume templates via UI (#9828)