12147 Commits

Author SHA1 Message Date
Abhisar Sinha
a7beaaf73b
Add Resource Limits to Backups and Object Storage (#10017)
Doc PR : https://github.com/apache/cloudstack-documentation/pull/461
This PR fixes https://github.com/apache/cloudstack/issues/8638

== Description

Four new Resource Types have been added. Admin can configure corresponding resource limits for the tenants at different levels (domain, account, project) 
User dashboard's Storage section will show the new resources, their limits and current usage.

1. backup - No. of backups used by the account
2. backup_storage - Backup storage allocated for the account
3. bucket - No. of buckets used by the accounts
4. object_storage - Object storage allocated for the account.

Some other related changes done to BnR framework:

1. Maximum number of Backups to retain can be specified while creating Backup schedules, similar to Scheduled snapshots.
2. Oldest Scheduled backup of the same interval type will be deleted once the number reaches the configured max Backups value.
3. Code refactor: Moved syncBackups method from BackupProvider to the framework BackupManagerImpl, as it is a common functionality and all providers were using duplicated code.

Changes done to the Object Storage Framework

1. Quota parameter is made mandatory while creating a bucket. Bucket quota is considered to be the allocated space and will be used to enforce Resource limits.

== Schema Changes:

1. New Column `max_backups` added to `backup_schedule` table
4. New Column `backup_interval_type` added to `backups` table

== Api Changes:

1. createBackup: new Parameter `scheduleid`. It should be specified whenever a scheduled backup is created. This will translate to the `backup_interval_type` in the `backups` table.
3. createBackupScheduke: new Parameter `max_backups`. To specify maximum number of backups to retain for the given schedule.

== Configurations:

|Setting |Scope |Default Value |Description|
|-------|--------|--------------|-----------|
|backup.max.hourly |Global |8 |Maximum recurring hourly backups to be retained for an instance|
|backup.max.daily |Global |8 |Maximum recurring daily backups to be retained for an instance|
|backup.max.weekly |Global |8 |Maximum recurring weekly backups to be retained for an instance|
|backup.max.monthly |Global |8 |Maximum recurring monthly backups to be retained for an instance|
|max.account.backups| Global| 20 | The default maximum number of backups that can be created for an account|
|max.account.backup.storage| Global| 400 | The default maximum backup storage space (in GiB) that can be used for an account|
|max.domain.backups| Global| 40 | The default maximum number of backups that can be created for an domain|
|max.domain.backup.storage| Global| 800 | The default maximum backup storage space (in GiB) that can be used for an domain|
|max.project.backups| Global| 20 | The default maximum number of backups that can be created for an project|
|max.project.backup.storage| Global| 400 | The default maximum backup storage space (in GiB) that can be used for an project|

|Setting |Scope |Default Value |Description|
|-------|--------|--------------|-----------|
|max.account.buckets| Global| 20 | The default maximum number of buckets that can be created for an account|
|max.account.object.storage| Global| 400 | The default maximum object storage space (in GiB) that can be used for an account|
|max.domain.buckets| Global| 40 | The default maximum number of buckets that can be created for an domain|
|max.domain.object.storage| Global| 800 | The default maximum object storage space (in GiB) that can be used for an domain|
|max.project.buckets| Global| 20 | The default maximum number of buckets that can be created for an project|
|max.project.object.storage| Global| 400 | The default maximum object storage space (in GiB) that can be used for an project|


Co-authored-by: Daan Hoogland <daan@onecht.net>
Co-authored-by: Lucas Martins <56271185+lucas-a-martins@users.noreply.github.com>
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-07 16:56:20 +05:30
Daan Hoogland
2654890e86 Merge branch '4.20' 2025-02-01 21:20:08 +01:00
Daan Hoogland
085bd3bda5 Merge branch '4.19' into 4.20 2025-02-01 17:51:50 +01:00
Abhishek Kumar
0b5a5e8043
api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand

    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs

- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`

- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine

- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.

- Added caching for account/use role API access with expiration after write set to 60 seconds.

- Added caching for some recurring DB retrievals

    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands

- Optimized MS list retrieval for agent connect

- Optimize finding ready systemvm template for zone

- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks

- Changes in agent-agentmanager connection with NIO client-server classes

    1. Optimized the use of the executor service
    2. Refactore Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
    6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.

- Improvements in StatsCollection - minimize DB retrievals.

- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.

- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.

- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* test1, domaindetails, capacitymanager fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* test2 - agent tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* capacitymanagertest fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* change

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix missing changes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert marvin/setup.py

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix indent

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* use space in sql

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address duplicate

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* update host logs

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert e36c6a5d07

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix npe in capacity calculation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* move schema changes to 4.20.1 upgrade

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix build

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add some more tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* checkstyle fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* remove unnecessary mocks

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* replace statics

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* engine/orchestration,utils: limit number of concurrent new agent
connections

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactor - remove unused

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unregister closed connections, monitor & cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add check for outdated vm filter in power sync

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* agent: synchronize sendRequest wait

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-01 12:28:41 +05:30
Suresh Kumar Anaparti
0d5047b8b7
Improve listing of HA and non-HA hosts when ha.tag setting is defined and hosts have multiple tags along with ha tag (#10240) 2025-01-31 14:07:00 +01:00
Abhishek Kumar
7abda3b963 Merge remote-tracking branch 'apache/4.20' 2025-01-31 16:55:38 +05:30
Abhishek Kumar
ae2ffbe40b Merge remote-tracking branch 'apache/4.19' into 4.20 2025-01-31 16:54:22 +05:30
Pearl Dsilva
5447950f09
Allow creation of Shared Networks without IP range if network offering has no services - specifyvlan = true (#10168) 2025-01-31 11:38:34 +01:00
Wei Zhou
fbb1ff78d6
Static Routes: fix check on wrong global configuration (#10066) 2025-01-31 11:04:13 +01:00
Bernardo De Marco Gonçalves
c70e4e29be
fix npe on account creation (#10274) 2025-01-30 15:52:36 +01:00
Abhishek Kumar
b93589b5bd
server: reset 2fa user configuration on incomplete setup (#10247)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2025-01-30 15:51:04 +01:00
Wei Zhou
0f544c9a3b
api/ui: add specifyvlan to network response (#10236) 2025-01-30 15:35:33 +01:00
Suresh Kumar Anaparti
ee0dc5b2d6
list hosts API fix, when any stale entries exists on storage_pool_host_ref for the removed pools (#9852) 2025-01-30 15:33:25 +01:00
niyamsw
5df15a7aa6
KVM/s390x Support: Add support for KVM on s390x architecture (#10038)
Signed-off-by: Niyam Siwach <niyam@ibm.com>
Signed-off-by: Himanshu Mishra <Himanshu.Mishra2@ibm.com>
2025-01-29 17:34:16 +01:00
Abhishek Kumar
1c84ce4e23
server: fix attach uploaded volume (#10267)
* server: fix attach uploaded volume

Fixes #10120

When an uploaded volume is attached to a VM for which no existing volume
can be found it was resulting in error. For such volumes, server needs
to find a suitable pool first and copy them to the pool from secondary
store.

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add unit tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Boris Stoyanov - a.k.a Bobby <bss.stoyanov@gmail.com>
2025-01-29 13:14:16 +02:00
Suresh Kumar Anaparti
3b108b968f
Support for Management Server Maintenance Mode (#9854)
* Support for Management Server Maintenance

- New APIs: prepareForMaintenance and cancelMaintenance, with required parameter - managementserverid.

- New management server states for maintenance: PreparingForMaintenance, Maintenance.

- listHosts API with optional parameter – managementserverid, to list the hosts connected to the management server.

- Support management server maintenance when more than one active management servers available.

- Triggers transfer agents to other available management servers for maintenance, new agent command MigrateAgentConnectionCommand to initiate transfer of indirect agents.

- New global config 'management.server.maintenance.timeout', to set the timeout (in mins) for the management server maintenance window, default: 60 mins.

- UI changes: Prepare and Cancel Maintenance in Management Server section, Connected Agents tab, New fields for hosts and management servers.

* Updated pending jobs check timer task with ScheduledExecutorService

* keep maintenance state on trigger shutdown call when ms is in maintenance

* add pending jobs count to ms response

* during ms heartbeat, update state to up only when it's down

* allow vm work jobs of async job created before prepare for maintenance

* Revert "keep maintenance state on trigger shutdown call when ms is in maintenance"

This reverts commit 607e13364679eac897f4d146bb3325ea7a61ba17.

* skip maintenance test when multiple management servers are not available, and not configured in host setting for kvm
2025-01-29 13:31:15 +05:30
Daan Hoogland
048649d351 Merge release branch 4.20 to main
* 4.20:
  server: investigate pending HA work when executing in new MS session (#10167)
  extra null guard (#10264)
2025-01-28 14:34:19 +01:00
Abhishek Kumar
33a37da9ec
server: investigate pending HA work when executing in new MS session (#10167)
For HA work items that are created for host state change, checks must be
done when execution is called in a new management server session.

A new column, reason, has been added in cloud.op_ha_work table to track
the reason for HA work.

When HighAvailabilityManager starts it finds and puts all pending HA
work items in Investigating state. During execution of the HA work if it
is found in investigating state, checks are done to verify if the work
is still valid. If the jobs is found to be invalid it is cancelled.

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2025-01-28 14:39:31 +05:30
Daan Hoogland
98f5663954 Merge branch '4.20' 2025-01-24 17:10:43 +01:00
Daan Hoogland
34d2a3bc86 Merge branch '4.19' into 4.20 2025-01-24 17:01:42 +01:00
Harikrishna
20759187b3
Fix local storage deletion cases (#10231)
* Delete local storage properties in agent.properties during delete pool

* Fix stale entry when add local storage failed

* Smaller methods

* Comment added
2025-01-23 12:46:33 +05:30
Wei Zhou
09f154796a
server: Fix host CPU number (#10218) 2025-01-22 15:44:54 +01:00
Daan Hoogland
81e052cfeb Merge release branch 4.20 to main
* 4.20:
  linstor: Fix ZFS snapshot backup (#10219)
  fix listing of VMs by network (#10204)
  Configure org.eclipse.jetty.server.Request.maxFormKeys from server.properties and increase the default value (#10214)
  api: fix access for listSystemVmUsageHistory (#10032)
  Fix NPE issues during host rolling maintenance, due to host tags and custom constrained/unconstrained service offering (#9844)
2025-01-21 12:00:19 +01:00
Daan Hoogland
5167c3b613 Merge branch '4.19' into 4.20 2025-01-21 11:59:43 +01:00
Bernardo De Marco Gonçalves
70776b067a
fix listing of VMs by network (#10204) 2025-01-21 10:33:55 +01:00
Suresh Kumar Anaparti
c5e8f63452
Fix NPE issues during host rolling maintenance, due to host tags and custom constrained/unconstrained service offering (#9844) 2025-01-20 13:20:17 +01:00
Daan Hoogland
ecd60a4e46 Merge release branch 4.20 to main
* 4.20:
  Maintenance mode: Add host to deployment planner avoid list to fix local storage vm migration (#9892)
  Add project-user association normalization script to 4.20.1 upgrade (#10116)
  fix slider component for global settings of the range type (#10187)
  Clean up network permissions on account deletion (#10176)
2025-01-20 11:58:28 +01:00
BartJM
a163831b7e
Maintenance mode: Add host to deployment planner avoid list to fix local storage vm migration (#9892) 2025-01-20 11:54:30 +01:00
Daan Hoogland
9f594c9699 Merge branch '4.19' into 4.20 2025-01-20 11:53:32 +01:00
Daan Hoogland
0c13ded943 Merge release branch 4.20 to main
* 4.20:
  Rollback of changes with errors during the VM assign (#7061)
  [VMware] Consider CD/DVD drive when calculating next free unit number for volume attachment over IDE controller (#9644)
  consider a valid ipv4 address as a validish ipv4 /32 cidr (#10174)
2025-01-15 17:52:07 +01:00
Bernardo De Marco Gonçalves
796bd4f72c
Clean up network permissions on account deletion (#10176) 2025-01-15 17:38:20 +01:00
Stephan Krug
d9a7b6e511
Rollback of changes with errors during the VM assign (#7061)
Co-authored-by: Stephan Krug <stephan.krug@scclouds.com.br>
Co-authored-by: Gabriel <gabriel.fernandes@scclouds.com.br>
Co-authored-by: Fabricio Duarte <fabricio.duarte.jr@gmail.com>
2025-01-15 16:10:49 +01:00
Daan Hoogland
bd874eaa44 Merge release branch 4.20 to main
* 4.20:
  systemvm: fix keystore is reset when patch a systemvm (#9900)
  no retrieval of null hosts (#10175)
  upgrade: consider multiple hypervisors and secondary storages (#10046)
  CheckOnHostCommand: add missing timeout setting (#9677)
2025-01-13 11:23:51 +01:00
Daan Hoogland
e2cfddb1b6 Merge branch '4.19' into 4.20 2025-01-13 11:23:14 +01:00
dahn
fce77733a5
no retrieval of null hosts (#10175) 2025-01-13 09:28:47 +01:00
Wei Zhou
7adc732992
upgrade: consider multiple hypervisors and secondary storages (#10046) 2025-01-13 09:28:18 +01:00
Daan Hoogland
fadb39ece7 Merge release branch 4.20 to main
* 4.20:
  merge errors fixed
  Restrict the migration of volumes attached to VMs in Starting state (#9725)
  server, plugin: enhance storage stats for IOPS (#10034)
  Introducing granular command timeouts global setting (#9659)
  Improve logging to include more identifiable information (#9873)
2025-01-08 14:01:19 +01:00
Daan Hoogland
ab76d3c9cc merge errors fixed 2025-01-08 14:00:41 +01:00
Daan Hoogland
4f9c3492ec Merge release branch 4.19 to 4.20
* 4.19:
  Restrict the migration of volumes attached to VMs in Starting state (#9725)
2025-01-08 13:23:42 +01:00
Felipe
21416cd403
Restrict the migration of volumes attached to VMs in Starting state (#9725)
Co-authored-by: Bernardo De Marco Gonçalves <bernardomg2004@gmail.com>
2025-01-08 08:49:21 +01:00
Abhishek Kumar
bd488c4bba
server, plugin: enhance storage stats for IOPS (#10034)
Adds framework layer change to allow retrieving and storing IOPS stats for storage pools. Custom `PrimaryStoreDriver` can implement method - `getStorageIopsStats` for returning IOPS stats. Existing method `getUsedIops` can also be overridden by such plugins when only used IOPS is returned.
For testing purpose, implementation has been added for simulator hypervisor plugin to return capacity and used IOPS for a pool.
For local storage pool, implementation has been added using iostat to return currently used IOPS.
StoragePoolResponse class has been updated to return IOPS values which allows showing IOPS values in UI for different storage pool related views and APIs.

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2025-01-07 17:17:12 +05:30
Harikrishna
9bc283e5c2
Introducing granular command timeouts global setting (#9659)
* Introducing granular command timeouts global setting

* fix marvin tests

* Fixed log messages

* some more log message fix

* Fix empty value setting

* Converted the global setting to non-dynamic

* set wait on command only when granular wait is defined. This is to keep the backward compatibility

* Improve error logging
2025-01-07 17:06:32 +05:30
Vishesh
a4224e58cc
Improve logging to include more identifiable information (#9873)
* Improve logging to include more identifiable information for kvm plugin

* Update logging for scaleio plugin

* Improve logging to include more identifiable information for default volume storage plugin

* Improve logging to include more identifiable information for agent managers

* Improve logging to include more identifiable information for Listeners

* Replace ids with objects or uuids


* Improve logging to include more identifiable information for engine

* Improve logging to include more identifiable information for server

* Fixups in engine

* Improve logging to include more identifiable information for plugins

* Improve logging to include more identifiable information for Cmd classes

* Fix toString method for StorageFilterTO.java
2025-01-06 16:42:37 +05:30
Daan Hoogland
30b2588c06 Merge release branch 4.20 to main
* 4.20:
  log name change after merge forward
  check tags while fetching storage pool for importing vm (#9764)
  UI: Add cluster arch type to the zone creation wizard (#10080)
2025-01-03 10:17:28 +01:00
Daan Hoogland
cfafcaeb01 log name change after merge forward 2025-01-03 10:16:44 +01:00
Daan Hoogland
79316450d1 Merge release branch 4.19 to 4.20
* 4.19:
  check tags while fetching storage pool for importing vm (#9764)
2025-01-03 09:46:58 +01:00
John Bampton
5bae1188ff
pre-commit fix mixed line endings in XML files (#10148)
https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#mixed-line-ending
2025-01-03 09:42:09 +01:00
Vishesh
d8c321d57a
check tags while fetching storage pool for importing vm (#9764) 2025-01-03 09:05:54 +01:00
Daan Hoogland
2daffa34f2 Merge release branch 4.20 to main
* 4.20:
  VR: fix site-2-site VPN if split connections is enabled (#10067)
  UI: fix cannot open 'Edit tags' modal for static routes (#10065)
  Update ownership selection component to be language independent (#10052)
  Support to enable/disable VM High Availability manager and related alerts (#10118)
2024-12-30 13:35:30 +01:00
Suresh Kumar Anaparti
330ed25a6c
Support to enable/disable VM High Availability manager and related alerts (#10118)
- Adds new config 'vm.ha.enabled'  with Zone scope, to enable/disable VM High Availability manager. This is enable by default (for backward compatibilty).
  When enabled, the VM HA WorkItems (for VM Stop, Restart, Migration, Destroy) can be created and the scheduled items are executed.
  When disabled, new VM HA WorkItems are not allowed and the scheduled items are retried until max retries configured at 'vm.ha.migration.max.retries' (executed in case HA is re-enabled during retry attempts), and then purged after 'time.between.failures' by the cleanup thread that runs regularly at 'time.between.cleanup'.
- Adds new config 'vm.ha.alerts.enabled' with Zone scope, to enable/disable alerts for the VM HA operations. This is enabled by default.
2024-12-26 17:45:32 +05:30