58 Commits

Author SHA1 Message Date
Vishesh
f6ad184ea2
Feature: Add support for GPU with KVM hosts (#11143)
This PR allows attaching of GPU devices via PCI, mdev or VF to an Instance for KVM.

It allows the operator to discover the GPU devices on the KVM host and create a Compute Offering with GPU support based on the available GPU devices on the host. Once the operator has created the Compute offering, it can be used by users to launch Instances with GPU devices.
2025-07-29 13:46:24 +05:30
Pearl Dsilva
b387bc1664 Merge branch '4.20' of https://github.com/apache/cloudstack 2025-03-10 09:34:52 -04:00
Phsm Qwerty
8b092951cb
prometheus: don't poll the same tag multiple times (#10450) 2025-03-07 00:26:43 -05:00
Daan Hoogland
2654890e86 Merge branch '4.20' 2025-02-01 21:20:08 +01:00
Abhishek Kumar
0b5a5e8043
api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand

    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs

- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`

- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine

- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.

- Added caching for account/use role API access with expiration after write set to 60 seconds.

- Added caching for some recurring DB retrievals

    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands

- Optimized MS list retrieval for agent connect

- Optimize finding ready systemvm template for zone

- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks

- Changes in agent-agentmanager connection with NIO client-server classes

    1. Optimized the use of the executor service
    2. Refactore Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
    6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.

- Improvements in StatsCollection - minimize DB retrievals.

- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.

- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.

- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* test1, domaindetails, capacitymanager fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* test2 - agent tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* capacitymanagertest fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* change

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix missing changes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert marvin/setup.py

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix indent

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* use space in sql

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address duplicate

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* update host logs

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert e36c6a5d07

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix npe in capacity calculation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* move schema changes to 4.20.1 upgrade

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix build

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add some more tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* checkstyle fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* remove unnecessary mocks

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* replace statics

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* engine/orchestration,utils: limit number of concurrent new agent
connections

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactor - remove unused

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unregister closed connections, monitor & cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add check for outdated vm filter in power sync

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* agent: synchronize sendRequest wait

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-01 12:28:41 +05:30
João Jandre
d9774a8462 Updating pom.xml version numbers for release 4.21.0.0-SNAPSHOT
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-11-27 11:47:06 -03:00
João Jandre
c63c7ee63e Updating pom.xml version numbers for release 4.20.1.0-SNAPSHOT
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-11-27 11:40:45 -03:00
João Jandre
2fe3fcef7c Updating pom.xml version numbers for release 4.20.0.0
Signed-off-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
2024-11-19 08:54:07 -03:00
Abhishek Kumar
592038a304
api,server,ui: granular resource limit management (#8362)
Feature spec: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Granular+Resource+Limit+Management

Introduces the concept of tagged resource limits for granular resource limit management. Limits can be enforced on accounts and domains for the deployment of entities for a tagged resource. Current tagged resource limits can be used for the following resource types,

Host limits
- user_vm
- cpu
- memory

Storage limits
- volume
- primary_storage

Following global settings can used to specify tags for which limit needs to be enforced,

Host: `resource.limit.host.tags`
Storage: `resource.limit.storage.tags`

Option for specifying tagged resource limits and viewing tagged resource usage are made available in the UI.

Enhances the use of templatetag for VM deployment and template creation

Adds option to list service/compute offerings that can be used with a given template. A new parameter named templateid has been added.

Adds option to list disk offering with suitability flag for a virtual machine. A new parameter named virtualmachineid has been added to the listDiskOfferings API which when passed returns suitableforvirtualmachine param in the response.
2024-02-19 14:17:34 +05:30
João Jandre
49cecaed06
Normalize loggers and upgrade log4j 1.2 to log4j 2.19 (#7131)
* Normalize logs

All classes that could have their loggers inherited from their fathers had their own loggers deleted;
Most loggers didn't have to be static, so most of them were normalized so that they wouldn't be;
All loggers are protected now;
Static logger's name are now 'LOGGER';
Non-static logger's name are now 'logger';
New class DbUpgradeAbstractImpl created so that all Upgraders extend it and inherit its logger

* Upgrade log4j

* fix errors caused by the merge

* Refactor cglibThrowableRenderer functionality to log4j2 and upgrade the last configuration files

* fix sonarcloud bug

* Fix errors caused by merge, remove some unused loggers, and rename a variable that was mistakenly renamed on the normalization commit

* Readd snmpTrapAppender, remove TestAppender

* Regenerate changes

* regenerate changes

* refactor last custom appender

* fix systemvm configuration xml

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Fix utils pom

* fix some tests

* regenerate changes

* Fix jar being printed on exception

* fix logging in system VMs, fix commands not having log4j2 classpath.

* regenerate changes

* Fix some unwanted renomeations

* fix end of file

* regenerate changes

* regenerate changes

* fix merge error

* regenerate changes

* fix tests

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* readd reload4j to tungsten as juniper depends on it

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* re-add reload4j dependency to network-contrail, as juniper depends on it

* regenerate changes

* regenerate changes

* regenerate changes

* fix typo

* regenerate changes

* regenerate changes

* Fix end of files

* regenerate changes

* add logj42 to cloud-utils-SHADED.jar

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* fix some tests

* Regenerate changes

* Regenerate changes

* fix test

* Regenerate changes

* Regenerate changes
2024-02-08 09:55:41 -03:00
Abhishek Kumar
7dffbc6e47 Updating pom.xml version numbers for release 4.20.0.0-SNAPSHOT
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-02-02 18:16:37 +05:30
Abhishek Kumar
a7b97ff3b0 Updating pom.xml version numbers for release 4.19.1.0-SNAPSHOT
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-02-02 18:06:04 +05:30
Abhishek Kumar
2746225b99 Updating pom.xml version numbers for release 4.19.0.0
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2024-01-29 10:21:52 +05:30
Sina Kashipazha
2993c99363
Add missing hosts info to the prometheus exporter output. (#8328)
Sometimes the hostStats object of the agents becomes null in the management server. It is a rare situation, and we haven't found the root cause yet, but it occurs occasionally in our CloudStack deployments with many hosts.

The hostStat is null, even though the agent is UP and hosting multiple VMs. It is possible to access the VM consoles and execute tasks on them.

This pull request doesn't address the issue directly; rather it displays those hosts in Prometheus so we can restart the agent and get the necessary information.
2023-12-08 19:51:06 +05:30
João Jandre
26b01f6f3b
Flexible tags for hosts and storage pools (#7489)
Co-authored-by: João Jandre <joao@scclouds.com.br>
2023-11-30 09:36:47 +01:00
Daan Hoogland
98d643efe6 Merge release branch 4.18 to main
* 4.18:
  Fixed spelling and added missing states to response (#8248)
  Let Prometheus exporter plugin support utf8 characters (#8228)
2023-11-18 18:41:31 +01:00
dahn
1a2dbebe48
Let Prometheus exporter plugin support utf8 characters (#8228) 2023-11-15 09:48:11 +01:00
Daan Hoogland
72cf9740f9 Merge branch '4.18' 2023-10-06 13:50:29 +02:00
Ben
a20ab40b67
Ensure getCapacityState() is not called for hosts in maintenance (#8025) 2023-10-06 09:49:57 +02:00
Wei Zhou
246bb24b0f Updating pom.xml version numbers for release 4.18.2.0-SNAPSHOT
Signed-off-by: Wei Zhou <weizhou@apache.org>
2023-09-12 17:26:53 +02:00
Wei Zhou
4bdff06acd Updating pom.xml version numbers for release 4.18.1.0
Signed-off-by: Wei Zhou <weizhou@apache.org>
2023-09-07 08:50:50 +02:00
Rohit Yadav
ac882f3d07 Merge remote-tracking branch 'origin/4.18' 2023-08-08 15:56:19 +05:30
Sina Kashipazha
9df23f951b
Prometheus exporter fix cpu/memory usage labels (#7629) 2023-08-07 20:47:03 +02:00
Ben
3e8c0684ed
Prometheus: Ensure tagged hosts maintenance status is reported consistently (#7471)
When a host is not tagged, its maintenance status is reported in the
cloudstack_hosts_total metric: maintenance_enabled is OFFLINE,
maintenance_disabled is ONLINE.

When a host is tagged, its maintenance status is now also verified to
ensure consistent behaviour.

In prometheus exporter, maintenance status for cloudstack_hosts_total_by_tag is not checked. While it is checked for cloudstack_hosts_total metric.
Classified by_tag or not, metrics should be the same.

Fixes: #7470
2023-05-23 11:14:43 +05:30
Daan Hoogland
fb4f6a334d Updating pom.xml version numbers for release 4.19.0.0-SNAPSHOT
Signed-off-by: Daan Hoogland <daan@onecht.net>
2023-03-15 19:46:01 +01:00
Daan Hoogland
05cda2729f Updating pom.xml version numbers for release 4.18.1.0-SNAPSHOT
Signed-off-by: Daan Hoogland <daan@onecht.net>
2023-03-15 19:38:14 +01:00
Daan Hoogland
0574087284 Updating pom.xml version numbers for release 4.18.0.0
Signed-off-by: Daan Hoogland <daan@onecht.net>
2023-03-11 09:35:41 +01:00
Suresh Kumar Anaparti
d8c7e34b38
Improve global settings UI to be more intuitive/logical (#5797)
Co-authored-by: Suresh Kumar Anaparti <suresh.anaparti@shapeblue.com>
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
Co-authored-by: davidjumani <dj.davidjumani1994@gmail.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
Co-authored-by: dahn <daan@onecht.net>
2023-01-31 11:23:43 +01:00
John Bampton
d74f64a2e1
Use lowercase HTTP header field names so we are compatible with HTTP/2 (#7006) 2023-01-23 11:17:54 +01:00
Ben
ffccfc6172
Ensure Prometheus doesn't return values when the capacity_state is Disabled (#7007)
Co-authored-by: b-navaro <b.navaro@ca.leaseweb.com>
2022-12-28 17:48:51 +01:00
Vladimir Dombrovski
cc676cbc83
Metrics plugin: expose full domain path instead of name (#6959)
Co-authored-by: Vladimir DOMBROVSKI <vladimir.dombrovski@bso.co>
2022-12-20 12:43:55 +01:00
Sina Kashipazha
4e2f461b31
Prometheus exporter enhancement (#4438)
* Export count of total/up/down hosts by tags

* Export count of vms by state and host tag.

* Add host tags to host cpu/cores/memory usage in Prometheus exporter

* Cloudstack Prometheus exporter: Add allocated capacity group by host tag.

* Show count of Active domains on grafana.

* Show count of Active accounts and vms by size on grafana

* Use prepared statement to query database for a number of VM who use a specific tag.

* Extract repeated codes to new methods.
2022-09-30 17:02:01 +02:00
nvazquez
0bcc609f05
Updating pom.xml version numbers for release 4.18.0.0-SNAPSHOT
Signed-off-by: nvazquez <nicovazquez90@gmail.com>
2022-06-06 12:25:35 -03:00
nvazquez
038a669d6b
Updating pom.xml version numbers for release 4.17.1.0-SNAPSHOT
Signed-off-by: nvazquez <nicovazquez90@gmail.com>
2022-06-06 12:19:44 -03:00
nvazquez
c56220fcf2
Updating pom.xml version numbers for release 4.17.0.0
Signed-off-by: nvazquez <nicovazquez90@gmail.com>
2022-05-31 14:33:47 -03:00
Daniel Augusto Veronezi Salvador
b4aabadc4d
Replace string libraries with org.apache.commons.lang3.StringUtils (#5386)
* Replace google lib for lang3 and adjust methods calls

* Replace string libs by lang3

* Prohibit others string libs

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-11-18 13:41:48 +05:30
nicolas
3f79436840
Updating pom.xml version numbers for release 4.17.0.0-SNAPSHOT
Signed-off-by: nicolas <nicovazquez90@gmail.com>
2021-11-09 22:55:52 -03:00
nicolas
93c3c3b9ac
Updating pom.xml version numbers for release 4.16.1.0-SNAPSHOT
Signed-off-by: nicolas <nicovazquez90@gmail.com>
2021-11-09 22:50:22 -03:00
nicolas
44c08b5acc
Updating pom.xml version numbers for release 4.16.0.0
Signed-off-by: nicolas <nicovazquez90@gmail.com>
2021-11-04 14:14:57 -03:00
Daan Hoogland
e26202f23e Updating pom.xml version numbers for release 4.16.0.0-SNAPSHOT
Signed-off-by: Daan Hoogland <dahn@onecht.net>
2021-01-04 11:32:10 +00:00
Daan Hoogland
01b3e361c7 Updating pom.xml version numbers for release 4.15.0.0
Signed-off-by: Daan Hoogland <dahn@onecht.net>
2020-12-23 16:32:25 +00:00
Rohit Yadav
d6db47618d Merge remote-tracking branch 'origin/4.14' 2020-10-14 16:06:57 +05:30
Wei Zhou
55f07030cb
plugins: Host is counted twice if it has multiple host tags in Prometheus exporter (#4383)
* Hosts are counted twice if it has multiple host tags in Prometheus exporter

* Import HostVO and inject HostDao
2020-10-14 15:51:15 +05:30
Rakesh
2333d97098
plugins: Consider maintenance mode as offline for promethues stats (#4366)
If the resource state of hypervisor in "Maintenance" then it
should be considered as offline even though the agent state
is "Up". Since its in maintenance mode, it cant be used to
allocate VM's and hence can't be considered towards resource
allocation
2020-10-14 15:42:02 +05:30
Rakesh
191dbf7ea7
plugins: Export dedicated host stats to prometheus (#4365)
We should have the metrics for the hosts which are dedicated to certain domains.
We should also be able to see cpu/memory/storage currently used per domain

> How Has This Been Tested?
Enable prometheus server
Add 127.0.0.1 as allowed Ip so that you can fetch metrics from prometheus

Now fetch the endpoint
# http http://127.0.0.1:9595/metrics | grep cloudstack_host_is_dedicated
cloudstack_host_is_dedicated{zone="mgt122-10",hostname="node11",ip="10.13.122.11"} 1
# http http://127.0.0.1:9595/metrics | grep cloudstack_host_dedicated_to_account
cloudstack_host_dedicated_to_account{zone="mgt122-10",hostname="node11",ip="10.13.122.11"} 1
2020-10-14 15:41:10 +05:30
Pearl Dsilva
b464fe41c6
server: Secondary Storage Usage Improvements (#4053)
This feature enables the following:
Balanced migration of data objects from source Image store to destination Image store(s)
Complete migration of data
setting an image store to read-only
viewing download progress of templates across all data stores
Related Primate PR: apache/cloudstack-primate#326
2020-09-17 10:12:10 +05:30
Filippo Projetto
e8fe35bd59
plugin: Set prometheus.exporter.enable as not dynamic (#4174)
Fixes #4173
2020-07-07 12:44:48 +05:30
andrijapanicsb
5f926c3353 Updating pom.xml version numbers for release 4.15.0.0-SNAPSHOT
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-05-23 10:18:39 +01:00
andrijapanicsb
05e9b11694 Updating pom.xml version numbers for release 4.14.1.0-SNAPSHOT
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-05-23 09:59:32 +01:00
andrijapanicsb
6f96b3b2b3 Updating pom.xml version numbers for release 4.14.0.0
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-05-11 15:03:14 +01:00