63 Commits

Author SHA1 Message Date
Suresh Kumar Anaparti
12513e18fa
server: Update gson date format for serializing/deserializing Date in MS stats (#11506)
* Update gson date format for serializing/deserializing Date in MS stats (across multiple management servers)

* review

* review comments, and unit tests

* added unit test with different date format

* Use separate Gson for MS stats serialization/deserialization
2025-09-22 12:22:50 +02:00
Vishesh
9167cd3b72
server: use /prod/stat to get uptime instead of the uptime command (#11670) 2025-09-19 14:08:12 +02:00
Suresh Kumar Anaparti
6d16ac2113
ScaleIO/PowerFlex smoke tests improvements, and some fixes (#11554)
* ScaleIO/PowerFlex smoke tests improvements, and some fixes

* Fix test_volumes.py, encrypted volume size check (for powerflex volumes)

* Fix test_over_provisioning.py (over provisioning supported for powerflex)

* Update vm snapshot tests

* Update volume size delta in primary storage resource count for user vm volumes only
The VR volumes resource count for PowerFlex volumes is updated here, resulting in resource count discrepancy
(which is re-calculated through ResourceCountCheckTask later, and skips the VR volumes)

* Fix test_import_unmanage_volumes.py (unsupported for powerflex)

* Fix test_sharedfs_lifecycle.py (volume size check for powerflex)

* Update powerflex.connect.on.demand config default to true
2025-09-12 16:17:20 +02:00
Lucas Martins
54c1f92efd
Fix Stats Collector to not divide by zero (#10492)
* Set loadHistory value to zero when interval is zero to not throw arithmatic exception

* Change loadHistory value to -1 and fix maxsize bug

---------

Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
2025-03-10 11:39:08 -04:00
Abhishek Kumar
0b5a5e8043
api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand

    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs

- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`

- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine

- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.

- Added caching for account/use role API access with expiration after write set to 60 seconds.

- Added caching for some recurring DB retrievals

    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands

- Optimized MS list retrieval for agent connect

- Optimize finding ready systemvm template for zone

- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks

- Changes in agent-agentmanager connection with NIO client-server classes

    1. Optimized the use of the executor service
    2. Refactore Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
    6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.

- Improvements in StatsCollection - minimize DB retrievals.

- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.

- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.

- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>

* test1, domaindetails, capacitymanager fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* test2 - agent tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* capacitymanagertest fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* change

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix missing changes

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert marvin/setup.py

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix indent

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* use space in sql

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address duplicate

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* update host logs

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* revert e36c6a5d07

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix npe in capacity calculation

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* move schema changes to 4.20.1 upgrade

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* address comments

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* fix build

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add some more tests

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* checkstyle fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* remove unnecessary mocks

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* build fix

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* replace statics

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* engine/orchestration,utils: limit number of concurrent new agent
connections

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* refactor - remove unused

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* unregister closed connections, monitor & cleanup

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* add check for outdated vm filter in power sync

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

* agent: synchronize sendRequest wait

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>

---------

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2025-02-01 12:28:41 +05:30
Abhishek Kumar
bd488c4bba
server, plugin: enhance storage stats for IOPS (#10034)
Adds framework layer change to allow retrieving and storing IOPS stats for storage pools. Custom `PrimaryStoreDriver` can implement method - `getStorageIopsStats` for returning IOPS stats. Existing method `getUsedIops` can also be overridden by such plugins when only used IOPS is returned.
For testing purpose, implementation has been added for simulator hypervisor plugin to return capacity and used IOPS for a pool.
For local storage pool, implementation has been added using iostat to return currently used IOPS.
StoragePoolResponse class has been updated to return IOPS values which allows showing IOPS values in UI for different storage pool related views and APIs.

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2025-01-07 17:17:12 +05:30
Vishesh
a4224e58cc
Improve logging to include more identifiable information (#9873)
* Improve logging to include more identifiable information for kvm plugin

* Update logging for scaleio plugin

* Improve logging to include more identifiable information for default volume storage plugin

* Improve logging to include more identifiable information for agent managers

* Improve logging to include more identifiable information for Listeners

* Replace ids with objects or uuids


* Improve logging to include more identifiable information for engine

* Improve logging to include more identifiable information for server

* Fixups in engine

* Improve logging to include more identifiable information for plugins

* Improve logging to include more identifiable information for Cmd classes

* Fix toString method for StorageFilterTO.java
2025-01-06 16:42:37 +05:30
Wei Zhou
34056d956c
Improvement: management server peer states (#9885)
* Improvement: management server peer states

* Update pr9885: consider new mgmt server node which has msId=managementServerNodeId

* Update pr9885: update global config description

* Update pr9885: update label on UI

* framework: Do not update mshost_peer when mgmt server is Up as it will be updated by status update

* mgmt: Update state to Up when mgmt server writes heartbeat to db

* mgmt: change Service IP to Management IP

---------

Co-authored-by: Boris Stoyanov - a.k.a Bobby <bss.stoyanov@gmail.com>
2024-12-02 10:26:20 +05:30
Lucas Martins
817251f1f8
Enhancement in the accuracy of the logs regarding the capacity, usage, and threshold of secondary storages (#9043)
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
2024-07-24 15:30:40 +02:00
Vishesh
90fe1d5fdc
Merge branch '4.19' 2024-06-29 03:35:24 +05:30
Vishesh
a4e9d7f21a
Change vm.stats.remove.batch.size to delete.batch.query.size & allow delete of volume_stats in batches (#9283)
* Change vm.stats.remove.batch.size to delete.batch.query.size

* Add support for deletion of volume stats in batches

* Update server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java

Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>

* Update server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java

Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>

* Update configkey description

* Address comments

Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
2024-06-28 15:32:49 +05:30
Vishesh
3923f80c22
Merge branch '4.19' 2024-06-25 18:53:57 +05:30
João Jandre
ae3fa5d0de
Add configuration to limit the number of rows deleted from vm_stats (#8740)
Co-authored-by: João Jandre <joao@scclouds.com.br>
2024-06-20 14:26:36 +02:00
Daan Hoogland
373f017002 Merge branch '4.19' 2024-06-18 19:58:43 +02:00
Daan Hoogland
050ee44137 Merge branch '4.18' into 4.19 2024-06-18 16:05:45 +02:00
dahn
7c5b7ca077
Extra parameter for UpdateImageStore (#8941)
* Extra parameter for UpdateImageStore

* add name parameter

* ui

* cleanup

* update DB from storage stats results
2024-06-18 12:31:17 +05:30
Daan Hoogland
cb9b3134f7 Merge branch '4.19' 2024-06-14 10:30:10 +02:00
Rohit Yadav
2ca0857bd5
api: listVM API improvement followup, change returning of stats detail (#9177)
- Changes behaviour of details param handling via global setting:
  - listVirtualMachines API: when the details param is not provided, it returns whether stats are returned controlled by a new global setting `list.vm.default.details.stats`
  - listVirtualMachinesMetrics API: when the details param is not provided, it uses `all` details including `stats`
- Users who are affected slow performance of the listVirtualMachines API response time can set `list.vm.default.details.stats` to `false`
- Remove ConfigKey vm.stats.increment.metrics.in.memory which was renamed to `vm.stats.increment.metrics` in #5984 and also remove unused/unnecessary global settings via upgrade path
- Changes default value of VM stats accumulation setting `vm.stats.increment.metrics` to false until a better solution emerges. Since #5984, this is true and during the execution of listVM APIs the stats are clubbed/calculated which can immensely slow down list VM API calls. Any costly operations such as summing of stats shouldn't be done during the course of a synchronous API, such as the list VM API.
- Fix UI that uses listVirtualMachinesMetrics to not call `stats` detail when in list view without metrics selected.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2024-06-14 11:03:14 +05:30
Abhishek Kumar
b29ec2bf12 Merge remote-tracking branch 'apache/4.19' 2024-03-01 17:40:58 +05:30
Suresh Kumar Anaparti
813d53d031
Sync the pool stats in DB with the actual stats from stats collector (#8686) 2024-02-29 15:26:32 +05:30
João Jandre
49cecaed06
Normalize loggers and upgrade log4j 1.2 to log4j 2.19 (#7131)
* Normalize logs

All classes that could have their loggers inherited from their fathers had their own loggers deleted;
Most loggers didn't have to be static, so most of them were normalized so that they wouldn't be;
All loggers are protected now;
Static logger's name are now 'LOGGER';
Non-static logger's name are now 'logger';
New class DbUpgradeAbstractImpl created so that all Upgraders extend it and inherit its logger

* Upgrade log4j

* fix errors caused by the merge

* Refactor cglibThrowableRenderer functionality to log4j2 and upgrade the last configuration files

* fix sonarcloud bug

* Fix errors caused by merge, remove some unused loggers, and rename a variable that was mistakenly renamed on the normalization commit

* Readd snmpTrapAppender, remove TestAppender

* Regenerate changes

* regenerate changes

* refactor last custom appender

* fix systemvm configuration xml

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Fix utils pom

* fix some tests

* regenerate changes

* Fix jar being printed on exception

* fix logging in system VMs, fix commands not having log4j2 classpath.

* regenerate changes

* Fix some unwanted renomeations

* fix end of file

* regenerate changes

* regenerate changes

* fix merge error

* regenerate changes

* fix tests

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* readd reload4j to tungsten as juniper depends on it

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* re-add reload4j dependency to network-contrail, as juniper depends on it

* regenerate changes

* regenerate changes

* regenerate changes

* fix typo

* regenerate changes

* regenerate changes

* Fix end of files

* regenerate changes

* add logj42 to cloud-utils-SHADED.jar

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* fix some tests

* Regenerate changes

* Regenerate changes

* fix test

* Regenerate changes

* Regenerate changes
2024-02-08 09:55:41 -03:00
Rohit Yadav
6d916cad34 Merge remote-tracking branch 'origin/4.18' 2023-12-21 13:18:51 +05:30
Rohit Yadav
969e094419
server: improve stats collector logs to state what the collector does (#8387)
This simply improves the log statement that prints debug statements
during beginning of a stats collector run for hosts or VMs.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-12-21 13:06:32 +05:30
Rene Glover
1031c31e6a
FiberChannel Multipath for KVM + Pure Flash Array and HPE-Primera Support (#7889)
This PR provides a new primary storage volume type called "FiberChannel" that allows access to volumes connected to hosts over fiber channel connections. It requires Multipath to provide path discovery and failover. Second, the PR adds an AdaptivePrimaryDatastoreProvider that abstracts how volumes are managed/orchestrated from the connector to communicate with the primary storage provider, using a ProviderAdapter interface, allowing the code interacting with the primary storage provider API's to be simpler and have no direct dependencies on Cloudstack code. Lastly, the PR provides an implementation of the ProviderAdapter classes for the HP Enterprise Primera line of storage solutions and the Pure Flash Array line of storage solutions.
2023-12-09 11:31:33 +05:30
Rohit Yadav
bde80f14aa
Fix NPE in management server logs due to /proc/cpuinfo output (#7765)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-07-25 04:43:33 +02:00
Abhishek Kumar
028ca74fb6
ui,server,api: resource metrics improvements (#6803)
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2023-01-30 09:48:03 +01:00
Daan Hoogland
46924a5782 Merge release branch 4.17 to main
* 4.17:
  Use long instead of int in DB statistics for Queries and Uptime. (#7125)
  VR: fix public-key is missing in VR after acquiring public IP (#7103)
2023-01-26 09:59:36 +01:00
dahn
f39b02aec7
Use long instead of int in DB statistics for Queries and Uptime. (#7125)
Co-authored-by: Wei Zhou <weizhou@apache.org>
2023-01-26 09:53:36 +01:00
Wei Zhou
a63b2aba7a
VM Autoscaling with virtual router (#6571) 2022-12-05 15:23:03 +01:00
HuangWei
1ead6c1bac
Fix logic check error for update GPU groupDetails (#6405) 2022-05-24 10:00:48 -03:00
dahn
545e89c1cb
Mshost stats (#5588)
* ms stats thread added

* initial data collection for management server

* empty list management server metrics command

* bean copy into MS metrics object

* ms status VO

* further API and DB plumbing

* minimal metrics response in API

* remove commented, refactor data collection plumbing

* javadocs

* surpress stacktrace on expected error

* update status experiment

* ms status publish framework added

* review comment addressed

* static data to DB and API, /proc/ reading

* addressing review comments

* ui for ms details

* small ui adjustment

* beanCopy

* agentcount response and system parameter

* labels

* package-lock

* add version strings to regular list API

* add shutdown time to DB

* add last start and last stop to regular list response

* distro info in regular response/session  count added

* metrics as details

* add heap used and remove details map

* thread-statusses

* move db upgrade to 4.17

* sysmem

* procmem

* ui demo comments applied

* javadoc

* get conf and log file locations

* loginfo

* cpuLoadStats

* no.remote

* extra spaces removed

* clusterlistener

* add unit to kb value

* revert accidental rename

* silly fqcn removed

* get mem info from bean is possible

* refactor long sequence for readability

* registerListener

* listUsageMetrics and isDbLocal

* rats

* local usage and db or not

* minimal listDbMetrics

* db vars and stats

* cleanup and #queries queried

* db stats calculation

* rat

* remove list response wrapper from sinlge details-lists responses

* rudimentary metrics view

* metrics table cleanup

* table makeup, collection dates

* move component to appropriate location

* capitalisation removed

* rebase error resolved

* rename deamon to daemon

* small style comments applied

* another merge issue

* naming comments and boot time

* stop/start prefixed with server

* layout-fix

* listMSMetrics test and test refactor

* usage metrics test

* db metrics test

* extra validations

* Update ui/public/locales/en.json

Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>

* descriptions of loadaverages and replica's

* collection time on top

* cpu load on metrics overview

* DbStatsCollection

* some parameter description texts

* labels adjusted

* new output 'kernelversion' and log info cleanup

* labels

* Update api/src/main/java/com/cloud/server/ManagementServerHostStats.java

Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/response/DbMetricsResponse.java

Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/dao/ManagementServerHostDao.java

Co-authored-by: Rodrigo D. Lopez <19981369+RodrigoDLopez@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java

Co-authored-by: Rodrigo D. Lopez <19981369+RodrigoDLopez@users.noreply.github.com>

* Update api/src/main/java/org/apache/cloudstack/api/response/ManagementServerResponse.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update api/src/main/java/org/apache/cloudstack/api/response/ManagementServerResponse.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update engine/schema/src/main/java/com/cloud/host/dao/HostDao.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update server/src/main/java/com/cloud/server/StatsCollector.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update framework/cluster/src/main/java/com/cloud/cluster/dao/ManagementServerHostDao.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update server/src/main/java/com/cloud/server/StatsCollector.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update server/src/main/java/com/cloud/server/StatsCollector.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update server/src/main/java/com/cloud/server/StatsCollector.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java

* some (more) refactorring suggestions applied

* human readable memory sizes

* rat

* actual collection time instead of query time, improved descriptions

* merge errors fixed

* optional metric values

* javadoc and logging

* names of jmx vars have changed

* vue3-compatibility

* new output parameter type

* lower retention default

* vue3 fixes

* polish comments

* polish comments 2, the reckoning

* note on usage servers

* merge conflict errors

* pollish

* conditional assertion to deal with simulator restart

Co-authored-by: Daan Hoogland <dahn@onecht.net>
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
Co-authored-by: Rodrigo D. Lopez <19981369+RodrigoDLopez@users.noreply.github.com>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2022-04-22 08:48:19 -03:00
José Flauzino
16f2896940
Persistence of VM stats (#5984)
* Add persistence of VM stats

* Fix API 'since' attribute

* Add license

* Address GutoVeronezi's reviews

* Fix the order of VM stats in the API response

* Fix msid in VM stats data

* Fix disk stats and add minor improvements

* Add log message

* Build string using ReflectionToStringBuilderUtils

* Rerun checks

Co-authored-by: joseflauzino <jose@scclouds.com.br>
2022-04-11 10:42:21 -03:00
José Flauzino
28385be609
Fix metrics stats for VMs not running (#5633)
* Fix metrics stats for VMs that are not running

* Improves the way to get vmIdsToRemoveStats

* Improves test

Co-authored-by: José Flauzino <jose@scclouds.com.br>
2021-12-06 11:06:10 -03:00
Daniel Augusto Veronezi Salvador
b4aabadc4d
Replace string libraries with org.apache.commons.lang3.StringUtils (#5386)
* Replace google lib for lang3 and adjust methods calls

* Replace string libs by lang3

* Prohibit others string libs

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-11-18 13:41:48 +05:30
sureshanaparti
0a88e710b2
Check the pool used space from the bytes used in the storage pool stats collector, for non-default primary storage pools that cannot provide stats. (#5586)
* Check the pool used space from the bytes used in the storage pool stats collector, for  non-default primary storage pools that cannot provide stats.
Also, Update the used bytes from the pool stats answer for non-default primary storage pools if the pool can provide stats.

* Update server/src/main/java/com/cloud/storage/StorageManagerImpl.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>

* space fix

Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
2021-10-25 08:23:07 -03:00
SadiJr
0a73f5162d
Externalize config to increment or not VM metrics in memory (#5351)
Co-authored-by: SadiJr <17a0db2854@firemailbox.club>
2021-08-24 14:16:58 -03:00
Daniel Augusto Veronezi Salvador
cbe380a068
Externalize secondary storage capacity threshold (#4790)
* Externalize secondary storage capacity threshold

* Use default value as threshold when config value is lower than 0.0

* Move config to CapacityManager

* Validate config in CapacityManagerImpl

* Use config in StorageOrchestrator

* Change config description

* Remove unused import

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-16 08:38:36 +02:00
Rohit Yadav
e107f9aa93 Merge remote-tracking branch 'origin/4.15' 2021-04-21 13:07:44 +05:30
Rohit Yadav
5051fde952
server: Stat collector solidfire capacity fix (#4918)
Fixes regression introduced in 71c5dbcf492a023dbea5f8c34f8fd883c3ad653f
which would cause capacity bytes of certain pools to be update which
shouldn't get updated by StatsCollector such as solidfire.

Fixes #4911

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2021-04-21 12:48:11 +05:30
sureshanaparti
eba186aa40
storage: New Dell EMC PowerFlex Plugin (formerly ScaleIO, VxFlexOS) (#4304)
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ

Documentation PR: apache/cloudstack-documentation#169

This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack

Other improvements addressed in addition to PowerFlex/ScaleIO support:

- Added support for config drives in host cache for KVM
	=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
	=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
	=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
	=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
	=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)

- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates

- Updated full deployment destination for preparing the network(s) on VM start

- Propagate the direct download certificates uploaded to the newly added KVM hosts

- Discover the template size for direct download templates using any available host from the zones specified on template registration
	=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones

- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)

- Retry VM deployment/start when the host cannot grant access to volume/template

- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
	=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore

- Check the router filesystem is writable or not, before performing health checks
	=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
	=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
	=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not

- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)

* Addressed some issues for few operations on PowerFlex storage pool.

- Updated migration volume operation to sync the status and wait for migration to complete.

- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.

- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).

- Updated resize volume error message string.

- Blocked the below operations on PowerFlex storage pool:
  -> Extract Volume
  -> Create Snapshot for VMSnapshot

* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.

- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
  Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html

Other fixes included:

- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)

- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.

- Perform basic tests (for connectivity and file system) on router before updating the health check config data
	=> Validate the basic tests (connectivity and file system check) on router
	=> Cleanup the health check results when router is destroyed

* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0

* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool

and Minor improvements in PowerFlex/ScaleIO storage plugin code

* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.

- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.

* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py

* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)

* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.

* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration

* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure

* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
2021-02-24 14:58:33 +05:30
Rohit Yadav
6bde1384ff Merge remote-tracking branch 'origin/4.14' into 4.15
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2021-02-05 16:01:01 +05:30
Wei Zhou
4de6ac3c05
server: Get vm network/disk statistics and update database per host (#4601)
* server: Get vm network/disk statistics and update database per host

* #4601 : modify debug message
2021-02-04 14:44:47 +05:30
Daan Hoogland
3ab43e2edb Merge branch '4.14' 2020-10-30 15:58:12 +00:00
Daan Hoogland
46ead2df71 Merge branch '4.13' into 4.14 2020-10-30 15:54:26 +00:00
slavkap
8afb451c1c
fix NPE in volumes statistics (#4388) 2020-10-30 15:53:05 +00:00
Rakesh
71c5dbcf49
server: Update use_bytes of storage pools (#4360)
Update the used_bytes for all default primary storage pools
Also get used_bytes of storage pool from database instead of
memory
2020-10-21 19:18:03 +02:00
Rohit Yadav
c7328652fd Merge remote-tracking branch 'origin/4.14' 2020-09-01 16:02:33 +05:30
Rohit Yadav
578d29e166 Merge remote-tracking branch 'origin/4.13' into 4.14
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-09-01 16:01:52 +05:30
Gabriel Beims Bräscher
5c29d5ba45
influxdb: Avoid out of memory by influxDB (#4291)
After a few hours running with InfluxDB configured, CloudStack hangs due to OutOfMemoryException raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):

2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
        ...
        at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
        at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
        at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
        at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
        at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
        at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
Context on InfluxDB Batch: Enabling batch on InfluxDB is great and speeds writing but it requires caution to avoid Zombie threads.

Solution: This happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to add: influxDB.close().
2020-09-01 15:59:43 +05:30
Spaceman1984
d57aa83517
server: Added nfs minor version support (#4180)
This PR adds minor version support when mounting nfs on the SSVM as requested in #2861

The global setting "secstorage.nfs.version" has been changed to use the String data type which allows any minor version to be specified.
2020-08-19 14:53:38 +05:30