* Update gson date format for serializing/deserializing Date in MS stats (across multiple management servers)
* review
* review comments, and unit tests
* added unit test with different date format
* Use separate Gson for MS stats serialization/deserialization
* ScaleIO/PowerFlex smoke tests improvements, and some fixes
* Fix test_volumes.py, encrypted volume size check (for powerflex volumes)
* Fix test_over_provisioning.py (over provisioning supported for powerflex)
* Update vm snapshot tests
* Update volume size delta in primary storage resource count for user vm volumes only
The VR volumes resource count for PowerFlex volumes is updated here, resulting in resource count discrepancy
(which is re-calculated through ResourceCountCheckTask later, and skips the VR volumes)
* Fix test_import_unmanage_volumes.py (unsupported for powerflex)
* Fix test_sharedfs_lifecycle.py (volume size check for powerflex)
* Update powerflex.connect.on.demand config default to true
* Set loadHistory value to zero when interval is zero to not throw arithmatic exception
* Change loadHistory value to -1 and fix maxsize bug
---------
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
* api,agent,server,engine-schema: scalability improvements
Following changes and improvements have been added:
- Improvements in handling of PingRoutingCommand
1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
3. Optimized scanning stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/use role API access with expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/use role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
1. CapacityManager - listing service offerings - beneficial in host capacity calculation
2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
3. DownloadListener - hypervisors for zone - beneficial for host joins
5. VirtualMachineManagerImpl - VMs in progress- beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimize finding ready systemvm template for zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
1. Optimized the use of the executor service
2. Refactore Agent class to better handle connections.
3. Do SSL handshakes within worker threads
5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control number of new connections management server handles at a time. `agent.ssl.handshake.timeout` can be used to set number of seconds after which SSL handshake times out at MS end.
6. On agent side backoff and sslhandshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* test1, domaindetails, capacitymanager fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* test2 - agent tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* capacitymanagertest fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* change
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix missing changes
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert marvin/setup.py
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix indent
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* use space in sql
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address duplicate
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* update host logs
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert e36c6a5d07
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix npe in capacity calculation
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* move schema changes to 4.20.1 upgrade
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix build
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add some more tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* checkstyle fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* remove unnecessary mocks
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* replace statics
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* engine/orchestration,utils: limit number of concurrent new agent
connections
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* refactor - remove unused
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* unregister closed connections, monitor & cleanup
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add check for outdated vm filter in power sync
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* agent: synchronize sendRequest wait
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Adds framework layer change to allow retrieving and storing IOPS stats for storage pools. Custom `PrimaryStoreDriver` can implement method - `getStorageIopsStats` for returning IOPS stats. Existing method `getUsedIops` can also be overridden by such plugins when only used IOPS is returned.
For testing purpose, implementation has been added for simulator hypervisor plugin to return capacity and used IOPS for a pool.
For local storage pool, implementation has been added using iostat to return currently used IOPS.
StoragePoolResponse class has been updated to return IOPS values which allows showing IOPS values in UI for different storage pool related views and APIs.
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Improve logging to include more identifiable information for kvm plugin
* Update logging for scaleio plugin
* Improve logging to include more identifiable information for default volume storage plugin
* Improve logging to include more identifiable information for agent managers
* Improve logging to include more identifiable information for Listeners
* Replace ids with objects or uuids
* Improve logging to include more identifiable information for engine
* Improve logging to include more identifiable information for server
* Fixups in engine
* Improve logging to include more identifiable information for plugins
* Improve logging to include more identifiable information for Cmd classes
* Fix toString method for StorageFilterTO.java
* Improvement: management server peer states
* Update pr9885: consider new mgmt server node which has msId=managementServerNodeId
* Update pr9885: update global config description
* Update pr9885: update label on UI
* framework: Do not update mshost_peer when mgmt server is Up as it will be updated by status update
* mgmt: Update state to Up when mgmt server writes heartbeat to db
* mgmt: change Service IP to Management IP
---------
Co-authored-by: Boris Stoyanov - a.k.a Bobby <bss.stoyanov@gmail.com>
- Changes behaviour of details param handling via global setting:
- listVirtualMachines API: when the details param is not provided, it returns whether stats are returned controlled by a new global setting `list.vm.default.details.stats`
- listVirtualMachinesMetrics API: when the details param is not provided, it uses `all` details including `stats`
- Users who are affected slow performance of the listVirtualMachines API response time can set `list.vm.default.details.stats` to `false`
- Remove ConfigKey vm.stats.increment.metrics.in.memory which was renamed to `vm.stats.increment.metrics` in #5984 and also remove unused/unnecessary global settings via upgrade path
- Changes default value of VM stats accumulation setting `vm.stats.increment.metrics` to false until a better solution emerges. Since #5984, this is true and during the execution of listVM APIs the stats are clubbed/calculated which can immensely slow down list VM API calls. Any costly operations such as summing of stats shouldn't be done during the course of a synchronous API, such as the list VM API.
- Fix UI that uses listVirtualMachinesMetrics to not call `stats` detail when in list view without metrics selected.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Normalize logs
All classes that could have their loggers inherited from their fathers had their own loggers deleted;
Most loggers didn't have to be static, so most of them were normalized so that they wouldn't be;
All loggers are protected now;
Static logger's name are now 'LOGGER';
Non-static logger's name are now 'logger';
New class DbUpgradeAbstractImpl created so that all Upgraders extend it and inherit its logger
* Upgrade log4j
* fix errors caused by the merge
* Refactor cglibThrowableRenderer functionality to log4j2 and upgrade the last configuration files
* fix sonarcloud bug
* Fix errors caused by merge, remove some unused loggers, and rename a variable that was mistakenly renamed on the normalization commit
* Readd snmpTrapAppender, remove TestAppender
* Regenerate changes
* regenerate changes
* refactor last custom appender
* fix systemvm configuration xml
* Regenerate changes
* Regenerate changes
* regenerate changes
* Regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* Fix utils pom
* fix some tests
* regenerate changes
* Fix jar being printed on exception
* fix logging in system VMs, fix commands not having log4j2 classpath.
* regenerate changes
* Fix some unwanted renomeations
* fix end of file
* regenerate changes
* regenerate changes
* fix merge error
* regenerate changes
* fix tests
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* readd reload4j to tungsten as juniper depends on it
* Regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* re-add reload4j dependency to network-contrail, as juniper depends on it
* regenerate changes
* regenerate changes
* regenerate changes
* fix typo
* regenerate changes
* regenerate changes
* Fix end of files
* regenerate changes
* add logj42 to cloud-utils-SHADED.jar
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* regenerate changes
* Regenerate changes
* Regenerate changes
* Regenerate changes
* regenerate changes
* Regenerate changes
* regenerate changes
* Regenerate changes
* Regenerate changes
* Regenerate changes
* regenerate changes
* Regenerate changes
* Regenerate changes
* fix some tests
* Regenerate changes
* Regenerate changes
* fix test
* Regenerate changes
* Regenerate changes
This simply improves the log statement that prints debug statements
during beginning of a stats collector run for hosts or VMs.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR provides a new primary storage volume type called "FiberChannel" that allows access to volumes connected to hosts over fiber channel connections. It requires Multipath to provide path discovery and failover. Second, the PR adds an AdaptivePrimaryDatastoreProvider that abstracts how volumes are managed/orchestrated from the connector to communicate with the primary storage provider, using a ProviderAdapter interface, allowing the code interacting with the primary storage provider API's to be simpler and have no direct dependencies on Cloudstack code. Lastly, the PR provides an implementation of the ProviderAdapter classes for the HP Enterprise Primera line of storage solutions and the Pure Flash Array line of storage solutions.
* ms stats thread added
* initial data collection for management server
* empty list management server metrics command
* bean copy into MS metrics object
* ms status VO
* further API and DB plumbing
* minimal metrics response in API
* remove commented, refactor data collection plumbing
* javadocs
* surpress stacktrace on expected error
* update status experiment
* ms status publish framework added
* review comment addressed
* static data to DB and API, /proc/ reading
* addressing review comments
* ui for ms details
* small ui adjustment
* beanCopy
* agentcount response and system parameter
* labels
* package-lock
* add version strings to regular list API
* add shutdown time to DB
* add last start and last stop to regular list response
* distro info in regular response/session count added
* metrics as details
* add heap used and remove details map
* thread-statusses
* move db upgrade to 4.17
* sysmem
* procmem
* ui demo comments applied
* javadoc
* get conf and log file locations
* loginfo
* cpuLoadStats
* no.remote
* extra spaces removed
* clusterlistener
* add unit to kb value
* revert accidental rename
* silly fqcn removed
* get mem info from bean is possible
* refactor long sequence for readability
* registerListener
* listUsageMetrics and isDbLocal
* rats
* local usage and db or not
* minimal listDbMetrics
* db vars and stats
* cleanup and #queries queried
* db stats calculation
* rat
* remove list response wrapper from sinlge details-lists responses
* rudimentary metrics view
* metrics table cleanup
* table makeup, collection dates
* move component to appropriate location
* capitalisation removed
* rebase error resolved
* rename deamon to daemon
* small style comments applied
* another merge issue
* naming comments and boot time
* stop/start prefixed with server
* layout-fix
* listMSMetrics test and test refactor
* usage metrics test
* db metrics test
* extra validations
* Update ui/public/locales/en.json
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
* descriptions of loadaverages and replica's
* collection time on top
* cpu load on metrics overview
* DbStatsCollection
* some parameter description texts
* labels adjusted
* new output 'kernelversion' and log info cleanup
* labels
* Update api/src/main/java/com/cloud/server/ManagementServerHostStats.java
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/response/DbMetricsResponse.java
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/dao/ManagementServerHostDao.java
Co-authored-by: Rodrigo D. Lopez <19981369+RodrigoDLopez@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java
Co-authored-by: Rodrigo D. Lopez <19981369+RodrigoDLopez@users.noreply.github.com>
* Update api/src/main/java/org/apache/cloudstack/api/response/ManagementServerResponse.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update api/src/main/java/org/apache/cloudstack/api/response/ManagementServerResponse.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update engine/schema/src/main/java/com/cloud/host/dao/HostDao.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update server/src/main/java/com/cloud/server/StatsCollector.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update framework/cluster/src/main/java/com/cloud/cluster/dao/ManagementServerHostDao.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update server/src/main/java/com/cloud/server/StatsCollector.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update server/src/main/java/com/cloud/server/StatsCollector.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update server/src/main/java/com/cloud/server/StatsCollector.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
* Update plugins/metrics/src/main/java/org/apache/cloudstack/metrics/MetricsServiceImpl.java
* some (more) refactorring suggestions applied
* human readable memory sizes
* rat
* actual collection time instead of query time, improved descriptions
* merge errors fixed
* optional metric values
* javadoc and logging
* names of jmx vars have changed
* vue3-compatibility
* new output parameter type
* lower retention default
* vue3 fixes
* polish comments
* polish comments 2, the reckoning
* note on usage servers
* merge conflict errors
* pollish
* conditional assertion to deal with simulator restart
Co-authored-by: Daan Hoogland <dahn@onecht.net>
Co-authored-by: sureshanaparti <12028987+sureshanaparti@users.noreply.github.com>
Co-authored-by: Rodrigo D. Lopez <19981369+RodrigoDLopez@users.noreply.github.com>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Add persistence of VM stats
* Fix API 'since' attribute
* Add license
* Address GutoVeronezi's reviews
* Fix the order of VM stats in the API response
* Fix msid in VM stats data
* Fix disk stats and add minor improvements
* Add log message
* Build string using ReflectionToStringBuilderUtils
* Rerun checks
Co-authored-by: joseflauzino <jose@scclouds.com.br>
* Fix metrics stats for VMs that are not running
* Improves the way to get vmIdsToRemoveStats
* Improves test
Co-authored-by: José Flauzino <jose@scclouds.com.br>
* Check the pool used space from the bytes used in the storage pool stats collector, for non-default primary storage pools that cannot provide stats.
Also, Update the used bytes from the pool stats answer for non-default primary storage pools if the pool can provide stats.
* Update server/src/main/java/com/cloud/storage/StorageManagerImpl.java
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* space fix
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
* Externalize secondary storage capacity threshold
* Use default value as threshold when config value is lower than 0.0
* Move config to CapacityManager
* Validate config in CapacityManagerImpl
* Use config in StorageOrchestrator
* Change config description
* Remove unused import
Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
Fixes regression introduced in 71c5dbcf492a023dbea5f8c34f8fd883c3ad653f
which would cause capacity bytes of certain pools to be update which
shouldn't get updated by StatsCollector such as solidfire.
Fixes#4911
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ
Documentation PR: apache/cloudstack-documentation#169
This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack
Other improvements addressed in addition to PowerFlex/ScaleIO support:
- Added support for config drives in host cache for KVM
=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)
- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates
- Updated full deployment destination for preparing the network(s) on VM start
- Propagate the direct download certificates uploaded to the newly added KVM hosts
- Discover the template size for direct download templates using any available host from the zones specified on template registration
=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones
- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)
- Retry VM deployment/start when the host cannot grant access to volume/template
- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore
- Check the router filesystem is writable or not, before performing health checks
=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not
- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)
* Addressed some issues for few operations on PowerFlex storage pool.
- Updated migration volume operation to sync the status and wait for migration to complete.
- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.
- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).
- Updated resize volume error message string.
- Blocked the below operations on PowerFlex storage pool:
-> Extract Volume
-> Create Snapshot for VMSnapshot
* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.
- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html
Other fixes included:
- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)
- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.
- Perform basic tests (for connectivity and file system) on router before updating the health check config data
=> Validate the basic tests (connectivity and file system check) on router
=> Cleanup the health check results when router is destroyed
* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0
* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool
and Minor improvements in PowerFlex/ScaleIO storage plugin code
* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.
- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.
* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py
* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)
* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.
* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration
* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure
* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
After a few hours running with InfluxDB configured, CloudStack hangs due to OutOfMemoryException raised. The exception happens at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):
2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
...
at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)
Context on InfluxDB Batch: Enabling batch on InfluxDB is great and speeds writing but it requires caution to avoid Zombie threads.
Solution: This happens because the batching feature creates an internal thread pool that needs to be shut down explicitly; therefore, it is important to add: influxDB.close().
This PR adds minor version support when mounting nfs on the SSVM as requested in #2861
The global setting "secstorage.nfs.version" has been changed to use the String data type which allows any minor version to be specified.