79 Commits

Author SHA1 Message Date
Vishesh
19f79b1d94
Merge branch '4.19' 2024-04-12 16:31:40 +05:30
Suresh Kumar Anaparti
d3e020a545
Mark libvirt events experimental, add properties flag (#8825)
* Mark libvirt events experimental, add properties flag

* unit test fixes

---------

Co-authored-by: Marcus Sorensen <mls@apple.com>
2024-04-11 17:06:33 +05:30
João Jandre
49cecaed06
Normalize loggers and upgrade log4j 1.2 to log4j 2.19 (#7131)
* Normalize logs

All classes that could have their loggers inherited from their fathers had their own loggers deleted;
Most loggers didn't have to be static, so most of them were normalized so that they wouldn't be;
All loggers are protected now;
Static logger's name are now 'LOGGER';
Non-static logger's name are now 'logger';
New class DbUpgradeAbstractImpl created so that all Upgraders extend it and inherit its logger

* Upgrade log4j

* fix errors caused by the merge

* Refactor cglibThrowableRenderer functionality to log4j2 and upgrade the last configuration files

* fix sonarcloud bug

* Fix errors caused by merge, remove some unused loggers, and rename a variable that was mistakenly renamed on the normalization commit

* Readd snmpTrapAppender, remove TestAppender

* Regenerate changes

* regenerate changes

* refactor last custom appender

* fix systemvm configuration xml

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Fix utils pom

* fix some tests

* regenerate changes

* Fix jar being printed on exception

* fix logging in system VMs, fix commands not having log4j2 classpath.

* regenerate changes

* Fix some unwanted renomeations

* fix end of file

* regenerate changes

* regenerate changes

* fix merge error

* regenerate changes

* fix tests

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* readd reload4j to tungsten as juniper depends on it

* Regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* re-add reload4j dependency to network-contrail, as juniper depends on it

* regenerate changes

* regenerate changes

* regenerate changes

* fix typo

* regenerate changes

* regenerate changes

* Fix end of files

* regenerate changes

* add logj42 to cloud-utils-SHADED.jar

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* Regenerate changes

* regenerate changes

* Regenerate changes

* Regenerate changes

* fix some tests

* Regenerate changes

* Regenerate changes

* fix test

* Regenerate changes

* Regenerate changes
2024-02-08 09:55:41 -03:00
Vishesh
fedcf66de0
Externalise a few timeouts & fix timeout for hostSupportsUefi in libvirt ready command wrapper (#8547)
This PR fixes bug introduced in #8502. Timeout for script execution was set to 60 ms instead of 60s which resulted in host not getting UEFI enabled. This is a blocker for 4.19 release.

We do this by introducing a new agent parameter `agent.script.timeout` (default - 60 seconds) to use as a timeout for the script checking host's UEFI status.

We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default - 60 seconds).

For ModifyStoragePoolCommand, we don't externalize the timeout to avoid confusion for the user. Since, the required timeout can vary depending on the provider in use and we are only setting the wait for default host listener for now. Instead, we reuse the global `wait` setting by dividing it by `5` making the default value of 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.

Note: the actual time, the MS waits is twice the wait set for a Command. Check reference code below.
19250403e6/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java (L406-L442)
2024-01-27 23:36:13 +05:30
Nicolas Vazquez
371ad9f55b
New Feature: Import VMware VMs into KVM (#7881)
This PR adds the capability in CloudStack to convert VMware Instances disk(s) to KVM using virt-v2v and import them as CloudStack instances. It enables CloudStack operators to import VMware instances from vSphere into a KVM cluster managed by CloudStack. vSphere/VMware setup might be managed by CloudStack or be a standalone setup.

    CloudStack will let the administrator select a VM from an existing VMware vCenter in the CloudStack environment or external vCenter requesting vCenter IP, Datacenter name and credentials.
    The migrated VM will be imported as a KVM instance
    The migration is done through virt-v2v: https://access.redhat.com/articles/1351473, https://www.ovirt.org/develop/release-management/features/virt/virt-v2v-integration.html
    The migration process timeout can be set by the setting convert.instance.process.timeout
    Before attempting the virt-v2v migration, CloudStack will create a clone of the source VM on VMware. The clone VM will be removed after the registration process finishes.
    CloudStack will delegate the migration action to a KVM host and the host will attempt to migrate the VM invoking virt-v2v. In case the guest OS is not supported then CloudStack will handle the error operation as a failure
    The migration process using virt-v2v may not be a fast process
    CloudStack will not perform any check about the guest OS compatibility for the virt-v2v library as indicated on: https://access.redhat.com/articles/1351473.
2023-12-07 12:59:56 +05:30
Stephan Krug
267a457efc
Externalize KVM HA heartbeat frequency (#6892)
Co-authored-by: Stephan Krug <stephan.krug@scclouds.com.br>
Co-authored-by: GaOrtiga <49285692+GaOrtiga@users.noreply.github.com>
Co-authored-by: dahn <daan.hoogland@gmail.com>
2023-11-16 09:17:17 +01:00
Marcus Sorensen
82b981854b
KVM Agent config to reserve dom0 CPUs (#7987)
This PR allows an admin to reserve some hypervisor host CPUs for system use. Another way to think of it is limiting the number of CPUs allocatable to VMs. This can be useful if the admin wants to do other things with the hypervisor's CPU, for example reserve some cores for running hyperconverged storage processes.

Co-authored-by: Marcus Sorensen <mls@apple.com>
2023-10-06 10:50:18 +05:30
Nicolas Vazquez
be66eb2a35
Auto Enable/Disable KVM hosts (#7170)
* Auto Enable Disable KVM hosts

* Improve health check result

* Fix corner cases

* Script path refactor

* Fix sonar cloud reports

* Fix last code smells

* Add marvin tests

* Fix new line on agent.properties to prevent host add failures

* Send alert on auto-enable-disable and add annotations when the setting is enabled

* Address reviews

* Add a reason for enabling or disabling a host when the automatic feature is enabled

* Fix comment on the marvin test description

* Fix for disabling the feature if the admin has manually updated the host resource state before any health check result
2023-04-04 17:03:37 +05:30
slavkap
d288bb0c78
KVM support of iothreads and IO driver policy (#6909) 2023-01-25 12:34:05 +01:00
Paula Oliveira
0fe2e6950e
Improving code related to the Agent properties (#6348)
Co-authored-by: Paula Zomignani Oliveira <paula@scclouds.com.br>
Co-authored-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2022-12-22 12:00:49 +01:00
José Flauzino
1843632c24
Fix memory stats for KVM (#6358)
Co-authored-by: joseflauzino <jose@scclouds.com.br>
2022-11-09 18:00:12 +01:00
nvazquez
96594aec28
Merge branch '4.16' 2022-05-23 08:16:52 -03:00
Nicolas Vazquez
dc975dff95
[KVM] Enable IOURING only when it is available on the host (#6399)
* [KVM] Disable IOURING by default on agents

* Refactor

* Remove agent property for iouring

* Restore property

* Refactor suse check and enable on ubuntu by default

* Refactor irrespective of guest OS

* Improvement

* Logs and new path

* Refactor condition to enable iouring

* Improve condition

* Refactor property check

* Improvement

* Doc comment

* Extend comment

* Move method

* Add log
2022-05-23 08:11:14 -03:00
nvazquez
65dc2df896
Merge branch '4.16' 2022-04-13 08:45:48 -03:00
slavkap
42a92dcdd3
Extract the IO_URING configuration into the agent.properties (#6253)
When using advanced virtualization the IO Driver is not supported. The
admin will decide if want to enable/disable this configuration from
agent.properties file. The default value is true
2022-04-13 08:43:35 -03:00
Nicolas Vazquez
2f2d6cbe38
KVM: Enhance CPU speed detection on hosts (#6175)
* KVM: Enhance CPU speed detection on hosts

* Update agent/conf/agent.properties

Co-authored-by: dahn <daan.hoogland@gmail.com>

Co-authored-by: dahn <daan.hoogland@gmail.com>
2022-04-01 12:12:34 -03:00
John Bampton
15937369fe
Fix spelling (#6161)
* Fix spelling

* Fix spelling
2022-03-29 00:21:07 -03:00
John Bampton
6401c850b7
Fix spelling (#6064)
* Fix spelling

- `interupted` to `interrupted`
- `paramter` to `parameter`

* Fix more typos
2022-03-08 13:02:35 -03:00
Wei Zhou
95ef292860
kvm: honor migrate.wait and abort vm migration job (#5388)
* kvm: honor migrate.wait and abort vm migration job

* kvm: propogate migratewait to all cloudstack agents on kvm hosts

* update #5388

* update #5388: display error msg
2021-09-15 12:38:16 -03:00
Daniel Augusto Veronezi Salvador
159c72fa97
Externalize KVM Agent's option to change migration thread timeout (#4570)
* Externalize KVM Agent's option to change migration thread timeout

* Update javadoc

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-08-26 08:18:17 -03:00
Daniel Augusto Veronezi Salvador
83c0b61ab2
Externalize KVM Agent storage's reboot configuration (#4586)
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-08-24 01:06:00 -03:00
Daniel Augusto Veronezi Salvador
349120f7c5
Externalize config to enable manually setting CPU topology on KVM VM (#5273)
* Externalize config to enable manually setting CPU topology on KVM VM

* Change log level

Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-08-15 23:52:50 -03:00
Daniel Augusto Veronezi Salvador
7b752c3077
Externalize KVM Agent storage's timeout configuration (#5239)
* Externalize KVM Agent storage's timeout configuration

* Address @nvazquez review

* Add empty line at the end of the agent.properties file

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-28 15:45:27 +02:00
dahn
6f93e5cd08
Revert "Externalize kvm agent storage timeout configuration (#4585)" (#5218)
This reverts commit 05a978c2495d1c93c798d2b845c5030ad78f6a14.
2021-07-20 09:16:43 +02:00
Daniel Augusto Veronezi Salvador
05a978c249
Externalize kvm agent storage timeout configuration (#4585)
* Externalize KVM Agent storage's timeout configuration

Created a class of constant agent's properties available to configure on "agent.properties".
Created a class to provides a facility to read the agent's properties file and get its properties.

* Refactored KVHAMonitor nested thread and changed some logs

* It has been added the timeout's config in the agent.properties file

* Rename classes

* Rename var and remove comment

* Fix typo with word "heartbeat"

* Extract multiple methods call to variables

* Add unit tests to file handler

* Increase info about the property

* Create inner class Property

* Rename method getProperty to getPropertyValue

* Remove copyright

* Remove copyright

* Extract code to createHeartBeatCommand

* Change method access from protected to private

Co-authored-by: Daniel Augusto Veronezi Salvador <daniel@scclouds.com.br>
2021-07-19 09:41:50 +02:00
sureshanaparti
eba186aa40
storage: New Dell EMC PowerFlex Plugin (formerly ScaleIO, VxFlexOS) (#4304)
Added support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack (for KVM hypervisor) and enabled VM/Volume operations on that pool (using pool tag).
Please find more details in the FS here:
https://cwiki.apache.org/confluence/x/cDl4CQ

Documentation PR: apache/cloudstack-documentation#169

This enables support for PowerFlex/ScaleIO (v3.5 onwards) storage pool as a primary storage in CloudStack

Other improvements addressed in addition to PowerFlex/ScaleIO support:

- Added support for config drives in host cache for KVM
	=> Changed configuration "vm.configdrive.primarypool.enabled" scope from Global to Zone level
	=> Introduced new zone level configuration "vm.configdrive.force.host.cache.use" (default: false) to force host cache for config drives
	=> Introduced new zone level configuration "vm.configdrive.use.host.cache.on.unsupported.pool" (default: true) to use host cache for config drives when storage pool doesn't support config drive
	=> Added new parameter "host.cache.location" (default: /var/cache/cloud) in KVM agent.properties for specifying the host cache path and create config drives on the "/config" directory on the host cache path
	=> Maintain the config drive location and use it when required on any config drive operation (migrate, delete)

- Detect virtual size from the template URL while registering direct download qcow2 (of KVM hypervisor) templates

- Updated full deployment destination for preparing the network(s) on VM start

- Propagate the direct download certificates uploaded to the newly added KVM hosts

- Discover the template size for direct download templates using any available host from the zones specified on template registration
	=> When zones are not specified while registering template, template size discovery is performed using any available host, which is picked up randomly from one of the available zones

- Release the VM resources when VM is sync-ed to Stopped state on PowerReportMissing (after graceful period)

- Retry VM deployment/start when the host cannot grant access to volume/template

- Mark never-used or downloaded templates as Destroyed on deletion, without sending any DeleteCommand
	=> Do not trigger any DeleteCommand for never-used or downloaded templates as these doesn't exist and cannot be deleted from the datastore

- Check the router filesystem is writable or not, before performing health checks
	=> Introduce a new test "filesystem.writable.test" to check the filesystem is writable or not
	=> The router health checks keeps the config info at "/var/cache/cloud" and updates the monitor results at "/root" for health checks, both are different partitions. So, test at both the locations.
	=> Added new script: "filesystem_writable_check.py" at /opt/cloud/bin/ to check the filesystem is writable or not

- Fixed NPE issue, template is null for DATA disks. Copy template to target storage for ROOT disk (with template id), skip DATA disk(s)

* Addressed some issues for few operations on PowerFlex storage pool.

- Updated migration volume operation to sync the status and wait for migration to complete.

- Updated VM Snapshot naming, for uniqueness in ScaleIO volume name when more than one volume exists in the VM.

- Added sync lock while spooling managed storage template before volume creation from the template (non-direct download).

- Updated resize volume error message string.

- Blocked the below operations on PowerFlex storage pool:
  -> Extract Volume
  -> Create Snapshot for VMSnapshot

* Added the PowerFlex/ScaleIO client connection pool to manage the ScaleIO gateway clients, which uses a single gateway client per Powerflex/ScaleIO storage pool and renews it when the session token expires.

- The token is valid for 8 hours from the time it was created, unless there has been no activity for 10 minutes.
  Reference: https://cpsdocs.dellemc.com/bundle/PF_REST_API_RG/page/GUID-92430F19-9F44-42B6-B898-87D5307AE59B.html

Other fixes included:

- Fail the VM deployment when the host specified in the deployVirtualMachine cmd is not in the right state (i.e. either Resource State is not Enabled or Status is not Up)

- Use the physical file size of the template to check the free space availability on the host, while downloading the direct download templates.

- Perform basic tests (for connectivity and file system) on router before updating the health check config data
	=> Validate the basic tests (connectivity and file system check) on router
	=> Cleanup the health check results when router is destroyed

* Updated PowerFlex/ScaleIO storage plugin version to 4.16.0.0

* UI Changes to support storage plugin for PowerFlex/ScaleIO storage pool.
- PowerFlex pool URL generated from the UI inputs(Gateway, Username, Password, Storage Pool) when adding "PowerFlex" Primary Storage
- Updated protocol to "custom" for PowerFlex provider
- Allow VM Snapshot for stopped VM on KVM hypervisor and PowerFlex/ScaleIO storage pool

and Minor improvements in PowerFlex/ScaleIO storage plugin code

* Added support for PowerFlex/ScaleIO volume migration across different PowerFlex storage instances.

- findStoragePoolsForMigration API returns PowerFlex pool(s) of different instance as suitable pool(s), for volume(s) on PowerFlex storage pool.
- Volume(s) with snapshots are not allowed to migrate to different PowerFlex instance.
- Volume(s) of running VM are not allowed to migrate to other PowerFlex storage pools.
- Volume migration from PowerFlex pool to Non-PowerFlex pool, and vice versa are not supported.

* Fixed change service offering smoke tests in test_service_offerings.py, test_vm_snapshots.py

* Added the PowerFlex/ScaleIO volume/snapshot name to the paths of respective CloudStack resources (Templates, Volumes, Snapshots and VM Snapshots)

* Added new response parameter “supportsStorageSnapshot” (true/false) to volume response, and Updated UI to hide the async backup option while taking snapshot for volume(s) with storage snapshot support.

* Fix to remove the duplicate zone wide pools listed while finding storage pools for migration

* Updated PowerFlex/ScaleIO volume migration checks and rollback migration on failure

* Fixed the PowerFlex/ScaleIO volume name inconsistency issue in the volume path after migration, due to rename failure
2021-02-24 14:58:33 +05:30
Rohit Yadav
05ae3f8d81 Merge remote-tracking branch 'origin/4.13' into 4.14 2020-08-21 15:38:18 +05:30
Sid Kattoju
1da76d27f1
iscsi session cleanup now configurable, filters iscsi partitions (#4219)
Added property to agent.properties that enables or disables the iscsi session clean up feature. #4210
Added a condition to prevent disk partitions from being cleaned up. #4216
2020-08-21 14:38:36 +05:30
Bitworks LLC
750abf3551
FEATURE-3823: kvm agent hooks (#3839) 2020-03-14 09:22:08 +01:00
Nicolas Vazquez
efe00aa7e0
[KVM] Rolling maintenance (#3610) 2020-03-12 16:59:46 +01:00
Nicolas Vazquez
73122fd0a9
[KVM] Direct download agnostic of the storage provider (#3828)
* Remove constraint for NFS storage

* Add new property on agent.properties

* Add free disk space on the host prior template download

* Add unit tests for the free space check

* Fix free space check - retrieve avaiable size in bytes

* Update default location for direct download

* Improve the method to retrieve hosts to retry on depending on the destination pool type and scope

* Verify location for temporary download exists before checking free space

* In progress - refactor and extension

* Refactor and fix

* Last fixes and marvin tests

* Remove unused test file

* Improve logging

* Change default path for direct download

* Fix upload certificate

* Fix ISO failure after retry

* Fix metalink filename mismatch error

* Fix iso direct download

* Fix for direct download ISOs on local storage and shared mount point

* Last fix iso

* Fix VM migration with ISO

* Refactor volume migration to remove secondary storage intermediate

* Fix simulator issue
2020-03-06 19:56:54 +01:00
Wido den Hollander
7e64945b88 kvm: Logrotate should not touch agent.log (#3597)
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.

Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.

In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2020-01-03 14:12:08 +01:00
Rohit Yadav
524b995083
IoT/ARM64 support: allow cloudstack-agent on Raspberry Pi 4 (armv8) to use kvm acceleration (#3644)
KVM is supported on arm64 Linux (https://www.linux-kvm.org/page/Processor_support#ARM:).
For a small (IoT) platform such as the new Raspberry Pi 4 that uses armv8 processor
(cortex-a72) it's possible to run Linux host with `/dev/kvm`
accleration. This adds support for IoT IaaS in CloudStack.

This PR is from a fun weekend project where:
- I set up a Raspberry Pi 4 - 4GB RAM model with 4 CPU cores @ 1.5Ghz, 128GB SD samsung evo plus card
- Installed Ubuntu 19.10 raspi3 base image: http://cdimage.ubuntu.com/releases/19.10/release/ubuntu-19.10-preinstalled-server-arm64+raspi3.img.xz
- Build a custom Linux 5.3 kernel with KVM enabled, deb here: http://dl.rohityadav.cloud/cloudstack-rpi/kernel-19.10/ and install the linux-image and linux-module
- Then install/setup CloudStack on it (fix some issues around jna, by manually installing newer libjna-java to /usr/share/cloudstack-agent/lib)
- Since the host processor is not x86_64, I had to build a new arm64 (or aarch64) systemvmtemplate: http://dl.rohityadav.cloud/cloudstack-rpi/systemvmtemplate/

I could finally get a 4.13 CloudStack + Adv zone/networking to run on it
and deployed a KVM based Ubuntu 19.10 environment and NFS storage.
Deployed a test vm with isolated network, VR works as expected. Console
proxy works as well, for this tested against arm64 openstack Debian 9/10
templates.

I raised the issue of enabling KVM in upstream Ubuntu arm64 build: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1783961
Ubuntu kernel team has come back and future arm64 releases may have 
KVM enabled by default.

Limitation: on my aarch64 env, it did not support IDE, therefore all
default bus type for volumes are SCSI by default. With VIRTIO it fails
sometimes.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-11-11 22:01:05 +05:30
Wido den Hollander
8170ec56ad kvm: Logrotate should not touch agent.log (#3597)
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.

Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.

In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2019-09-19 10:37:06 +05:30
Andrija Panic
df173859d7 agent: add host.reserved.mem.mb parameter documentation (#3016)
Document "host.reserved.mem.mb" parameter in agent.properties.
2018-11-12 11:40:37 +05:30
Nicolas Vazquez
4de4eabd18
Enable DPDK support on KVM (#2839)
* Enable DPDK support on KVM

* Allow DPDK deployments on user VMs only

* Fix port name ordering
2018-11-07 09:29:01 -03:00
Nicolas Vazquez
11d83fab43 agent: set log level to INFO as default for http wire (#2903)
Avoid logging bytes on direct download on KVM.
2018-10-16 10:32:03 +05:30
Rohit Yadav
30175d6879
CLOUDSTACK-10132: Extend support for management servers LB for agents (#2469)
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.

This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).

This FR introduces two new global settings:

- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
  for the agent's background task that checks and switches to agent's
  preferred host.

The indirect.agent.lb.algorithm supports following algorithm options:

- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
  host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).

Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.

Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
  'indirect.agent.lb.algorithm' global settings.

On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.

First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.

From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.

Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.

Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-03-15 16:34:03 +05:30
Frank Maximus
22d6718fe3 CLOUDSTACK-10255: Fix agent logrotate (#2429)
Renamed cloudstack-agent.logrotate to cloudstack-agent.logrotate.in,
so Ant will run the filterchain while copying.
This made the ant run copy block of cloudstack-agent.logrotate unnecessary,
so this is removed.
2018-01-24 23:35:27 +01:00
Bitworks Software, Ltd
3381c38cc7 CLOUDSTACK-10073: KVM host RAM overprovisioning (#2266)
Commit enables a new feature for KVM hypervisor which purpose is to increase virtually amount of RAM available beyond the actual limit.
There is a new parameter in agent.properties: host.overcommit.mem.mb which enables adding specified amount of RAM to actually available. It is necessary to utilize KSM and ZSwap features which extend RAM with deduplication and compression.
2017-09-29 11:46:09 +05:30
Wido den Hollander
b130e55088 CLOUDSTACK-9397: Add Watchdog timer to KVM Instance (#1707)
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.

When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.

If these heartbeats are no longer received by the HV it will reset the Instance.

If the Instance never sends the heartbeats the HV does not take action. It only
takes action if it stops sending heartbeats.

This is supported since Libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog

To the 'devices' section this will be added:

In the agent.properties the action to be taken can be defined:

vm.watchdog.action=reset

The same goes for the model. The Intel i6300esb is however the most commonly used.

vm.watchdog.model=i6300esb

When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.

If these heartbeats are no longer received by the HV it will reset the Instance.

If the Instance never sends the heartbeats the HV does not take action. It only
takes action if it stops sending heartbeats.

This is supported since Libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog

To the 'devices' section this will be added:

  <watchdog model='i6300esb' action='reset'/>

In the agent.properties the action to be taken can be defined:

  vm.watchdog.action=reset

The same goes for the model. The Intel i6300esb is however the most commonly used.

  vm.watchdog.model=i6300esb

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2017-09-28 13:56:15 +05:30
Rohit Yadav
b60e173c5c Merge branch '4.9' into 4.10 2017-08-03 00:29:24 +02:00
Ronald van Zantvoort
974e01ed08 CLOUDSTACK-10011: Fix Agent logrotation (#2094)
* CS Agent: Correct logrotation for agent log
* CS Agent: Logrotate security_group as well
* CS Agent: fix logrotation file perms so logrotate doesnt skip it
2017-08-03 00:29:02 +02:00
Rohit Yadav
e9f526e221 Merge branch '4.9' into 4.10
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2017-07-28 10:49:34 +02:00
Abhinandan Prateek
7ed3179bd1 CLOUDSTACK-9569: add router.aggregation.command.each.timeout to agent.properties (#1933) 2017-07-27 11:17:20 +02:00
Wido den Hollander
2a5f37c1b1
CLOUDSTACK-8715: Add channel to Instances for Qemu Guest Agent
This commit adds a additional VirtIO channel with the name
'org.qemu.guest_agent.0' to all Instances.

With the Qemu Guest Agent the Hypervisor gains more control over the Instance if
these tools are present inside the Instance, for example:

* Power control
* Flushing filesystems
* Fetching Network information

In the future this should allow safer snapshots on KVM since we can instruct the
Instance to flush the filesystems prior to snapshotting the disk.

More information: http://wiki.qemu.org/Features/QAPI/GuestAgent

Keep in mind that on Ubuntu AppArmor still needs to be disabled since the default
AppArmor profile doesn't allow libvirt to write into /var/lib/libvirt/qemu

This commit does not add any communication methods through API-calls, it merely
adds the channel to the Instances and installs the Guest Agent in the SSVMs.

With the addition of the Qemu Guest Agent channel a second channel appears in /dev
on a SSVM as a VirtIO port.

The order in which the ports are defined in the XML matters for the naming inside
the SSVM VM and by not relying on /dev/vportXX but looking for a static name the
SSVM still boots properly if the order in the XML definition is changed.

A SSVM with both ports attached will have something like this:

  root@v-215-VM:~# ls -l /dev/virtio-ports
  total 0
  lrwxrwxrwx 1 root root 11 May 13 21:41 org.qemu.guest_agent.0 -> ../vport0p2
  lrwxrwxrwx 1 root root 11 May 13 21:41 v-215-VM.vport -> ../vport0p1
  root@v-215-VM:~# ls -l /dev/vport*
  crw------- 1 root root 251, 1 May 13 21:41 /dev/vport0p1
  crw------- 1 root root 251, 2 May 13 21:41 /dev/vport0p2
  root@v-215-VM:~#

In this case the SSVM port points to /dev/vport0p1, but if the order in the XML
is different it might point to /dev/vport0p2

By looking for a portname with a pre-defined pattern in /dev/virtio-ports we
do not rely on the order in the XML definition.

Signed-off-by: Wido den Hollander <wido@widodh.nl>
2016-11-23 16:01:08 +01:00
Wido den Hollander
0beb41b6e7 CLOUDSTACK-9395: Add Virtio RNG device to Instances when configured
By adding a Random Number Generator device to Instances we can prevent
entropy starvation inside guest.

The default source is /dev/random on the host, but this can be configured
to another source when present, for example a hardware RNG.

When enabled it will add the following to the Instance's XML definition:

  <rng model='virtio'>
    <rate period='1000' bytes='2048' />
    <backend model='random'>/dev/random</backend>
  </rng>

If the Instance has the proper support, which most modern distributions have,
it will have a /dev/hwrng device which it can use for gathering entropy.

More information: https://libvirt.org/formatdomain.html#elementsRng
2016-10-04 12:44:55 +02:00
Koushik Das
a4dc271253 CLOUDSTACK-9105: Logging enhancement: Handle/reference to track API calls end to end in the MS logs
Added logid to logging framework, now all API call logs can be tracked with this id end to end
2015-12-04 12:45:57 +05:30
Boris Schrijver
5c0366c99e CLOUDSTACK-9062: Improve S3 implementation.
The S3 implementation is far from finished, this commit focusses on the bases.

 - Upgrade AWS SDK to latest version.
 - Rewrite S3 Template downloader.
 - Rewrite S3Utils utility class.
 - Improve addImageStoreS3 API command.
 - Split various classes for convenience.
 - Various minor improvements and code optimalisations.

A side effect of the new AWS SDK is that it, by default, uses the V4 signature. Therefore I added an option to specify the Signer, so it stays compatible with previous versions.
2015-11-19 15:29:10 +01:00
Rohit Yadav
909df859b3 CLOUDSTACK-9000: logrotate cloudstack-agent out and err logs
Adds logrotate rules for cloudstack-agent.{err,out} log files

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-10-28 17:17:03 +05:30