KVM is supported on arm64 Linux (https://www.linux-kvm.org/page/Processor_support#ARM:).
For a small (IoT) platform such as the new Raspberry Pi 4 that uses armv8 processor
(cortex-a72) it's possible to run Linux host with `/dev/kvm`
accleration. This adds support for IoT IaaS in CloudStack.
This PR is from a fun weekend project where:
- I set up a Raspberry Pi 4 - 4GB RAM model with 4 CPU cores @ 1.5Ghz, 128GB SD samsung evo plus card
- Installed Ubuntu 19.10 raspi3 base image: http://cdimage.ubuntu.com/releases/19.10/release/ubuntu-19.10-preinstalled-server-arm64+raspi3.img.xz
- Build a custom Linux 5.3 kernel with KVM enabled, deb here: http://dl.rohityadav.cloud/cloudstack-rpi/kernel-19.10/ and install the linux-image and linux-module
- Then install/setup CloudStack on it (fix some issues around jna, by manually installing newer libjna-java to /usr/share/cloudstack-agent/lib)
- Since the host processor is not x86_64, I had to build a new arm64 (or aarch64) systemvmtemplate: http://dl.rohityadav.cloud/cloudstack-rpi/systemvmtemplate/
I could finally get a 4.13 CloudStack + Adv zone/networking to run on it
and deployed a KVM based Ubuntu 19.10 environment and NFS storage.
Deployed a test vm with isolated network, VR works as expected. Console
proxy works as well, for this tested against arm64 openstack Debian 9/10
templates.
I raised the issue of enabling KVM in upstream Ubuntu arm64 build: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1783961
Ubuntu kernel team has come back and future arm64 releases may have
KVM enabled by default.
Limitation: on my aarch64 env, it did not support IDE, therefore all
default bus type for volumes are SCSI by default. With VIRTIO it fails
sometimes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Logrotate should only touch security_group.log and resizevolume.log
as the agent.log is already rotated by log4j inside the Agent.
Having two systems trying to rotate agent.log leads to all kinds of
issues like having binary (compressed) data in the middle of a plain-text
log file.
In addition we do not have to rotate the logs every day, only when they
grow larger than 10M. On fairly idle hypervisors this should not cause
those logs to rotate every day.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
Using this tool on a hypervisor admins can query KVM Instances running
on that hypervisor if they have the Qemu Guest Agent installed.
All System VMs have this and they can be queried.
For example:
$ cloudstack-guest-tool i-2-25-VM
This will print some information about network and filesystem status.
root@hv-138-a05-23:~# ./cloudstack-guest-tool s-11-VM --command info|jq
{
"network": [
{
"ip-addresses": [
{
"prefix": 8,
"ip-address": "127.0.0.1",
"ip-address-type": "ipv4"
}
],
"name": "lo",
"hardware-address": "00:00:00:00:00:00"
},
{
"ip-addresses": [
{
"prefix": 16,
"ip-address": "169.254.242.169",
"ip-address-type": "ipv4"
}
],
"name": "eth0",
"hardware-address": "0e:00:a9:fe:f2:a9"
},
...
...
"filesystem": [
{
"mountpoint": "/var",
"disk": [
{
"bus": 0,
"bus-type": "virtio",
"target": 0,
"unit": 0,
"pci-controller": {
"slot": 7,
"bus": 0,
"domain": 0,
"function": 0
}
}
],
"type": "ext4",
"name": "vda6"
},
Signed-off-by: Wido den Hollander <wido@widodh.nl>
This fixes the qemu hooks `mkdir` race condition which can happen when
too many VMs may launch on a KVM host executing the hooks script that
tries to `mkdir` for the custom directory. On exception (multiple scripts
trying to mkdir), the VM stops.
The custom directory need not be created if it does not exist, instead
the custom hooks should only execute when there is a custom directory.
Feature documentation:
http://docs.cloudstack.apache.org/en/4.11.2.0/adminguide/hosts.html#kvm-libvirt-hook-script-include
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Improvements on upload direct download certificates
* Move upload direct download certificate logic to KVM plugin
* Extend unit test certificate expiration days
* Add marvin tests and command to revoke certificates
* Review comments
* Do not include revoke certificates API
* Keep connection alive when on maintenance
* Refactor cancel maintenance and unit tests
* Add marvin tests
* Refactor
* Changing the way we get ssh credentials
* Add check on SSH restart and improve marvin tests
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).
This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue that on reconnection, agent LB feature will fail
and only the first ms-host will be tried reconnection again and again.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
KVM hook script include - logic to execute custom scripts & logging requirements
KVM hook script include - add logic to create custom directory if not exists & extra logging
* Cleaup and code-formatting POM files
* Remove obsolete mycila license-maven-plugin
* Remove obsolete console-proxy/plugin project
* Move console-proxy-rdbconsole under console-proxy parent
* Use correct parent path for rdpconsole
* Order alphabetally items in setnextversion.sh
* Unifiy License header in POMs
* Alphabetic order of modules definition
* Extract all defined versions into parent pom
* Remove obsolete files: version-info.in, configure-info.in
* Remove redundant defaultGoal
* Remove useless checkstyle plugin from checkstyle project
* Order alphabetally items in pom.xml
* Add aditional SPACEs to fix debian build
* Don't execute checkstyle on parent projects
* Use UTF-8 encoding in building checkstyle project
* Extract plugin versions into properties
* Execute PMD plugin on all the projects with -Penablefindbugs
* Upgrade maven plugins to latest version
* Make sure to always look for apache parent pom from repository
* Fix incorrect version grep in debian packaging
* Fix rebase conflicts
* Fix rebase conflicts
* Remove PMD for now to be fixed on another PR
Fixes the version in pom etc. to be consistent with versioning pattern as X.Y.Z.0-SNAPSHOT after a minor release.
Signed-off-by: Khosrow Moossavi <khos2ow@gmail.com>
In some environments running the keystore cert renewal (as root user)
over an already connected agent connection may cause exception
such as: `sudo: sorry, you must have a tty to run sudo`. Since, all
agents - KVM, CPVM and SSVM run as root user, we don't need to run
the renewal scripts with sudo.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent loses connection with management server, the reconnection
logic waits for any pending tasks to finish. However, when such tasks
do finish they fail to send an `Answer` back to managements server.
Therefore from a management server's perspective such pending
operations are stuck in a FSM state and need manual removal or fixing.
This is by design where management server's side cmd-answer request
pattern is code/execution dependent, therefore even if the answer
were to be sent when management server came back up (reconnects)
the management server will fail to acknowledge and process the answer
due to missing listeners or being in the exact state to handle answers.
Historically, the Agent would wait to reconnect until the internal
tasks complete but I found no reason why it should wait for reconnection
at all.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This extends securing of KVM hosts to securing of libvirt on KVM
host as well for TLS enabled live VM migration. To simplify implementation
securing of host implies that both host and libvirtd processes are
secured with management server's CA plugin issued certificates.
Based on whether keystore and certificates files are available at
/etc/cloudstack/agent, the KVM agent determines whether to use TLS or
TCP based uris for live VM migration. It is also enforced that a secured
host will allow live VM migration to/from other secured host, and an
unsecured hosts will allow live VM migration to/from other unsecured
host only.
Post upgrade the KVM agent on startup will expose its security state
(secured detail is sent as true or false) to the managements server that
gets saved in host_details for the host. This host detail can be accesed
via the listHosts response, and in the UI unsecured KVM hosts will show
up with the host state of ‘unsecured’. Further, a button has been added
that allows admins to provision/renew certificates to KVM hosts and can
be used to secure any unsecured KVM host.
The `cloudstack-setup-agent` was modified to accept a new flag `-s`
which will reconfigure libvirtd with following settings:
listen_tcp=0
listen_tls=1
tcp_port="16509"
tls_port="16514"
auth_tcp="none"
auth_tls="none"
key_file = "/etc/pki/libvirt/private/serverkey.pem"
cert_file = "/etc/pki/libvirt/servercert.pem"
ca_file = "/etc/pki/CA/cacert.pem"
For a connected KVM host agent, when the certificate are
renewed/provisioned a background task is scheduled that waits until all
of the agent tasks finish after which libvirt process is restarted and
finally the agent is restarted via AgentShell.
There are no API or DB changes.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Several fixes addressed:
- Dettach ISO fails when trying to detach a direct download ISO
- Fix for metalink support on SSVM agents (this closes CLOUDSTACK-10238)
- Reinstall VM from bypassed registered template (this closes CLOUDSTACK-10250)
- Fix upload certificate error message even though operation was successful
- Fix metalink download, checksum retry logic and metalink SSVM downloader
The new CA framework introduced basic support for comma-separated
list of management servers for agent, which makes an external LB
unnecessary.
This extends that feature to implement LB sorting algorithms that
sorts the management server list before they are sent to the agents.
This adds a central intelligence in the management server and adds
additional enhancements to Agent class to be algorithm aware and
have a background mechanism to check/fallback to preferred management
server (assumed as the first in the list). This is support for any
indirect agent such as the KVM, CPVM and SSVM agent, and would
provide support for management server host migration during upgrade
(when instead of in-place, new hosts are used to setup new mgmt server).
This FR introduces two new global settings:
- `indirect.agent.lb.algorithm`: The algorithm for the indirect agent LB.
- `indirect.agent.lb.check.interval`: The preferred host check interval
for the agent's background task that checks and switches to agent's
preferred host.
The indirect.agent.lb.algorithm supports following algorithm options:
- static: use the list as provided.
- roundrobin: evenly spreads hosts across management servers based on
host's id.
- shuffle: (pseudo) randomly sorts the list (not recommended for production).
Any changes to the global settings - `indirect.agent.lb.algorithm` and
`host` does not require restarting of the mangement server(s) and the
agents. A message bus based system dynamically reacts to change in these
global settings and propagates them to all connected agents.
Comma-separated management server list is propagated to agents on
following cases:
- Addition of a host (including ssvm, cpvm systevms).
- Connection or reconnection by the agents to a management server.
- After admin changes the 'host' and/or the
'indirect.agent.lb.algorithm' global settings.
On the agent side, the 'host' setting is saved in its properties file as:
`host=<comma separated addresses>@<algorithm name>`.
First the agent connects to the management server and sends its current
management server list, which is compared by the management server and
in case of failure a new/update list is sent for the agent to persist.
From the agent's perspective, the first address in the propagated list
will be considered the preferred host. A new background task can be
activated by configuring the `indirect.agent.lb.check.interval` which is
a cluster level global setting from CloudStack and admins can also
override this by configuring the 'host.lb.check.interval' in the
`agent.properties` file.
Every time agent gets a ms-host list and the algorithm, the host specific
background check interval is also sent and it dynamically reconfigures
the background task without need to restart agents.
Note: The 'static' and 'roundrobin' algorithms, strictly checks for the
order as expected by them, however, the 'shuffle' algorithm just checks
for content and not the order of the comma separate ms host addresses.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This would make keystore utility scripts being executed as sudoer
in case the process uid/owner is not root but still a sudoer user.
Also fails addHost while securing a KVM host and if keystore fails to be
setup for any reason.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Failure on HTTPS downloader for Direct Download templates on KVM.
Reason: Incorrect request caused NullPointerException getting the response InputStream