When scripts/vm/hypervisor/kvm/kvmvmactivity.sh is called with an incorrect file name, an error is printed which is then interpreted as output from the script.
When an incorrect file name is passed the script prints out:
stat: cannot stat ‘b51d7336-d964-44ee-be60-bf62783dabc’: No such file or directory
=====> DEAD <======
The KVMHAVMActivityChecker.java checkingHB() process is expecting just
=====> DEAD <======
but gets the unexpected error message and interprets the file as alive.
This introduces a new patching script for patching systemvms on KVM
using qemu-guest-agent that runs inside the systemvm on startup. This
also removes the vport device which was previously used by the legacy
patching script and instead uses the modern and new uniform guest
agent vport for host-guest communication.
Also updates the sytemvmtemplate build config to use the latest Debian
9.9.0 iso.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
On actual testing, I could see that kvmheartbeat.sh script fails on NFS
server failure and stops the agent only. Any HA VMs could be launched
in different hosts, and recovery of NFS server could lead to a state
where a HA enabled VM runs on two hosts and can potentially cause
disk corruptions. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between check/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes regression introduced in PR #2295:
- Pass assign=true to fetch new public IP
- Use wait_until instead of sleep+wait in tests
- Loop through list of public IP ranges to match the systemvm gateway
- Fix potential NPE seen when adding simulator host(s)
- Removes aria2 installation from setup_agent.sh using yum, it's already
dependency for cloudstack-agent package
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows using templates and ISOs avoiding secondary storage as intermediate cache on KVM. The virtual machine deployment process is enhanced to supported bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- An URL validation command is sent to a random KVM host to check if template/ISO location can be reached. Metalink are also supported by this feature. In case of a metalink, it is fetched and URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After template or ISO is registered, it would be displayed in the UI
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack would delegate template downloading to destination storage pool via destination host by a new pluggable download manager.
Download manager would handle template downloading depending on URL protocol. In case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In case of HTTPS, a new API method is added uploadTemplateDirectDownloadCertificate to allow user importing a client certificate into all KVM hosts' keystore before deployment.
After template or ISO is downloaded to primary storage, usual entry would be persisted on template_spool_ref indicating the mapping between template/ISO and storage pool.
- All tests should pass on KVM, Simulator
- Add test cases covering FSM state transitions and actions
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the agreed upon url on download.cloudstack.org in various
sql files and misc scripts.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
more detailed error if host file not found or cannot be opened
using mkstemp and mkdtemp for improved security
improve resource cleanup in error conditions in unit test
As requested here: https://github.com/apache/cloudstack/pull/1495
No scripts are using perl so that install requirement can be removed.
The new scripts are using standard python packages only.
Includes extensive unit test.
This setting works on CentOS 6 / RHEL 6 but does nothing, as
"cpu" cgroup is not mounted. On CentOS 7 / RHEL 7 systemd does
mount cgroups and "cpu" is co-mounted with "cpuacc". Hence, if
we specify "cpu" then this results in an error because it can
only use them both, or none.
By removing the setting, we rely on the default of qemu, which
is:
cgroup_controllers = ["cpu", "devices", "memory", "blkio", "cpuacct", "net_cls"]
Only if they are really mounted, they will be used. So, this will
work on both version 6 and 7.
The 'fix script' didn't work well, as after a reboot you'd still have qemu
throwing errors. Now we can handle the co-mountedcgroups.
The recent discussed improvement has the risk that if 'sync' hangs, the reboot may be delayed in the same way as the 'reboot' command would do. To work around, we're adding a 5 second timeout. If it cannot sync in 5 seconds, it will not succeed anyway and we should proceed the reset.
@snuf: Could we use your OVM3 heartbeat script for other hypervisors as well? One way to do it seems like a nice idea :-)
As discussed with @wido @pyr and @nuxro added an extra log line.
Tested it and it logs fine (tested to local disk) when syncing first:
Apr 3 15:31:23 mcctest2 heartbeat: kvmheartbeat.sh system because it was unable to write the heartbeat to the storage
By the way, it did also log to the agent.log but this extra log has the benefit of ending up in the system log so you'll probably find it easier there. Existing logs:
2015-04-03 15:27:23,943 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 0
2015-04-03 15:28:23,944 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 1
2015-04-03 15:29:23,946 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 2
2015-04-03 15:30:23,948 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 3
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 4
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout; reboot the host
This closes#145
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When storage cannot be reached, it does not make sense to reboot as it will try to flush buffers, umount NFS mounts, etc. This will not work and thus cause a long delay. With this change, the box will reboot immediately (like pressing the reset button).
Detail: This gets rid of the patchdisk method of passing cmdline and
authorized_keys to KVM system VMs. It instead passes them to a virtio socket,
which the KVM guest reads from the character device /dev/vport0p1 during
cloud-early-config. Tested to work on CentOS 6.3 and Ubuntu 12.04. Should
work with even older versions of libvirt.
Signed-off-by: Marcus Sorensen <marcus@betterservers.com> 1362691685 -0700
host if there are multiple storage pools in a cluster.
The issue is as follows:
1. When CloudStack detects that a host is not responding to ping
requests it'll send a fence command for this host to another host in the
cluster.
2. The agent takes a long time to respond to this check if the storage
is fenced. This is because the agent checks if the first host is writing
to its heartbeat file on all pools in the cluster. It is doing this in a
sequential manner on all storage pool.
Making a fix to get rid of sleep, wait during HA. The behavior is now
similar to Xenserver.
RB: https://reviews.apache.org/r/6133/
Send-by:devdeep.singh@citrix.com