16 Commits

Author SHA1 Message Date
Gabriel Beims Bräscher
b4db3db617
Use default timeout and retransmission values for the NFS mount. (#6019)
This also allows the mount command to apply NFS mount custom values set by ADMINS via '/etc/nfsmount.conf'.
2022-03-02 09:07:08 -03:00
Daniel Augusto Veronezi Salvador
82df04ecc8
Improve HA logs (#5241)
Co-authored-by: GutoVeronezi <daniel@scclouds.com.br>
2021-07-30 21:13:16 +02:00
Rohit Yadav
c6e53f6cc6
kvm: reset KVM host on heartbeat failure (#2984)
On actual testing, I could see that kvmheartbeat.sh script fails on NFS
server failure and stops the agent only. Any HA VMs could be launched
in different hosts, and recovery of NFS server could lead to a state
where a HA enabled VM runs on two hosts and can potentially cause
disk corruptions. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between check/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-30 15:13:59 +05:30
Slair1
023dcec5ef CLOUDSTACK-10310 Fix KVM reboot on storage issue (#2722) 2018-08-20 10:28:03 +02:00
Remi Bergsma
7bce656b40 make sure sync cannot block reboot
The recent discussed improvement has the risk that if 'sync' hangs, the reboot may be delayed in the same way as the 'reboot' command would do. To work around, we're adding a 5 second timeout. If it cannot sync in 5 seconds, it will not succeed anyway and we should proceed the reset.

@snuf: Could we use your OVM3 heartbeat script for other hypervisors as well? One way to do it seems like a nice idea :-)
2015-04-09 12:18:21 +02:00
Remi Bergsma
c59308b0ee write logfile just before rebooting the host
As discussed with @wido @pyr and @nuxro added an extra log line.

Tested it and it logs fine (tested to local disk) when syncing first:
Apr  3 15:31:23 mcctest2 heartbeat: kvmheartbeat.sh system because it was unable to write the heartbeat to the storage

By the way, it did also log to the agent.log but this extra log has the benefit of ending up in the system log so you'll probably find it easier there. Existing logs:
2015-04-03 15:27:23,943 WARN  [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 0
2015-04-03 15:28:23,944 WARN  [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 1
2015-04-03 15:29:23,946 WARN  [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 2
2015-04-03 15:30:23,948 WARN  [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 3
2015-04-03 15:31:23,950 WARN  [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 4
2015-04-03 15:31:23,950 WARN  [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout; reboot the host

This closes #145

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2015-04-04 14:17:37 +05:30
Remi Bergsma
2b41f98346 reboot much faster in case of storage failure
When storage cannot be reached, it does not make sense to reboot as it will try to flush buffers, umount NFS mounts, etc. This will not work and thus cause a long delay. With this change, the box will reboot immediately (like pressing the reset button).
2015-04-01 19:45:16 +02:00
Edison Su
f497c7c031 Bug: HA takes a lot of time to migrate VMs (trigger HA) to another KVM
host if there are multiple storage pools in a cluster.

The issue is as follows:
1. When CloudStack detects that a host is not responding to ping
requests it'll send a fence command for this host to another host in the
cluster.
2. The agent takes a long time to respond to this check if the storage
is fenced. This is because the agent checks if the first host is writing
to its heartbeat file on all pools in the cluster. It is doing this in a
sequential manner on all storage pool.

Making a fix to get rid of sleep, wait during HA. The behavior is now
similar to Xenserver.

RB: https://reviews.apache.org/r/6133/
Send-by:devdeep.singh@citrix.com
2012-07-25 10:17:09 -07:00
David Nalley
d630fa8697 license header changes for scripts folder from Chip Childers 2012-06-23 00:58:00 -04:00
frank
2f634c0913 Switch to Apache license 2012-04-03 04:50:05 -07:00
frank
52610ffcb3 add copyright header to shell scripts 2012-01-11 18:41:53 -08:00
Edison Su
47380dc20e fix add host 2011-05-12 15:03:15 -04:00
Edison Su
d8ee7d9fc3 if storage network disconnected, reboot the host 2011-04-14 17:46:54 -04:00
Frank
92155522f2 Add license header to files 2011-04-14 11:23:14 -07:00
edison
007783f6cf add more logs when taking heartbeat, and make ha enabled even in oss 2010-11-10 09:49:03 -08:00
edison
4bc63e5c32 Enable KVM HA on nfs storage 2010-11-09 22:03:22 -08:00