This feature adds the ability to create a new instance from a VM backup for dummy, NAS and Veeam backup providers. It works even if the original instance used to create the backup was expunged or unmanaged. There are two parts to this functionality:
* Saving all configuration details that the VM had at the time the backup was taken, and using them to create an instance from the backup.
* Enabling a user to expunge/unmanage an instance that has backups.
This PR allows attaching of GPU devices via PCI, mdev or VF to an Instance for KVM.
It allows the operator to discover the GPU devices on the KVM host and create a Compute Offering with GPU support based on the GPU devices available on the host. Once the operator has created the Compute Offering, users can use it to launch Instances with GPU devices.
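For context, GPU devices on a KVM host can be inspected with standard libvirt tooling; a minimal sketch (the PCI address shown is a placeholder, not taken from this PR):

```bash
# List PCI devices known to libvirt and inspect one for passthrough/mdev details.
# pci_0000_3b_00_0 is a placeholder for an actual GPU address on the host.
virsh nodedev-list --cap pci
virsh nodedev-dumpxml pci_0000_3b_00_0

# Cross-check with lspci to see the GPU model and the kernel driver in use.
lspci -nnk | grep -Ei -A3 'vga|3d controller'
```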
* NAS B&R Plugin enhancements
* Prevent printing mount opts which may include password by removing from response
* revert marvin change
* add sanity checks to validate minimum qemu and libvirt versions
* check if the user running the script is part of the libvirt group
* revert changes of restore expunged VM
* add code coverage ignore file
* remove check
* fix issue with listing schedules and add defensive checks
* redirect logs to agent log file
* add some more debugging
* remove test file
* prevent deletion of CKS cluster when VMs are associated with a backup offering
* delete all snapshot policies when bkp offering is disassociated from a VM
* Fix `updateTemplatePermission` when the UI is set to a language other than English (#9766)
* Fix updateTemplatePermission UI in non-English language
* Improve fix
---------
* Add nobrl in the mountopts for cifs file system
* Fix restoration of VM / volumes with cifs
* add cifs utils for el8
* add cifs-utils for ubuntu cloudstack-agent
* fix syntax error
* remove required constraint on both vmid and id params for the delete bkp schedule command
This is a simple NAS backup plugin for KVM which may later be expanded to other hypervisors. This backup plugin aims to use shared NAS storage on KVM hosts, such as NFS (or CephFS and others in the future), to back up fully cloned VMs for backup & restore operations. This may NOT be as efficient and performant as some of the other B&R providers, but it may be useful for KVM environments that are fine with only full-instance backups and limited functionality.
Design & Implementation follows the `networker` B&R plugin, which is simply:
- Implement B&R plugin interfaces
- Use cmd-answer pattern to execute backup and restore operations on KVM host when VM is running (or needs to be restored) - instead of a B&R API client, relies on answers from KVM agent which executes the operations
- Backups are full VM domain snapshots, copied to VM-specific folders on a NAS target (NFS) along with a domain XML
- Backup uses the libvirt feature https://libvirt.org/kbase/live_full_disk_backup.html, orchestrated via a virsh/bash script (nasbackup.sh) as libvirt-java lacks the bindings
- Supported instance volume storage for restore operations: NFS & local storage
Refer to the documentation PR for feature limitations and usage details:
https://github.com/apache/cloudstack-documentation/pull/429
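For illustration, the libvirt live full-disk backup that nasbackup.sh orchestrates can be sketched roughly as follows (domain name and paths are placeholders; the actual script handles many more cases):

```bash
# Placeholder values; the real nasbackup.sh derives these from the agent command.
DOMAIN=i-2-10-VM
TARGET=/mnt/nas-backup/$DOMAIN

mkdir -p "$TARGET"

# Save the domain XML alongside the disk backups so the VM can be re-defined on restore.
virsh dumpxml "$DOMAIN" > "$TARGET/domain.xml"

# Push-mode full backup of all disks while the VM keeps running
# (see https://libvirt.org/kbase/live_full_disk_backup.html).
virsh backup-begin "$DOMAIN"

# Poll the job until it completes; the resulting backup files are then copied to $TARGET.
virsh domjobinfo "$DOMAIN" --completed
```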
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
This extends the current KVM Host HA functionality to the StorPool storage plugin, with an option for the rest of the storage plugins to integrate with Host HA easily.
The extension works like the current NFS storage implementation. It can be used simultaneously with NFS and StorPool storage, or with StorPool primary storage only.
If it is used with different primary storages like NFS and StorPool and one of the storage health checks fails, there is an option to report the failure to the management server via the global setting kvm.ha.fence.on.storage.heartbeat.failure. This option is disabled by default; when enabled, the Host HA service will continue with the checks on the host and will eventually fence it.
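As an illustration, the new behaviour can be toggled through the global setting, e.g. via CloudMonkey (a sketch; only the setting name is taken from this change):

```bash
# Enable fencing on storage heartbeat failure (disabled by default).
cmk update configuration name=kvm.ha.fence.on.storage.heartbeat.failure value=true

# Verify the current value.
cmk list configurations name=kvm.ha.fence.on.storage.heartbeat.failure
```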
When scripts/vm/hypervisor/kvm/kvmvmactivity.sh is called with an incorrect file name, an error is printed which is then interpreted as output from the script.
When an incorrect file name is passed the script prints out:
stat: cannot stat ‘b51d7336-d964-44ee-be60-bf62783dabc’: No such file or directory
=====> DEAD <======
The checkingHB() method in KVMHAVMActivityChecker.java expects just
=====> DEAD <======
but gets the unexpected error message as well and therefore interprets the VM as alive.
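A minimal sketch of the kind of defensive check that avoids this (illustrative only, not the actual patch):

```bash
# Make sure the file exists before calling stat, so stat's error text never
# leaks into the output that KVMHAVMActivityChecker parses.
hb_file="$1"
if [ ! -f "$hb_file" ]; then
    # Emit only the single, expected marker line instead of a stat error.
    echo "=====> DEAD <======"
    exit 0
fi
last_update=$(stat -c %Y "$hb_file")
```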
This introduces a new patching script for patching systemvms on KVM
using qemu-guest-agent that runs inside the systemvm on startup. It
also removes the vport device which was previously used by the legacy
patching script, and instead uses the modern, uniform guest agent
vport for host-guest communication.
Also updates the systemvmtemplate build config to use the latest Debian
9.9.0 iso.
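For reference, host-guest communication through the guest agent channel can be verified directly with libvirt; a small sketch (the systemvm name is a placeholder):

```bash
# Check that the qemu-guest-agent inside the systemvm is reachable from the host.
virsh qemu-agent-command s-1-VM '{"execute":"guest-ping"}'

# The guest agent also reports its capabilities, which is useful when debugging patching.
virsh qemu-agent-command s-1-VM '{"execute":"guest-info"}' --pretty
```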
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
On actual testing, I could see that the kvmheartbeat.sh script fails on NFS
server failure and only stops the agent. Any HA VMs could be launched
on different hosts, and recovery of the NFS server could lead to a state
where an HA-enabled VM runs on two hosts, which can potentially cause
disk corruption. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between checks/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes regression introduced in PR #2295:
- Pass assign=true to fetch new public IP
- Use wait_until instead of sleep+wait in tests
- Loop through list of public IP ranges to match the systemvm gateway
- Fix potential NPE seen when adding simulator host(s)
- Removes aria2 installation from setup_agent.sh using yum; it's already a
dependency of the cloudstack-agent package
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This feature allows using templates and ISOs on KVM while avoiding secondary storage as an intermediate cache. The virtual machine deployment process is enhanced to support bypassed registered templates and ISOs, delegating the work of downloading them to primary storage to the KVM agent instead of the SSVM agent.
Template and ISO registration:
- When hypervisor is KVM, a checkbox is displayed with 'Direct Download' label.
- API methods registerTemplate and registerISO are both extended with this new parameter directdownload.
- On template or ISO registration, no download job is sent to SSVM agent, CloudStack would only persist an entry on template_store_ref indicating that template or ISO has been marked as 'Direct Download' (bypassing Secondary Storage). These entries are persisted as:
template_id = Template or ISO id on vm_template table
store_id = NULL
download_state = BYPASSED
state = Ready
(Note: these entries allow users to deploy virtual machine from registered templates or ISOs)
- A URL validation command is sent to a random KVM host to check if the template/ISO location can be reached. Metalinks are also supported by this feature. In the case of a metalink, it is fetched and a URL check is performed on each of its URLs.
- Checksum should be provided as indicated on #2246: {ALGORITHM}CHKSUMHASH
- After the template or ISO is registered, it is displayed in the UI (a registration sketch follows below)
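A registration sketch via CloudMonkey (URL, name, checksum and UUIDs are placeholders; the directdownload parameter is the one introduced here):

```bash
# Register a template without staging it on secondary storage (KVM only).
cmk register template name=centos7-dd displaytext=centos7-dd \
    url=http://example.com/templates/centos7.qcow2 \
    hypervisor=KVM format=QCOW2 ostypeid=<os-type-uuid> zoneid=<zone-uuid> \
    checksum='{SHA-256}<hash>' directdownload=true
```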
Virtual machine deployment:
When a 'Direct Download' template is selected for deployment, CloudStack delegates template downloading to the destination storage pool via the destination host, using a new pluggable download manager.
The download manager handles template downloading depending on the URL protocol. In the case of HTTP, request headers can be set by the user via vm_template_details. Those details should be persisted as:
Key: HTTP_HEADER
Value: HEADERNAME:HEADERVALUE
In the case of HTTPS, a new API method uploadTemplateDirectDownloadCertificate is added to allow the user to import a client certificate into all KVM hosts' keystores before deployment.
After the template or ISO is downloaded to primary storage, the usual entry is persisted in template_spool_ref, indicating the mapping between the template/ISO and the storage pool.
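For example, a rough CloudMonkey sketch of both points above (the HTTP_HEADER key/value convention is as described; the details map syntax and the certificate API parameters shown here are assumptions, not confirmed by this PR):

```bash
# Pass an HTTP request header to the download manager via template details
# (key HTTP_HEADER, value HEADERNAME:HEADERVALUE; template id is a placeholder).
cmk update template id=<template-uuid> 'details[0].HTTP_HEADER=Authorization:Bearer <token>'

# Import a client certificate into the KVM hosts' keystores before an HTTPS direct download.
cmk upload templatedirectdownloadcertificate name=dd-cert hypervisor=KVM \
    zoneid=<zone-uuid> certificate="$(cat client-cert.pem)"
```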
- All tests should pass on KVM, Simulator
- Add test cases covering FSM state transitions and actions
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Host-HA offers investigation, fencing and recovery mechanisms for hosts that are
malfunctioning for any reason. It uses activity and health checks to determine
the current host state, based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor-agnostic way, with two separate
implementations of the driver/provider for the Simulator and KVM hypervisors. The
framework also allows for implementations of other hypervisor-specific providers
in the future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
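For illustration, once available the feature is driven per host through the new HA APIs; a CloudMonkey sketch (the provider name and exact parameters here are assumptions based on the framework described above):

```bash
# Configure the HA provider for a KVM host, then enable Host-HA on it.
cmk configure haforhost hostid=<host-uuid> provider=kvmhaprovider
cmk enable haforhost hostid=<host-uuid>

# The host's HA state can then be observed in the host response.
cmk list hosts id=<host-uuid>
```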
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the agreed-upon URL on download.cloudstack.org in various
SQL files and misc scripts.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
more detailed error if the host file is not found or cannot be opened
using mkstemp and mkdtemp for improved security
improve resource cleanup in error conditions in the unit test
As requested here: https://github.com/apache/cloudstack/pull/1495
No scripts are using Perl, so that install requirement can be removed.
The new scripts use standard Python packages only.
Includes extensive unit tests.
This setting is accepted on CentOS 6 / RHEL 6 but does nothing, as the
"cpu" cgroup is not mounted. On CentOS 7 / RHEL 7 systemd does
mount cgroups and "cpu" is co-mounted with "cpuacct". Hence, if
we specify "cpu" this results in an error, because it can
only use both of them, or none.
By removing the setting, we rely on the default of qemu, which
is:
cgroup_controllers = ["cpu", "devices", "memory", "blkio", "cpuacct", "net_cls"]
They will only be used if they are really mounted, so this will
work on both version 6 and 7.
The 'fix script' didn't work well, as after a reboot you'd still have qemu
throwing errors. Now we can handle the co-mounted cgroups.
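For reference, whether "cpu" is co-mounted with "cpuacct" can be checked directly on the host; a quick sketch:

```bash
# On CentOS 7 / RHEL 7, systemd co-mounts the controllers, e.g.:
#   cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,...,cpuacct,cpu 0 0
grep cgroup /proc/mounts | grep -E 'cpu(acct)?'
```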
The recently discussed improvement has the risk that if 'sync' hangs, the reboot may be delayed in the same way as it would be with the 'reboot' command. To work around this, we're adding a 5 second timeout. If it cannot sync in 5 seconds, it will not succeed anyway and we should proceed with the reset.
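A minimal sketch of the idea (not the exact script change): bound the sync with a timeout and then trigger the hard reset regardless.

```bash
# Make sure sysrq is enabled, give sync at most 5 seconds, then hard-reset the box.
echo 1 > /proc/sys/kernel/sysrq
timeout -s KILL 5 sync
echo b > /proc/sysrq-trigger
```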
@snuf: Could we use your OVM3 heartbeat script for other hypervisors as well? One way to do it seems like a nice idea :-)
As discussed with @wido, @pyr and @nuxro, added an extra log line.
Tested it and it logs fine (tested to local disk) when syncing first:
Apr 3 15:31:23 mcctest2 heartbeat: kvmheartbeat.sh system because it was unable to write the heartbeat to the storage
By the way, it did also log to the agent.log, but this extra log has the benefit of ending up in the system log, so you'll probably find it more easily there. Existing logs:
2015-04-03 15:27:23,943 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 0
2015-04-03 15:28:23,944 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 1
2015-04-03 15:29:23,946 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 2
2015-04-03 15:30:23,948 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 3
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout, retry: 4
2015-04-03 15:31:23,950 WARN [kvm.resource.KVMHAMonitor] (Thread-24:null) write heartbeat failed: timeout; reboot the host
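For reference, the extra system-log line is the kind of entry produced by the standard logger utility; a sketch (the exact wording used by the script may differ):

```bash
# Emit a heartbeat-failure message to syslog so it shows up in the system log
# in addition to the agent.log entries above.
logger -t heartbeat "kvmheartbeat.sh rebooting system because it was unable to write the heartbeat to the storage"
```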
This closes #145
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When storage cannot be reached, it does not make sense to reboot as it will try to flush buffers, umount NFS mounts, etc. This will not work and thus cause a long delay. With this change, the box will reboot immediately (like pressing the reset button).
Detail: This gets rid of the patchdisk method of passing cmdline and
authorized_keys to KVM system VMs. It instead passes them to a virtio socket,
which the KVM guest reads from the character device /dev/vport0p1 during
cloud-early-config. Tested to work on CentOS 6.3 and Ubuntu 12.04. Should
work with even older versions of libvirt.
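A guest-side sketch of the idea (payload format and target path are illustrative; cloud-early-config does the real parsing into cmdline and authorized_keys):

```bash
# Inside the system VM, read the payload the host wrote to the virtio serial
# port and stash it for the rest of cloud-early-config to process.
if [ -c /dev/vport0p1 ]; then
    timeout 60 cat /dev/vport0p1 > /var/cache/cloud/payload
fi
```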
Signed-off-by: Marcus Sorensen <marcus@betterservers.com> 1362691685 -0700
Fix for the long time taken to fence a host if there are multiple storage pools in a cluster.
The issue is as follows:
1. When CloudStack detects that a host is not responding to ping
requests it'll send a fence command for this host to another host in the
cluster.
2. The agent takes a long time to respond to this check if the storage
is fenced. This is because the agent checks whether the first host is writing
to its heartbeat file on all pools in the cluster, and it does this in a
sequential manner on all storage pools.
Making a fix to get rid of the sleep/wait during HA. The behavior is now
similar to XenServer.
RB: https://reviews.apache.org/r/6133/
Send-by: devdeep.singh@citrix.com