This does not remove VM entries in dbags when hostnames match. The
current codebase already removes entry when a VM is stopped/removed so
we don't need to handle lazy removal. This will allow a VM on
multiple-tiers in a VPC to get dns/dhcp rules as expected.
This also fixes the issue of dhcp_release based on a specific interface and
removes dhcp/dns entry when a nic is removed on a guest VM.
Fixes#3273
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The VR code has provision for inserting rules at the top or bottom by specifying "front" as the second parameter to self.fw.append. However, there are a number of cases where someone has been unaware of this and added a rule with the pattern self.fw.append(["mangle", "", "-I PREROUTING".... This causes the code to check for the rule already being present to fail, and duplicate rules end up being added.
This PR fixes two of these cases which apply to adding static NAT rules. I am aware of more of these cases, but I don't have the ability to easily test the outcome of fixing them. I'm happy to add these in if you're confident that the automated tests will be sufficient. Searching for "-I (case sensitive) finds these.
The code for dealing with "front" is included below to show that this shouldn't have any ill effects:
if fw[1] == "front":
cpy = cpy.replace('-A', '-I')
Fixes#3177
To make sure that a qemu2-image won't be corrupted by the snapshot deletion procedure which is being performed after copying the snapshot to a secondary store, I'd propose to put a VM in to suspended state.
Additional reference: https://bugzilla.redhat.com/show_bug.cgi?id=920020#c5Fixes#3193
If mtu= value is defined in the parameters received by the SSVM agent
per the secstorage.vm.mtu.size setting, it applies the MTU setting on
eth1 which is the storage/management nic.
Fixes#3369
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Improvements on upload direct download certificates
* Move upload direct download certificate logic to KVM plugin
* Extend unit test certificate expiration days
* Add marvin tests and command to revoke certificates
* Review comments
* Do not include revoke certificates API
Since the CloudStack virtual router was redesigned on version 4.6 it has been observed that the DHCP leases file is not persistent across network operations. This causes conflicts on guest VMs static IPs, causing these static IPs to not be renewed by the DHCP server running on isolated and VPC networks' virtual routers (dnsmasq). On stopping or destroying a VM, its dhcp/dns records are not removed from the virtual router causing ghost effects.
Fixes#3272Fixes#3354
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR resolves 2 issues related to Virtual Routers with multiple public interfaces, and works around a third.
- Fixes#3353 - Adds missing throw routes for eth0/eth1 to eth3+ when there are >1 public IPs
- Fixes#3168 - Incorrect marks set on some static NAT rules (some code references were changed from hex(int(interfacenum)) to hex(100 + int(interfacenum)) - this change just adds the remaining ones
- Fixes#3352 - Work around that sends Gratuitous ARP messages when a HA VR becomes master to work around the problem of the MAC address being different between HA VRs. If that issue is fixed properly (i.e. a database entry for the subsequent interfaces so they can be static) then this is unnecessary, though should not cause any problems.
when we dedicate public ip range to a domain but some ips are used by an account in the domain,
the operation should be allowed but actually fails for now.
It is because cloudstack check if ips are used by same account by account name,
However, accountName is null when dedicate public ip range to a domain.
Modify the code to check account id only when dedicate ip range to account.
In virtual routers, there are different dnsmasq settings for default nic and non-default nic on vm.
We need to update dhcp informations on network vrs when default nic is changed.
For example, if 172.16.1.135 is non-default nic of vm VPC1-001-001, then
root@r-22-VM:~# cat /etc/dhcphosts.txt
02:00:1d:15:00:05,set:172_16_1_135,172.16.1.135,VPC1-001-001,710h
root@r-22-VM:~# cat /etc/dhcpopts.txt
172_16_1_135,3
172_16_1_135,6
172_16_1_135,15
If it is default nic,then
root@r-22-VM:~# cat /etc/dhcpopts.txt
root@r-22-VM:~# cat /etc/dhcphosts.txt
02:00:1d:15:00:05,172.16.1.135,VPC1-001-001,757h
Fixes#3201
This fixes potential NPE case when memory hotpluggability is checked
based on the guest OS descriptor.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When enable static nat in a vpc on UI, it only lists the primary and secondary ips of first nic of a vm, no matter which vpc tier is selected. The same issue happens when add a vm to load balancer.
Fixes#3334
If there are multiple IPs in different subnet assigned to a VPC, after restarting VPC with cleanup, the VRs will be FAULT state.
Step to reproduce:
(1) create vpc, source nat IP is 10.11.118.X
(2) assign two public IPs in other subnet to this VPC. 10.11.119.X and 10.11.119.Y
(3) deploy two vms in the vpc, and enable static nat 10.11.119.X and 10.11.119.Y to these two vms
(4) restart vpc with cleanup. There are more than 1 nic allocated for 10.11.119 to new VRs
Logs as below:
2019-05-10 14:12:24,652 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.118.157-vlan://untagged
2019-05-10 14:12:24,676 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.119.110-vlan://119
2019-05-10 14:12:24,699 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.119.110-vlan://119
2019-05-10 14:12:24,723 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.119.110-vlan://119
This is a regression issue caused by commit 1d382e0
See #3339: a runtime exception is thrown but it should be converted to an error return. Wrapping it in a CloudRuntimeException should do the trick.
Fixes#3339
On first startup, the management server creates and saves a random
ssh keypair using ssh-keygen in the database. The command does
not specify keys in PEM format which is not the default as generated
by latest ssh-keygen tool.
The systemvmtemplate always needs re-building whenever there is a change
in the cloud-early-config file. This also tries to fix that by introducing a
stage 2 bootstrap.sh where the changes specific to hypervisor detection
etc are refactored/moved. The initial cloud-early-config only patches
before the other scripts are called.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Keep connection alive when on maintenance
* Refactor cancel maintenance and unit tests
* Add marvin tests
* Refactor
* Changing the way we get ssh credentials
* Add check on SSH restart and improve marvin tests
This introduces a new patching script for patching systemvms on KVM
using qemu-guest-agent that runs inside the systemvm on startup. This
also removes the vport device which was previously used by the legacy
patching script and instead uses the modern and new uniform guest
agent vport for host-guest communication.
Also updates the sytemvmtemplate build config to use the latest Debian
9.9.0 iso.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
With this patch b766bf7
we started tracking disks in attaching state so that other attach request can fail gracefully. However this missed the case where disks were in allocated state but attach was requested.
For the use case where users want to attach disk in allocated state but not ready, we need to have allocated-attaching transition as well. We must take care of returning to the original state - allocated or ready - when attach request has completed.
For the use case of unstarted vm's the disk must proceed as follows - "Allocated" -> Attaching -> Allocated. When VM is started, the disk is "created" and pool is assigned. For the use case of started VMs it's more trivial and disk proceeds as follows - Ready -> Attaching -> Ready.
Test this by creating a VM with "startvm=false", create a disk and try attaching it in allocated state. It would give an exception on latest 4.11 but will be fixed on this patch.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This may slow down CI and release, but ensures that unit tests always
run as part of the packaging build process.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue that TLSv1 and TLSv1.1 are still used by CloudStack
management server to communicate with VMware vCenter server. With the
current defaults, the setup/deployment on VMware fails. Users/admins
can however setup the security file according to their env needs to
disable TLSv1 and TLSv1.1 for server sockets (8250/agent service for
example).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Fixes intermittent failures:
- Add a pause to avoid tests restarting before VRs recovering from HA have fully booted.
- Add some pauses to allow services to restart and hosts to reconnect before continuing tests.
- Adding a loop around this method because sometimes VRs are overloaded and just respond with exception, so it'll catch it and try 5 times with 30sec cooldown. If that fails as well it'll fail the test.
- wait until host is up using explicit check
* systemd: fix services to allow TLS configurations via java.security.ciphers
This fixes the management server and systemd services to allow the
java.security.ciphers file to configure disabled TLS protocols and
algorithms. This also cleans up systemd service files for agent and
usage server.
This fixes#3140
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* configure: fix travis failure due pycodestyle error
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* The snapshot.backup.rightafter configuration variable was removed by:
SHA: 6bb0ca2f854
This adds it back, though named snapshot.backup.to.secondary now instead.
This global parameter, once set, will allow you to prevent automatic backups of
snapshots to secondary storage, unless they're actually needed.
Fixes#3096
* updates per review
This cleanups management server default file, the `cloud.jks` is no
longer created by the management server but instead created in-memory
by the root CA plugin on management server startup.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This ensures that the systemvm agent (cloud.service) is not restarted
on certificate import. The agent has an inbuilt logic to attempt reconnection.
If the old certificates/keystore is invalid agent will attempt reconnection.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The static method syncVolumeToRootFolder() from VmwareStorageLayoutHelper.java:146 has been incorrectly called and leads to an infinite recursive call that ends up in a StackOverflowError. This PR fixes this.
public static void syncVolumeToRootFolder(DatacenterMO dcMo, DatastoreMO ds, String vmdkName, String vmName) throws Exception { syncVolumeToRootFolder(dcMo, ds, vmdkName, null); } -> public static void syncVolumeToRootFolder(DatacenterMO dcMo, DatastoreMO ds, String vmdkName, String vmName) throws Exception { syncVolumeToRootFolder(dcMo, ds, vmdkName, vmName, null); }
A corner case was found on 4.11.2 for #2493 leading to an infinite loop in state PrepareForMaintenance
To prevent such cases, in which failed migrations are detected but still running on the host, this feature adds a new cluster setting host.maintenance.retries which is the number of retries before marking the host as ErrorInMaintenance if migration errors persist.
How Has This Been Tested?
- 2 KVM hosts, pick one which has running VMs as H
- Block migrations ports on H to simulate failures on migrations:
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations' iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations
- Put host H in Maintenance
- Observe that host is indefinitely in PrepareForMaintenance state (after this fix it goes into ErrorInMaintenance after retrying host.maintenance.retries times)
Users reported that they weren't getting all apis listed in cloudmonkey when running a sync. After some debugging, I found that the problem is that the ApiDiscoveryService is calling ApiRateLimitServiceImpl.checkAccess(), so the results of the listApis command are being truncated because Cloudstack believes the user has exceeded their API throttling rate.
I enabled throttling with a 25 request per second limit. I then created a test role with only list* permissions and assigned it to a test user. When this user calls listApis, they will typically receive anywhere from 15-18 results. Checking the logs, you see The given user has reached his/her account api limit, please retry after 218 ms..
I raised the limit to 200 requests per second, restarted the management server and tried again. This time I got 143 results and no log messages about the user being throttled.
* travis: fail fast if --with-marvin fails with nose
Install missing dependency pycrypto.
This fixes issue with recent Travis runs which gave incorrect results
around smoketests with simulator where each test run failed with an
error like "nosetests: error: no such option: --with-marvin".
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Install CentOS 7 e.g. Build 1804 and Java build 1.8.0_181
if you inspect systemd in debug mode you will see some errors
1.
permission of the cloudstack-managment.service are not corretly set
2.
invalid classpath specified. it seems the string which is used will be divided... we now we use ${..} like the lines above ... confused
When vxlan://untagged is used for public (or guest) network, use the
default public/guest bridge device same as how vlan://untagged works.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Currently secondary ipv6 addresses are added to the ipv4 ipset in security_group.py.
This doesn't work, so this patch adds a function to split a set of ips in ipv4 and ipv6 addresses.
Both the default_network_rules and network_rules_vmSecondaryIp functions now utilise this function and add the ips to the appropriate ipsets.
Doing a ignored RC5 tree merge to get 4.11.2.0 tag on 4.11 branch. This
should have been done before merging the commit to move to
4.11.3.0-SNAPSHOT version on 4.11 branch.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Due to issue described in Surefix bug:
https://issues.apache.org/jira/browse/SUREFIRE-1588
Debian-based users/developers can no longer build CloudStack 4.11+
branches. The other workaround is to have the following jvm property:
jdk.net.URLClassPath.disableClassPathURLCheck=true
Signed-off-by: Rohit Yadav <rohit@apache.org>