This may slow down CI and release, but ensures that unit tests always
run as part of the packaging build process.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes the issue that TLSv1 and TLSv1.1 are still used by CloudStack
management server to communicate with VMware vCenter server. With the
current defaults, the setup/deployment on VMware fails. Users/admins
can however setup the security file according to their env needs to
disable TLSv1 and TLSv1.1 for server sockets (8250/agent service for
example).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Fixes intermittent failures:
- Add a pause to avoid tests restarting before VRs recovering from HA have fully booted.
- Add some pauses to allow services to restart and hosts to reconnect before continuing tests.
- Adding a loop around this method because sometimes VRs are overloaded and just respond with exception, so it'll catch it and try 5 times with 30sec cooldown. If that fails as well it'll fail the test.
- wait until host is up using explicit check
* systemd: fix services to allow TLS configurations via java.security.ciphers
This fixes the management server and systemd services to allow the
java.security.ciphers file to configure disabled TLS protocols and
algorithms. This also cleans up systemd service files for agent and
usage server.
This fixes#3140
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* configure: fix travis failure due pycodestyle error
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* The snapshot.backup.rightafter configuration variable was removed by:
SHA: 6bb0ca2f854
This adds it back, though named snapshot.backup.to.secondary now instead.
This global parameter, once set, will allow you to prevent automatic backups of
snapshots to secondary storage, unless they're actually needed.
Fixes#3096
* updates per review
This cleanups management server default file, the `cloud.jks` is no
longer created by the management server but instead created in-memory
by the root CA plugin on management server startup.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This ensures that the systemvm agent (cloud.service) is not restarted
on certificate import. The agent has an inbuilt logic to attempt reconnection.
If the old certificates/keystore is invalid agent will attempt reconnection.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
The static method syncVolumeToRootFolder() from VmwareStorageLayoutHelper.java:146 has been incorrectly called and leads to an infinite recursive call that ends up in a StackOverflowError. This PR fixes this.
public static void syncVolumeToRootFolder(DatacenterMO dcMo, DatastoreMO ds, String vmdkName, String vmName) throws Exception { syncVolumeToRootFolder(dcMo, ds, vmdkName, null); } -> public static void syncVolumeToRootFolder(DatacenterMO dcMo, DatastoreMO ds, String vmdkName, String vmName) throws Exception { syncVolumeToRootFolder(dcMo, ds, vmdkName, vmName, null); }
A corner case was found on 4.11.2 for #2493 leading to an infinite loop in state PrepareForMaintenance
To prevent such cases, in which failed migrations are detected but still running on the host, this feature adds a new cluster setting host.maintenance.retries which is the number of retries before marking the host as ErrorInMaintenance if migration errors persist.
How Has This Been Tested?
- 2 KVM hosts, pick one which has running VMs as H
- Block migrations ports on H to simulate failures on migrations:
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations' iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations
- Put host H in Maintenance
- Observe that host is indefinitely in PrepareForMaintenance state (after this fix it goes into ErrorInMaintenance after retrying host.maintenance.retries times)
Users reported that they weren't getting all apis listed in cloudmonkey when running a sync. After some debugging, I found that the problem is that the ApiDiscoveryService is calling ApiRateLimitServiceImpl.checkAccess(), so the results of the listApis command are being truncated because Cloudstack believes the user has exceeded their API throttling rate.
I enabled throttling with a 25 request per second limit. I then created a test role with only list* permissions and assigned it to a test user. When this user calls listApis, they will typically receive anywhere from 15-18 results. Checking the logs, you see The given user has reached his/her account api limit, please retry after 218 ms..
I raised the limit to 200 requests per second, restarted the management server and tried again. This time I got 143 results and no log messages about the user being throttled.
* travis: fail fast if --with-marvin fails with nose
Install missing dependency pycrypto.
This fixes issue with recent Travis runs which gave incorrect results
around smoketests with simulator where each test run failed with an
error like "nosetests: error: no such option: --with-marvin".
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This increases and uses a default 15mins timeout for VR scripts and for
KVM agent increases timeout from 60s to 5mins. The timeout can
specifically occur when keystore does not get enough entropy from CPU
and script gets killed due to timeout. This is a very specific corner
case and generally should not happen on baremetal/prod environment, but
sometimes seen in nested/test environments.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Install CentOS 7 e.g. Build 1804 and Java build 1.8.0_181
if you inspect systemd in debug mode you will see some errors
1.
permission of the cloudstack-managment.service are not corretly set
2.
invalid classpath specified. it seems the string which is used will be divided... we now we use ${..} like the lines above ... confused
When vxlan://untagged is used for public (or guest) network, use the
default public/guest bridge device same as how vlan://untagged works.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Currently secondary ipv6 addresses are added to the ipv4 ipset in security_group.py.
This doesn't work, so this patch adds a function to split a set of ips in ipv4 and ipv6 addresses.
Both the default_network_rules and network_rules_vmSecondaryIp functions now utilise this function and add the ips to the appropriate ipsets.
Doing a ignored RC5 tree merge to get 4.11.2.0 tag on 4.11 branch. This
should have been done before merging the commit to move to
4.11.3.0-SNAPSHOT version on 4.11 branch.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Due to issue described in Surefix bug:
https://issues.apache.org/jira/browse/SUREFIRE-1588
Debian-based users/developers can no longer build CloudStack 4.11+
branches. The other workaround is to have the following jvm property:
jdk.net.URLClassPath.disableClassPathURLCheck=true
Signed-off-by: Rohit Yadav <rohit@apache.org>
The view "service_offering_view" doesn't include removed SOs, as a result when SO is removed, the bug happens. The PR introduces a change for resource calculation changing "service_offering_view" to "service_offering" table which has all service offerings.
Must be fixed in:
4.12
4.11
Fixes: #3009
After upgrade existing environments to 4.11, ConfigDrive cannot be enabled for existing zones due to missing entry on 'physical_network_service_providers' table.
On actual testing, I could see that kvmheartbeat.sh script fails on NFS
server failure and stops the agent only. Any HA VMs could be launched
in different hosts, and recovery of NFS server could lead to a state
where a HA enabled VM runs on two hosts and can potentially cause
disk corruptions. In most cases, VM disk corruption will be worse than
VM downtime. I've kept the sleep interval between check/rounds but
reduced it to 10s. The change in behaviour was introduced in #2722.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Committed at f0491d5c72c3161777ca49ae809606a6704df5ff (#2979).
After upgrade from CS 4.10 to CS 4.11, multiple VRs did not start through.
It did not properly defer the finalize config in update_config.py.
Apparently, the json files are now called differently: where it used to
be vm_dhcp_entry.json it now has a uuid added, for example
vm_metadata.json.4d727b6e-2b48-49df-81c3-b8532f3d6745.
The if statement that checks if the finalize can be safely deferred
therefore no longer matches. This PR contains a fix so finalize is
defered again.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Prevents errors while migrating VM from ISO:
Test 1: Deploy VM from ISO -> Live migrate VM to another host -> ERROR
Test 2: Register ISO using Direct Download on KVM -> Deploy VM from ISO -> Live migrate VM to another host -> ERROR
- Prevent NullPointerException migrating VM from ISO
- Prevent mount secondary storage on ISO direct downloads on KVM
After upgrade from CS 4.10 to CS 4.11, multiple VRs did not start through.
It did not properly defer the finalize config in update_config.py.
Apparently, the json files are now called differently: where it used to
be vm_dhcp_entry.json it now has a uuid added, for example
vm_metadata.json.4d727b6e-2b48-49df-81c3-b8532f3d6745.
The if statement that checks if the finalize can be safely deferred
therefore no longer matches. This PR contains a fix so finalize is
defered again.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Improved performance on creating VM for KVM virtualization.
On a huge hosts every "ifconfig | grep" takes a lot of time (about 2.5-3 minutes on hosts with 500 machines). For example: ip link show dev $vlanDev > /dev/null is faster than ifconfig |grep -w $vlanDev > /dev/null. But using ip command is much better. Using this patch you can create 500s machine in 10 seconds. You don't need slow ifconfig prints anymore.
This force stops old VRs when performing rolling restart with
cleanup=true. This will ensure that VRs are powered off quickly than
wait longer for the normal ACPI shutdown. During testing, it was found
on VMware where VM stops are slow compared to XenServer and KVM.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Install any additional plugin jars in the lib directory to be picked up
by the classpath builder, otherwise one has to manually add the jar
to /etc/default/cloudstack-management after installation. This fixes
the issue for `mysql-ha` plugin.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This fixes an issue for systemvms (CPVM and SSVM) on VMware, as eth0
is not programmed (link-local) the networking.service fails to start
which is a dependency for cloud-postinit service. When cloud-postinit
service fails to start/run, it fails to start the agent (cloud) process.
This fixes the smoketest failures we saw in case of VMware 6.5 with
4.11.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Unify checksum API output for templates and ISOs: not list the checksum algorithm on:
KVM direct downloads
On in progress normal template downloads. The algorithm is shown on the listtemplates API, but after it is downloaded it is not shown anymore.
This adds a global setting for admins who may not want the rolling
restart of routers or are seeing any issues around it. In future, this
setting may be removed.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When agent is stopped, don't allow reconnection. Previously this would
send a shutdown command to the management server which would put the
host state to Disconnected but then agent's reconnection logic may kick
in sometimes which would connect the agent to the management server
but then the agent process would terminate causing the host to be
put in Alert state (due to ping timeout or it waiting too long).
This fixes the issue by ensuring that when the agent is stopped, it
does not reconnect to the management server.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>