10218 Commits

Author SHA1 Message Date
Nicolas Vazquez
c9ce3e2344 router: Persistent DHCP leases file on VRs and cleanup /etc/hosts on VM deletion (#3351)
Since the CloudStack virtual router was redesigned in version 4.6, it has been observed that the DHCP leases file is not persistent across network operations. This causes conflicts on guest VMs' static IPs, so these static IPs are not renewed by the DHCP server (dnsmasq) running on the virtual routers of isolated and VPC networks. On stopping or destroying a VM, its DHCP/DNS records are not removed from the virtual router, causing ghost effects.

Fixes #3272
Fixes #3354

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-06-03 17:04:16 +05:30
ustcweizhou
b8522c97cb server: allow dedicate ip range to a domain if ips are used by an account in the domain (#3206)
When we dedicate a public IP range to a domain but some IPs are used by an account in the domain,
the operation should be allowed, but it currently fails.
This is because CloudStack checks whether the IPs are used by the same account by comparing account names;
however, accountName is null when dedicating a public IP range to a domain.

Modify the code to check the account id only when dedicating an IP range to an account.
2019-05-31 12:24:33 +05:30
ustcweizhou
bd78030385 server: update dhcp configurations in vrs while update default nic of running vms (#3205)
In virtual routers, the dnsmasq settings differ between the default nic and the non-default nics of a VM.
We need to update the DHCP information on the network VRs when the default nic is changed.

For example, if 172.16.1.135 is a non-default nic of VM VPC1-001-001, then

root@r-22-VM:~# cat /etc/dhcphosts.txt
02:00:1d:15:00:05,set:172_16_1_135,172.16.1.135,VPC1-001-001,710h
root@r-22-VM:~# cat /etc/dhcpopts.txt
172_16_1_135,3
172_16_1_135,6
172_16_1_135,15

If it is the default nic, then

root@r-22-VM:~# cat /etc/dhcpopts.txt
root@r-22-VM:~# cat /etc/dhcphosts.txt
02:00:1d:15:00:05,172.16.1.135,VPC1-001-001,757h
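
The two layouts above can be reproduced with a short sketch. This is an illustration only, not the router's actual code: the function names `ip_tag`, `dhcphosts_entry` and `dhcpopts_entries` are hypothetical, while the tag format and the option numbers 3, 6 and 15 are copied from the listings above.

```python
from typing import List


def ip_tag(ip: str) -> str:
    """Derive the tag seen in the listings above (dots become underscores)."""
    return ip.replace(".", "_")


def dhcphosts_entry(mac: str, ip: str, hostname: str, lease: str,
                    is_default: bool) -> str:
    """Build one /etc/dhcphosts.txt line.

    A non-default nic carries an extra set:<tag> field; the default nic does not.
    """
    fields = [mac]
    if not is_default:
        fields.append("set:" + ip_tag(ip))
    fields += [ip, hostname, lease]
    return ",".join(fields)


def dhcpopts_entries(ip: str, is_default: bool) -> List[str]:
    """Build the /etc/dhcpopts.txt lines for one nic.

    Only a non-default nic gets per-tag option lines (3, 6, 15 as listed above).
    """
    if is_default:
        return []
    return [f"{ip_tag(ip)},{opt}" for opt in (3, 6, 15)]
```

Calling `dhcphosts_entry` with `is_default=False` yields the tagged line from the first listing, and with `is_default=True` the untagged line from the second.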

Fixes #3201
2019-05-31 12:23:55 +05:30
ustcweizhou
8e43d258f3 server: Fail to restart VPC with cleanup if there are multiple public IPs in different subnet (#3342)
If there are multiple IPs in different subnets assigned to a VPC, after restarting the VPC with cleanup, the VRs will be in the FAULT state.

Steps to reproduce:
(1) create vpc, source nat IP is 10.11.118.X
(2) assign two public IPs in other subnet to this VPC. 10.11.119.X and 10.11.119.Y
(3) deploy two vms in the vpc, and enable static nat 10.11.119.X and 10.11.119.Y to these two vms
(4) restart vpc with cleanup. There are more than 1 nic allocated for 10.11.119 to new VRs

Logs as below:
2019-05-10 14:12:24,652 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.118.157-vlan://untagged
2019-05-10 14:12:24,676 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.119.110-vlan://119
2019-05-10 14:12:24,699 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.119.110-vlan://119
2019-05-10 14:12:24,723 DEBUG [o.a.c.e.o.NetworkOrchestrator] (API-Job-Executor-36:ctx-839f6522 job-652 ctx-35fb4667) (logid:1ab7aa37) Allocating nic for vm VM[DomainRouter|r-85-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-10.11.119.110-vlan://119

This is a regression issue caused by commit 1d382e0
2019-05-30 11:33:03 +05:30
dahn
910b08f72b server: fix duplicate tag exception as CloudRuntimeException (#3348)
See #3339: a runtime exception is thrown but it should be converted to an error return. Wrapping it in a CloudRuntimeException should do the trick.

Fixes #3339
2019-05-30 11:25:52 +05:30
Rohit Yadav
0929866956 server: ssh-keygen in PEM format and reduce main systemvm patching script (#3333)
On first startup, the management server uses ssh-keygen to create a random
ssh keypair and saves it in the database. The command does not request
PEM format for the keys, and PEM is not the default format generated
by the latest ssh-keygen tool.

The systemvmtemplate always needs re-building whenever there is a change
in the cloud-early-config file. This also tries to fix that by introducing a
stage 2 bootstrap.sh where the changes specific to hypervisor detection
etc are refactored/moved. The initial cloud-early-config only patches
before the other scripts are called.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-05-23 18:08:00 +05:30
Nicolas Vazquez
e86f671c8e KVM: Fix agents don't reconnect post maintenance (#3239)
* Keep connection alive when on maintenance

* Refactor cancel maintenance and unit tests

* Add marvin tests

* Refactor

* Changing the way we get ssh credentials

* Add check on SSH restart and improve marvin tests
2019-05-23 14:13:17 +02:00
Anurag Awasthi
f9b61bc737 orchestration: Allow VM that has never started to have volumes attached (#3276)
With patch b766bf7, we started tracking disks in the attaching state so that other attach requests can fail gracefully. However, this missed the case where disks were in the allocated state but an attach was requested.

For the use case where users want to attach disk in allocated state but not ready, we need to have allocated-attaching transition as well. We must take care of returning to the original state - allocated or ready - when attach request has completed.

For the use case of unstarted vm's the disk must proceed as follows - "Allocated" -> Attaching -> Allocated. When VM is started, the disk is "created" and pool is assigned. For the use case of started VMs it's more trivial and disk proceeds as follows - Ready -> Attaching -> Ready.

Test this by creating a VM with "startvm=false", create a disk and try attaching it in allocated state. It would give an exception on latest 4.11 but will be fixed on this patch.
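
The two transition paths described above can be sketched as a small state machine. This is a minimal illustration and not the CloudStack implementation; the `Volume` class and its method names are hypothetical, and only the state names Allocated, Ready and Attaching come from the commit message.

```python
from enum import Enum


class VolumeState(Enum):
    ALLOCATED = "Allocated"
    READY = "Ready"
    ATTACHING = "Attaching"


class Volume:
    """Hypothetical volume that remembers the state it had before attaching."""

    def __init__(self, state: VolumeState):
        self.state = state
        self._previous = None

    def begin_attach(self) -> None:
        # Only Allocated and Ready volumes may enter Attaching, so a second
        # attach request on the same volume fails instead of racing the first.
        if self.state not in (VolumeState.ALLOCATED, VolumeState.READY):
            raise RuntimeError(f"cannot attach a volume in state {self.state.value}")
        self._previous = self.state
        self.state = VolumeState.ATTACHING

    def finish_attach(self) -> None:
        # Return to the original state (Allocated or Ready) when the request completes.
        if self.state is not VolumeState.ATTACHING:
            raise RuntimeError("no attach in progress")
        self.state = self._previous
        self._previous = None
```

A volume created as `Volume(VolumeState.ALLOCATED)` goes Allocated -> Attaching -> Allocated, and one created as `Volume(VolumeState.READY)` goes Ready -> Attaching -> Ready.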

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2019-05-10 23:40:38 +05:30
Dingane Hlaluku
55fb1c4eb6 server: Allow users to create L2 network types (#3158)
Allow users of all types to create L2 guest networks.

Fixes #3081
2019-03-25 13:12:19 +05:30
Nathan Johnson
bf805d1483 Add back ability to disable backup of snapshot to secondary (#3122)
* The snapshot.backup.rightafter configuration variable was removed by:

SHA: 6bb0ca2f854

This adds it back, though named snapshot.backup.to.secondary now instead.

This global parameter, once set, will allow you to prevent automatic backups of
snapshots to secondary storage, unless they're actually needed.

Fixes #3096

* updates per review
2019-02-04 19:08:42 -02:00
Nicolas Vazquez
13c81a8ee4 server: Prevent corner case for infinite PrepareForMaintenance (#3095)
A corner case was found on 4.11.2 for #2493 leading to an infinite loop in state PrepareForMaintenance

To prevent such cases, in which failed migrations are detected but still running on the host, this feature adds a new cluster setting host.maintenance.retries which is the number of retries before marking the host as ErrorInMaintenance if migration errors persist.

How Has This Been Tested?
- 2 KVM hosts, pick one which has running VMs as H
- Block migrations ports on H to simulate failures on migrations:
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 49152:49215 -m comment --comment 'test block migrations'
iptables -I OUTPUT -j REJECT -m state --state NEW -m tcp -p tcp --dport 16509 -m comment --comment 'test block migrations'
- Put host H in Maintenance
- Observe that host is indefinitely in PrepareForMaintenance state (after this fix it goes into ErrorInMaintenance after retrying host.maintenance.retries times)
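
The retry behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the actual server code: the function `check_maintenance` is hypothetical, and the exact boundary (whether the host is marked on the Nth or the N+1th failure) is a guess. Only the state names and the setting host.maintenance.retries come from the commit message.

```python
PREPARE = "PrepareForMaintenance"
ERROR = "ErrorInMaintenance"


def check_maintenance(state: str, failures: int, migration_failed: bool,
                      max_retries: int):
    """Return (new_state, new_failure_count) after one periodic check.

    max_retries stands in for the host.maintenance.retries cluster setting.
    """
    if state != PREPARE or not migration_failed:
        # Nothing to count: the host is not preparing, or migrations are fine.
        return state, failures
    failures += 1
    if failures >= max_retries:
        # Assumed boundary: give up once the failure count reaches the setting.
        return ERROR, failures
    return PREPARE, failures
```

With `max_retries=3`, three consecutive failed checks move the host from PrepareForMaintenance to ErrorInMaintenance instead of looping forever.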
2018-12-28 15:14:16 +05:30
Anurag Awasthi
a7ccbdc790 api: allow keyword search in listSSHKeyPairs (#2920) (#3098)
Adds support for keyword search that was ignored by listsshkeypairs command.

Fixes: #2920
2018-12-23 00:34:53 +05:30
Craig Squire
8d53557ba7 api: don't throttle api discovery for listApis command (#2894)
Users reported that they weren't getting all apis listed in cloudmonkey when running a sync. After some debugging, I found that the problem is that the ApiDiscoveryService is calling ApiRateLimitServiceImpl.checkAccess(), so the results of the listApis command are being truncated because Cloudstack believes the user has exceeded their API throttling rate.

I enabled throttling with a limit of 25 requests per second. I then created a test role with only list* permissions and assigned it to a test user. When this user calls listApis, they will typically receive anywhere from 15 to 18 results. Checking the logs, you see "The given user has reached his/her account api limit, please retry after 218 ms.".

I raised the limit to 200 requests per second, restarted the management server and tried again. This time I got 143 results and no log messages about the user being throttled.
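
The truncation can be illustrated with a toy model. It assumes that the rate-limit check runs once per discovered API, which the numbers above suggest but the commit message does not state. The `RateLimiter` class and both functions are hypothetical, and the time window is ignored entirely.

```python
class RateLimiter:
    """Toy limiter: every check_access call consumes one request from a fixed budget."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def check_access(self) -> bool:
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def list_apis_throttled(apis, limiter: RateLimiter):
    """Discovery that runs the rate-limit check per API (the reported behaviour)."""
    return [api for api in apis if limiter.check_access()]


def list_apis_unthrottled(apis, limiter: RateLimiter):
    """Discovery that skips the rate-limit check, so the list is never truncated."""
    return list(apis)
```

In this model a budget of 25 truncates a 143-entry list while a budget of 200 returns all of it. The real counts reported above (15 to 18) are lower than the limit; the sketch does not try to model that.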
2018-12-12 23:55:32 +05:30
Boris Stoyanov - a.k.a Bobby
44bc516609 api: move ostypeid from DB id to DB uuid, backports #2528 (#3066)
This is a backport to 4.11 of #2528
2018-11-29 22:20:51 +05:30
Paul Angus
fb80e51307 Updating pom.xml version numbers for release 4.11.3.0-SNAPSHOT
Signed-off-by: Paul Angus <paul.angus@shapeblue.com>
2018-11-20 13:11:52 +00:00
Nicolas Vazquez
bb7493ad4b configdrive: Add missing ConfigDrive entries on existing zones after upgrade (#3007)
After upgrading existing environments to 4.11, ConfigDrive cannot be enabled for existing zones due to a missing entry in the 'physical_network_service_providers' table.
2018-11-12 11:30:00 +05:30
Nicolas Vazquez
7d8eb37924 [4.11] Fix set initial reservation on public IP ranges (#2980)
* Fix initial reservation on public IP ranges

* Do not allow dedicating a system VM IP range
2018-11-07 10:48:07 -02:00
Nicolas Vazquez
af0c1e48cf Fix DirectNetworkGuru canHandle checks for lowercase isolation methods (#3010) 2018-11-07 09:53:01 -02:00
Nicolas Vazquez
dffb430975 kvm: Fix migrating VM from ISO failures (#2928)
Prevents errors while migrating VM from ISO:

Test 1: Deploy VM from ISO -> Live migrate VM to another host -> ERROR
Test 2: Register ISO using Direct Download on KVM -> Deploy VM from ISO -> Live migrate VM to another host -> ERROR

- Prevent NullPointerException migrating VM from ISO
- Prevent mount secondary storage on ISO direct downloads on KVM
2018-10-29 16:14:20 +05:30
Rohit Yadav
e2ba934c19 server: fix unwanted txn commit warning messages (#2927)
This fixes unwanted transaction commit warning messages such:

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-29 02:49:54 +05:30
Rohit Yadav
9cf57d2568 network: on rolling restart force stop old routers (#2926)
This force stops old VRs when performing a rolling restart with
cleanup=true. This ensures that VRs are powered off quickly rather than
waiting longer for the normal ACPI shutdown. During testing, VM stops
were found to be slow on VMware compared to XenServer and KVM.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-25 09:20:39 +05:30
Nicolas Vazquez
5cf163d888 server: Unify templates/ISOs checksum API output (#2911)
Unify checksum API output for templates and ISOs: do not list the checksum algorithm on:
- KVM direct downloads
- In-progress normal template downloads. While a download is in progress the algorithm is shown by the listtemplates API, but after the template is downloaded it is no longer shown.
2018-10-21 22:33:04 +05:30
Rohit Yadav
5ce14df31f network: Allow ability to disable rolling restart feature (#2900)
This adds a global setting for admins who may not want the rolling
restart of routers or are seeing any issues around it. In future, this
setting may be removed.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-17 20:27:08 +05:30
Nicolas Vazquez
9003c7bfdc Add checksum sanity validation on template registration (#2902)
* Add checksum sanity validation on template registration

* Refactor

* Rename checksum sanity method
2018-10-16 10:21:20 -03:00
Rohit Yadav
ea771cfda4 router: Fixes #2719 program VR nics by device id order for VPC (#2888)
This fixes #2719 where the private gateway IP might be incorrectly
programmed on a guest network nic. The VR now matches ipassoc
requests by MAC address rather than by the provided nic/device id, in
case the latter is wrong.

The root cause is that the device id information is lost when aggregated
commands are created upon starting of a new VPC VR, without the correct
device id in ip_associations json it mis-programs the VR.
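
Matching by MAC address instead of trusting a supplied device id can be sketched as below. This is only an illustration of the idea; the function `find_device_by_mac` and the shape of the `interfaces` mapping are hypothetical and do not reflect the VR's actual data structures.

```python
from typing import Dict, Optional


def find_device_by_mac(interfaces: Dict[str, str], mac: str) -> Optional[str]:
    """Return the device name (e.g. 'eth2') whose MAC matches, or None.

    interfaces maps device names to MAC addresses as found on the router.
    The comparison is case-insensitive because MACs may be written either way.
    """
    wanted = mac.lower()
    for device, device_mac in interfaces.items():
        if device_mac.lower() == wanted:
            return device
    return None
```

With this lookup, a request that names the wrong device id can still be applied to the interface that actually owns the MAC address.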

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-10-10 15:20:36 +05:30
Frank Maximus
02e2825d2d CLOUDSTACK-10380: Fix startvm giving another password after password reset. 2018-09-17 16:33:35 +02:00
Rohit Yadav
2ab3976c0d CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated (#2829)
* CLOUDSTACK-9473: storage pool capacity check when volume is resized or migrated

Storage pool checker is not being called on resize and migrate volume.
This may lead to allocated percentage of storage above 100%.

Setup:
1 VMware cluster with 2 Hosts.

Executed Steps:

Applied the following global settings:
storage.overprovisioning.factor = 1
pool.storage.allocated.capacity.disablethreshold = 1
pool.storage.capacity.disablethreshold = 1
Restarted management server
Executed Resize and migrate pool and Observed that Storage pool checker is not performed on resizeVolume and migrateVolume.
Result:
Root cause analysis shows storage pool checker is not called when doing migration and resizing.
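
The kind of check that was being skipped can be sketched as follows. The formula is an assumption for illustration and is not taken from the CloudStack source. The function name is hypothetical, and the two keyword parameters merely stand in for storage.overprovisioning.factor and pool.storage.allocated.capacity.disablethreshold.

```python
def has_capacity_after(total_bytes: int, allocated_bytes: int, requested_bytes: int,
                       overprovisioning_factor: float = 1.0,
                       disable_threshold: float = 1.0) -> bool:
    """Return True if the pool stays within its allocation threshold.

    requested_bytes is the extra space a resize or migration would allocate.
    """
    usable = total_bytes * overprovisioning_factor
    if usable <= 0:
        return False
    # Assumed rule: allocated fraction after the operation must not exceed the threshold.
    return (allocated_bytes + requested_bytes) / usable <= disable_threshold
```

Running such a check before resizeVolume and migrateVolume would reject an operation that pushes the allocated percentage above 100%.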

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-09-07 22:01:16 +05:30
cl-k-takahashi
2c3424b478 server: fix a typo in UserVmManagerImpl.java (#2811)
Fixes typo presnt -> present

Signed-off-by: Kai Takahashi <k-takahashi@creationline.com>
2018-08-17 15:05:27 +05:30
Rohit Yadav
461c4ad027 vmware: reboot VR after mac updates (#2794)
This re-introduces the rebooting of VR after setup of nics/macs in
case of VMware. It also adds a minor enhancement to show the console
esp. for root admins when VRs and systemvms are in starting state.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-08-10 13:07:11 +05:30
Rohit Yadav
f60f3cec34 router: Fixes #2789 fix proper mark based packet routing across interfaces (#2791)
Previously, the ethernet device index was used as rt_table index and
packet marking id/integer. With eth0 that is sometimes used as link-local
interface, the rt_table index `0` would fail as `0` is already defined
as a catchall (unspecified). The fwmarking on packets on eth0 with 0x0
would also fail. This fixes the routing issues by adding 100 to the
ethernet device index so that the value is non-zero. For example, the
relationship between rt_table index and ethernet device would be:

100 -> Table_eth0 -> eth0 -> fwmark 100 or 0x64
101 -> Table_eth1 -> eth1 -> fwmark 101 or 0x65
102 -> Table_eth2 -> eth2 -> fwmark 102 or 0x66

This would maintain the legacy design of routing based on packet mark
and appropriate routing table rules per table/ids. This also fixes a
minor NPE issue around listing of snapshots.

This also backports fixes to smoketests from master.
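
The offset mapping listed above can be expressed as a short sketch. The function name `routing_ids` and the constant name are hypothetical; the offset of 100 and the `Table_ethN` naming come from the commit message.

```python
TABLE_OFFSET = 100  # added to the device index so table 0 and fwmark 0x0 are never used


def routing_ids(device: str):
    """Map an interface name such as 'eth1' to (table index, table name, fwmark in hex)."""
    if not device.startswith("eth") or not device[3:].isdigit():
        raise ValueError(f"unexpected device name: {device!r}")
    index = int(device[3:]) + TABLE_OFFSET
    return index, f"Table_{device}", hex(index)
```

For `eth0` this gives table index 100 and fwmark `0x64`, matching the first row of the listing.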

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-08-08 12:05:42 +05:30
dahn
38d0274eb4 check volumes for state when retrieving pool for configDrive creation (#2709)
* only ask for the root volume, removing extensive query

* better name
2018-07-18 13:13:41 +02:00
Khosrow Moossavi
67860d9f46 maven: Updating pom.xml version numbers for release 4.11.2.0-SNAPSHOT (#2728)
Fixes the version in pom etc. to be consistent with versioning pattern as X.Y.Z.0-SNAPSHOT after a minor release.

Signed-off-by: Khosrow Moossavi <khos2ow@gmail.com>
2018-07-06 17:27:12 +05:30
Paul Angus
8ba318da19 Updating pom.xml version numbers for release 4.11.2-SNAPSHOT
Signed-off-by: Paul Angus <paul.angus@shapeblue.com>
2018-06-26 17:53:54 +01:00
Paul Angus
2cb2dacbe7 Updating pom.xml version numbers for release 4.11.1.0
Signed-off-by: Paul Angus <paulangus@PA-Ansible-GUI.sblab.local>
2018-06-21 15:52:43 +01:00
dahn
52b02de43f vpc: reuse private gateway ip for non redundant VPC (#2712)
As rolling restart does not deallocate an IP before configuring it on a new VR, the code must allow it to be reused on a non-redundant VPC's gateway nic.
Increase ping counts to reduce intermittent failures in smoketests.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-06-21 15:06:50 +05:30
Nicolas Vazquez
539d7e10f3 Merge pull request #2493 from shapeblue/fixmaintenance
CLOUDSTACK-10326: Prevent hosts fall into Maintenance when there are running VMs on it
2018-06-20 12:00:58 -03:00
Rohit Yadav
39471c8c00 configdrive: make fewer mountpoints on hosts (#2716)
This ensures that fewer mount points are made on hosts for either
primary storagepools or secondary storagepools.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-06-20 12:25:16 +05:30
Daan Hoogland
d126cd21ea comply with api key constraint 2018-06-13 16:45:30 +02:00
nvazquez
faf2a7760d Add unit tests 2018-06-12 11:56:41 -03:00
nvazquez
a22ab69bb6 Set host into ErrorInMaintenance in case of failure trying to enter Maintenance mode 2018-06-12 09:42:09 -03:00
nvazquez
08a8330633 CLOUDSTACK-10326: Fix for infinite loop on PrepareForMaintenance 2018-06-11 09:53:21 -03:00
nvazquez
cc35f9ddb0 CLOUDSTACK-10326: Prevent hosts fall into Maintenance when there are running VMs on it 2018-06-11 09:53:20 -03:00
Frank Maximus
68d87d8f2a CLOUDSTACK-10381: Fix password reset / reset ssh key with ConfigDrive 2018-06-08 18:41:47 +02:00
Nicolas Vazquez
a5856a6447 network: allow advanced zones with security groups and VXLAN isolation type (#2693)
It was not possible to deploy an Advanced zone with Security Groups and the VXLAN isolation method on KVM. The exception "Unable to convert network offering with specified id to network profile" was logged.
2018-06-08 13:13:25 +05:30
Nicolas Vazquez
76367db8fb L2: add default L2 network offerings (#2683)
Adds default L2 network offerings. Adds check for existing default L2 networks.
2018-06-07 11:23:35 +05:30
Frank Maximus
8798014ca8 CLOUDSTACK-10377: Fix Network restart for Nuage (#2672)
Changes in PR #2508 have caused network restart to fail in a Nuage setup,
as the new VR takes the same IP as the old one, and the old VR is still running.
Nuage doesn't support multiple VMs having the same IP.
We delay provisioning the interfaces in VSD until the old VR interface is released.
2018-06-06 12:17:10 +05:30
Rafael Weingärtner
9b83337658 Create unit test cases for 'ConfigDriveBuilder' class (#2674)
* Create unit test cases for 'ConfigDriveBuilder' class

* add method 'getProgramToGenerateIso' as suggested by rohit and Daan

* fix encoding for base64 to StandardCharsets.US_ASCII

* fix MockServerTest.testIsMockServerCanUpgradeConnectionToSsl()

This is another method that is causing Jenkins to fail for almost a month
2018-06-04 13:20:09 +02:00
dahn
7a3a882d12 server: Fixes #2545 revert dedicate vlan code removal (#2664)
This re-adds logic to allow dedication of public ip/range to a domain and its usage.
2018-05-23 20:40:34 +05:30
Rohit Yadav
ebb22a4818 server: Calculate fresh capacity per VM (#2663)
This fixes and ensures that every VM has its capacity individually
calculated, with the initial override of 1.0f as overcommit ratio.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2018-05-23 16:20:07 +02:00
Rafael Weingärtner
8b09620d77 CLOUDSTACK-10276: listVolumes not working when storage UUID is not a UUID (#2639)
When configuring a pre-setup primary storage we can enter the name-label of the storage that is going to be used by ACS and is already set up in the host. The problem is that we can use any String of characters there, and this String does not need to be a UUID. When listing volumes from a primary storage that has such conditions, the list will return all of the volumes in the cloud because the “API framework” will ignore that value as it is not a UUID type.
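
The failure mode can be illustrated with a small sketch. It is not the API framework's code; `is_uuid`, `list_volumes` and the dictionary layout of a volume are hypothetical, and the `ignore_non_uuid` flag only exists here to contrast the reported behaviour with the expected one.

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True if value parses as a UUID, False for arbitrary labels."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def list_volumes(volumes, storage_id: str, ignore_non_uuid: bool):
    """Filter volumes by storage identifier.

    With ignore_non_uuid=True a non-UUID filter value is silently dropped,
    so every volume is returned (the behaviour described above).
    """
    if ignore_non_uuid and not is_uuid(storage_id):
        return list(volumes)
    return [v for v in volumes if v["storage"] == storage_id]
```

A name-label such as `my-storage-label` fails the UUID parse, so under the lenient path the filter disappears and the caller sees volumes from every storage pool.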
2018-05-22 17:02:40 +05:30