10571 Commits

Author SHA1 Message Date
slavkap
8afb451c1c
fix NPE in volumes statistics (#4388) 2020-10-30 15:53:05 +00:00
Pearl Dsilva
963d603ede
Fix usage record count (#4193)
Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
2020-10-21 19:15:34 +02:00
Gabriel Beims Bräscher
5c29d5ba45
influxdb: Avoid out-of-memory error caused by InfluxDB (#4291)
After a few hours of running with InfluxDB configured, CloudStack hangs due to an OutOfMemoryError. The error is raised at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510):

2020-08-12 21:19:00,972 ERROR [c.c.s.StatsCollector] (StatsCollector-6:ctx-0a4cfe6a) (logid:03a7ba48) Error trying to retrieve host stats
java.lang.OutOfMemoryError: unable to create new native thread
        ...
        at org.influxdb.impl.BatchProcessor.<init>(BatchProcessor.java:294)
        at org.influxdb.impl.BatchProcessor$Builder.build(BatchProcessor.java:201)
        at org.influxdb.impl.InfluxDBImpl.enableBatch(InfluxDBImpl.java:311)
        at com.cloud.server.StatsCollector.writeBatches(StatsCollector.java:1510)
        at com.cloud.server.StatsCollector$AbstractStatsCollector.sendMetricsToInfluxdb(StatsCollector.java:1351)
        at com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:522)

Context on InfluxDB batching: enabling batch mode on InfluxDB speeds up writes, but it requires care to avoid zombie threads.

Solution: the batching feature creates an internal thread pool that must be shut down explicitly, so it is important to call influxDB.close().
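
A minimal plain-Java sketch of why the close matters, assuming a hypothetical BatchWriter class that, like InfluxDB's batch mode, owns an internal thread pool. It is not the StatsCollector code; it only illustrates releasing the pool on every collection cycle.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a batching client: each instance owns a thread
// pool, much as enabling batch mode on an InfluxDB connection does.
class BatchWriter implements AutoCloseable {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final List<String> sink;

    BatchWriter(List<String> sink) {
        this.sink = sink;
    }

    // Queue a point; the pool thread appends it to the sink asynchronously.
    void write(String point) {
        pool.submit(() -> {
            synchronized (sink) {
                sink.add(point);
            }
        });
    }

    boolean isClosed() {
        return pool.isTerminated();
    }

    // Flush pending writes and stop the pool thread. Without this call, each
    // collection cycle may leave another idle pool thread alive.
    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // One collection cycle: try-with-resources runs close() even if a write fails.
    static BatchWriter writeBatch(List<String> points, List<String> sink) {
        BatchWriter writer = new BatchWriter(sink);
        try (writer) {
            for (String p : points) {
                writer.write(p);
            }
        }
        return writer;
    }
}
```

Using try-with-resources rather than a bare close() call means the pool is released even when a write throws.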
2020-09-01 15:59:43 +05:30
Spaceman1984
86939e7f9d
server: Fix private gateway that can't be deleted (#4016)
When the static route service is not available on the VPC and a static route is created, the static route is created in a revoked state.

Currently, the UI doesn't distinguish between active and revoked static routes.

This PR adds the missing state filter to the list routes command so that only active routes are listed in the UI.
It also ignores revoked routes when the private gateway is being removed, and clears out the inactive routes before the gateway is removed.

Fixes #2908
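
A sketch of the state filter, assuming a hypothetical StaticRoute record and state enum (the real CloudStack types and removal logic may differ): listing by state returns only the requested routes, and revoked routes are ignored when checking whether routes still hang off the gateway.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model with only the fields needed to illustrate the filter.
class StaticRouteFilter {
    enum State { Staged, Add, Active, Revoke }

    record StaticRoute(String cidr, State state) {}

    // A state-filtered list call returns routes in the requested state only.
    static List<StaticRoute> listByState(List<StaticRoute> routes, State state) {
        return routes.stream()
                .filter(r -> r.state() == state)
                .collect(Collectors.toList());
    }

    // Revoked routes are ignored when checking whether the gateway still has
    // routes attached; they are cleared out rather than blocking the removal.
    static boolean hasRoutesBlockingRemoval(List<StaticRoute> routes) {
        return routes.stream().anyMatch(r -> r.state() != State.Revoke);
    }
}
```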
2020-08-12 13:29:12 +05:30
Nicolas Vazquez
f843c537f0
Fix snapshots garbage collection (#4188)
* Cleanup orphan entries from snapshot store ref for primary storage

* Add debug message
2020-07-18 14:12:53 -03:00
Rohit Yadav
139aa13e6a
server: Purge all cookies on logout, set /client path on login (#4176)
This will purge all the cookies on logout including multiple sessionkey
cookies if passed. On login, this will restrict sessionkey cookie
(httponly) to the / path.

Fixes #4136

Co-authored-by: Pearl Dsilva <pearl.dsilva@shapeblue.com>
2020-07-08 08:03:51 +05:30
Wei Zhou
4da374b6b4
server: Dedicated hosts should be 'Not Suitable' when finding hosts for VM migration (#4001)
When migrating a VM, the popup shows hosts dedicated to other accounts/domains as 'Suitable' for migration, which is wrong.

The same issue occurs with the findHostsForMigration API.
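
A hypothetical suitability check illustrating the rule. The Host and Vm records and the null-means-not-dedicated convention are assumptions, and subdomain inheritance is deliberately ignored.

```java
// Hypothetical sketch; not CloudStack's planner code.
// A null dedication id means the host is not dedicated at that level.
class DedicationCheck {
    record Host(long id, Long dedicatedAccountId, Long dedicatedDomainId) {}

    record Vm(long accountId, long domainId) {}

    static boolean isSuitable(Host host, Vm vm) {
        if (host.dedicatedAccountId() != null) {
            // Dedicated to an account: only that account's VMs may land here.
            return host.dedicatedAccountId() == vm.accountId();
        }
        if (host.dedicatedDomainId() != null) {
            // Simplification: subdomains are not considered in this sketch.
            return host.dedicatedDomainId() == vm.domainId();
        }
        return true; // not dedicated: available to every account
    }
}
```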
2020-07-04 11:01:41 +05:30
Wei Zhou
5526342f4a
server: Do not resize volume of running vm on KVM host if host is not Up or not Enabled (#4148)
If a volume of a VM running on a host that is not Up or not Enabled is resized, the job is scheduled on another, healthy host. The volume is then resized with "qemu-img resize" instead of "virsh blockresize", and the image might be corrupted by the resize.
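
A minimal sketch of the guard, with hypothetical Status and ResourceState enums and exception type: the resize is rejected up front rather than being scheduled on another host.

```java
// Hypothetical sketch; enum values and the exception type are illustrative.
class ResizeGuard {
    enum Status { Up, Down, Disconnected, Alert }

    enum ResourceState { Enabled, Disabled, Maintenance }

    // Online resize of a running VM's volume is only allowed on a host that is
    // both Up and Enabled; anything else is refused instead of being rerouted.
    static void checkHostForOnlineResize(Status status, ResourceState state) {
        if (status != Status.Up || state != ResourceState.Enabled) {
            throw new IllegalStateException(
                    "Cannot resize volume of running VM: host is " + status + "/" + state);
        }
    }
}
```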
2020-06-25 10:40:31 +05:30
davidjumani
b79407c50b
api: Adding missing fields to API responses (#4167)
Add the missing fields to the following APIs:
osdisplayname in listVirtualMachines
vpcofferingname in listVpcs
vpcname in listPublicIpAddresses
vpcname in listPrivateGateways
vpcname in listVpnGateways
templatename, podname in listRouters
templatename, podname in listSystemVms

Fixes: #4161
2020-06-25 10:05:30 +05:30
Abhishek Kumar
8010718878
server: fix for wrong affinity group count (#4154)
Fix the wrong count returned by the listAffinityGroup API.
The API was returning the count of AffinityGroupJoinVO records instead of the number of affinity groups.
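
A sketch of the difference between counting join rows and counting groups, assuming a hypothetical row record with one entry per (group, VM) pair:

```java
import java.util.List;

// Hypothetical sketch: a join view has one row per (affinity group, VM) pair,
// so counting rows over-counts groups that contain several VMs.
class AffinityGroupCount {
    record AffinityGroupJoinRow(long groupId, Long vmId) {}

    // Wrong count: number of join rows.
    static long rowCount(List<AffinityGroupJoinRow> rows) {
        return rows.size();
    }

    // Intended count: number of distinct affinity groups.
    static long groupCount(List<AffinityGroupJoinRow> rows) {
        return rows.stream().mapToLong(AffinityGroupJoinRow::groupId).distinct().count();
    }
}
```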

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2020-06-24 07:02:56 +05:30
Spaceman1984
97f21c1835
xenserver: Fixed null pointer and deployment issue on Xenserver with L2 Guest network with configDrive (#4004)
This PR fixes an issue where an instance fails to deploy due to a null pointer when using an L2 Guest Network with DefaultL2NetworkOfferingConfigDrive on Xenserver. It also fixes migrating an instance to another host.

This has been tested by:
- Creating an L2 Guest network, using DefaultL2NetworkOfferingConfigDrive as the network offering.
- Deploying an instance using the L2 Guest network created.
- Migrating the instance away from the host and back
2020-06-23 12:21:50 +05:30
davidjumani
06f3ff0b04
api: listVirtualMachinesMetrics should extend ListVMsCmd instead of ListVMsCmdByAdmin (#4145)
Fixes #4143

Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-06-18 19:47:13 +05:30
davidjumani
e9f59e2fd3
server: Adding showunique parameter to list templates and isos (#4140)
Add a new parameter, showunique, to listTemplate and listIsos to return only unique templates/ISOs across all zones.
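
A sketch of what showunique is expected to do, assuming a hypothetical TemplateEntry record with one entry per (template, zone) pair: duplicates are dropped by template id, keeping the first occurrence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a template available in three zones is listed three
// times; with showUnique set, only the first entry per template id is kept.
class UniqueTemplates {
    record TemplateEntry(String templateId, String zoneId) {}

    static List<TemplateEntry> list(List<TemplateEntry> entries, boolean showUnique) {
        if (!showUnique) {
            return entries;
        }
        Set<String> seen = new HashSet<>();
        List<TemplateEntry> unique = new ArrayList<>();
        for (TemplateEntry e : entries) {
            if (seen.add(e.templateId())) { // add() returns false for a repeated id
                unique.add(e);
            }
        }
        return unique;
    }
}
```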

Fixes #4041
2020-06-18 09:05:36 +05:30
Spaceman1984
88d51ce353
server: Restart all networks that need a restart in a VPC (#4007)
When a VPC is restarted, the networks in the VPC are not restarted. This PR adds logic to restart the networks in the VPC that need a restart when the VPC is restarted.
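
A sketch of the restart loop, assuming a hypothetical Network record with a restartRequired flag and a caller-supplied restart action:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; not CloudStack's VPC manager code.
class VpcRestart {
    record Network(String name, boolean restartRequired) {}

    // Restart only the tiers flagged as needing a restart; returns how many were restarted.
    static int restartTiersNeedingRestart(List<Network> tiers, Consumer<Network> restarter) {
        int restarted = 0;
        for (Network n : tiers) {
            if (n.restartRequired()) {
                restarter.accept(n);
                restarted++;
            }
        }
        return restarted;
    }
}
```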

Fixes #3816
2020-06-17 07:12:04 +05:30
harikrishna-patnala
5054766d9f
server: Submitting multiple dynamic VM Scaling API commands for the same instance can result in two usage events in the same second causing a compound key violation in usage service (#3991)
Root cause:
Although the dynamic scaling job is handled in the vmworkjob queue, which serializes multiple jobs, the database updates and usage event generation happen outside the job queue.

Solution:
Move all the updates into the job queue.

I first tested the following scenarios to check that nothing is broken:
Scaling a running VM with a normal compute offering
Scaling a stopped VM with a normal compute offering
Scaling a running VM with a custom compute offering
Scaling a stopped VM with a custom compute offering
Scaling a stopped/running VM between custom and normal compute offerings, and combinations of these. Checked that the custom parameters are populated or deleted according to the offering to which the VM is scaled.
Since this is a corner case, I could not test the exact point at which two usage events are recorded at the same time for two different API calls on the same VM.
2020-06-16 11:41:14 +05:30
Nicolas Vazquez
056e6768a2
server: Cannot migrate VM on PVLAN shared network (#4062)
Fix casting issue.

Fixes #4061
2020-06-08 07:01:11 +05:30
andrijapanicsb
398e685e01 Updating pom.xml version numbers for release 4.13.2.0-SNAPSHOT
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-04-29 12:29:12 +01:00
andrijapanicsb
b2ffa3efa5 Updating pom.xml version numbers for release 4.13.1.0
Signed-off-by: andrijapanicsb <andrija.panic@shapeblue.com>
2020-04-23 19:17:09 +01:00
dahn
6a72e6e9f8 do not put in default accept rules for DNS and BOOTPS 2020-04-16 15:09:51 +02:00
Wei Zhou
e0b67a4c68
server: Cannot list affinity group if there are hosts dedicated… (#4025) 2020-04-10 09:10:51 +02:00
Wei Zhou
6bf92fb136
server: Search zone-wide storage pool when allocation algorithm is firstfitleastconsumed (#4002) 2020-04-06 22:01:40 +02:00
harikrishna-patnala
78fda2d163
With a basic zone and the VMware hypervisor, the VR fails to start because eth1 is empty instead of having a private IP. (#3977)
Although VMware does not support security groups, VMs should still be deployable in a basic zone with VMware and no isolation.

Root cause:
With VMware in a basic zone, the control NIC is set to 0.0.0.0 on the assumption that the control network is shared with the guest network.
However, to access VMware instances, a management/private IP needs to be assigned to it.

Solution:
Assign a private IP even for VMware in a basic zone.
2020-03-27 19:46:01 +01:00
Rohit Yadav
2e3390f06e
server: export full response view for zones response when caller is root admin (#3989)
The listZonesMetrics API does not return the same keys as listZones because the
default response view is restricted. This fixes that by ensuring that
the full response view is used for the root admin.

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-03-25 12:53:41 +05:30
Gabriel Beims Bräscher
cd6f0cb1e1
Prevent overflow on StatsCollector + add a few enhancements on code (#3932) 2020-03-13 19:51:12 +01:00
Wei Zhou
19fb23781b
server: password is not displayed when reinstall a vm or reset… (#3948) 2020-03-12 11:14:34 +01:00
Rohit Yadav
0fab5e8d60
server: fix database exception while searching network offerings (#3947)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-03-06 10:34:09 +01:00
Nicolas Vazquez
bd7d41bf6d
server: fix VM with ISO attached migration issue (#3935)
As previously described by PR #3929:
If the VM has an attached ISO, the migration fails with the error message "org.libvirt.LibvirtException: Cannot access storage file /mnt/b33e5a1d-e4ea-3465-b6ac-c98dc8ff8af0/207-2-cc5fd717-2d57-3bb3-bcf6-2c930268db6c.iso"
2020-03-06 13:32:19 +05:30
Rohit Yadav
b4fdf22397
kvm: fix/optimize propagating configs (#3911)
Make some changes based on @nvazquez's comments in PR #3491
Fix a bug in #3491
2020-03-05 12:20:51 +05:30
Wei Zhou
313e21a0da
VR: Fix Redundant VRouter guest network on wrong interface (#3847) 2020-02-29 19:52:40 +01:00
Wei Zhou
79f7f0f007
server: fix issue while list ssh keypairs by keyword (#3916)
In 4.13, listing sshkeypairs with a keyword ignores the search by name, even if name is specified.
Fixes an issue in #3098

For example:
(local) > list sshkeypairs name=wei keyword=wei filter=name
{
  "count": 3,
  "sshkeypair": [
    {
      "name": "wei3"
    },
    {
      "name": "wei2"
    },
    {
      "name": "wei"
    }
  ]
}

With this patch, it gives the correct result:

(local) > list sshkeypairs name=wei keyword=wei filter=name
{
  "count": 1,
  "sshkeypair": [
    {
      "name": "wei"
    }
  ]
}
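
The corrected filtering can be sketched as follows, assuming (hypothetically) that name is an exact match, keyword is a substring match, and both criteria apply together:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the intended filter semantics; not the actual query code.
// Neither criterion silently replaces the other when both are given.
class SshKeyPairSearch {
    static List<String> search(List<String> names, String name, String keyword) {
        return names.stream()
                .filter(n -> name == null || n.equals(name))
                .filter(n -> keyword == null || n.contains(keyword))
                .collect(Collectors.toList());
    }
}
```

With the key pairs from the example above (wei3, wei2, wei), searching with name=wei and keyword=wei keeps only "wei", matching the patched output.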
2020-02-28 15:05:49 +05:30
Rakesh
7e30e3d141
router: Avoid duplicate alerts when router state changes (#3904)
When both routers of a VPC are in the MASTER state,
multiple alerts are sent, one per router for each tier in the VPC.
If the VPC has 3 tiers, 6 alerts are sent. This is a problem
if the VPC has more than 10 networks in it.

Instead of checking the router status for all the tiers in the VPC,
just check the status of the router for one tier in a VPC so that
multiple duplicate alerts can be avoided
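
A sketch of the before/after number of state checks (and thus alerts), assuming a hypothetical RouterTier record with one entry per (router, tier) pair in the VPC:

```java
import java.util.List;

// Hypothetical sketch; not the actual router state-check code.
class RouterAlerts {
    record RouterTier(long routerId, long tierId) {}

    // Old behaviour: every router's state is checked once per tier.
    static long checksPerTier(List<RouterTier> pairs) {
        return pairs.size();
    }

    // Fixed behaviour: only the pairs for one tier (here, the first one seen) are checked.
    static long checksForOneTier(List<RouterTier> pairs) {
        if (pairs.isEmpty()) {
            return 0;
        }
        long firstTier = pairs.get(0).tierId();
        return pairs.stream().filter(p -> p.tierId() == firstTier).count();
    }
}
```

For two routers and three tiers this gives 6 checks before and 2 after, one per router.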
2020-02-28 14:24:12 +05:30
Rakesh
abb39a25af
server: send VM password to all Running VRs in network/vpc (#3903)
Currently, CloudStack sends the VM password only to the first
router in the network, even if it is the backup, and returns the result.

In some cases the first router is the backup and the second is the master.
Since the password server does not run on the backup, when the user resets the password,
it is sent to the first router, which may be the backup.
In that case, the new password is not stored in the password server and users can't log in with it.

This change sends the password to both routers instead
of only the first one, so that the new password is stored in the master router.
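
A sketch of target selection, assuming hypothetical Router, State and Role types: every Running router receives the password, so the MASTER is included regardless of its position in the list.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not CloudStack's router manager code.
class PasswordTargets {
    enum State { Running, Stopped }

    enum Role { MASTER, BACKUP }

    record Router(String name, State state, Role role) {}

    // Send to every Running router instead of only the first one in the list.
    static List<Router> targets(List<Router> routers) {
        return routers.stream()
                .filter(r -> r.state() == State.Running)
                .collect(Collectors.toList());
    }
}
```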
2020-02-28 12:00:16 +05:30
Pearl Dsilva
4d8a2da133
api: Fix count and item issues returned by list APIs (#3894) 2020-02-26 15:14:23 +00:00
Rakesh
e269b14095
Fix network rules issue if default egress policy is Allow (#3905) 2020-02-23 21:12:06 +00:00
Wei Zhou
ac7bcde45b
KVM: Propagating changes on host parameters to the agents (#3491) 2020-02-19 13:13:37 +00:00
Wei Zhou
37d2b8537c
kvm: Enable virtio drivers based on guest os display name (#3879)
When a new guest OS is added, its records in guest_os_hypervisor are sometimes missed.
However, the guest disk model (virtio/ide) is determined by the record in that table.
As a result, some new guest OSes (e.g. Debian 8/9) use an e1000 NIC instead of a virtio NIC, and an IDE disk instead of a virtio disk.

To fix the issue permanently, pass the guest OS name from guest_os if the record for KVM is not found in guest_os_hypervisor.

Related commit:7ac9f00eeeb4cd37ec39efeba066e799b581b1a0
2020-02-19 13:12:20 +05:30
Wei Zhou
649ed45965
kvm: fix exception in volume stats after storage migration (#3884)
On KVM, the 'path' of a volume is the file name on primary storage, so 'path' should be used instead of 'uuid' in volume statistics.

Fixes: #3878
2020-02-19 13:06:19 +05:30
Rakesh
bbe2bf1a6e
server: ignore site to site vpn status check on internallbvm (#3864)
When the state of a site-to-site VPN changes, the check
is done on all virtual routers, including the internal
load balancing VM. There is no need to check the
state for the internal load balancing VM.
2020-02-18 14:03:30 +05:30
Rohit Yadav
78cc0a44c1
server: use host record related to a ssvm/cpvm (#3876)
This implements the systemvm list API response creator to find and use
the host record for a ssvm/cpvm to get the agent status and other
details like last disconnected date and agent version.

Fixes 3875

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
2020-02-10 19:01:34 +05:30
Rohit Yadav
afcbbc4b3e
systemvm: list systemvm does not return agent state and version (#3870)
This makes the listSystemVms API return the host status (agent state),
version and last pinged information. This makes it possible for UIs
to call a single API to get this information.
2020-02-07 13:19:35 +01:00
Wei Zhou
a9a1737dd9
vpc: set traffic type of private gateway IP to Public to fix ke… (#3851) 2020-02-06 20:22:08 +01:00
Pearl Dsilva
bfdb914693
usage: publish zone id while uploading template and volume (#3867)
After a local template is uploaded via the browser, the generated usage event of type "TEMPLATE.CREATE" is persisted with the data store ID instead of the zone ID in the zone_id column. This fix refactors the upload monitor logic, which, after the upload completed, set the data store ID in the zone_id column of the created "TEMPLATE.CREATE" usage event. The refactored logic queries the DB for the data store and sets its associated zone ID in the usage event.
The fix produces the same behaviour as registering a template from a URL.
The fix also applies to uploading a VOLUME from local/via the browser.
2020-02-06 11:31:24 +05:30
Abhishek Kumar
a71874682c
server: fix checking disk offering access for snapshot volume (#3791)
Fixes #3783
As reported in the issue, creating volumes from a pure snapshot fails with an NPE. This is due to the order of calls: disk offering access is checked before the disk offering value is checked. This PR fixes the order.

Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
2020-01-31 15:24:25 +05:30
Abhishek Kumar
9d105b6546
template: copy md5 mismatch (#3383)
Fixes #3191

When a template is registered, the code stores the md5sum of the downloaded file in the vm_template table. However, the downloaded file may be deleted after template installation if it is not an actual template file (.qcow2, .ova, etc.). When the user copies a template using the copyTemplate API, the actual template file is copied across the image stores, so comparing the checksum of the copied template file with the value stored in the vm_template table results in a mismatch.
This change sets an empty checksum value for the copied template when passing it to the download service, which allows the incorrect checksum check to be skipped for the copied template during installation.
However, this results in a change in the checksum value of the affected template entry in the vm_template table after template installation.
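
A sketch of the install-time check, assuming (hypothetically) that an empty expected checksum means the comparison is skipped; md5Hex uses the standard java.security.MessageDigest API, and the class and method names are illustrative.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the download service's actual code.
class ChecksumCheck {
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return String.format("%032x", new BigInteger(1, digest)); // zero-padded lowercase hex
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // An empty expected value (as set for copied templates) skips the comparison.
    static boolean verify(byte[] installed, String expectedMd5) {
        if (expectedMd5 == null || expectedMd5.isEmpty()) {
            return true;
        }
        return md5Hex(installed).equalsIgnoreCase(expectedMd5);
    }
}
```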

Co-authored-by: dahn <daan.hoogland@gmail.com>
2020-01-31 14:16:37 +05:30
davidjumani
7a25e40d5a
api: allow listing management server by id and name (#3840)
The List Management Server API returns a list of all the management servers but fails when listing by id or name. This change ensures that it fetches the details according to the parameters passed.
Fixes: #3833
2020-01-30 10:38:25 +05:30
Pearl Dsilva
1c130a5dd4
api: metrics API response is not super-set of resources response keys (#3834)
The metrics API is missing a few properties that are present in the corresponding resource response.

Fixes #3831

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Co-authored-by: Rohit Yadav <rohit@apache.org>
2020-01-30 08:49:45 +05:30
Gregor Riepl
8792070f84 Rethrow takeVMSnapshot() exception instead of returning null in VMSnapshotManagerImpl (#3761)
Fixes: #3518
2020-01-28 11:05:15 +05:30
Wei Zhou
a77d74ba0d server: Fix NPE when updating displayvm on a VM with a dynamic service offering (#3758)
Steps to reproduce the issue:
(1) create a custom service offering
(2) create a vm with the offering
(3) update the vm with displayvm=false; an error is returned

(local) > update virtualmachine id=f33fd06a-7643-40d1-833f-272845d9ba09 displayvm=false
Error 530: {"updatevirtualmachineresponse":{"uuidList":[],"errorcode":530,"cserrorcode":9999}}
2020-01-28 11:04:26 +05:30
Wei Zhou
136505b22c server: double-check host capacity when starting/migrating a VM (#3728)
When starting or migrating a VM (away from a host in maintenance), CloudStack checks the capacity of all hosts and chooses one. If there are hundreds of hosts on the platform, this can take several seconds, so by the time CloudStack starts or migrates the VM on the chosen host, the host's resource consumption may have changed. This typically happens when multiple VMs are started or migrated at the same time.
It is therefore better to double-check the host capacity when starting a VM on a host.

This PR also includes the fix for CPU core capacity when starting/migrating a VM.
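
A minimal sketch of the double check, assuming a hypothetical HostCapacity class that tracks memory and CPU cores: the reservation re-validates capacity atomically at placement time, because the planner's view may be seconds old.

```java
// Hypothetical model of a host's capacity; not CloudStack's capacity manager.
class HostCapacity {
    private final long totalMemoryMb;
    private final int totalCores;
    private long usedMemoryMb;
    private int usedCores;

    HostCapacity(long totalMemoryMb, int totalCores) {
        this.totalMemoryMb = totalMemoryMb;
        this.totalCores = totalCores;
    }

    // Re-check and reserve atomically: returns false (reserving nothing) if the
    // VM no longer fits, so the caller can go back to the planner.
    synchronized boolean tryReserve(long memoryMb, int cores) {
        if (usedMemoryMb + memoryMb > totalMemoryMb || usedCores + cores > totalCores) {
            return false;
        }
        usedMemoryMb += memoryMb;
        usedCores += cores;
        return true;
    }
}
```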
2020-01-28 10:55:11 +05:30
Wei Zhou
71e53ab01d server: Capacity check should take VMs in Migrating state into account (#3727)
When calculating the resource consumption of a host, VMs in the following states must be taken into account: Running, Starting, Stopping, Migrating (to the host), and Migrating (from the host). This is because when a VM is stopped, its resources on the host are released only once the VM has stopped, and when a VM is migrated, the resources on the destination host are increased before the migration starts, while the resources on the source host are decreased only after the migration succeeds.

In CloudStack, a task named CapacityChecked runs every 5 minutes (capacity.check.period = 300000 ms by default) and recalculates the capacity of all hosts. However, it only takes VMs in the Running and Starting states into consideration, which has caused issues during host maintenance.

Steps to reproduce the issue:
(1) migrate N vms from host A to host B; the used cpu/ram of host B increases before the migration.
(2) the capacity check recalculates the capacity of the hosts; the used capacity of host B is reset to its original value (not including the vms in Migrating state).
(3) migrate more vms from other hosts to host B; CloudStack allows the migrations because the used capacity is incorrect. If the actual used memory exceeds the physical memory on the host, critical issues may occur (for example, libvirt dies).
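
A sketch of the accounting rule above, using hypothetical Vm and State types in which a migrating VM records both its current (source) host and its target host; only memory is tracked to keep it short.

```java
import java.util.List;

// Hypothetical sketch; types and field names are illustrative, not CloudStack's capacity checker.
class HostUsage {
    enum State { Running, Starting, Stopping, Migrating, Stopped }

    // hostId: where the VM currently runs (the source, while migrating).
    // migrationTargetHostId: destination host, or null when not migrating.
    record Vm(State state, long hostId, Long migrationTargetHostId, long memoryMb) {}

    static long usedMemoryMb(long host, List<Vm> vms) {
        long used = 0;
        for (Vm vm : vms) {
            boolean counts = switch (vm.state()) {
                case Running, Starting, Stopping -> vm.hostId() == host;
                // A migrating VM holds resources on both sides: on the source until
                // the migration succeeds, and on the target from before it starts.
                case Migrating -> vm.hostId() == host
                        || (vm.migrationTargetHostId() != null && vm.migrationTargetHostId() == host);
                default -> false;
            };
            if (counts) {
                used += vm.memoryMb();
            }
        }
        return used;
    }
}
```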
2020-01-28 10:54:32 +05:30