* CPU to Memory weight based algorithm to order cluster
The host.capacityType.to.order.clusters config now supports a new algorithm: COMBINED.
It works together with host.capacityType.to.order.clusters.cputomemoryweight, so that capacity is
computed from both CPU and memory, blended by the weight factor.
* minor changes
* add unit tests
* update desc and add validation
* handle copilot review comments
* add log indicating chosen capacityType for ordering
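
A minimal sketch of how a COMBINED ordering could blend the two capacity types. The formula (weight applied to the CPU usage fraction, the remainder to memory), the class and the method names are assumptions made for illustration, not taken from the CloudStack source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CombinedCapacityOrdering {
    /** Hypothetical per-cluster usage snapshot; both fractions are in [0, 1]. */
    static final class ClusterUsage {
        final String name;
        final double cpuUsedFraction;
        final double memoryUsedFraction;

        ClusterUsage(String name, double cpuUsedFraction, double memoryUsedFraction) {
            this.name = name;
            this.cpuUsedFraction = cpuUsedFraction;
            this.memoryUsedFraction = memoryUsedFraction;
        }
    }

    /** Assumed formula: weight * cpu + (1 - weight) * memory. */
    static double combinedUsage(ClusterUsage c, double cpuToMemoryWeight) {
        if (cpuToMemoryWeight < 0.0 || cpuToMemoryWeight > 1.0) {
            throw new IllegalArgumentException("weight must be between 0 and 1");
        }
        return cpuToMemoryWeight * c.cpuUsedFraction
                + (1.0 - cpuToMemoryWeight) * c.memoryUsedFraction;
    }

    /** Returns cluster names ordered from least to most used under the combined metric. */
    static List<String> orderClusters(List<ClusterUsage> clusters, double cpuToMemoryWeight) {
        List<ClusterUsage> sorted = new ArrayList<>(clusters);
        sorted.sort(Comparator.comparingDouble(c -> combinedUsage(c, cpuToMemoryWeight)));
        List<String> names = new ArrayList<>();
        for (ClusterUsage c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<ClusterUsage> clusters = Arrays.asList(
                new ClusterUsage("c1", 0.80, 0.20),
                new ClusterUsage("c2", 0.30, 0.60));
        System.out.println(orderClusters(clusters, 0.5)); // [c2, c1]
        System.out.println(orderClusters(clusters, 0.0)); // [c1, c2] (memory only)
    }
}
```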
---------
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Management Server - Prepare for Maintenance and Cancel Maintenance improvements:
- Added new setting 'management.server.maintenance.ignore.maintenance.hosts' to ignore hosts in maintenance states while preparing management server for maintenance. This skips agent transfer and agents count check for hosts in maintenance.
- Rebalance indirect agents after cancel maintenance, using rebalance parameter in cancelMaintenance API
- Force maintenance after maintenance window timeout, using forced parameter in prepareForMaintenance API.
- Propagate 'indirect.agent.lb.check.interval' setting change to the host agents.
* rebase fixes
* code improvements, cleanup
* [UI] Set rebalance true by default in cancel maintenance dialog
* Update MS state after executing cluster cmd in the target MS, and some code improvements
* code improvements
* Ensure the host lb algorithm 'shuffle' is applied once before disabling the indirect agent lb check background task
* Introducing Storage Access Groups to define the host and storage pool connections
In CloudStack, when a primary storage is added at the Zone or Cluster scope, it is by default connected to all hosts within that scope. This default behavior can be refined using storage access groups, which allow operators to control and limit which hosts can access specific storage pools.
Storage access groups can be assigned to hosts, clusters, pods, zones, and primary storage pools. When a storage access group is set on a cluster/pod/zone, all hosts within that scope inherit the group. Connectivity between a host and a storage pool is then governed by whether they share the same storage access group.
A storage pool with a storage access group will connect only to hosts that have the same storage access group. A storage pool without a storage access group will connect to all hosts, including those with or without a storage access group.
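
The connectivity rule above can be sketched as a small predicate. Modelling groups as sets of names, and the class and method names, are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class StorageAccessGroupCheck {
    /** A host's effective groups: its own plus those inherited from cluster, pod and zone. */
    static Set<String> effectiveHostGroups(Set<String> host, Set<String> cluster,
                                           Set<String> pod, Set<String> zone) {
        Set<String> all = new HashSet<>(host);
        all.addAll(cluster);
        all.addAll(pod);
        all.addAll(zone);
        return all;
    }

    /** Returns true when the host may connect to the pool under the rule described above. */
    static boolean canConnect(Set<String> hostGroups, Set<String> poolGroups) {
        if (poolGroups.isEmpty()) {
            return true; // a pool without a group connects to every host
        }
        return !Collections.disjoint(hostGroups, poolGroups); // otherwise a shared group is required
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> fast = new HashSet<>(Arrays.asList("fast"));
        Set<String> hostGroups = effectiveHostGroups(none, fast, none, none);
        System.out.println(canConnect(hostGroups, fast)); // true: group inherited from the cluster
        System.out.println(canConnect(none, fast));       // false: pool is restricted, host has no group
        System.out.println(canConnect(hostGroups, none)); // true: pool has no group
    }
}
```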
* Add & Remove PowerFlex/ScaleIO MDMs while preparing & unpreparing the storage SDC connections (instead of start & stop scini)
* Add/Remove MDM IP addresses during Host connection/disconnection to/from storage pool when powerflex.connect.on.demand is false
* unit test fixes
* Don't remove MDM IPs from SDC when any volumes mapped to SDC
* Don't remove MDM IPs when other pools of same ScaleIO/PowerFlex cluster are connected
* rebase fixes
* update changes, to not remove/disconnect MDMs on maintenance
* import fixes after rebase
* Update last agents during ms maintenance, and some code improvements
* Send 503 (Service Unavailable) response status when maintenance or shutdown is initiated
[Any load balancer in the clustered environment can avoid routing requests to this MS node]
* Migrate systemvm agents before routing host agents, and some code improvements
* Added events for ms maintenance and shutdown operations
* Added the following ms maintenance and shutdown improvements
- block new agent connections during prepare for maintenance of ms
- maintain avoids ms list
- propagate updated management servers list and lb algorithm in host and indirect.agent.lb.algorithm settings respectively, to systemvm (non-routing) agents
- updated setup ms list and migrate agent connections to executor service
- migrate agent connection through executor, and send the answer to the ms host that initiated the migration
- re-initialize ssl handshake executor if it is shutdown
- don't allow prepare for maintenance or shutdown when other management server nodes are in preparing states
- don't allow trigger shutdown when management server is up and other management server nodes are in preparing states
- stop agent connections monitor on ms maintenance
- update avoid ms list in ready command
- updated connected host from the client connection
- update last agents in ms metrics from the database
- updated some agent config descriptions
- update last management server in the hosts during shutdown
- added agents and lastagents in management server response
- updated management server maintenance & shutdown unit tests
- some code improvements
* refactored code / addressed comments
* removed shutdown testcase (maybe, calling System.exit)
* Revert "removed shutdown testcase (maybe, calling System.exit)"
This reverts commit e14b0717152ef6c8be102d61c80f42803a53172e.
* avoid system.exit during shutdown test
* code improvements
* testcase fix
* Fix cutoff time in agent connections monitor thread
This PR introduces the concept of multi-scope configuration settings. Currently, besides the Global level, a configuration can be set at only one scope level.
It is useful for a configuration to be settable at multiple scopes. For example, a configuration set at the domain level
applies to all accounts in that domain, but it can also be set for an individual account, in which case the account-level setting overrides the domain-level setting.
This is done by changing the column `scope` of table `configuration` from string (single scope) to bitmask (multiple scopes).
```
public enum Scope {
    Global(null, 1),
    Zone(Global, 1 << 1),
    Cluster(Zone, 1 << 2),
    StoragePool(Cluster, 1 << 3),
    ManagementServer(Global, 1 << 4),
    ImageStore(Zone, 1 << 5),
    Domain(Global, 1 << 6),
    Account(Domain, 1 << 7);
    // fields, constructor (parent scope, bit value) and helper methods omitted
}
```
Each scope is also assigned a parent scope. When a configuration for a given scope is not defined but is available for multiple scope types, the value will be retrieved from the parent scope. If there is no parent scope or if the configuration is defined for a single scope only, the value will fall back to the global level.
Hierarchy for different scopes is defined as below:
- Global
  - Zone
    - Cluster
      - Storage Pool
    - Image Store
  - Management Server
  - Domain
    - Account
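
The bitmask and the parent fallback described above can be sketched as follows. The enum constants mirror the excerpt; the helper names (`toBitmask`, `supports`, `resolve`) and the per-scope value map are hypothetical and only illustrate the lookup order.

```java
import java.util.EnumMap;
import java.util.Map;

final class MultiScopeConfigSketch {
    enum Scope {
        Global(null, 1),
        Zone(Global, 1 << 1),
        Cluster(Zone, 1 << 2),
        StoragePool(Cluster, 1 << 3),
        ManagementServer(Global, 1 << 4),
        ImageStore(Zone, 1 << 5),
        Domain(Global, 1 << 6),
        Account(Domain, 1 << 7);

        final Scope parent;
        final int bitValue;

        Scope(Scope parent, int bitValue) {
            this.parent = parent;
            this.bitValue = bitValue;
        }
    }

    /** Combines scopes into the integer stored in the `scope` column. */
    static int toBitmask(Scope... scopes) {
        int mask = 0;
        for (Scope s : scopes) {
            mask |= s.bitValue;
        }
        return mask;
    }

    static boolean supports(int bitmask, Scope scope) {
        return (bitmask & scope.bitValue) != 0;
    }

    /** Walks up the parent chain for multi-scope configs; otherwise falls back to global. */
    static String resolve(int bitmask, Map<Scope, String> scopedValues, Scope requested, String globalValue) {
        boolean multiScope = Integer.bitCount(bitmask & ~Scope.Global.bitValue) > 1;
        Scope s = requested;
        while (s != null && s != Scope.Global) {
            if (supports(bitmask, s) && scopedValues.containsKey(s)) {
                return scopedValues.get(s);
            }
            if (!multiScope) {
                break; // single-scope config: no parent lookup
            }
            s = s.parent;
        }
        return globalValue;
    }

    public static void main(String[] args) {
        int mask = toBitmask(Scope.Zone, Scope.StoragePool);
        Map<Scope, String> values = new EnumMap<>(Scope.class);
        values.put(Scope.Zone, "0.9");
        // No value on the storage pool, so the zone value is used.
        System.out.println(resolve(mask, values, Scope.StoragePool, "0.85")); // 0.9
    }
}
```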
This PR also updates the scope of the following configurations (Storage Pool scope is added in addition to the existing Zone scope):
- pool.storage.allocated.capacity.disablethreshold
- pool.storage.allocated.resize.capacity.disablethreshold
- pool.storage.capacity.disablethreshold
Doc PR : https://github.com/apache/cloudstack-documentation/pull/476
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* api,agent,server,engine-schema: scalability improvements
Following changes and improvements have been added:
- Improvements in handling of PingRoutingCommand
1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
3. Optimized scanning stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/user role API access; the expire-after-write period can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/user role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
1. CapacityManager - listing service offerings - beneficial in host capacity calculation
2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
3. DownloadListener - hypervisors for zone - beneficial for host joins
4. VirtualMachineManagerImpl - VMs in progress - beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimize finding ready systemvm template for zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
1. Optimized the use of the executor service
2. Refactored Agent class to better handle connections.
3. Do SSL handshakes within worker threads
4. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control the number of new connections the management server handles at a time. `agent.ssl.handshake.timeout` can be used to set the number of seconds after which the SSL handshake times out at the MS end.
5. On the agent side, backoff and SSL handshake timeout can be controlled by the agent properties `backoff.seconds` and `ssl.handshake.timeout`.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals
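
The expire-after-write behaviour mentioned for the API access cache can be illustrated with a small stand-in. This is not Caffeine and not the CloudStack caching framework: it is a plain-JDK sketch with hypothetical names, showing only the semantics (an entry is reloaded once its age reaches the period, and a period of zero disables caching).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class ExpireAfterWriteCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long writtenAtNanos;

        Entry(T value, long writtenAtNanos) {
            this.value = value;
            this.writtenAtNanos = writtenAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier nanoClock;

    /** ttlSeconds of zero (or less) disables caching; the clock is injectable for testing. */
    ExpireAfterWriteCache(long ttlSeconds, LongSupplier nanoClock) {
        this.ttlNanos = TimeUnit.SECONDS.toNanos(ttlSeconds);
        this.nanoClock = nanoClock;
    }

    /** Returns the cached value, reloading it when it is missing or older than the TTL. */
    V get(K key, Function<K, V> loader) {
        if (ttlNanos <= 0) {
            return loader.apply(key);
        }
        long now = nanoClock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.writtenAtNanos >= ttlNanos) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        ExpireAfterWriteCache<String, Boolean> cache = new ExpireAfterWriteCache<>(60, System::nanoTime);
        Function<String, Boolean> checker = key -> {
            loads[0]++; // stands in for an expensive role/API permission lookup
            return key.startsWith("admin");
        };
        cache.get("admin:listHosts", checker);
        cache.get("admin:listHosts", checker);
        System.out.println("loader calls: " + loads[0]); // loader calls: 1
    }
}
```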
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* test1, domaindetails, capacitymanager fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* test2 - agent tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* capacitymanagertest fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* change
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix missing changes
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert marvin/setup.py
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix indent
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* use space in sql
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address duplicate
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* update host logs
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* revert e36c6a5d07
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix npe in capacity calculation
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* move schema changes to 4.20.1 upgrade
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* address comments
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* fix build
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add some more tests
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* checkstyle fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* remove unnecessary mocks
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* build fix
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* replace statics
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* engine/orchestration,utils: limit number of concurrent new agent
connections
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* refactor - remove unused
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* unregister closed connections, monitor & cleanup
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* add check for outdated vm filter in power sync
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* agent: synchronize sendRequest wait
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
---------
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* Support for Management Server Maintenance
- New APIs: prepareForMaintenance and cancelMaintenance, with required parameter - managementserverid.
- New management server states for maintenance: PreparingForMaintenance, Maintenance.
- listHosts API with optional parameter – managementserverid, to list the hosts connected to the management server.
- Support management server maintenance when more than one active management server is available.
- Triggers transfer of agents to other available management servers for maintenance; the new agent command MigrateAgentConnectionCommand initiates the transfer of indirect agents.
- New global config 'management.server.maintenance.timeout', to set the timeout (in mins) for the management server maintenance window, default: 60 mins.
- UI changes: Prepare and Cancel Maintenance in Management Server section, Connected Agents tab, New fields for hosts and management servers.
* Updated pending jobs check timer task with ScheduledExecutorService
* keep maintenance state on trigger shutdown call when ms is in maintenance
* add pending jobs count to ms response
* during ms heartbeat, update state to up only when it's down
* allow vm work jobs of async job created before prepare for maintenance
* Revert "keep maintenance state on trigger shutdown call when ms is in maintenance"
This reverts commit 607e13364679eac897f4d146bb3325ea7a61ba17.
* skip maintenance test when multiple management servers are not available, and not configured in host setting for kvm
For HA work items that are created for host state change, checks must be
done when execution is called in a new management server session.
A new column, reason, has been added in cloud.op_ha_work table to track
the reason for HA work.
When HighAvailabilityManager starts it finds and puts all pending HA
work items in Investigating state. During execution of the HA work if it
is found in investigating state, checks are done to verify if the work
is still valid. If the job is found to be invalid it is cancelled.
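
The flow above can be sketched roughly as below. The reduced `Step` enum, the `HaWork` class and the validity predicate are simplified, hypothetical stand-ins rather than the actual HighAvailabilityManager types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class HaWorkRevalidation {
    enum Step { Scheduled, Investigating, Cancelled, Done }

    static final class HaWork {
        final long vmId;
        final String reason; // mirrors the new `reason` column
        Step step = Step.Scheduled;

        HaWork(long vmId, String reason) {
            this.vmId = vmId;
            this.reason = reason;
        }
    }

    /** On manager start, every pending work item is moved to Investigating. */
    static void onManagerStart(List<HaWork> pending) {
        for (HaWork work : pending) {
            if (work.step != Step.Done && work.step != Step.Cancelled) {
                work.step = Step.Investigating;
            }
        }
    }

    /** Items found in Investigating are re-checked and cancelled when no longer valid. */
    static Step execute(HaWork work, Predicate<HaWork> stillValid) {
        if (work.step == Step.Investigating && !stillValid.test(work)) {
            work.step = Step.Cancelled;
        } else {
            work.step = Step.Done; // the real HA action would run here
        }
        return work.step;
    }

    public static void main(String[] args) {
        HaWork work = new HaWork(42L, "HostDown");
        onManagerStart(Arrays.asList(work));
        // Assume the host came back up in the meantime, so the work is stale.
        System.out.println(execute(work, w -> false)); // Cancelled
    }
}
```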
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* 4.20:
merge errors fixed
Restrict the migration of volumes attached to VMs in Starting state (#9725)
server, plugin: enhance storage stats for IOPS (#10034)
Introducing granular command timeouts global setting (#9659)
Improve logging to include more identifiable information (#9873)
* Introducing granular command timeouts global setting
* fix marvin tests
* Fixed log messages
* some more log message fix
* Fix empty value setting
* Converted the global setting to non-dynamic
* set wait on command only when granular wait is defined. This is to keep the backward compatibility
* Improve error logging
* Improve logging to include more identifiable information for kvm plugin
* Update logging for scaleio plugin
* Improve logging to include more identifiable information for default volume storage plugin
* Improve logging to include more identifiable information for agent managers
* Improve logging to include more identifiable information for Listeners
* Replace ids with objects or uuids
* Improve logging to include more identifiable information for engine
* Improve logging to include more identifiable information for server
* Fixups in engine
* Improve logging to include more identifiable information for plugins
* Improve logging to include more identifiable information for Cmd classes
* Fix toString method for StorageFilterTO.java
* 4.20:
VR: fix site-2-site VPN if split connections is enabled (#10067)
UI: fix cannot open 'Edit tags' modal for static routes (#10065)
Update ownership selection component to be language independent (#10052)
Support to enable/disable VM High Availability manager and related alerts (#10118)
- Adds new config 'vm.ha.enabled' with Zone scope, to enable/disable VM High Availability manager. This is enabled by default (for backward compatibility).
When enabled, the VM HA WorkItems (for VM Stop, Restart, Migration, Destroy) can be created and the scheduled items are executed.
When disabled, new VM HA WorkItems are not allowed and the scheduled items are retried until max retries configured at 'vm.ha.migration.max.retries' (executed in case HA is re-enabled during retry attempts), and then purged after 'time.between.failures' by the cleanup thread that runs regularly at 'time.between.cleanup'.
- Adds new config 'vm.ha.alerts.enabled' with Zone scope, to enable/disable alerts for the VM HA operations. This is enabled by default.
* Fix `updateTemplatePermission` when the UI is set to a language other than English (#9766)
* Fix updateTemplatePermission UI in non-english language
* Improve fix
---------
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
* Prepend vpc name to vpc tier network name based on global setting
* Added UT for createVpcGuestNetwork
* rename connector to delimiter and add configKey.Category.Network
* Move setting the name to a new method
---------
Co-authored-by: Daan Hoogland <daan@onecht.net>
Co-authored-by: Lucas Martins <56271185+lucas-a-martins@users.noreply.github.com>
Co-authored-by: Lucas Martins <lucas.martins@scclouds.com.br>
* Prevent addition of duplicate PF rules on scale up and no rules left behind on scale down (#32)
* fix missing dependency injection
* NSX: Fix concurrency issues on port forwarding rules deletion (#37)
* Fix concurrency issues on port forwarding rules deletion
* Refactor objectExists
* Fix unit test
* Fix test
* Small fixes
* CKS: Externalize control and worker node setup wait time and installation attempts (#38)
* NSX: Add shared network support (#41)
* NSX: Fix number of physical networks for Guest traffic checks and leftover rules on CKS cluster deletion (#45)
* Fix pf rules removal on CKS cluster deletion
* Fix check for number of physical networks for guest traffic
* Fix unit test
* fix logger
* NSX: Handle CheckHealthCommand to avoid host disconnection and errors on APIs
* Remove unused string
* fix logger
* Update UDP active monitor to ICMP
* Fix NPE on restarting VPC with additional public IPs
* NSX / VPC: Reuse Source NAT IP from systemVM range on restarts
* CKS: Public IP not found for VPC networks
* Externalize retries and interval for NSX segment deletion (#67)
* remove unused import
* remove duplicate imports
* remove unused import
* revert externalizing cks settings
* fix test
* Refactor log messages
* Address comments
* Fix issue caused due to forward merge: 90fe1d
---------
Co-authored-by: Nicolas Vazquez <nicovazquez90@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This PR contains 3 features
- IPv4 Static Routing (Routed mode) #9346
Design document: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=306153967
- AS Numbers Management #9410
Design Document: https://cwiki.apache.org/confluence/display/CLOUDSTACK/BGP+AS+Numbers+Management
- Dynamic routing
Design Document: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=315492858
- Document: https://github.com/apache/cloudstack-documentation/pull/419
Rename nsx mode to routing mode
by
```
git grep -l nsx_mode |xargs sed -i "s/nsx_mode/routing_mode/g"
git grep -l nsxmode |xargs sed -i "s/nsxmode/routingmode/g"
git grep -l nsxMode |xargs sed -i "s/nsxMode/routingMode/g"
git grep -l NsxMode |xargs sed -i "s/NsxMode/RoutingMode/g"
```
- re-organize sql changes
- fix NPE as rules do not have public ip
- fix missing destination cidr in ingress rules
- disable network usage for routed network
- fix DB exception as network_id is -1 during network creation
- apply ingress/egress routing rules
- VR changes to configure nft rules for isolated network
- VR: setup nft rule for control network
- VR: flush all iptables rules
- fix NPE which is because ingress rules do not have public ip associated
- fix dest cidr is missing in nft tables
- add ip4 routing and ip4 routes to list network and list vpc response
- fix ingress rule is missing when vr is restarted
- fix icmp types in nft rules
- add tab to manage routing firewall rules
- fix ingress rules are not applied when VR is restarted
- add default rules in FORWARD chain
- fix create vpc offerings
- fix public ip is not assigned to vpc
- fix network offering is not listed when create vpc tier
- add is_routing to boot args of vpc vr
- remove table ip4_firewall in vpc vr
- release or remove subnet when remove a network
- implement fw_vpcrouter_routing
- fix wrong ip family when flush ipv4 rules
- fix acl rules are not applied due to wrong version (should be 6 which means ip6 rules are removed)
- add default rules for vpc tiers so that tcp connections (e.g. ssh) work
- append policy rules after default rules
- remove /usr/local/cloud/systemvm/ in routers
- throw an exception when allocate subnet with cidrsize
- fix some TODOs
- add new parameters to update API
- return type Ipv4GuestSubnetNetworkMap when get or create subnet
- fix firewall rules are broken
- add domain_id and account_id to db
- add domain/account/project to ipv4 subnet response
- create ipv4 subnet for domain/account/project
- check conflict when update ipv4 subnet
- ui changes
- add parent subnet to response
- add list for ipv4 subnet
- implement some methods
- fix list subnets for guest networks by zoneid
- UI changes
- fix delete ipv4 subnet for network
- fix ipv4 subnet is set to zone guest network cidr if cidrsize is specified
- add zone info to response if parent subnet is null but network is not
- fix gateway/cidr is not set when create network with cidrsize
- fix order of nft rules in the VRs
* Routed v24
- add classes in marvin base.py
* Routed v25
- add test_01_subnet_zone
- fix dedicate to domain/account failure
- list subnets for network by keyword and subnet
* Routed v26: implement subnet auto-allocation
- add utils to split ip ranges into small subnets
- add utils to get start/end ip of a cidr
- implement subnet auto-generation
- add global settings
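
For illustration, utilities of the kind listed above (start/end IP of a CIDR, splitting into smaller subnets) might look like the following. The class and method names are hypothetical, and the sketch splits a single CIDR rather than an arbitrary IP range.

```java
import java.util.ArrayList;
import java.util.List;

final class Ipv4SubnetSplitter {
    static long ipToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    static String longToIp(long v) {
        return ((v >> 24) & 0xFF) + "." + ((v >> 16) & 0xFF) + "." + ((v >> 8) & 0xFF) + "." + (v & 0xFF);
    }

    /** Returns {start IP, end IP} of the network that the CIDR belongs to. */
    static String[] startAndEnd(String cidr) {
        String[] parts = cidr.split("/");
        long size = 1L << (32 - Integer.parseInt(parts[1]));
        long start = ipToLong(parts[0]) & ~(size - 1);
        return new String[] {longToIp(start), longToIp(start + size - 1)};
    }

    /** Splits a CIDR into consecutive subnets of the given (longer) prefix length. */
    static List<String> split(String cidr, int newPrefix) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        if (newPrefix < prefix || newPrefix > 32) {
            throw new IllegalArgumentException("new prefix must be between " + prefix + " and 32");
        }
        long size = 1L << (32 - prefix);
        long start = ipToLong(parts[0]) & ~(size - 1);
        long step = 1L << (32 - newPrefix);
        List<String> subnets = new ArrayList<>();
        for (long addr = start; addr < start + size; addr += step) {
            subnets.add(longToIp(addr) + "/" + newPrefix);
        }
        return subnets;
    }

    public static void main(String[] args) {
        String[] range = startAndEnd("10.1.2.77/24");
        System.out.println(range[0] + " - " + range[1]); // 10.1.2.0 - 10.1.2.255
        System.out.println(split("192.168.0.0/24", 26));
        // [192.168.0.0/26, 192.168.0.64/26, 192.168.0.128/26, 192.168.0.192/26]
    }
}
```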
* Routed 27: add subnet for VPC
- add db column for vpc_id
- add db record for vpc
- remove db record when delete a vpc
- add checkConflicts methods
- remove duplicated settings
- check ipv4 cidr when create subnet
* Routed v28: update smoke tests
- update test_ipv4_routing.py
- search subnets by networkid
* Routed 29: fix vpc and add more tests
- fix createnetwork in vpc
- add vpc id/name to response
- fix zone id/name are not displayed in some cases
- add smoke test for vpc
- add smoke tests for failed cases
- add smoke test for connectivity checks
- marvin: add "-q" to ssh command
* Routed 31: ui and smoke tests
- UI: add link to network in list view
- add nftables rules check in VRs
* Routed 32: add chain OUTPUT and more rules
- fix the issue 80/443/8080 is not reachable from VR itself
```
2024-06-27 10:21:52,121 INFO Executing: systemctl start cloud-password-server@172.31.1.1
2024-06-27 10:21:52,128 INFO Service cloud-password-server@172.31.1.1 start
2024-06-27 10:21:52,129 INFO Executing: ps aux
2024-06-27 10:24:02,175 ERROR Failed to update password server due to: <urlopen error [Errno 110] Connection timed out>
```
* Routed: fix dns search from VMs in Isolated networks
* Routed: fix VPC dns issue due to gateway IP is missing in cloud.conf
This is caused by NSX integration, and fixed by
https://github.com/apache/cloudstack/pull/9102/
* Routed: rename routing_mode to network_mode
* Routed: replace centos5.5 template in smoke test as dhclient does not work in the vms
// this does not work
refer to https://dominikrys.com/posts/disable-udp-checksum-validation/#ignoring-udp-checksums-with-nftables
and
https://forum.openwrt.org/t/udp-checksum-with-nftables/161522/11
the vm should have checksum offloading disabled
* Routed: fix smoke test due to wrong cidrlist of egress rules and missing ingress rule from VR
* PR 9346: fix lint error schema-41910to42000.sql
* PR 9346: ui polish v1
* PR 9346: create VPC with cidrsize
* Routed: fix test failures with test_network_ipv6 and test_vpc_ipv6 due to 'ssh -q'
* Routed: fix /usr/local/cloud/systemvm/ are removed after SSVM/CPVM reboot
* Routed: fix IP of additional nics of VPC VR is not gateway
* PR 9346: fix cidrsize check when create VPC with cidrsize
* Routed: fix test/integration/smoke/test_ipv4_routing.py:279:16: E713 test for membership should be 'not in'
* PR9346: fix/Update api
* PR 9346: set response object name
* PR9346: UI refactor and small fixes
* PR9346: change return type of getNetworkMode
* PR9346: move IPv4 subnet to separate tab
* PR9346: revert IpRangesTabGuest.vue back to original
* PR9346: fix remove ipv4 subnet on UI
* PR9346: fix test_ipv4_routing.py
* AS Number Range Management
* Create AS Number Range for a Zone
* Fix build
* Add ListASNRange and fix create ASN range
* Add List AS numbers
* Add UI for AS Numbers
* Fix UI and filter AS Numbers
* Add AS Number on Isolated network creation and refactor UI and response
* Release AS Number
* Add network offering new columns
* Add UI support to view and add AS number and configure network offering
* Automatically assign AS Number if not specify AS number
* update variable name
* Fix routing mode check
* UI: Only allow selecting AS number when routing mode is Dynamic and specifyAsNumber is true
* UI: Only pass AS number when supported by the network offering
* Release AS number on network deletion
* Add deleteASNRange command (#81)
* API: List ASNumbers by asnumber (#83)
---------
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
* AS number management extensions
* Support AS number on VPC tier creation based on the offering
* Fix delete AS Range
* Fix UI values
* UI: Minor fix for releasing AS number
* UI: Move management of AS Range to Zone details view
* Fix specify_as_number column in network_offering table to set the default false
* Add events for AS number operations
* Allow users to list AS Numbers and fix network form for Normal users
* Add AS number details to list networks response
* Fix Allocated time format
* support in details view too
* Fix: Do not release AS number if acquired network requires AS number
* Fix typo
* Fix allocated release
* Fix event type
* UI: Add Routing mode and Specify AS to the network offering details
* Address comment
* Fix release AS number of network deletion
* Fix
* Restore release to its place based on the boolean
* Rename boolean
* API: Add networkId as listASNumber parameter
* Add Network name to the search view filter for AS numbers
* Present allocated time in human readable format - Public IP / AS Numbers
* Add account / domain filter for AS numbers
* Add support for AS numbers on VPC offerings
* Refactor AS number allocation to VPC and non VPC isolated networks
* Checkstyle
* Add support for AS numbers on VPC offerings
* extend vpc offering view and vpcoffering response
* merge https://github.com/shapeblue/cloudstack-playtika/pull/115 and change network_id of as_numbers to include vpc_id
* Display AS number of VPC tiers as the AS number of the VPC
* extend asnumber response and ui support
* improve UI and as number response to view VPC details
* List only dynamic offerings for vpc tiers with specify as numbers
* Fix release AS number
* Fix AS number displayed as 0 when no AS number assigned
* Fix VPC offering creation without specify AS
---------
Co-authored-by: nvazquez <nicovazquez90@gmail.com>
* Fix release AS number on VPC deletion
* Update server/src/main/java/com/cloud/dc/BGPServiceImpl.java
* Fix missing column on asnumber table
* Fix listASNumbers API to support vpcid and obtain AS number from vpc for tiers
* Prevent listing 0 AS number for VPC
* Fix create Isolated Network form
* Update server/src/main/java/com/cloud/network/vpc/VpcManagerImpl.java
* Dynamic: move routingmode/specifyasn after networkmode in AddNetworkOffering.vue on UI
* Dynamic: fix ip4routing in network response
* Dynamic/systemvm: add FRR to systemvm template
* Dynamic: BGP peers (DB,VO,Dao)
* Dynamic: BGP peers (VR/server)
* Dynamic: v3
- remove BgpPeer class
- fix vpc vr has bgp peers of only 1 tier
- rename ip4_cidr to guest_ip4_cidr
- rename ip6_cidr to guest_ip6_cidr
- generate /etc/frr/frr.conf
- apply BGP peers on Dynamic-Routed network even if there is no BGP peers
* Dynamic v4: fix vpc vr
- fix duplicated guest cidr in frr.conf in vpc vr
todo
- restart frr / reload frr (reload will cause bgp session to Policy state)
- apis for bgp peers
- assign/release bgp peer from/to network
* Dynamic v5: add apis for bgp peers
* Dynamic v6: fix bugs
- set response object name
- remove required as number when update
- fix checks when update
- allow regular users to list bgp peers
* Dynamic v7: move apis to bgp sub-dir
* Dynamic v8: add tab for manage BGP peers on UI
* Dynamic v9: fix update bgp with same config
* Dynamic v10: add changeBgpPeersForNetworkCmd
* Dynamic v11: create network with bgppeerids
- create network with bgppeerids
- add marvin classes
- add smoke tests
- remove uuid from bgp_peer_network_map
- fix created/removed in bgp_peer_network_map
- remove bgppeers when remove a network
- UI: fix delete bgp peer
* Dynamic v12: add test for vpc tiers
* Dynamic v13: bug fixes
- fix change BGP peers for network in Allocated state
- fix listing network returns removed record
- fix all vpc tiers have the same settings
- remove BGP peers as part of network removal
- remove FRR settings for vpc tiers without any BGP peers
- UI: fix no error msg when change BGP peers
* Dynamic v14: assign BGP Peers for VPC instead of VPC tiers
- create vpc with bgppeerids
- do not allow create/update vpc tier with bgppeerids
- apply all bgp peers when create/delete a vpc tier
- UI: change bgp peers for vpc
- test: update tests on vpc
* Dynamic: fix build errors after merging as number PR
* Dynamic: fix TODOs
* Dynamic: fix smoke test on VPC
* Allow creation of networks by users with as numbers
* Address review comments
* Move BGPService to bgp package and inject it on BaseCmd
* Revert changes for CKS and address more comments
* Display left side menu option for AS number only for root admin
* Dynamic: create/update BGP peer with details
refer to https://docs.frrouting.org/en/latest/bgp.html
* Dynamic: fix build error and remove access to ListBgpPeers cmd for regular users
* Dynamic: assign all zone BGP peers to user networks
* Dynamic: show BGP peer info of networks only for root admin
* AS number: disable specifyasnumber for non-NSX offerings
* Dynamic: pass bgppeer details to command and fix typo with ip6 addr
* Dynamic: list BGP peers by isdedicated, and fix change bgppeers for network/vpc
* Dynamic: add UI labels
* Dynamic: add bgp peers to vpc response
* Dynamic: list bgp peers by keyword, fix list by asnumber
* Dynamic: fix list bgppeers by keyword and db schema
* Dynamic: fix list bgppeers do not return dedicated peers
* Dynamic: update UI when create network/vpc offering
* Update server/src/main/java/com/cloud/configuration/ConfigurationManagerImpl.java
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
* Update tools/marvin/setup.py
* Dynamic: network mode must be same when update a network with new offering
* Dynamic: add method networkModel.isAnyServiceSupportedInNetwork
* Dynamic: rename APIs and classes
* Dynamic: fix unit tests due to previous changes
* Dynamic: validateNetworkCidrSize when auto-create subnet
* Dynamic: check AS number overlap
* Dynamic: add ActionEvent
* Dynamic: small code optimization
* Dynamic: fix ui bugs after api rename
* Dynamic: add marvin and test for ASN ranges and AS numbers
* Dynamic: add account setting use.system.bgp.peers
also
- change the default value of routed.ipv4.vpc.max.cidr.size and routed.ipv4.vpc.min.cidr.size
- change the category of settings
* static: fix ui error when delete zone ipv4 subnets
* static: small UI polish
* Dynamic: throw exception when as number is required but not passed
* Dynamic: fix typo when create FRR directory which causes network deletion failures
* Dynamic: connect to ALL (or ALL dedicated) BGP peers if no BGP peer mapping for the network/vpc
* Dynamic: throw exception when as number is required for VPC but not passed
* Dynamic: list bgp peers by useSystemBgpPeers
* Dynamic: fix frr config in VPC VR when change bgp peers
* Dynamic: create frr config even if there is no VPC tiers
* Dynamic: list bgp peers by zoneid (required for account) and account
* Dynamic: only apply FRR config for vpc tiers with dynamic routing
* Dynamic: do not send commands to router if commands size is 0
* Dynamic: fix 'new IPv6 address is not valid' when update bgp peer without IPv6
* Dynamic: throw exception if fail to allocate AS number when create network/vpc with dynamic routing
* Dynamic: enable ipv6 unicast and 'ip nht resolve-via-default'
* Dynamic: delete network/vpc if fail to allocate AS number when create network/vpc with dynamic routing
* test: add unit tests for ASN APIs
* test: add unit tests for core module
* test: add unit tests for API responses
* test: add unit tests for BgpPeerTO
* test: add minor changes
* test: add tests for create/delete/update/list RoutingFirewallRuleCmd
* Static: show ip4 routes for vpc tiers
* test: fix smoke test failure caused by type change of as number
* test: add test for Ipv4SubnetForZoneCmd
* test: add test for Ipv4SubnetForGuestNetworkCmd and BgpPeerCmd
* UI: do not show redundant router when network mode is ROUTED as RVR is not supported
* UI: hide 'Conserve mode' when networkmode is ROUTED
* test: add unit tests for ListASNumbersCmdTest
* Static: remove allocated IPv4 subnet when delete a network or vpc
* test: add unit tests for BgpPeersRules
* Dynamic: set ipv4routing from network offering
* server: list as numbers and ipv4 subnets by keyword
* server: remove dedicated bgp peers and ipv4 subnets when delete an account or domain
* server: fix dedicated ipv4 subnet is allocated to other accounts
* UI: fix allocated time format
* server: ignore project if projectid is -1 so bgppeers/ipv4subnets works in project view
* UI: add project column to bgp peers and ipv4 subnets
* server: fix list AS numbers by domain admin or normal user
* server: fix network creation when ipv4 subnet is dedicated
* UI: polish network.js
* Dynamic: fix frr config for ipv6 routing
* Static routing: support cks cluster
* Static: get/create IPv4 subnet from dedicated subnets at first
* Dynamic: add BGP peers tab
* Static: remove redundant loops
* api: add since to api and response
* server: add unit tests
---------
Co-authored-by: Nicolas Vazquez <nicovazquez90@gmail.com>
Co-authored-by: Pearl Dsilva <pearl1594@gmail.com>
Co-authored-by: Harikrishna Patnala <harikrishna.patnala@gmail.com>
Co-authored-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* New feature: Change storage pool scope
* Added checks for Ceph/RBD
* Update op_host_capacity table on primary storage scope change
* Storage pool scope change integration test
* Pull 8875: Addressed review comments
* Pull 8875: remove storage checks, AbstractPrimayStorageLifeCycleImpl class
* Pull 8875: Fixed integration test failure
* Pull 8875: Review comments
* Pull 8875: review comments + broke changeStoragePoolScope into smaller functions
* Added UT for changeStoragePoolScope
* Rename AbstractPrimaryDataStoreLifeCycleImpl to BasePrimaryDataStoreLifeCycleImpl
* Pull 8875: Dao review comments
* Pull 8875: Rename changeStoragePoolScope.vue to ChangeStoragePoolScope.vue
* Pull 8875: Created a new smoke test file + A single warning msg in ui
* Pull 8875: Added cleanup in test_primary_storage_scope.py
* Pull 8875: Typo in en.json
* Pull 8875: cleanup array in test_primary_storage_scope.py
* Pull 8875: Removing extra whitespace at eof of StorageManagerImplTest
* Pull 8875: Added UT for PrimaryDataStoreHelper and BasePrimaryDataStoreLifeCycleImpl
* Pull 8875: Added license header
* Pull 8875: Fixed sql query for vmstates
* Pull 8875: Changed icon plus info on disabled mode in apidoc
* Pull 8875: Change scope should not work for local storage
* Pull 8875: Change scope completion event
* Pull 8875: Added api findAffectedVmsForStorageScopeChange
* Pull 8875: Added UT for findAffectedVmsForStorageScopeChange and removed listByPoolIdVMStatesNotInCluster
* Pull 8875: Review comments + Vm name in response
* Pull 8875: listByVmsNotInClusterUsingPool was returning duplicate VM entries because of multiple volumes in the VM satisfying the criteria
* Pull 8875: fixed listAffectedVmsForStorageScopeChange UT
* listAffectedVmsForStorageScopeChange should work if the pool is not disabled
* Fix listAffectedVmsForStorageScopeChangeTest UT
* Pull 8875: add volume.removed not null check in VmsNotInClusterUsingPool query
* Pull 8875: minor refactoring in changeStoragePoolScopeToCluster
* Update server/src/main/java/com/cloud/storage/StorageManagerImpl.java
* fix eof
* changeStoragePoolScopeToZone should connect pool to all Up hosts
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>
* Mitigation for non-scalable Powerflex/ScaleIO clients
- Added ScaleIOSDCManager to manage SDC connections, checks clients limit, prepare and unprepare SDC on the hosts.
- Added commands for prepare and unprepare storage clients to prepare/start and stop SDC service respectively on the hosts.
- Introduced config 'storage.pool.connected.clients.limit' at storage level for client limits, currently supported for Powerflex only.
* tests issue fixed
* refactor / improvements
* lock with powerflex systemid while checking connections limit
* updated powerflex systemid lock to hold till sdc preparation
* Added custom stats support for storage pool, through listStoragePools API
* code improvements, and unit tests
* unit tests fixes
* Update config 'storage.pool.connected.clients.limit' to dynamic, and some improvements
* Stop SDC on host after migration if no volumes mapped to host
* Wait for SDC to connect after scini service start, and some log improvements
* Do not throw exception (log it) when SDC is not connected while revoking access for the powerflex volume
* some log improvements
* Restart agent when host comes out of maintenance
* Don't send CreateStoragePoolCommand to hosts in maintenance mode
* CreateStoragePoolCommand can run when host in maintenance. Reverted the change to restart agent when host was already up and in maintenance
* Reverted changes done to ResourceManagerImplTest
* Create/Export OVA file of the VM on external vCenter host, to temporary conversion location (NFS)
* Fixed ova issue on untar/extract ovf from ova file
"tar -xf" cmd on ova fails with "ovf: Not found in archive" while extracting ovf file
* Updated VMware to KVM instance migration using OVA
* Refactoring and cleanup
* test fixes
* Consider zone wide pools in the destination cluster for instance conversion
* Remove local storage pool support as temporary conversion location
- OVA export not possible as the pool is not accessible outside host, NFS pools are supported.
* cleanup unused code
* some improvements, and refactoring
* import nic unit tests
* vmware guru unit tests
* Separate clone VM and create template file for VMware migration
- Export OVA (of the cloned VM) to the conversion location takes time.
- Do any validations with cloned VM before creating the template (and fail early).
- Updated unit tests.
* Check conversion support on host before clone vm / create template on vmware (and fail early)
* minor code improvements
* Auto select the host with instance conversion capability
* Skip instance conversion supported response param for non-KVM hosts
* Show supported conversion hosts in the UI
* Skip persistence map update if network doesn't exist
* Added support to export OVA from KVM host, through ovftool (when installed in KVM host)
* Updated importvm api param 'usemsforovaexport' to 'forcemstodownloadvmfiles', to be generic
* Updated hardcoded UI messages with message labels
* Updated UI to support importvm api param - forcemstodownloadvmfiles
* Improved instance conversion support checks on ubuntu hosts, and for windows guest vms
* Use OVF template (VM disks and spec files) for instance conversion from VMware, instead of OVA file
- this would further increase the migration performance (as it reduces the time for OVA preparation / archiving of the VM files into a single file)
* OVF export tool parallel threads code improvements
* Updated 'convert.vmware.instance.to.kvm.timeout' config default value to 3 hrs
* Config values check & code improvements
* Updated import log, with time taken and vm details
* Support for parallel downloads of VMware VM disk files while exporting OVF from MS, and other changes below.
- Skip clone for powered off VMs
- Fixes to support standalone host (with its default datacenter)
- Some code improvements
* rebase fixes
* rebase fixes
* minor improvement
* code improvements - threads configuration, and api parameter changes to import vm files
* typo fix in error msg
* Ability to specify NFS mount options while adding a primary storage and modify it later
* Pull 8947: Rename all occurrences of nfsopt to nfsMountOpt and added nfsMountOpts to ApiConstants
* Pull 8947: Refactor code - move into separate methods
* Pull 8947: CollectionsUtils.isNotEmpty and switch statement in LibvirtStoragePoolDef.java
* Pull 8947: UI - cancel maintenance will remount the storage pool and apply the options
* Pull 8947: UI - moved edit NFS mount options to edit Primary Storage form
* Pull 8947: UI - moved 'NFS Mount Options' to below 'Type' in dataview
* Pull 8947: Fixed message in AddPrimaryStorage.vue
* Pull 8947: Convert _nfsmountOpts to Set in libvirtStoragePoolDef
* Pull 8947: Throw exception and log error if mount fails due to incorrect mount option
* Pull 8947: Added UT and moved integration test to component/maint
* Pull 8947: Review comments
* Pull 8947: Removed password from integration test
* Pull 8947: move details allocation to inside the if block in getStoragePoolNFSMountOpts
* Pull 8947: Fixed a bug in AddPrimaryStorage.vue
* Pull 8947: Pool should remain in maintenance mode if mount fails
* Pull 8947: Removed password from integration test
* Pull 8947: Added UT
* Pull 8875: Fixed a bug in CloudStackPrimaryDataStoreLifeCycleImplTest
* Pull 8875: Fixed a bug in LibvirtStoragePoolDefTest
* Pull 8947: minor code restructuring
* Pull 8947: added some UT for coverage
* Fix LibvirtStorageAdapterTest UT
During live migration of a VM between hosts having different cgroup versions (cgroupv2 & cgroup), the overcommit ratio is ignored.
This PR fixes the above issue.
This PR introduces the functionality of purging removed DB entries for CloudStack entities (currently only for VirtualMachine). There are three mechanisms for purging removed resources:
- Background task: CloudStack will run a background task at a defined interval. Other parameters for this task can be controlled with new global settings.
- API: a new admin-only API, purgeExpungedResources. It accepts the following parameters: resourcetype, batchsize, startdate, enddate. Currently, the API is not supported in the UI.
- Config for service offering: service offerings can be created with the purgeresources parameter, which allows purging resources immediately on expunge.
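As a rough illustration of the API mechanism, the sketch below assembles the query parameters for a purgeExpungedResources call. The parameter names (resourcetype, batchsize, startdate, enddate) come from the description above. The helper name build_purge_params and the yyyy-MM-dd date formatting for the API are assumptions made for the example, and request signing is left out for brevity. This is not the actual client code.

```python
from datetime import date
from typing import Dict, Optional


def build_purge_params(resourcetype: Optional[str] = None,
                       batchsize: Optional[int] = None,
                       startdate: Optional[date] = None,
                       enddate: Optional[date] = None) -> Dict[str, str]:
    """Hypothetical helper: collect only the parameters that were supplied."""
    params = {"command": "purgeExpungedResources", "response": "json"}
    if resourcetype is not None:
        params["resourcetype"] = resourcetype
    if batchsize is not None:
        params["batchsize"] = str(batchsize)
    # Assumption: dates are sent as yyyy-MM-dd, matching the format quoted
    # for the expunged.resources.purge.start.time setting.
    if startdate is not None:
        params["startdate"] = startdate.strftime("%Y-%m-%d")
    if enddate is not None:
        params["enddate"] = enddate.strftime("%Y-%m-%d")
    return params


if __name__ == "__main__":
    print(build_purge_params(resourcetype="VirtualMachine", batchsize=50,
                             startdate=date(2024, 1, 1)))
```

Omitted parameters are simply not sent, so the server-side defaults would apply.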
The following new global settings have been added:
- expunged.resources.purge.enabled: Default: false. Whether to run a background task to purge the expunged resources.
- expunged.resources.purge.resources: Default: (empty). A comma-separated list of resource types that will be considered by the background task to purge the expunged resources. Currently only VirtualMachine is supported. An empty value will result in considering all resource types for purging.
- expunged.resources.purge.interval: Default: 86400. Interval (in seconds) for the background task to purge the expunged resources.
- expunged.resources.purge.delay: Default: 300. Initial delay (in seconds) to start the background task to purge the expunged resources.
- expunged.resources.purge.batch.size: Default: 50. Batch size to be used during expunged resources purging.
- expunged.resources.purge.start.time: Default: (empty). Start time to be used by the background task to purge the expunged resources. Use format yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.
- expunged.resources.purge.keep.past.days: Default: 30. The number of days in the past, counted from the execution time of the background task, for which expunged resources must not be purged. To purge expunged resources right up to the execution time of the background task, set the value to zero.
- expunged.resource.purge.job.delay: Default: 180. Delay (in seconds) to execute the purging of an expunged resource initiated by the configuration in the offering. The minimum value is 180 seconds; if a lower value is set, the minimum value will be used.
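The way a few of these settings interact can be sketched as follows. The function names and the exact cutoff arithmetic are assumptions made for illustration and are not taken from the CloudStack code. Only the 180-second minimum, the keep-past-days semantics, and the "empty means all types" rule come from the descriptions above.

```python
from datetime import datetime, timedelta
from typing import List, Sequence

# Minimum stated for expunged.resource.purge.job.delay
MIN_JOB_DELAY_SECONDS = 180


def effective_job_delay(configured_seconds: int) -> int:
    """Values below the minimum fall back to the minimum."""
    return max(configured_seconds, MIN_JOB_DELAY_SECONDS)


def purge_cutoff(run_time: datetime, keep_past_days: int) -> datetime:
    """Assumed reading of keep.past.days: resources removed after this
    instant are kept; a value of 0 makes the cutoff the run time itself."""
    return run_time - timedelta(days=keep_past_days)


def resource_types(setting_value: str,
                   supported: Sequence[str] = ("VirtualMachine",)) -> List[str]:
    """An empty setting means every supported resource type is considered."""
    names = [n.strip() for n in setting_value.split(",") if n.strip()]
    return names if names else list(supported)


if __name__ == "__main__":
    print(effective_job_delay(60))
    print(purge_cutoff(datetime(2024, 5, 31, 12, 0, 0), 30))
    print(resource_types(""))
```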
Documentation PR: apache/cloudstack-documentation#397
Signed-off-by: Abhishek Kumar <abhishek.mrt22@gmail.com>
Co-authored-by: Wei Zhou <weizhou@apache.org>
Co-authored-by: Suresh Kumar Anaparti <sureshkumar.anaparti@gmail.com>