Host-HA offers investigation, fencing and recovery mechanisms for host that for
any reason are malfunctioning. It uses Activity and Health checks to determine
current host state based on which it may degrade a host or try to recover it. On
failing to recover it, it may try to fence the host.
The core feature is implemented in a hypervisor agnostic way, with two separate
implementations of the driver/provider for Simulator and KVM hypervisors. The
framework also allows for implementation of other hypervisor specific provider
implementation in future.
The Host-HA provider implementation for KVM hypervisor uses the out-of-band
management sub-system to issue IPMI calls to reset (recover) or poweroff (fence)
a host.
The Host-HA provider implementation for Simulator provides a means of testing
and validating the core framework implementation.
Signed-off-by: Abhinandan Prateek <abhinandan.prateek@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This introduces a new certificate authority framework that allows
pluggable CA provider implementations to handle certificate operations
around issuance, revocation and propagation. The framework injects
itself to `NioServer` to handle agent connections securely. The
framework adds assumptions in `NioClient` that a keystore if available
with known name `cloud.jks` will be used for SSL negotiations and
handshake.
This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuance and provisioning of certificate to CloudStack agents such as
the KVM, CPVM and SSVM agents and also for the management server for
peer clustering.
Additional changes and notes:
- Comma separate list of management server IPs can be set to the 'host'
global setting. Newly provisioned agents (KVM/CPVM/SSVM etc) will get
radomized comma separated list to which they will attempt connection
or reconnection in provided order. This removes need of a TCP LB on
port 8250 (default) of the management server(s).
- All fresh deployment will enforce two-way SSL authentication where
connecting agents will be required to present certificates issued
by the 'root' CA plugin.
- Existing environment on upgrade will continue to use one-way SSL
authentication and connecting agents will not be required to present
certificates.
- A script `keystore-setup` is responsible for initial keystore setup
and CSR generation on the agent/hosts.
- A script `keystore-cert-import` is responsible for import provided
certificate payload to the java keystore file.
- Agent security (keystore, certificates etc) are setup initially using
SSH, and later provisioning is handled via an existing agent connection
using command-answers. The supported clients and agents are limited to
CPVM, SSVM, and KVM agents, and clustered management server (peering).
- Certificate revocation does not revoke an existing agent-mgmt server
connection, however rejects a revoked certificate used during SSL
handshake.
- Older `cloudstackmanagement.keystore` is deprecated and will no longer
be used by mgmt server(s) for SSL negotiations and handshake. New
keystores will be named `cloud.jks`, any additional SSL certificates
should not be imported in it for use with tomcat etc. The `cloud.jks`
keystore is stricly used for agent-server communications.
- Management server keystore are validated and renewed on start up only,
the validity of them are same as the CA certificates.
New APIs:
- listCaProviders: lists all available CA provider plugins
- listCaCertificate: lists the CA certificate(s)
- issueCertificate: issues X509 client certificate with/without a CSR
- provisionCertificate: provisions certificate to a host
- revokeCertificate: revokes a client certificate using its serial
Global settings for the CA framework:
- ca.framework.provider.plugin: The configured CA provider plugin
- ca.framework.cert.keysize: The key size for certificate generation
- ca.framework.cert.signature.algorithm: The certificate signature algorithm
- ca.framework.cert.validity.period: Certificate validity in days
- ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
- ca.framework.background.task.delay: CA background task delay/interval
- ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates
Global settings for the default 'root' CA provider:
- ca.plugin.root.private.key: (hidden/encrypted) CA private key
- ca.plugin.root.public.key: (hidden/encrypted) CA public key
- ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
- ca.plugin.root.issuer.dn: The CA issue distinguished name
- ca.plugin.root.auth.strictness: Are clients required to present certificates
- ca.plugin.root.allow.expired.cert: Are clients with expired certificates allowed
UI changes:
- Button to download/save the CA certificates.
Misc changes:
- Upgrades bountycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-9669:egress destination cidr VR python script changes
CLOUDSTACK-9669:egress destination API and orchestration changes
CLOUDSTACK-9669: Added the ipset package in systemvm template
CLOUDSTACK-9669:Added licence header for new files
CLOUDSTACK-9669: replacing 0.0.0.0/0 with the network cidr
ipset member add with 0.0.0.0/0 fails. So 0.0.0.0/0 replaced with the network cidr.
In source cidr 0.0.0.0/0 is nothing but network cidr.
updated the default egress all cidr with network cidr
The 'force' option provided with the stopVirtualMachine API command is
often assumed to be a hard shutdown sent to the hypervisor, when in fact
it is for CloudStacks' internal use. CloudStack should be able to send
the 'hard' power-off request to the hosts.
When forced parameter on the stopVM API is true, power off (hard shutdown)
a VM. This uses initial changes from #1635 to pass the forced parameter
to hypervisor plugin via the StopCommand, and fixes force stop (poweroff)
handling for KVM, VMware and XenServer.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
* 4.9:
moved logrotate from cron.daily to cron.hourly for vpcrouter in cloud-early-config
CLOUDSTACK-9569: propagate global configuration router.aggregation.command.each.timeout to KVM agent
[4.9] CLOUDSTACK-9569: propagate global configuration router.aggregation.command.each.timeout to KVM agentThe router.aggregation.command.each.timeout in global configuration is only applied on new created KVM host.
For existing KVM host, changing the value will not be effective.
We need to propagate the configuration to existing host when cloudstack-agent is connected.
* pr/1856:
CLOUDSTACK-9569: propagate global configuration router.aggregation.command.each.timeout to KVM agent
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
[4.10] CLOUDSTACK-8746: VM Snapshotting implementation for KVM
* pr/977:
Fixes for testing VM Snapshots on KVM. Related to PR 977
CLOUDSTACK-8746: vm snapshot implementation for KVM
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
This commit implements basic Security Grouping for KVM in
Basic Networking.
It does not implement full Security Grouping yet, but it does:
- Prevent IP-Address source spoofing
- Allow DHCPv6 clients, but disallow DHCPv6 servers
- Disallow Instances to send out Router Advertisements
The Security Grouping allows ICMPv6 packets as described by RFC4890
as they are essential for IPv6 connectivity.
Following RFC4890 it allows:
- Router Solicitations
- Router Advertisements (incoming only)
- Neighbor Advertisements
- Neighbor Solicitations
- Packet Too Big
- Time Exceeded
- Destination Unreachable
- Parameter Problem
- Echo Request
ICMPv6 is a essential part of IPv6, without it connectivity will break or be very
unreliable.
For now it allows any UDP and TCP packet to be send in to the Instance which
effectively opens up the firewall completely.
Future commits will implement Security Grouping further which allows controlling UDP and TCP
ports for IPv6 like can be done with IPv4.
Regardless of the egress filtering (which can't be done yet) it will always allow outbound DNS
to port 53 over UDP or TCP.
Signed-off-by: Wido den Hollander <wido@widodh.nl>
(1) add support to create/delete/revert vm snapshots on running vms with QCOW2 format
(2) add new API to create volume snapshot from vm snapshot
(3) delete metadata of vm snapshots before stopping/migrating and recover vm snapshots after starting/migrating
(4) enable deleting of VM snapshot on stopped vm or vm snapshot is not listed in qcow2 image.
(5) enable smoke tests for vmsnaphsots on KVM
- Bump spring-framework version to 4.x and Jetty to version that runs with JDK8
- Bump servet dependency version
- Migrate spring xmls to version 4, fixes schema locations that are 3.0
dependent in various xmls.
- Fix failing tests due to spring upgrade
(Thanks @marcaurele Marc-Aurèle Brothier for fixing them)
* Fix test DeploymentPlanningManagerImplTest
* Fix GloboDNS test
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
-when processing static nat rule, add a mangle table rule, to mark the traffic
from the guest vm when it has associated static nat rule so that traffic gets
routed using the route tabe of the device which has public ip associated
-fix the case where nic_device_id is empty when ip is getting disassociated
resulting in empty deviceid in ips.json
-add utility methods in CsRule, and CsRoute to add 'ip rule' and 'ip route' rules respectivley
-ensure traffic from all public interfaces are connection marked with device number, and restored
for the reverse traffic. use the connection marked number to do device specific routing table lookup
fill the device specific routing table with default route
-component tests for testing multiple public interfaces of VR
CLOUDSTACK-9402 : Support for underlay features (Source & Static NAT to underlay) in Nuage VSP pluginSupport for underlay features (Source & Static NAT to underlay) with Nuage VSP SDN Plugin including Marvin test coverage for corresponding Source & Static NAT features on master. Moreover, our Marvin tests are written in such a way that they can validate our supported feature set with both Nuage VSP SDN platform's overlay and underlay infra.
PR contents:
1) Support for Source NAT to underlay feature on master with Nuage VSP SDN Plugin.
2) Support for Static NAT to underlay feature on master with Nuage VSP SDN Plugin.
3) Marvin test coverage for Source & Static NAT to underlay on master with Nuage VSP SDN Plugin.
4) Enhancements on our exiting Marvin test code (nuagevsp plugins directory).
5) PEP8 & PyFlakes compliance with our Marvin test code.
* pr/1580:
CLOUDSTACK-9402 : Support for underlay features (Source & Static NAT to underlay) in Nuage VSP plugin
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-9402 : Marvin tests for Source NAT and Static NAT features verification with NuageVsp (both overlay and underlay infra).
Co-Authored-By: Prashanth Manthena <prashanth.manthena@nuagenetworks.net>, Frank Maximus <frank.maximus@nuagenetworks.net>
CLOUDSTACK-9503: Increased the VR script timeout. Most of the changes are about converting int/long time values to joda Duration.
* pr/1745:
CLOUDSTACK-9503: Increased the VR script timeout. Most of the changes are about converting int/long time values to joda Duration.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Previously it was using the HttpClient to make an initial request
to an ISO. This would follow redirects. Then it would make
another request using built-in Java URL and InputStream, which
doesn't follow redirects. This results in the ISO getting stuck
at 0% forever and also causing DOS effects.
This PR adds an ability to Pass a new parameter, locationType,
to the “createSnapshot” API command. Depending on the locationType,
we decide where the snapshot should go in case of managed storage.
There are two possible values for the locationType param
1) `Standard`: The standard operation for managed storage is to
keep the snapshot on the device. For non-managed storage, this will
be to upload it to secondary storage. This option will be the
default.
2) `Archive`: Applicable only to managed storage. This will
keep the snapshot on the secondary storage. For non-managed
storage, this will result in an error.
The reason for implementing this feature is to avoid a single
point of failure for primary storage. Right now in case of managed
storage, if the primary storage goes down, there is no easy way
to recover data as all snapshots are also stored on the primary.
This features allows us to mitigate that risk.
CLOUDSTACK-9319: Use timeout when applying config to virtual routerFrom the [JIRA issue](https://issues.apache.org/jira/browse/CLOUDSTACK-9319):
> The timeout parameter is not passed down to `applyConfigToVR` inside `VirtualRoutingResource` in all cases.
>
> This timeout is worked out as 3 seconds per command or 120 seconds (whichever is larger), but because it's not passed to the first invocation, the default (120 seconds, DEFAULT_EXECUTEINVR_TIMEOUT) is used.
>
> In a recent upgrade of our Virtual Routers, the timeout was being hit and increasing `router.aggregation.command.each.timeout` had no effect. I built a custom 4.8 agent with the timeout increased to allow the upgrade to continue.
* pr/1451:
Remove dangerous prototype of applyConfigToVR
Use timeout when applying config to virtual router
Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
Often, patch and security releases do not require schema migrations or
data migrations. However, if an empty upgrade class and associated
scripts are not defined, the upgrade process will break. With this
change, if a release does not have an upgrade, a noop DbUpgrade is added
to the upgrade path. This approach allows the upgrade to proceed and
for the database to properly reflect the installed version. This change
should make the release process simpler as RMs no longer need to
rememeber to create this boilerplate code when starting a new release.
Beginning with the 4.8.2.0 and 4.9.1.0 releases, the project will
formally adopt a four (4) position release number to properly accomodate
rekeases that contain only CVE fixes. The DatabaseUpgradeChecker and
Version classes made assumptions that they would always parse and
compare three (3) position version numbers. This change adds the
CloudStackVersion value object that supports both three (3) and four (4)
version numbers. It encapsulates version comparsion logic, as well as,
the rules to allow three (3) and four (4) to interoperate.
* Modifies DatabaseUpgradeChecker to handle derive an upgrade path for
a version that was not explicitly specified. It determines the
releases the first release before it with database migrations and uses
that list as the basis for the list for version being calculated. A
noop upgrade is then added to the list which causes no schema changes
or data migrations, but will update the database to the version.
* Adds unit tests for the upgrade path calculation logic in
DatabaseUpgradeChecker
* Removes dummy upgrade logic for the 4.8.2.0 introduced in previous
versions of this patch
* Introduces the CloudStackVersion value object which parses and
compares three (3) and four (4) position version numbers. This class
is intended to replace com.cloud.maint.Version.
* Adds the junit-dataprovider dependency -- allowing test data to be
concisely generated separately from the execution of a test case.
Used extensively in the CloudStackVersionTest.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>