This changes the firewall/mangle table rules to ACCEPT instead of RETURN, which
matches the behaviour observed in ACS 4.5. With the traffic accepted, guest
VMs can exchange TCP traffic with each other over SNAT public IPs.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- do not keep passwords in databag (/etc/cloudstack/vmpasswd.json)
- process only the password we get in (vm_password.json) from mgt server
- lookup the correct passwd server instead of adding passwd to all of them
Example:
- 4 tiers and 199 VMs running
- Start vm 200 would cause new passwd from vm_password.json (1) to be merged with /etc/cloudstack/vmpasswd.json (199)
- A curl command was executed for each password (200) for each tier (4), resulting in 800 calls
- In fact, since passwords are never cleaned up, it could well be even more: the IP address was the key in the JSON file, so until the IP address was reused the original password would remain and be sent to the passwd server every time another VM started.
- This took ~40 seconds
Now we just figure out the right tier and only process the new password resulting in a single curl call.
- takes 0,03 seconds!
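A minimal sketch of the "figure out the right tier" step, assuming each tier is described by its CIDR and the IP its password server listens on. The function name and the `tiers` mapping are hypothetical and only illustrate the lookup, not the actual VR code:

```python
import ipaddress


def find_password_server(vm_ip, tier_servers):
    """Return the password server IP of the tier whose CIDR contains vm_ip.

    tier_servers maps a tier CIDR to the IP of that tier's password
    server (hypothetical input shape, not taken from the VR code).
    """
    address = ipaddress.ip_address(vm_ip)
    for cidr, server_ip in tier_servers.items():
        if address in ipaddress.ip_network(cidr):
            return server_ip
    return None


# Four tiers, as in the example above; the addresses are made up.
tiers = {
    "10.0.1.0/24": "10.0.1.1",
    "10.0.2.0/24": "10.0.2.1",
    "10.0.3.0/24": "10.0.3.1",
    "10.0.4.0/24": "10.0.4.1",
}

# Only the matching tier is contacted, so one call instead of one per tier.
print(find_password_server("10.0.3.57", tiers))
```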
when a shared network is secondary the default gateway gets overwritten by a bogus one
dnsmasq does the right thing and replaces it with its own default which is not good for us
so check for '0.0.0.0'
The password server in RVRs has wrong parameters as the gateway of guest nics is None.
In this case, we should get the gateway from /var/cache/cloud/cmdline.
-when processing static nat rule, add a mangle table rule, to mark the traffic
from the guest vm when it has associated static nat rule so that traffic gets
routed using the route table of the device which has the public ip associated
-fix the case where nic_device_id is empty when ip is getting disassociated
resulting in empty deviceid in ips.json
-add utility methods in CsRule and CsRoute to add 'ip rule' and 'ip route' rules respectively
-ensure traffic from all public interfaces are connection marked with device number, and restored
for the reverse traffic. use the connection marked number to do device specific routing table lookup
fill the device specific routing table with default route
-component tests for testing multiple public interfaces of VR
CLOUDSTACK-9598: wrong default gateway for the nic in non-default network
When a guest VM has nics in more than one guest network, set the tag for each host in /etc/dhcphosts.txt, and use the tag to add an exception in /etc/dhcpopts.txt to prevent sending the default route and DNS server when the nic is in a non-default network.
this was the behaviour with edithosts.sh prior to 4.6
* pr/1766:
CLOUDSTACK-9598: wrong default gateway for the nic in non-default network
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-9498: VR CsFile search utility methods fail when search string has 're' meta chars
There is no real use of the python 're' module in the CsFile.py utility methods searchString and deleteLine; a regular string search is sufficient. These methods are used only for VPN user add/delete. Since a VPN user password can contain python 're' meta characters, it interferes with the search functionality.
Replacing re.search() with regular string search instead.
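To illustrate the failure mode, here is a small sketch comparing the two approaches. The helper names and sample lines are hypothetical and are not the CsFile.py methods themselves:

```python
import re


def search_with_regex(lines, needle):
    """Old approach: the needle is interpreted as a regular expression."""
    return any(re.search(needle, line) for line in lines)


def search_plain(lines, needle):
    """New approach: the needle is matched literally as a substring."""
    return any(needle in line for line in lines)


# Made-up lines containing passwords with 're' meta characters.
lines = ["alice pa$$w0rd", "bob s3cret("]

# '$' is an end-of-line anchor in a regex, so the literal text is not found.
print(search_with_regex(lines, "pa$$w0rd"))
print(search_plain(lines, "pa$$w0rd"))

# An unbalanced '(' is not even a valid pattern and raises re.error.
try:
    search_with_regex(lines, "s3cret(")
except re.error as exc:
    print("regex search failed:", exc)
```

A regex search could also be kept by passing the needle through `re.escape()`, but a plain substring test is simpler when no pattern matching is needed.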
Change is confined to VPN add/delete users. Have run the test/integration/component/test_vpn_users.py
VPN remote access user limit tests ... === TestName: test_01_VPN_user_limit | Status : SUCCESS ===
ok
Test create VPN when L2TP port in use ... === TestName: test_02_use_vpn_port | Status : SUCCESS ===
ok
Test create NAT rule when VPN when L2TP enabled ... === TestName: test_03_enable_vpn_use_port | Status : SUCCESS ===
ok
Test add new users to existing VPN ... === TestName: test_04_add_new_users | Status : SUCCESS ===
ok
Test add duplicate user to existing VPN ... === TestName: test_05_add_duplicate_user | Status : SUCCESS ===
ok
Test as global admin, add a new VPN user to an existing VPN entry ... === TestName: test_06_add_VPN_user_global_admin | Status : SUCCESS ===
ok
Test as domain admin, add a new VPN user to an existing VPN entry ... === TestName: test_07_add_VPN_user_domain_admin | Status : SUCCESS ===
ok
* pr/1680:
CLOUDSTACK-9498: VR CsFile search utility methods fail when search string has 're' meta chars, and causing VPN user add/delete to fail
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
when a guest VM has nics in more than one guest network, set the tag for
each host in /etc/dhcphosts.txt, and use the tag to add an exception in
/etc/dhcpopts.txt to prevent sending the default route and dns server when the nic is in a non-default network
this was the behaviour with edithosts.sh prior to 4.6
added new test case test_router_dhcp_opts to test DHCP option file use of cloudstack
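A rough sketch of the tagging idea, under stated assumptions: the tag is derived from the IP address, the hosts line uses dnsmasq's `set:<tag>` syntax, and the options line uses `tag:<tag>,<option>` with no value so that the option is not sent. DHCP option 3 is the router and option 6 is the DNS server. The function name and the exact line layout are illustrative and may differ from what the VR writes:

```python
def dhcp_entries(mac, ip, hostname, is_default_network):
    """Return (hosts_line, opts_lines) for one nic.

    The line formats are an assumption based on dnsmasq's dhcp-host and
    dhcp-option syntax, not copied from the VR scripts.
    """
    if is_default_network:
        # Default nic: no tag, dnsmasq sends its usual router and DNS options.
        return "%s,%s,%s" % (mac, ip, hostname), []
    tag = ip.replace(".", "_")
    hosts_line = "%s,set:%s,%s,%s" % (mac, tag, ip, hostname)
    # An option with no value tells dnsmasq not to send that option.
    opts_lines = ["tag:%s,%d" % (tag, option) for option in (3, 6)]
    return hosts_line, opts_lines


hosts_line, opts_lines = dhcp_entries("02:00:1a:2b:00:05", "10.1.2.5", "vm-5", False)
print(hosts_line)
for line in opts_lines:
    print(line)
```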
The VR executes an ip route flush command as part of its configuration. This command performs a
DNS lookup on the VR hostname. Since the VR does not have a DNS entry, the ip command would
wait 5 seconds before timing out and executing the flush operation. This fix adds the VR
hostname to /etc/hosts mapped to 127.0.0.1 to answer the DNS lookup – reducing the
execution time.
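A minimal sketch of adding such an entry idempotently. It works on the file contents as a string so it can be tried without touching /etc/hosts; the function name is hypothetical:

```python
def ensure_hosts_entry(hosts_text, hostname, address="127.0.0.1"):
    """Return hosts_text with an 'address hostname' line, added only if missing."""
    for line in hosts_text.splitlines():
        # Ignore comments, then look for the hostname among the names on the line.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + "%s %s\n" % (address, hostname)


original = "127.0.0.1 localhost\n"
updated = ensure_hosts_entry(original, "r-167-VM")
print(updated, end="")
```

Running it a second time on its own output leaves the text unchanged, so it is safe to call on every configuration run.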
In some cases the UDP checksums in packets from DHCP servers are
incorrect. This is a problem for some DHCP clients that ignore
packets with bad checksums. This patch inserts an iptables rule
to ensure DHCP servers always send packets with correct checksums.
Due to this bug DHCP offers are sometimes not accepted by Instances.
The end-result without this fix is no connectivity for the Instance
due to the lack of an IPv4 address.
This is also committed in OpenStack:
- https://github.com/projectcalico/felix/issues/40
- https://review.openstack.org/148718
- https://bugzilla.redhat.com/show_bug.cgi?id=910619
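For reference, a sketch of what such a rule can look like, built as an argument list. It assumes the iptables CHECKSUM target with `--checksum-fill` in the mangle table and the DHCP client port 68; the exact rule added by this patch may differ:

```python
def dhcp_checksum_rule(client_port=68):
    """Build an iptables command that fills in checksums on DHCP replies.

    Assumption: replies leave the server towards UDP port 68 (DHCP client)
    and are fixed up in the mangle table's POSTROUTING chain.
    """
    return [
        "iptables", "-t", "mangle", "-A", "POSTROUTING",
        "-p", "udp", "--dport", str(client_port),
        "-j", "CHECKSUM", "--checksum-fill",
    ]


# Printed rather than executed, since applying it requires root.
print(" ".join(dhcp_checksum_rule()))
```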
Signed-off-by: Wido den Hollander <wido@widodh.nl>
're' meta chars, and causing VPN user add/delete to fail
-there is no real use of python 're' in CsFile.py utility methods searchString, deleteLine
Replacing with regular string search instead.
-modifying the smoke test for VPN user add/delete to have all permissible chars
DNS on VR should not be publicly accessible as it may be prone to DNS
amplification/reflection attacks. This fixes the issue by only allowing VR
DNS (port 53) to be accessible from guest network cidr, as per the fix in:
https://issues.apache.org/jira/browse/CLOUDSTACK-6432
- Only allows guest network cidrs to query VR DNS on port 53.
- Includes a marvin smoke test that checks VR DNS accessibility from
guest and non-guest networks.
- Fixes Marvin sshClient to avoid using the ssh agent when a password is provided;
previously, some environments may have seen a 'No existing session' exception without
this fix.
- Adds a new dnspython dependency that is used to perform dns resolutions in the
tests.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Lower the time we wait for interfaces to appear
Waiting for interfaces is tricky. They might never appear, for example when we have entries in `/etc/cloudstack/ips.json` that haven't been plugged yet. Waiting this long makes everything horribly slow (every vm, interface, static route, etc, etc, will hit this wait, for every device). We've seen CloudStack send an `ip_assoc.json` command for the `eth1` public nic only, and then the router goes crazy waiting for all other interfaces that were there before reboot and aren't there. If only the router would return a success for `eth1` to the mgt server, it would get the command for `eth2` etc etc. Obviously, a destroy works much faster because there is no saved state, so no knowledge of previous devices, so no waits :-)
After a stop/start the router has state in `/etc/cloudstack/ips.json` and every command waits, eventually hitting the hardcoded 120 sec timeout.
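A minimal sketch of a bounded wait, assuming an interface is considered present once its directory shows up under /sys/class/net. The 2 second default is an arbitrary illustration and not the value chosen in this change:

```python
import os
import time


def wait_for_device(dev, timeout=2.0, poll=0.1, sys_net="/sys/class/net"):
    """Poll until the interface appears or the timeout expires.

    Returns True if the device showed up, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(os.path.join(sys_net, dev)):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# A device that is not plugged costs at most `timeout` seconds, not 120.
print(wait_for_device("no-such-dev0", timeout=0.2, poll=0.05))
```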
* pr/1471:
lower the time we wait for interfaces to appear
Signed-off-by: Will Stevens <williamstevens@gmail.com>
CLOUDSTACK-6975: Prevent dnsmasq from starting on backup redundant RvR
Rebase of PR #1509 against the 4.7 branch as requested by @swill
One LGTM from @ustcweizhou carried from previous PR. Previous PR will be closed.
Description from PR #1509:
CLOUDSTACK-6975 refers to service monitoring bringing up dnsmasq, but this is no longer accurate, as service monitoring is not active on the post-4.6 routers. These routers still suffer an essentially identical issue, however, because "dnsmasq needs to be restarted each time configure.py is called in order to avoid lease problems." As such, dnsmasq is still running on backup RvRs, causing the issues described in CLOUDSTACK-6975.
This PR is based on a patch submitted by @ustcweizhou. The code now checks the redundant state of the router before restarting dnsmasq.
RvR networks without this patch have dnsmasq running on both master and backup routers. RvR networks with this patch have dnsmasq running on only the master router.
* pr/1514:
CLOUDSTACK-6975: Prevent dnsmasq from starting on backup redundant RvR.
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Seems to have a license issue so reverting for now.
This reverts commit 9a20ab8bcbbd39aa012a0ec5a65e66bcc737ee0e, reversing
changes made to 7a0b37a29a8be14011427dcf61bf3ea86e47dbf4.
Reimplement router.redundant.vrrp.interval setting
Global setting `router.redundant.vrrp.interval` is not used any more and it is now set to a hardcoded 1.
This results in a failover from master->backup when the backup doesn't hear from the master in ~3.6sec. This is a bit too tight, as we've seen failovers during live migrations. We could reproduce it in about half of the cases. Setting this to 2 (tested by hardcoding it in the systemvms) gives roughly twice as much time and we didn't see issues any more. Instead of updating the hardcoded setting from 1 to 2, I reimplemented the global setting by sending it to the router with the cmd_line, as the non-VPC router also does.
Background:
Why is the maximum failover time in the example 3.6 seconds? This comes from the advertisement interval and the skew time. The default advertisement interval is 1 second (configurable in keepalived.conf). The skew time helps to keep everyone from trying to transition at once. It is a number between 0 and 1, based on the formula (256 - priority) / 256
As defined in the RFC, the backup must receive an advertisement from the master every (3 * advert_int) + skew_time seconds. If it doesn't hear anything from the master, it takes over. With a backup router priority of 100 (as in the example), the failover will happen at most 3.6 seconds after the master goes down.
Source: http://www.hollenback.net/KeepalivedForNetworkReliability
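The arithmetic above can be checked with a few lines of Python. The function names are only for illustration; the formulas are the ones quoted in the background text:

```python
def skew_time(priority):
    """Skew time in seconds: (256 - priority) / 256."""
    return (256 - priority) / 256


def max_failover_time(advert_int, priority):
    """Longest time the backup waits for an advertisement before taking over."""
    return 3 * advert_int + skew_time(priority)


# Backup priority 100, as in the example; compare advert_int 1 and 2.
for advert_int in (1, 2):
    print(advert_int, max_failover_time(advert_int, 100))
```

With priority 100 the skew time is 0.609375 seconds, so an interval of 1 gives about 3.6 seconds and an interval of 2 gives about 6.6 seconds.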
* pr/1486:
Configure rVPC for router.redundant.vrrp.interval advert_int setting
Have rVPCs use the router.redundant.vrrp.interval setting
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Restore iptables at once using iptables-restore instead of calling iptables numerous times
This makes handling the firewall rules about 50-60 times faster because the ruleset is generated in memory and then loaded once. It's work by @borisroman, see PR #1400. Reopened it here because I think this is a great improvement.
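A simplified sketch of the approach: collect the rules per table in memory, normalise whitespace so duplicates collapse, and emit one document in the `*table` ... `COMMIT` layout that iptables-restore reads. The function name and input shape are hypothetical, and chain declarations (the `:CHAIN POLICY` lines) are left out for brevity:

```python
def build_restore_input(rules_by_table):
    """Render {table: [rule, ...]} as text suitable for iptables-restore."""
    lines = []
    for table, rules in rules_by_table.items():
        lines.append("*" + table)
        seen = set()
        for rule in rules:
            # Collapse repeated spaces so identical rules compare equal.
            normalized = " ".join(rule.split())
            if normalized not in seen:
                seen.add(normalized)
                lines.append(normalized)
        lines.append("COMMIT")
    return "\n".join(lines) + "\n"


ruleset = build_restore_input({
    "filter": [
        "-A INPUT  -p tcp --dport 22 -j ACCEPT",
        "-A INPUT -p tcp --dport 22 -j ACCEPT",
    ],
})
print(ruleset, end="")
# Loading it would be a single call, for example:
#   subprocess.run(["iptables-restore"], input=ruleset, text=True, check=True)
```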
* pr/1482:
Resolve conflict as forceencap is already in master
Split the cidr lists so we won't hit the iptables-restore limits
Check the existence of 'forceencap' parameter before use
Do not load previous firewall rules as we replace everything anyway
Wait for dnsmasq to finish restart
Remove duplicate spaces, and thus duplicate rules.
Restore iptables at once using iptables-restore instead of calling iptables numerous times
Add iptables conversion script.
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Honour GS use_ext_dns and redundant VR VIP
This patch addresses two issues:
On redundant VR setups, the primary resolver being handed out to instances is the guest_ip (primary IP for the VR). This might lead to problems upon failover, at least while the DHCP lease doesn't update (because the primary resolver will be checked first until it times out, even though it'll be gone upon failover).
If Global Setting use_ext_dns is true, we don't want the VR to be the primary resolver at all.
* pr/1536:
This patch addresses two issues:
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Handle private gateways more reliably
When initialising a VPC router we need to know which IP/device corresponds to a private gateway. This is to solve a problem when stop/starting a VPC router (which gets the private gateway config as a guest network and as a result breaks the functionality). You read it right, the private gateway is sent as type=guest after reboot and type=public initially.
Before this change, you could add a private gw to a running router but you couldn't restart it (it would mix up the tiers). Now the private gateway is detected properly and it works just fine.
Booting without private gateway:
```
root@r-167-VM:~# cat /etc/cloudstack/cmdline.json
{
"config": {
"baremetalnotificationapikey": "V2l1u3wKJVan01h8kq63-5Y5Ia3VLEW1v_Z6i-31QIRJXlt5vkqaqf6DVcdK0jP3u79SW6X9pqJSLSwQP2c2Rw",
"baremetalnotificationsecuritykey": "OXI16srCrxFBi-xOtEwcYqwLlMfSFTlTg66YHtXBBqR7HNN1us3HP5zWOKxfVmz4a3C1kUNLPrUH13gNmZlu4w",
"disable_rp_filter": "true",
"dns1": "8.8.8.8",
"domain": "cs2cloud",
"eth0ip": "169.254.0.42",
"eth0mask": "255.255.0.0",
"host": "192.168.22.61",
"name": "r-167-VM",
"port": "8080",
"privategateway": "None",
"redundant_router": "false",
"template": "domP",
"type": "vpcrouter",
"vpccidr": "10.0.0.0/24"
},
"id": "cmdline"
```
Booting with private gateway:
```
root@r-167-VM:~# cat /etc/cloudstack/cmdline.json
{
"config": {
"baremetalnotificationapikey": "V2l1u3wKJVan01h8kq63-5Y5Ia3VLEW1v_Z6i-31QIRJXlt5vkqaqf6DVcdK0jP3u79SW6X9pqJSLSwQP2c2Rw",
"baremetalnotificationsecuritykey": "OXI16srCrxFBi-xOtEwcYqwLlMfSFTlTg66YHtXBBqR7HNN1us3HP5zWOKxfVmz4a3C1kUNLPrUH13gNmZlu4w",
"disable_rp_filter": "true",
"dns1": "8.8.8.8",
"domain": "cs2cloud",
"eth0ip": "169.254.2.227",
"eth0mask": "255.255.0.0",
"host": "192.168.22.61",
"name": "r-167-VM",
"port": "8080",
"privategateway": "10.201.10.1",
"redundant_router": "false",
"template": "domP",
"type": "vpcrouter",
"vpccidr": "10.0.0.0/24"
},
"id": "cmdline"
```
And:
```
cat cmdline
vpccidr=10.0.0.0/24 domain=cs2cloud dns1=8.8.8.8 privategateway=10.201.10.1 template=domP name=r-167-VM eth0ip=169.254.2.227 eth0mask=255.255.0.0 type=vpcrouter disable_rp_filter=true baremetalnotificationsecuritykey=OXI16srCrxFBi-xOtEwcYqwLlMfSFTlTg66YHtXBBqR7HNN1us3HP5zWOKxfVmz4a3C1kUNLPrUH13gNmZlu4w baremetalnotificationapikey=V2l1u3wKJVan01h8kq63-5Y5Ia3VLEW1v_Z6i-31QIRJXlt5vkqaqf6DVcdK0jP3u79SW6X9pqJSLSwQP2c2Rw host=192.168.22.61 port=8080
```
Logs:
```
2016-02-24 20:08:45,723 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-4:ctx-458d4c52 job-1402/job-1403 ctx-d5355fca) (logid:5772906c) Set privategateway field in cmd_line.json to 10.201.10.1
```
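A small sketch of how the `privategateway` field can be read from the key=value cmdline shown above. The helper names are hypothetical; the only convention taken from the output above is that a missing gateway is reported as the string "None":

```python
def parse_cmdline(text):
    """Turn 'key=value key=value ...' into a dict, skipping tokens without '='."""
    config = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            config[key] = value
    return config


def private_gateway(config):
    """Return the private gateway IP, or None when the router has none."""
    value = config.get("privategateway", "None")
    return None if value == "None" else value


sample = "vpccidr=10.0.0.0/24 domain=cs2cloud privategateway=10.201.10.1 type=vpcrouter"
print(private_gateway(parse_cmdline(sample)))
```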
* pr/1474:
Handle private gateways more reliably
Add private gateway IP to router initialization config
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Apply static routes on change to master state
Refactored static routes for private gateways so they also get loaded when the router switches to master state. Otherwise they're lost and connections drop after failover.
* pr/1472:
apply static routes on change to master state
Signed-off-by: Will Stevens <williamstevens@gmail.com>
On redundant VR setups, the primary resolver being handed out to instances is the guest_ip (primary IP for the VR). This might lead to problems upon failover, at least while the DHCP lease doesn't update (because the primary resolver will be checked first until it times out, even though it'll be gone upon failover).
If Global Setting use_ext_dns is true, we don't want the VR to be the primary resolver at all.
They might never appear, for example when we have entries in
/etc/cloudstack/ips.json that haven't been plugged yet. Waiting
this long makes everything horribly slow (every vm, interface,
static route, etc, etc, will hit this wait, for every device).