CLOUDSTACK-9155 make sure logrotate is effective for cloud.logMany processes on the VRs log to cloud.log. When log rotate kicks in, the file is rotated but the scripts still write to the old inode (cloud.log.1 after rotate). Tis quickly fills up the tiny log partition.
Using 'copytruncate' is a small tradeoff, there is a slight change of missing a log entry, but in the old situation nothing ended up in cloud.log after rotate (except for stuff that was (re)started) so I think this is the best solution until we properly rewrite the script to either use their own script or syslog.
More details: https://issues.apache.org/jira/browse/CLOUDSTACK-9155
* pr/1235:
CLOUDSTACK-9155 make sure logrotate is effective
Signed-off-by: Remi Bergsma <github@remi.nl>
Many processes on the VRs log to cloud.log. When logrotate
kicks in, the file is rotated but the scripts still write
to the old inode (cloud.log.1 after rotate). Tis quickly
fills up the tiny log partition.
Using 'copytruncate' is a tradeoff, there is a slight
change of missing a log entry, but in the old situation
we were missing all of them after logrotate.
- Just like with RVRs, use the VRID 51 instead of making it dependent on the VPCID
- Reason: arbitary unique number 0..255 used to differentiate multiple instances of vrrpd running on the same NIC (and hence same socket). virtual_router_id 51
- Use the router to retrieve the instance ID
- Check if the VPC is redundant in order to reuse the private gateway address.
- Brings the private gateways interfaces up.
Send arping to the gateway instead of our own addressWe need to send an Unsolicited ARP to the gateway, instead of our own address. We now encounter problems when people deploy/destroy/deploy and get the same public ip.
Packets arrive, but with incorrect / cached mac and are ignored by the routervm kernel.
Run arping manually to update the arp-cache on the gateway and things start to work.
Then we discovered the `arping` is actually done, but sent to its own address. Therefore the gateway doesn't pick it up. We only saw this happening when rapid deploy tools are used, like Terraform that do deploy/destroy/deploy and might get the same ip but on a new router having a new mac.
```
2015-12-03 18:07:25,589 CsHelper.py execute:160 Executing: arping -c 1 -I eth1 -A -U -s 192.168.23.8 192.168.23.1
```
The integration tests seem happy, although the full run is still ongoing:
```
=== TestName: test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Status : SUCCESS ===
```
Thanks @sspans for helping trouble shoot this. Ping @wilderrodrigues can you review please?
* pr/1163:
CLOUDSTACK-9097 Make public ip work immediately
Signed-off-by: Remi Bergsma <github@remi.nl>
We need to send an Unsolicited ARP to the gateway, instead of our own address. We now encounter problems when people deploy/destroy/deploy and get the same public ip.
[4.6.1] CLOUDSTACK-9015 - Redundant VPC Virtual Router's state is BACKUP & BACKUP or MASTER & MASTERThis PR closes#1064
All the details can be found in the original PR, which won't be merged because it was created agains master. Once this PR is closed, the original one will be also closed.
* pr/1070:
CLOUDSTACK-9015 - Delete public IP in order to get both IP and NAT rule removed.
CLOUDSTACK-9015 - Add test to cover the rVPC routers stop/start/reboot scenario
CLOUDSTACK-9015 - Make sure the Backup router can talk to the Master router after a stop/start/reboot
Signed-off-by: Remi Bergsma <github@remi.nl>
- Stop KeepaliveD/ConntrackD if the eth2 (guest) interface is not configured and UP
- Only setup the redundancy after all the router configuration is done
- Open the FW for the VRRP communitation
- 224.0.0.18 and 225.0.0.50
- Set keepalived.conf.templ by default to use interface eth2 (guest)
- It will be reconfigured anyway, but having eth2 there is more clear
CLOUDSTACK-8993: DHCP fails with "no address available" when an IP is reused
Repopulate /etc/dhcphosts.txt to remove old entries with the same IP address.
* pr/981:
CLOUDSTACK-8993: DHCP fails with "no address available" when an IP is reused
Signed-off-by: Remi Bergsma <github@remi.nl>
- If we stop/start a router, the state in the file will still say MASTER, when it is actually not
- Checking the state based on the interface (eth1) state
- Once master.py is called by keepalived, save the state in the json file to BACKUP just to make sure it's also written there
- Do not use the API call because it will read what is in the database, that might not have been updated yet
* Check the status in the router directly instead
- Remove all the sleeps
- It was working before because the Routers were restarting about 10 times for each operation
e.g. adding a VM to a network ot acquiring a new IP.
- Adding stat_rules of internal LB to iptables
We needed one extra rule in the INPUT chain
- With the keepalived fixed they should not be needed anymore. So first reducing them drasticaly
- I am now making a backup of the template file, write to the template file and compare it with the existing configuration
- The template file is recovered afer the process
- I also check if the process is running
- I fixed a bug in the compare method
- I am now updating the configuration variable once the file content is flushed to disk
- The default is Accept and will be changed based on the configuration of the offering.
CLOUDSTACK-8934 - The default egress is set as Deny in the router.
- We had to change it on the Java side in order to make the apply it once the default is defined as allowed on the net offering
- That's not the place to fix the default routes for redundant VPC,
- Adding tests to cover PF and FW in isolated networks
* Will still add some tests for egress as well
- The cidr was replaced by the single IP, which broke the feature.
- Wait during transition from master to backup otherwise the test fails due to wronge state