The interval between keepalived.ts and keepalived.ts2 should be >= 60 seconds in
normal condition, because every 10 seconds keepalived.ts would be updated, and
at least every 60 seconds, keepalived.ts would be copy to keepalived.ts2.
If the interval is less than 60 seconds, then keepalived process failed to
update keepalived.ts every 10 seconds.
Take some delay of updating into consideration, check_heartbeat.sh would use 30
seconds as a way to tell keepalived process is alive or not.
In the previous version it would take at least 1 seconds for arping, and it
would be big if the VR has more than 30 IPs - our biglock default timeout is 30
seconds.
Fix it by send out two arping immediately, and then sleep 1 second for router to
update arp cache.
This fix would work because:
1. When booting up the router, there is possible that no ip information have
been set for the interface(CS would do it after confirm router is up), so the
interface isn't associate with any ip, then ifconfig cannot work. We have to use
ifup, this is especially true for the first router become master.
2. After booting up phase, the ip would be associated with interfaces, then we
can use ifconfig to bring them up.
Also added license header for passwd_server_ip
Ported from:
commit 1072ec7ae36911ed794c182a1146025a0e969ea9
Author: Sheng Yang <sheng.yang@citrix.com>
Date: Wed Sep 12 11:15:33 2012 -0700
CS-16318: Update the fix with some tweak
1. The old fix run cloud-passwd-srvr twice because cloud-passwd-srvr is
still in the list of enabled_svcs
2. The lock should be applied on serve_password.sh, which controlled the
accessing to the password. Applied on the MASTER/BACKUP switch is useless, two
instance of serve_password.sh would still able to access the password file at
the same time.
3. Password service is a part of redundant router state transition process
now, so if the service failed to start, then the transition failed.
4. Restart password service should be put before restart dnsmasq, which
would sent out DHCP offer to the user vms. If user VMs got the DHCP offer first
but failed to get password, there would be an issue.
Reviewed-by: Anthony Xu
commit fa94da114099da357df7daa1aad3c327868393ca
Author: Jayapal Reddy <jayapalreddy.uradi@citrix.com>
Date: Wed Sep 12 17:57:03 2012 +0530
Bug:CS-16318 Starting password server on the both IPs in RRVM
Reviewed-by: Abhi
Conflicts:
patches/systemvm/debian/config/opt/cloud/bin/passwd_server
Signed-off-by: Chip Childers <chip.childers@gmail.com>
I've assumed that Gavin's commit is appropriate, based
on an assumption that we will keep these files in the source
tree. If https://issues.apache.org/jira/browse/LEGAL-146
results in a different opionion from the members, then we
will end up having to do something more drastic anyway.
We need to use ifup/ifdown to bring up the interfaces, because ifconfig don't
know the ip of the interface after we modify cloud-early-config to avoid
first start up of public interface.
Reviewed-by: Edison
The issue happened quite rare, but indeed can show.
And when the issue happen, the status of redundant router would be "Status:
FAULT".
It's due to ipassoc.sh wasn't executed before the system bring eth2 up and go to
master mode, then eth2 wasn't configured correctly. Then "ip route add default
xx" can't complete.
This commit should fixes the issue.
This patch enable redundant virtual routers.
1. To enable this feature, db need to be updated using follow SQL by now(we
would get a UI way later):
UPDATE network_offerings SET redundant_router=1 WHERE guest_type="Virtual" AND
system_only=0;
2. System would try to start up two routers at different hosts. But if there is
only one host in the zone, system would start up two routers on it.
3. The failover part is using keepalived, and connection tracking part is using
conntrackd. There would be one master router and one backup router. The status
of router(master or backup) can be query from the database table domain_router
now. Management server would update the status every 30s by default.
4. The routers for the same zone would use same external NIC(same ip and mac).
The script used for fail-over would ensure only one external NIC present in the
network at any time.
5. Currently management server don't got the ability to stop one of router is
both of them reported as master. The feature is in the todo list.
After two routers start up, disconnect anyone of them, the guest network
shouldn't be affected, and established connection(http, ssh, etc.) should still
works. The fail-over on gateway part should be 3~4 seconds.
Currently the patch works with KVM. Would deal with vmware and XenServer soon.