Sometimes when keepalived starts up during the system boot period, it fails (likely because it cannot yet receive the peer's advertisement packets), assumes it is the only router, and makes itself master.
Add a 10-second delay after start-up to work around the issue.
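As a rough illustration (not the exact change in this patch), the delay amounts to something like the following in the router's keepalived start-up path:
  # Hypothetical start-up fragment: wait for the interfaces to settle so the
  # first VRRP election can actually see the peer's advertisements.
  sleep 10
  /etc/init.d/keepalived start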
This patch enables redundant virtual routers.
1. To enable this feature, the database needs to be updated with the following SQL for now (a UI option will come later):
UPDATE network_offerings SET redundant_router=1 WHERE guest_type="Virtual" AND system_only=0;
2. The system tries to start the two routers on different hosts. However, if there is only one host in the zone, both routers are started on that host.
3. Failover is handled by keepalived, and connection-state synchronization by conntrackd. There is one master router and one backup router. The status of each router (master or backup) can now be queried from the domain_router database table (see the query sketch at the end of this section); the management server updates the status every 30s by default.
4. The routers for the same zone use the same external NIC configuration (same IP and MAC). The failover script ensures that only one external NIC is present on the network at any time.
5. Currently the management server is not able to stop one of the routers if both of them report as master. That feature is on the todo list.
After the two routers are up, disconnecting either of them should not affect the guest network, and established connections (HTTP, SSH, etc.) should keep working. The gateway failover should take about 3~4 seconds.
Currently the patch works with KVM. VMware and XenServer support will follow soon.
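For item 3 above, a quick way to check the reported state is to query the domain_router table directly. This is only a sketch: the exact column name (assumed here to be redundant_state) and the database/user names may differ from the actual schema.
  # Hypothetical query; 'cloud' database/user and the redundant_state column are assumptions.
  mysql -u cloud -p cloud -e 'SELECT id, redundant_state FROM domain_router;'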
haproxy tuning:
0. Test case:
httpd running in 5 user VMs, all created on one XenServer host (16 cores, 42G memory, 10G network).
domR running on another host with the same hardware configuration.
The test application, ab, running on another host behind a separate switch.
1. haproxy is not a memory-intensive app: I can get 4625.96 connections/s with 1G of memory. It is, however, a really CPU-intensive app; domR always uses around 100% CPU on the host.
2. By default you cannot get a better connections/s rate because ip_conntrack_max and tcp_max_tw_buckets are too small; you will see errors in domR like:
"TCP: time wait bucket table overflow" or "nf_conntrack: table full, dropping packet".
So I increased these limits from 65536 to 1000000 (see the sysctl sketch after this list); then I can steadily get around 4600 connections/s when memory is >= 1G.
Here are the connections per second, tested with "ab -n 1000000 -c 100 http://192.168.170.152:880/test.html":
domR memory conn/s
128M: 3545.55
256M: 4081.38
512M: 4318.18
1G: 4625.96
7G: 4745.53
3. If I enable NOTRACK both for connections between domR and the user VMs and for the public network, which tells iptables in domR not to track connections during the test, then I get a better number, around 5800 connections/s (see the iptables sketch after this list). But we cannot enable NOTRACK, as iptables is used to track throughput in domR.
4. In short, with this commit the haproxy connection rate increases from 1000-2000/s to about 4700/s when domR's memory is 1G or larger.
5. How many CPUs need to be assigned to domR to get this number? Not determined yet: the CPUs are shared by all the VMs on the host, and if other VMs are busy, haproxy performance suffers.
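For item 2, the limits were raised with sysctl settings along these lines. This is a sketch; the exact keys depend on the kernel in domR (ip_conntrack vs. nf_conntrack naming):
  # Raise the conntrack and TIME_WAIT bucket limits from 65536 to 1000000 (the values used in the test above).
  sysctl -w net.ipv4.netfilter.ip_conntrack_max=1000000   # net.netfilter.nf_conntrack_max on newer kernels
  sysctl -w net.ipv4.tcp_max_tw_buckets=1000000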
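For item 3, the NOTRACK measurement corresponds to raw-table rules like the following. This is only illustrative of what was tested; the rules are not part of the shipped configuration, since iptables counters are still needed for throughput accounting in domR:
  # Skip connection tracking for forwarded and locally generated traffic.
  iptables -t raw -A PREROUTING -j NOTRACK
  iptables -t raw -A OUTPUT -j NOTRACK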