Merge pull request #1486 from remibergsma/reimplement-vrrp-setting-47

Reimplement router.redundant.vrrp.interval setting

Global setting `router.redundant.vrrp.interval` is not used any more; the interval is now hardcoded to 1.

This results in a failover from master to backup when the backup doesn't hear from the master for ~3.6 sec. This is a bit too tight, as we've seen failovers during live migrations. We could reproduce it in about half of the cases. Setting this to 2 (tested by hardcoding it in the systemvms) gives roughly twice as much time, and we didn't see issues any more. Instead of updating the hardcoded setting from 1 to 2, I reimplemented the global setting by sending it to the router with the cmd_line, as the non-VPC router also does.

Background:
Why is the maximum failover time in the example 3.6 seconds? This comes from the advertisement interval and the skew time. The default advertisement interval is 1 second (configurable in keepalived.conf). The skew time helps to keep everyone from trying to transition at once. It is a number between 0 and 1, based on the formula (256 - priority) / 256.

As defined in the RFC, the backup must receive an advertisement from the master every (3 * advert_int) + skew_time seconds. If it doesn't hear anything from the master, it takes over. With a backup router priority of 100 (as in the example), the skew time is (256 - 100) / 256, or about 0.6 seconds, so the failover will happen at most (3 * 1) + 0.6 = 3.6 seconds after the master goes down.

Source: http://www.hollenback.net/KeepalivedForNetworkReliability
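The timing above can be checked with a short calculation. This is only a sketch of the two formulas quoted in the background; the function names are made up for illustration, and the priority of 100 is the one from the example.

```python
def skew_time(priority):
    """Skew time in seconds: (256 - priority) / 256."""
    return (256 - priority) / 256.0


def master_down_interval(advert_int, priority):
    """Longest wait before a backup takes over: (3 * advert_int) + skew_time."""
    return 3 * advert_int + skew_time(priority)


if __name__ == "__main__":
    # Compare the old hardcoded interval (1) with the tested value (2).
    for advert_int in (1, 2):
        seconds = master_down_interval(advert_int, 100)
        print("advert_int=%d -> %.2f s" % (advert_int, seconds))
```

With a priority of 100, an interval of 1 gives about 3.6 seconds and an interval of 2 gives about 6.6 seconds.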

* pr/1486:
  Configure rVPC for router.redundant.vrrp.interval advert_int setting
  Have rVPCs use the router.redundant.vrrp.interval setting

Signed-off-by: Will Stevens <williamstevens@gmail.com>
Will Stevens 2016-05-18 15:52:38 -04:00
commit ebc70a51e2
4 changed files with 11 additions and 0 deletions

@@ -1598,6 +1598,9 @@ Configurable, StateListener<VirtualMachine.State, VirtualMachine.Event, VirtualM
        if (isRedundant) {
            buf.append(" redundant_router=1");
            final int advertInt = NumbersUtil.parseInt(_configDao.getValue(Config.RedundantRouterVrrpInterval.key()), 1);
            buf.append(" advert_int=").append(advertInt);
            final Long vpcId = router.getVpcId();
            final List<DomainRouterVO> routers;
            if (vpcId != null) {

@@ -154,3 +154,7 @@ class CsCmdLine(CsDataBag):
            return self.idata()['useextdns']
        return False

    def get_advert_int(self):
        if 'advert_int' in self.idata():
            return self.idata()['advert_int']
        return 1
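To illustrate how the value travels from the cmd_line to this getter, here is a minimal sketch. `parse_cmdline` is a hypothetical helper, not the actual CsDataBag loading code, and the standalone `get_advert_int` only mirrors the fallback logic of the method above on a plain dict.

```python
def parse_cmdline(cmdline):
    """Split a 'key=value key=value' boot string into a dict (hypothetical helper)."""
    data = {}
    for token in cmdline.split():
        if "=" in token:
            key, value = token.split("=", 1)
            data[key] = value
    return data


def get_advert_int(data):
    # Mirrors the getter in this change: use the passed value, else fall back to 1.
    if 'advert_int' in data:
        return data['advert_int']
    return 1


with_setting = parse_cmdline(" redundant_router=1 advert_int=2")
without_setting = parse_cmdline(" redundant_router=1")
print(get_advert_int(with_setting))     # value taken from the boot string
print(get_advert_int(without_setting))  # fallback default
```

In this sketch the parsed value is a string while the fallback is an integer; both render the same way once formatted with `%s`.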

@@ -113,6 +113,7 @@ class CsFile:
        self.new_config[sind:eind] = content

    def greplace(self, search, replace):
        logging.debug("Searching for %s and replacing with %s" % (search, replace))
        self.new_config = [w.replace(search, replace) for w in self.new_config]

    def search(self, search, replace):

@@ -138,6 +138,9 @@ class CsRedundant(object):
            " router_id ", " router_id %s" % self.cl.get_name())
        keepalived_conf.search(
            " interface ", " interface %s" % guest.get_device())
        keepalived_conf.search(
            " advert_int ", " advert_int %s" % self.cl.get_advert_int())
        keepalived_conf.greplace("[RROUTER_BIN_PATH]", self.CS_ROUTER_DIR)
        keepalived_conf.section("authentication {", "}", [
            " auth_type AH \n", " auth_pass %s\n" % self.cl.get_router_password()])
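As a rough illustration of what the `advert_int` substitution achieves, the sketch below rewrites the matching line of a small keepalived-style template. `set_directive` and the template contents are assumptions made for this example; the real `CsFile.search` may behave differently.

```python
def set_directive(lines, search, replace):
    """Replace every line containing `search` with `replace`; append if none match."""
    found = False
    result = []
    for line in lines:
        if search in line:
            result.append(replace + "\n")
            found = True
        else:
            result.append(line)
    if not found:
        result.append(replace + "\n")
    return result


# Hypothetical template fragment, not the shipped keepalived.conf.
template = [
    "vrrp_instance inside_network {\n",
    "    state BACKUP\n",
    "    advert_int 1\n",
    "}\n",
]

updated = set_directive(template, " advert_int ", "    advert_int %s" % 2)
print("".join(updated))
```

Matching on the directive name with surrounding spaces keeps the lookup from touching other lines that merely mention the word.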