Merge pull request #863 from borisroman/CLOUDSTACK-8883

[4.6][BLOCKER]CLOUDSTACK-8883: Resolved connect/reconnect issue.

Hi!

@wilderrodrigues by implementing Callable you switched a couple of methods and fields. I switched them some more!

The Agent wouldn't reconnect for two reasons.

Problem 1: Selector was blocking.
In the while loop at [1], _selector.select(); blocked once the connection was lost. As a result, _isStartup = false; at [2] was never executed, so the call to isStartup() at [3] always returned true, resulting in an infinite loop.

Resolution 1: Move the call to cleanUp() [4] before the check for whether isStartup() has turned false. cleanUp() will close() the _selector, which results in _isStartup being set to false.
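
The effect of Resolution 1 can be illustrated with a minimal, self-contained sketch. It is not the CloudStack code: the class SelectorCloseSketch, its isStartup field, and the loop shape are hypothetical stand-ins, assuming the connection thread clears its startup flag only after leaving the select loop. It shows a thread parked in Selector.select() being released once another thread closes the selector.

```java
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SelectorCloseSketch {
    // Hypothetical stand-in for NioConnection's _isStartup flag.
    private static volatile boolean isStartup = true;

    // Returns true if the select loop was still "starting" before close()
    // and had exited (clearing the flag) after close().
    static boolean closeUnblocksSelect() throws IOException, InterruptedException {
        isStartup = true;
        final Selector selector = Selector.open();
        final CountDownLatch exited = new CountDownLatch(1);

        Thread loop = new Thread(() -> {
            try {
                while (selector.isOpen()) {
                    // No channels are registered, so this blocks until
                    // the selector is woken up or closed.
                    selector.select();
                }
            } catch (IOException | ClosedSelectorException e) {
                // A closed selector also ends the loop.
            } finally {
                isStartup = false; // only reached once select() returns
                exited.countDown();
            }
        });
        loop.setDaemon(true);
        loop.start();

        Thread.sleep(200);                 // give the loop time to block
        boolean stillStarting = isStartup; // flag not yet cleared

        selector.close();                  // what cleanUp() is described as doing
        boolean finished = exited.await(5, TimeUnit.SECONDS);
        return stillStarting && finished && !isStartup;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("loop exited after close(): " + closeUnblocksSelect());
    }
}
```

Waiting on the flag before closing the selector, as the old ordering did, would correspond to calling exited.await() before selector.close() in this sketch, which would never complete.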

Problem 2: _isStartup & _isRunning were set to true even when init() threw a ConnectException.
The exception was caught, but only logged; no further action was taken. As a result, _isStartup & _isRunning were still set to true, and the Agent believed it had connected successfully when it hadn't.

Resolution 2: Add a return to the catch block [5]. This way _isStartup & _isRunning aren't set to true.
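
A minimal sketch of the pattern behind Resolution 2, under stated assumptions: StartGuardSketch, its serverReachable switch, and the simplified init()/start() bodies are hypothetical and only mimic the shape of the real NioConnection.start(). The point is that returning from the catch block keeps the status flags from being set after a failed connect.

```java
import java.io.IOException;
import java.net.ConnectException;

public class StartGuardSketch {
    private boolean isStartup;
    private boolean isRunning;
    // Hypothetical switch that simulates whether the remote side is up.
    private final boolean serverReachable;

    StartGuardSketch(boolean serverReachable) {
        this.serverReachable = serverReachable;
    }

    // Stand-in for init(): fails the way a refused connection would.
    private void init() throws IOException {
        if (!serverReachable) {
            throw new ConnectException("Connection refused");
        }
    }

    void start() throws IOException {
        try {
            init();
        } catch (ConnectException e) {
            System.err.println("Unable to connect to remote: " + e.getMessage());
            return; // without this, the flags below would still be set
        }
        isStartup = true;
        isRunning = true;
    }

    boolean isStartup() { return isStartup; }
    boolean isRunning() { return isRunning; }

    public static void main(String[] args) throws IOException {
        StartGuardSketch down = new StartGuardSketch(false);
        down.start();
        System.out.println("server down, isRunning: " + down.isRunning());

        StartGuardSketch up = new StartGuardSketch(true);
        up.start();
        System.out.println("server up, isRunning: " + up.isRunning());
    }
}
```

Because ConnectException is a subclass of IOException, its catch block has to come before any catch for IOException, as it does in the diff further down.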

Steps to test:
1. Deploy ACS.
2. Try all combinations of stopping/starting management server/agent.

[1]b34f86c8d5/utils/src/main/java/com/cloud/utils/nio/NioConnection.java (L128)
[2]b34f86c8d5/utils/src/main/java/com/cloud/utils/nio/NioConnection.java (L176)
[3]b34f86c8d5/agent/src/com/cloud/agent/Agent.java (L404)
[4]b34f86c8d5/agent/src/com/cloud/agent/Agent.java (L399)
[5]b34f86c8d5/utils/src/main/java/com/cloud/utils/nio/NioConnection.java (L91)

* pr/863:
  Added return statement to stop start() if there has been a ConnectException.
  Call cleanUp() before looping isStartup().

Signed-off-by: Rajani Karuturi <rajani.karuturi@citrix.com>
Rajani Karuturi 2015-09-23 11:21:39 +05:30
commit 1a474374b9
2 changed files with 6 additions and 3 deletions

@@ -394,15 +394,17 @@ public class Agent implements HandlerFactory, IAgentControl {
 } while (inProgress > 0);
 _connection.stop();
-while (_connection.isStartup()) {
-_shell.getBackoffAlgorithm().waitBeforeRetry();
-}
 try {
 _connection.cleanUp();
 } catch (final IOException e) {
 s_logger.warn("Fail to clean up old connection. " + e);
 }
+while (_connection.isStartup()) {
+_shell.getBackoffAlgorithm().waitBeforeRetry();
+}
 _connection = new NioClient("Agent", _shell.getHost(), _shell.getPort(), _shell.getWorkers(), this);
 do {
 s_logger.info("Reconnecting...");

@@ -88,6 +88,7 @@ public abstract class NioConnection implements Callable<Boolean> {
 init();
 } catch (final ConnectException e) {
 s_logger.warn("Unable to connect to remote: is there a server running on port " + _port);
+return;
 } catch (final IOException e) {
 s_logger.error("Unable to initialize the threads.", e);
 throw new NioConnectionException(e.getMessage(), e);