With commit d79f1f6fdc8307aa4038bfb2c7607904b89eedbe the AgentMonitor
was replaced with a pluggable service. However, the ping timeout from the
original constructor was no longer passed on, leading to a default
pingTimeout of 0. This caused every agent to fail its ping check constantly.
Modified the startMonitor command to take a ping timeout as an argument
and instructed AgentManagerImpl to pass it along.
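A minimal sketch of the intended wiring (the class and method names here are
illustrative stand-ins, not the actual CloudStack signatures):

    // Hypothetical sketch: the configured ping timeout must travel from the
    // manager into the pluggable monitor instead of defaulting to 0.
    interface AgentMonitorService {
        void startMonitoring(long pingTimeoutSeconds);
    }

    class AgentManagerImpl {
        private final long pingTimeout; // read from configuration at startup

        AgentManagerImpl(long pingTimeout) {
            this.pingTimeout = pingTimeout;
        }

        void startMonitor(AgentMonitorService monitor) {
            // With the previous default of 0, every agent immediately exceeded
            // its ping window and was flagged as failed.
            monitor.startMonitoring(pingTimeout);
        }
    }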
* send StartupAnswer right after StartupCommand is received
* if the post processor fails, send a ReadyCommand with an error message to the agent; the agent will then exit (see the sketch below)
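A simplified sketch of that handshake; the record types below are hypothetical
stand-ins for the real StartupCommand/StartupAnswer/ReadyCommand wire objects:

    record StartupCommand(String agentName) {}
    record StartupAnswer(StartupCommand cmd) {}
    record ReadyCommand(String errorMessage) {}   // null errorMessage means "ready"

    interface Link { void send(Object msg); }

    class StartupHandler {
        void onStartup(Link link, StartupCommand cmd) {
            // Acknowledge immediately, before any post-processing runs.
            link.send(new StartupAnswer(cmd));
            try {
                postProcess(cmd);                     // e.g. host setup hooks
                link.send(new ReadyCommand(null));    // success: agent starts serving
            } catch (RuntimeException e) {
                // On failure, ship the error to the agent; a ReadyCommand that
                // carries an error makes the agent log it and exit.
                link.send(new ReadyCommand(e.getMessage()));
            }
        }

        private void postProcess(StartupCommand cmd) {
            // placeholder for the post processors mentioned above
        }
    }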
73be77a4c1877ae7e3613c7562d562ad96cde7ee
I've renamed discover to discoverer to fix the issue; my ant debug was
failing with:
[java] ERROR [utils.component.ComponentLocator] (main:) Unable to
load configuration for management-server from components.xml
[java] com.cloud.utils.exception.CloudRuntimeException: Unable to
find class: com.cloud.hypervisor.kvm.discoverer.KvmServerDiscoverer
RB: https://reviews.apache.org/r/6239/
Sent-by: rohit.yadav@citrix.com
Changes:
- in the case of external service providers, there is no discoverer that could load the resource
- so we have to rely on agentMgr to load the resource as before (see the sketch below)
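A hedged sketch of that fallback; Discoverer, AgentManager and ServerResource
below are stand-ins for the CloudStack types:

    interface ServerResource {}
    interface Discoverer { ServerResource reloadResource(long hostId); }
    interface AgentManager { ServerResource loadResource(long hostId); }

    class ResourceLoader {
        ServerResource load(long hostId, Discoverer discoverer, AgentManager agentMgr) {
            if (discoverer != null) {
                // Normal path: the discoverer that found the host rebuilds its resource.
                return discoverer.reloadResource(hostId);
            }
            // External service providers register no discoverer, so rely on the
            // agent manager to load the resource, as it did before this change.
            return agentMgr.loadResource(hostId);
        }
    }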
Changes:
- We do not need these global settings anymore; they will be hidden starting with 3.0.
- The default traffic label will be picked from the global setting, which is null by default. A null traffic label means the resource uses the tag on the default gateway.
- Changes to invoke the discoverer to reload the resource object on host connection.
- Since a zone can have many physical networks, there can be multiple guest and public networks. Only the zone-wide storage and management traffic labels will be stored in host_details henceforth (see the sketch after this list).
- If traffic labels are updated, the discoverer should update host_details.
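A rough sketch of the host_details update; the detail keys are assumptions for
illustration, not the actual CloudStack key names:

    import java.util.HashMap;
    import java.util.Map;

    class HostDetailsUpdater {
        static final String STORAGE_LABEL_KEY = "storage.traffic.label";   // illustrative
        static final String MGMT_LABEL_KEY = "management.traffic.label";   // illustrative

        Map<String, String> updateLabels(Map<String, String> hostDetails,
                                         String storageLabel, String mgmtLabel) {
            Map<String, String> updated = new HashMap<>(hostDetails);
            // A null label means "use the tag on the default gateway", so only
            // labels that are actually set get written to host_details.
            if (storageLabel != null) updated.put(STORAGE_LABEL_KEY, storageLabel);
            if (mgmtLabel != null) updated.put(MGMT_LABEL_KEY, mgmtLabel);
            return updated;
        }
    }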
2. add an ha parameter to disconnect host to indicate whether HA should be triggered for the VMs on this host
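An illustrative sketch of the idea, not the actual CloudStack API:

    // Hypothetical: the extra boolean decides whether HA should be triggered
    // for the VMs that were running on the disconnected host.
    interface HighAvailabilityManager { void scheduleRestart(long vmId); }

    class HostHandler {
        private final HighAvailabilityManager haMgr;

        HostHandler(HighAvailabilityManager haMgr) { this.haMgr = haMgr; }

        void disconnectHost(long hostId, boolean triggerHa, long[] vmIdsOnHost) {
            // ... tear down the agent connection for hostId ...
            if (triggerHa) {
                for (long vmId : vmIdsOnHost) {
                    haMgr.scheduleRestart(vmId);   // let HA restart the VMs elsewhere
                }
            }
        }
    }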
status 12844, 13394: resolved fixed
Reviewed-by: edison
Conflicts:
server/src/com/cloud/agent/manager/AgentManagerImpl.java
server/src/com/cloud/agent/manager/ClusteredAgentManagerImpl.java
We use an 'update count' to make sure the agent status transformation is atomic.
However, atomic only means success or failure, which is not sufficient for agent
status: some important transformations occasionally fail because of a race
condition where someone else is changing the status simultaneously, which
eventually leaves the agent stuck in a wrong status.
Use a reentrant lock to serialize the agent status transformations. This in-memory
lock works in a clustered environment as well because, in our design, an agent is
only active on one management server.
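A minimal sketch of the locking scheme (class and field names are hypothetical):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    class AgentStatusUpdater {
        private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
        private final ConcurrentHashMap<Long, String> status = new ConcurrentHashMap<>();

        void transition(long agentId, String from, String to) {
            ReentrantLock lock = locks.computeIfAbsent(agentId, id -> new ReentrantLock());
            lock.lock();
            try {
                // With the lock held, no concurrent writer can slip in between
                // the check and the update, so the transition cannot be lost.
                String current = status.getOrDefault(agentId, "Disconnected");
                if (!current.equals(from)) {
                    throw new IllegalStateException("expected " + from
                            + " but agent " + agentId + " is " + current);
                }
                status.put(agentId, to);
            } finally {
                lock.unlock();
            }
        }
    }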
status 13269: resolved fixed