[Sis-ams] 'reconnect' timer?
Ray, Timothy J. (GSFC-583.0)
timothy.j.ray at nasa.gov
Mon Aug 25 15:55:52 EDT 2008
Scott,
Ok. I think I understand your idea. What follows is an attempt to
describe it in my own words (boy, it's much easier to implement
something than to specify it!).
With regard to the heartbeat cycle, a node can be thought of as having 3
states:
* Registered - For each heartbeat-cycle, the node sends a
heartbeat and checks to see if a heartbeat was received. If the
incoming heartbeat is missed for N6 consecutive heartbeat-cycles, then
the registrar's death is imputed and the state changes to
'reconnecting'.
* Reconnecting - For each heartbeat-cycle, the node initiates a
sequence of message exchanges (AMS devotees know the details here - for
example, send a 'registrar-query' to the server, receive a 'cell-spec'
from the server, send a 'reconnect' to the restarted registrar, ...)
with outcomes that fall into 3 categories. First, if "fully
successful", the node will receive a 'reconnected' response from the
restarted registrar (and change state to 'registered'). Second, if
"fully unsuccessful", the node will receive a 'you-are-dead' response
from the registrar, and change state to 'unregistered'. Third, all
other possible outcomes result in no change - i.e., the 'reconnect'
sequence will be initiated again during the following heartbeat cycle.
* Unregistered - This state applies whenever the node is neither
'registered' nor 'reconnecting'. We can either say that there is no
heartbeat-cycle while in this state, or that there is a heartbeat cycle
but no action is taken.
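
To make the three states above concrete, here is a rough Python sketch of the transitions as I read them. It is illustrative only and not taken from the spec: the names Node, State, Outcome and on_heartbeat_cycle are made up for this example, the actual message exchange is reduced to a single "outcome" argument, and sending the node's own heartbeat is left out.

```python
from enum import Enum, auto


class State(Enum):
    REGISTERED = auto()
    RECONNECTING = auto()
    UNREGISTERED = auto()


class Outcome(Enum):
    """Hypothetical summary of one 'reconnect' sequence."""
    RECONNECTED = auto()    # 'reconnected' received from the restarted registrar
    YOU_ARE_DEAD = auto()   # 'you-are-dead' received from the registrar
    NO_RESULT = auto()      # every other outcome, including no response at all


class Node:
    def __init__(self, n6):
        self.n6 = n6                    # missed-heartbeat threshold (N6)
        self.state = State.REGISTERED
        self.missed = 0                 # consecutive cycles without a heartbeat

    def on_heartbeat_cycle(self, heartbeat_received=False,
                           outcome=Outcome.NO_RESULT):
        """Apply one heartbeat cycle and return the resulting state."""
        if self.state is State.REGISTERED:
            self.missed = 0 if heartbeat_received else self.missed + 1
            if self.missed >= self.n6:
                # Registrar's death is imputed.
                self.state = State.RECONNECTING
        elif self.state is State.RECONNECTING:
            if outcome is Outcome.RECONNECTED:
                self.state = State.REGISTERED
                self.missed = 0
            elif outcome is Outcome.YOU_ARE_DEAD:
                self.state = State.UNREGISTERED
            # NO_RESULT: no change; the sequence is retried next cycle.
        # UNREGISTERED: no action is taken.
        return self.state
```

With n6=3, three cycles without an incoming heartbeat move the node to RECONNECTING, and it stays there until a cycle ends in RECONNECTED or YOU_ARE_DEAD.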
How does this line up with your thinking?
Tim
________________________________
From: sis-ams-bounces at mailman.ccsds.org
[mailto:sis-ams-bounces at mailman.ccsds.org] On Behalf Of Scott Burleigh
Sent: Monday, August 25, 2008 1:57 PM
To: sis-ams at mailman.ccsds.org
Subject: Re: [Sis-ams] 'reconnect' timer?
Ray, Timothy J. (GSFC-583.0) wrote:
Dear WG members,
4.2.7.4.2 specifies that a node shall (in response to imputing the
death of its registrar) query the configuration server to determine the
network location of the restarted registrar, and then send a 'reconnect'
message to the new registrar (and the registrar shall respond with
either a 'reconnected' or 'you-are-dead'). Good so far. But what if
the node does not receive a response from the registrar? I don't think
our spec covers this situation.
In my opinion, AMS should require the node to start a timer when it
sends the 'reconnect' message. AMS should also specify the action to be
taken if the timer expires.
Tim, good point; the spec is not as clear on this as it should be. The
intent is that:
a) The immediate response of a node to the imputed death of a
registrar should be simply to note that the registrar is no longer
running, nothing more. (The same is true, really, of the effect on a
registrar of the imputed death of a configuration server.)
b) At the expiration of any heartbeat period, the node is supposed to
send a heartbeat to the registrar. If it knows that the registrar is
running (at a known network location), great. If not, the node is
supposed to try to determine the new location of the restarted
registrar, by interrogating the configuration server, and then -- if
that attempt was successful -- send a reconnect message to the restarted
registrar at its new location.
c) If there's no response (reconnected or you_are_dead) to the
reconnect message, no problem. At the expiration of the next heartbeat
period, the node will do step (b) again and this will result in (at
most) transmission of another reconnect.
That is, the reconnect cycle is "soft state"; in effect, the heartbeat
cycle functions as the timer on reconnect messages.
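
A minimal Python sketch of steps (a) through (c) may help show why no separate timer is needed. Everything named here is an assumption for illustration, not spec text: SoftStateNode, the query_config_server and send callables, and the plain-string message names are all hypothetical, and handling of you_are_dead is omitted.

```python
class SoftStateNode:
    def __init__(self, query_config_server, send):
        # query_config_server() returns the restarted registrar's location,
        # or None if the attempt failed (hypothetical interface).
        self.query_config_server = query_config_server
        # send(destination, message) transmits one message (hypothetical).
        self.send = send
        self.registrar = None   # known registrar location, None if not running

    def on_registrar_death_imputed(self):
        # Step (a): just note that the registrar is no longer running.
        self.registrar = None

    def on_heartbeat_period_expired(self):
        # Step (b): heartbeat if the registrar is known, else try to reconnect.
        if self.registrar is not None:
            self.send(self.registrar, "heartbeat")
            return
        location = self.query_config_server()
        if location is not None:
            self.send(location, "reconnect")
        # Step (c): no timer is started here. If no response arrives, the
        # next heartbeat period simply runs this method again.

    def on_reconnected(self, location):
        # A 'reconnected' response makes the registrar known again.
        self.registrar = location


if __name__ == "__main__":
    sent = []
    answers = iter([None, "reg-2", "reg-2"])
    node = SoftStateNode(lambda: next(answers),
                         lambda dest, msg: sent.append((dest, msg)))
    node.on_registrar_death_imputed()
    node.on_heartbeat_period_expired()   # query fails, nothing is sent
    node.on_heartbeat_period_expired()   # reconnect sent, response lost
    node.on_heartbeat_period_expired()   # reconnect sent again
    node.on_reconnected("reg-2")
    node.on_heartbeat_period_expired()   # back to ordinary heartbeats
    print(sent)
```

The only retry mechanism in the sketch is the heartbeat period itself, so each period produces at most one reconnect.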
I like this approach, as it is pretty simple and at this point I think
we want to avoid introducing any more complexity (such as another timer)
into the spec if we can possibly help it. But what I have written here
is really hard to infer from the current text of 4.2.7, so I need to do
some serious reworking of this text. I'll try to do that this week,
unless we don't agree that this approach is okay. What do you think?
Scott