[Sis-ams] Re: AMS

Tue Apr 29 12:01:17 EDT 2008

Edell, David J. wrote:
> Scott,
>  
> I'm finishing up our AMS implementation and assembled a few more 
> miscellaneous comments/questions that I have come across.  Funding on 
> the project is about at an end though (and I'm leaving on vacation for 
> CA tonight), therefore our first version of AMS can be considered 
> completed for now.  The APL-AMS implementation is largely, with the 
> exception of invitations and related functions, compliant with 
> conformance class 4 as described. 
>  
> I was taking another look at the configuration server interrogation 
> process.  My current implementation utilizes a shared memory location 
> for announcing the registrar's location, however I'm looking into an 
> alternative MAMS-based method to avoid some of the synchronization and 
> related issues with the shared memory. 
>  
> If I modify the process to use the MAMS registrar_query, and await a 
> cell_spec/registrar_unknown response though, how would the CS know 
> where to direct the response as defined?  It seems that the 
> registrar_query message would require a MAMS endpoint as its 
> supplemental data, indicating the origin of, and return 
> destination for, the given message.
(David, I hope you don't mind, but I think the points you raise here are 
important enough to bring forward to the whole working group, so I'm 
cc:ing sis-ams on my reply.)

That's one of the big reasons TCP isn't a really good choice as a 
primary transport protocol (the protocol that carries MAMS traffic): 
it's a bootstrapping problem.  If you use UDP or some other 
non-connection-oriented protocol, the transport protocol itself will 
give you the origin of the registrar_query; that "echo" location 
(host/port, for example) can be opaquely passed up to the 
registrar_query handler and then passed back down when the cell_spec or 
registrar_unknown is returned.
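
To make the "echo" idea concrete, here is a rough Python sketch of a configuration server answering one registrar_query over UDP.  The text payloads, the known_cells table and all function names are made up for illustration; they are not the MAMS wire format or anything from the spec.  The only point is that recvfrom() hands over the sender's address, which the handler carries around without ever looking inside it.

```python
import socket


def handle_registrar_query(payload, echo, known_cells):
    """Hypothetical handler: builds a reply and returns the echo token untouched."""
    cell = payload.decode("ascii")
    if cell in known_cells:
        reply = b"cell_spec " + known_cells[cell].encode("ascii")
    else:
        reply = b"registrar_unknown"
    return reply, echo


def serve_one_query(sock, known_cells):
    """Receive one query and send the reply back to wherever it came from."""
    # recvfrom() returns the datagram plus the sender's (host, port).
    payload, origin = sock.recvfrom(2048)
    reply, echo = handle_registrar_query(payload, origin, known_cells)
    # The echo token is passed back down unchanged and used as the destination.
    sock.sendto(reply, echo)


def demo():
    """Loopback round trip: a node asks about cell 'unit-a' (made-up name)."""
    cs = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    node = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        cs.bind(("127.0.0.1", 0))
        node.bind(("127.0.0.1", 0))
        cs.settimeout(2.0)
        node.settimeout(2.0)
        node.sendto(b"unit-a", cs.getsockname())
        serve_one_query(cs, {"unit-a": "127.0.0.1:2357"})
        reply, _ = node.recvfrom(2048)
        return reply
    finally:
        cs.close()
        node.close()
```

Note that the query itself carries no return address at all; the transport supplies it.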

You're right, though, that including the response MAMS endpoint in every 
MAMS message that is of a query nature is a reasonable alternative: it 
removes one constraint on what makes a good primary transport protocol, 
at the cost of a little extra transmission.  I dunno.  My inclination is 
not to change something that currently seems to work okay unless there's 
a really compelling reason, and I'm not sure we've got one yet.   
Anybody have any thoughts on this?
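
For comparison, the alternative David suggests might look something like the sketch below, where the query carries its own reply endpoint as supplementary data.  The type code, the length-prefixed layout and the endpoint string are all assumptions invented for this example, not the encoding in the AMS specification.

```python
import struct

REGISTRAR_QUERY = 1  # hypothetical type code, not the value used by MAMS


def pack_query(reply_endpoint):
    """Build a query whose supplementary data names where the reply should go."""
    data = reply_endpoint.encode("ascii")
    # 1-byte type, 2-byte length (network byte order), then the endpoint text.
    return struct.pack("!BH", REGISTRAR_QUERY, len(data)) + data


def unpack_query(message):
    """Recover the reply endpoint from a query built by pack_query()."""
    msg_type, length = struct.unpack_from("!BH", message, 0)
    if msg_type != REGISTRAR_QUERY:
        raise ValueError("not a registrar_query")
    data = message[3:3 + length]
    if len(data) != length:
        raise ValueError("truncated supplementary data")
    return data.decode("ascii")
```

The "little extra transmission" is visible here: every query grows by the length of the endpoint plus its length field.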
> It's specified that a Fault.Indication is generated if no response is 
> received from a node_registration request within N2 seconds.  
> Shouldn't there be a similar condition for a reconnect message?  I've 
> implemented the same timeout for both states, with the node generating 
> a fault.indication and then automatically resending the registration 
> or reconnect message as appropriate.
I think I see your point.  The node_registration Fault.indication gives 
the application an opportunity to decide whether it wants to keep trying 
or bail.  Once the node is connected, it is likely engaged in ongoing 
application message exchange with its peers; if the registrar dies while 
this is going on, the node probably doesn't want to interrupt its 
application message exchange activity just because there's no registrar, 
so my inclination has been not to bother the node with reconnection 
status.  On the other hand, it might be helpful to tell the node that 
any new subscriptions it posts aren't going to have any effect because 
the registrar is dead.

From that point of view, what would probably be more useful would be to 
deliver a Fault.indication when the absence of the registrar is first 
noticed, and then deliver some other sort of indication -- something new 
-- when the reconnection succeeds, ignoring the timeouts (since they 
don't change the registrar connection state).  But that isn't quite 
perfect either, since the registrar could actually have been dead for 30 
seconds before AMS detected the third missing heartbeat and notified the 
application.
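
As a sketch of that behavior (Python, with invented names): the application hears about the registrar's absence once, hears again when reconnection succeeds, and is not bothered by the reconnect timeouts in between.  The "reconnected" indication is the hypothetical "something new"; it is not in the current spec.  The limit of three missed heartbeats follows the description above.

```python
class RegistrarMonitor:
    """Tracks registrar presence and notifies the application on changes only."""

    def __init__(self, notify, missed_limit=3):
        self.notify = notify              # callback taking an indication name
        self.missed_limit = missed_limit  # missed heartbeats before giving up
        self.missed = 0
        self.registrar_present = True

    def heartbeat_received(self):
        self.missed = 0

    def heartbeat_missed(self):
        self.missed += 1
        # Report only the first time the absence is noticed.
        if self.registrar_present and self.missed >= self.missed_limit:
            self.registrar_present = False
            self.notify("Fault.indication")

    def reconnect_timed_out(self):
        # Connection state is unchanged, so the application is not told.
        pass

    def reconnect_succeeded(self):
        if not self.registrar_present:
            self.registrar_present = True
            self.missed = 0
            self.notify("reconnected")  # hypothetical new indication
```

For example, three missed heartbeats, two reconnect timeouts and then a successful reconnect would produce exactly two notifications: "Fault.indication" followed by "reconnected".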

I like the idea of giving the application better information, but I 
don't know how really substantial the benefit would be.  Because we're 
trying to get a Blue Book out relatively soon, so missions can have 
something stable to work from, my inclination is to defer making 
significant changes until after we've got some operational experience 
and then incorporate the lessons learned into a second version.
> The reference field of a 'you_are_dead' message is noted as an echo, 
> however technically there is no originating message when the message 
> is transmitted in response to a heartbeat timeout.  I'm assuming the 
> target's node ID as the reference value in this case (although the 
> field is ignored either way by the node).
Actually the node shouldn't send you_are_dead other than in response to 
a message (either reconnect or heartbeat); a heartbeat timeout just 
causes an "imputed termination".  This has various effects depending on 
what terminated, but it doesn't produce you_are_dead in any of those cases.

You're right, though, that the reference value in a you_are_dead is only 
meaningful when the message it's responding to is reconnect, in which 
case it's what enables the (nominally blocking) reconnect query to be 
unblocked and terminated.  A you_are_dead that is sent in response to a 
heartbeat isn't turning off anything that's blocking; its reference 
value is the heartbeat source, copied from the heartbeat message, but 
that isn't especially useful to the now-officially-dead sender of the 
heartbeat.
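
In rough Python terms, the rules above come out as follows.  The dictionary field names are assumptions made for this sketch and do not reflect the actual message layout.

```python
def you_are_dead_for(message):
    """Build the you_are_dead reply to a reconnect or heartbeat from a dead sender."""
    if message["type"] == "reconnect":
        # Echo the query's reference so the blocked reconnect can be released.
        reference = message["reference"]
    elif message["type"] == "heartbeat":
        # Nothing is blocking; the reference is just the heartbeat source.
        reference = message["source"]
    else:
        raise ValueError("you_are_dead is sent only in response to a message")
    return {"type": "you_are_dead", "reference": reference}


def on_heartbeat_timeout(live_peers, peer_id):
    """Imputed termination: forget the peer locally and send nothing."""
    live_peers.discard(peer_id)
    return None
```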

Scott
