[Sis-ams] Predictability concerns with AMS, and other questions too

Tue Jan 31 12:24:15 EST 2006

Marek Prochazka wrote:
> Hi Scott,
> 
> going through your answers (thanks for them!):
> 
>>>1) Predictability issues. There is a number of places where I'm 
>>
>>This is an important topic.  The AMS design isn't principally 
>>aimed at hard real-time applications; the main intent is to 
>>reduce the cost of developing and operating distributed 
>>systems over networks, including the future interplanetary
> 
> This is certainly an important message for us, as what we definitely
> want is support for RT. I always thought that the requirements for the
> MTS included support for RT.

I think the requirements for MTS explicitly included support for a number of specific transmission features that would have the effect of supporting real-time applications; this may be splitting hairs, but I think it's an important distinction.

In any event, though, AMS was not designed to respond to the MTS requirements; if it had been then it would have been appropriate to advance it in SOIS rather than in SIS.  It was only after AMS was formally proposed that CCSDS began to wonder if it might be able to address the requirements for MTS and for SMCP as well, so that the number of distinct messaging standardization efforts funded by CCSDS member agencies might be reduced.  Our study on this point seemed to indicate that AMS could reasonably be enhanced to meet the MTS and SMCP requirements, and we've been proceeding along that line since then.

And, as I've been saying, I think AMS can indeed support the transmission features called out in the MTS requirements, either within the operations of AMS itself or (in most cases) by providing a way to pass QOS constraints through to the underlying transport layer.  But that was not the original intent of the design.

>>As you say, it's all a question of exactly how you use the 
>>system: once the communication configuration of the real-time 
>>elements of the message space has stabilized (other bits of 
>>configuration can continue to change without noticeable 
>>effect) - and provided your real-time nodes are using a 
>>real-time-suitable transport system (such as message queues) 
>>underneath AMS - I believe you can get bounded maximum 
>>latency in AMS message exchange among those nodes.  This 
>>remains to be demonstrated, of course, and a lot does depend 
>>on careful implementation, but my experience with Tramel 
>>makes me hopeful.
>
> O.K.
> My feeling is that some level of laziness could be better than immediate
> (and perhaps simultaneous) propagation of configuration modifications,
> cycling through nodes, etc.

I think the specifications for meta-AMS traffic in the White Book tend to make it sound as if this activity will be massive and ongoing, just because they occupy so much text.  In practice (in my experience), this activity is minimal in any functioning application: you simply don't start and stop application nodes frequently, nor do most applications dynamically alter their interest in specific message subjects after they start running.

I would suggest that somebody do a real quantitative analysis of MAMS traffic in a reasonably-sized on-board application and give us an idea of just how much overhead there is.  I think the bottom line will be that the responsiveness and (relative) simplicity of immediate propagation of reconfiguration cues far outweigh any performance optimization that one might get from scheduling them for future transmission.

One other point I'd suggest here is that implementation strategy will likely have a great deal of impact on exactly how AMS behaves with regard to parameters like bounded transmission latency.  In particular, the JPL implementation locates all MAMS traffic issuance and reception in threads that are distinct from application threads, to assure that MAMS activity doesn't necessarily have to be interleaved immediately with application activity.  In a vxWorks environment I think one could use tasks the same way and, for example, give the MAMS activity tasks lower priority than selected real-time application tasks; the result would still be perfectly conformant to the AMS spec.

>>I'd say that real-time performance (a guaranteed upper bound 
>>on message delivery latency) is an element of Quality of 
>>Service, just as reliability (retransmission), preservation 
>>of data transmission order, etc. are elements of Quality of 
>>Service.  Underlying the AMS design is a commitment to the 
>>layering principle and a deliberate and resolute refusal to 
>>reinvent communications all over again, so a fundamental 
>>design principle of AMS is to rely on the underlying 
>>transport systems to provide QOS.  For example, AMS doesn't 
>>do retransmission itself: it relies on (say) TCP, where many 
>>thousands of hours of work have gone into a sound 
>>retransmission design.  So one reason you're not seeing a lot 
>>of discussion of real-time performance guarantees in the AMS 
>>spec is that AMS, by design, is going to rely on the 
>>real-time performance of underlying transport systems when 
>>applications deem real-time performance necessary.
>>If the underlying transport system 
>>is, say, vxWorks message queues - or TCONS - then the 
>>latency between the moment of transmission and the moment of 
>>arrival of each message will be quite predictable.  But it 
>>won't really be AMS that has provided that predictability: 
>>AMS has just conveyed the application's mandate to the 
>>transport layer.
> 
> 
> I have to emphasize that I'm more RT and middleware guy than a
> networking one.
> The comments above are true. My only arguments are:
> 1) In addition to the transport layer worst case overheads, you have to
> take into account the worst case overheads of AMS in case of a node
> failure, registrar failure, config server failure etc.  If I take into
> account e.g. a potential failure of a config server, normal messages
> will be delayed proportionally to the number of nodes in the same zone
> (maybe this number is bounded by a number of alternative config server
> addresses). The delay on a node X is caused a) by AMS on node X cycling
> through registrar addresses (CPU, network interface, kernel calls,
> threads perhaps not preempted if AMS runs at lower priority but maybe
> blocked due to shared access to some resources, depends on transport
> protocol, drivers, etc.), b) by number of messages received by the
> driver, c) by network traffic.

I think I see your point.  Again, I'd suggest that an appropriate implementation strategy - one that protects high-priority application threads from pre-emption by meta-AMS activity - is the right way to deal with this concern.  It's not really a wire protocol issue.

> 2) It is certainly a good thing that AMS relies on e.g. TCP
> retransmissions, but I'm missing AMS failure codes mapped to various
> failures of the underlying transport protocol, so there is absolutely no
> way to be aware that certain nodes have failed or are having problems
> when receiving messages. (Yes there is a way - to build yet another
> application-level protocol on top of AMS, message delivery
> acknowledgements or something like that. Not a very effective way.)

Right, this is the sort of thing I was driving at in my email of yesterday: exactly what sort of behavior would be helpful here?  This too is not a wire protocol issue, but instead just a service specification (and implementation) issue; however, it's one that would be reasonable to deal with in the spec if we can nail it down.

>>predictability of fail-over will not be good.  The point of 
>>the fail-over design isn't preservation of real-time 
>>performance (which would be unaffected by failure of a 
>>configuration server anyway, since the real-time application 
>>messages are exchanged directly between nodes) but the
> 
> That's what I think is not true. The performance of normal communication
> will be affected.

Right, this is (I believe) that implementation issue again.

>>>2) Priority of a message: It is mentioned number of times, 
>>
>>Good point, there should be some clarifying language somewhere.  
>>Priority and flow label are both merely passed through to the 
>>underlying transport layer adapter, to be used (or not) as 
>>makes sense for that protocol; the AMS protocol itself 
>>doesn't use them at all.  The JPL implementation does use 
>>priority to order arriving messages in the queue of messages 
>>awaiting delivery to the application, but this is strictly an 
>>implementation choice; interoperability is not affected.
> 
> I think that the implementation of AMS should somehow deal with
> priorities as well. E.g. when a message is being multicast to all
> subscribed nodes and a higher priority message arrives. Or what has a
> higher priority - reconfiguration or "normal" messages? What if two
> messages of different priority are received by a node at the same
> moment?

These are all really implementation decisions, because they have no effect on interoperability.  The AMS White Book defines a wire protocol, and it says enough about the service interface to levy some requirements (not design) on the developer of an implementation, but it's not a program specification.

That said, it's certainly reasonable to develop some good guidelines for AMS implementations that would be real-time-capable and to collect those guidelines in an AMS Green Book (which is yet to be written, and in fact is yet to be added to the charter of AMS-WG).

For what it's worth, the JPL implementation handles these questions by being quite heavily multi-threaded and using linked lists in shared memory to queue incoming application messages in priority sequence.  Incoming MAMS messages go into a separate queue, in FIFO sequence, that is only serviced by the MAMS thread - application threads never see these messages at all.
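
A minimal sketch of that two-queue arrangement, for illustration only: this is not the JPL code. The class and method names are hypothetical, a binary heap stands in for the priority-ordered linked list in shared memory, locking between threads is omitted, and the convention that a lower number means a more urgent message is an assumption.

```python
import heapq
import itertools
from collections import deque


class InboundQueues:
    """Hypothetical sketch: application messages by priority, MAMS by FIFO."""

    def __init__(self):
        self._app = []                 # heap of (priority, sequence, message)
        self._seq = itertools.count()  # tie-breaker: arrival order within a priority
        self._mams = deque()           # MAMS traffic, strictly first-in first-out

    def enqueue_app(self, priority, message):
        # Assumption: a lower number means a more urgent message.
        heapq.heappush(self._app, (priority, next(self._seq), message))

    def enqueue_mams(self, message):
        self._mams.append(message)

    def next_app(self):
        # Serviced only by application threads; returns None when empty.
        return heapq.heappop(self._app)[2] if self._app else None

    def next_mams(self):
        # Serviced only by the MAMS thread; returns None when empty.
        return self._mams.popleft() if self._mams else None
```

Because the two queues are separate, a burst of MAMS messages never delays delivery ordering on the application side; it only competes for CPU, which is where task priorities come in.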

> What if a node sends a query, is suspended when waiting for a
> reply, and a higher priority message arrives?

If the node is suspended then it's suspended, regardless of the arrival of new messages.  Again, though, this is an implementation matter: you could build an AMS implementation that would interrupt the suspension of a node upon occurrence of some event, such as arrival of a high-priority message.  The only deviation from the AMS spec would be that it violates the expectations the application has formed from the service spec.

> BTW what does it mean that
> the node is suspended? No message can be sent by other application
> threads?

Section 4.3.5 is as close as we've got to a definition of "suspension": the node stops handling inbound AMS (not MAMS) messages until either the response arrives or the timeout expires, but no other activity is inhibited. 

>>These intervals could be configuration options rather than 
>>fixed values, but in my experience that introduces a lot of 
>>operational and implementation complexity for little if any 
>>benefit.  Certainly wide variations in signal propagation 
>>delay could make the fixed values in the spec less than 
>>useful, but I would argue that in this case you should 
>>partition your system into multiple closed continua and use 
>>remote AMS for message exchange across the long-delay links; 
>>that's really what RAMS was designed for.
> 
> Perhaps you're right and a suggestion like this should go to the green
> book.

I agree.

>>>7) Node registration (4.2.5): Forwarding an I_am_starting 
>>>
>>
>>I don't understand how this would happen: I don't think 
>>there's any clause in the spec that talks about a registrar 
>>forwarding a MAMS message to another registrar, except when 
>>the source of that MAMS message is a node in its own zone 
>>(i.e., NOT a registrar).  So there can't be any looping of 
>>messages through registrars.
> 
> 
> I don't know what I meant in 4.2.5 ;-)
> In 2.3.10, I thought that forwarding a message to "every other RAMS
> gateway which it's linked and whose message space contains at least one
> node..." actually implies that all the "other" RAMS gateways also
> forward the message once again to all the RAMS gateways, as they match
> the condition in the quotes. But perhaps 4.4.10 addresses this.

Okay, that clarifies things a little.  The RAMS gateways are really different from the registrars, and (as you say) I think 4.4.10 prevents the kind of looping you were talking about.  Let me know if you see a bug in this; I know the RAMS gateway processing is pretty complex, especially now that we're enabling limitation of the scope of subscriptions and invitations.

>>>8) Heartbeats (4.2.7): After it receives "reconnect" from a node, a 
>>
>>I think we're okay here.  Suppose the registrar crashes at 
>>time T and the new replacement registrar starts at time T+x.  
>>By time T+60 every node in the zone will have noticed the 
>>death of the original registrar and will have started asking 
>>the configuration server where the new one is.  All of the 
>>nodes will be querying the configuration server and trying to 
>>reconnect, every 20 seconds, so by T+x+20 every node in the
> 
> I don't see how you get T+x+20 here. The nodes start to reconnect after
> they notice that the registrar is gone, which is by T+60 as you say
> above. So the nodes will be querying the configuration and trying to
> reconnect at T+60 or T+80 (the document doesn't say whether reconnection
> happens immediately or in the next 20 sec time).

Actually the *latest* time at which a node will start working on reconnection is T+60, because that's the latest time at which it can notice three missing heartbeats from the registrar (20 seconds apart).  A given node might be expecting the first heartbeat at T+5, the next one at T+25, and the third at T+45, in which case it would query the configuration server for the new location of the registrar at T+45.
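
The arithmetic can be sketched as follows. The 20-second heartbeat period and the three-missed-heartbeats rule are the figures discussed in this thread; the function and constant names are hypothetical, and the sketch ignores network and scheduling delay.

```python
HEARTBEAT_PERIOD = 20  # seconds between registrar heartbeats
MISSED_LIMIT = 3       # missed heartbeats before the registrar is presumed dead


def detection_offset(first_expected):
    """Seconds after the crash at which a node notices the registrar is gone.

    first_expected is the offset, in seconds after the crash, of the first
    heartbeat the node was expecting (0 < first_expected <= HEARTBEAT_PERIOD).
    """
    if not 0 < first_expected <= HEARTBEAT_PERIOD:
        raise ValueError("first_expected must be in (0, HEARTBEAT_PERIOD]")
    # The node gives up when the third expected heartbeat fails to arrive.
    return first_expected + (MISSED_LIMIT - 1) * HEARTBEAT_PERIOD
```

For the case above, detection_offset(5) gives 45, and the worst case, detection_offset(20), gives 60.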

But you're right that there is an edge case in which things are not quite so comfortable; see below.

>>zone will have learned about the new location of the 
>>registrar and will have sent a reconnect message to it.
>>Since the registrar doesn't shut off reconnects until T+x+60, 
>>there's no problem, no matter what the value of x is.
> 
> The registrar doesn't shut off reconnects until T+x+60, but the nodes in
> the zone notice the old registrar's crash by T+60, as you say above. So
> they have only x to reconnect, don't they? Some misunderstanding must be
> here...

Every node will start querying the configuration server for the new location of the registrar at T+n where 40 < n <= 60.  (In the case I described above, n = 45.)

If x > n, then these queries (repeated every 20 seconds) will fail until T+x plus at most 20, depending on exactly what point in the cycle the registrar restarts at.  As soon as the query succeeds, the node immediately reconnects with the revived registrar.  So the node is reconnected at T+x+20 or sooner, and since reconnection isn't shut off until T+x+60 everything is great.

But now suppose x < n for some node.  This means that the node's very first configuration server query will succeed, so the node will immediately reconnect with the revived registrar at that time, which is T+n.  However, the upper bound on n is 60 and the lower bound on x is effectively zero (weird but possible), so the revived registrar might shut off reconnections as early as T+0+60 while this particular node might not try to reconnect to it until T+60; depending on the vagaries of network performance and process scheduling, it's possible that the node might miss the deadline.

So the bottom line is that we probably should build a bit more cushion into the reconnection interval, making it (say) 90 seconds instead of 60 seconds.
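
To put numbers on the margin, here is a rough model of the two cases, with the crash at time 0. It assumes a configuration server query succeeds as soon as the registrar has restarted and that reconnection takes no time at all, so real margins will be somewhat worse. The function names are hypothetical.

```python
import math

QUERY_PERIOD = 20  # seconds between configuration server queries


def reconnect_offset(x, n):
    """Offset at which a node reconnects.

    x: offset at which the replacement registrar starts.
    n: offset at which the node begins querying (40 < n <= 60).
    """
    if x <= n:
        return n  # the very first query already succeeds
    # Otherwise the first query at or after x succeeds.
    retries = math.ceil((x - n) / QUERY_PERIOD)
    return n + retries * QUERY_PERIOD


def margin(x, n, window=60):
    """Seconds left before the registrar stops accepting reconnects."""
    return x + window - reconnect_offset(x, n)
```

With x = 100 and n = 45 the node reconnects at 105, leaving a margin of 55 seconds.  With x = 0 and n = 60 the margin is 0 under a 60-second window and 30 under a 90-second one.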

>>Again, for communication over long-delay links it's important 
>>to deploy multiple continua and use Remote AMS; ordinary AMS 
>>and MAMS functioning just doesn't make sense over that sort 
>>of distance, and there's no need for it to.
> 
> O.K., maybe another paragraph in the future green book might help. The
> thing is that your explanation above is not just a suggestion, it is a
> must - given the fixed numbers in the protocol.

I agree that it's a must for some ops concepts, but it's not a must for operation of the protocol; it's not normative.  That's why I'd say it belongs in the Green Book.

Scott