[Sois-tcoa] [Sois-tcons] RE: reference model (2 levels of SOIS-compliance)

gregory.menke at gsfc.nasa.gov gregory.menke at gsfc.nasa.gov
Thu Mar 10 22:28:32 EST 2005


Abhijit Sengupta writes:
 > 
 > I noticed the significant and exciting exchange of postings on March 7 and 8,
 > which I missed in real time (being trapped in bed with back pain - a sign of
 > old age), and here are some in-line comments.
 > 
 > At 3/8/2005 08:54 AM, Keith Scott wrote:
 > >[Greg wrote]:
 > > >We did simplistic measurements of selected UDP rx and tx transit time
 > > >from app layer to hardware on a 233mhz PowerPC.  Packet traffic
 > > >experienced delays up to 5ms when ambient traffic loads approached
 > 
 > What was the ambient traffic pattern? Does it match the traffic pattern we
 > expect to see onboard? If it does not, I do not see how the
 > measurement adds anything to what we are looking for.
 > We need to remember that typical internet traffic is not expected
 > in the spacecraft onboard environment, where high data rate traffic (arising
 > from a camera or SAR) is usually highly predictable and precisely
 > controlled through system engineering. My point being that any measurement we
 > want to rely upon needs to use a typical traffic distribution expected
 > onboard (this reminds me of evaluating processor speed with the Dhrystone
 > benchmark while knowing perfectly well that flight software is in no way
 > similar to such benchmarks).

The traffic pattern was our "nominal" flight software configuration.
The system under measurement was the spacecraft C&DH.  Its traffic
load consisted of an outgoing 4 to 5 megabit/s CFDP download, a 150 kbps
incoming instrument simulator feed, and outgoing ground command/telemetry.
The command/telemetry packets were those measured by the test.  This
is a simplified traffic model; a real flight system would have a
greater app-layer CPU load and would likely have quite a few more data
streams going to other parts of the spacecraft.


The reason this is relevant is that the measured traffic experienced
latencies ranging from roughly 100us up to 5ms.  Some of the latencies
were periodic and others were not.  Short of app-layer throttling,
there is no way at all to limit stack utilization.  Given that this
processor runs at more than twice the clock rate of expected flight PPC
processors - and also has an L2 cache, which flight systems may well
not have - this suggests that an IP stack alone cannot bound latencies
well enough for some flight software latency requirements.
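To make concrete why the spread matters as much as the maximum, here is a minimal sketch of the kind of statistics involved. The latency values below are invented for illustration (they are not the actual ST-5 measurements); they only mimic the observed shape of mostly-fast packets with occasional multi-millisecond outliers:

```python
import statistics

# Hypothetical per-packet transit latencies in microseconds, illustrating
# the observed pattern: most packets near the minimum, rare large outliers.
latencies_us = [110, 95, 130, 4800, 105, 120, 98, 2500, 115, 100]

lat_min = min(latencies_us)
lat_max = max(latencies_us)
lat_mean = statistics.mean(latencies_us)
lat_sd = statistics.stdev(latencies_us)

# With this shape, the standard deviation exceeds the mean: the minimum
# is acceptable, but the tail is what breaks a hard latency requirement.
print(f"min={lat_min}us max={lat_max}us mean={lat_mean:.0f}us sd={lat_sd:.0f}us")
```

A requirement of "less than 4ms" is judged against lat_max, not lat_mean, which is why a large SD with an acceptable minimum is still a failure.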


 
 > > >several megabits/second.  The latency SD was quite large, though the
 > > >minimum was acceptable.  On a flight processor running 100mhz or so,
 > > >possibly without L2 cache and with limited memory, the situation will
 > > >be much worse.
 > 
 > Probably it will be - but that is more of a guess unless the traffic 
 > distribution has some similarity to onboard traffic.

The test software is real flight software from the ST-5 mission, with
hardware-specific tasks disabled and a basic complement of command &
telemetry processing tasks, as well as instrument data recording and
downlink features.


 > >[Keith Scott]
 > >Why is it that we believe that TCONS is going to be able to meet much
 > >tighter timing requirements while at the same time providing a more capable
 > >service spec as ambient traffic loads pass several megabits per second, in a
 > 
 > I perfectly agree with Keith. In fact, I went and read the "scheduled
 > service" portion in the latest version of the Red Book that Greg sent me (early
 > February) and did not find anything about how the scheduled service works.

It is hoped that we can get this worked out in Athens; we purposely
didn't try in Toulouse due to time constraints.

 

 > 
 > >In testing UDP transmission delays through the stack, it would seem likely
 > >that the latency and jitter are introduced by interrupts (there's not a lot
 > >to UDP processing, especially if you turn the checksums off).  Interrupts
 > >will be present in the OS(s) running TCONS as well, right?  How is it that
 > >TCONS is going to address this?  Do we say that to meet the latency
 > >requirements, TCONS will be implemented in hardware/firmware and thus not be
 > >subject to resource usage on the 'host' system?  Is the argument that TCONS
 > 
 > Then we have a much bigger problem with the needed hardware - compatibility
 > with typical processor boards, etc.

If you mean the software has to be designed with hardware constraints
in mind, I agree- but this is always the case.


 
 > >can control the interrupt rate by limiting the rate at which traffic is
 > >allowed to impact the box?  This might be difficult to do with, say, IP
 > >(since it could involve limiting the aggregate rate of a number of disparate
 > >sources impinging on a single destination).
 > >
 > > >The previous generation of GSFC spacecraft have packet transit latency
 > > >requirements of less than 4ms.  System designers are not going to
 > > >accept an increase in maximum packet latency even after clock rates
 > > >increase by a factor of 10.
 > > >
 > > >Once again, if a spacecraft does not need or have realtime networking
 > > >requirements, then there is no reason to use TCONS.
 > 
 > Now I think I am as confused as ever (what's new?). According to the previous
 > sentence, if we do "not need or have realtime networking requirements, then
 > there is no reason to use TCONS", so we use TCONS only for "realtime
 > networking", which I presume is the "scheduled service", about which I could not
 > find anything that tells me how it works.

The scheduled service is not documented yet.  Best-effort and
guaranteed services are there for traffic that does not require
realtime guarantees, but must be carried over a network which also
carries realtime traffic.
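One way to picture the coexistence of the three services on one link is a strict-priority transmit queue. The class names and priority values below are my own illustration, not the TCONS service spec: scheduled (realtime) frames always drain first, and guaranteed and best-effort traffic only use whatever link time is left over:

```python
import heapq

# Illustrative priorities only - these are assumptions, not the TCONS spec.
SCHEDULED, GUARANTEED, BEST_EFFORT = 0, 1, 2

class TxQueue:
    """Strict-priority transmit queue: lower class value drains first,
    FIFO within a class (the seq counter breaks ties)."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def enqueue(self, service_class, frame):
        heapq.heappush(self._heap, (service_class, self._seq, frame))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

q = TxQueue()
q.enqueue(BEST_EFFORT, "housekeeping-tlm")
q.enqueue(SCHEDULED, "attitude-cmd")
q.enqueue(GUARANTEED, "file-chunk")
print(q.dequeue())  # the realtime frame goes out first
```

The point of the sketch is only the ordering property: non-realtime traffic rides the same network without being able to delay a waiting realtime frame by more than one in-flight transmission.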


 > >[Keith Scott]
 > >TCONS has been specifying interoperability requirements at the service spec
 > >(and more specifically, the API) level, as well as protocol interoperability
 > >'on-the-wire'.  This allows for the highest degree of portability
 > >(applications and devices connected to a TCONS-compliant network should be
 > >able to be picked up and put on another TCONS-compliant network with no
 > >changes whatsoever).  At the same time it removes all choice for the
 > >spacecraft designer.  As above, maybe TCONS needs to be used for the
 > >realtime, and ONLY the realtime, services.  If this means that TCONS must
 > >implement the scheduled service (as I think it would), then IP will be
 > >perfectly happy to run over that as well as anything else.  This would allow
 > >TCONS (really OBL) to maintain control over all the timing and scheduling
 > >required to meet the realtime requirements, while allowing COTS
 > >plug-and-play of everything that does NOT require them.
 > 
 > For a network with a single subnetwork, the problem is non-existent. For a
 > network with several subnetworks - for example, 1553 in one, SpaceWire in
 > another, and 1394 in a third - how would this timing and
 > scheduling work? (Recall that in 1553 an RT cannot do anything unless
 > specifically commanded by the BC, whereas in 1394 there is arbitration for
 > access to the bus and the requestor might lose arbitration; and, to make
 > life really miserable, the SpaceWire path might be blocked because some other
 > pair is using some of the links of the path.)

A single subnet does not guarantee there are no realtime latency
issues.  There can be a complex mix of realtime and nonrealtime
traffic on a single subnet.

 
 > > >For a general measure, anything we implement should offer no worse
 > > >realtime properties than existing systems, so you can view a
 > > >worst-case device-to-application latency of 4ms as "sufficiently
 > > >realtime".  "Realistically realtime" should offer latency of no more
 > > >than 1ms.  On a 100mhz PPC, running a substantial pile of software and
 > > >moving lots of data, an IP stack cannot reliably meet either limit.
 > 
 > What kind of measurement data is available to justify it? Perhaps we need
 > to know what this "substantial pile of software" and "moving lots of
 > data" are - to me they seem to be hand-waving statements.
 
The requirements for missions in flight now will provide that sort of
information.  "Substantial pile" is whatever load of flight software
the mission of your choice is running.  "Moving lots of data" is
vague, but I mean "the bus/network utilization is high enough that
transmit queueing is required to meet latency requirements".
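As a back-of-the-envelope illustration of that definition (all numbers below are assumptions, not measurements): once frames start queueing on the outbound link, a small packet's added latency is simply the time to drain the backlog ahead of it.

```python
# Assumed figures for illustration: a ~5 Mbit/s outbound link carrying a
# CFDP burst, and maximum-size frames of 1500 bytes already queued when a
# small command/telemetry packet arrives behind them.
LINK_BPS = 5_000_000      # assumed link rate, bits/second
FRAME_BITS = 1500 * 8     # assumed max frame size, bits
backlog_frames = 2        # frames already queued ahead of our packet

drain_time_ms = backlog_frames * FRAME_BITS / LINK_BPS * 1000
print(f"{drain_time_ms:.2f} ms of added latency from "
      f"{backlog_frames} queued frames")
```

Even two queued maximum-size frames at these rates cost nearly 5ms, which is the same order as the worst-case latencies observed in the UDP test - queueing at high utilization, not protocol processing, dominates the tail.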

 > 
 > I do not agree with this conclusion - particularly without a clearer
 > picture of the network traffic.
 > Do we have any idea of what a reasonable traffic model for an onboard
 > network is? If we do, what is it (I do not know myself)? If we don't (at least
 > even a reasonably close approximation), using a conventional model (as used
 > in typical network evaluation - evolved from a variety of terrestrial
 > applications) is grossly incorrect. As a simple example, in terrestrial
 > applications, network congestion can happen because users are accessing the
 > network in an unpredictable way (how many users are trying to access a
 > search engine like Google at any instant?), while in an onboard environment,
 > access is highly predictable and ordered - implying even congestion might be
 > predictable.

Hopefully the congestion is well predicted - as you say, network
utilization onboard is typically well planned.  OTOH, one of the hopes
for these new, faster buses/networks is that by increasing data rates
by a couple of orders of magnitude, network scheduling can be made
much simpler because transit time is so fast.  In such a case, the
network schedules may be much looser and more flexible - and then
there must be safeguards so unforeseen traffic spikes won't disrupt
realtime communications.
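A common form such a safeguard could take - offered here as a generic sketch, not anything specified for TCONS - is a per-source token bucket: a node can never inject faster than its provisioned rate plus a bounded burst, so an unforeseen spike is held back at the sender instead of colliding with scheduled traffic. The rates and sizes below are assumptions for illustration:

```python
# Generic token-bucket rate limiter sketch; parameters are illustrative.
class TokenBucket:
    def __init__(self, rate_bps, burst_bits):
        self.rate = rate_bps      # sustained provisioned rate
        self.burst = burst_bits   # maximum instantaneous burst
        self.tokens = burst_bits  # start with a full bucket
        self.last = 0.0

    def allow(self, now, frame_bits):
        # Refill tokens for elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if frame_bits <= self.tokens:
            self.tokens -= frame_bits
            return True
        return False  # frame must wait; it cannot disrupt the schedule

# E.g. a source provisioned at ~150 kbps with a 12 kbit burst allowance:
tb = TokenBucket(rate_bps=150_000, burst_bits=12_000)
print(tb.allow(0.0, 12_000))   # first burst fits -> True
print(tb.allow(0.01, 12_000))  # immediate second burst is held back -> False
```

With every source bounded this way, the scheduler can rely on a worst-case aggregate load even when the schedules themselves are loose.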

Gregm



