[Sois-tcoa] [Sois-tcons] RE: reference model (2 levels of SOIS-compliance)

Tue Mar 8 12:49:48 EST 2005

Keith Scott writes:
 > 
 > [Greg wrote]:
 > >We did simplistic measurements of selected UDP rx and tx transit times
 > >from the app layer to the hardware on a 233 MHz PowerPC.  Packet traffic
 > >experienced delays of up to 5 ms when ambient traffic loads approached
 > >several megabits/second.  The standard deviation of the latency was quite
 > >large, though the minimum was acceptable.  On a flight processor running
 > >at 100 MHz or so, possibly without L2 cache and with limited memory, the
 > >situation will be much worse.
 > 
 > [Keith Scott]
 > Why is it that we believe that TCONS is going to be able to meet much
 > tighter timing requirements while at the same time providing a more capable
 > service spec as ambient traffic loads pass several megabits per second, in a
 > box with significantly fewer resources?  Do we claim that the TCONS stack
 > will be much more tightly integrated than TCP/UDP/IP, and that that
 > integration will decrease latencies?  
 > 
 > In testing UDP transmission delays through the stack, it would seem likely
 > that the latency and jitter are introduced by interrupts (there's not a lot
 > to UDP processing, especially if you turn the checksums off).  Interrupts
 > will be present in the OS(s) running TCONS as well, right?  How is it that
 > TCONS is going to address this?  Do we say that to meet the latency
 > requirements, TCONS will be implemented in hardware/firmware and thus not be
 > subject to resource usage on the 'host' system?  Is the argument that TCONS
 > can control the interrupt rate by limiting the rate at which traffic is
 > allowed to impact the box?  This might be difficult to do with, say, IP
 > (since it could involve limiting the aggregate rate of a number of disparate
 > sources impinging on a single destination).

Interrupts are clearly part of the question.  However, the IP stack is
wholly asynchronous, so there is no schedule to test latency against,
nor any way to manage latency short of regulating application access
to it.

To limit per-PDU latency, some stacks support packet priorities (ours
didn't), so our QoS and scheduling activities have to be done at the
app layer, with the app layer strictly regulating access to the IP
stack.  Once that's dealt with, buffer utilization has to be
addressed.  Different stacks implement buffering differently: some
have per-socket pools, others have one pool shared by all sockets.
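As a rough illustration of what "strictly regulating access to the IP stack" at the app layer can mean, here is a minimal Python sketch that puts a token bucket in front of every send.  The token bucket is a standard rate-limiting technique swapped in for illustration, not necessarily what our test setup did; `TokenBucket`, `GatedSender` and any rate/capacity numbers are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit `rate` bytes/second on average, with bursts of up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_consume(self, nbytes: int) -> bool:
        """Debit the bucket and return True if nbytes may be sent now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class GatedSender:
    """Hypothetical app-layer wrapper: every packet must pass the bucket first."""

    def __init__(self, bucket: TokenBucket, send: Callable[[bytes], None]) -> None:
        self.bucket = bucket
        self.send = send  # e.g. lambda p: sock.sendto(p, addr)

    def offer(self, payload: bytes) -> bool:
        if self.bucket.try_consume(len(payload)):
            self.send(payload)
            return True
        return False  # refused: the IP stack never sees this packet
```

A packet the gate refuses never reaches the stack, so the stack's transmit buffers are never asked to absorb more than the configured burst.  The price is that the app has to hold or discard the deferred packet itself, and the gate does nothing about inbound traffic.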

TCONS is intended to set and track a schedule for realtime traffic,
to defer buffering to the app layer so that apps which need special
buffering can manage it themselves, and to offer a well-defined packet
drop model so that misbehaving apps cannot disrupt the ambient
communications of others.
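A minimal sketch of what setting a schedule might look like: a repeating major frame divided into non-overlapping slots, each owned by one realtime channel, with the layout validated up front so a bad configuration is rejected before anything is sent.  This is an illustrative Python model under assumed names (`Slot`, `Schedule`, microsecond units), not the TCONS interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    start_us: int   # offset from the start of the major frame
    length_us: int
    owner: str      # hypothetical realtime channel identifier


class Schedule:
    """Repeating major frame of non-overlapping slots (illustrative only)."""

    def __init__(self, frame_us: int, slots: List[Slot]) -> None:
        ordered = sorted(slots, key=lambda s: s.start_us)
        end = 0
        for s in ordered:
            if s.length_us <= 0:
                raise ValueError(f"empty slot for {s.owner}")
            if s.start_us < end:
                raise ValueError(f"slot for {s.owner} overlaps the previous slot "
                                 "or starts before the frame")
            end = s.start_us + s.length_us
        if end > frame_us:
            raise ValueError("slots extend past the end of the major frame")
        self.frame_us = frame_us
        self.slots = ordered

    def slot_at(self, t_us: int) -> Optional[Slot]:
        """Return the slot active at absolute time t_us, or None if unscheduled."""
        offset = t_us % self.frame_us
        for s in self.slots:
            if s.start_us <= offset < s.start_us + s.length_us:
                return s
        return None
```

Because the frame repeats, "tracking" the schedule reduces to taking the current time modulo the frame length, and any time not covered by a slot is available for non-scheduled traffic.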

 > 
 > >The previous generation of GSFC spacecraft have packet transit latency
 > >requirements of less than 4 ms.  System designers are not going to
 > >accept an increase in maximum packet latency even after clock rates
 > >increase by a factor of 10.
 > >
 > >Once again, if a spacecraft does not need or have realtime networking
 > >requirements, then there is no reason to use TCONS.
 > 
 > [Keith Scott]
 > Right, and I think that the only realtime TCONS service is the scheduled one.
 > I think that the best-effort and guaranteed services TCONS specifies will
 > end up similar to TCP/UDP (since they have essentially the same service
 > requirements), and that most applications, for non-scheduled service, could
 > use those instead.

However, the non-scheduled traffic STILL must be managed so it doesn't
disrupt the scheduled traffic.  Generally that means it's sent whenever
scheduled traffic isn't being sent.  But since TCONS is what decides
when scheduled traffic is present, TCONS must also control when other
data units are transmitted.
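One way to picture this rule, as a sketch only: the owner of the current scheduled slot always transmits first, and a best-effort PDU is released only if it can finish before scheduled traffic may need the link again.  The function name `next_pdu`, the `idle_us` and `us_per_byte` parameters, and the FIFO best-effort queue are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, Optional


def next_pdu(slot_owner: Optional[str],
             scheduled: Dict[str, Deque[bytes]],
             best_effort: Deque[bytes],
             idle_us: float,
             us_per_byte: float) -> Optional[bytes]:
    """Pick the next PDU to put on the link (illustrative sketch).

    slot_owner  -- channel owning the current slot, or None if unscheduled
    idle_us     -- time left before scheduled traffic may need the link
    us_per_byte -- assumed serialization time per byte on this link
    """
    # Scheduled traffic always wins inside its own slot.
    if slot_owner is not None:
        queue = scheduled.get(slot_owner)
        if queue:
            return queue.popleft()
    # Best-effort only fills idle time, and only if it cannot overrun into
    # the next scheduled transmission.
    if best_effort and len(best_effort[0]) * us_per_byte <= idle_us:
        return best_effort.popleft()
    return None
```

The key point is that the admission test for best-effort traffic depends on schedule state that only TCONS holds, which is why the same layer has to control both kinds of traffic.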

 > >For a general measure, anything we implement should offer no worse
 > >realtime properties than existing systems, so you can view a
 > >worst-case device-to-application latency of 4 ms as "sufficiently
 > >realtime".  "Realistically realtime" should offer latency of no more
 > >than 1 ms.  On a 100 MHz PPC, running a substantial pile of software and
 > >moving lots of data, an IP stack cannot reliably meet either limit.
 > 
 > [Keith Scott]
 > And neither will TCONS (intentionally inflammatory statement).  The point
 > is, without some benchmarks of what 'a pile of software' and 'lots of data'
 > are, we could argue this forever.  My fear is that we will continue to
 > reject alternate solutions because they are "too heavy, too slow under high
 > network load, too slow under high CPU utilization, ..." without having any
 > firm notions of what these are.  When we have finally developed TCONS to the
 > point that it can be tested, if we find that it fails one or more
 > performance tests used to reject other approaches, proponents of those other
 > approaches are likely to be (justifiably) upset.  How do we know when we've
 > won?

TCONS can make realtime guarantees because the scheduled service
maintains a well-defined comms schedule.  If a transmission spills
past the end of its assigned slot, that is a bug either in the slot
configuration or in TCONS.  TCONS detects such an event immediately,
so action can be taken.  An IP stack has no way at all to approach
that sort of control or monitoring.
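The detection step could be as simple as comparing each transmission's completion time with the end of its slot and invoking a handler on overrun.  This Python sketch assumes hypothetical names (`SlotMonitor`, `on_overrun`) and a tolerance parameter for timestamp jitter; what action TCONS actually takes on an overrun is not specified here.

```python
from typing import Callable


class SlotMonitor:
    """Flag any transmission that completes after its slot has closed."""

    def __init__(self, on_overrun: Callable[[str, int], None],
                 tolerance_us: int = 0) -> None:
        self.on_overrun = on_overrun      # hypothetical hook: log, alarm, suspend channel...
        self.tolerance_us = tolerance_us  # slack allowed for timestamp jitter

    def record(self, owner: str, slot_end_us: int, done_us: int) -> bool:
        """Check one completed transmission; return True if it overran its slot."""
        late_us = done_us - slot_end_us
        if late_us > self.tolerance_us:
            self.on_overrun(owner, late_us)
            return True
        return False
```

Because every scheduled transmission has a known deadline, the check is a single subtraction per PDU; an asynchronous IP stack has no deadline to compare against.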

 > 
 > >You <might> be able to tweak some IP stacks on some processors to do
 > >so, and limit application access to the IP stack to ensure packets are
 > >not lost and buffers not overloaded, but such a solution is inevitably
 > >idiosyncratic.
 > 
 > [Keith Scott]
 > I suppose TCONS can prevent packet loss in the stack by applying
 > backpressure on the application.  I'm not sufficiently familiar with the
 > lower-layer interface to OBL to know how it applies backpressure to the
 > network layer.
 > 

The user supplies the buffers in both directions, so TCONS will not
drop packets; instead, congestion will cause the app layer to wait
while it acquires buffers.
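A minimal sketch of the backpressure this implies, assuming the app hands the stack buffers drawn from a fixed pool it owns: when every buffer is in flight, `acquire()` blocks (or times out) instead of anything being dropped.  `BufferPool` and its methods are hypothetical names built on Python's standard `queue.Queue`.

```python
import queue
from typing import Optional


class BufferPool:
    """Fixed set of application-owned buffers; acquire() blocks when all are in use."""

    def __init__(self, count: int, size: int) -> None:
        self._free: "queue.Queue[bytearray]" = queue.Queue()
        for _ in range(count):
            self._free.put(bytearray(size))

    def acquire(self, timeout: Optional[float] = None) -> Optional[bytearray]:
        """Wait for a free buffer; return None if the timeout expires first.

        With timeout=None this blocks indefinitely, which is the backpressure:
        the application stalls rather than the stack discarding a packet.
        """
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            return None

    def release(self, buf: bytearray) -> None:
        """Return a buffer once the stack is finished with it (e.g. after transmission)."""
        self._free.put(buf)
```

Sizing the pool is then an application decision: an app with bursty or unusual traffic can allocate more or larger buffers without affecting anyone else's pool.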

Greg