[Css-csts] Effects of varying network bandwidth on SLE service instance provision period

Mon Feb 6 13:58:14 EST 2012

CSTSWG and SMWG colleagues ---

Recently, the NASA Space Network Ground Segment Sustainment (SGSS)
Project has been considering the need to schedule terrestrial network
bandwidth for playback of telemetry data. In the legacy SN system, they
both schedule the playback process itself and reserve the network
bandwidth. The question was raised about the effects of knowing or not
knowing the available terrestrial bandwidth on providing SLE offline
services. In a note to SGSS Project personnel, I pointed out that there
is no single definitive answer, but that it depends on (among other
things) how service instance provision periods are "scheduled". For
illustrative purposes, I identified two ends of a spectrum. 

At one end, the network can configure offline service instances with
relatively-narrow service instance provision periods; e.g., a specific
45-minute period could be scheduled. If the terrestrial bandwidth that
is available during that window is sufficient to transfer the volume of
data requested by the user in the START operation, then everything is
fine, but if the bandwidth is not sufficient, the service instance will
peer-abort when the end of the period arrives and all requested data
have not yet been delivered. The legacy Space Network schedules data
playback in a mode that is toward this end of the spectrum. 
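
As a rough illustration of the narrow-window case, the check below asks whether a requested data volume fits in a fixed provision period. It is only a sketch: the function name, the 2 GB volume, and the two terrestrial rates are made-up values, and SLE/TCP overhead is ignored.

```python
def fits_in_window(volume_bits: float, bandwidth_bps: float, window_s: float) -> bool:
    """True if the requested volume can be sent before the provision period ends."""
    return volume_bits <= bandwidth_bps * window_s


window_s = 45 * 60        # the 45-minute provision period from the example above
volume_bits = 8 * 2e9     # hypothetical request: 2 GB of stored telemetry

for bandwidth_bps in (10e6, 5e6):   # hypothetical terrestrial rates, bits/s
    needed_s = volume_bits / bandwidth_bps
    ok = fits_in_window(volume_bits, bandwidth_bps, window_s)
    print(f"{bandwidth_bps / 1e6:.0f} Mbps: needs {needed_s:.0f} s, fits in window: {ok}")
```

With these assumed numbers the transfer fits at the higher rate and not at the lower one, which is the situation in which the service instance would peer-abort at the end of the period.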

At the other end of the spectrum, the service instance provision period
is essentially boundless. The service instance is enabled for binding
whenever the user wants to "pull" the data. As long as the requested
data are still available in the data store, they can be transferred. The
service instance can take as long as necessary to transfer (because the
end of the service instance provision period is unbounded): slower
terrestrial links mean longer transfer times but they don't otherwise
inhibit the transfer. In my discussion with Wolfgang Hell on offline SLE
several years ago, I came away with the impression that this would be
the preferred approach given sufficient resources.

How do today's offline SLE implementations (e.g., Estrack and DSN)
fall on this spectrum? Are the access windows tightly defined,
unbounded, or somewhere in between (e.g., a service instance is
scheduled for a given mission with the same 4-hour provision period
every day)?

Of course, the unbounded-provision-period end of the spectrum implies
that transfer service instances are accessible to all offline users all
the time, which has implications for resources. I am under the
impression that many (if not most or all) SLE implementations must be
dedicated to a single user at a time. It seems to me that this doesn't
necessarily have to be the case - it should be possible to implement
offline SLE with pooled resources, so that N offline service
instances could be enabled (ready to bind) without having an SLE
"processor" dedicated to each of them. Does your network's
implementation dedicate SLE resources on a one-for-one basis, or is
there some degree of resource sharing?
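
To make the pooling idea concrete, here is a minimal sketch of N enabled service instances sharing a smaller number of SLE "processors". The class and method names are hypothetical and do not describe any existing SLE implementation; the point is only that enabling an instance costs nothing until it binds.

```python
class OfflineSlePool:
    """Hypothetical pool: many enabled service instances share a few processors."""

    def __init__(self, processors: int):
        self.free = processors   # processors not currently serving a bound instance
        self.enabled = set()     # instances that are ready to bind
        self.bound = set()       # instances currently holding a processor

    def enable(self, instance_id: str) -> None:
        # Enabling only records that the instance may bind; no processor is used.
        self.enabled.add(instance_id)

    def bind(self, instance_id: str) -> bool:
        # A processor is taken only when an enabled instance actually binds.
        if instance_id not in self.enabled or instance_id in self.bound or self.free == 0:
            return False
        self.free -= 1
        self.bound.add(instance_id)
        return True

    def unbind(self, instance_id: str) -> None:
        # Releasing the association returns the processor to the pool.
        if instance_id in self.bound:
            self.bound.remove(instance_id)
            self.free += 1


pool = OfflineSlePool(processors=2)
for name in ("RAF-A", "RAF-B", "RAF-C", "RAF-D", "RAF-E"):   # made-up instance names
    pool.enable(name)
print([pool.bind(n) for n in ("RAF-A", "RAF-B", "RAF-C")])   # third bind finds no free processor
pool.unbind("RAF-A")
print(pool.bind("RAF-C"))                                     # succeeds once a processor is free
```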

While I'm asking about offline services, let me ask a related question
about complete online delivery mode. When a user (mission) schedules a
pass with complete online return SLE service, how is the end of the
service instance provision period determined? Is it (a) requested by the
mission in the service request (or, for rule-based scheduling,
pre-specified in the scheduling rules), or (b) calculated by the network
on a pass-by-pass basis and configured accordingly? 

As you may know, in the Blue-1 version of Service Management, the
mechanism for setting complete (as well as timely) online service
instance provision periods is through start and stop time relative
offsets specified in Transfer Service Profiles. The relative offsets are
with respect to the space link carriers to which they are attached.
This allows the service instance provision periods to "float" in the
scheduling process with the flexibilities applied to the scheduling of
the space link carriers. For example, profile RAF7 could have a
start-time offset of 0 seconds, and a stop-time offset of +300 seconds.
If profile RAF7 is applied to a space link carrier profile that is used
to schedule a return space link carrier from 1100 to 1115, the RAF
service instance associated with profile RAF7 will be scheduled from
1100 to 1120 (1115 plus 5 minutes (300 seconds)).
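
The RAF7 arithmetic can be written out as a small sketch. The function name is an assumption for illustration and the calendar date is arbitrary; only the offsets and carrier times come from the example above.

```python
from datetime import datetime, timedelta


def provision_period(carrier_start, carrier_stop, start_offset_s, stop_offset_s):
    """Apply relative start/stop offsets (in seconds) to a scheduled space link carrier."""
    return (carrier_start + timedelta(seconds=start_offset_s),
            carrier_stop + timedelta(seconds=stop_offset_s))


# Return carrier scheduled from 1100 to 1115 (the date itself is arbitrary).
carrier_start = datetime(2012, 2, 6, 11, 0)
carrier_stop = datetime(2012, 2, 6, 11, 15)

# Profile RAF7: start-time offset 0 s, stop-time offset +300 s.
start, stop = provision_period(carrier_start, carrier_stop, 0, 300)
print(start.strftime("%H%M"), stop.strftime("%H%M"))   # prints "1100 1120"
```

If the scheduler moves the carrier within its flexibility, the same offsets move the provision period with it.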

The Blue-1 approach may be too simplistic, and as we develop
requirements for the next generation of Service Management, I would like
to collect information on how it is actually done today, and more
importantly, how it might be better done in the future.

Thanks in advance for your help.

Best regards,

John

From: John Pietras 
Sent: Thursday, February 02, 2012 4:18 PM
To: Stephen.Bernsee at gdc4s.com; 'Douglas.Barnhart at gdc4s.com'
Cc: 'Degumbia, Jonathan D. (GSFC-444.0)[OMITRON]'; Gawne, Bill
(GSFC-444.0)[HONEYWELL TECH. SOLUTIONS]
Subject: RE: Considerations for SLE and uncertain terrestrial bandwidth

Steve and Doug,

As you know, one of the special topics that's being carried for the SM
Development WG is related to whether SM needs to know about terrestrial
bandwidth, especially with regard to playbacks. One facet that came up
in the discussion is how return SLE services are affected by bandwidth
considerations. Here are some thoughts on the effects of variable
bandwidth on SLE transfer services. 

Return SLE services have three delivery modes: timely online, complete
online, and offline.

1. In the timely online mode, the SLE service instance attempts to
send the data units (e.g., transfer frames) across the TCP connection as
soon as it gets the data units (i.e., at the downlink rate). However, if
the link under the TCP connection doesn't have a matching data capacity,
the SLE service instance will discard data that can't be sent, based on
a service-management-configured parameter. If the terrestrial comm link
is running at slightly above the downlink rate (slightly above accounts
for SLE overhead) and the link is clean, all frames will normally get
transferred and only if the network experiences unanticipated
(temporary) congestion will frames be discarded. So not all frames are
guaranteed to get through, but those that do are guaranteed to be
delivered within a defined latency (hence "timely"). As far as SM is
concerned, it doesn't matter if the SLE service is given more time than
the contact - it couldn't transfer any more data if it had the extra
time. However, if the event is scheduled such that it competes for the
terrestrial bandwidth with other events, then the MOC will experience
data loss as the service instance routinely discards data because the
available bandwidth is too low.

2. In the complete online mode, the SLE service instance also
attempts to send the data units across the TCP connection as soon as it
gets the data units, but if the link under the TCP connection doesn't
have a matching data capacity, the SLE service instance will buffer data
until it can be sent, as long as the service instance is enabled. 
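
The difference between the two online modes can be sketched with a toy second-by-second simulation. Everything here is an illustrative assumption: the frame rates, the function name, and in particular `max_backlog`, which is only a stand-in for the service-management-configured parameter that governs discarding in timely mode.

```python
def simulate(arrivals, capacity, provision_s, max_backlog=None):
    """Return (delivered, discarded, left_in_buffer), all in frames.

    arrivals    -- frames received from the space link in each second
    capacity    -- frames the terrestrial link can carry in each second
    provision_s -- length of the service instance provision period, seconds
    max_backlog -- timely mode: frames allowed to wait; None means complete mode
    """
    backlog = delivered = discarded = 0
    for t in range(provision_s):
        backlog += arrivals[t] if t < len(arrivals) else 0
        sent = min(backlog, capacity[t])
        backlog -= sent
        delivered += sent
        if max_backlog is not None and backlog > max_backlog:
            # Timely mode: frames that cannot be sent soon enough are dropped.
            discarded += backlog - max_backlog
            backlog = max_backlog
    # Whatever is still buffered when the provision period ends is not delivered.
    return delivered, discarded, backlog


arrivals = [100] * 900    # hypothetical 15-minute pass at 100 frames/s
capacity = [80] * 1200    # hypothetical terrestrial link at 80 frames/s

print("timely:              ", simulate(arrivals, capacity, 1200, max_backlog=50))
print("complete, +300 s:    ", simulate(arrivals, capacity, 1200))
print("complete, no margin: ", simulate(arrivals, capacity, 900))
```

Under these assumed rates, timely mode discards frames throughout the pass, complete mode with 300 extra seconds delivers everything, and complete mode with no extra time still has frames buffered when the provision period ends.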

In the SLE transfer service specifications, the time during which the
service instance is scheduled to be provided (and thus enabled) is the
service instance provision period. The SLE transfer service
specifications do not say how the service instance provision period is
to be specified in the Service Package. In SCCS-SM, the transfer service
profiles specify relative offsets from the space link carriers to
which they are attached. This allows the service instance provision
periods to "float" in the scheduling process with the flexibilities
applied to the scheduling of the space link carriers. For example,
profile RAF7 could have a start-time offset of 0 seconds, and a
stop-time offset of +300 seconds. If profile RAF7 is applied to a space
link carrier profile that is used to schedule a return space link
carrier from 1100 to 1115, the RAF service instance associated with
profile RAF7 will be scheduled from 1100 to 1120 (1115 plus 5 minutes
(300 seconds)). Note that we made the start and stop offsets
respecifiable so that, for example, for a particular Service Package
Request the UM could change the RAF7 stop-time offset to +600 seconds
for that particular request.

Note, too, that in SCCS-SM the online SLE transfer services are
considered part of the Service Package, so the Service Package does not
stop executing (i.e., "end") until the completion of the last-ending
space link carrier or SLE transfer service. 

So, how does this play out when the terrestrial bandwidth available to the
Service Package is affected by other Service Packages? First let's
consider the case where neither the MOC nor the SM PSE has any knowledge
of the terrestrial data rate available to the RAF service instance. The
stop-time offset in the transfer service profile that is applied (e.g.,
RAF7) may or may not be sufficient to transfer all of the data,
depending on the competition for that terrestrial bandwidth. The profile
*could* be configured with a stop-time offset that is calculated to
transfer all of the data 95% of the time based on some network loading
analysis that factors in probabilities for competition for terrestrial
bandwidth, but in many Service Packages that could result in scheduling
and allocating the SLE transfer service resources to that service
instance much longer than is needed (e.g., there may be no competition
for bandwidth during the execution of a particular Service Package).
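
A back-of-the-envelope way to see how the needed stop-time offset depends on the available bandwidth is sketched below. It assumes constant downlink and terrestrial rates over the whole pass, and the function name, the rates, and the `overhead` factor are hypothetical values chosen for illustration.

```python
import math


def required_stop_offset_s(pass_s, downlink_bps, terrestrial_bps, overhead=1.0):
    """Seconds past the end of the carrier needed to drain the complete-online buffer.

    overhead is a multiplier on the downlink rate standing in for SLE overhead.
    """
    offered_bps = downlink_bps * overhead
    if terrestrial_bps >= offered_bps:
        return 0   # the link keeps up, so nothing is left to drain
    backlog_bits = (offered_bps - terrestrial_bps) * pass_s
    return math.ceil(backlog_bits / terrestrial_bps)


pass_s = 15 * 60   # the 1100-1115 carrier from the RAF7 example
for terrestrial_bps in (2_000_000, 1_500_000, 1_200_000):   # hypothetical rates
    offset = required_stop_offset_s(pass_s, 2_000_000, terrestrial_bps)
    print(f"{terrestrial_bps / 1e6:.1f} Mbps available: stop-time offset >= {offset} s")
```

With a hypothetical 2 Mbps downlink, the +300 second offset of RAF7 is just enough at 1.5 Mbps of terrestrial bandwidth, while 1.2 Mbps would need +600 seconds.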

Now let's assume the case where the SM PSE *does* know the terrestrial
bandwidth that will be available. Off the top of my head, some options
could be:

a. If the SM PSE can't schedule the Service Package such that all
transfer service instances would be able to transfer their data within
the specified stop-time offsets, the SM PSE could reject the Service
Package Request ("reject" could, of course, be preceded by conflict
resolution activities);

b. If the SM PSE can't schedule the Service Package such that all
transfer service instances would be able to transfer their data within
the specified stop-time offsets, the SM PSE could accept/schedule the
Service Package and add some sort of annotation that the data on the
transfer service instance will not be able to be delivered, and perhaps
make a recommendation for the MOC to replace the request with one that
has a sufficiently-long transfer service instance. This would allow the
UM to hold the scarce resource (the space link carriers). It's also a
viable approach for when the UM is having the data simultaneously stored
for offline retrieval and is willing to use an offline service to get
whatever can't be transferred in near-real-time.

c. The SM PSE could ignore the stop-time offset (or treat it as a
soft constraint) and attempt to schedule the Service Package with the
transfer service instances running as long as necessary to get the data
transferred at the terrestrial bandwidth that will actually be available
(assuming that the SLE transfer service resources will also be available
for that time). If the MOC finds the resulting end-time(s) unacceptable
it is always free to delete the service package or attempt to modify
(replace) it.

There are probably other variations, too. Of the three above, I
personally lean toward (c); it seems to have the highest probability of
getting an acceptable solution the first time and avoiding the need for
new or replace requests.

3. In the offline mode, the data are already stored in the data
store and the retrieval is unrelated to the Space Link Sessions (i.e.,
SN Events) during which the data were originally received. In SCCS-SM,
the offline transfer service instances are scheduled via a different
kind of Service Package request, the Retrieval Service Package request.
Unlike the transfer service profiles used for online services, the
offline transfer service profiles don't need to be offset with respect
to other entities such as space link carriers: the service provision
period of the transfer service instance is set directly by the retrieval
Service Package that schedules that transfer service instance. 

So how are offline transfer service instances affected by varying
terrestrial bandwidth? It depends on the duration of the Retrieval
Service Package. If Retrieval Service Packages are scheduled for narrow
time slices, then if a MOC attempts to retrieve more data (as specified
by the start-time and stop-time parameters of the START invocation that
is used to retrieve data from the data store) than can get through the
pipe, the service instance will abort at the end of the service instance
provision period without having transferred all of the data.

While SCCS-SM supports scheduling of short Retrieval Service Packages,
the preference of many CCSDS members appears to be to allow long-lived
Retrieval Service Packages (sometimes as long as the life of the
Mission). This practically eliminates the possibility of the
transfer service provision period ending while the data are being
transferred: the MOC requests the data and they simply get delivered at
the available data rate, which may even vary during the course of the
data transfer.

The downside of long-lived Retrieval Service Packages is the possibility
(depending on how the SLE services are implemented) of having to dedicate
SLE offline transfer service resources for long periods of time. I
believe that it is possible to implement an offline SLE transfer service
"server" that can be configured to support multiple missions
simultaneously, such that, for example, but I don't know if any existing
SLE implementations are set up to actually do so. This is something that
I will explore within CCSDS. Note that enabling more or fewer SLE
offline transfer service instances can be done independently of how much
terrestrial bandwidth is available - it's only when the transfer service
instances are bound and active that data flow and therefore affect the
terrestrial bandwidth supply.

John
