[Css-csts] Effects of varying network bandwidth on SLE service instance provision period

Wed Feb 8 07:32:19 EST 2012

John,

Thank you very much for following up on this. It is always useful to get
feedback from a person not directly involved with the system under
discussion.

I did not have a chance to look at SCCS-SM with respect to scheduling of SLE
online and offline transfer services and I'm afraid that due to other
commitments I won't get the chance to do so in the next few days. But regardless,
please find below some clarifications interspersed with your comments.

Best regards,

Wolfgang

  From:       "John Pietras" <john.pietras at gst.com>
  To:         <Wolfgang.Hell at esa.int>
  Cc:         <css-csts at mailman.ccsds.org>, <css-csts-bounces at mailman.ccsds.org>, "Stoloff, Michael J (318H)"
              <michael.j.stoloff at jpl.nasa.gov>, <smwg at mailman.ccsds.org>
  Date:       07/02/2012 21:36
  Subject:    RE: [Css-csts] Effects of varying network bandwidth on SLE service instance provision period

Wolfgang,
Thanks very much for your response. My comments are interleaved below.

I don't know how much you've had a chance to look at how the SCCS-SM
Book handles scheduling of online and offline SLE transfer services.

Best regards,
John

-----Original Message-----
From: Wolfgang.Hell at esa.int [mailto:Wolfgang.Hell at esa.int]
Sent: Tuesday, February 07, 2012 5:35 AM
To: John Pietras
Cc: css-csts at mailman.ccsds.org; css-csts-bounces at mailman.ccsds.org; Stoloff,
Michael J (318H); smwg at mailman.ccsds.org
Subject: Re: [Css-csts] Effects of varying network bandwidth on SLE service
instance provision period

John,

Here is a brief summary of how the current ESA implementation works. As it is
the simpler case, let me start with addressing the offline case:

- Any mission for which we have taken the commitment to provide telemetry
also in offline delivery mode can activate an offline retrieval in parallel
with real-time support, i.e. during a scheduled pass at the given station,
because we include the offline service instances in what we refer to as the
return spacecraft session on our SLE provider. In other words, a mission
having detected that they missed something in real time can perform the
retrieval during the next pass at that station. The risk is that the
retrieval might be aborted if the data flow does not complete by the
end of the pass. If that risk is obvious from the beginning, the mission can
talk to the link operator during the pass and request that the offline SIs
not be removed at the end of the pass. In the vast majority of cases the
extended accessibility of the offline SIs will be granted. We then expect the
mission to let us know when they are done and at that point the SIs will be
removed.

   <JP - Presumably, there are separate offline and online SIs, even though
   they are enabled (have their service instance provision periods)
   concurrently.

   WH: That is correct.

   When you say "remove the SI", are you talking about just disabling the
   ability to bind to the SI, or is the stored data also deleted then? If the
   SI is simply disabled, how is the stored data eventually removed? E.g.,
   is there an allocated amount of storage dedicated to that mission that is
   discarded FIFO, is there a time limit for storage, or does the mission
   tell network operations (by voice?) when it is okay to purge a period of
   data from the data store?

   WH: We are using 'permanent' SIs, i.e. the service provision period is
   undefined and the actual provision period is determined by the period
   during which the SIs are loaded. The loading (and removing) of the SIs is
   part of the station configuration activities. For the vast majority of our
   missions the configuration is performed by a pass timeline (that we call
   ground station schedule) which is generated by the ESTRACK Management
   System (EMS) and executed automatically by the station M&C system (the
   Station Computer). In general, the SIs are accessible by the client
   shortly after BoA (Begin of Activity) until EoA (End of Activity). We
   refer to the removing of the SIs as 'archiving'. As stated before, the SIs
   are 'permanent' and for the next pass of the given mission they are
   retrieved from the archive and loaded, and are then accessible by the client
   mission. As discussed in my previous email, there are some cases where the
   SIs are not archived when the EoA event is reached. We do not use the
   'end' reason in the UNBIND at all. The link operator is informed via voice
   loop that the mission data transfer is completed and will then archive the
   SIs of that mission.

   The telemetry storage is managed completely independently of the SIs. In
   essence, a kind of FIFO is implemented by means of a storage management
   script. When support is committed to a mission, we make sure that the data
   volume and retention period required by a given mission can be met in the
   light of the total storage requirements.

   Going back to a recent email exchange, when you say that you expect the
   mission to let you know when they are done so that the SIs can be removed,
   it seems like this could be done by UNBINDing with 'end'. Is that the way
   it is done, or is it by voice?

   WH: As stated above, by voice.

   To the level of detail provided in your description, it sounds like the
   mechanism in SCCS-SM Blue-1 is sufficient to support this mode of
   operation. Off the top of my head I don't know if SCCS-SM allows (or
   explicitly disallows) a user extending an offline SI while it is
   executing, but that is something that should be easy to allow (as always,
   the request to extend may be rejected because the SLE provider equipment
   has already been allocated elsewhere for the requested extended period).
   />

  WH: I concur.

- In case missing data are urgently needed or a pass with the station where
the missing data have been acquired is not scheduled for quite some time, the
offline retrieval can also be requested via voice loop outside a pass of that
mission. The link operator will then load the SIs manually and the mission
can perform the offline retrieval in parallel with real time support of
another mission. Again we expect the mission to tell us when they are done so
that the SIs can be removed again.

- It would also be possible to schedule offline return sessions the same
way as real time passes are scheduled. However, at least for the missions we
are currently supporting, offline retrieval is the exception rather than the
rule used to recover from some anomalies encountered during the real time
support. Therefore this possibility is currently not used in practice.

   WH: Your observation is correct. There is a further perhaps even more
   important difference. In the DSN case, all return services are provided by
   a central facility which also encompasses a central telemetry storage. As
   a consequence, the user does not need to care at which antenna or complex
   the data of interest had been acquired. In the ESA case, the SLE
   providers including their telemetry storage for offline delivery mode are
   associated with a given antenna. Therefore in case of missing data the
   user needs to consider at which station the data had been acquired and
   perform the offline retrieval from the corresponding provider(s).

Let me address now the online complete delivery case.

- For most missions using the online complete delivery mode, this is done
only as a precaution. As long as things are working nominally, the rate at
which the mission control system reads the telemetry is at least as high as
the space link rate and therefore no backlog builds up on the provider side.

- Even if a small backlog built up, in general the post-pass or tear-down
time of the station will be long enough to transfer this backlog before the
SIs get removed. If the transfer is still ongoing, the process that would
normally remove the SIs runs into an error condition and the operator is
notified. At that point the operator will in general check with the mission
and permit the continuation of the data flow while the station is being
configured for the next mission. If the operator cannot talk to the mission
(e.g. unmanned operations), he will abort the SLE services and the mission
will have to retrieve the data later in offline mode.

   WH: The feature of extending an online provision period on the fly is
needed only in exceptional cases. I would however regard the capability to
schedule support such that provision periods of different missions can
overlap very important (my apologies for my ignorance of what SM supports in
this respect).

- We have a few cases where the space link rate is higher than what the
mission control system can digest and where as a consequence a backlog always
builds up. For those missions the pass timelines are made such that the
SIs are not removed at the end of the pass and therefore the mission can
continue with the data flow. Once again we expect the mission to let us know
when they are done so that the SIs can be removed.
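The backlog behaviour described in the bullets above can be put into rough numbers. The following is a minimal sketch only, assuming constant rates during the pass and no new telemetry arriving after the end of the pass; the function names and the example rates are hypothetical and are not taken from the ESA implementation.

```python
def backlog_after_pass(link_rate_bps, read_rate_bps, pass_s):
    """Bits left on the provider side at end of pass (0 if the client keeps up)."""
    return max(0, (link_rate_bps - read_rate_bps) * pass_s)


def drain_time_s(backlog_bits, read_rate_bps):
    """Seconds the SIs must stay loaded after the pass to deliver the backlog."""
    return backlog_bits / read_rate_bps


# Hypothetical figures: 2 Mbit/s space link, client reads 1.5 Mbit/s, 10-minute pass.
backlog_bits = backlog_after_pass(2_000_000, 1_500_000, 600)
drain_s = drain_time_s(backlog_bits, 1_500_000)
print(backlog_bits, drain_s)
```

If the drain time exceeds the post-pass or tear-down time of the station, the SIs would have to stay loaded beyond the end of the pass, which is the situation the last bullet describes.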

The current SLE provider implementation supports up to four concurrent return
sessions, i.e. up to four spacecraft can be serviced concurrently. This limit
dictates that at some point the SIs have to be removed so that there is room
for the next mission to be supported. The prime driver for this limit is that
we want to monitor individual service instances. When allowing even more
concurrent return sessions, the SI monitoring becomes unmanageable from a
human user perspective.

In terms of bandwidth management the offline retrieval is not regarded as
critical. All such transfers fall into the so-called default class which will
get the bandwidth not consumed by other higher priority traffic. What we
observe for our network is that in general the mission control system is
limiting the throughput rather than the comms line. The online complete
traffic uses a dedicated higher priority throughput class where the
scheduling is done such that we support concurrency of missions only up to
the point where the sum of the concurrent data streams fits into the
guaranteed bandwidth allocated to this class.
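The scheduling rule for the online complete class amounts to a simple admission check. The sketch below is illustrative only: the function name, the rates, and the 10 Mbit/s guaranteed bandwidth are assumptions, and the default of four sessions simply reuses the provider limit mentioned earlier in this message.

```python
def can_admit(active_rates_bps, new_rate_bps, guaranteed_bps, max_sessions=4):
    """Return True if one more online complete stream fits.

    active_rates_bps: data rates of the streams already scheduled concurrently.
    guaranteed_bps:   bandwidth guaranteed to the priority class (assumed value).
    max_sessions:     concurrent return sessions the provider supports.
    """
    if len(active_rates_bps) >= max_sessions:
        return False
    return sum(active_rates_bps) + new_rate_bps <= guaranteed_bps


# Hypothetical example: 10 Mbit/s guaranteed, two missions already scheduled.
active = [4_000_000, 3_000_000]
print(can_admit(active, 2_000_000, 10_000_000))  # fits: 9 Mbit/s in total
print(can_admit(active, 4_000_000, 10_000_000))  # does not fit: 11 Mbit/s
```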

   WH: That is hard to say as it will depend on how the cost of ground
communication bandwidth will evolve and to what extent agencies can afford
ample bandwidth, lifting the need to carefully manage that bandwidth.
Besides, even when combining online complete and offline to one delivery
mode, there is nothing that prevents us from having one SI assigned to a port
providing for access to a priority comms class and another using the default
class.

In conclusion, what we have today works and satisfies the missions' needs.
What is not really satisfactory is the dependency on voice loop interactions.
Improvements in that area are certainly conceivable. For instance, SIs could
be removed when for a TBD time no telemetry flow was observed. Likewise, one
could build into the system a mechanism that automatically loads the required
offline SI when a BIND attempt to such SI is being made and the required
resources are available. But for now we do not have concrete plans in that
regard.
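The first of these conceivable improvements, removing SIs after a period without telemetry flow, could look roughly like the sketch below. It is not an existing feature: the class, the SI identifiers, and the 600-second limit are hypothetical, and the actual timeout is TBD as stated above.

```python
class IdleMonitor:
    """Tracks the last time telemetry flowed on each loaded SI."""

    def __init__(self, idle_limit_s):
        self.idle_limit_s = idle_limit_s  # stands in for the TBD timeout
        self.last_flow_s = {}

    def record_flow(self, si_id, now_s):
        """Note that data was transferred on si_id at time now_s."""
        self.last_flow_s[si_id] = now_s

    def idle_sis(self, now_s):
        """SIs with no flow for at least idle_limit_s; candidates for archiving."""
        return sorted(si for si, t in self.last_flow_s.items()
                      if now_s - t >= self.idle_limit_s)


monitor = IdleMonitor(idle_limit_s=600)
monitor.record_flow("RAF-offline-1", now_s=0)
monitor.record_flow("RCF-offline-2", now_s=500)
print(monitor.idle_sis(now_s=700))
```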

Please let me know if you have any further questions on this matter.

Best regards,

Wolfgang

  From:       "John Pietras" <john.pietras at gst.com>
  To:         <Wolfgang.Hell at esa.int>, "Stoloff, Michael J (318H)" <michael.j.stoloff at jpl.nasa.gov>,
              <css-csts at mailman.ccsds.org>, <smwg at mailman.ccsds.org>
  Date:       06/02/2012 19:58
  Subject:    [Css-csts] Effects of varying network bandwidth on SLE service instance provision period
  Sent by:    css-csts-bounces at mailman.ccsds.org

CSTSWG and SMWG colleagues ---
Recently, the NASA Space Network Ground Segment Sustainment (SGSS) Project
has been considering the need to schedule terrestrial network bandwidth for
playback of telemetry data. In the legacy SN system, they schedule both the
playback process itself as well as reserve the network bandwidth. The
question was raised about the effects of knowing or not knowing the available
terrestrial bandwidth on providing SLE offline services. In a note to SGSS
Project personnel, I pointed out that there is no single definitive answer,
but that it depends on (among other things) how service instance provision
periods are “scheduled”. For illustrative purposes, I identified two ends of
a spectrum.

At one end, the network can configure offline service instances with
relatively narrow service instance provision periods; e.g., a specific
45-minute period could be scheduled. If the terrestrial bandwidth that is
available during that window is sufficient to transfer the volume of data
requested by the user in the START operation, then everything is fine, but if
the bandwidth is not sufficient, the service instance will peer-abort when
the end of the period arrives before all requested data have been
delivered. The legacy Space Network schedules data playback in a mode that is
toward this end of the spectrum.
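Whether a narrow window is sufficient is a matter of simple arithmetic. The sketch below is illustrative only, assuming a constant available bandwidth and ignoring protocol overhead; the function name, data volume, and bandwidths are hypothetical.

```python
def transfer_fits(volume_bytes, bandwidth_bps, window_s):
    """Return (fits, needed_s) for a transfer inside a provision period."""
    needed_s = volume_bytes * 8 / bandwidth_bps
    return needed_s <= window_s, needed_s


window_s = 45 * 60  # the 45-minute provision period from the example above

# Hypothetical request: 1 GB of stored telemetry.
fits_slow, needed_slow = transfer_fits(1_000_000_000, 2_000_000, window_s)
fits_fast, needed_fast = transfer_fits(1_000_000_000, 4_000_000, window_s)
print(fits_slow, needed_slow)  # at 2 Mbit/s the transfer outlasts the window
print(fits_fast, needed_fast)  # at 4 Mbit/s it completes in time
```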

At the other end of the spectrum, the service instance provision period is
essentially boundless. The service instance is enabled for binding whenever
the user wants to “pull” the data. As long as the requested data are still
available in the data store, they can be transferred. The service instance
can take as long as necessary to transfer (because the end of the service
instance provision period is unbounded): slower terrestrial links mean longer
transfer times but they don’t otherwise inhibit the transfer. In my
discussion with Wolfgang Hell on offline SLE several years ago, I came away
with the impression that this would be the preferred approach given
sufficient resources.

How do today’s offline SLE implementations (e.g., Estrack and DSN)
fall on this spectrum? Are the access windows tightly defined,
unbounded, or somewhere in between (e.g., a service instance is scheduled for
a given mission with the same 4-hour provision period every day)?

Of course, the unbounded-provision-period end of the spectrum implies that
transfer service instances are accessible to all offline users all the time,
which has implications for resources. I am under the impression that many (if
not most or all) SLE implementations must be dedicated to a single user at a
time. It seems to me that this doesn’t necessarily have to be the case – it
should be possible to implement offline SLE with pooled resources,
so that N offline service instances could be enabled (ready to bind) without
having an SLE “processor” dedicated to each of them. Does your network’s
implementation dedicate SLE resources on a one-for-one basis, or is there
some degree of resource sharing?

While I’m asking about offline services, let me ask a related question about
complete online delivery mode. When a user (mission) schedules a pass with
complete online return SLE service, how is the end of the service instance
provision period determined? Is it (a) requested by the mission in the
service request (or, for rule-based scheduling, pre-specified in the
scheduling rules), or (b) calculated by the network on a pass-by-pass basis
and configured accordingly?

As you may know, in the Blue-1 version of Service Management, the mechanism
for setting complete (as well as timely) online service instance provision
periods is through start and stop time relative offsets specified in Transfer
Service Profiles. The relative offsets are with respect to the space link
carriers to which they are attached. This allows the service instance
provision periods to “float” in the scheduling process with the flexibilities
applied to the scheduling of the space link carriers. For example, profile
RAF7 could have a start-time offset of 0 seconds, and a stop-time offset of
+300 seconds. If profile RAF7 is applied to a space link carrier profile that
is used to schedule a return space link carrier from 1100 to 1115, the RAF
service instance associated with profile RAF7 will be scheduled from 1100 to
1120 (1115 plus 5 minutes, i.e., 300 seconds).
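The RAF7 arithmetic can be sketched in a few lines of Python. This is only an illustration of the relative-offset idea, not the Blue-1 data model: the function name and the calendar date are assumptions, while the offsets and carrier times are the ones given above.

```python
from datetime import datetime, timedelta


def provision_period(carrier_start, carrier_stop, start_offset_s, stop_offset_s):
    """Apply a profile's relative offsets to a scheduled space link carrier."""
    return (carrier_start + timedelta(seconds=start_offset_s),
            carrier_stop + timedelta(seconds=stop_offset_s))


# Return carrier scheduled from 1100 to 1115 (the date itself is arbitrary).
carrier_start = datetime(2012, 2, 6, 11, 0)
carrier_stop = datetime(2012, 2, 6, 11, 15)

# Profile RAF7: start-time offset 0 s, stop-time offset +300 s.
si_start, si_stop = provision_period(carrier_start, carrier_stop, 0, 300)
print(si_start.strftime("%H%M"), si_stop.strftime("%H%M"))  # 1100 1120
```

Because the offsets are relative, moving the carrier in the schedule moves the provision period with it, which is the "float" behaviour described above.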

The Blue-1 approach may be too simplistic, and as we develop requirements for
the next generation of Service Management, I would like to collect
information on how it is actually done today, and more importantly, how it
might be better done in the future.

Thanks in advance for your help.

Best regards,
John
