[Smwg] RE: [Css-csts] Effects of varying network bandwidth on SLE service instance provision period

John Pietras john.pietras at gst.com
Wed Feb 15 08:26:55 EST 2012


Marcin,
Thanks for the response. It will help us in identifying holes (if any) in our concepts for Next Generation Service Management.

Regards,
John

-----Original Message-----
From: Marcin.Gnat at dlr.de [mailto:Marcin.Gnat at dlr.de] 
Sent: Wednesday, February 15, 2012 8:22 AM
To: John Pietras
Cc: michael.j.stoloff at jpl.nasa.gov; css-csts-bounces at mailman.ccsds.org; smwg at mailman.ccsds.org; css-csts at mailman.ccsds.org; Wolfgang.Hell at esa.int
Subject: RE: [Smwg] RE: [Css-csts] Effects of varying network bandwidth on SLE service instance provision period

Hi John,

Just as an input for you from DLR: in principle we do it very similarly to ESA.

In most cases the online services run in either timely or complete mode (depending on what is required), which does not make a big difference (no backlog).
Up to now we have had only one mission that started with "timely" and quickly noticed frequent drop-outs of data (the terrestrial link did not have enough bandwidth to accommodate the full TM stream (RAF)). The project could not accept that, as they used the online service to feed their archive (from which the scientific data was later recovered). So we switched to "complete" and we really did get a backlog, which built up to 1 minute or so.

We do not use SLE offline services (in principle we have that possibility, but so far nobody uses it). Offline data is transferred by means of "good, old" FTP. Typically the offline data is available to be picked up within a few to several minutes after the pass. The period for which the data remains available varies from station to station and depends on data volume, but it is several days at least. All the "meta-information" about starting sooner/later and stopping the services is communicated via our network operator and voice loops. This means that if a mission uses SLE complete mode and there is a significant backlog, this is communicated to the operator, who will wait for the transfer to finish.

Also, RCF is mostly used in online mode, and in most cases only the housekeeping data is sent over the link (that way we can make do with a pretty low-bandwidth network). All other Virtual Channels (with, for example, scientific data) are part of the offline transfer (which in the case of a 64 kbps link can take some time ;-).

Our SLE user can support 8 service instances simultaneously, and in any case we run a separate instance of the whole SLE user permanently for each mission we support. Thus there is no real reason for strict planning of SLE resources (which might require a "hard" cut of the service session). We also have "permanent" SIs, as Wolfgang says is the case for ESA. We just load and unload them as needed.

Best Regards
Marcin


-----Original Message-----
From: smwg-bounces at mailman.ccsds.org [mailto:smwg-bounces at mailman.ccsds.org] On Behalf Of Wolfgang.Hell at esa.int
Sent: Wednesday, 8 February 2012 13:32
To: John Pietras
Cc: Stoloff, Michael J (318H); css-csts-bounces at mailman.ccsds.org; smwg at mailman.ccsds.org; css-csts at mailman.ccsds.org
Subject: [Smwg] RE: [Css-csts] Effects of varying network bandwidth on SLE service instance provision period

John,

Thank you very much for following up on this. It is always useful to get feedback from a person not directly involved with the system under discussion.

I did not have a chance to look at SCCS-SM with respect to scheduling of SLE online and offline transfer services and I'm afraid that due to other commitments I won't get the chance to do so in the next days. But regardless, please find below some clarifications interspersed with your comments.

Best regards,

Wolfgang


                                                                                                                                              
  From:       "John Pietras" <john.pietras at gst.com>
  To:         <Wolfgang.Hell at esa.int>
  Cc:         <css-csts at mailman.ccsds.org>, <css-csts-bounces at mailman.ccsds.org>, "Stoloff, Michael J (318H)" <michael.j.stoloff at jpl.nasa.gov>, <smwg at mailman.ccsds.org>
  Date:       07/02/2012 21:36
  Subject:    RE: [Css-csts] Effects of varying network bandwidth on SLE service instance provision period





Wolfgang,
Thanks very much for your response. My comments are interleaved below.

I don't know how much you've had a chance to look at how the SCCS-SM Book handles scheduling of online and offline SLE transfer services.

Best regards,
John

-----Original Message-----
From: Wolfgang.Hell at esa.int [mailto:Wolfgang.Hell at esa.int]
Sent: Tuesday, February 07, 2012 5:35 AM
To: John Pietras
Cc: css-csts at mailman.ccsds.org; css-csts-bounces at mailman.ccsds.org; Stoloff, Michael J (318H); smwg at mailman.ccsds.org
Subject: Re: [Css-csts] Effects of varying network bandwidth on SLE service instance provision period

John,

Here is a brief summary of how the current ESA implementation works. As it is the simpler case, let me start with addressing the offline case:

- Any mission to which we have committed to provide telemetry in offline delivery mode as well can activate an offline retrieval in parallel with real-time support, i.e. during a scheduled pass at the given station, because we include the offline service instances in what we refer to as the return spacecraft session on our SLE provider. In other words, a mission that has detected that it missed something in real time can perform the retrieval during the next pass at that station. The risk is that the retrieval might be aborted if the data flow has not completed by the end of the pass. If that risk is obvious from the beginning, the mission can talk to the link operator during the pass and request that the offline SIs not be removed at the end of the pass. In the vast majority of cases the extended accessibility of the offline SIs will be granted. We then expect the mission to let us know when they are done, and at that point the SIs will be removed.

   <JP - Presumably, there are separate offline and online SIs, even though they
   are enabled (have their service instance provision periods) concurrently.

   WH: That is correct.

   When you say "remove the SI", are you talking about just disabling the
   ability to bind to the SI, or is the stored data also deleted then? If the
   SI is simply disabled, how is the stored data eventually removed? E.g.,
   is there an allocated amount of storage dedicated to that mission that is
   discarded FIFO, is there a time limit for storage, or does the mission tell
   network operations (by voice?) when it is okay to purge a period of data
   from the data store?

   WH: We are using 'permanent' SIs, i.e. the service provision period is
   undefined and the actual provision period is determined by the period
   during which the SIs are loaded. The loading (and removing) of the SIs is
   part of the station configuration activities. For the vast majority of our
   missions the configuration is performed by a pass timeline (that we call
   ground station schedule) which is generated by the ESTRACK Management
   System (EMS) and executed automatically by the station M&C system (the
   Station Computer). In general, the SIs are accessible by the client
   shortly after BoA (Begin of Activity) until EoA (End of Activity). We
   refer to the removing of the SIs as 'archiving'. As stated before, the SIs
   are 'permanent' and for the next pass of the given mission they are
   retrieved from the archive and loaded and then accessible by the client
   mission. As discussed in my previous email, there are some cases where the
   SIs are not archived when the EoA event is reached. We do not use the
   'end' reason in the UNBIND at all. The link operator is informed via voice
   loop that the mission data transfer is completed and will then archive the
   SIs of that mission.

   The telemetry storage is managed completely independent of the SIs. In
   essence, a kind of FIFO is implemented by means of a storage management
   script. When support is committed to a mission, we make sure that the data
   volume and retention period required by a given mission can be met in the
   light of the total storage requirements.

   Going back to a recent email exchange, when you say that you expect the
   mission to let you know when they are done so that the SIs can be removed,
   it seems like this could be done by UNBINDing with 'end'. Is that the way
   it is done, or is it by voice?

   WH: As stated above, by voice.

   To the level of detail provided in your description, it sounds like the
   mechanism in SCCS-SM Blue-1 is sufficient to support this mode of
   operation. Off the top of my head I don't know if SCCS-SM allows (or
   explicitly disallows) a user extending an offline SI while it is executing,
   but that is something that should be easy to allow (as always, it may be
   that the request to extend may be rejected because the SLE provider
   equipment has already been allocated elsewhere for the requested extended
   period).
   />

  WH: I concur.

- In case missing data are urgently needed or a pass with the station where the missing data have been acquired is not scheduled for quite some time, the offline retrieval can also be requested via voice loop outside a pass of that mission. The link operator will then load the SIs manually and the mission can perform the offline retrieval in parallel with real time support of another mission. Again we expect the mission to tell us when they are done so that the SIs can be removed again.

- It would also be possible to schedule offline return sessions the same way as real-time passes are scheduled. However, at least for the missions we are currently supporting, offline retrieval is the exception rather than the rule, used to recover from anomalies encountered during the real-time support. Therefore this possibility is currently not used in practice.

   <JP - This is apparently a difference between Estrack and the NASA Space
   Network, where for some missions offline playback is the only mechanism
   used for some of the data (e.g., science data).
   />

   WH: Your observation is correct. There is a further perhaps even more
   important difference. In the DSN case, all return services are provided by
   a central facility which also encompasses a central telemetry storage. As
   a consequence, the user does not need to care at which antenna or complex
   the data of interest had been acquired. In the ESA case, the SLE
   providers including their telemetry storage for offline delivery mode are
   associated with a given antenna. Therefore in case of missing data the
   user needs to consider at which station the data had been acquired and
   perform the offline retrieval from the corresponding provider(s).

Let me address now the online complete delivery case.

- For most missions using the online complete delivery mode, this is done only as a precaution. As long as things are working nominally, the rate at which the mission control system reads the telemetry is at least as high as the space link rate and therefore no backlog builds up on the provider side.

- Even if a small backlog builds up, in general the post-pass or tear-down time of the station will be long enough to transfer this backlog before the SIs get removed. If the transfer is still ongoing, the process that would normally remove the SIs runs into an error condition and the operator is notified. At that point the operator will in general check with the mission and permit the continuation of the data flow while the station is being configured for the next mission. If the operator cannot talk to the mission (e.g. unmanned operations), he will abort the SLE services and the mission will have to retrieve the data later in offline mode.

   <JP - The rules for behavior in SCCS-SM B-1 don't support the capability
   to extend the service instance provision period of an online SI,
   but it is something that could be accommodated via the reconfiguration
   mechanism if the network is able to support that reconfiguration (as Estrack
   apparently is). However, the reconfiguration approach would be quite awkward
   (for reasons I won't go into here) and so if we want to support "on the fly"
   extension of the SI provision period then SMWG should address the issue
   directly.
   />

   WH: The feature of extending an online provision period on the fly is needed only in exceptional cases. I would, however, regard the capability to schedule support such that provision periods of different missions can overlap as very important (my apologies for my ignorance of what SM supports in this respect).

- We have a few cases where the space link rate is higher than what the mission control system can digest and where, as a consequence, a backlog always builds up. For those missions the pass timelines are made such that the SIs are not removed at the end of the pass and therefore the mission can continue with the data flow. Once again we expect the mission to let us know when they are done so that the SIs can be removed.
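
To make the backlog cases above concrete, here is a minimal Python sketch of how much backlog accumulates during a pass and how long it takes to drain afterwards. The function names and all numeric rates are hypothetical illustrations, not ESA figures, and the sketch assumes constant rates and no protocol overhead.

```python
def backlog_bits(space_link_bps, read_bps, pass_s):
    """Backlog on the provider side at end of pass (constant rates assumed)."""
    # A backlog only builds up when the space link outpaces the reader.
    return max(0, space_link_bps - read_bps) * pass_s


def drain_time_s(backlog, read_bps):
    """Time to transfer the remaining backlog once the space link has stopped."""
    return backlog / read_bps


# Hypothetical numbers: 1 Mbps space link, 800 kbps reader, 10-minute pass.
backlog = backlog_bits(1_000_000, 800_000, 600)
extra_s = drain_time_s(backlog, 800_000)
print(backlog, extra_s)  # 120000000 150.0
```

If the drain time exceeds the post-pass tear-down time, the SIs would have to stay loaded beyond the end of the pass, which is the situation the pass timelines for these missions are built to handle.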

The current SLE provider implementation supports up to four concurrent return sessions, i.e. up to four spacecraft can be serviced concurrently. This limit dictates that at some point the SIs have to be removed so that there is room for the next mission to be supported. The prime driver for this limit is that we want to monitor individual service instances. If even more concurrent return sessions were allowed, the SI monitoring would become unmanageable from a human user perspective.

In terms of bandwidth management the offline retrieval is not regarded as critical. All such transfers fall into the so-called default class, which gets the bandwidth not consumed by other, higher-priority traffic. What we observe for our network is that in general the mission control system, rather than the comms line, is limiting the throughput. The online complete traffic uses a dedicated higher-priority throughput class, where the scheduling is done such that we support concurrency of missions only up to the point where the sum of the concurrent data streams fits into the guaranteed bandwidth allocated to this class.
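
The scheduling rule for the priority class amounts to a simple admission check. The following Python sketch illustrates it; the function name and all rates are hypothetical and do not describe the actual ESTRACK scheduling software.

```python
def can_admit(active_rates_bps, new_rate_bps, guaranteed_bps):
    """True if the new stream still fits into the guaranteed class bandwidth."""
    return sum(active_rates_bps) + new_rate_bps <= guaranteed_bps


# Hypothetical figures: 2 Mbps guaranteed for the online complete class,
# two missions already scheduled at 512 kbps and 1 Mbps.
active = [512_000, 1_000_000]
print(can_admit(active, 256_000, 2_000_000))    # True
print(can_admit(active, 1_000_000, 2_000_000))  # False
```

Offline retrievals would not pass through such a check at all, since they only use whatever bandwidth the higher-priority classes leave unused.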

   <JP - in looking ahead to CSTS services, where complete online and offline are
   combined into the single complete service delivery mode, what changes (if any)
   do you see in operational approach? Will, for example, the bandwidth allocation
   system have to distinguish between the complete service instances that are
   actually carrying "playback" data (and thus relegate them to the default class)
   and the complete service instances that really are online and thus need the
   priority allocation?
   />

   WH: That is hard to say as it will depend on how cost for ground communication bandwidth will evolve and to which extent agencies can afford having ample bandwidth lifting the need to carefully manage that bandwidth.
Besides, even when combining online complete and offline to one delivery mode, there is nothing that prevents us from having one SI assigned to a port providing for access to a priority comms class and another using the default class.

In conclusion, what we have today works and satisfies the missions' needs.
What is not really satisfactory is the dependency on voice loop interactions.
Improvements in that area are certainly conceivable. For instance, SIs could be removed when for a TBD time no telemetry flow was observed. Likewise, one could build into the system a mechanism that automatically loads the required offline SI when a BIND attempt to such SI is being made and the required resources are available. But for now we do not have concrete plans in that regard.
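
As a rough illustration of the two automation ideas above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the reading of "required resources" as a free return-session slot, and in particular the timeout value, which remains TBD.

```python
def should_archive(last_tm_s, now_s, idle_timeout_s):
    """True once no telemetry has flowed for at least idle_timeout_s seconds."""
    return (now_s - last_tm_s) >= idle_timeout_s


def may_autoload(loaded_sessions, max_sessions):
    """True if a BIND to an unloaded offline SI could trigger loading it.

    Assumes the only required resource is a free return-session slot.
    """
    return loaded_sessions < max_sessions


IDLE_TIMEOUT_S = 600  # placeholder only; the actual value is TBD

print(should_archive(last_tm_s=0, now_s=599, idle_timeout_s=IDLE_TIMEOUT_S))  # False
print(should_archive(last_tm_s=0, now_s=600, idle_timeout_s=IDLE_TIMEOUT_S))  # True
print(may_autoload(loaded_sessions=3, max_sessions=4))  # True
print(may_autoload(loaded_sessions=4, max_sessions=4))  # False
```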


Please let me know if you have any further questions on this matter.

Best regards,

Wolfgang


  From:       "John Pietras" <john.pietras at gst.com>
  To:         <Wolfgang.Hell at esa.int>, "Stoloff, Michael J (318H)" <michael.j.stoloff at jpl.nasa.gov>, <css-csts at mailman.ccsds.org>, <smwg at mailman.ccsds.org>
  Date:       06/02/2012 19:58
  Subject:    [Css-csts] Effects of varying network bandwidth on SLE service instance provision period
  Sent by:    css-csts-bounces at mailman.ccsds.org






CSTSWG and SMWG colleagues ---
Recently, the NASA Space Network Ground Segment Sustainment (SGSS) Project has been considering the need to schedule terrestrial network bandwidth for playback of telemetry data. In the legacy SN system, they both schedule the playback process itself and reserve the network bandwidth. The question was raised about the effects of knowing or not knowing the available terrestrial bandwidth on providing SLE offline services. In a note to SGSS Project personnel, I pointed out that there is no single definitive answer, but that it depends on (among other things) how service instance provision periods are “scheduled”. For illustrative purposes, I identified two ends of a spectrum.

At one end, the network can configure offline service instances with relatively narrow service instance provision periods; e.g., a specific 45-minute period could be scheduled. If the terrestrial bandwidth that is available during that window is sufficient to transfer the volume of data requested by the user in the START operation, then everything is fine, but if the bandwidth is not sufficient, the service instance will peer-abort when the end of the period arrives before all requested data have been delivered. The legacy Space Network schedules data playback in a mode that is toward this end of the spectrum.
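
For the narrow-window end of the spectrum, the question reduces to whether the requested volume can be moved within the provision period at the available rate. A minimal Python sketch follows; the function names, data volume and bandwidths are hypothetical, and protocol overhead is ignored.

```python
def transfer_time_s(volume_bytes, bandwidth_bps):
    """Idealised transfer time, ignoring protocol overhead and retransmissions."""
    return volume_bytes * 8 / bandwidth_bps


def fits_in_window(volume_bytes, bandwidth_bps, window_s):
    """True if the whole requested volume can be delivered before the window ends."""
    return transfer_time_s(volume_bytes, bandwidth_bps) <= window_s


WINDOW_S = 45 * 60        # the 45-minute provision period from the example
VOLUME = 500 * 10**6      # hypothetical 500 MB playback request

print(fits_in_window(VOLUME, 2_000_000, WINDOW_S))  # True  (2000 s needed)
print(fits_in_window(VOLUME, 1_000_000, WINDOW_S))  # False (4000 s needed)
```

In the second case the service instance would peer-abort at the end of the window with part of the data undelivered.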

At the other end of the spectrum, the service instance provision period is essentially boundless. The service instance is enabled for binding whenever the user wants to “pull” the data. As long as the requested data are still available in the data store, they can be transferred. The service instance can take as long as necessary to transfer (because the end of the service instance provision period is unbounded): slower terrestrial links mean longer transfer times but they don’t otherwise inhibit the transfer. In my discussion with Wolfgang Hell on offline SLE several years ago, I came away with the impression that this would be the preferred approach given sufficient resources.

Where do today’s offline SLE implementations (e.g., Estrack and DSN) currently fall on this spectrum? Are the access windows tightly defined, unbounded, or somewhere in between (e.g., a service instance is scheduled for a given mission with the same 4-hour provision period every day)?

Of course, the unbounded-provision-period end of the spectrum implies that transfer service instances are accessible to all offline users all the time, which has implications for resources. I am under the impression that many (if not most or all) SLE implementations must be dedicated to a single user at a time. It seems to me that this doesn’t necessarily have to be the case – it should be possible to implement offline SLE with pooled resources, so that N offline service instances could be enabled (ready to bind) without having an SLE “processor” dedicated to each of them. Does your network’s implementation dedicate SLE resources on a one-for-one basis, or is there some degree of resource sharing?

While I’m asking about offline services, let me ask a related question about complete online delivery mode. When a user (mission) schedules a pass with complete online return SLE service, how is the end of the service instance provision period determined? Is it (a) requested by the mission in the service request (or, for rule-based scheduling, pre-specified in the scheduling rules), or (b) calculated by the network on a pass-by-pass basis and configured accordingly?

As you may know, in the Blue-1 version of Service Management, the mechanism for setting complete (as well as timely) online service instance provision periods is through start and stop time relative offsets specified in Transfer Service Profiles. The relative offsets are with respect to the space link carriers to which they are attached. This allows the service instance provision periods to “float” in the scheduling process with the flexibilities applied to the scheduling of the space link carriers. For example, profile RAF7 could have a start-time offset of 0 seconds and a stop-time offset of +300 seconds. If profile RAF7 is applied to a space link carrier profile that is used to schedule a return space link carrier from 1100 to 1115, the RAF service instance associated with profile RAF7 will be scheduled from 1100 to 1120 (1115 plus 5 minutes, i.e. 300 seconds).
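
The RAF7 arithmetic can be written out as a short Python sketch. The function and parameter names are hypothetical and not taken from the SCCS-SM specification, and the calendar date is arbitrary; the sketch assumes the start offset is applied to the carrier start time and the stop offset to the carrier stop time.

```python
from datetime import datetime, timedelta


def provision_period(carrier_start, carrier_stop, start_offset_s, stop_offset_s):
    """Return (start, stop) of the service instance provision period.

    Offsets are in seconds, relative to the carrier start and stop times.
    """
    start = carrier_start + timedelta(seconds=start_offset_s)
    stop = carrier_stop + timedelta(seconds=stop_offset_s)
    return start, stop


# Profile RAF7: start offset 0 s, stop offset +300 s; carrier from 1100 to 1115.
carrier_start = datetime(2012, 2, 6, 11, 0)
carrier_stop = datetime(2012, 2, 6, 11, 15)
si_start, si_stop = provision_period(carrier_start, carrier_stop, 0, 300)
print(si_start.strftime("%H%M"), si_stop.strftime("%H%M"))  # 1100 1120
```

Because the offsets are relative, moving the carrier in the schedule moves the provision period with it, which is the "float" behaviour described above.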

The Blue-1 approach may be too simplistic, and as we develop requirements for the next generation of Service Management, I would like to collect information on how it is actually done today, and more importantly, how it might be better done in the future.



Thanks in advance for your help.

Best regards,
John






