[Moims-dai] Re: Minutes of today telecon (4th September 2014)

Mike Martin tahoe_mike at sbcglobal.net
Fri Sep 5 00:19:34 UTC 2014


Hi all

I would appreciate any comments you have on my PAIS wrap-up document.  I 
would like to send at least parts of it to NASA and CCSDS managers so 
they understand why NSSDC and PDS are rejecting PAIS even though it is 
based on earlier NSSDC work.

Thanks, Mike

On 9/4/14, 3:25 PM, Boucon Daniele wrote:
> Dear all,
> Please find below the minutes of today's telecon (4th September), _with
> new actions_.
> Do not hesitate to correct or complete them.
> _Main topics discussed_: ICP/LTDP terminology and ISEE test case.
> _Not discussed_ (next telecon):
>
>  1. METS test case: first comments, organization (email from Daniele 13th
>     August)
>  2. DAI registry
>     http://www.sanaregistry.org/r/daixml/daixml.html (link towards
>     CCSDS publication page)
>  3. PAIS and ISO (20104)
>  4. Next meeting preparation (topics, availability, constraints for
>     agenda)
>  5. Review of actions (email from John 10/08)
>
> _New topic to be discussed:_
>
>  6. Schedule of CWE projects
>
>
> _Next telecons_:
> * Monday 22nd September (telecon with LTDP on terminology)
>   (TBC: 9h US time, 15h EU time)
> * Tuesday 23rd September (9h US time, 15h EU time)
> * Thursday 9th October (9h US time, 15h EU time)
> * Thursday 30 October (9h US time, 15h EU time)
> _If available, I propose to use Dave's number:_
> Dave Williams
> Telecon information
> +1-877-954-3555      USA Toll free
> +1-517-224-3191      Others
> Passcode:    8506950
> _New number, if the older one is no longer available (be careful, new
> number):_
> Dave Williams
> Telecon information
> +1-720-259-6462      USA Toll free
> +1-844-467-6272      Others
> Passcode: 841727
> Best regards,
> Daniele
> ___________________________________________________________________________________________________________
> *4 September minutes*
> DB: Daniele Boucon
> DG: David Giaretta
> DS: Don Sawyer
> JG: John Garrett
> MM: Mike Martin
> SM: Stephane Mbaye
> WG: all
> *D=Decision, A=action (other = discussion)*
>
>  1. *Discussion on LTDP/CCSDS terminology (see emails DG 3rd September,
>     JG 4th September).*
>
> LTDP Curation/stewardship and Preservation: curation is not defined in
> the CCSDS terminology.
> LTDP Data valorization: the WG has not heard this term before in this
> context. WG prefers the concept of “adding value”, “data enhancement” …
> DS: proposes to use the dictionary definition for curation.
> Also look at the DCC glossary:
> http://www.alliancepermanentaccess.org/index.php/consultancy/dpglossary
> In a generic process (ICP), the term “information” should be used
> instead of “Data”.
> Strategy going forward (and to prepare the 22nd September meeting with LTDP):
>
>  1. Identify the new terms used by LTDP that should be added to the
>     CCSDS terminology
>
>   * Curation
>   * Adding value
>   * Data records
>   * Preserved data set content
>
>  2. For the other terms, map each term to the existing OAIS
>     definitions, using the existing LTDP terminology as input
>
> *=> A-(JG+DB):* John extracts the terminology from the ICP document into
> a separate document and adds the new terms, to be defined in a generic
> way, with a first proposal. Exchanges with Daniele. Document to be sent to
> the group around the 16th for comments by the WG.
> *=> A-(MM):* analyses the Architecture Space Information Management
> terminology.
>
>  2. *ISEE test case (see emails DS 3rd September).*
>
> DS: tabular form? indentation should be improved.
> DS: proposes to have a separate section dealing with software:
> prototypes/software developed as a basis to support modeling and
> generation of SIPs.
> *Next meeting preparation*
> => A-(DB): initiates an agenda for the next CCSDS fall meeting.
> *Review of actions*
> DB and JG review the list and actions for update (will be sent by John).
> *End of 4 September minutes*
> ___________________________________________________________________________________________________________
> *Part of historical minutes, included as a reminder*
> ___________________________________________________________________________________________________________
>   2. *_GREEN BOOK_*
> *2.1 _TOC for test cases_*
> * on the content of the sections,
> * on the title of the sections
> * on the following question: is it better to describe the SIP
>   constraints with the MOT (in section 6.1.3), or in a global section on
>   SIPs (section 6.1.4)?
> Discussion:
> Brief introduction about the sub-TOC by Stephane.
> John and Don: /6.1.4.1 SIP constraints, included in 6.1.3/
> *Current Structure:*
> *6 Use Cases*
> 6.1 /NAME/ – /TITLE/
> Description
> 6.1.1 /TEST_CASE_NAME/ DATA SET
> 6.1.2 /TEST_CASE_NAME/ MOT AND SIP CONSTRAINTS
> 6.1.4 /TEST_CASE_NAME/ SIPS
> *Decision on the Structure:*
> *6 Use Cases*
> 6.1 /NAME/ – /TITLE/
> 6.1.1 Context and Benefits
> /Contains description + what this test case shows;
> gives an explanation at the beginning of each test case of
> the specificity of the test case, and how the method applies/
> 6.1.2 Objects to Be Transferred
> /Same as TEST_CASE_NAME DATA SET: contains the description of all the
> information (data, documents, …) that has to be transferred, and how
> this information is organized on the producer side/
> 6.1.3 Model of Objects for Transfer and SIP Constraints
> /Contains the description of the MOT and the SIP types and sequencing
> constraints/
> 6.1.4 SIPs
> /Contains the description of SIP implementation and transfer/
> /Contains an example of SIP manifest./
> ___________________________________________________________________________________________________________
> *_2. GREEN BOOK_*
> *2.2 _Test cases review_*
> *_2.2.3 ESA-SAFE test case_*
> The 2 models have been shown during the 19-20 March LTDP meeting.
> Exchanges between CNES, Gael and ESA on data sets.
> First version of the green book part -> to be first reviewed by Daniele
> and Stephane, then sent to the group.
> ==> _Action Daniele (with Stephane)_: review the ESA SAFE test case,
> then send it to the group.
> LTDP actions (from 7th November meeting) -> *_closed_* during the 19-20
> March meeting
> 25.3 	CNES/GAEL 	LTDP_WG#26 	Model the SIP Data Objects at the level of
> SAFE Metadata and Data Object in order to allow the transfer of
> sub-parts of SAFE products in different SIPs
> 25.4 	ESA/PS 	End November 2013 	Gather any useful documents to allow
> the prototype development/tailoring and provide the EO-SIP of ERS for Test.
>
> *_2.3 ISEE1-2  test case_*
> Test case nearly finished.
> Mail sent by Don 5th May "Partial update for Section 6.1": material for
> "ISEE - a typical use case"
> MM: the history of the test case is not useful in the Green Book (was ok
> in the yellow book). Needs to show clear test cases -> highlight what
> the test case brings.
> DB: this test case shows the main features of the PAIS. It could be of
> interest to show the differences between how the data are organized on
> the Producer side, how the data are modeled in the MOT (intermediary
> repositories may not be modeled with groups), and how they are gathered
> in SIPs.
> SM: the attached files (descriptors …) could be put at the end of each
> specific test case part, and then at the end we'll see how to organize
> the whole document gathering the different test cases.
> *_2.4 COROT  test case_*
> Daniele explains that CNES is pushing to get the CoRoT L0 data use case
> (from Daniele and Stephane) – for information this has to be finished by
> 13 May to get a chance to be used for L1 data.
> *_2.5  METS  test case_*
> Daniele had a meeting (9 April) with BnF that could help in building a METS
> implementation of SIPs (documents).
> Daniele introduces BnF's need to transfer references to
> objects instead of the target objects themselves.
> Stephane: this is ok for XFDU
> => _Action Stephane_:   Provide example of XFDU with referenced data
> objects (remote URL or URI)
> Don finds this a priori relevant
> Decision: Nothing to do on that topic until further inputs from Bn
> *_2. PDS4  test case_*
> Mail sent by Mike (5th May): 2 figures and text for section 6.3
> "Planetary Data System - a non XFDU implementation"
> Discussion on figure "PDS4 Bundle and Collection linkages": this is the
> current PDS4 Bundle description.
> Data are localized via URN.
> MM: this figure is an overview of what the PDS4 bundle looks like.  Same
> for every PDS archive. The test data sent before conform to this generic
> figure.
> Figure "PDS4 SIP Organization"
> MM: for NSSDC, show that Manifest is better than bundle level.
> Mike has sent a complete email to Daniele concerning the CNES PAIS
> software (issues, context and associated files).
> For the moment, Daniele is looking for a Mac to replay the
> scenario in order to be able to understand the problems.
> Mike will try it on a PC.
> => _Action Daniele_: deliver the CNES proto software to John (ASAP)
> There was a brief discussion of Mike’s e-mail (4th December) addressing
> the PDS4 Bundle organization and Don’s e-mail response (19th December)
> with possible modeling options.  Mike agreed that a hierarchy is a
> requirement for a bundle.  Don gave two options.  The first is a single
> Collection Descriptor holding a single Transfer Object, where the
> ‘group’ capability of the TO is used to completely model the bundle with
> the option of cutting off the modeling at any point by use of the
> ‘undescribed’ attribute.  This approach would be consistent with past
> PDS/NSSDC practice of transferring a PDS volume to NSSDC with the only
> requirement on NSSDC being the ability to return the volume unaltered.
> In other words, PDS did not expect NSSDC to look into the volume and
> provide any services, such as replacement, at a lower level of
> granularity.  The second option would be to model the bundle as composed
> of multiple Transfer Objects under multiple Collection Descriptors. For
> example, there would be a set of Collection Descriptors corresponding to
> the PDS4 standard named collections, and under each of these there would
> be Transfer Objects and possibly more Collection Descriptors.   There
> are a lot of possibilities for how the resultant model might look.  The
> advantage of modeling a bundle as consisting of multiple TOs is that it
> facilitates replacement at the TO level, rather than the whole bundle, if
> that becomes a requirement.  Regardless, it should not be too difficult
> to have PDS4/NSSDC specific software that would start from a basic
> bundle model and then create a complete MOT by examining an existing
> bundle.  It could be integrated with a SIP builder, but it would
> probably be preferable to have the MOT exchange with NSSDC prior to any
> SIP exchange to reduce confusion and ensure both parties were ‘on the
> same page’.
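
A minimal sketch of the first option described above, assuming a bundle is simply a directory tree on disk. The function name `model_group` and the dictionary keys are hypothetical illustrations, not PAIS or PDS4 definitions. Each directory becomes a 'group', and modeling is cut off below a chosen depth by marking the group as undescribed:

```python
from pathlib import Path


def model_group(directory: Path, max_depth: int, depth: int = 0) -> dict:
    """Model one directory as a 'group' (all key names are hypothetical)."""
    group = {"group": directory.name, "undescribed": False, "files": [], "groups": []}
    if depth >= max_depth:
        # Cut-off point: the content below is transferred but not described.
        group["undescribed"] = True
        return group
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            group["groups"].append(model_group(entry, max_depth, depth + 1))
        else:
            group["files"].append(entry.name)
    return group
```

For example, `model_group(Path("bundle_root"), max_depth=1)` would list the files at the top of a (hypothetical) `bundle_root` directory and mark each sub-directory as undescribed, while a larger `max_depth` would describe the tree further down.
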
> Don agreed to look at Mike's example Descriptors to see how they might
> fit into the above options.
> ==> _action Mike_: send samples of data
> What could really help in the use of the PAIS by the PDS, is a proposal
> of implementation without use of XFDU. To be done.
> Suggestion from John: use XFDU, and then an XSLT style sheet to
> translate into another language (to be confirmed, not sure my
> understanding was correct).
> A PDS product = a Transfer Object.
> A bundle = PDS tree data set. The size of the bundle is 1.7 GB, but the
> files inside are quite small (around 250 kB).
> ==> _Action Stephane_: propose a SIP implementation without XFDU
> ==> _action all_: send comments on Mike's information on PDS4 sample
> (email 30th October)
> 22 April telecon: From file : met_abstract_sip_manifest.xml
> MM: The file is a _flat_ list of PAIS elements from SIP Model (PAIS BB
> §6) and some elements deriving from XMAN (PDS4).
> SM: sent a tc5-pds4-20140401.pdf providing an example of Non-XFDU SIP
> implementation also reusing the SIP Model elements from PAIS BB §6, but
> with a structure of nested elements.
> DS: recalls that nesting is not mandatory with respect to the PAIS BB §5
> abstract SIP model. The structure of the manifest may depend on the
> context and on what the Producer and the Archive actually want to convey.
> SM: agrees that the provided example(s) are not implementing the PAIS BB
> §5 from scratch but are all reusing the SIP Model elements built for
> XFDU but getting rid of the XFDU elements. This may be confusing.
> SM: the elements of the SIP Model built for XFDU require a nested
> structure, at least to identify the Group occurrence to which a Data
> Object belongs, or the Transfer Object occurrence to which a Group
> instance belongs. Alternate elements could have allowed a flat
> structure, as for XMAN.
> WG: The WG agrees that this in-between implementation may be confusing
> in the GB and a fully independent implementation should be preferred.
> WG: agrees that a table, a checklist or any other means that could help
> a user validate/track the level of implementation could be helpful
> in the GB, e.g. is the Group ID identified, is the SIP ID provided, other
> IDs, MIME types, etc.
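
One way such a checklist might be automated, as a sketch only. The attribute and element names (`sipID`, `group`, `dataObject`, `mimeType`) are assumptions made for the example and do not come from the PAIS schemas:

```python
import xml.etree.ElementTree as ET


def implementation_checklist(manifest_xml: str) -> dict:
    """Return a yes/no answer per checklist item (names are hypothetical)."""
    root = ET.fromstring(manifest_xml)
    groups = root.findall(".//group")
    data_objects = root.findall(".//dataObject")
    return {
        "SIP ID provided": bool(root.get("sipID")),
        # all() over an empty list is True: no groups means nothing is missing.
        "Group ID identified": all(bool(g.get("id")) for g in groups),
        "MIME types given": all(bool(d.get("mimeType")) for d in data_objects),
    }
```

Run against a manifest, this yields a small table of items and True/False values that could be printed next to each test case.
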
> __________________________________________________________________________________________________________
>   2. *_GREEN BOOK_*
> The Green Book has been split into several files, one for the core, and
> one per test case.
> => _Action Stephane_: provide a description of the document breakdown
> and links to the shared repository
> *2.3 **_Core document_*
> Section 4.1.1 reviewed (ok)
> Tabular representations are welcome but a question remains about their
> systematic use (need an XML version in an annex?)
> CCSD0014: equivalent to TO Descriptor, and CCSD0015: equivalent to
> Collection Descriptor
> ==> _Action Stephane_ from MM: explain CCSD0015 further: how it is
> registered, and give a reference to where more information can be found
> on that subject
> Don:  descriptorModelID has to be changed on any XSD change
> (specialization), the Archive has to maintain the versions
> Section 4.6.1 about PAIS XSD description should remain here for the time
> being
> John: The SANA registry is supposed to reference the (latest?) PAIS XSD
> only (TBC). It is however certain that it should not contain any other
> resource, e.g. software, XML examples, etc.
> The « open » enumeration technique is cumbersome (TBC)
> ==> _Action (All):_ table in section 4.6.4 should be reviewed for
> (during) next telecon
> Comment from previous telecons on method
> * (Daniele): there are steps that conform to the PAIMAS process (first
> model, SIP constraints, then transfer and validation).
> * Link between the data base on the Archive side, and the PAIS XML
> elements: example on how to match both (core document, test case?).
> The following paragraph will be suppressed if ok:
> *_XML namespace for PAIS _*
> John suggested that we pass the proposal on namespaces by SANA and
> Nestor and Peter as XML Co-chairs to make sure they agree.
> ==> _action Stephane (20130821)_: send proposal on namespace to
> SANA, Nestor and Peter, and ask if they have any objection to our proposal.
> Use previous email sent by John to introduce Stephane in the group.
> It is agreed that not every position where a pais:any element is
> possible has to be documented in the GB. Only a few examples are
> necessary, or even one.
> A more concrete example should be provided than the abstract
> « foo »/« bar » currently proposed. A typical example would be a
> Collection holding a pais:any with the author of the descriptor, the
> Collection name/ID in the Producer semantics, or anything else that could
> be specific to the Producer or the Archive side but not provided in the
> PAIS definitions.
> Due to time constraints, the WG jumped to section 4.6.4 of the draft GB.
> SM: Recalled that « true » XML Schema restrictions guarantee that
> the original PAIS XML Schema's rules are still applicable. Any instance
> that conforms to a restriction also conforms to the original schema.
> SM: The use of restrictions does not oblige any system to use the
> derived PAIS schemas. Therefore, the restricted schemas do not have to be
> shared with users of the produced SIPs. The project specific schemas
> could even be discarded without losing control of the produced SIPs.
> => D – (DS) The rightmost column of the table in §4.6.4 shall be renamed
> « Restriction » instead of « Content »
> MM: It is not clear whether this table should be kept in that form or
> discarded in the end, but the aim should be not to spend too many pages on
> that topic.
> WG: restrictions may be interesting for implementers and as such should
> be documented, but it is not clear if this should be proposed as a
> recommendation or a best practice.
> SM: recalled that restricting elements such as maxOccurrence
> should be a recommended practice, since it can be very difficult to
> implement interoperable software exchanging elements of
> xs:nonNegativeInteger type.
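
A small sketch of why such a restriction helps, modeling value spaces as plain predicates rather than real schemas. The ceiling `MAX_OCCURRENCE_LIMIT` is a hypothetical project-specific choice; the minutes do not name a value. The value space of xs:nonNegativeInteger has no upper bound, whereas many implementations store counts in fixed-width integers:

```python
# Hypothetical project-specific ceiling (largest signed 32-bit integer).
MAX_OCCURRENCE_LIMIT = 2**31 - 1


def valid_original(value: int) -> bool:
    """Value space of xs:nonNegativeInteger: any integer >= 0, unbounded."""
    return value >= 0


def valid_restricted(value: int, limit: int = MAX_OCCURRENCE_LIMIT) -> bool:
    """A 'true' restriction only narrows the original value space."""
    return valid_original(value) and value <= limit


samples = [-1, 0, 1, MAX_OCCURRENCE_LIMIT, MAX_OCCURRENCE_LIMIT + 1, 10**30]
# Everything the restriction accepts is also accepted by the original type.
assert all(valid_original(v) for v in samples if valid_restricted(v))
# The reverse does not hold: the original admits values that a
# fixed-width integer cannot store.
assert valid_original(10**30) and not valid_restricted(10**30)
```

The first assertion mirrors the point that an instance following a restriction still follows the original rules; the second shows the kind of value that a restriction would rule out before it reaches software unable to represent it.
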
> SM: proposed to add prepared templates of restricting XML Schemas in an
> annex, something that could help implementers quickly set up the
> restrictions they need.
> WG: adding XML Schemas in an annex may not be so helpful because cutting
> and pasting from PDF may be very cumbersome.
> => D – (JG) these XSDs shall be placed alongside the originals.
> SM: The problem is similar for other GB resources, such as the use case
> descriptors or the software prototypes.
> ___________________________________________________________________________________________________________
> *_Preservation process _*
> DB: LOTAR representative is ok with "preservation", but not with
> "curation" (this word is not used in the community).
> MM: links should be made between MTDP and warehousing. Not clear with
> the structure of the process for the moment.
> Discussion on terminology: preservation vs curation.
> LTDP definitions:
>
>   * *Preservation:* aims at the generation of a single, consistent,
>     consolidated and validated */“EO Missions/Sensors Dataset” /*and at
>     ensuring its long term integrity, discovery, accessibility and
>     usability. It is focused on an individual Mission/Sensor or on a
>     multi-mission Dataset (when one Master Dataset is made up of data
>     coming from different missions/sensors) and tailored according to
>     its specific preservation/curation requirements. It consists of all
>     activities needed to ensure “EO Missions/Sensors Dataset” bit
>     integrity over time and to optimize (in terms of format and
>     coverage) its (re)use in the long term (e.g. through metadata and
>     catalogue improvement, algorithms evolutions and related
>     (re)processing, linking and improvement of context/provenance
>     information).
>   * *Curation: *aims at establishing and increasing the value of */“EO
>     Missions/Sensors Datasets” /*over their lifecycle, at favouring
>     their exploitation through the combination with other Datasets and
>     at extending their user base. It includes the activities for the
>     definition of the preservation objectives, for the coordination and
>     management of Data Time Series and Collections (e.g. from similar
>     sensor family) in support to specific applications. It includes
>     international cooperation activities
>
> OAIS definitions:
> *Long Term Preservation*: The act of maintaining information,
> Independently Understandable by a Designated Community, and with
> evidence supporting its Authenticity,
> over the Long Term.
> *Authenticity*: The degree to which a person (or system) regards an
> object as what it is purported to be. Authenticity is judged on the
> basis of evidence.
> Authenticity: in the sense of "original". This is not crucial in the
> domain of scientific data, but is an issue. Integrity could be a way to
> prove authenticity.
> We note that the LTDP definitions are neither clear nor completely coherent.
> There is a mixture of both preservation/curation concepts in both
> definitions.
> Furthermore, the group thinks that the term "curation" used in the LTDP
> definition does not fit the usual usage; another term should be used.
> The sense is closer to knowledge management.
> Daniele will send tomorrow a summary of her comments.
> => _Action Daniele_: ask David for preservation/curation definition.
> => _Action all_: give comments and proposals as input for the LTDP
> terminology and steps of workflow. -> *_for Monday at the latest._*
> _Previous discussion:_
> To be done: analyse and suggest, at different stages of the project
> (during data lifecycle), what should be done from an archiving point of view.
> Example of main issue: keep documentation up to date (when changes in
> formats, processing, … are made on the data).
> The LTDP has written a document "PDSC" (Preservation Data Set Content),
> explaining what kind of information (data, software, documents, …)
> should be collected and at what step of the project.
> The subject is wide, Mike asks to focus on specific parts.
> Daniele explains her point of view: develop a "model" (magenta book)
> gathering all the basic components of the preservation activity
> (selection and appraisal, data and metadata preparation, access,
> maintenance …) that should cover all the data lifecycle (even when data
> don't exist), in order to be able to answer the following question: what
> should be done from the beginning till the end to be able to preserve
> data, and at what moment? This should be done in a generic way, making
> links on standards related to basic components when they exist.
> Suggestion to ask Barbara Sierman about existing work concerning this
> topic. Mail sent on 7/02 by Daniele.
> ==> _Action Daniele_: write and send more precise elements on this
> process to the group.
> ==> _Action all_: send comments.
> Nestor will submit the new project for approval once the PAIS BB has been
> published.
> Daniele underlines the need for CNES and LTDP group. CCSDS expertise
> will be very important.
> The PDS has its internal process; for other agencies (particularly the
> LTDP member agencies) this process doesn't exist and is required.
> Question on how to make the link between a global process and the PAIMAS
> phases.
> 20131030 Don's email and following discussions:
> Don: The process could be focused on the Archive point of view, and
> seen as an internal OAIS issue for workflow, then using more OAIS concepts.
> The "provenance" is in practice a big issue, going back to the original
> information.
> "Reprocessing, curation, stewardship" could be mapped to OAIS
> migrations, new versions, …
> Update procedures exist at NSSDC.
> Daniele: appraisal should be the starting point. Workflow, on the
> Archive side, could begin early in a space project, even when data don't
> yet exist; link with the Archive side of the PAIMAS.
> => _action all_: exchange on high level process (3 main steps
> preparation/preservation/maintenance) and links with PAIMAS phases for
> return to the LTDP group.
> ==> _action Daniele_: follow the work of LTDP group on preservation
> process, and send all available information to the DAI group on this
> subject.
> Need for return on the preservation process from all.
> ___________________________________________________________________________________________________________
> *_Other subjects _*
>
> *_OAIS Magenta Book (French version)_*
> Daniele has received the complete updated French version.
> Complete validation to be performed now (text and figures)
> This version will be also validated by the French National Archives.
> Due to the amount of work on the PAIS, and priority to the PAIS green
> book, this will be treated after.
> *_XML schema for DEDSL _*
> It seems possible for CNES to write the document on the model of the
> other existing DEDSL standards. At CNES, the XML schema for DEDSL is already
> created and implemented.
> Prototypes: CNES has already tested it on an operational tool. This could
> play the role of prototype, and this could be enough.
> Nestor explained to John that a 2nd prototype is required.
> A possibility could be to produce not a blue book, but a document that
> won't require 2 prototypes (orange book). To be discussed further.
> ____________________________________________________________________________________________



