[Moims-dai] RE: Agenda for tomorrow DAI telecon (4th September)

Wed Sep 3 19:48:10 UTC 2014

What time?

From: Boucon Daniele [mailto:Daniele.Boucon at cnes.fr]
Sent: 03 September 2014 11:26
To: MOIMS-Data Archive Ingestion
Cc: Gilles; D or C Sawyer; Tavernier
Subject: [Moims-dai] Agenda for tomorrow DAI telecon (4th September)

Dear all,

Please find below the proposed agenda for tomorrow's telecon:

* ISEE test case
* PAIS PDS-NSSDC wrap-up document (email from Mike, 27th August)
* METS test case: first comments, organization (email from Daniele, 13th August)
* Information Curation Process: comments on documents sent by John (email from John, 11th August), terminology (email from Daniele, 3rd September)
* DAI registry http://www.sanaregistry.org/r/daixml/daixml.html (link towards CCSDS publication page)
* PAIS and ISO (20104)
* Next meeting preparation (topics, availability, constraints for agenda)
* Review of actions (email from John, 21st August)

Next telecons:

* Tuesday 23rd September
* Thursday 9th October

If available, I propose to use Dave's number (be careful, new number):
Dave Williams
Telecon information
+1-844-467-6272      USA Toll free
+1-720-259-6462      Others
Passcode:    841727

Best regards,

Daniele
___________________________________________________________________________________________________________
14 August minutes

DB: Daniele Boucon
DS: Don Sawyer
JG: John Garrett
MM: Mike Martin
SM: Stephane Mbaye
WG: all

D=Decision, A=action (other = discussion)

1.    Comments on the SAFE green book section (from emails MM 22nd July, DB 13th August).

1.      MM: Might want to include a URL for SAFE documentation.
DB: agreement on:
"SAFE (Standard Archive Format for Europe, see reference [X] ) is an Earth Observation data archiving format standardized through the efforts of several European national, institutional and industrial space stakeholders. It aims at covering the role of OAIS AIP"

·        Action DB:  ask ESA for the reference to the SAFE 2.0 standard documentation.

2.      MM: E1 first paragraph, I don't think many people will understand the second sentence.  "It aims ..."  AIP should be spelled out.  Maybe something like "It provides a specification for the organization and content of an OAIS compatible archive information package."

DB: ok, with the exact meaning "Archival Information Package (AIP)"

3.      MM: E1 second paragraph, "These data are from the European Remote Sensing Satellite (ERS) Synthetic Aperture Radar (SAR)."

DB: change to "These data are samples and subsets of the European Remote Sensing Satellite (ERS) Synthetic Aperture Radar (SAR) data, tailored for the scope of this test case."

4.  MM: E1 third paragraph.  "In the context of this use case," is kind of complicated. How about "In this use case" instead?  Also, "submission to an archive" instead of "submission into".  "The approach followed in this tutorial intends to show the possibility of:" should be "This tutorial illustrates".

DB: agree with these proposals.

MM: As you can see, I am finding grammatical issues with every single paragraph.  Someone needs to carefully edit all the text in this section, but I don't have time to do it now.

·        Action all (at the end): overall reading of the document (consistency, coherence, grammatical issues, ...)

5.      MM: The diagram Figure E-4 shows the three sub-collections of the root collection.  I would expect to see a relation-association in the root that "contains" each of these sub-collections.  So I guess the arrows in the figures are generated by the parentCollection specification in each child collection?

DB: (not sure I correctly understand the issue). You're right: in the CNES tool, the plain arrows are drawn from the root towards the leaves, while the links in the XML files are actually specified in the sub-collections or leaves, pointing towards the parent element (the "parentCollection" element).

MM: I brought this issue of downward and upward pointers or associations up in our discussions last year and Stephane seemed to think the upward pointers were the appropriate ones.  I am concerned that having two ways of specifying the same thing may be dangerous.  Anyway, I think there should be some guidance on this issue.  The question is: why does Don include "contains" associations from the root collection to the transfer objects in his example when Daniele doesn't include them between the collection and sub collections in her example?

DB: the "contains" association ("relationType") is not mandatory at all. I guess it is there for better description and understanding. The "parentCollection = NASA_ESA_CNES_Test_Data_Exchange_02" in the Transfer Object is sufficient to express the parent-child relationship between both nodes.
That's why, in the ESA SAFE case, this association has not been added on top of the parent-child relationship.
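To illustrate DB's point that the upward "parentCollection" links are sufficient, here is a minimal Python sketch that derives the downward "contains" edges a tool such as the CNES one draws as arrows. It assumes each descriptor has already been reduced to an identifier and its parentCollection value; the dictionary layout, the function name and the sub-collection/Transfer Object names are hypothetical, and only NASA_ESA_CNES_Test_Data_Exchange_02 comes from the discussion.

```python
from collections import defaultdict


def contains_edges(parent_of):
    """Invert upward parentCollection links into downward "contains" edges.

    parent_of maps each descriptor identifier to its parentCollection value
    (None for the root). Returns a dict: parent -> list of its children.
    """
    children = defaultdict(list)
    for node, parent in parent_of.items():
        if parent is not None:
            children[parent].append(node)
    return dict(children)


# Hypothetical example: one root collection, two sub-collections, and one
# Transfer Object under the first sub-collection.
parent_of = {
    "NASA_ESA_CNES_Test_Data_Exchange_02": None,
    "SubCollection_A": "NASA_ESA_CNES_Test_Data_Exchange_02",
    "SubCollection_B": "NASA_ESA_CNES_Test_Data_Exchange_02",
    "TransferObject_1": "SubCollection_A",
}

print(contains_edges(parent_of))
```

Since the downward view can always be computed from the upward links, an explicit "contains" association only restates the same information, which is also why MM worries that the two could end up disagreeing.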

6.      MM: Why are the relations in the SAFE test case at the dataObject level and not at the transfer object level as they are in the ISEE test case?

DB: Up to the Producer and Archive to decide what makes sense.

In the "Simple case" where there are 4 group types, the association is described from the exact data object inside the right group type.
In the "Detailed case", the association could have been drawn from the Transfer Object, because there is only one group type and one data object inside. The choice of the data object as source of the association is to draw the parallel between simple and detailed case. In this case, both descriptions are possible.

Discussion on "open" possibilities in the standard, such as the way to express associations.
MM: informal expression of elements in the MOT -> practical use?
DS: different ways to model the same situation -> not easy to understand.
DS: Try to normalize the diagram?
MM: make the documentation clearer.

7.      MM: Why are the groupTypeStructureName entries "SET" in the simple case and "Directory" in the detailed case?

DB:  In both cases the "physical" high level group for the products on the Producer side is a directory and could have been viewed as "directory" "EO_PRODUCTS".  In fact this level has not been modeled.  So in the simple case we have a group (mandatory) of  one zip file that is the product itself -> viewed as a "set", in the detailed case we have a group (mandatory) of one directory that is the product itself containing different files -> viewed as a "directory".

DS: give more explanation in the tutorial (see point 2 below).

·        Action DB: explain why the group structure name "directory" is used in the detailed case and "set" in the simple case (from Minutes 14th August).
·        Action DB: updated version of the ESA SAFE test case with comments from the 14th August telecon.

2.    Overall comments on Green Book (from emails MM 22nd July, DB 13th August).

1.      MM: Should show one example of a hierarchy of collections.

DB: agree, with advice on the choice of collections: link between the physical organization on the Producer side, the physical organization of the delivery, the design of the MOT.

·        Action SM: check there is a section about the hierarchy of collections in the tutorial. If not, add it with example and explanation (from 14th August telecon)

2.      MM: Tutorial should provide a short demonstration of every type of group structure: directory, set, sequence or undescribed structure. [Note: the word "undescribed" does not show up in some spelling dictionaries]

DB: agree. From discussions with other people, this seems not so easy to understand, particularly the fact that there is no direct link with the SIP model (delivery), except the "transferObjectGroupInstanceName" in the case of "directory" in the sipTranferObjectGroupType, and the fact that a "Set" is physically included in a directory.
Questions that occur: What should be modeled? Which levels? Links or not with the delivery (way data are delivered to the Archive)?

MM: tutorial should contain precise examples (same case/SIP with the different possibilities, when to use "directory" or not, "sequence" ...).

·        Action SM: check there is a section in the tutorial about the type of group structure (directory, set, sequence, undescribed). If not, add it with the same case showing the different possibilities and why use one or the other (from 14th August telecon).

3.      MM: Tutorial should demonstrate the use of dataObjectTypeFileOccurrence.

DB: agree, also to avoid potential confusion with dataObjectTypeOccurrence.

·        Action SM: check there is a section in the tutorial about the dataObjectTypeOccurrence (from 14th August telecon).

3.    Comments on ISEE section (from emails MM 22nd July, DB 13th August, DS 14th August).

1.  MM: Page 1, second paragraph, this sentence doesn't logically follow the preceding sentence.  "This gives the Archive the ability to apply some automation in reviewing the received SIPs so they can be checked for conformance to the agreed plans, and this helps to reduce errors."
DS: Yes, I've changed the order of the last 2 sentences.

2.   MM: Page 2, first paragraph, I am uncomfortable with this wording.
"The tree hierarchical levels"

DB: the hierarchical tree structure?
DS: I've deleted 'tree'

4.      MM: Page 2, last line, "how it should be broken into", I would prefer "divided into" or some other wording.

DS: I've made it 'divided'.

5.      Page 3, first paragraph, "Since the Transfer Object is the smallest unit of data that can be sent in a SIP and Transfer", I don't think this is worded quite right.  The transfer object is the smallest unit that can be deleted or replaced, but not the smallest unit that can be sent.

DS: the Transfer Object can't actually be divided for sending. Another wording?
DS: The key is that a Transfer Object can't be split across SIPs.  I've updated the wording.

6.      Page 3, third paragraph, "extensive validation requirement", should be "requirements".

DB and DS: ok

7.      Page 3.  I got lost reading paragraph 5.  It seems like it repeats a lot of the previous paragraph but adds a few details.  I think it needs to be simplified, building on the previous paragraph.

DS: The 4th paragraph talks about the types of Transfer Objects, while the 5th paragraph talks about the actual results of matching the Descriptor with the directory.  However, I've made a few updates to the 4th paragraph for clarity, and I've deleted the 5th paragraph because it is now redundant with the next section discussing the MOT.

8.      Page 4. The MOT section is redundant to what was on page 3.  Is all the redundancy required?
DS:  No, see above.

9.      Page 5, paragraph 2, The words "data" and "metadata" preceding "Transfer Objects" should be in bold type to match earlier paragraphs.
DS:  I've taken away the bold attribute.

10.   Page 5, section 6.1.2, why the underlined words?
DS:  They are not there when track changes is turned off, unless I missed something.

11.   Discussion on the SIP Manifest (email from DS 14th August).

The problem is that the CU organization within the SIP Manifest is not coherent with the organization of the Groups in the Descriptor.
·        The SIP Manifest should be corrected; why was it not generated correctly by the SIP Builder?
·        In the CNES proto, validation of the SIP Manifest is not done at the organizational level (perhaps only at the ID level).

·        Action SB: correct the generation of the ISEE SIP Manifest from the SIP Builder

·        Action DB: understand the validation at the group structure level from the CNES proto.

CU organization within the SIP should be:

XFDU CU Transfer Object for data
        XFDU CU Satellite Group ISEE1
              XFDU CU Yearly Group 1978
                  XFDU CU data file 0001
                  XFDU CU data file 0002
                  XFDU CU data file 0003 (The above is consistent with the new manifest, but then the manifest continues with a new XFDU CU Satellite Group.  Instead it should continue with the next satellite, ISEE2, as a group with the same year, as follows.)

        XFDU CU Satellite Group ISEE2
              XFDU CU Yearly Group 1978
                  XFDU CU data file ...
                  XFDU CU data file ...
                  XFDU CU data file ...
(Then a new Transfer Object)

XFDU CU Transfer Object for data
        XFDU CU Satellite Group ISEE1
              XFDU CU Yearly Group 1979
                  XFDU CU data file ...
                  XFDU CU data file ...
                  XFDU CU data file ...
        XFDU CU Satellite Group ISEE2
              XFDU CU Yearly Group 1979
                  XFDU CU data file ...
                  XFDU CU data file ...
                  XFDU CU data file ...

(Then a third Transfer Object)
XFDU CU Transfer Object for data
        XFDU CU Satellite Group ISEE1
              XFDU CU Yearly Group 1980
                  XFDU CU data file ...
                  XFDU CU data file ...
                  XFDU CU data file ...
        XFDU CU Satellite Group ISEE2
              XFDU CU Yearly Group 1980
                  XFDU CU data file ...
                  XFDU CU data file ...
                  XFDU CU data file ...

·        Action DS: generate a new ISEE test case version with the updated SIP Manifest
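The target CU organization above can be derived mechanically from a flat list of files. A minimal Python sketch, assuming each ISEE data file is known by a (satellite, year, file) tuple; the input format, the file identifiers and the function names are hypothetical and are not part of the SIP Builder. It yields one Transfer Object per year, containing one Satellite Group per satellite, each holding its Yearly Group and data files.

```python
def group_by_year(files):
    """Nest (satellite, year, file) tuples as {year: {satellite: [files]}}."""
    tree = {}
    for satellite, year, name in files:
        tree.setdefault(year, {}).setdefault(satellite, []).append(name)
    return tree


def render_cus(tree):
    """Return the CU outline as text lines: one Transfer Object per year,
    then Satellite Group -> Yearly Group -> data files inside it."""
    lines = []
    for year in sorted(tree):
        lines.append("XFDU CU Transfer Object for data")
        for satellite in sorted(tree[year]):
            lines.append(f"    XFDU CU Satellite Group {satellite}")
            lines.append(f"        XFDU CU Yearly Group {year}")
            for name in tree[year][satellite]:
                lines.append(f"            XFDU CU data file {name}")
    return lines


# Hypothetical file identifiers, deliberately given out of order.
files = [
    ("ISEE1", 1978, "0001"),
    ("ISEE2", 1978, "0004"),
    ("ISEE1", 1979, "0007"),
    ("ISEE1", 1978, "0002"),
]
print("\n".join(render_cus(group_by_year(files))))
```

Grouping by year first is what keeps ISEE1 and ISEE2 for the same year inside a single Transfer Object, instead of starting a new top-level Satellite Group as the faulty manifest did.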

4.    Other topics

JG: RAC standard for bodies now ISO standard. First training class planned first week of October (John is one of the teachers).

End of 14 August minutes
___________________________________________________________________________________________________________
Excerpts from earlier minutes, kept for reference
___________________________________________________________________________________________________________
 2. GREEN BOOK
        2.1 TOC for test cases

* on the content of the sections,
* on the title of the sections
* on the following question: is it better to describe the SIP constraints with the MOT (in section 6.1.3), or in a global section on SIPs (section 6.1.4)?

Discussion:
Brief introduction about the sub-TOC by Stephane.
John and Don:         6.1.4.1 SIP constraints, included in 6.1.3

Current Structure:
6 Use Cases
6.1    NAME - TITLE
Description
6.1.1    TEST_CASE_NAME DATA SET
6.1.2    TEST_CASE_NAME MOT AND SIP CONSTRAINTS
6.1.4    TEST_CASE_NAME SIPS

Decision on the Structure:
6 Use Cases
6.1    NAME - TITLE
6.1.1    Context and Benefits
         Contains description + what this test case shows
                give explanation at the beginning of each test case of the specificity of the test case, and how the method applies
6.1.2    Objects to be Transferred
         same as TEST_CASE_NAME DATA SET: contains the description of all the information (data, documents, ...) that has to be transferred, and how this information is organized on the producer side
6.1.3    Model of Objects for Transfer and SIP Constraints
         contains the description of the MOT and the SIP types and sequencing constraints
6.1.4    SIPs
         Contains the description of SIP implementation and transfer
         Contains an example of SIP manifest.

___________________________________________________________________________________________________________
2. GREEN BOOK

        2.2 Test cases review

2.2.3 ESA-SAFE test case
The 2 models have been shown during the 19-20 March LTDP meeting.
Exchanges between CNES, Gael and ESA on data sets.
First version of the green book part -> to be reviewed first by Daniele and Stephane, then sent to the group.

==> Action Daniele (with Stephane): review the ESA SAFE test case, then send it to the group.

        LTDP actions (from the 7th November meeting) -> closed during the 19-20 March meeting

25.3 | CNES/GAEL | LTDP_WG#26 | Model the SIP Data Objects at the level of SAFE Metadata and Data Object in order to allow the transfer of sub-parts of SAFE products in different SIPs
25.4 | ESA/PS | End November 2013 | Gather any useful documents to allow the prototype development/tailoring and provide the EO-SIP of ERS for testing

2.3 ISEE1-2  test case
Test case nearly finished.

Mail sent by Don 5th May "Partial update for Section 6.1": material for "ISEE - a typical use case"

MM: the history of the test case is not useful in the Green Book (was ok in the yellow book). Needs to show clear test cases -> highlight what the test case brings.

DB: this test case shows the main features of the PAIS. Would it be worth showing the differences between how the data are organized on the Producer side, how the data are modeled in the MOT (intermediary repositories may not be modeled with groups), and how they are gathered in SIPs?

SM: the attached files (descriptors ...) could be put at the end of each specific test case part, and then at the end we'll see how to organize the whole document gathering the different test cases.

2.4 COROT  test case

Daniele explains that CNES is pushing to get the CoRoT L0 data use case (from Daniele and Stephane) - for information this has to be finished by 13 May to get a chance to be used for L1 data.

2.5  METS  test case

Daniele had a meeting (9 April) with BnF, which could help build a METS implementation of SIPs (documents).
Daniele introduces the need from BnF of transferring references to objects instead of the target objects themselves.
Stephane: this is ok for XFDU

=> Action Stephane:   Provide example of XFDU with referenced data objects (remote URL or URI)

Don finds this relevant a priori.

Decision: Nothing to do on that topic until further input from BnF.

2. PDS4  test case

Mail sent by Mike (5th May): 2 figures and text for section 6.3 "Planetary Data System - a non XFDU implementation"

Discussion on figure "PDS4 Bundle and Collection linkages": this is the current PDS4 Bundle description.
Data are located via URNs.

MM: this figure is an overview of what the PDS4 bundle looks like.  Same for every PDS archive. The test data sent before conform to this generic figure.

Figure "PDS4 SIP Organization"
MM: for NSSDC, show that Manifest is better than bundle level.

Mike has sent a complete email to Daniele concerning the CNES PAIS software (issues, context and associated files).
For the moment, Daniele is looking for a Mac machine to replay the scenario in order to understand the problems.
Mike will try it on a PC.

=> Action Daniele: deliver the CNES proto software to John (ASAP)

There was a brief discussion of Mike's e-mail (4th December) addressing the PDS4 Bundle organization and Don's e-mail response (19th December) with possible modeling options. Mike agreed that a hierarchy is a requirement for a bundle. Don gave two options.

The first is a single Collection Descriptor holding a single Transfer Object, where the 'group' capability of the TO is used to completely model the bundle, with the option of cutting off the modeling at any point by use of the 'undescribed' attribute. This approach would be consistent with past PDS/NSSDC practice of transferring a PDS volume to NSSDC, with the only requirement on NSSDC being the ability to return the volume unaltered. In other words, PDS did not expect NSSDC to look into the volume and provide any services, such as replacement, at a lower level of granularity.

The second option would be to model the bundle as composed of multiple Transfer Objects under multiple Collection Descriptors. For example, there would be a set of Collection Descriptors corresponding to the PDS4 standard named collections, and under each of these there would be Transfer Objects and possibly more Collection Descriptors. There are a lot of possibilities for how the resulting model might look. The advantage of modeling a bundle as consisting of multiple TOs is that it facilitates replacement at the TO level, rather than the whole bundle, if that becomes a requirement. Regardless, it should not be too difficult to have PDS4/NSSDC-specific software that would start from a basic bundle model and then create a complete MOT by examining an existing bundle. It could be integrated with a SIP builder, but it would probably be preferable to have the MOT exchanged with NSSDC prior to any SIP exchange, to reduce confusion and ensure both parties are 'on the same page'.

Don agreed to look at Mike's example Descriptors to see how they might fit into the above options.

==> action Mike: send samples of data

What could really help the PDS use the PAIS is a proposed implementation that does not use XFDU. To be done.
Suggestion from John: use XFDU, and then an XSLT style sheet to translate into another language (to be confirmed, not sure my understanding was correct).

A PDS product = a Transfer Object.
A bundle = PDS tree data set. The size of the bundle is 1.7 GB, but the files inside are quite small (around 250 kB).

==> Action Stephane: propose a SIP implementation without XFDU

                ==> action all: send comments on Mike's information on PDS4 sample (email 30th October)

22 April telecon: From file : met_abstract_sip_manifest.xml

MM: The file is a flat list of PAIS elements from SIP Model (PAIS BB §6) and some elements deriving from XMAN (PDS4).

SM: sent a tc5-pds4-20140401.pdf providing an example of Non-XFDU SIP implementation also reusing the SIP Model elements from PAIS BB §6, but with a structure of nested elements.

DS: reminds that nesting is not mandatory with respect to the PAIS BB §5 abstract SIP model. The structure of the manifest may depend on the context and on what the Producer and the Archive actually want to convey.

SM: agrees that the provided example(s) are not implementing the PAIS BB §5 from scratch but are all reusing the SIP Model elements built for XFDU but getting rid of the XFDU elements. This may be confusing.

SM: the elements of the SIP Model built for XFDU require a nested structure, at least to identify which Group occurrence a Data Object belongs to, or which Transfer Object occurrence a Group instance belongs to. Alternate elements could have allowed a flat structure, as for XMAN.

WG: The WG agrees that this in-between implementation may be confusing in the GB and a fully independent implementation should be preferred.

WG: agrees that a table, a checklist or any other means that could help a user validate/track the level of implementation could be helpful in the GB, e.g. is the Group ID identified, is the SIP ID provided, other IDs, MIME types, etc.
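A checklist of this kind could be sketched in a few lines. The Python below is illustrative only: it assumes a SIP has already been parsed into a plain dictionary, and the keys (sip_id, groups, group_id, data_objects, mime_type) and the function names are hypothetical, not PAIS element names.

```python
def group_ids_present(sip):
    """True if every group carries a non-empty group_id."""
    return all(g.get("group_id") for g in sip.get("groups", []))


def mime_types_present(sip):
    """True if every data object in every group declares a MIME type."""
    return all(
        d.get("mime_type")
        for g in sip.get("groups", [])
        for d in g.get("data_objects", [])
    )


# Each checklist item maps to a predicate over the parsed SIP.
# Note: all() over an empty list is True, so a SIP with no groups passes
# the group and MIME checks vacuously.
CHECKLIST = {
    "SIP ID provided": lambda sip: bool(sip.get("sip_id")),
    "Group ID identified": group_ids_present,
    "MIME types given": mime_types_present,
}


def run_checklist(sip):
    """Return {item: True/False} for every checklist item."""
    return {item: check(sip) for item, check in CHECKLIST.items()}


# Hypothetical SIP: one group, one data object with and one without a MIME type.
example_sip = {
    "sip_id": "SIP-0001",
    "groups": [
        {"group_id": "G1", "data_objects": [{"mime_type": "text/plain"}, {}]},
    ],
}
for item, ok in run_checklist(example_sip).items():
    print(f"{'[x]' if ok else '[ ]'} {item}")
```

Keeping each item as a separate predicate means a Producer or Archive could extend the list (other IDs, checksums) without touching the others.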

__________________________________________________________________________________________________________
 2. GREEN BOOK

The Green Book has been split into several files, one for the core, and one per test case.

=> Action Stephane: provide a description of the document breakdown and links to the shared repository

        2.3 Core document

Section 4.1.1 reviewed (ok)
Tabular representations are welcome, but the question of their systematic use remains (is an XML version needed in an annex?)

CCSD0014: equivalent to TO Descriptor, and CCSD0015: equivalent to Collection Descriptor

==> Action Stephane (from MM): explain CCSD0015 further: how it is registered, and reference where more information on that subject can be found

Don:  descriptorModelID has to be changed on any XSD change (specialization), the Archive has to maintain the versions

Section 4.6.1 about PAIS XSD description should remain here for the time being

John: The SANA registry is supposed to reference the (latest?) PAIS XSD only (TBC). It is certain, however, that it should not contain any other resource, e.g. software, XML examples, etc.

« open » enumeration technique is cumbersome (TBC)

==> Action (All): table in section 4.6.4 should be reviewed for (during) next telecon

Comment from previous telecons on method
* (Daniele): there are steps that conform to the PAIMAS process (first model, SIP constraints, then transfer and validation).
* Link between the data base on the Archive side, and the PAIS XML elements: example on how to match both (core document, test case?).

The following paragraph will be removed if ok:
XML namespace for PAIS

John suggested that we run the proposal on namespaces by SANA, and by Nestor and Peter as XML co-chairs, to make sure they agree.

==> action Stephane (20130821): send proposal on namespace to SANA, Nestor and Peter, and ask if they have any objection to our proposal.
Use the previous email sent by John to introduce Stephane to the group.

It is agreed that not all positions where a pais:any element is possible have to be documented in the GB. Only a few examples, or even one, are necessary.

More concrete examples should be provided than the abstract « foo »/« bar » currently proposed. A typical example would be a Collection holding a pais:any with the author of the descriptor, the Collection name/ID in the Producer semantic, or anything else that could be specific to the Producer or the Archive side but not provided in the PAIS definitions.

For time constraints, the WG jumped to the section 4.6.4 of the draft GB.

SM: Reminded that « true » XML Schema restrictions guarantee that the original PAIS XML Schema's rules still apply: any instance that is valid against a restriction is also valid against the original schema.

SM: The use of restrictions does not force any system to use the derived PAIS schemas. Therefore, the restricted schemas do not have to be shared with the users of the produced SIPs. The project-specific schemas could even be discarded without losing control of the produced SIPs.

=> D - (DS) The rightmost column of the table in §4.6.4 shall be renamed « Restriction » instead of « Content »

MM: It is not clear whether this table should be kept in that form or discarded in the end, but the document should not spend too many pages on that topic.

WG: restrictions may be interesting for implementers and as such should be documented, but it is not clear if this should be proposed as a recommendation or a best practice.

SM: reminded that restricting elements such as maxOccurrence should be a recommended practice, since it can be very difficult to implement interoperable software exchanging elements of xs:nonNegativeInteger type, whose value space has no upper bound.
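The maxOccurrence point can be made concrete. Because xs:nonNegativeInteger has no upper bound, a value that is schema-valid may still overflow a fixed-size integer in a consuming system; a project restriction adding a maxInclusive facet removes that risk while keeping every restricted instance valid against the original type. The Python below is a simplified sketch, assuming a hypothetical project bound equal to the 32-bit signed maximum; it ignores the "-0" lexical form and other details of the XML Schema datatype rules, and the function names are illustrative.

```python
import re

# Simplified lexical form of xs:nonNegativeInteger: optional '+', then digits.
# (XML Schema also allows "-0"; that case is ignored in this sketch.)
NON_NEGATIVE_INTEGER = re.compile(r"\+?[0-9]+")

# Hypothetical project restriction: maxInclusive = 32-bit signed maximum.
MAX_INCLUSIVE = 2**31 - 1


def valid_original(lexical):
    """Valid against the unrestricted xs:nonNegativeInteger (simplified)."""
    return NON_NEGATIVE_INTEGER.fullmatch(lexical.strip()) is not None


def valid_restricted(lexical, max_inclusive=MAX_INCLUSIVE):
    """Valid against the restricted type: original rules plus maxInclusive."""
    return valid_original(lexical) and int(lexical.strip()) <= max_inclusive


for value in ["42", "+7", "2147483647", "99999999999999999999", "-3"]:
    print(value, valid_original(value), valid_restricted(value))
```

Every value accepted by valid_restricted is also accepted by valid_original, which is the « true » restriction property SM described; the reverse does not hold, which is exactly the gap a restriction closes for implementers.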

SM: proposed to add prepared templates of restricting XML Schemas in an annex, something that could help implementers quickly set up the restrictions they need.
WG: adding XML Schemas in an annex may not be so helpful, because cutting and pasting from PDF can be very cumbersome.

=> D - (JG) these XSDs shall be placed alongside the originals.

SM: The problem is similar for other GB resources, such as the use case descriptors or the software prototypes.

___________________________________________________________________________________________________________

Preservation process

DB: LOTAR representative is ok with "preservation", but not with "curation" (this word is not used in the community).

MM: links should be made between MTDP and warehousing. The structure of the process is not clear for the moment.

Discussion on terminology: preservation vs curation.

LTDP definitions:
·        Preservation: aims at the generation of a single, consistent, consolidated and validated "EO Missions/Sensors Dataset" and at ensuring its long term integrity, discovery, accessibility and usability. It is focused on an individual Mission/Sensor or on a multi-mission Dataset (when one Master Dataset is made up of data coming from different missions/sensors) and tailored according to its specific preservation/curation requirements. It consists of all activities needed to ensure "EO Missions/Sensors Dataset" bit integrity over time and to optimize (in terms of format and coverage) its (re)use in the long term (e.g. through metadata and catalogue improvement, algorithms evolutions and related (re)processing, linking and improvement of context/provenance information).
·        Curation: aims at establishing and increasing the value of "EO Missions/Sensors Datasets" over their lifecycle, at favouring their exploitation through the combination with other Datasets and at extending their user base. It includes the activities for the definition of the preservation objectives, for the coordination and management of Data Time Series and Collections (e.g. from similar sensor family) in support to specific applications. It includes international cooperation activities

OAIS definitions:
Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term.
Authenticity: The degree to which a person (or system) regards an object as what it is purported to be. Authenticity is judged on the basis of evidence.

Authenticity: in the sense of "original". This is not crucial in the domain of scientific data, but is an issue. Integrity could be a way to prove authenticity.

We note that the LTDP definitions are neither clear nor completely coherent. There is a mixture of preservation and curation concepts in both definitions.

Furthermore, the group thinks that the term "curation" used in the LTDP definition does not fit the usual usage and that another term should be used; the meaning is closer to knowledge management.
Daniele will send tomorrow a summary of her comments.

=> Action Daniele: ask David for preservation/curation definition.

=> Action all: give comments and proposals as input for the LTDP terminology and steps of workflow. -> for Monday at the latest.

Previous discussion:

To be done: analyse and suggest, at different stages of the project (during the data lifecycle), what should be done from an archiving point of view.

Example of main issue: keep documentation up to date (when changes in formats, processing, ... are made on the data).

The LTDP has written a document "PDSC" (Preservation Data Set Content), explaining what kind of information (data, software, documents, ...) should be collected and at what step of the project.

The subject is wide, Mike asks to focus on specific parts.

Daniele explains her point of view: develop a "model" (magenta book) gathering all the basic components of the preservation activity (selection and appraisal, data and metadata preparation, access, maintenance ...) that should cover the whole data lifecycle (even before the data exist), in order to be able to answer the following question: what should be done from the beginning to the end to be able to preserve data, and at what moment? This should be done in a generic way, linking to existing standards related to these basic components where they exist.

Suggestion to ask Barbara Sierman about existing work on this topic. Mail sent by Daniele on 7/02.

==> Action Daniele: write and send more precise elements on this process to the group.

==> Action all: send comments.

Nestor will submit the new project for approval once the PAIS BB has been published.

Daniele underlines the need for CNES and LTDP group. CCSDS expertise will be very important.

The PDS has its internal process; for other agencies (particularly the LTDP member agencies) this process doesn't exist and is required.

Question on how to make the link between a global process and the PAIMAS phases.

20131030 Don's email and following discussions:
Don: The process could be focused on the Archive point of view, and seen as an internal OAIS workflow issue, thus using more OAIS concepts.
"Provenance" is in practice a big issue: going back to the original information.
"Reprocessing, curation, stewardship" could be mapped to OAIS migrations, new versions, ...

Update procedures exist at NSSDC.

Daniele: appraisal should be the starting point. Workflow, on the Archive side, could begin early in a space project, even when data don't yet exist; link with the Archive side of the PAIMAS.

=> action all: exchange on high level process (3 main steps preparation/preservation/maintenance) and links with PAIMAS phases for return to the LTDP group.

==> action Daniele: follow the work of LTDP group on preservation process, and send all available information to the DAI group on this subject.

Feedback on the preservation process is needed from all.

___________________________________________________________________________________________________________
Other subjects

OAIS Magenta Book (French version)

Daniele has received the complete updated French version.
Complete validation to be performed now (text and figures)
This version will be also validated by the French National Archives.
Due to the amount of work on the PAIS, and the priority given to the PAIS green book, this will be addressed afterwards.

XML schema for DEDSL
It seems possible for CNES to write the document modeled on the other existing DEDSL standards. At CNES, an XML schema for DEDSL has already been created and implemented.
Prototypes: CNES has already tested it on operational tool. This could play the role of prototype, and this could be enough.

Nestor explained to John that a 2nd prototype is required.

A possibility could be to produce not a blue book, but a document that won't require 2 prototypes (orange book). To be discussed further.

____________________________________________________________________________________________
