[Moims-dai] Todays Telecon: PDI on PDI

John Garrett garrett at his.com
Tue Jun 12 05:39:27 UTC 2018


Hi,

 

I think we are converging on an understanding of preservation of PDI objects.

 

I still support SC#222 which we had previously agreed on.  Some comments on Don’s comments below.

 

Peace and joy,

-John

 

From: MOIMS-DAI [mailto:moims-dai-bounces at mailman.ccsds.org] On Behalf Of D or C Sawyer
Sent: Monday, June 11, 2018 11:57 PM
To: MOIMS DAI List <moims-dai at mailman.ccsds.org>
Subject: Re: [Moims-dai] Todays Telecon: PDI on PDI

 

Hi David,

 

I’m pleased that you like the ‘Preserved Data Object’ modeling view (not surprised, as it is much of what you’ve been pushing), but I can’t agree to SC222 for the reasons given below. I think my proposal below combines the best of both and should eliminate the Content Information controversy discussed in the last 2 telecons.

 





On Jun 9, 2018, at 4:47 PM, David Giaretta <david at giaretta.org> wrote:

 

Hi Don

 

You almost have it but, to be logically consistent, it needs another step.

 

The “Preserved Data Object (PDO)” term would be very useful, especially as an intermediate term between the Information Object and AIP.

 

 

Yes, I think I expressed this when I mentioned that successive application should lead to an AIP,  except that it doesn’t include the Packaging Information and the Descriptive Information which are an important part of our defined AIP.

 

At first look, I like the idea of PDO.  But of course will need more time to think it through.

One concern I have is that there could be lots of places where the concept could be used and it could cascade at this late date into many needed updates in OAIS.   So I will reserve my agreement until I can get an idea of how many updates are needed and whether and how quickly we can finish them.  Our deadline for completing OAIS was our last CCSDS meeting.





You wrote 

“The only provision I would add is that both the Rep. Info. and the PDI would need to be optional in the sense that a ‘naked’ or ‘partially dressed’ Data Object needs to be allowed in order to stop the recursion.  Of course the stopping criteria are different between the Rep. Info. and the PDI.” 

Yes, that is why in the Information Object diagram the RepInfo recursion shows 1 to * since * can be anything from 0 upwards. The link to PDI is 1:1 currently but we would have to change this to 1:*.  We would need some text to discuss how the recursion ends. 

 

 

I fully agree.

 

I disagree.  I think an Information Package always requires PDI.  Otherwise what is the difference between an Information Object and an Information Package?





So far so good.

 

But, in order to actually preserve something, we really do need all the components of an AIP i.e. we need to know how the Data Object, RepInfo and PDI are connected together, and have some overall description. In other words, whatever you want to call it, it is an AIP.

 

 

Yes, we need the full AIP and the current modeling provides this, apart from showing PDI can be recursive, which as you note could easily be added.

 

I don’t agree that we need all the components of an AIP in each and every instance to preserve something. It is definitely useful to have it all, but it is not necessary in every situation for every Information Object. As we’ve all agreed, we need to break the recursion at some point by not having all of PDI for everything.





In terms of an Annex – that would be helpful to contain the bulk of the discussion and examples. If we move the discussion to a Normative Annex, then the only change we need is that agreed in http://review.oais.info/show_bug.cgi?id=222 but there would have to be a small amount of additional explanatory text in the body of the standard.

 

 

I cannot agree with SC222. I note that SC222 is titled “Change PDI to be describing CDO rather than Content Information”. This is stated to be a ‘significant’ change, and I would say it is a radical, unnecessary, and counterproductive change, for the following reasons:

 

I on the other hand feel it is a necessary change.  I think conceptually the change makes more sense than the current situation. 

 

1.  SC222 provides the following rationale : “While discussing other SCs, we are often confronted with situations where applying PDI to Representation Information raises significant problems. I think this change may make it easier to resolve some of the other SCs.” 

 

This statement makes clear that the discussions leading to this proposal were concerned with implementations of particular views. Further text discusses issues with possible approaches to applying the Fixity component of PDI to Representation Information and to the combined CDO with Representation Information. All of the text clearly expresses concerns with a set of implementation approaches and therefore should immediately be suspect as a rationale for changes. In fact it appears that not only were the approaches being considered too rigid, but the view of the information models was also too rigid, because there are implementation approaches that can work. The OAIS RM presents CONCEPTUAL MODELS and not IMPLEMENTATION MODELS. The OAIS RM was generated to provide a common framework of terms and concepts to facilitate communication. This is easy to forget, as it can easily be viewed, and often has been, as an implementation model. For example, Section 2.2 is stated to contain the only information models that an OAIS implementation needs to support in order to be an OAIS. But what do we say in 2.2, and specifically regarding Figure 2-3, in this context? We take a top-down approach and give a very high-level (i.e., with little detail) view of an Information Package involving four basic types of information (Content Information, PDI, Packaging Information, and Descriptive Information) and their relationships. We show Packaging Information as a simple container with a small divider, one side holding Content Information and the other PDI, and we state that PDI is needed to preserve the Content Information, to ensure it is clearly identified, and to understand the environment in which it was created. We show Descriptive Information as being associated with this Information Package to facilitate finding the Content Information of interest.

 

I think the Conceptual Model was designed to reflect, and should reflect, real implementations.

Note what it does NOT say. It does NOT say how PDI is supposed to be related to Content Information, which is defined to be the Content Data Object and its Representation Information. It could be implemented as applied only to the Content Data Object (as SC222 proposes); it could also be implemented as applied individually to the Content Data Object and the Representation Information. For example, a registry of Representation Information objects must perform preservation and must be concerned with source, version history, and fixity (i.e., PDI), and therefore needs to maintain some level of PDI. One would expect the PDI applied to a Content Data Object to differ from that applied to its Representation Information. There is no OAIS requirement other than being able to describe an OAIS implementation as using these high-level concepts, and this can be done regardless of the implementation approach as long as these information types can be identified. On the other hand, adopting SC222 says that a conforming OAIS is expected to be able to relate PDI to its Content Data Object, AND there is no reason to expect any PDI to be applied to the Representation Information. This limits the applicability of PDI in a way that is contrary to actual implementations (e.g., Representation Information Registries). No such limitation exists now, so introducing it would be a step backward.

 

I think this is stretching the understanding of normal readers. I think a normal person would interpret a statement that PDI is associated with Content Information to mean that PDI applies to the total Content Information, and is not applied sometimes to part of the Content Information and sometimes to all of it. And if I can apply PDI to only part of the Content Information, why can’t I just provide PDI for the Representation Information and not the CDO?

 

Also, changing the relationship of PDI to only the CDO instead of the whole Content Information does not mean that we can’t have PDI-like information applied to Representation Information. We’ve just been agreeing that we can preserve and apply PDI-like information to the components of PDI. What is the difference with doing the same thing for RepInfo?

 

2.  SC222 proposes that PDI is to be associated only with the Content Data Object.  The only way this can be reasonable (given that clearly some Representation Information will have PDI), is to view Content Information and its Content Data Object as ANY Information Object and ANY Data Object. 

I missed last week so I guess I’m missing something here.  I don’t understand the problem here.

This would be a very major and clearly controversial (as per your proposal and 2 recent telecons) revision to what most everyone understands Content Information to be. It is defined as the ‘original target of preservation’, and widely understood to refer to the primary information that external providers are submitting to the OAIS for preservation. I believe it would be a major step backward to lose the ability to clearly refer to this information category.

I think that in most cases the Producers (and the DC) think the “target of preservation” is the CDO (not the CDO as well as the RepInfo). They expect RepInfo in order to understand the CDO, but I don’t think most people really consider that to be the target of the preservation.

As an example of this: if you migrate the CDO from a particular format (which has an associated standards document describing that base format) to a new, more modern format (which has its own associated standards document), I don’t think most people would care whether the standards document describing the original format (or maybe the software decoding it) is preserved.

 

 

 

3. As you’ve agreed, there is utility in a model that takes a Data Object centered approach to its preservation, as I described below and which I called a “Preserved Data Object” model. Adding this to an Annex with a proper discussion (as normative if you’d like), together with maintaining the current information model views augmented with making PDI recursive, retains all the good and historic OAIS information modeling and terminology while allowing for the addition of this alternative view for an enhanced perspective. It is usually productive to have more than one way to view a topic. I see this as a win-win situation and it should remove the Content Information controversy.





Cheers-

Don









Regards

 

..David

 

From: MOIMS-DAI <moims-dai-bounces at mailman.ccsds.org> On Behalf Of D or C Sawyer
Sent: 07 June 2018 16:35
To: MOIMS DAI List <moims-dai at mailman.ccsds.org>
Subject: Re: [Moims-dai] Todays Telecon: PDI on PDI

 

Hi David,

 

I believe I now understand your concept of how you want to approach the PDI on PDI issue. It came to me when I realized that you do not disagree that showing that PDI can be recursive solves the central issue of PDI on PDI, and further that it does not require any changes in any of the existing information modeling of the current (2012) OAIS Reference Model. However, you make clear in your concluding comment that this is not your preferred approach and you feel your approach is to ‘re-use the existing concepts cleanly’. Coupling this with your past insistence on removing PDI from being applied to Representation Information, as is implied by the current view of the Information Package (Figure 2-3 and Figure 4-13), led me to reexamine this changed modeling view in light of your past arguments, including that this view can be applied to ANY information. This led me to an epiphany of sorts, which I think captures the essence of what you have been arguing and which I will now describe in a generic way.

 

I want to focus on the information modeling aspects only and avoid, at least for now, any consideration of what the current OAIS RM figures and text allow or don’t allow. 

 

If one simply creates a model where PDI is associated only with the Data Object, and no longer with Representation Information, the result looks like our Information Object (Figure 4-10) with the addition of PDI being associated with the Data Object. This new figure would show a PDI object being contained in the Information Object (and, say, positioned to the left of the Data Object in Figure 4-10), with an arrow from the Data Object to the PDI object labeled ‘further described by’ in analogy to Figure 4-13. Since this is a new figure, let’s call this Information Object Plus (or IO+) to avoid controversy. The beauty of this figure, which I think you have in mind, is that it can be viewed as a generic view of how to preserve a Data Object. In other words, a properly preserved Data Object should have both Representation Information and PDI. In this sense, I think a better term for this model would be a Preserved Data Object (PDO), which term I’ll now use. From this perspective, I can understand its proposed role of being applied to ANY Data Object that the Archive is intending to preserve. If that Data Object is a NASA generated image, it says that the preserved version should have both Rep. Info. and PDI. If that Data Object is a Representation Information Data Object, then it should have its own Representation Information and its own PDI. If that Data Object is a PDI Data Object, then it should also have its own Representation Information and its own PDI. This of course leads to recursion of both Rep. Info. and PDI, both of which need to be limited, and this would need to be addressed in any discussion of such a PDO model. The only provision I would add is that both the Rep. Info. and the PDI would need to be optional, in the sense that a ‘naked’ or ‘partially dressed’ Data Object needs to be allowed in order to stop the recursion. Of course the stopping criteria are different between the Rep. Info. and the PDI.
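As a rough illustration of the PDO idea described above, the recursion can be sketched as a small data structure. This is only a sketch under stated assumptions: the class name PreservedDataObject, its fields, and the example object names are hypothetical and do not come from the OAIS Reference Model. Both Rep. Info. and PDI are modeled as optional lists of further PDOs, so a ‘naked’ Data Object (both lists empty) is what ends the recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class PreservedDataObject:
    """Hypothetical PDO: a Data Object with optional Rep. Info. and PDI.

    Both lists may be empty, which is what allows the recursion to stop.
    """
    name: str
    rep_info: List[PreservedDataObject] = field(default_factory=list)
    pdi: List[PreservedDataObject] = field(default_factory=list)

    def is_naked(self) -> bool:
        # A 'naked' Data Object has neither Rep. Info. nor PDI of its own.
        return not self.rep_info and not self.pdi

    def depth(self) -> int:
        # Number of recursion levels below this object (0 for a naked one).
        children = self.rep_info + self.pdi
        return 0 if not children else 1 + max(c.depth() for c in children)

    def count(self) -> int:
        # Total number of Data Objects reached by successive application.
        return 1 + sum(c.count() for c in self.rep_info + self.pdi)


# Illustrative objects only; the names are assumptions for this sketch.
checksum_spec = PreservedDataObject("checksum algorithm description")   # naked
fixity = PreservedDataObject("image checksum", rep_info=[checksum_spec])  # partially dressed
format_spec = PreservedDataObject(
    "image format specification",
    pdi=[PreservedDataObject("provenance of the format specification")],
)
image = PreservedDataObject("image", rep_info=[format_spec], pdi=[fixity])
```

Here the same structure is applied to the image, to its Rep. Info., and to its PDI, and where each branch stops is a choice made per object rather than something the structure dictates.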

 

I see this as an alternative way to look at the preservation environment of Data Objects being preserved within the Archive.  I see this as a Data Object centered view, rather than an Information Package view.  This PDO model appears as a tool that can be used to look at each Data Object’s environment. Its successive application should lead to the same result as the AIP, but without the AIP’s associated Packaging Information and its Package Description (Figure 4-13).  

 

I find this PDO to be a useful view and assuming this is at least close to what you are seeing, and I have to believe it is as it seems to fit most of what you have been saying from a modeling perspective, I would recommend it be addressed in an Annex as a recursive model to be applied to Data Objects being preserved within the Archive.  It has a ‘bottom up’ aspect (starting with Data Objects) versus a ‘top down’ information aspect, and  that could be useful in the auditing efforts.  However I don’t see it as a replacement for the current models and approach which is why I recommend it be in an Annex.

 

Below I have just a few minor comments on your comments.











On Jun 5, 2018, at 2:10 PM, David Giaretta <david at giaretta.org> wrote:

 

Just a few corrections where Don has slightly mistaken what I said.

 

" David raises the issue that, well, the high level model view of figure 2-3 doesn’t say how the PDI is preserved"

No I did not say that; I simply asked how PDI is preserved - not referencing any particular figure. It is a general question that we need to ask.

 

I was not intending to put words in your mouth, and thanks for the clarification.  We can consider it a question I might ask.











"David mentioned that he finds most archives are not applying PDI to Representation Information, but from a conceptual model perspective this is no justification as I, for one, have given several examples where it is relevant"

It is true that I said that. The reason for my saying it was to point out that making the change proposed in http://review.oais.info/show_bug.cgi?id=222 would not cause any problems for existing archives, because it describes what they currently do, while at the same time it would allow a cleaner and more consistent model where OAIS concepts apply to any and all information that an OAIS needs to preserve.

 

If my understanding of your view is captured by my PDO model example, then Rep. Info. could have PDI as well. 






@Mark – how does NARA do, say, Fixity of a Content Information Object, in particular how does it deal with the Representation Information?

@Bob – ditto

@Steve - ditto

 

One thought - many things have multiple roles or labels. My dog Monty is at the same time a dog, a mammal, a friend and a living thing; in different contexts I know that any living thing needs food to continue, so the same applies to him. We use the same idea in OAIS where the OAIS can also be a Producer, and a member of a Designated Community can also be a Producer. The crux of my argument is that by simply re-using the term Content Information one can immediately re-use all the OAIS concepts. 

 

Don says

" They are clearly distinct categories of information with likely different sources and with the PDI being subservient to the Content Information."

That is true. But the same can be said of many pieces of information, even that supplied by external Producers, for example a CAD design for an aircraft and the test results for various of its components; the latter would probably not exist without the former. Should we make a distinction about different sources and subservience, i.e. one would not exist without the other and so they have to be dealt with (in this case preserved) differently, or should we just say that the reference model is good for everything?

 

I struggled to understand your last sentence until I came up with the PDO view above. It now seems to me that making these distinctions could be viewed as taking away from the perspective of being able to use a PDO like model in a recursive manner. I think the Preserved Data Object (PDO) view  is useful in addition to the current view, as I’ve stated above.

 

My point in making the statement, which you acknowledge is true, is that people think of them differently and deal with them differently in terms of the amount and types of effort applied.   In practice the different types we’ve identified will often get managed by the Archive in different ways.  For example, some Content Data is very large while Representation Information tends to be very much smaller, and this is managed in different ways.  I would say this is a different dimension to the preservation discussion from that involving the recursive use of a model focused on Data Objects. 






 

Also

" In fact, we should always try to avoid reusing the same terminology in different contexts where the meaning is different to some degree "

Surely not true! One of the key ideas of the OAIS Information Model was to identify the commonalities between different concepts, in particular identifying Fixity, Reference, Provenance etc. all as Information Objects, so that we did not need to invent a plethora of nearly identical terms such as “secondary PDI”, which is also why we did not introduce the term “secondary Representation Information”.

 

This response seems to miss my point. The terms you mention are defined to be as unambiguous  as we can make them. When we use them, we want them to mean precisely the same thing wherever they are used. Your reference to ‘secondary PDI’, for example, is simply  a way of talking one layer of abstraction down from the PDI when PDI is acknowledged to be recursive.   The same is true for Representation Information.  These are not examples in contradiction to my statement.  So I believe the statement is true.  Is there a case where we use the same OAIS concept in different places but mean something slightly different as to what the concept is? 






 

He also says

" This would be shown as a recursive relationship, but all still within the same AIP.  We could include a diagram showing this recursion of PDI on PDI."

Why show an incomplete view? Should we show all the individual components of PDI? Why not simply re-use the existing concepts cleanly?

 

I believe my PDO model example is at the heart of your concept, if not how you see instantiating it.   If not, then I’m at a loss to understand.

 

Cheers-

Don






Regards

 

..David

 

-----Original Message-----
From: MOIMS-DAI <moims-dai-bounces at mailman.ccsds.org> On Behalf Of D or C Sawyer
Sent: 05 June 2018 18:14
To: MOIMS DAI List <moims-dai at mailman.ccsds.org>
Subject: [Moims-dai] Todays Telecon: PDI on PDI

 

Dear All,

 

This is my third attempt to try to make my points clear and I think I’m improving.  You will be the judge.

 

——

OAIS provides the view, clearly shown in Figure 2-3, of the Producer role submitting SIPs to the Archive.  In figure 2-4 the SIPs are used to create the Content Information, and as Mark notes, then the PDI for that Content Information can be obtained/created and associated.  In this sense, the PDI is supplementary to the Content Information and would not exist but for the Content Information.  The result is an AIP for storage in the Archive. In the above scenario, which is at the heart of OAIS, there is no implication that the PDI described above would be taken to be Content Information.  They are clearly distinct categories of information with likely different sources and with the PDI being subservient to the Content Information.  This is the top level view that sets the context for understanding the role played by the Producer and the relationships among the types of information objects.

 

David raises the issue that, well, the high level model view of figure 2-3 doesn’t say how the PDI is preserved and this is a concern.  But note that it doesn’t say how the Content Information is preserved either.  It simply says they are preserved as part (contained) of the AIP.  So at this high level, they are all understood to be preserved as part of a specific AIP.  I think this is a fair understanding that most people will agree with.

 

Expanding the model of an AIP, we get figure 4-10 which defines an Information Object.  This applies to the Content Information because it is defined to be an Information Object.  We see that it consists of a Data Object and its Representation Information. 

 

Further expanding inside the AIP with figure 4-11 we see the Representation Information detailed as being composed of three types of Representation Information and with a recursion relation that ends at some point (not germane for this discussion). 

 

Still inside the AIP, and further clarifying the nature of PDI and Representation Information, we have figure 4-12 that describes both of them as Information Objects, which means they also have Data Objects and Representation Information.   The result at this point is a set of data objects, all logically inside a single AIP, whose number is limited because the Representation Information recursion is limited. They are preserved by virtue of their logical inclusion inside the AIP.

 

Now the question is asked, since we thought that PDI was important in the preservation of the Content Information, which is the Content Data Object and its Representation Information, why isn’t it also important to associate secondary PDI with the original PDI?  Well the answer is, it might or might not be depending on the Archive and the specific original PDI.  If it is important, how should that be conceived and shown?  Original PDI was associated with the Content Information.  By analogy, secondary PDI would be associated with original PDI.  This would be shown as a recursive relationship, but all still within the same AIP.  We could include a diagram showing this recursion of PDI on PDI.  This recursion ends when the Archive decides that it no longer makes sense to continue it.  It may be different for the different components of PDI, which are shown in table 4-1 and listed in the text.  The PDI diagram could be broken out further with these components, but I’m not sure it is worth it.  Since they are classed as Information Objects, they could also be broken out in the Representation Information dimension but I see no need and this recursion is limited anyway.
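The PDI-on-PDI recursion described above, with the Archive deciding where it ends, might be sketched as follows. This is a minimal sketch under assumptions: the PDI class, the build_pdi function, and the per-component policy dictionary are hypothetical names invented for illustration, and the level limits are arbitrary. The only idea taken from the text is that secondary PDI attaches to original PDI within the same AIP, and that the stopping point may differ per PDI component.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PDI:
    component: str   # PDI component this chain belongs to, e.g. "Fixity"
    level: int       # 1 = original PDI, 2 = secondary PDI, and so on
    pdi: List["PDI"] = field(default_factory=list)  # PDI on this PDI


def build_pdi(component: str, max_levels: Dict[str, int], level: int = 1) -> List[PDI]:
    """Build the PDI chain for one component, stopping at the Archive's limit.

    max_levels is a hypothetical Archive policy: how many levels of PDI
    (original, secondary, ...) are considered worthwhile per component.
    """
    if level > max_levels.get(component, 0):
        # The Archive has decided further PDI on PDI no longer makes sense.
        return []
    node = PDI(component, level)
    node.pdi = build_pdi(component, max_levels, level + 1)
    return [node]


# Arbitrary example policy: no secondary PDI for Fixity, one level for Provenance.
policy = {"Fixity": 1, "Provenance": 2}
aip_pdi = [node for component in policy for node in build_pdi(component, policy)]
```

With this policy the Fixity entry carries no further PDI, while the Provenance entry carries one level of secondary PDI and then stops, all logically inside the same AIP.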

 

The point that I hope is becoming clear is that there is no need to reuse Content Information, with all its implications from the high-level model of an Information Package, and apply it to PDI. This only promotes communication difficulties, as the past two telecons and intervening messages have demonstrated. In fact, we should always try to avoid reusing the same terminology in different contexts where the meaning is different to some degree. That is certainly the case when wanting to apply Content Information to PDI. In short, there is no need.

 

I recommend we put a simple diagram of PDI recursion into Section 4.2 and briefly talk about limiting the recursion by what the Archive finds is practical in each case.

 

As a corollary, there is no need to remove PDI from being applied to Representation Information as is currently being proposed. David mentioned that he finds most archives are not applying PDI to Representation Information, but from a conceptual model perspective this is no justification as I, for one, have given several examples where it is relevant. If one wants to see PDI on PDI made more explicit, then at least the possibility of PDI on Representation Information needs to be retained. There is absolutely no good reason to remove it from the conceptual model now that (I hope!) we’re not trying to apply the AIP to PDI. There are instead very relevant reasons to keep it.

 

Cheers-

Don

 

 

 

 

_______________________________________________

MOIMS-DAI mailing list

MOIMS-DAI at mailman.ccsds.org

https://mailman.ccsds.org/cgi-bin/mailman/listinfo/moims-dai


 
