[Moims-dai] A Critique of draft OAIS RMversion650x0w2x1JGG20181009.doc

John Garrett garrett at his.com
Wed Oct 17 20:47:51 UTC 2018

Comments on your comments below.

From: D or C Sawyer
Sent: Wednesday, October 17, 2018 6:14 AM
Subject: Re: [Moims-dai] A Critique of draft OAIS RMversion650x0w2x1JGG20181009.doc

Hi John,

Thanks for the response.  I do have a couple questions and several comments below.

On Oct 16, 2018, at 7:28 PM, John Garrett <garrett at his.com> wrote:

Hi Don,
Thanks for taking the time and effort to put this critique together.
I must say that I (and I believe most of the DAI WG) agree fully with the vast majority of your points.
Probably the only significant difference I have is with your conclusion about the harm we are doing with the changes we decided to make.
I also think the points we are arguing are pretty detailed niches of our abstract understanding of archives and preserved information.
I’m not saying that we shouldn’t be interested in this, but I do think these distinctions may be beyond the understanding of many users of the OAIS standard.
Let me address your three topics – PDI, AIP fixity, and role of OAIS RM in Auditing.
I’ll skip PDI initially and come back to it.
AIP fixity – I totally agree with the value of adding the concept of fixity for the AIP.  While talking to people over the years we have consistently mentioned this as important.  Additionally our auditing practice has always been to ask about this and even to ask how they know the fixity value for the AIP (or CDO or Content Information) has not changed. Not only do I totally agree with AIP fixity, I would push to go even further.  We should not only have AIP fixity, we do require that we have representation like info for the AIP (we do require that a description of the AIP format exist with sufficient detail that we know how to open it and get to its pieces).
We also require that we have provenance like info for the AIP (we do require that logs be kept of the  changes made to an AIP including when they were made and like to see who made the changes, and why the changes were made, etc.)
We do require reference like information for the AIP (i.e. one or more names or designations to identify it and likely to allow us to retrieve it).
We also should be looking at access rights information for the overall AIP.  As examples, there may be differences in who you will allow to see (or who you are licensed to show) the CDO as opposed to the documents that make up the Representation Information Network, or the logs showing who and when and why changes were made or perhaps alternate fixity information.
So a question now (and that I expect to come back to later) is what objects could or should have representation-like info/Reference-like info/Provenance-like info/Context-like info/Fixity-like info/access rights-like info?  Should it be available for the CDO?  For the RepInfo? For PDI objects? For the AIP as a whole?   My answer is yes to all of these.

When you say ‘should be available’, that is certainly true for an implementation as there is no restriction to say otherwise.  I can argue that some of that is not practical for most archives, but that is a different concern and not relevant at this point.  The question is, when one is using the OAIS terms and concepts, in an unqualified way, what concepts are available?  The proposed revision does not recognize applying PDI components to anything other than the CDO as stated in their definitions. Therefore other applications of PDI are not available using unqualified OAIS terminology.  Additional definitions and discussion would be needed.

JGG: Agree that PDI-like info at all those levels is often not practical.  But it is conceptually possible.
We could add a few sentences someplace or several places in OAIS RM to make it clear that PDI-like info can be applied to other things than CDO.
You argue that the change means you can’t talk about PDI-like information in an unqualified way for anything other than CDO.
Similar argument could be made that you can’t talk about PDI-like information in an unqualified way for anything other than Content Information (including talking about PDI-like information for the CDO).
So it comes down to a decision about which one we prefer to talk about in an unqualified way.
I think it is more important and more useful to be able to talk about PDI related to CDO. 

Role of OAIS RM in Auditing – Again I totally agree with what I see are your points.  OAIS should develop the concepts.  Auditing should test whatever it decides to test, possibly based on the concepts and frameworks in OAIS. 
Some theoretical questions.  Can every concept be tested? If concept can’t be tested, is it really correct? If yes, is an untestable concept useful?  Interesting questions to discuss over a beer or two, but should not trouble us here.  Because we certainly could test if the PDI is available whether we define it as being applicable to the CDO or if we define it as being applicable to the Content Information.
So finally back to PDI -  I totally agree with the value and importance of all the PDI (and the value and importance of RepInfo).
Now we are back to the question of what content it should be applied to.  Should it be available for the CDO?  For the RepInfo? For PDI objects? For the AIP as a whole?   My answer is yes to all of these.
Perhaps this point should have been emphasized more in OAIS that PDI-like information is important at all those levels.

You suggest it should be emphasized more, but the definitions of PDI rule it out. The resulting information model is very clear on this.  An implementer could, in communicating with others, state that although Provenance is (in the revision) only defined with respect to the CDO, our particular archive also applies it to our RepInfo.  This would be using the revised OAIS in a qualified way - talking about the reality of an Archive by extending the revised OAIS concepts.  

So if you think PDI should be applicable to RepInfo, why have you supported removing this from the OAIS formal model?  In the current OAIS, it was a given of the AIP.  With the revision it has to be addressed as an extension.  This is of course true for all the PDI components which are now not associated with the RepInfo.  I have a major objection to this. As for your view that you think PDI ‘should be available’ to PDI objects, it is hard to discern your context. Any Archive can do this to whatever extent it wants.  Auditors could insist on it, although I would not recommend it (apart from fixity in all cases and provenance sometimes; I use lower case because Fixity and Provenance are only defined with respect to the CDO in the revision and with respect to the Content Information in the current publication). When I think about it in detail I come to the conclusion that PDI on PDI is generally not worth the effort - but that is a judgement call and some Archives would certainly reach other conclusions.  The OAIS document does not address this possibility (unless I missed it in the text somewhere) and it is certainly not included in the AIP model.  (David has argued that if one thinks of the PDI components, or any component, as Content Information, or CDOs, then one could argue that they would have associated PDI. But OAIS does not conflate Content Information with PDI components or RepInfo components  and if it did it would make clear communication a nightmare.)

I think a useful way to see the issue is to consider how an implementor would use the OAIS terms and concepts in a written document to describe an implementation to others familiar with the OAIS terms and concepts.  All aspects not covered by the OAIS, as commonly understood, would need to be augmented by additional terminology and discussion. With the revision, that includes any application of PDI to other than the CDO. To be really clear, using the proposed revision, this should include reserving upper case Provenance, for example, only when talking about the CDO.  Lower case provenance is not defined in OAIS but this would need to be covered in the expanded discussion of its application to RepInfo, for example.  Of course this is not needed with the current version of OAIS because, as noted, PDI application to RepInfo has long been a covered concept.

JGG: Again it comes down to a choice of what we want to use in an unqualified way.  By your arguments, we currently cannot talk about Fixity Information for the CDO, since the term only applies to Content Information (as a whole).  And according to your argument in the current state, we cannot talk about Reference Information for AIP since it is something other than the Content Information.    So in either case to be strictly correct we need to come up with appropriate conventions to talk about applying PDI to other things.  One way is to use the “PDI-like” construct as I have been doing.  Another thing is that in the current issue, we were fairly lax, for example using “Fixity” in many places rather than saying “Fixity Information”.  I’ve tried to ensure that in all cases in the draft issue we use “Fixity Information” when using it as a term and use the term only when talking about the CDO.  I think you could then also use “Fixity Information-like” or fixity (lower case) to talk about fixity for other things that should also have fixity like the AIP.

 We could work on a whole expansion of all these terms, but that needs to be moved to the next update (although we could start it immediately).  We need to finish this issue and get it out the door.  We are already almost 2 years past when we were supposed to finish the update.  Not having a new issue with the other updates is much more of a problem for community.

Now to another “How many angels can fit on the point of a pin” type of question.  What is a more catastrophic loss, the loss of PDI information objects, the loss of RepInfo, or the loss of the CDO?  In my personal view, the loss of the CDO is the worst.  My reasoning is that once that is lost, there is no hope of ever getting the Information back.  If RepInfo is lost, in my view, it is possible (though not necessarily likely) that the information can be recovered (but process to recover it may be very costly or may involve incredible luck).  But it is possible.  We did discover/redevelop the “RepInfo” for the Rosetta Stone after it was lost/forgotten.  It was more important for the Rosetta Stone itself to not be destroyed than the RepInfo that detailed the fact that it contained the same text in different languages.

I fail to see a context in which the relative importance of CDO and RepInfo is significant. The integrity of the subject Information requires the integrity of the CDO and the integrity of the RepInfo. This is clearly supported by the Information modeling of the current OAIS but not in the proposed revision.  Actual implementations would likely vary widely in their approach to the integrity of these 2 components and a lot would depend on the nature of the Content Information and its RepInfo. It would surely be irresponsible for an Archive to base its approach to the integrity of these components on an assumption about the ease of doing ‘archeology’ on the CDO to try to recover the proposed RepInfo.

JGG: Agree both levels are important.  

So all this results in acknowledging that PDI-like info is valuable at all levels.

But it is not equally valuable at all levels or the recursion would be infinite (e.g., PDI on PDI on PDI etc).  So the reality is PDI like info should be applied as is practical. This is a view from the perspective of actual implementations. 

JGG: Agree.  I also think actual implementations focus PDI on the CDO.

So the discord comes down to just what level of content we define the terms at.
I personally don’t feel defining the PDI terms to refer to any particular level automatically makes application of those same concepts at any other level less important.

However this is not about personal feelings but about  the conceptual view that people will understand from reading the document, the issues that are presented as significant, and ultimately their ability to use those terms and concepts in a way that is consistent with the understanding that  others will also derive from the document.

JGG: I agree that it is not about my personal feelings.  I was attempting to express my belief about perceptions of the OAIS RM community.  

If we go back to the Report 
content:  It consists of multiple levels of abstraction. At the lowest level of abstraction, digital information objects consist of strings of 0’s and 1’s. At higher levels of abstraction there are the issues of characters and higher organizations of layout and structure, and ultimately including the knowledge or ideas they contain.  
We could apply PDI-like info at the lowest level of abstraction mentioned, “strings of 0’s and 1’s”, which would be roughly the same as applying PDI-like info to the CDO.
We could apply PDI-like info at the higher level of abstraction mentioned, “the issues of characters” or “higher organizations of layout and structure”, either or both of which would be roughly the same as applying PDI-like info to Content Info.
We could apply PDI-like info at the highest mentioned level of abstraction, “the knowledge or ideas they contain”, which would be roughly the same as applying PDI-like info to perhaps the highest possible conception of Content Info or something even beyond Content Info.
Defining PDI to refer to any of these levels does not make the PDI-like info any more or less valuable or make the item that the PDI-like info was applied to any more or less valuable.
Originally our definition of PDI elements made them applicable to Content Information.
We’ve changed that to make our definition of PDI elements applicable to the CDO only instead.
Making this change does not decrease the inherent value of the Content Info or the RepInfo portion of the Content Info.  In my view, it is merely a definitional change that has limited effect.

It removes RepInfo from the integrity provided by the PDI components, as a concept. It will suggest to readers that the integrity of the RepInfo is not something to be concerned about.  In contrast, the explicit formalism of RepInfo as a concept is something we all feel is very important.  As I’ve said before, I find downgrading its integrity at this point to be a huge step backward.

JGG: Again, we are not attempting to lessen the need for integrity of the RepInfo by the redefinition as applicable to the CDO.  If that were true, the current definition of PDI as applicable to the Content Information suggests  to the reader that the integrity of the AIP is not something to be concerned about.

There is value in not making the change as it maintains more consistency from issue to issue.
However there is also value in making the change.  
Some of those advantages are:
• Most people (In my personal view) already think of PDI primarily in terms of the CDO.  I think in general only those at least moderately well versed in OAIS really understand the distinction between CDO and Content Info.
• We’ve already established the idea of applying RepInfo-like and PDI-like information to many different objects.  In my personal view explaining the importance of this is eased by changing the definition to the content data portion of the object.
• Although this should not be a major motivator, I do think it eases auditing.
• I think most repositories now tend to apply PDI more often to the CDO than to the Content Info.  And although this should not be a controlling concern, it should still count for something.  And even when current repositories do apply the concept to the Content Info, they usually apply PDI individually to the CDO and then separately to each individual RepInfo network component.  Again in my view having the PDI definition applicable to just the CDO maps more efficiently and clearly to this practice.
So to conclude, I believe either conceptual construction is equally valid and can be defended.
So it comes down to what in our collective experience do we feel is the best fit for how we use the OAIS suite of standards and how best to persuade the OAIS Standard user community to move in the best direction for long-term preservation.
It basically comes down to a decision that we collectively need to make.  
In this case it comes down to an either-or decision about how we define the term.  However as I often try to do, I try to view things not so much as an either-or but rather see things as a both-and.  I do want to make the change to define the PDI terms in respect to the CDO, and I also want to emphasize the importance of PDI-like information at many different levels.  I think I will find the use of PDI terms in relation to the CDO will make it easier to discuss the need for PDI-like info at both the CDO and the Content Info level (and indeed at many/all other levels).  I think the change in definition will make it easier to emphasize the importance of the PDI-like info at both/all levels.

I will only add a couple points to the rest of the comments above.  What some implementations are doing can hardly be considered the basis for what they should be doing and what auditors should be addressing. There should be no reason that the auditing requirements have to be tied to all the details of the OAIS conceptual model. The model is conceptual and for clear communication and it has not been developed as an implementation architecture.   Using limited auditing experience to constrain a very valid preservation concept does not seem wise and is not something I can support.  It suggests an effort, conscious or not, to turn it into an implementation architecture. 

JGG: I agree that what implementations are doing should not be the only factor in what is conceptualized.  However, it could be considered as a factor, especially since it provides a clue to how implementers understand the concepts.

I can accept either way the group decides it would like to go with the definitions. As discussed, I think the definition in regards to the CDO would be better conceptually and would be more useful to the community (and to the auditor portion of the community).

;^) And personally, I would not like to have to back out all the changes that I already made to implement the decision the group previously made to make the change.
Best Wishes,



I hope this helped to clarify my/our intentions with this change.

Peace and joy,
From: D or C Sawyer
Sent: Tuesday, October 16, 2018 7:53 AM
Subject: [Moims-dai] A Critique of draft OAIS RM version650x0w2x1JGG20181009.doc
Dear All,
In this critique I address three topics for consideration: Preservation Description Information, AIP fixity, and the Role of OAIS RM in Auditing.
Preservation Description Information:
The most recent version of the draft revised OAIS RM still has the problem that it has removed a long standing preservation concept despite the concept continuing to be of critical importance.  This concept involves the roles that Fixity, Reference, Provenance, Context, and Rights Information (collectively called Preservation Description Information) play in supporting the long term preservation of the Content Information, including both the Content Data Object and the Representation Information. The removal of Representation Information from this concept in the current draft revision is an extraordinary development that is a major deviation not only from the original and long standing information modeling in OAIS, but also from the prior seminal 1996 report “Preserving Digital Information” by the 21 member Task Force on Archiving of Digital Information co-chaired by Don Waters and John Garrett (a different John Garrett than our CCSDS colleague). The report can be found at:  
The terminology and information object concepts presented in the above report (hereafter Report) played a major role in the development of our more formalized AIP concept.  I believe it is useful to briefly review those information object concepts from the Report and to compare them with the original AIP concept.  Key information objects from the Report, along with some highlights discussing them that I’ve extracted in very brief summaries, are given in italics below:
content:  It consists of multiple levels of abstraction. At the lowest level of abstraction, digital information objects consist of strings of 0’s and 1’s. At higher levels of abstraction there are the issues of characters and higher organizations of layout and structure, and ultimately including the knowledge or ideas they contain.  
From this, and our collective experience, we formalized the concept of Content Information, in the digital case, as consisting of the digital Content Data Object (CDO) and its Representation Information (RepInfo). An understanding of the RepInfo and its application to the CDO is necessary to unlock the information inherent in the bits of the CDO. Corruption of bits in the CDO or corruption of bits or of understanding of the RepInfo results in corruption of the Content Information.
fixity: Addresses how the content is fixed as a discrete object.
From this, and our collective experience, we formalized Fixity Information as a mechanism applicable to the Content Information. Since the CDO and RepInfo may each be composed of many separable digital objects, the concept implies that they all need the application of Fixity.
reference: Information objects must have a consistent source of reference. One must be able to locate it definitely and reliably over time. URLs and URNs are some examples given.
From this, and our collective experience, we formalized the concept of Reference Information to be an identifier of the Content Information. Thus there may be any number of actual identifiers depending on the nature of the CDO and the RepInfo.
provenance: Provenance has become one of the central organizing concepts of modern archival science.  The assumption underlying the principle of provenance is that the integrity of an information object is partly embodied in tracing from where it came.  Digital archives must preserve a record of its origin and chain of custody, including within the archive itself.  They note that the archival concern with provenance is intimately related to the notion of context as a matter of information integrity.
From this, and our collective experience, we formalized the concept of Provenance Information as the information that documents the source and history of the Content Information.  This is fully consistent with the traditional use of Provenance with non-digital materials and reflects the fact that both the digital CDO and the RepInfo are essential components whose source and history are equally relevant to assessing the authenticity of the resulting Content Information.
context:  This addresses the ways in which information objects interact with elements in the wider digital environment. The Report sees it as involving 4 dimensions: technical, linkage to other objects, communication, and a wider social dimension.  The technical dimension is about hardware and software dependencies, the linkage dimension is concerned with information objects that have links to other objects and how to preserve this, the communications dimension is about how communication network features, such as physical media and bandwidth, will affect features of the digital objects, and finally the social dimension is about the purpose of the information objects such as whether they are intended, for example, for informal or formal communication of information.
From this, and our collective experience, we formalized the concept of Context Information as the Information that documents the relationships of the Content Information to its environment. While this is very broad I believe we have treated their technical dimension using the perspectives of the RepInfo and hardware/software to display or otherwise present the Information.  The linkage issue has been treated from the perspective of needing to clearly define the CDO and then the RepInfo so that the Content Information is clearly defined. I believe their issues of communications have been viewed in OAIS as external factors not directly affecting the Archive’s ability to preserve identified Content Information.  Their wider social dimension seems most relevant as this understanding, or lack thereof, could significantly affect an understanding of the intent of the information.
The Report refers to this supporting information, collectively, as providing integrity to the preservation of the content information objects, and in our modeling that clearly includes the RepInfo.  As OAIS is a conceptual model for the purpose of communication, there is no basis for removing RepInfo from the concept of needing the supporting information discussed above unless it is unimplementable, which surely is not the case unless one demands a perfect, fool proof, implementation.  Its removal leaves a clear integrity hole in the OAIS concepts. Therefore I believe it would be wise to fill this hole by returning to the original concept with Preservation Description Information applicable to the Content Information.
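To make the scoping question concrete, here is a minimal sketch, not drawn from either version of the OAIS RM, which is conceptual and prescribes no implementation. It assumes SHA-256 as the fixity mechanism and uses hypothetical byte strings for the CDO and its RepInfo. It only shows what a fixity value covers when it is scoped to the Content Information versus the CDO alone.

```python
import hashlib


def digest(*parts: bytes) -> str:
    """SHA-256 over length-prefixed parts, so part boundaries are unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


# Hypothetical stand-ins for illustration only.
cdo = b"\x00\x01\x02\x03"                   # Content Data Object bits
rep_info = b"four unsigned 8-bit integers"  # Representation Information

# Fixity recorded at ingest under each scoping.
fixity_content_info = digest(cdo, rep_info)  # scoped to Content Information
fixity_cdo_only = digest(cdo)                # scoped to the CDO alone

# Later the RepInfo is corrupted while the CDO bits are untouched.
damaged_rep_info = b"four signed 8-bit integers"

cdo_scope_detects = digest(cdo) != fixity_cdo_only
content_info_scope_detects = digest(cdo, damaged_rep_info) != fixity_content_info
```

In this sketch the CDO-scoped value is unchanged by the RepInfo damage, while the Content Information-scoped value no longer matches. An archive can of course keep a separate digest for its RepInfo under either definition; the sketch only illustrates what the unqualified term would cover.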
AIP Fixity:
The current and the draft revision of the OAIS RM do not address the concept of applying fixity mechanisms to the AIP itself. I recommend augmenting the information modeling to recognize the utility of applying fixity mechanisms to the AIP as a whole, but not including the associated Descriptive Information. By inheritance this would facilitate communication about applying fixity to the Packaging Information, and to the components in the AIP that do not currently have associated fixity: Provenance Information, Reference Information, Context Information, Rights Information, and even to Fixity Information. I believe this would be widely understood as a conceptual improvement in the integrity of the AIP.
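As a rough illustration of AIP-level fixity, the sketch below computes one digest over every component of a package except the Descriptive Information. The component names, the dictionary layout, and the choice of SHA-256 are all assumptions made for the example; neither the current nor the draft OAIS RM defines such a structure.

```python
import hashlib

# Hypothetical key for the component left outside AIP-level fixity.
EXCLUDED = {"descriptive_information"}


def _update(h, data: bytes) -> None:
    """Feed one length-prefixed field into the hash."""
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def aip_fixity(components: dict) -> str:
    """Digest over all AIP components except the excluded ones.

    Components are visited in sorted name order so the result does not
    depend on the order in which they were inserted.
    """
    h = hashlib.sha256()
    for name in sorted(components):
        if name in EXCLUDED:
            continue
        _update(h, name.encode("utf-8"))
        _update(h, components[name])
    return h.hexdigest()


# Hypothetical package contents, for illustration only.
aip = {
    "content_data_object": b"\x00\x01\x02\x03",
    "representation_information": b"four unsigned 8-bit integers",
    "provenance_information": b"ingested 2018-10-16 from producer X",
    "reference_information": b"id-0001",
    "context_information": b"part of collection Y",
    "rights_information": b"public",
    "fixity_information": b"sha256 of the CDO",
    "packaging_information": b"tar, one file per component",
    "descriptive_information": b"catalogue entry text",
}

baseline = aip_fixity(aip)

# A change to Provenance Information is caught by the AIP-level digest.
changed_provenance = dict(aip, provenance_information=b"tampered")
# A change to Descriptive Information is outside its scope.
changed_description = dict(aip, descriptive_information=b"new catalogue text")
```

Here `aip_fixity(changed_provenance)` differs from `baseline`, while `aip_fixity(changed_description)` does not, which matches the recommendation that the Descriptive Information be left out of AIP-level fixity.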
Role of OAIS RM in Auditing:
The advent of the ISO auditing process quite naturally makes use of the concepts and terminology in the OAIS RM.  However it seems clear the proper place to address constraints on Archive implementations is in auditing documents, not in a conceptual communication model such as the OAIS RM.  I believe the OAIS RM should continue as a conceptual framework by which to discuss preservation, and particularly digital preservation, issues and implementations that are seen to be of general interest. It can not be expected to address the details of specific implementations but should facilitate communication by allowing the context of a discussion to be narrowed.  
For example, consider an Archive that will preserve a number of historical works of fiction expressed in a standard format such as PDF/A. Such an Archive may reasonably argue, for some such works, that it does not need to maintain additional Context Information for these documents because the Provenance Information already provides sufficient context (Provenance is a particular type of Context as the Report also recognized) and Consumers might easily find additional Context Information in the historical record, should it be desired. I believe an auditor would be wise to consider such a statement as a possible exception to providing explicit Context Information.  In this example, a case is made that the auditing requirements should not be rigidly tied to every aspect of the conceptual communications framework.  Some flexibility is needed to meet the vast range of implementation circumstances.
Further I believe the best way to ensure a robust future for the auditing effort is to ensure that each auditing requirement make clear what intent is to be achieved and that it be one that the Archive, and therefore the auditor, can see has value.  