[Moims-dai] A Critique of draft OAIS RM version650x0w2x1JGG20181009.doc
John Garrett
garrett at his.com
Tue Oct 16 23:28:25 UTC 2018
Hi Don,
Thanks for taking the time and effort to put this critique together.
I must say that I (and I believe most of the DAI WG) agree fully with the vast majority of your points.
Probably the only significant difference I have is with your conclusion about the harm we are doing with the changes we decided to make.
I also think the points we are arguing are pretty detailed niches of our abstract understanding of archives and preserved information.
I’m not saying that we shouldn’t be interested in this, but I do think these distinctions may be beyond the understanding of many users of the OAIS standard.
Let me address your three topics – PDI, AIP fixity, and role of OAIS RM in Auditing.
I’ll skip PDI initially and come back to it.
AIP fixity – I totally agree with the value of adding the concept of fixity for the AIP. While talking to people over the years we have consistently mentioned this as important. Additionally, our auditing practice has always been to ask about this, and even to ask how they know the fixity value for the AIP (or CDO or Content Information) has not changed. Not only do I totally agree with AIP fixity, I would push to go even further. We should not only have AIP fixity; we also require representation-like info for the AIP (we do require that a description of the AIP format exist with sufficient detail that we know how to open it and get to its pieces).
We also require that we have provenance-like info for the AIP (we do require that logs be kept of the changes made to an AIP, including when they were made, and we like to see who made the changes, why the changes were made, etc.).
We do require reference-like information for the AIP (i.e., one or more names or designations that identify it and that we can likely use to retrieve it).
We also should be looking at access rights information for the overall AIP. As examples, there may be differences in who you will allow to see (or who you are licensed to show) the CDO as opposed to the documents that make up the Representation Information Network, or the logs showing who and when and why changes were made, or perhaps alternate fixity information.
So a question now (one that I expect to come back to later) is what objects could or should have representation-like info/reference-like info/provenance-like info/context-like info/fixity-like info/access rights-like info? Should it be available for the CDO? For the RepInfo? For PDI objects? For the AIP as a whole? My answer is yes to all of these.
Role of OAIS RM in Auditing – Again I totally agree with what I see as your points. OAIS should develop the concepts. Auditing should test whatever it decides to test, possibly based on the concepts and frameworks in OAIS.
Some theoretical questions. Can every concept be tested? If a concept can’t be tested, is it really correct? If yes, is an untestable concept useful? Interesting questions to discuss over a beer or two, but they should not trouble us here, because we certainly could test whether the PDI is available whether we define it as being applicable to the CDO or as being applicable to the Content Information.
So finally back to PDI - I totally agree with the value and importance of all the PDI (and the value and importance of RepInfo).
Now we are back to the question of what content it should be applied to. Should it be available for the CDO? For the RepInfo? For PDI objects? For the AIP as a whole? My answer is yes to all of these.
Perhaps OAIS should have emphasized more strongly that PDI-like information is important at all those levels.
Now to another “How many angels can fit on the point of a pin” type of question. What is a more catastrophic loss, the loss of PDI information objects, the loss of RepInfo, or the loss of the CDO? In my personal view, the loss of the CDO is the worst. My reasoning is that once that is lost, there is no hope of ever getting the Information back. If RepInfo is lost, in my view, it is possible (though not necessarily likely) that the information can be recovered (but the process to recover it may be very costly or may involve incredible luck). But it is possible. We did discover/redevelop the “RepInfo” for the Rosetta Stone after it was lost/forgotten. It was more important for the Rosetta Stone itself to not be destroyed than the RepInfo that detailed the fact that it contained the same text in different languages.
So all this results in acknowledging that PDI-like info is valuable at all levels.
So the discord comes down to just what level of content we define the terms at.
I personally don’t feel defining the PDI terms to refer to any particular level automatically makes application of those same concepts at any other level less important.
If we go back to the Report:
content: It consists of multiple levels of abstraction. At the lowest level of abstraction, digital information objects consist of strings of 0’s and 1’s. At higher levels of abstraction there are the issues of characters and higher organizations of layout and structure, and ultimately including the knowledge or ideas they contain.
We could apply PDI-like info at the lowest level of abstraction mentioned, “strings of 0’s and 1’s”, which would be roughly the same as applying PDI-like info to the CDO.
We could apply PDI-like info at the higher level of abstraction mentioned, “the issues of characters” or “higher organizations of layout and structure”, either or both of which would be roughly the same as applying PDI-like info to Content Info.
We could apply PDI-like info at the highest mentioned level of abstraction, “the knowledge or ideas they contain”, which would be roughly the same as applying PDI-like info to perhaps the highest possible conception of Content Info or something even beyond Content Info.
Defining PDI to refer to any of these levels does not make the PDI-like info any more or less valuable, nor does it make the item that the PDI-like info is applied to any more or less valuable.
Originally our definition of PDI elements made them applicable to Content Information.
We’ve changed that to make our definition of PDI elements applicable to the CDO only instead.
Making this change does not decrease the inherent value of the Content Info or the RepInfo portion of the Content Info. In my view, it is merely a definitional change that has limited effect.
There is value in not making the change, as it maintains more consistency from issue to issue.
However there is also value in making the change.
Some of those advantages are:
• Most people (in my personal view) already think of PDI primarily in terms of the CDO. I think in general only those at least moderately well versed in OAIS really understand the distinction between CDO and Content Info.
• We’ve already established the idea of applying RepInfo-like and PDI-like information to many different objects. In my personal view, explaining the importance of this is eased by changing the definition to the content data portion of the object.
• Although this should not be a major motivator, I do think it eases auditing.
• I think most repositories now tend to apply PDI more often to the CDO than to the Content Info. And although this should not be a controlling concern, it should still count for something. And even when current repositories do apply the concept to the Content Info, they usually apply PDI individually to the CDO and then separately to each individual RepInfo network component. Again in my view having the PDI definition applicable to just the CDO maps more efficiently and clearly to this practice.
So to conclude, I believe either conceptual construction is equally valid and can be defended.
So it comes down to what in our collective experience do we feel is the best fit for how we use the OAIS suite of standards and how best to persuade the OAIS Standard user community to move in the best direction for long-term preservation.
It basically comes down to a decision that we collectively need to make.
In this case it comes down to an either-or decision about how we define the term. However, as I often try to do, I view things not so much as an either-or but rather as a both-and. I do want to make the change to define the PDI terms in respect to the CDO, and I also want to emphasize the importance of PDI-like information at many different levels. I think I will find that the use of PDI terms in relation to the CDO makes it easier to discuss the need for PDI-like info at both the CDO and the Content Info level (and indeed at many/all other levels). I think the change in definition will make it easier to emphasize the importance of the PDI-like info at both/all levels.
I hope this helped to clarify my/our intentions with this change.
Peace and joy,
-John
From: D or C Sawyer
Sent: Tuesday, October 16, 2018 7:53 AM
To: MOIMS DAI List
Subject: [Moims-dai] A Critique of draft OAIS RM version650x0w2x1JGG20181009.doc
Dear All,
In this critique I address three topics for consideration: Preservation Description Information, AIP fixity, and the Role of OAIS RM in Auditing.
Preservation Description Information:
The most recent version of the draft revised OAIS RM still has the problem that it has removed a long-standing preservation concept despite the concept continuing to be of critical importance. This concept involves the roles that Fixity, Reference, Provenance, Context, and Rights Information (collectively called Preservation Description Information) play in supporting the long-term preservation of the Content Information, including both the Content Data Object and the Representation Information. The removal of Representation Information from this concept in the current draft revision is an extraordinary development that is a major deviation not only from the original and long-standing information modeling in OAIS, but also from the prior seminal 1996 report “Preserving Digital Information” by the 21 member Task Force on Archiving of Digital Information co-chaired by Don Waters and John Garrett (a different John Garrett than our CCSDS colleague). The report can be found at:
https://www.oclc.org/content/dam/research/activities/digpresstudy/final-report.pdf
The terminology and information object concepts presented in the above report (hereafter Report) played a major role in the development of our more formalized AIP concept. I believe it is useful to briefly review those information object concepts from the Report and to compare them with the original AIP concept. Key information objects from the Report, along with some highlights discussing them that I’ve extracted in very brief summaries, are given in italics below:
content: It consists of multiple levels of abstraction. At the lowest level of abstraction, digital information objects consist of strings of 0’s and 1’s. At higher levels of abstraction there are the issues of characters and higher organizations of layout and structure, and ultimately including the knowledge or ideas they contain.
From this, and our collective experience, we formalized the concept of Content Information, in the digital case, as consisting of the digital Content Data Object (CDO) and its Representation Information (RepInfo). An understanding of the RepInfo and its application to the CDO is necessary to unlock the information inherent in the bits of the CDO. Corruption of bits in the CDO or corruption of bits or of understanding of the RepInfo results in corruption of the Content Information.
fixity: Addresses how the content is fixed as a discrete object.
From this, and our collective experience, we formalized Fixity Information as a mechanism applicable to the Content Information. Since the CDO and RepInfo may each be composed of many separable digital objects, the concept implies that they all need the application of Fixity.
reference: Information objects must have a consistent source of reference. One must be able to locate it definitely and reliably over time. URLs and URNs are some examples given.
From this, and our collective experience, we formalized the concept of Reference Information to be an identifier of the Content Information. Thus there may be any number of actual identifiers depending on the nature of the CDO and the RepInfo.
provenance: Provenance has become one of the central organizing concepts of modern archival science. The assumption underlying the principle of provenance is that the integrity of an information object is partly embodied in tracing from where it came. Digital archives must preserve a record of its origin and chain of custody, including within the archive itself. They note that the archival concern with provenance is intimately related to the notion of context as a matter of information integrity.
From this, and our collective experience, we formalized the concept of Provenance Information as the information that documents the source and history of the Content Information. This is fully consistent with the traditional use of Provenance with non-digital materials and reflects the fact that both the digital CDO and the RepInfo are essential components whose source and history are equally relevant to assessing the authenticity of the resulting Content Information.
context: This addresses the ways in which information objects interact with elements in the wider digital environment. The Report sees it as involving 4 dimensions: technical, linkage to other objects, communication, and a wider social dimension. The technical dimension is about hardware and software dependencies, the linkage dimension is concerned with information objects that have links to other objects and how to preserve this, the communications dimension is about how communication network features, such as physical media and bandwidth, will affect features of the digital objects, and finally the social dimension is about the purpose of the information objects such as whether they are intended, for example, for informal or formal communication of information.
From this, and our collective experience, we formalized the concept of Context Information as the Information that documents the relationships of the Content Information to its environment. While this is very broad, I believe we have treated their technical dimension using the perspectives of the RepInfo and hardware/software to display or otherwise present the Information. The linkage issue has been treated from the perspective of needing to clearly define the CDO and then the RepInfo so that the Content Information is clearly defined. I believe their issues of communications have been viewed in OAIS as external factors not directly affecting the Archive’s ability to preserve identified Content Information. Their wider social dimension seems most relevant, as this understanding, or lack thereof, could significantly affect an understanding of the intent of the information.
The Report refers to this supporting information, collectively, as providing integrity to the preservation of the content information objects, and in our modeling that clearly includes the RepInfo. As OAIS is a conceptual model for the purpose of communication, there is no basis for removing RepInfo from the concept of needing the supporting information discussed above unless it is unimplementable, which surely is not the case unless one demands a perfect, foolproof implementation. Its removal leaves a clear integrity hole in the OAIS concepts. Therefore I believe it would be wise to fill this hole by returning to the original concept with Preservation Description Information applicable to the Content Information.
AIP Fixity:
The current and the draft revision of the OAIS RM do not address the concept of applying fixity mechanisms to the AIP itself. I recommend augmenting the information modeling to recognize the utility of applying fixity mechanisms to the AIP as a whole, but not including the associated Descriptive Information. By inheritance this would facilitate communication about applying fixity to the Packaging Information, and to the components in the AIP that do not currently have associated fixity: Provenance Information, Reference Information, Context Information, Rights Information, and even to Fixity Information. I believe this would be widely understood as a conceptual improvement in the integrity of the AIP.
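To make the idea concrete, here is a minimal sketch in Python of one way fixity for the AIP as a whole might be computed. It is only an illustration under stated assumptions, not something the OAIS RM prescribes: the component labels, the choice of SHA-256, and the manifest-of-digests approach are all hypothetical.

```python
import hashlib

# Hypothetical label for the component left outside AIP-level fixity.
# The OAIS RM does not define these names or this layout.
EXCLUDED = {"descriptive_information"}


def component_digest(data: bytes) -> str:
    """Fixity value for one component (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def aip_fixity(components: dict[str, bytes]) -> tuple[dict[str, str], str]:
    """Return per-component digests and a single digest for the AIP as a whole."""
    manifest = {
        name: component_digest(data)
        for name, data in sorted(components.items())
        if name not in EXCLUDED
    }
    # The AIP-level value is a digest over the sorted manifest lines, so a
    # change to any covered component changes the AIP-level value as well.
    lines = "".join(f"{name}:{digest}\n" for name, digest in manifest.items())
    return manifest, hashlib.sha256(lines.encode("utf-8")).hexdigest()


# Illustrative AIP with made-up contents.
example_aip = {
    "content_data_object": b"0101...",
    "representation_information": b"format description",
    "provenance_information": b"ingested 2018-10-16",
    "packaging_information": b"tar layout",
    "descriptive_information": b"catalogue record",
}
example_manifest, example_aip_digest = aip_fixity(example_aip)
```

In this sketch a change to the Provenance Information or Packaging Information alters the AIP-level digest, while a change to the Descriptive Information does not, which mirrors the scope recommended above.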
Role of OAIS RM in Auditing:
The advent of the ISO auditing process quite naturally makes use of the concepts and terminology in the OAIS RM. However, it seems clear the proper place to address constraints on Archive implementations is in auditing documents, not in a conceptual communication model such as the OAIS RM. I believe the OAIS RM should continue as a conceptual framework by which to discuss preservation, and particularly digital preservation, issues and implementations that are seen to be of general interest. It cannot be expected to address the details of specific implementations but should facilitate communication by allowing the context of a discussion to be narrowed.
For example, consider an Archive that will preserve a number of historical works of fiction expressed in a standard format such as PDF/A. Such an Archive may reasonably argue, for some such works, that it does not need to maintain additional Context Information for these documents because the Provenance Information already provides sufficient context (Provenance is a particular type of Context as the Report also recognized) and Consumers might easily find additional Context Information in the historical record, should it be desired. I believe an auditor would be wise to consider such a statement as a possible exception to providing explicit Context Information. In this example, a case is made that the auditing requirements should not be rigidly tied to every aspect of the conceptual communications framework. Some flexibility is needed to meet the vast range of implementation circumstances.
Further I believe the best way to ensure a robust future for the auditing effort is to ensure that each auditing requirement make clear what intent is to be achieved and that it be one that the Archive, and therefore the auditor, can see has value.