[Moims-dai] Tuesday's marathon DAI telecon
david at giaretta.org
Sun Jun 3 20:10:38 UTC 2018
I think Don has fairly expressed my views - up to a point.
Don's alternative view (1) in a sense misses the major point and (2) makes things unnecessarily complicated by talking about AIPs within AIPs to describe my views.
For point (1) the issue is that in order to know that ANY piece of information is being preserved, following OAIS concepts, one needs to know what the Data Object is and what the associated Representation Information is as a start, and also one needs to know, if one wants to have evidence to support claims of Authenticity:
* How one can be sure the object has not been altered in an undocumented manner – otherwise how can we know what we are preserving – in other words we need Fixity Information
* How was the object created etc – in other words its Provenance
* How to locate the object i.e. Reference
* And so on for the other PDI elements
I hope this is generally agreed.
We would all agree that the NASA image should be in an AIP, which of course is a logical construct. Our divergence is only that they do not want to say that, for example, a piece of Provenance Information is preserved “in” an AIP.
However if, to preserve the Provenance one has the data object, its RepInfo, its Fixity Info etc etc – then I would say that just as, if something walks like a duck and quacks like a duck then it is a duck – in a similar way if we can, as we must be able to, point to the Provenance’s Data Object, RepInfo, Fixity, Reference, etc then we can, if we want to, say we have an AIP.
I prefer to use the term AIP, others might want to invent a new term and write a whole lot of extra text – but why bother?
There is no need for Don’s complicated view of AIPs in AIPs etc. All one needs to do is to ask, for any particular piece of information, do we have all the things required for its preservation? If one does not, then we are missing something vital. If one does have everything needed then one is preserving that piece of information and, if we want to do so, we can say we have an AIP.
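The question "do we have all the things required for its preservation?" can be sketched as a simple completeness check. This is only an illustration: the class name, the field names, the sample values and the list of required components are assumptions made for the example, drawn from the components named in this thread rather than from any normative OAIS list.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative only: the components named in this thread, not a normative OAIS list.
REQUIRED_COMPONENTS = (
    "data_object",
    "representation_info",
    "fixity",
    "provenance",
    "reference",
)


@dataclass
class InformationRecord:
    """Hypothetical record of what an archive holds for one piece of information."""
    name: str
    components: Dict[str, str] = field(default_factory=dict)


def missing_components(record: InformationRecord) -> List[str]:
    """Return the required components that are absent or empty."""
    return [c for c in REQUIRED_COMPONENTS if not record.components.get(c)]


def has_everything_needed(record: InformationRecord) -> bool:
    """True when nothing required for preservation is missing."""
    return not missing_components(record)


# Hypothetical example: a piece of Provenance Information whose own
# provenance has not been recorded yet.
provenance_record = InformationRecord(
    name="Provenance of NASA image",
    components={
        "data_object": "provenance.xml",
        "representation_info": "XML schema plus semantics",
        "fixity": "checksum",
        "reference": "archive identifier",
    },
)
```

Here `missing_components(provenance_record)` reports that `provenance` is absent, so something vital is missing. Once that entry is supplied the check passes, and one can, if one wishes, say there is an AIP.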
One does not have to draw more and more complex diagrams – for the same reason that we do not draw out Representation Information as a Data Object with its own RepInfo, instead we just talk about a Representation Information Network and leave it at that.
Just to repeat, one does not need to talk about AIPs in AIPs to draw my view. All one has to do is to ask how we know we have everything needed to preserve any particular thing.
This view has the added benefit that it will not require any changes, other than perhaps some explanatory text, to ISO 16363 and it allows audits to be conducted without needing new concepts.
While it is true, as Don says, that “Nowhere in the OAIS RM do we say that PDI should be treated as Content Information which would have its own PDI and its own Descriptive Information” – well sure, there are lots of things that OAIS does not explicitly say, otherwise it would be 10,000 pages long. The point is that we called it Preservation Description Information and its components are also pieces of Information. Thus, all the concepts of Information apply, including Information Packages etc.
One error I would like to point out, since this was mentioned in an earlier email and I let it pass then, is that he writes “associate Representation Information (assuming the Data Object is digital)”. This is an incorrect view of the Information Model – just look at Fig 4-10 – the diagram shows that both the Physical Object and the Digital Object are sub-types of Data Object, and anything that is a Data Object has Representation Information. As a specific example, imagine that I write “01001110 01001101 01010001 01001101 01010000 01001010 00100000 00100000” on a piece of paper. This is a physical object which is a Data Object which encodes some information. It needs Representation Information, namely that what I have written is encoded as eight 7-bit ASCII characters etc. This view also, by the way, allows a consistent view of information encoded on other physical media such as tapes, labels on tapes, disks, kanji characters etched on Silicon Carbide plates or images of “1”s and “0”s on photographic film, if needed.
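To illustrate how that Representation Information turns the marks on the paper into information, here is a minimal sketch. It assumes that each written group holds one 7-bit ASCII code padded with a leading zero; the function and variable names are hypothetical.

```python
def decode_written_bits(written: str) -> str:
    """Decode whitespace-separated bit groups as 7-bit ASCII characters."""
    chars = []
    for group in written.split():
        value = int(group, 2)          # read the group as a binary number
        if value > 127:                # outside the 7-bit ASCII range
            raise ValueError(f"{group} is not a 7-bit ASCII code")
        chars.append(chr(value))
    return "".join(chars)


# The bit pattern from the example above, as written on the paper.
on_paper = ("01001110 01001101 01010001 01001101 "
            "01010000 01001010 00100000 00100000")
decoded = decode_written_bits(on_paper)
```

With this interpretation `decoded` is the eight-character string `"NMQMPJ  "` (six letters followed by two spaces). Without the statement that the groups are ASCII codes, the same marks could just as well be read as eight integers or as something else entirely.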
In terms of Don’s conclusions, I have no disagreement except to add the recommendation that we make the change agreed in SC222 and add a small amount of explanatory text after Fig 4-12 about the appropriate Designated Community(ies) and perhaps a few examples. This may then require that a small amount of additional explanation be inserted into ISO 16363.
From: MOIMS-DAI <moims-dai-bounces at mailman.ccsds.org> On Behalf Of D or C Sawyer
Sent: 30 May 2018 14:24
To: MOIMS DAI List <moims-dai at mailman.ccsds.org>
Subject: [Moims-dai] Tuesday's marathon DAI telecon
I found Tuesday's marathon DAI telecon (over 2 hours) to be very interesting on at least 2 levels. The first is the recognition that despite working on the OAIS RM from its inception, David and I had developed a significant difference in how we interpreted certain inferences which could be implied from the written word. We made different projections about things that were not explicitly stated. That this is possible, despite long histories of communication and working on past revisions of OAIS, should serve to remind us how difficult clear communication can be.
I’m going to try to fairly characterize both points of view as the issue was not resolved during the telecon. The participants were Terry, Mark, John, David, myself (Don). Terry had to depart after about 1.5 hours.
The issue under discussion was triggered by my objections to dropping Reference Information from having PDI directly associated as shown in OAIS RM section 2.2, figure 2-3. Responding to a point about PDI that David made, I pointed out that PDI doesn’t have its own PDI as is shown in Figure 2-3. However David responded that yes, PDI does have PDI, because by the process of recursion PDI is information that clearly the Archive wants to preserve and thus it should be viewed as information that conceptually should be taken as the Content Information of an Archival Information Package. The Information Package (IP) model is given in Figure 2-3 and is fundamental to being a conforming OAIS. When all its components are present, it is called an AIP and is the preservable entity focused on in the OAIS RM. It turned out that John and Mark viewed it as I did. I’m not sure about Terry as he left before I was clear on how he felt. We discussed some pros and cons of these conflicting views with no resolution.
I believe David’s approach is to recognize that PDI associated with, say an image from NASA, will be put together by the Archive and the Archive needs to preserve it. Since the Archive is required to associate Representation Information (assuming the Data Object is digital), and to associate PDI with the Content Information (current OAIS), in order to preserve the NASA image information, therefore it should do the same with the PDI that it has prepared. After all, if this is required for preservation of the Image information, it is logical to argue it should also be required for preservation of the PDI information that has been created or prepared. Further, unless this is done in some way, the PDI that the Archive is preserving may not be reliable (for any number of reasons) and this will impact the Authenticity view of the Image information as the Archive will not be in a position to validate that the original PDI came, for example, from a good source and/or that it has not been corrupted in some way. Therefore he argues that it makes sense to treat the image PDI as Content Information in its own Information Package (actually an AIP for preservation) and thus one sees that this PDI/Content Information must have its own PDI. He did note that one issue with this approach is how to define the Designated Community. In response to the objection that PDI having PDI starts a potentially endless recursion, he believes that in practice it will be cut off by some very simple or perhaps general Archive policy that can be readily applied. He feels that it is important for OAIS implementers to be aware and concerned about what I might call this second level (beyond the NASA Image information) information preservation. Presumably how many levels this is taken will be Archive dependent. There is logic to this view, of course, or David wouldn’t hold it. If I’ve mischaracterized David’s view I apologize and I’m sure he will correct this.
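The idea that the recursion is cut off by a simple Archive policy can be sketched as follows. This is an illustration under stated assumptions, not a description of any actual OAIS implementation: the `Info` class, the `levels_to_preserve` function and the `max_depth` policy value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Info:
    """Hypothetical piece of information with optional PDI describing it."""
    label: str
    pdi: Optional["Info"] = None


def levels_to_preserve(info: Info, max_depth: int) -> List[str]:
    """Follow the PDI-of-PDI chain, stopping at the policy depth.

    Depth 0 is the original Content Information, depth 1 its PDI,
    depth 2 the PDI of that PDI, and so on.
    """
    labels = []
    current, depth = info, 0
    while current is not None and depth <= max_depth:
        labels.append(current.label)
        current = current.pdi
        depth += 1
    return labels


# Hypothetical chain: the image, its PDI, and two further levels.
image = Info(
    "NASA image",
    Info("PDI of image", Info("PDI of PDI", Info("PDI of PDI of PDI"))),
)
```

With `max_depth=1` only the image and its PDI are handled explicitly; with `max_depth=2` the "second level" is included as well. How deep to go would be Archive dependent.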
Neither Mark, John, nor I have the view that PDI should be considered to be Content Information despite clearly needing to be preserved within the OAIS. I’m going to give my views which may differ in some details from those of Mark or John, or they may not. I have long recognized, of course, that PDI needs to be preserved within the OAIS. I’ve also recognized that there can be some interest in the origin and history of PDI, but I’ve not wanted to enter into the recursion explicitly because I’ve felt this is getting too complicated. It has seemed sufficiently difficult to get OAIS readers to understand and adopt the Information Package as an AIP without entering into an explicit recursion discussion. In this sense, my personal view has been that the Archive will address this with some best practices to ensure that PDI is properly preserved and can be relied upon for Authenticity concerns. For example, the Context Information and Provenance Information can address themselves as well as the NASA image, when the Archive thinks their own Context and Provenance is important. The Archive can maintain this in a controlled data base to guard against corruption. In my view it has not been necessary to formally view preserving PDI in a recursive manner and to do so brings up a number of issues. Many of these were discussed during the telecon.
David stated that his view is supported by, or at least not inconsistent with, the wording in OAIS. That may or may not be fully true, but I think the bigger issue is what should we be communicating that will assist readers and implementers of an OAIS? Clearly what we say needs to create a whole view that is logical and understandable. We cannot take out some parts for examination without considering the overall context that is the OAIS RM. Another way of saying this is that most readers, ‘reading between the lines’, should come to nearly the same view of what is meant, as best we can arrange this. It seems clear to me that we need to reach a consensus on this issue as it is having an impact on assumptions and leading to disagreements in other areas of discussion. At the very least, I believe we need to address this explicitly in OAIS to remove this ambiguity. For me, the question is what should we say.
David’s approach raises some issues for me, as follows:
1. As the OAIS RM now stands, the overall view of OAIS is that SIPs are ingested, AIPs are formulated from the SIPs for preservation, Descriptive Information is associated with the AIPs to facilitate Consumer searches for the information (see Figure 2-3) of interest, and DIPs are created from the AIPs and provided to Consumers. This is the high level view of OAIS given in Section 2. Nowhere in the OAIS RM do we say that PDI should be treated as Content Information which would have its own PDI and its own Descriptive Information. When PDI is digital, we acknowledge that it will have explicit Representation Information. However I doubt that many, if any, readers of the OAIS RM come to the view that PDI should become a new Content Information in its own AIP. Of course we could change this, but should we?
2. If we want to promote the view that PDI, and for consistency all other information (e.g. Representation Information components) that OAIS may be preserving should be treated as its own conceptual AIP, then to be consistent we need to show that the top level AIP contains a Content Data Object, any number of Representation data objects, and any number of PDI data objects. Each of these data objects would need to be shown as its own AIP. The top level AIP containing multiple internal AIPs would be the minimum to be shown but at least one more level of breakdown would be needed or there would be no point in going to the trouble of showing the internals as AIPs in their own right. Consistency is not achieved unless each data object is a Content Data Object in an AIP. Perhaps all this nesting of AIPs in the original AIP could be shown in a reasonable way, but I think it would look quite complex. I think it would make the Representation Network look simple by comparison, if only because it would also contain such networks. In any event, I think the Descriptive Information aspect of the internal AIPs would need to be foregone, at least in most cases, as I don’t see the utility in promoting Consumer search information for second and third levels of PDI. However it could be useful internally depending on implementation. In any event, if only for clarity, I think lower level AIPs would probably need to be identified as a special type of IP, much as the SIP and DIP are specialized. I think Representation data object IPs would need a special name as would a PDI data object IP. However the fixity part of PDI would probably need to have its own IP as it is quite different than the other components which might be seen as a single data object. Actual usage seems very much case specific. I’ve tried to be as accommodating to my understanding of David’s view as I can. 
It would have to be fleshed out with appropriate figures and text to convince me it was workable, and I believe it would have to start in Section 2 and be expanded upon in Section 4.
David brought up the issue of how a Designated Community would be assigned to address who would need to understand the original PDI, and then the PDI of the PDI, possibly etc. I believe the original PDI needs to be understood by the Designated Community for the original Content Information (e.g., NASA Image) as, for example, the Context and Provenance can have a significant impact on the understanding that is conveyed by the Content Information - at least in some cases like experimental observation data. The PDI (on this PDI) Designated Community would probably be Archive staff with a certain natural language ability. This does not seem like a big issue to me.
3. I find that nested AIPs (or nested specialized IPs) raise the abstraction level of the information modeling and may work against continuing wide adoption. It would certainly cause a significant revision of many figures and much text. At this point I do not see that it would be a net benefit to the clarity of the OAIS RM or its adoption.
Barring a clear demonstration that my concerns above are not valid, I think the best approach is to continue with the use of Figure 2-3 in its current OAIS form, together with a discussion in Section 4 addressing how an OAIS might go about preserving PDI while maintaining reasonable Authenticity. There is, of course, nothing to prevent an OAIS from implementing a somewhat recursive approach to data object preservation as long as it can be conceptualized as shown in Figure 2-3. After all, it is data objects that are being preserved but there are mandatory special types with mandatory relationships along with no required implementation mechanisms. For me, it comes down to what is the best way to continue communicating preservation perspectives recognizing a very broad base of current OAIS RM adopters as well as supporting future adopters with increased clarity.