[Moims-dai] Tuesday's marathon DAI telecon

David Giaretta david at giaretta.org
Tue Jun 5 11:54:26 UTC 2018

Dear all


Maybe there are two main points of clarification needed.

1.	OAIS defines Content Information as the “original target for preservation”. Don claims that “It is definitely not designed to apply to ANY information in the Archive to be preserved.” I cannot see where OAIS says this – and, to quote Don, “were it the case, it certainly would have been stated”. I guess in support of his claim Don states “This is the information provided by (or on behalf of) the Producer” but since the Producer is defined as “The role played by those persons or client systems that provide the information to be preserved. This can include other OAISes or internal OAIS persons or systems” clearly the OAIS itself can provide the information to be preserved. Therefore, it seems clear that any information an OAIS says it is preserving can be regarded as Content Information.

If this part of Don’s argument falls then the bulk of Don’s case falls.

By the way, I struck out the “or on behalf of” since that is a bit misleading – it would imply that the Producer could be the role of creator, which it is not. Of course, the same entity could both create the information and also be the Producer. I did this in part to make the point that we must stick to the definitions.

2.	There is an interesting point about the dividing line between what is Representation Information and what is in the Knowledge Base. OAIS carefully leaves this a bit fuzzy but defines the Designated Community (to be specified by the archive) to embody, so to speak, the dividing line. So when Don writes “we clarify that Representation Information does not include information in the human brain (who has taken this view?)” the answer is that OAIS already recognises this separation and no-one should ever have thought otherwise.

I should clarify that when I use the term “Representation Information” I tend to talk about the case that the Designated Community member’s Knowledge Base does not have everything needed to understand/use the Data Object and so some Representation Information would be needed to supplement that Knowledge Base. I thought that this approach makes the text easier to read, but certainly there will be cases where, if the Designated Community were defined differently, then no Representation Information would be needed. So when Don writes “I can read and understand a hardcopy Harry Potter book or a printed copy of the ASCII standard without the aid of additional information that maps the physical object into more meaningful concepts” he is right. But if the OAIS has defined the Designated Community as people who do not understand English then additional information, such as a dictionary, would be needed.
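
The dividing line can be pictured as a set difference: the Representation Information the archive has to supply is whatever understanding the Data Object requires that the Designated Community's Knowledge Base lacks. The Python below is a minimal sketch only. The function name and the string labels are hypothetical, and OAIS does not prescribe any such computation.

```python
def representation_info_needed(required, knowledge_base):
    """Return what must be supplied as Representation Information.

    Both arguments are sets of hypothetical labels for pieces of knowledge.
    """
    return set(required) - set(knowledge_base)


# Hypothetical example: a hardcopy novel written in English.
book_requires = {"English"}

english_readers = {"English", "Latin alphabet"}
non_english_readers = {"French", "Latin alphabet"}

print(representation_info_needed(book_requires, english_readers))      # set()
print(representation_info_needed(book_requires, non_english_readers))  # {'English'}
```

The same Data Object yields an empty or a non-empty result depending only on how the Designated Community is defined.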


Don makes the case that looking at everything as AIPs is far too confusing. And yet somehow one does need to explain how an OAIS preserves the things it needs to preserve. The AIP is a logical construct and, in a sense, is just there to make sure that the OAIS has all the information needed to preserve what it claims to be preserving, hence my use of the “if it quacks like a duck…” argument.


Maybe I could be accused of being too legalistic in my reading of OAIS. On the other hand, if the standard is of any use it surely must be read in that way as far as possible; the authors certainly will not always be around to express their opinions! This is especially true since what is written in the OAIS standard must impact ISO 16363, on which audit and certification is based, and so auditors should be able to ask “How do we know that the Provenance is being preserved?” and should be able to make a judgement about the answer.


A final point which I should make is that OAIS is a Reference Model. We have often argued about whether some particular wording is straying too far towards implementation issues and then we often change the wording to avoid that. On the other hand, we must be reasonably sure that what is in the reference model is potentially implementable. It seems to me that, for the most part, we have achieved that. We have left the things to do with implementation to ISO 16363 – at least in terms of guidance of how to judge a particular implementation. So the argument about where the recursion of PDI about PDI ends is one which must involve judgement and we must make sure that the standards provide adequate guidance.

While I agree that some further explanation will be needed, especially in ISO 16363, we do need to have a firm, logically consistent, basis provided by OAIS and one which can provide guidance in difficult cases. 

I claim that my viewpoint is entirely consistent with OAIS and requires very little additional text. I believe that Don’s view is incomplete, at odds with OAIS, and would require significant changes/additions to the standard.






From: MOIMS-DAI <moims-dai-bounces at mailman.ccsds.org> On Behalf Of D or C Sawyer
Sent: 05 June 2018 06:06
To: MOIMS DAI List <moims-dai at mailman.ccsds.org>
Subject: Re: [Moims-dai] Tuesday's marathon DAI telecon


Dear All,


After reviewing David’s response, I’ve reached two main conclusions:


1.  David’s view of recursively applying the AIP concept does not work and is not a valid alternative view.


2.  David’s view that ALL Data Objects must have Representation Information is not true.




My reading of David’s arguments, as contained in his first 6 paragraphs, is that they are simply a summary of his position that ANY piece of information being preserved in the Archive should be conceived of, and preserved as, an AIP because this is how OAIS says information is to be preserved.  On the surface this has a simple sounding logic.  However the AIP has been designed to support preservation of a particular type of information, called the Content Information, that is the original target of preservation.  This is the information provided by (or on behalf of) the Producer. It is definitely not designed to apply to ANY information in the Archive to be preserved.  When one says ANY, it also means EACH piece of information in the Archive to be preserved.  When one attempts to apply it to EACH piece of information, it quickly devolves to the AIP being a shell ‘containing’ only a Data Object with relationships to other Data Object shells such as Representation Data Objects and PDI Data Objects.  This complexity comes from the simple application of the concept to ANY (or to EACH) and is unavoidable given the position of reusing the AIP. One can not support both the current AIP model and the statement that it applies to ANY or EACH piece of information that the Archive intends to preserve.


The AIP is an abstraction that hides some underlying complexity in order to highlight some important relationships.  It establishes a hierarchy: decide on the Content Data Object (not ANY Data Object), identify the Representation Information that applies to this Content Data Object (could be a complex, partially recursive, network of underlying Data Objects), then associate PDI with the conceptual package consisting of the Content Data Object and Representation Information.  What one can recognize, at this point, is that the PDI association can be expanded to appear as two different sets of PDI, one for the Content Data Object and one for the Representation Information.  In practice it may be useful to further break down the PDI application to each Representation Information Data Object in the network, and for some it may be very minimal or non-existent depending on the Archive’s assessment of its need.  This is all subjective, as it must be in practice.  But the key is the concept that PDI is important for the Content Information.  At this point one can recognize that the PDI is itself composed of possibly multiple Data Objects, and each may have (David would say ‘must have’, but I’ll address this later) Representation Information. If it is felt to be necessary, some or all of the Data Objects of this Representation Information may be given PDI.  The process can continue ever further down the hierarchy of networks.  All of this process is fully consistent with the AIP model given in section 2.2, but it would no longer work if PDI is only shown to be applicable to the Content Data Object as is currently being proposed.  Perhaps it would be useful to expand along the above lines in section 4.  It could even be given some level of modeling to show that it works.


Note that the above process involves ONLY ONE AIP, as it should, and yet it can cover the full range of preservation concerns David has raised.  There is no need to try to recursively apply the AIP concept to EACH piece of information (Data Object), and in any event it doesn’t work until the AIP becomes just a shell around the Data Object, and then there is no unique Content Data Object, just Data Objects.
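
The hierarchy described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the class names, fields and example labels are hypothetical and are not taken from the OAIS RM. It shows a single AIP whose Content Data Object carries Representation Information and PDI, each of which is a Data Object that may, or may not, carry its own.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DataObject:
    """Any Data Object; both lists may be empty (hypothetical layout)."""
    label: str
    rep_info: List["DataObject"] = field(default_factory=list)
    pdi: List["DataObject"] = field(default_factory=list)


@dataclass
class AIP:
    """One AIP with exactly one Content Data Object at the top."""
    content: DataObject


def walk(obj: DataObject, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, label) for obj and everything attached beneath it."""
    yield depth, obj.label
    for child in obj.rep_info + obj.pdi:
        yield from walk(child, depth + 1)


# Hypothetical contents, chosen only to show the shape of the hierarchy.
aip = AIP(content=DataObject(
    "NASA image",
    rep_info=[DataObject("image format specification",
                         rep_info=[DataObject("ASCII standard (hardcopy)")])],
    pdi=[DataObject("provenance record",
                    rep_info=[DataObject("record schema")],
                    pdi=[DataObject("checksum of provenance record")])],
))

for depth, label in walk(aip.content):
    print("  " * depth + label)
```

Only the top-level object is wrapped in an AIP; how far down each branch is populated is left to the Archive's judgement, as in the text.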


To the point I made about the OAIS RM not ever stating that PDI should be treated as Content Information, David responds that there are lots of things that OAIS does not say. Obviously true, but it should be almost as obvious that were it the case, it certainly would have been stated.  In any event, it doesn’t work.


David also points out, and I acknowledge, that the following statement I made while relating my view of his position, is in error:


“Since the Archive is required to associate Representation Information (assuming the Data Object is digital), and to associate PDI with the Content Information (current OAIS), in order to preserve the NASA image information, therefore it should do the same with the PDI that it has prepared.”


The parenthetical should be removed as it is clearly wrong by virtue of the following OAIS RM logic:


    - Content Information is information that is the ORIGINAL TARGET of preservation; it is an Information Object consisting of its Content Data Object and its Representation Information.

    - Figure 4-10 is the model we provide for an Information Object


However what is now at issue is David’s statement that Figure 4-10, more generally, specifies that a Data Object MUST have Representation Information. What it really says is that an Information Object MUST have Representation Information, and that a Data Object is classified as either Physical or Digital.


Further, a Physical Object is defined as:


     - An object (such as a moon rock, bio-specimen, microscope slide) with physically observable properties that represent information that is considered suitable for being adequately documented for preservation, distribution, and independent usage.


Clearly there are Physical Objects that encode information and therefore need Representation Information for the decoding process.  An example is printed material containing coded messages from World War II.  However it does not follow that ALL Physical Objects must have Representation Information associated with them in order to ‘map a Data Object into more meaningful concepts’ (Representation Information definition).  I can read and understand a hardcopy Harry Potter book or a printed copy of the ASCII standard without the aid of additional information that maps the physical object into more meaningful concepts.


On the other hand, the OAIS RM does state that such hardcopy materials, mentioned above, can NOT be the ORIGINAL TARGET of preservation in an OAIS UNLESS they do have additional information to map the physical object into more meaningful concepts. For such materials I find it hard to identify additional information that could be added that realistically maps the physical object into more meaningful concepts.  While naming a Moon rock ‘a Moon rock’ could be considered mapping it to a more meaningful concept (it could also be considered Context Information), the same doesn’t hold true for the hardcopy documents which already carry their ‘names’.  I find this to be an OAIS RM weak point for the inclusion of Archives preserving a collection of hardcopy materials as the original target of preservation. 


At the same time, the OAIS RM does NOT say that physical objects that are NOT the original target of preservation must have Representation Information to map them into more meaningful concepts. This means that a hardcopy of the ASCII standard, which is normally considered to be Representation Information for some Digital Object, does not need additional Representation Information for 2 reasons:


  1.  It is not the ‘original target of preservation’, so it is NOT Content Information and thus does NOT have to be an Information Object (i.e., have Representation Information).


  2.  It can be understood by humans without any additional information to map it to more meaningful concepts.


I believe the above interpretation of the OAIS RM is one that most people, reading it for the first or second time, would agree with.  However I suspect David will be strongly objecting and might state that the better way to view the OAIS RM is to recognize the following:


   1.  While a human can read the ASCII standard and understand it, this is only because he/she already has the necessary Representation Information in his/her brain (base of knowledge).  So the Representation Information should be considered to be present, but in the brains of the readers.


   2.  If you just recognize that ALL information to be preserved by the Archive, not just that which is the original target of preservation, is really Content Information, then it follows that ALL Physical Objects being preserved within the Archive MUST have Representation Information. It is just that sometimes that Representation Information resides in the human brain.  So there is no problem for OAIS to include an Archive of old hardcopy manuscripts where there is no Representation Information.  



I believe, as stated, that my narrative above expresses the way that most people understand the OAIS RM based on reading the document.  They are led to think about Representation Information as either physical or digital, but always something that can be materialized for viewing by humans and that can be managed and preserved within the Archive.  They do not think of it as, alternatively, residing in human brains outside the Archive.  They also recognize that Content Information, as the ‘original target of preservation’, is a necessary construct to be able to distinguish information that is being submitted by a Producer for preservation from other information that is supporting and would not be present without the Content Information. They are not led to believe that ALL the information in the OAIS should be considered Content Information. The hierarchy is an important concept.


I recommend we clarify that Representation Information does not include information in the human brain (who has taken this view?) even though it plays a similar role in mapping the presentation of information to human senses into more meaningful concepts when that is the objective.  This allows the Archive to continue using the term ‘Representation Information’ for information that it must be concerned with acquiring, relating, managing, and preserving.  


I also recommend we add some material in section 4 explaining  how to hierarchically extend the existing AIP model to cover David’s concerns about properly preserving not only Representation Information but also PDI.







On Jun 3, 2018, at 4:10 PM, David Giaretta <david at giaretta.org> wrote:


Dear all


I think Don has fairly expressed my views - up to a point.


Don's alternative view (1) in a sense misses the major point and (2) makes things unnecessarily complicated by talking about AIPs within AIPs to describe my views.


For point (1) the issue is that in order to know that ANY piece of information is being preserved, following OAIS concepts, one needs to know what the Data Object is and what the associated Representation Information is as a start, and also one needs to know, if one wants to have evidence to support claims of Authenticity:

*	How one can be sure the object has not been altered in an undocumented manner – otherwise how can we know what we are preserving – in other words we need Fixity Information
*	How was the object created etc – in other words its Provenance
*	How to locate the object i.e. Reference
*	And so on for the other PDI elements


I hope this is generally agreed.

We would all agree that the NASA image should be in an AIP, which of course is a logical construct. Our divergence is only that they do not want to say that, for example, a piece of Provenance Information is preserved “in” an AIP. 


However if, to preserve the Provenance one has the data object, its RepInfo, its Fixity Info etc etc – then I would say that just as, if something walks like a duck and quacks like a duck then it is a duck – in a similar way if we can, as we must be able to, point to the Provenance’s Data Object, RepInfo, Fixity, Reference, etc then we can, if we want to, say we have an AIP. 


I prefer to use the term AIP, others might want to invent a new term and write a whole lot of extra text – but why bother?


There is no need for Don’s complicated view of AIPs in AIPs etc. All one needs to do is to ask, for any particular piece of information, do we have all the things required for its preservation? If one does not, then we are missing something vital. If one does have everything needed then one is preserving that piece of information and, if we want to do so, we can say we have an AIP. 
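
The question "do we have all the things required for its preservation?" can be sketched as a simple completeness check. The Python below is a minimal illustration, not an OAIS-defined procedure: the class, function and field names are hypothetical, and the list of PDI categories (Reference, Provenance, Context, Fixity, Access Rights) is assumed from the OAIS RM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# PDI categories assumed from the OAIS RM; the field layout is hypothetical.
PDI_KINDS = ("reference", "provenance", "context", "fixity", "access_rights")


@dataclass
class PreservedItem:
    name: str
    data_object: Optional[bytes] = None
    representation_info: Optional[str] = None
    pdi: Dict[str, str] = field(default_factory=dict)


def missing_for_preservation(item: PreservedItem) -> List[str]:
    """List whatever is still missing; an empty list means nothing is."""
    missing = []
    if item.data_object is None:
        missing.append("data_object")
    if item.representation_info is None:
        missing.append("representation_info")
    missing.extend(kind for kind in PDI_KINDS if kind not in item.pdi)
    return missing


# Hypothetical example: a Provenance record held by the archive.
provenance = PreservedItem(
    name="provenance record",
    data_object=b"example bytes",
    representation_info="plain English text",
    pdi={kind: "recorded" for kind in PDI_KINDS},
)
incomplete = PreservedItem(name="context note", data_object=b"example bytes")

print(missing_for_preservation(provenance))  # []
print(missing_for_preservation(incomplete))
```

The check is the same whether the item is the NASA image or a piece of its PDI, which is the point of the argument above.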

One does not have to draw more and more complex diagrams – for the same reason that we do not draw out Representation Information as a Data Object with its own RepInfo, instead we just talk about a Representation Information Network and leave it at that.


Just to repeat, one does not need to talk about AIPs in AIPs to draw my view. All one has to do is to ask how we know we have everything needed to preserve any particular thing. 


This view has the added benefit that it will not require any changes, other than perhaps some explanatory text, to ISO 16363 and it allows audits to be conducted without needing new concepts.  


While it is true, as Don says, that “Nowhere in the OAIS RM do we say that PDI should be treated as Content Information which would have its own PDI and its own Descriptive Information” – well sure, there are lots of things that OAIS does not explicitly say, otherwise it would be 10,000 pages long. The point is that we called it Preservation Description Information and its components are also pieces of Information. Thus, all the concepts of Information apply, including Information Packages etc.


One error I would like to point out, since this was mentioned in an earlier email and I let it pass then, is that he writes “associate Representation Information (assuming the Data Object is digital)”. This is an incorrect view of the Information Model – just look at Fig 4-10 – the diagram shows that both the Physical Object and the Digital Object are sub-types of Data Object, and anything that is a Data Object has Representation Information. As a specific example, imagine that I write “01001110 01001101 01010001 01001101 01010000 01001010 00100000 00100000” on a piece of paper. This is a physical object which is a Data Object which encodes some information. It needs Representation Information, namely that what I have written is encoded as eight 7-bit ASCII characters etc. This view also, by the way, allows a consistent view of information encoded on other physical media such as tapes, labels on tapes, disks, kanji characters etched on Silicon Carbide plates or images of “1”s and “0”s on photographic film, if needed.
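
To make the paper example concrete, here is a minimal Python sketch that applies the stated Representation Information (each group of digits is a 7-bit ASCII code, written with a leading zero) to the marks on the paper. The function name is illustrative only.

```python
def decode_ascii_bits(bits: str) -> str:
    """Decode space-separated binary groups as 7-bit ASCII characters."""
    chars = []
    for group in bits.split():
        code = int(group, 2)
        if code > 0x7F:
            raise ValueError(f"{group} is not a 7-bit ASCII code")
        chars.append(chr(code))
    return "".join(chars)


paper = ("01001110 01001101 01010001 01001101 "
         "01010000 01001010 00100000 00100000")
text = decode_ascii_bits(paper)
print(repr(text))  # 'NMQMPJ  '
```

Without the knowledge encoded in `decode_ascii_bits` the marks are just ones and zeros; with it they become six letters followed by two spaces.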


In terms of Don’s conclusions, I have no disagreement except to add the recommendation that we make the change agreed in SC222 and add a small amount of explanatory text after Fig 4-12 about the appropriate Designated Community(ies) and perhaps a few examples. This may then require that a small amount of additional explanation be inserted into ISO 16363.






-----Original Message-----
From: MOIMS-DAI <moims-dai-bounces at mailman.ccsds.org> On Behalf Of D or C Sawyer
Sent: 30 May 2018 14:24
To: MOIMS DAI List <moims-dai at mailman.ccsds.org>
Subject: [Moims-dai] Tuesday's marathon DAI telecon




I found Tuesday's marathon DAI telecon (over 2 hours) to be very interesting on at least 2 levels.  The first is the recognition that despite working on the OAIS RM from its inception, David and I had developed a significant difference in how we interpreted certain inferences which could be implied from the written word. We made different projections about things that were not explicitly stated.  That this is possible, despite long histories of communication and working on past revisions of OAIS, should serve to remind us how difficult clear communication can be.


I’m going to try to fairly characterize both points of view as the issue was not resolved during the telecon.  The participants were Terry, Mark, John, David, myself (Don).  Terry had to depart after about 1.5 hours.  


The issue under discussion was triggered by my objections to dropping Reference Information from having PDI directly associated as shown in OAIS RM section 2.2, figure 2-3.  Responding to a point about PDI that David made, I pointed out that PDI doesn’t have its own PDI as is shown in Figure 2-3.  However David responded that yes, PDI does have PDI, because by the process of recursion PDI is information that clearly the Archive wants to preserve and thus it should be viewed as information that conceptually should be taken as the Content Information of an Archival Information Package.  The Information Package (IP) model is given in Figure 2-3 and is fundamental to being a conforming OAIS. When all its components are present, it is called an AIP and is the preservable entity focused on in the OAIS RM. It turned out that John and Mark viewed it as I did.  I’m not sure about Terry as he left before I was clear on how he felt.  We discussed some pros and cons of these conflicting views with no resolution.


I believe David’s approach is to recognize that PDI associated with, say an image from NASA, will be put together by the Archive and the Archive needs to preserve it.  Since the Archive is required to associate Representation Information (assuming the Data Object is digital), and to associate PDI with the Content Information (current OAIS), in order to preserve the NASA image information, therefore it should do the same with the PDI that it has prepared. After all, if this is required for preservation of the Image information, it is logical to argue it should also be required for preservation of the PDI information that has been created or prepared.  Further, unless this is done in some way, the PDI that the Archive is preserving may not be reliable (for any number of reasons) and this will impact the Authenticity view of the Image information as the Archive will not be in a position to validate that the original PDI came, for example, from a good source and/or that it has not been corrupted in some way.  Therefore he argues that it makes sense to treat the image PDI as Content Information in its own Information Package (actually an AIP for preservation) and thus one sees that this PDI/Content Information must have its own PDI.  He did note that one issue with this approach is how to define the Designated Community.  In response to the objection that PDI having PDI starts a potentially endless recursion, he believes that in practice it will be cut off by some very simple or perhaps general Archive policy that can be readily applied. He feels that it is important for OAIS implementers to be aware and concerned about what I might call this second level (beyond the NASA Image information) information preservation.  Presumably how many levels this is taken will be Archive dependent. There is logic to this view, of course, or David wouldn’t hold it. If I’ve mischaracterized David’s view I apologize and I’m sure he will correct this.
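
The policy cut-off can be pictured as a recursion with an explicit depth limit. The Python below is a minimal sketch under that assumption. The function name and the parameter `policy_max_depth` are hypothetical, and nothing in OAIS fixes the number of levels.

```python
def pdi_levels(target: str, policy_max_depth: int, depth: int = 1) -> list:
    """Name the PDI kept for target, then the PDI of that PDI, and so on,
    stopping once the archive's policy depth has been reached."""
    if depth > policy_max_depth:
        return []
    pdi = f"PDI of {target}"
    return [pdi] + pdi_levels(pdi, policy_max_depth, depth + 1)


levels = pdi_levels("NASA image", policy_max_depth=2)
print(levels)  # ['PDI of NASA image', 'PDI of PDI of NASA image']
```

A policy depth of 1 corresponds to keeping PDI only for the image itself; a depth of 2 adds the "second level" mentioned above.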


Neither Mark, John, nor I have the view that PDI should be considered to be Content Information despite clearly needing to be preserved within the OAIS.  I’m going to give my views which may differ in some details from those of Mark or John, or they may not.  I have long recognized, of course, that PDI needs to be preserved within the OAIS. I’ve also recognized that there can be some interest in the origin and history of PDI, but I’ve not wanted to enter into the recursion explicitly because I’ve felt this is getting too complicated. It has seemed sufficiently difficult to get OAIS readers to understand and adopt the Information Package as an AIP without entering into an explicit recursion discussion.  In this sense, my personal view has been that the Archive will address this with some best practices to ensure that PDI is properly preserved and can be relied upon for Authenticity concerns. For example, the Context Information and Provenance Information can address themselves as well as the NASA image, when the Archive thinks their own Context and Provenance is important. The Archive can maintain this in a controlled data base to guard against corruption. In my view it has not been necessary to formally view preserving PDI in a recursive manner and to do so brings up a number of issues.  Many of these were discussed during the telecon.


David stated that his view is supported by, or at least not inconsistent with, the wording in OAIS. That may or may not be fully true, but I think the bigger issue is what should we be communicating that will assist readers and implementers of an OAIS? Clearly what we say needs to create a whole view that is logical and understandable.  We can not take out some parts for examination without considering the overall context that is the OAIS RM.  Another way of saying this is that most readers, ‘reading between the lines’, should come to nearly the same view of what is meant as best we can arrange this.  It seems clear to me that we need to reach a consensus on this issue as it is having an impact on assumptions and leading to disagreements in other areas of discussion.  At the very least, I believe we need to address this explicitly in OAIS to remove this ambiguity.  For me, the question is what should we say.


David’s approach raises some issues for me, as follows:


1.  As the OAIS RM now stands, the overall view of OAIS is that SIPs are ingested, AIPs are formulated from the SIPs for preservation, Descriptive Information is associated with the AIPs to facilitate Consumer searches for the information (see Figure 2-3) of interest, and DIPs are created from the AIPs and provided to Consumers.  This is the high level view of OAIS given in Section 2.  Nowhere in the OAIS RM do we say that PDI should be treated as Content Information which would have its own PDI and its own Descriptive Information.  When PDI is digital, we acknowledge that it will have explicit Representation Information.  However I doubt that many, if any, readers of the OAIS RM come to the view that PDI should become a new Content Information in its own AIP.  Of course we could change this, but should we?


2.  If we want to promote the view that PDI, and for consistency all other information (e.g. Representation Information components) that OAIS may be preserving should be treated as its own conceptual AIP, then to be consistent we need to show that the top level AIP contains a Content Data Object, any number of Representation data objects, and any number of PDI data objects.  Each of these data objects would need to be shown as its own AIP.  The top level AIP containing multiple internal AIPs would be the minimum to be shown but at least one more level of breakdown would be needed or there would be no point in going to the trouble of showing the internals as AIPs in their own right.  Consistency is not achieved unless each data object is a Content Data Object in an AIP.  Perhaps all this nesting of AIPs in the original AIP could be shown in a reasonable way, but I think it would look quite complex.  I think it would make the Representation Network look simple by comparison, if only because it would also contain such networks.  In any event, I think the Descriptive Information aspect of the internal AIPs would need to be foregone, at least in most cases, as I don’t see the utility in promoting Consumer search information for second and third levels of PDI.  However it could be useful internally depending on implementation.  In any event, if only for clarity, I think lower level AIPs would probably need to be identified as a special type of IP, much as the SIP and DIP are specialized.  I think Representation data object IPs would need a special name as would a PDI data object IP.  However the fixity part of PDI would probably need to have its own IP as it is quite different than the other components which might be seen as a single data object.  Actual usage seems very much case specific.  I’ve tried to be as accommodating to my understanding of David’s view as I can. 
It would have to be fleshed out with appropriate figures and text to convince me it was workable, and I believe it would have to start in Section 2 and be expanded upon in Section 4. 


David brought up the issue of how a Designated Community would be assigned to address who would need to understand the original PDI, and then the PDI of the PDI, possibly etc.  I believe the original PDI needs to be understood by the Designated Community for the original Content Information (e.g., NASA Image) as, for example, the Context and Provenance can have a significant impact on the understanding that is conveyed by the Content Information - at least in some cases like experimental observation data. The PDI (on this PDI) Designated Community would probably be Archive staff with a certain natural language ability.  This does not seem like a big issue to me.


3.  I find that nested AIPs (or nested specialized IPs) raise the abstraction level of the information modeling and may work against continuing wide adoption.  It would certainly cause a significant revision of many figures and much text. At this point I do not see that it would be a net benefit to the clarity of the OAIS RM or its adoption.




Barring a clear demonstration that my concerns above are not valid, I think the best approach is to continue with the use of Figure 2-3 in its current OAIS form, together with a discussion in Section 4 addressing how an OAIS might go about preserving PDI while maintaining reasonable Authenticity.  There is, of course, nothing to prevent an OAIS from implementing a somewhat recursive approach to data object preservation as long as it can be conceptualized as shown in Figure 2-3.  After all, it is data objects that are being preserved but there are mandatory special types with mandatory relationships along with no required implementation mechanisms.  For me, it comes down to what is the best way to continue communicating preservation perspectives recognizing a very broad base of current OAIS RM adopters as well as supporting future adopters with increased clarity.


= = = =



MOIMS-DAI mailing list

MOIMS-DAI at mailman.ccsds.org

https://mailman.ccsds.org/cgi-bin/mailman/listinfo/moims-dai


