Why Endpoints and Item Clouds
At some level, every repository will have a different beginning set of Kinds, because each Kind will have a different pseudo-random UUID. At a conceptual level, what matters is that my Kinds behave just like your Kinds. The fact that different repositories will have different UUIDs for their basic Kinds presents a problem when sharing Items, however.
Conceptually, we want to allow a Contact to exist in two different repositories simultaneously. At the low level, it isn't clear what this means. To fully describe most Kinds, you need to know about LOTS of other Kinds. To have a perfect copy of my Contact, you may need to have a perfect copy of my entire Content Model, because the web of related UUIDs from one repository will be meaningless in another repository.
One way to deal with this problem might be to give certain common Kinds a constant UUID. No one I've talked to seems to think this would work very well; it might be interesting to hear an explanation of why not.
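For concreteness, the constant-UUID idea could be sketched with name-based (version 5) UUIDs from Python's standard `uuid` module, which derive the same UUID from the same namespace and name in every repository. This is only an illustration, not something Chandler does: the namespace string and the use of a Kind's repository path as the name are assumptions.

```python
import uuid

# Hypothetical namespace agreed on by all repositories; any fixed UUID works.
KIND_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:chandler-kinds")


def constant_kind_uuid(kind_path: str) -> uuid.UUID:
    """Name-based UUID: the same path yields the same UUID everywhere."""
    return uuid.uuid5(KIND_NAMESPACE, kind_path)


def random_kind_uuid() -> uuid.UUID:
    """Pseudo-random UUID: each repository mints a different one."""
    return uuid.uuid4()


contact_path = "//parcels/OSAF/contentmodel/contacts/Contact"
# Two repositories computing this independently arrive at the same UUID.
mine = constant_kind_uuid(contact_path)
yours = constant_kind_uuid(contact_path)
print(mine == yours)  # True
```

This only makes the identifiers agree. It says nothing about whether two repositories attach the same definition to that UUID once the Content Model changes.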
The Content Model will likely change over time, so not all Chandler users will have identical Content Models. But much of the Content Model will stay the same from year to year.
We don't want to pass around the entire Content Model (or large subsets of it) every time we want to share a few items. We want to be able to describe a subset of one repository in such a way that a second repository can interpret exactly what that subset is supposed to mean, even if the second repository's Content Model looks a little different.
Item Clouds
- Item Clouds, or just Clouds, are an attempt to clearly define a subset of a repository. To get there, we need a few other bits of terminology:
- Let an Endpoint be a new, well-defined, first-class element of the Chandler universe (or maybe it can be a special kind of Kind; this needs more thought). Think of Endpoints as canonical representations of vanilla Chandler Kinds.
- A Cloud Path is a description of how to walk a repository starting from a particular Entrypoint. Literal Attributes may be included or ignored. Reference Attributes may be:
  - included, in which case the referred-to Item is included in the Cloud, and that Item's Attributes are included or not, depending on the Path's description
  - ignored
  - endpointed, in which case the referred-to Item isn't included; instead, a reference to an Endpoint is included.
- A Cloud must be complete; that is to say, given an Item I in a Cloud, every Item referred to by I must either be part of the Cloud or be an Endpoint.
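The definitions above can be made concrete with a small in-memory model. This is a minimal sketch under simplifying assumptions: Items are plain Python objects, a single Cloud Path description applies to attributes by name wherever they appear, and reference attributes the path doesn't mention are treated as ignored. `Item`, `CloudPath`, `build_cloud` and `is_complete` are hypothetical names, not Chandler APIs. The data reproduces the Contact/ContactName example from the Endpoints section.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

INCLUDE, IGNORE, ENDPOINT = "include", "ignore", "endpoint"

# Each attribute in a Cloud becomes (kind, value): "literal", "itemRef" or "endpointRef".
CloudEntry = Dict[str, Tuple[str, str]]


@dataclass(eq=False)
class Item:
    uuid: str
    literals: Dict[str, str] = field(default_factory=dict)
    refs: Dict[str, "Item"] = field(default_factory=dict)
    endpoint: Optional[str] = None  # set only on well-known Items, e.g. Kinds


@dataclass
class CloudPath:
    literals: Set[str]  # literal Attributes to include; all others are ignored
    refs: Dict[str, str]  # reference Attribute name -> INCLUDE, IGNORE or ENDPOINT


def build_cloud(start: Item, path: CloudPath) -> Dict[str, CloudEntry]:
    """Walk the repository from `start`, following `path`, and return the Cloud."""
    cloud: Dict[str, CloudEntry] = {}
    pending = [start]
    while pending:
        item = pending.pop()
        if item.uuid in cloud:
            continue  # already visited (references can form cycles)
        entry: CloudEntry = {
            name: ("literal", value)
            for name, value in item.literals.items()
            if name in path.literals
        }
        for name, target in item.refs.items():
            policy = path.refs.get(name, IGNORE)
            if policy == INCLUDE:
                entry[name] = ("itemRef", target.uuid)
                pending.append(target)
            elif policy == ENDPOINT:
                if target.endpoint is None:
                    raise ValueError(f"Item {target.uuid} has no Endpoint")
                entry[name] = ("endpointRef", target.endpoint)
        cloud[item.uuid] = entry
    return cloud


def is_complete(cloud: Dict[str, CloudEntry]) -> bool:
    """Every itemRef must point inside the Cloud; endpointRefs may point outside."""
    return all(
        kind != "itemRef" or value in cloud
        for entry in cloud.values()
        for kind, value in entry.values()
    )


contact_kind = Item("k1", endpoint="//parcels/OSAF/contentmodel/contacts/Contact")
name_kind = Item("k2", endpoint="//parcels/OSAF/contentmodel/contacts/ContactName")
contact = Item("123456789", refs={"kind": contact_kind})
name = Item("24680", literals={"fullName": "John Doe"},
            refs={"kind": name_kind, "contactNameOwner": contact})
contact.refs["contactName"] = name

contact_path = CloudPath(
    literals={"fullName"},
    refs={"kind": ENDPOINT, "contactName": INCLUDE, "contactNameOwner": INCLUDE},
)
cloud = build_cloud(contact, contact_path)
print(sorted(cloud), is_complete(cloud))  # ['123456789', '24680'] True
```

Because the Kinds are endpointed rather than included, the Cloud stays at two Items instead of dragging in the Content Model behind them.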
Endpoints
The idea of Endpoints is to provide some mechanism for referring to (conceptually) identical Kinds in different repositories.
If you set aside the problem of a changing Content Model and assume that the Content Model will stay the same forever (bad assumption), Endpoints could be implemented by just referring to Content Model Kinds by their path in the repository. So, if I wanted to send out a Cloud referring to a Contact, it might look like:
<ItemCloud>
  <Item uuid="123456789">
    <Attribute name="kind" endpointRef="//parcels/OSAF/contentmodel/contacts/Contact" />
    <Attribute name="contactName" itemRef="24680"/>
  </Item>
  <Item uuid="24680">
    <Attribute name="kind" endpointRef="//parcels/OSAF/contentmodel/contacts/ContactName" />
    <Attribute name="contactNameOwner" itemRef="123456789"/>
    <Attribute name="fullName">John Doe</Attribute>
  </Item>
</ItemCloud>
If this little Cloud was created in a remote repository, that repository would bind the endpointRefs to its own version of those well-known Kinds.
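To illustrate the binding step, here is a minimal sketch that parses the Cloud above with Python's standard `xml.etree.ElementTree` and replaces each endpointRef with the receiving repository's own Kind UUID. The `LOCAL_KINDS` table and its UUID strings are hypothetical. The itemRef UUIDs are simply left as the sender's UUIDs, since how a receiving repository should treat those isn't addressed here.

```python
import xml.etree.ElementTree as ET
from typing import Dict

CLOUD_XML = """<ItemCloud>
  <Item uuid="123456789">
    <Attribute name="kind" endpointRef="//parcels/OSAF/contentmodel/contacts/Contact" />
    <Attribute name="contactName" itemRef="24680"/>
  </Item>
  <Item uuid="24680">
    <Attribute name="kind" endpointRef="//parcels/OSAF/contentmodel/contacts/ContactName" />
    <Attribute name="contactNameOwner" itemRef="123456789"/>
    <Attribute name="fullName">John Doe</Attribute>
  </Item>
</ItemCloud>"""

# Hypothetical: the receiving repository's own UUIDs for the well-known Kinds.
LOCAL_KINDS = {
    "//parcels/OSAF/contentmodel/contacts/Contact": "local-uuid-contact",
    "//parcels/OSAF/contentmodel/contacts/ContactName": "local-uuid-contactname",
}


def bind_cloud(xml_text: str, local_kinds: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return {item uuid: {attribute: value}} with endpointRefs bound to local Kinds."""
    items: Dict[str, Dict[str, str]] = {}
    for item in ET.fromstring(xml_text).iter("Item"):
        attrs: Dict[str, str] = {}
        for attr in item.iter("Attribute"):
            name = attr.get("name")
            endpoint = attr.get("endpointRef")
            if endpoint is not None:
                if endpoint not in local_kinds:
                    raise KeyError(f"no local Kind for Endpoint {endpoint}")
                attrs[name] = local_kinds[endpoint]
            elif attr.get("itemRef") is not None:
                attrs[name] = attr.get("itemRef")  # left as the sender's UUID
            else:
                attrs[name] = attr.text or ""  # literal value
        items[item.get("uuid")] = attrs
    return items


bound = bind_cloud(CLOUD_XML, LOCAL_KINDS)
print(bound["24680"]["kind"])  # local-uuid-contactname
```

A repository that has no Kind registered under a given path refuses the Cloud, which is exactly where the fixed-path scheme breaks down once the Content Model starts to change.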
In the real world, where the Content Model does change, every Endpoint should have a name, version, and source, and each name, version, and source combination should be unique.
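Since an Endpoint's identity is the combination of name, version, and source, a frozen dataclass (compared and hashed on all three fields) is one natural way to model it. A minimal sketch, assuming integer versions and a plain string source such as "OSAF"; `Endpoint` and `EndpointRegistry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Endpoint:
    name: str     # e.g. "Contact"
    version: int  # assumed to be an integer for this sketch
    source: str   # who defined it, e.g. "OSAF"


class EndpointRegistry:
    """Maps each Endpoint to this repository's local Kind UUID."""

    def __init__(self) -> None:
        self._kinds: Dict[Endpoint, str] = {}

    def register(self, endpoint: Endpoint, local_kind_uuid: str) -> None:
        # Frozen dataclasses compare and hash on all fields, so this check
        # enforces that each (name, version, source) combination is unique.
        if endpoint in self._kinds:
            raise ValueError(f"{endpoint} is already registered")
        self._kinds[endpoint] = local_kind_uuid

    def resolve(self, endpoint: Endpoint) -> str:
        return self._kinds[endpoint]

    def versions(self, name: str, source: str) -> List[int]:
        """All registered versions of one Endpoint, lowest first."""
        return sorted(e.version for e in self._kinds
                      if e.name == name and e.source == source)


registry = EndpointRegistry()
registry.register(Endpoint("Contact", 1, "OSAF"), "local-uuid-a")
registry.register(Endpoint("Contact", 2, "OSAF"), "local-uuid-b")
print(registry.versions("Contact", "OSAF"))  # [1, 2]
```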
Endpoint Mappings
In some circumstances, it may not be completely essential that a remote repository understand perfectly what an Item's Kind is. From a user's perspective, if all I want to do is look through the shared contacts in your repository to find a phone number, it shouldn't really matter if we represent our Contacts in slightly different ways, as long as your repository can send me information about Contact names and phone numbers.
To make this work, mappings between versions could be created. Suppose Jacob has a collection of OSAF Contacts v.2 and wants to share them with Imelda, whose repository only has OSAF Contacts v.1 registered. When Jacob tries to share a Contact with Imelda, Jacob's repository can negotiate the highest version of Contact that both repositories understand; if the versions don't match perfectly, Jacob's repository can map its Contacts to that version. Such a mapping might lose a bit of information, but preserving the most common pieces of a Contact is probably good enough.
This would probably only work for read-only sharing; read-write sharing would probably require both repositories to have the same version. But falling back to read-only is far better than not being able to share at all.
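The negotiate-then-map flow described in the last two paragraphs might look roughly like this. It is a minimal sketch under several assumptions: versions are integers, Contacts are flat dictionaries, and the v.2 and v.1 field layouts (`firstName`, `mobilePhone`, `fullName`, and so on) are invented for illustration rather than taken from any real Contact schema. A mapped Contact is shared read-only; only an exact version match is shared read-write.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Record = Dict[str, str]


def negotiate(sender: Iterable[int], receiver: Iterable[int]) -> Optional[int]:
    """Highest Contact version registered in both repositories, if any."""
    common = set(sender) & set(receiver)
    return max(common) if common else None


def contact_v2_to_v1(contact: Record) -> Record:
    """Hypothetical lossy mapping: fields v.1 lacks (e.g. birthday) are dropped."""
    full_name = f"{contact.get('firstName', '')} {contact.get('lastName', '')}".strip()
    return {"fullName": full_name,
            "phone": contact.get("mobilePhone") or contact.get("homePhone", "")}


# (from version, to version) -> mapping function
MAPPINGS: Dict[Tuple[int, int], Callable[[Record], Record]] = {(2, 1): contact_v2_to_v1}


def share_contact(contact: Record, version: int,
                  sender_versions: Iterable[int],
                  receiver_versions: Iterable[int]) -> Tuple[Record, int, str]:
    """Return (record to send, version it is expressed in, sharing mode)."""
    target = negotiate(sender_versions, receiver_versions)
    if target is None:
        raise ValueError("no Contact version in common")
    if target == version:
        return contact, version, "read-write"
    mapping = MAPPINGS.get((version, target))
    if mapping is None:
        raise ValueError(f"no mapping from v.{version} to v.{target}")
    # Mapping may lose information, so fall back to read-only sharing.
    return mapping(contact), target, "read-only"


jacobs_contact = {"firstName": "John", "lastName": "Doe",
                  "mobilePhone": "555-0100", "birthday": "1970-01-01"}
sent, sent_version, mode = share_contact(jacobs_contact, 2,
                                         sender_versions=[1, 2], receiver_versions=[1])
print(sent, sent_version, mode)
# {'fullName': 'John Doe', 'phone': '555-0100'} 1 read-only
```

Imelda still gets a name and a phone number, which is what she needed, while the birthday is lost in the mapping.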
In the long run, this seems like a useful backwards-compatibility feature, but it's probably not important to make it happen until Chandler 1.0 ships.
New Endpoints
Anyone should be able to define an Endpoint, not just OSAF. Perhaps XML namespaces could be used to define different families of Endpoints.
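One possible reading of that idea, sketched here purely as an assumption: each Endpoint family gets its own namespace URI, and a Kind reference is written as a namespace-qualified element, so two families can both define a `Contact` without colliding. The namespace URIs and the `<Kind>` element are hypothetical and differ from the endpointRef attribute used above. Python's standard ElementTree reports qualified tags in `{uri}local` form, which makes the family easy to recover.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical document: two families each define their own "Contact" Endpoint.
CLOUD_XML = """<ItemCloud xmlns:osaf="urn:example:osaf-endpoints"
                          xmlns:acme="urn:example:acme-endpoints">
  <Item uuid="1"><Kind><osaf:Contact version="1"/></Kind></Item>
  <Item uuid="2"><Kind><acme:Contact version="3"/></Kind></Item>
</ItemCloud>"""


def endpoint_of(item: ET.Element) -> Tuple[str, str, int]:
    """Return (family namespace, Endpoint name, version) for an Item's Kind."""
    (ref,) = list(item.find("Kind"))  # exactly one Endpoint reference per Kind
    namespace, _, name = ref.tag[1:].partition("}")  # tag looks like "{uri}Contact"
    return namespace, name, int(ref.get("version"))


kinds = {item.get("uuid"): endpoint_of(item)
         for item in ET.fromstring(CLOUD_XML).iter("Item")}
print(kinds["1"])  # ('urn:example:osaf-endpoints', 'Contact', 1)
print(kinds["2"])  # ('urn:example:acme-endpoints', 'Contact', 3)
```

Here the namespace URI plays the role of the Endpoint's source, so the name, version, and source triple stays unique across families.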
--
JeffreyHarris - 5-16 Apr 2004
Comments
Very interesting way to look at how data can be represented so that information describing the version of the data is brought along, as well as a roadmap of how to read data that may not be known to the reader. Is this accurate?
Also, would it be wrong to think of an Endpoint as being analogous to a Tag in a version control system? Is it a way of marking a spot in the continuum of changes that is the core schema?
--
MikeT - 08 Apr 2004
Yup, a lot like a Tag in a version control system. Maybe Tag's a better name than endpoint?
--
JeffreyHarris - 16 Apr 2004
See also