Import Export Landscape
Introduction
There are many cases where users are likely to want their Chandler data to be exported to formats other programs can read. Similarly, Chandler is unlikely to get many users if it can't import data stored in major data formats. In addition to these broad uses for import/export, Chandler power users would like to be able to access data stored in the repository in all its meta-data laden glory, and we'd like to satisfy their desires.
There is a high level of overlap between Sharing and import/export. To achieve either, it's necessary (or at least highly desirable) to define a canonical format for expressing Chandler items and collections of items. Once Chandler data can be easily represented in a well-defined format, it can be transformed into any format for export.
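As a rough illustration of what a canonical format could look like, the sketch below serializes items to XML and reads them back. It is not Chandler's actual format: the element names (collection, item, attribute), the function names, and the idea that an item is just a UUID, a kind name, and a flat dictionary of string attributes are all assumptions made for the example.

```python
import uuid
import xml.etree.ElementTree as ET


def item_to_xml(item_uuid, kind, attributes):
    """Serialize one item (UUID, kind name, attribute dict) to an XML element."""
    # Element and attribute names here are hypothetical, not a Chandler schema.
    elem = ET.Element("item", {"uuid": str(item_uuid), "kind": kind})
    for name in sorted(attributes):
        attr = ET.SubElement(elem, "attribute", {"name": name})
        attr.text = str(attributes[name])
    return elem


def collection_to_xml(name, items):
    """Serialize a named collection of (uuid, kind, attributes) tuples to text."""
    root = ET.Element("collection", {"name": name})
    for item_uuid, kind, attributes in items:
        root.append(item_to_xml(item_uuid, kind, attributes))
    return ET.tostring(root, encoding="unicode")


def xml_to_items(text):
    """Parse the text produced by collection_to_xml back into tuples."""
    root = ET.fromstring(text)
    return [
        (
            uuid.UUID(elem.get("uuid")),
            elem.get("kind"),
            {a.get("name"): a.text or "" for a in elem.findall("attribute")},
        )
        for elem in root.findall("item")
    ]
```

Because the same representation can be written and read back without loss (for string-valued attributes, at least), it could serve both Sharing and import/export, with other formats produced by transforming it.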
Layers of data accessibility
Chandler data is stored in the Repository. Not all data that exists in the repository is available to users interacting with the Chandler world. Listed below are the major conceptual levels of data accessibility.
- Implementation detail level - Data is stored as part of the repository but isn't directly accessible from Python
- Python level - Data is accessible to Python processes running on the local machine
- P2P level - Data is accessible to anyone with permission via P2P sharing
- Server share level - Data is accessible to anyone on the Internet with permission via an always-on WebDAV server
Are the Python, P2P, and server share levels (the second, third, and fourth above) equivalent? If not, what are the differences?
Export levels
There are two primary ways that data might be exported: an export of the entire repository to a human-readable format, and an export of specific collections of items.
Exporting entire repositories (and later importing them) would be a useful tool for experimenting. However, to perfectly reconstruct a repository, quite a bit of information is needed above and beyond the information developers normally interact with.
What data beyond UUIDs and attributes of items are necessary to perfectly reconstruct the repository? Is there an intermediate level of accuracy that would omit version information or other details but would still be usable?
Exporting a subset of a repository (one item or a collection of related items) in an efficient way bumps into the fact that many Chandler items are defined relative to a large collection of items in the Content Model. It would be nice to avoid exporting the Content Model every time an item is exported, since every Chandler user is likely to already have a copy of the Content Model.
Now that namespaces are being used, perhaps items in certain namespaces could be exported differently than others? If we take this route, we also need to work on the schema evolution problem, i.e. what to do if I export a Contact v1.1 and your schema only defines Contact v1.0.
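To make the namespace idea and the version mismatch concrete, here is a minimal sketch. Everything in it is hypothetical: the dictionary layout for items, the namespace names, the dotted version strings, and the rule that a local schema "covers" an export when its version is at least as high. It only detects the mismatch; what to do about it remains the open question.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.1" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def export_entry(item, shared_namespaces):
    """Export a short reference for items in shared namespaces, the full item otherwise.

    Assumption: items in `shared_namespaces` (for example the Content Model)
    already exist on the receiving side, so a name and version suffice.
    """
    if item["namespace"] in shared_namespaces:
        return {
            "ref": item["namespace"] + "/" + item["name"],
            "version": item["version"],
        }
    return dict(item)


def schema_gap(local_versions, kind, exported_version):
    """Return None if the local schema covers the exported version, else a description."""
    local = local_versions.get(kind)
    if local is None:
        return "kind %s is not defined locally" % kind
    if parse_version(local) < parse_version(exported_version):
        return "%s exported at v%s, local schema is v%s" % (kind, exported_version, local)
    return None
```

With this rule, importing a Contact exported at v1.1 into a schema that only defines v1.0 is reported as a gap, while the reverse direction is not.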
Exporting to useful formats
From a theoretical perspective, any data representations which can be related by bijections are equivalent.
Once OSAF exposes data in a well defined way, it can be considered exported. However, if the transform between Chandler's representation and the format another program wants isn't obvious, different representations are far from equivalent from a user's perspective.
To make it easy for users, Chandler should export data to a variety of standard formats (see InteroperationExperimentsProposal). To make it easy for people to debug export code or to write new export formats, Chandler should (whenever possible) define exports as a transformation of canonical Chandler data using a standard transform language like XSLT. Exporting megabytes of email to mbox format via XSLT is probably too slow to be reasonable, but exporting smaller data sets probably wouldn't pose a performance problem.
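The "export as a transformation of canonical data" idea might look something like the sketch below. Note that it is not XSLT: the Python standard library has no XSLT processor, so a small declarative column mapping stands in for the stylesheet. The canonical XML layout (collection, item, attribute elements), the attribute names, and the CSV column headers are assumptions made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping: CSV column header -> attribute name in the canonical XML.
# Adding a new export format would mean writing a new mapping, not new traversal code.
CONTACT_COLUMNS = [("Name", "fullName"), ("E-mail", "email")]


def canonical_to_csv(xml_text, columns=CONTACT_COLUMNS):
    """Transform canonical item XML into CSV text, one row per item."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header for header, _ in columns])
    for item in root.findall("item"):
        # Collect this item's attributes by name; missing ones export as empty cells.
        values = {a.get("name"): a.text or "" for a in item.findall("attribute")}
        writer.writerow([values.get(attr, "") for _, attr in columns])
    return out.getvalue()
```

The point of keeping the mapping as data is the same one made for XSLT above: someone debugging or adding an export format only has to read or write the mapping.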
Additional questions
I believe that data shared from a remote repository is implemented locally as a separate repository. Will this be true for imported data? Is this relevant to how import is implemented?
How should imported objects duplicating existing names and email addresses (or anything else) be dealt with? Overwrite? Create a separate item? Add to the existing item? To start, we'll just create a new item.
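A minimal sketch of that starting policy, assuming the store is a plain dictionary keyed by UUID and that contacts are dictionaries with an optional "email" key (both hypothetical stand-ins for the repository and its items). The import always creates a new item, but reports existing items with the same e-mail address so that a later merge or overwrite policy has something to work with.

```python
import uuid


def import_contact(store, incoming):
    """Add `incoming` to `store` as a new item, never touching existing items.

    Returns (new_uuid, duplicates), where duplicates lists the UUIDs of
    existing items that share the incoming e-mail address.
    """
    email = incoming.get("email")
    duplicates = []
    if email:
        duplicates = [
            existing_id
            for existing_id, item in store.items()
            if item.get("email") == email
        ]
    new_id = uuid.uuid4()
    store[new_id] = dict(incoming)  # copy so later edits to `incoming` don't leak in
    return new_id, duplicates
```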
-- JeffreyHarris - 30 Jun 2004