r9 - 09 Oct 2003 - 19:31:36 - AndyDent

Repository Access Protocol API (RAP)

<OSAF Current Thinking -- 22 Feb 2003>


Draft February 22 2003

The purpose of RAP is to provide network access to a Chandler repository. A repository can be thought of as a general-purpose database consisting of items, each carrying an arbitrary number of name-value pairs. We were unsatisfied with the currently available access protocols, primarily because of network inefficiencies and complexity. We have decided to model RAP on IMAP, but with a more generalized and powerful API. Extending IMAP is not an option because of the large number of email-only assumptions within it, so we will be building an entirely new protocol that borrows some IMAP concepts.

The core transport layer will be based on the Blocks Extensible Exchange Protocol Core (BEEP), as specified in RFC 3080. BEEP provides framing, SASL, TLS and, very importantly, parallelism. BEEP will allow RAP to perform multiple repository queries over a single authenticated connection and return answers as they become available. This represents a significant improvement over IMAP as it exists today.
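
The idea of returning answers as they become available can be illustrated with a small sketch. Python's asyncio is used here purely as a stand-in for parallel channels; nothing below implements BEEP framing or RAP itself, and fake_query, its names and its delays are invented for the illustration.

```python
import asyncio


async def fake_query(name, delay):
    # Hypothetical stand-in for one repository query on its own channel.
    await asyncio.sleep(delay)
    return name


async def run_queries():
    # Three queries are issued together; replies are collected in the
    # order they finish, not the order they were sent.
    queries = [
        fake_query("slow", 0.15),
        fake_query("fast", 0.05),
        fake_query("medium", 0.10),
    ]
    finished = []
    for next_done in asyncio.as_completed(queries):
        finished.append(await next_done)
    return finished


completion_order = asyncio.run(run_queries())
print(completion_order)
```

The query issued first is the last to be answered, and it does not hold up the two faster ones.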

The following API is presented in Python syntax. It is not intended to act as a specification, since we are not even close to finished. We will be working on a specification over the next 6 months. This API should give the reader a strong idea of the scope and direction of the RAP protocol, but it is far from complete.

class RAP( [host[, port]] )
This class implements the actual RAP protocol. The connection is created and version and capabilities are determined when the instance is initialized. If host is not specified, 'localhost' is used. If port is omitted, a standard port, as yet undefined, is used.

Exceptions are defined as attributes of the RAP class:

exception RAP.error
Exception raised on any errors. The reason for the exception is passed to the constructor as a string.

exception RAP.abort
RAP server errors cause this exception to be raised. This is a sub-class of RAP.error.

exception RAP.interrupt
An application-triggered interrupt causes this exception to be raised. This is a sub-class of RAP.error.

exception RAP.connection_closed
The connection has been closed by the close_connection() method.

exception RAP.connection_lost
The connection has been disconnected due to loss of the network or an abnormal server termination.


* MORE EXCEPTIONS HERE IN THE FUTURE *
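
A minimal sketch of how these exceptions could hang off the class as attributes. The draft only states that abort and interrupt are sub-classes of RAP.error; making connection_closed and connection_lost sub-classes of it as well is an assumption of this sketch, and the class body is otherwise empty.

```python
class RAP:
    """Skeleton showing only the exception attributes; no protocol code."""

    class error(Exception):
        """Raised on any error; the reason is passed as a string."""

    class abort(error):
        """Raised on RAP server errors."""

    class interrupt(error):
        """Raised on an application-triggered interrupt."""

    # Assumption: the draft does not say what these two derive from.
    class connection_closed(error):
        """The connection was closed by close_connection()."""

    class connection_lost(error):
        """The network was lost or the server terminated abnormally."""


# A caller that only cares about "something went wrong" can catch RAP.error.
try:
    raise RAP.connection_lost("network unreachable")
except RAP.error as exc:
    caught = str(exc)

print(caught)
```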

A RAP instance has the following methods:

close_repository()
Close the currently selected repository. * IT IS NOT CURRENTLY KNOWN HOW TRANSACTIONS ARE HANDLED AT CLOSE TIME. MY GUESS WOULD BE TO FORCE A TRANSACTION AT CLOSE *

Arguments: NONE

Returns: NONE, throws exception on error

close_connection()
Close the connection. Implicitly calls close_repository() if a repository is selected. After close_connection() has been called, no other repository access functions can be called; any such call raises an exception.

authenticate( func )
Authenticate command -- requires response processing.
NOTE: At this point I don't know if we need this method. BEEP provides SASL authentication at a level below RAP, but we may want to be able to have multiple identities per connection.

Arguments: a function for querying string responses from the user.

Returns: NONE, throws exception on error

capabilities()
Arguments: NONE

Returns: a list of repository objects, each describing an optional or extension-based capability of the currently connected repository server. The list of capabilities is global for all repositories on the server.

select( repository_name )
Arguments: repository_name specifies the name of a repository to be selected. All further operations will be performed on this repository.

Returns: Nothing on successful selection. If another repository is currently selected, this function implicitly calls close_repository() on the currently selected repository. Throws exception on error.
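
The implicit-close behaviour of select() and close_connection() can be sketched with an in-memory stand-in. SketchRAP, its log attribute and its use of built-in exceptions instead of RAP.error are all inventions of this sketch; it also assumes an unknown repository name is rejected before the current repository is closed, which the draft does not specify.

```python
class SketchRAP:
    """In-memory stand-in for a RAP connection; no networking is performed."""

    def __init__(self, repository_names):
        self._known = set(repository_names)
        self.selected = None
        self.connected = True
        self.log = []  # records every repository close, for inspection

    def _require_connection(self):
        if not self.connected:
            raise RuntimeError("connection is closed")

    def close_repository(self):
        self._require_connection()
        if self.selected is not None:
            self.log.append("closed " + self.selected)
            self.selected = None

    def select(self, repository_name):
        self._require_connection()
        if repository_name not in self._known:
            raise KeyError(repository_name)
        # Selecting a new repository implicitly closes the current one.
        self.close_repository()
        self.selected = repository_name

    def close_connection(self):
        self._require_connection()
        # Closing the connection implicitly closes the selected repository.
        self.close_repository()
        self.connected = False


rap = SketchRAP(["calendar", "mail"])
rap.select("calendar")
rap.select("mail")
rap.close_connection()
print(rap.log)
```

After close_connection(), any further select() or close_repository() call on this object raises.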

search( search_type, search_arguments, [attribute_list], [ retrieval_flags ] )
Arguments: search_type specifies the type of search to be performed. search_arguments represents a list of necessary data to perform the search. attribute_list is optional and specifies which attributes of the objects returned from the search are returned. Keywords "ALL" or "NONE" may be used in place of an attribute_list. If "NONE" is used, the UID is still returned. If attribute_list is omitted only the UID of each object is returned.

retrieval_flags is a list of name-value pairs that specifies several modifiers that affect the result set returned. retrieval_flags can contain the following:

retrieve_names_flag: TRUE | FALSE - specifies that all the attribute names should be returned, even if we are not requesting all of the values. This is useful when building a partial object cache so that you can tell when you need to fetch more attributes or if your object is complete.
values_smaller_than: INTEGER - specifies that values equal to or larger than the specified size are not returned. Values will be marked accordingly and their size returned.
partial_value_range: pair of INTEGERS - specifies a byte range. The byte range is used to return parts of values. If an entire value fits within the range, the complete value is returned and marked accordingly. If the value is larger than the range or the range does not begin at zero, a partial value is returned and marked accordingly. The size of the entire value is returned as well.
limit_result_set: pair of INTEGERS - specifies a range. The range is used to return parts of a result set. If the entire result set fits within the range the whole set is returned, otherwise a partial result set is returned. The size of the entire set is returned as well.

Returns: an iterator object to a list of objects containing the specified attributes. If a given object did not contain an attribute named in the attribute_list the attribute will not be part of the returned object.

search_types have yet to be defined, but should include the capability to sort the result set for retrieval.
* Have not yet specified how the objects returned notify the caller that an attribute name is present, but the value has yet to be retrieved *
* Several questions about how to return all the partial value and limited result set information need to be answered *
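
A rough sketch of how attribute_list, values_smaller_than and limit_result_set could shape a result, applied to plain dictionaries in memory. Since the draft leaves the marking of omitted values open, the ("OMITTED", size) tuple is a made-up marker; the half-open (start, end) range, the use of len() on strings as the value size, and the helper names project and limit are likewise assumptions.

```python
def project(obj, attribute_list=None, values_smaller_than=None):
    """Return a copy of obj restricted to the requested attributes.

    The UID is always kept. Oversized values are replaced by a
    hypothetical ("OMITTED", size) marker.
    """
    if attribute_list is None or attribute_list == "NONE":
        wanted = []
    elif attribute_list == "ALL":
        wanted = [name for name in obj if name != "uid"]
    else:
        wanted = list(attribute_list)

    result = {"uid": obj["uid"]}
    for name in wanted:
        if name not in obj:
            continue  # attributes the object lacks are simply left out
        value = obj[name]
        if values_smaller_than is not None and len(value) >= values_smaller_than:
            result[name] = ("OMITTED", len(value))
        else:
            result[name] = value
    return result


def limit(results, limit_result_set=None):
    """Return (part_of_results, total_size) for an assumed (start, end) range."""
    total = len(results)
    if limit_result_set is None:
        return results, total
    start, end = limit_result_set
    return results[start:end], total


items = [
    {"uid": 1, "title": "Lunch", "notes": "x" * 500},
    {"uid": 2, "title": "Standup"},
    {"uid": 3, "title": "Review", "notes": "short"},
]
projected = [project(o, ["title", "notes"], values_smaller_than=100) for o in items]
page, total = limit(projected, (0, 2))
print(page, total)
```

The caller sees two of the three matching objects, learns that three exist in total, and can tell that the notes value of the first object was withheld because of its size.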

retrieve( uid_list, [attribute_list], [ retrieval_flags ] )
Arguments: uid_list specifies any positive number of UIDs for retrieval. attribute_list is optional and specifies which attributes of the objects listed are to be returned. Keywords "ALL" or "NONE" may be used in place of an attribute_list. If "NONE" is used, the UID is still returned. If attribute_list is omitted all attributes are returned. retrieve_names_flag: see the definition above under search().

Returns: an iterator object to a list of objects containing the specified attributes. If a given object did not contain an attribute named in the attribute_list the attribute will not be part of the returned object.
* WHAT DO WE DO ABOUT INVALID UIDs? *


put( list_of_objects )
Arguments: list_of_objects is a list of objects, each containing a UID and any number of attributes. There is no size restriction on object length or number of objects. If an object already exists in the database, the attributes in the put command will overwrite attributes in the existing object. Existing attributes not included in the put will not be deleted. This allows for the rapid update of a small number of attributes within a large set of objects.

Returns: NONE, throws exception on error
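
The overwrite-but-do-not-delete rule of put can be sketched against an in-memory dictionary keyed by UID. The store argument and the dictionary representation of objects are assumptions of this sketch, not part of the draft.

```python
def put(store, list_of_objects):
    """Merge each object into store, keyed by its UID."""
    for obj in list_of_objects:
        uid = obj["uid"]
        # Create the object if it is new, otherwise reuse the existing one.
        existing = store.setdefault(uid, {"uid": uid})
        # Attributes named in the put overwrite; all others are left alone.
        existing.update(obj)


store = {7: {"uid": 7, "title": "Lunch", "location": "Cafe"}}
put(store, [{"uid": 7, "title": "Team lunch"}, {"uid": 8, "title": "New"}])
print(store)
```

Object 7 keeps its location attribute even though the put did not mention it, and object 8 is created.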

delete ( uid_list, [attribute_list] )
Arguments: uid_list specifies any positive number of UIDs for deletion. attribute_list is optional and specifies which attributes of the objects listed are to be deleted. If attribute_list is omitted all attributes are deleted and the object itself is deleted.
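
The two modes of delete can be sketched in the same in-memory style. The store dictionary is again an assumption; so is the handling of edge cases the draft leaves open: here an unknown UID raises KeyError and an attribute the object lacks is silently ignored.

```python
def delete(store, uid_list, attribute_list=None):
    """Delete whole objects, or only the named attributes of each object."""
    for uid in uid_list:
        if attribute_list is None:
            # No attribute_list: the object itself is deleted.
            del store[uid]
        else:
            for name in attribute_list:
                store[uid].pop(name, None)


store = {
    1: {"uid": 1, "title": "A", "notes": "n"},
    2: {"uid": 2, "title": "B"},
}
delete(store, [1], ["notes"])  # removes one attribute, keeps the object
delete(store, [2])             # removes the whole object
print(store)
```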

transaction( "SAVE" | "REVERT" )
Arguments: keywords "SAVE" or "REVERT"

Returns: NONE, throws exception on error
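
The draft gives only the two keywords, so the following is one possible reading rather than the defined behaviour: "SAVE" makes the pending changes permanent and "REVERT" discards everything since the last save. TransactionalStore and its attributes are hypothetical names for this sketch.

```python
import copy


class TransactionalStore:
    """In-memory stand-in with a working copy and a last-saved copy."""

    def __init__(self):
        self._saved = {}
        self.working = {}

    def transaction(self, keyword):
        if keyword == "SAVE":
            # Assumed meaning: commit all pending changes.
            self._saved = copy.deepcopy(self.working)
        elif keyword == "REVERT":
            # Assumed meaning: throw away changes since the last save.
            self.working = copy.deepcopy(self._saved)
        else:
            raise ValueError("expected 'SAVE' or 'REVERT'")


repo = TransactionalStore()
repo.working[1] = {"uid": 1, "title": "Lunch"}
repo.transaction("SAVE")
repo.working[1]["title"] = "Dinner"
repo.working[2] = {"uid": 2, "title": "Unsaved"}
repo.transaction("REVERT")
print(repo.working)
```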

class RAP_iterator()


RAP_iterator has the following methods:

next( [number] )
Arguments: number is optional and specifies the number of objects to be returned. If not specified, the default is one (1).
Returns: a list of repository objects.

all()
Arguments: NONE

Returns: a list of repository objects containing all objects within the iterator. If some objects have already been returned via next(), only the remaining objects are returned.

abort()
Arguments: NONE

Returns: nothing. The current iterator will no longer be valid, and all further network data associated with the iterator will be cancelled.
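
The three iterator methods can be sketched over a plain list. SketchIterator is a hypothetical name; a real iterator would pull objects from the network as needed, and the choice to raise RuntimeError after abort() is an assumption, since the draft only says the iterator is no longer valid.

```python
class SketchIterator:
    """List-backed stand-in for the iterator returned by search/retrieve."""

    def __init__(self, objects):
        self._pending = list(objects)
        self._valid = True

    def _check(self):
        if not self._valid:
            raise RuntimeError("iterator has been aborted")

    def next(self, number=1):
        # Hand back up to `number` objects and drop them from the pending list.
        self._check()
        batch, self._pending = self._pending[:number], self._pending[number:]
        return batch

    def all(self):
        # Hand back whatever next() has not already returned.
        self._check()
        remaining, self._pending = self._pending, []
        return remaining

    def abort(self):
        self._valid = False
        self._pending = []


it = SketchIterator(["a", "b", "c", "d"])
first = it.next()
pair = it.next(2)
rest = it.all()
it.abort()
print(first, pair, rest)
```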

Issues with current spec:

* need negotiable transfer encodings. BEEP will provide this as part of the channel start process, but I haven't worked out the details.
* need some thoughts about per-attribute authorization
* need to fill in a lot of details around object retrieval


-- LouMontulli - 22 Feb 2003

<End OSAF Current Thinking>



Questions and Answers

locking (for multi-user concurrent access)

  • Question: Will a RAP client be able to lock an item in a repository? (BrianDouglasSkinner - 26 Feb 2003)

  • Answer: This is still under discussion, but likely not. I would expect to see a system more like CVS where changes are merged in rather than held locked. Some locking would be necessary during writes and merges, but otherwise not. (LouMontulli - 27 Feb 2003)

change notification

  • Question: Will RAP provide some kind of mechanism to let a RAP client "subscribe" to get notified about changes to an item? (BrianDouglasSkinner - 26 Feb 2003)

  • Answer: Yes, definitely. A database trigger mechanism is being designed now, and RAP will reflect that functionality. RAP is going to use BEEP as its transport layer; BEEP supports asynchronous messaging, which allows for notifications. There are some significant challenges though:
    • What happens to notifications when a client isn't connected and can't be reached?
    • How do large servers deal with triggers? It may be unworkable to require a large server to support triggers. In this case what other mechanism can the client use? (LouMontulli - 27 Feb 2003)

change notification

  • Question:
    • For example, let's say a RAP client uses search or retrieve to get an event in a calendar, and then the event is displayed to some user (Pat) in the UI. If another user (Chris) on another machine changes the end time of the event, how does the UI code on Pat's client find out that the event needs to be redisplayed? Would it be good to have some kind of mechanism that allowed a RAP client to automatically subscribe for notifications about all of the items that the RAP server has returned to the client?
    • Or, not even considering issues about multi-user concurrent access, let me just offer a single-user example. Say I'm using Chandler, and I have a couple views open: a calendar day view and a calendar week view. If I create a new event in the day view, how does the week view get notified that there now exists a new event it should be displaying?
    • And, if we are talking about a general mechanism for managing notifications about changes to query results, is that related to the issue that people are discussing in these posts on the design list: "Knowing when you've read the most recent e-mail in a mailbox" and "Recognizing a response, and group filters"? (BrianDouglasSkinner - 26 Feb 2003)

  • Answer: This is a great example. There are multiple ways this could be implemented. I suspect that we will need to use a few of these methods in order to deal with firewall issues:
    • If a RAP connection is active, database change notifications could be sent
    • triggers could be used to send a notification via:
      • RAP
      • a jabber message
      • email
    • The client could periodically poll the server to look for changes. (LouMontulli - 27 Feb 2003)
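
The polling option in the answer above could look roughly like this on the client side. Everything here is hypothetical: the draft defines no per-item version number, so the uid-to-version mappings and the detect_changes helper are invented to show the comparison a polling client would have to make.

```python
def detect_changes(cached_versions, server_versions):
    """Compare two uid -> version mappings.

    Returns (changed_or_new_uids, removed_uids), each sorted.
    """
    changed = sorted(
        uid
        for uid, version in server_versions.items()
        if cached_versions.get(uid) != version
    )
    removed = sorted(uid for uid in cached_versions if uid not in server_versions)
    return changed, removed


cached = {1: 3, 2: 1, 4: 2}  # what the client saw at the last poll
server = {1: 4, 2: 1, 3: 1}  # what the server reports now
changed, removed = detect_changes(cached, server)
print(changed, removed)
```

Item 1 was modified, item 3 is new and item 4 has disappeared, so the client would re-retrieve items 1 and 3 and drop item 4 from its display.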

database triggers

  • Question: Will the RAP API allow a client to subscribe to be notified when an arbitrary database condition is met?
    • On the Extensibility page, there are a couple of bullet points about the need for triggers and scripts: "Chandler can detect when an arbitrary database condition is met" and "Execution of a script whenever a particular condition is triggered". Are there some scripts that run on the server, and others that run on the client? Does the server ever notify a client that it should run a script? (BrianDouglasSkinner - 26 Feb 2003)

  • Answer: Triggers will be as arbitrary as possible, but we will not support a Turing-complete language for triggers. Because triggers run on remote servers, we cannot allow arbitrary code to run there, or else the security of all users could be compromised. (LouMontulli - 27 Feb 2003)

delete semantics

  • Question: What are the semantics of the delete method?
    • If a RAP client deletes an item that represents an entire calendar, do all of the events that are "in" that calendar get deleted?
    • Is there some notion of garbage collection for items that are "orphaned" as the result of a deletion?
    • What happens when one RAP client makes changes to a calendar that another RAP client just deleted?
    • How is referential integrity handled? If item A has an attribute that points to item B, what happens to item A when item B gets deleted?
    • Does the repository keep a change log for each item, or a series of item versions? When an item is deleted using the delete method, does that mean that the item is just marked as deleted, or is the item actually erased on disk, along with all of the previous versions and/or change log entries? (BrianDouglasSkinner - 26 Feb 2003)

  • Answer: These issues are still being discussed, and depend more on the database than RAP. We would like to use a database with garbage collection to make things easier. (LouMontulli - 27 Feb 2003)



Discussion


Comment on "class RAP ([host[, port]])": A user might want multiple repositories on their localhost for testing purposes. Suppose that someone has made some schema changes and wants to test these on a new repository before making changes to their "working" repository. (PaulBHill 3/18/03)


Comment on Capabilities: I don't think that capabilities should be global for all repositories on the server. If you are running on multiple port numbers the capabilities should be specific to the port that the client connected to. Again, suppose that you have two repositories that have different schemas, capabilities might end up reporting information about the schema requirements or version... (PaulBHill 3/18/03)


Question about triggers: Will triggers need to be replicated themselves, or will it be enough that the data generated (if any) by the trigger's execution is replicated? The reason I ask is that I sense that triggers will essentially be callbacks to a parcel by the data API. If so, then triggers will be unable to run on anything except the computer where the parcel has been installed.

-- MikeT - 21 May 2003


BEEP Observations

Please consider using and enhancing BEEPy (http://beepy.sourceforge.net/), which seems a clean design and a good starting point. I've spent a lot of time developing and debugging my own BEEP-based stack (pure XML wrappers) using C++ on OS/X, Windows and PocketPC, with additional client-only libs in Python, Flash ActionScript and RealBASIC (unfortunately the client refuses to release it as open source). Based on that experience I can offer a few tips:

  • don't underestimate the workload to start from scratch
  • beware of race conditions trying to start channels before startup is complete. In particular, localhost timing on Windows is quite different from remote connections.
  • you need to think about timeouts if connections become unresponsive but socket closure isn't detected - this is not part of BEEP.
-- AndyDent - 10 Oct 2003
Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.