Observable Queries
This page is for discussing Observable Queries.
- this page touches on these subjects
- queries
- notifications
- caching: cache coherency & cache invalidation
- observer/observable
- sharing: publishing & subscribing
- replication & synchronization
Summary
What is an observable query?
I'm using the term "query" in the sense of a Chandler query -- a query that's used to get a result set of items that are shown in a view.
I'm using the term "observable" in the sense of the Observer/Observable design pattern, as seen in Java's java.util.Observer and java.util.Observable. (If you want more background on that, have a look at this article.)
An observable query is a Python object that represents a Chandler query. Other Python objects can register as observers of an observable query. Whenever the result set of the query changes, the query notifies its observers.
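A minimal sketch of that idea in Python, assuming observers are plain callables and that the data model hands the query its result set. The class and method names (`ObservableQuery`, `add_observer`, `set_result_set`) are hypothetical, not Chandler's actual API.

```python
from typing import Callable, Iterable, List, Set


class ObservableQuery:
    """Holds a query's result set and tells observers whenever it is updated.

    Hypothetical sketch: `predicate` decides which items belong in the result set.
    """

    def __init__(self, predicate: Callable[[object], bool]) -> None:
        self.predicate = predicate
        self.result_set: Set[object] = set()
        self._observers: List[Callable[["ObservableQuery"], None]] = []

    def add_observer(self, observer: Callable[["ObservableQuery"], None]) -> None:
        self._observers.append(observer)

    def remove_observer(self, observer: Callable[["ObservableQuery"], None]) -> None:
        self._observers.remove(observer)

    def set_result_set(self, items: Iterable[object]) -> None:
        """Called by the data model when it has (re)computed the result set."""
        self.result_set = set(items)
        self._notify()

    def _notify(self) -> None:
        # Iterate over a copy so an observer may unregister itself mid-notification.
        for observer in list(self._observers):
            observer(self)


# Usage: a view-like callback that records what it would display.
displayed: List[List[int]] = []
query = ObservableQuery(lambda n: n % 2 == 0)
query.add_observer(lambda q: displayed.append(sorted(q.result_set)))
query.set_result_set(n for n in range(6) if query.predicate(n))
print(displayed)  # [[0, 2, 4]]
```

The observer never asks the model for anything; it simply reacts when the query pushes a new result set to it.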
What problem is this solving?
There are two problems here:
- On a single machine, when a user makes a change in one view, how does another view find out about it?
- On a network, when one user makes a change to an item, how do other users find out about it?
Observable queries could be used to address either, or both, of those questions.
Does this affect the UI?
No, observable queries are an "implementation detail". Chandler could be implemented with or without observable queries, and hopefully users would never know the difference. Observable queries are just an implementation mechanism, operating under the covers.
Client-side or repository-side?
One option is to use observable queries just on the client-side. Another option is to have observable queries on both the client-side and the repository-side, and for the client and repository to communicate about them.
What are the pros & cons? @@@ Brian -- of which option?
- pros:
- might improve network efficiency
- might lead to fewer bugs
- might lead to more consistent user experience
- might lead to less work for parcel developers
- see also the motivations section below
- cons:
- requires more up-front work developing the observable query features
- might lead to more complicated APIs for parcel developers to learn
- might increase total code complexity
Browsing vs. Subscription
Browsing and subscription are two different cases:
- Browsing example: Chantal publishes her calendar, making it publicly readable to everyone in her department. Dean "browses" Chantal's calendar, looking at all the events scheduled for next month. As Dean is looking at the calendar on his machine, Chantal adds an event in the calendar on her machine. Dean does not automatically see the new event. Dean's machine made a one-time query, and he won't see any new changes unless he forces a new query, perhaps by pressing a button, or by closing the window and opening it again.
- Subscription example: Erin publishes her calendar, and Felix subscribes to it. Whenever Felix looks at Erin's calendar, he always sees the latest info. If Erin adds a new event, the event will automatically show up in Felix's view.
Observable queries can be used within a client, or they can be used in a coordinated client-repository design.
- If observable queries are used just within a client, that will work fine with either the browsing case or the subscription case, but it won't have much to do with actually implementing the browsing or the subscription.
- If observable queries are used in a coordinated client-repository design, then the observable queries will help solve the problem of how to manage subscriptions. For the simpler task of just browsing (rather than subscribing), the observable queries wouldn't be necessary, and some simple one-time query could be used instead.
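To make the difference concrete, here is a small in-memory sketch with hypothetical names and no network or repository involved: browsing returns a one-time snapshot, while subscribing registers a callback that is re-run whenever the calendar changes.

```python
from typing import Callable, List


class Calendar:
    """A published calendar (hypothetical sketch)."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.subscribers: List[Callable[[List[str]], None]] = []

    def browse(self) -> List[str]:
        # One-time query: the caller gets a snapshot and nothing more.
        return list(self.events)

    def subscribe(self, callback: Callable[[List[str]], None]) -> None:
        # Standing query: the callback runs now and again after every change.
        self.subscribers.append(callback)
        callback(list(self.events))

    def add_event(self, title: str) -> None:
        self.events.append(title)
        for callback in list(self.subscribers):
            callback(list(self.events))


chantal, erin = Calendar(), Calendar()
chantal.add_event("Staff meeting")
erin.add_event("Staff meeting")

dean_view = chantal.browse()          # Dean browses Chantal's calendar
felix_view: List[str] = []


def felix_refresh(events: List[str]) -> None:
    felix_view[:] = events            # Felix's view always shows the latest list


erin.subscribe(felix_refresh)         # Felix subscribes to Erin's calendar

chantal.add_event("Offsite")
erin.add_event("Offsite")
print(dean_view)    # ['Staff meeting']
print(felix_view)   # ['Staff meeting', 'Offsite']
```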
A Few Scenarios
Scenario 1 -- simple straw man case
- Imagine you're writing a simple stand-alone app. Here's what it looks like:
- single user: Only one person ever uses the app.
- no server: There's no client & server -- there's just a single process running.
- no peer-to-peer: There are no other peers of this app. The app will never even access a network.
- no data store: There's no data store. All the data is loaded at start-up and kept in memory until shut-down.
- model & view: There is, however, a separation between the model and the view. Data objects form the model, and view code displays the data objects.
- multiple views: The user can see multiple views at once, side-by-side. Each view displays a single data object. A data object can appear in more than one view.
- Question: If the user modifies an object in one view, and other views are displaying the same object, how do the other views find out that they should redisplay the object?
- Answer: The app framework includes some kind of "observer/observable" mechanism. Whenever a view first displays an object, it registers itself as an observer of that object. Whenever the object is modified, it notifies all its observers.
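A minimal Python sketch of that per-object mechanism, assuming model objects announce every attribute assignment. `DataObject`, `View`, and `object_changed` are illustrative names only.

```python
from typing import Any, List


class DataObject:
    """A model object that notifies registered views whenever it is modified."""

    def __init__(self, **attrs: Any) -> None:
        # Use object.__setattr__ so construction does not trigger notifications.
        object.__setattr__(self, "_observers", [])
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def add_observer(self, observer: "View") -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)
        for observer in list(self._observers):
            observer.object_changed(self, name)


class View:
    def __init__(self, name: str) -> None:
        self.name = name
        self.redraws = 0

    def display(self, obj: DataObject) -> None:
        # The first time a view displays an object, it registers as an observer.
        obj.add_observer(self)
        self.redraws += 1

    def object_changed(self, obj: DataObject, attribute: str) -> None:
        self.redraws += 1  # redisplay the object


note = DataObject(title="Lunch")
left, right = View("left"), View("right")
left.display(note)
right.display(note)
note.title = "Team lunch"             # an edit made in one of the views
print(left.redraws, right.redraws)    # 2 2
```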
Scenario 2 -- views display result sets
- Scenario 2 is identical to Scenario 1, except that a view can display a number of objects at once. The view determines what objects to display based on the properties of the objects.
- Example: In a calendar app, there are event objects. A Week View shows all the events that fall in a given week. A Day View shows the events for a given day.
- Question: Let's say the user has two views, a Day View that shows Thursday, and a Week View that shows the whole week. Suppose the user, in Week View, drags a "Lunch" event from the Wednesday column and drops it on the Thursday column. How does the separate Day View find out that it needs to display the "Lunch" event?
- Answer:
- observer/observable: In scenario 1 we used a simple "observer/observable" mechanism to register interest in individual objects, but that won't work now. In scenario 2, the Day View has never displayed the "Lunch" event before, and has never registered as an observer, so it is never notified of the edit.
- polling: A second alternative is to have Day View periodically poll the model. Every few seconds it could query the model for all the events scheduled for Thursday. After each query, Day View compares the new result set to the old result set currently being displayed. If the new result set contains a new event, the event is added to the old result set, and the view is updated. If the new result set is missing one... well, you get the idea.
- observe everything: A third alternative is to have Day View observe everything. Day View registers with the model, requesting to observe every change to the model. Whenever any data object is changed, the model notifies Day View, and Day View figures out whether it cares.
- observable queries: A fourth alternative is to have observable queries. When Day View is first displayed, it creates a query that asks for all the events that fall on the given day. Day View registers itself as an observer of the query, and then submits the query to the model. The model determines the initial result set for the query, sets the query to point to the result set, and tells the query to notify its observers. Whenever a data object is changed, the model notifies all the currently active queries, and each query figures out whether it cares. If the query cares about the change, then it updates its result set and notifies its observers. When the user closes the Day View window, Day View deletes the query and the query is no longer registered with the data model.
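The fourth alternative can be sketched in Python for the single-process case. This is an in-memory sketch under assumed names (`DataModel`, `submit`, `modify`, `retire`, `item_changed`); it replays the "Lunch" example, where the Day View query starts out empty and is notified once the event moves to Thursday.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(eq=False)  # identity-based equality/hashing, so events can live in sets
class Event:
    title: str
    start: dt.date


class ObservableQuery:
    def __init__(self, predicate: Callable[[Event], bool]) -> None:
        self.predicate = predicate
        self.result_set: Set[Event] = set()
        self.observers: List[Callable[["ObservableQuery"], None]] = []

    def notify(self) -> None:
        for observer in list(self.observers):
            observer(self)

    def item_changed(self, item: Event) -> None:
        """Decide whether this query cares about a change to `item`."""
        was_member = item in self.result_set
        is_member = self.predicate(item)
        if is_member:
            self.result_set.add(item)
        else:
            self.result_set.discard(item)
        if was_member or is_member:  # membership changed, or a member was edited
            self.notify()


class DataModel:
    def __init__(self, items: List[Event]) -> None:
        self.items = list(items)
        self.active_queries: List[ObservableQuery] = []

    def submit(self, query: ObservableQuery) -> None:
        # Compute the initial result set, register the query, then notify.
        query.result_set = {i for i in self.items if query.predicate(i)}
        self.active_queries.append(query)
        query.notify()

    def retire(self, query: ObservableQuery) -> None:
        # Called when the view closes; the query stops receiving changes.
        self.active_queries.remove(query)

    def modify(self, item: Event, **changes) -> None:
        for name, value in changes.items():
            setattr(item, name, value)
        # Every active query gets to decide whether it cares.
        for query in list(self.active_queries):
            query.item_changed(item)


thursday = dt.date(2004, 3, 11)
lunch = Event("Lunch", dt.date(2004, 3, 10))  # a Wednesday
model = DataModel([lunch])

day_view_log: List[List[str]] = []
day_query = ObservableQuery(lambda e: e.start == thursday)
day_query.observers.append(
    lambda q: day_view_log.append(sorted(e.title for e in q.result_set)))
model.submit(day_query)               # Day View shows Thursday: empty at first
model.modify(lunch, start=thursday)   # drag "Lunch" from Wednesday to Thursday
print(day_view_log)  # [[], ['Lunch']]
model.retire(day_query)               # the user closes the Day View window
```

Note the cost trade-off relative to "observe everything": every change still visits every active query, but only the queries (not the views) do the filtering.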
Scenario 101 -- simple client-server case
- Scenario 101 is identical to Scenario 1, except now there's a client-server architecture, with multiple users. The server maintains a data store, and clients submit changes to the server. Five different people may be sitting in front of five different clients, each one looking at a view of a single shared data object.
- Question: If one user modifies the object in one view, then how do the other clients find out that they need to redisplay the object?
- Answer:
- "cache invalidation": We could extend the RAP API to support the idea of cache invalidation notices. Each client uses the RAP API to register interest in objects. The server keeps track of the interest registrations, and whenever a data object is changed, the server sends invalidation notices to the interested clients.
- explicit vs. implicit registration: The RAP API could be set up to handle interest registrations either implicitly or explicitly. With implicit registration, whenever the server sent a data object to a client (e.g. in response to any query), the server would automatically register the client as being interested in that data object. With explicit registration, the client would have to use a separate method in the RAP API to send an interest registration request to the server.
- synchronous vs. asynchronous notifications: When the server needs to notify a client about a change to a data object, that notification could be transmitted either synchronously or asynchronously, using a server push model or a client pull model. The server could just up and send a notification whenever it wanted, or it could periodically send batches of notifications. Alternatively, the server could never send notifications, instead waiting for the client to periodically poll for new notifications. Or the invalidation notices could be included as unsolicited "attachments" at the end of server responses to other queries.
- changes vs. notices: When the server sends a notification to a client, we have a few options for what we want the notification to include:
- invalidation notice: The notification could just be an invalidation notice. The invalidation notice would just give the uid of the data object in question, with no further info. The client would find the corresponding data object in its local cache and mark it as stale. The client might then issue a new query to the server to get a fresh copy of the data object, or the client might choose to handle the situation in some other way.
- the new data object: Rather than just send an invalidation notice, the server might just assume that the client is going to be interested in seeing the new data object, so the server might always preemptively send a fresh copy of the new data object.
- the changes: Alternatively, the server might send just a "change object", which would represent the change made to a data object. Let's say the data object in question is an event object with dozens of attributes, but that the change only touched a single attribute, the "start time". Instead of resending the whole event, the server could create a change object. There could be different types of change objects, but for a simple attribute change, the change object might just have 3 attributes: the uid of the event, the attribute that was changed, and the new value assigned to the attribute. The client could then "apply" the change to its own cached version of the object.
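The three notification styles, plus implicit interest registration, might look like this in a single-process Python sketch. Direct method calls stand in for RAP, and every class name here (`Server`, `ClientCache`, `InvalidationNotice`, `FreshCopy`, `ChangeObject`) is hypothetical. Skipping the client that made the change is one possible policy, not something the scenario requires.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set, Union


@dataclass
class InvalidationNotice:
    uid: str                      # "this object is stale", nothing more


@dataclass
class FreshCopy:
    uid: str
    attributes: Dict[str, Any]    # a complete new copy of the object


@dataclass
class ChangeObject:
    uid: str
    attribute: str
    new_value: Any                # just the one attribute that changed


Notice = Union[InvalidationNotice, FreshCopy, ChangeObject]


class Server:
    def __init__(self, store: Dict[str, Dict[str, Any]]) -> None:
        self.store = store                        # uid -> attribute dict
        self.interest: Dict[str, Set[str]] = {}   # uid -> interested client ids

    def fetch(self, client_id: str, uid: str) -> Dict[str, Any]:
        # Implicit registration: sending an object registers the client's interest.
        self.interest.setdefault(uid, set()).add(client_id)
        return dict(self.store[uid])

    def set_attribute(self, origin: str, uid: str, attribute: str,
                      value: Any) -> Dict[str, Notice]:
        self.store[uid][attribute] = value
        change = ChangeObject(uid, attribute, value)
        # One possible policy: don't echo the change back to its originator.
        return {c: change for c in self.interest.get(uid, set()) if c != origin}


class ClientCache:
    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}
        self.stale: Set[str] = set()

    def receive(self, notice: Notice) -> None:
        if isinstance(notice, InvalidationNotice):
            self.stale.add(notice.uid)    # refetch later, or handle it some other way
        elif isinstance(notice, FreshCopy):
            self.objects[notice.uid] = dict(notice.attributes)
            self.stale.discard(notice.uid)
        else:                             # ChangeObject: apply it to the cached copy
            self.objects[notice.uid][notice.attribute] = notice.new_value


server = Server({"evt-1": {"title": "Design meeting", "start": "Wed 10:00"}})
client_1, client_2 = ClientCache(), ClientCache()
client_1.objects["evt-1"] = server.fetch("client-1", "evt-1")
client_2.objects["evt-1"] = server.fetch("client-2", "evt-1")
notices = server.set_attribute("client-1", "evt-1", "start", "Thu 10:00")
client_2.receive(notices["client-2"])
print(client_2.objects["evt-1"]["start"])  # Thu 10:00
```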
Scenario 102 -- client-server, with views that display result sets
- Scenario 102 incorporates all the aspects of both Scenario 2 and Scenario 101. So, to summarize:
- client/server: Many clients talk to a single server.
- multi-user: Many people are using the many clients.
- no peer-to-peer: There is no peer-to-peer communication between the clients.
- server data store: All the data is stored in a central data store on the server.
- multiple views: Each user can see multiple views at once, side-by-side.
- views show result sets: Each view can show the set of data objects that resulted from some query.
- Question: Let's say we have two users, Allison and Barry. They are both looking at a single calendar, let's say the calendar for the engineering department schedule. Allison has two views up, a Week View and a Day View showing Thursday. And Barry also has two views up, a Week View and a Day View showing Thursday. Allison, in Week View, drags the "Design meeting" event from the Wednesday column and drops it on the Thursday column. How does Allison's Day View find out that it needs to display the "Design meeting" event? How do Barry's two views each find out that they need to update their displays?
- Answer: There are a lot of possible options for how this could work. Here's the sequence of events that I'm proposing:
- observable queries:
- Allison brings up Week View
- When Allison first brings up Week View, it creates a Python query object. Let's call that Allison's Week View Query. The Week View Query object is defined as asking for all the events that fall in the given week. Week View registers itself as an observer of the Week View Query object, and then submits the Week View Query object to the data model, perhaps through some "Data Model API".
- The model checks to see if the Week View Query can be resolved locally, using just the data objects already cached in the client. Let's say that it can't, maybe because Allison only just launched Chandler, and almost nothing has been cached.
- The model forwards the query on to the server, via RAP.
- The server looks in the data store, and assembles the appropriate result set for the query.
- The server registers the query as an active query. The server keeps a copy of the query, and makes a mental note to itself to send Allison notifications if the data store is changed in any way that affects the result set of the query.
- The server returns the initial result set to Allison's client.
- On Allison's client, the data model code caches the initial result set. Then the data model code hands the result set to the original Week View Query object.
- The Week View Query object notifies all of its observers, which in this case just means the Week View that originally created the query.
- Week View displays the result set.
- Allison brings up Day View
- Simplest option: In the simplest case, this Day View query is handled exactly the same way that the Week View query was, creating a round trip to the server.
- Smart-client option: Alternatively, if the client had more smarts, it could recognize that it didn't need the server to resolve the query. In this case, the model code on the client would notice that it already had everything it needed in the local cache, and it would construct the result set for the query. However, I think it might be tricky to know when you can resolve a query locally and when you can't. For more on this, see the section below about Query Subsets.
- Smart-server option: Here's another alternative. Let's say that the client isn't smart enough to avoid bothering the server, so the client sends the query to the server. The server could simply look in the data store to resolve the query, just as it did for the Week View query. Or, if the server is smarter, it could first check to see if the query could be satisfied using only the information already in the server cache. The code for this might end up looking just like the code that it would take to implement the "smart-client option" described above.
- Barry brings up Week View
- Barry's Week View query is handled exactly the same way that Allison's Week View query was handled.
- Barry brings up Day View
- Barry's Day View query is handled exactly the same way that Allison's Day View query was handled.
- Allison edits an event
- Allison, in Week View, drags the "Design meeting" event from the Wednesday column and drops it on the Thursday column.
- The Week View code immediately draws the event in the Thursday column, providing a responsive UI. The Week View code then tells the "Design meeting" event to change its "start time". The event creates a change object representing the change, and applies the change to itself.
- The "Design meeting" event tells the client data model that it has been changed, and passes it the change object.
- The data model code determines which queries are interested in the change. The data model notifies both the Day View query and the Week View query about the change.
- The Day View query looks at the change, and looks at the changed event, and determines that the event needs to be added to the query result set. The query adds the event to the result set, and notifies its observer, Day View.
- Day View gets the notification and displays the "Design meeting" event.
- The Week View query looks at the change, and looks at the changed event, and realizes that this is a change to one of the events in its result set. The event still belongs in the result set, but the event has changed, so the Week View query notifies its observer, Week View.
- Week View gets the notification and recognizes the change object as the same one that resulted from its own edit. Week View realizes that it doesn't need to do anything now, because it already redrew the event when the user dropped it on the Thursday column.
- Meanwhile, back in the client data model code... After handling all the local queries, the data model code now tells the server about the change, via RAP.
- The server applies the change to the appropriate event record in the data store.
- The server checks to see if it has a list of active queries. The server notifies each of the queries about the change it just applied. Each query thinks about whether it cares about the change.
- Allison has two registered queries, the Week View query and the Day View query. It's possible that neither of these queries cares about the change, because they know the change came from Allison in the first place, and they know that Allison's client has already applied the change and updated the UI. Alternatively, maybe these two queries should go through all the formalities, and report the change back to Allison's client.
- Barry also has two registered queries, the Week View query and the Day View query. Each of these queries thinks about the change, recognizes that this is a change it cares about, and tells the server it cares.
- The server sends a notification to Barry's client. The notification may just include the change itself, or it might also include a list of the queries that say they care.
- Barry's client receives the change notification from the server.
- The data model code on Barry's machine finds the "Design meeting" event in the local cache and applies the change to the event. This triggers a cascade of steps that almost exactly mirrors what happened on Allison's client above, starting from the point where the event passed its change object to the data model. The only exception is the final step: on Barry's client, Week View did not make the change itself, so it just goes ahead and re-displays the event.
- Barry closes Week View
- Week View sends a message to its Week View query object, unregistering itself as an observer.
- The query object notices that it no longer has any observers, and so it decides to retire. It sends a message to the data model code, asking to unregister itself as a query.
- The data model code sends a message to the server, via RAP, to unregister the query.
- The server finds its corresponding server representation of the query, and it removes the query from the list of active queries, and deletes the query.
- Back on Barry's client, the data model looks at its cache, and checks to see if there are any cached objects that exist in the cache solely because they were in the result set of the now-retired query. If there are any such objects, the data model drops them from the cache uid look-up tables, thus allowing Python to garbage collect them.
- The data model on Barry's client then unregisters the query, allowing the query to be garbage collected.
- Barry closes Day View
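The server's side of this proposal (registering active queries, deciding which clients care about a change, and unregistering queries when views close) could be sketched as below. Queries are reduced to predicates over a plain event record, `before` and `after` are the record on either side of the applied change, and all names are hypothetical. `skip_originator=True` models the first option discussed for Allison's own queries; `False` models "going through all the formalities".

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[dict], bool]


class ServerQueryRegistry:
    """Server-side record of each client's active queries (hypothetical sketch)."""

    def __init__(self, skip_originator: bool = True) -> None:
        # (client_id, query_id) -> predicate over an event record
        self.active: Dict[Tuple[str, str], Predicate] = {}
        self.skip_originator = skip_originator

    def register(self, client_id: str, query_id: str, predicate: Predicate) -> None:
        self.active[(client_id, query_id)] = predicate

    def unregister(self, client_id: str, query_id: str) -> None:
        # Called when a view closes and its query retires.
        self.active.pop((client_id, query_id), None)

    def route_change(self, origin: str, before: dict, after: dict) -> Dict[str, List[str]]:
        """Return {client_id: [query_ids that care]} for one applied change."""
        interested: Dict[str, List[str]] = defaultdict(list)
        for (client_id, query_id), predicate in self.active.items():
            if self.skip_originator and client_id == origin:
                continue  # the originating client already applied the change
            # A query cares if the record was, or now is, in its result set.
            if predicate(before) or predicate(after):
                interested[client_id].append(query_id)
        return dict(interested)


def in_week(rec: dict) -> bool:
    return rec["day"] in {"Mon", "Tue", "Wed", "Thu", "Fri"}


def on_thursday(rec: dict) -> bool:
    return rec["day"] == "Thu"


registry = ServerQueryRegistry()
for client in ("allison", "barry"):
    registry.register(client, "week", in_week)
    registry.register(client, "day", on_thursday)

before = {"title": "Design meeting", "day": "Wed"}
after = dict(before, day="Thu")
first = registry.route_change("allison", before, after)
print(first)   # {'barry': ['week', 'day']}

registry.unregister("barry", "week")   # Barry closes Week View
second = registry.route_change("allison", before, after)
print(second)  # {'barry': ['day']}
```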
Scenario 202 -- peer-to-peer, with views that display result sets
- Scenario 202 is like Scenario 102, but with peer-to-peer sharing instead of a client-server model. So, to summarize:
- multi-user: Many people are using the many clients.
- peer-to-peer: Any of the clients may subscribe to info provided by any of the other clients.
- multiple views: Each user can see multiple views at once, side-by-side.
- views show result sets: Each view can show the set of data objects that resulted from some query.
One way to think about the peer-to-peer scenario is just as an extension of the client-server scenario above. In the client-server scenario there was just one server. In the peer-to-peer case, each peer can act as a kind of server for any of the other peers. So from each client's perspective, it's now gathering data from many servers rather than one server.
But the peer-to-peer situation is a little more complicated. The client-server scenario assumed that there would always be a connection between the client and the server. The peer-to-peer scenario recognizes that individual peers may often not be running, and that there may be long periods when two peers fail to connect. Peers can keep local copies of result sets, so that the results are still available even if the connection goes down.
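One way a peer could keep and reuse local copies of result sets is sketched below. It assumes a hypothetical transport that raises `PeerUnavailable` when the other peer is not running, and it uses a simple age limit as an assumed freshness rule, not one taken from the design described on this page.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class PeerUnavailable(Exception):
    """Raised by the (hypothetical) transport when a peer cannot be reached."""


class PeerResultCache:
    def __init__(self, fetch: Callable[[str], List[dict]],
                 max_age_seconds: float = 300.0) -> None:
        self.fetch = fetch                    # e.g. a RAP call to another peer
        self.max_age = max_age_seconds
        self.copies: Dict[str, Tuple[float, List[dict]]] = {}

    def results(self, query_key: str) -> Tuple[List[dict], bool]:
        """Return (result set, is_stale)."""
        cached: Optional[Tuple[float, List[dict]]] = self.copies.get(query_key)
        if cached and time.monotonic() - cached[0] < self.max_age:
            return cached[1], False           # fresh local copy, no network needed
        try:
            fresh = self.fetch(query_key)
        except PeerUnavailable:
            if cached:
                return cached[1], True        # peer offline: fall back to the stale copy
            raise
        self.copies[query_key] = (time.monotonic(), fresh)
        return fresh, False


online = True


def fetch_from_tanya(key: str) -> List[dict]:
    if not online:
        raise PeerUnavailable(key)
    return [{"title": "Design meeting"}]


cache = PeerResultCache(fetch_from_tanya, max_age_seconds=0.0)  # always try the peer
first = cache.results("tanya/calendar")
online = False                                # Tanya's machine goes off-line
second = cache.results("tanya/calendar")
print(first)   # ([{'title': 'Design meeting'}], False)
print(second)  # ([{'title': 'Design meeting'}], True)
```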
I'm not sure I have a good understanding of OSAF's peer-to-peer design. For some exploration of that, check out the next section, Box Diagrams. But for now, I'm assuming the design will look something like this:
+--------------------+         +------------------------+
| Tanya's client     |---+-RAP-| Tanya's repository     |
+--------------------+  /      +------------------------+
                       /
+--------------------+/        +------------------------+
| Sebastian's client |-----RAP-| Sebastian's repository |
+--------------------+         +------------------------+
- Question: Let's say we have two users, Sebastian and Tanya. They are both looking at a single calendar, let's say Tanya's calendar, which Sebastian subscribes to. Sebastian has two views up, a Week View and a Day View showing Thursday. And Tanya also has two views up, a Week View and a Day View showing Thursday. Sebastian, in Week View, drags the "Design meeting" event from the Wednesday column and drops it on the Thursday column. How does Sebastian's Day View find out that it needs to display the "Design meeting" event? How do Tanya's two views each find out that they need to update their displays?
- Answer: There are a lot of possible options for how this could work. Here's the sequence of events that I'm proposing. This is similar to the answer I proposed for Scenario 102, with just a few differences.
- observable queries:
- Tanya brings up Week View
- Tanya's Week View query is handled exactly the same way that Allison's Week View query was handled in Scenario 102.
- Tanya brings up Day View
- Tanya's Day View query is handled exactly the same way that Allison's Day View query was handled in Scenario 102.
- Sebastian brings up Week View
- When Sebastian first brings up Week View, it creates a Python query object. Let's call that Sebastian's Week View Query. The Week View Query object is defined as asking for all the events that fall in the given week. Week View registers itself as an observer of the Week View Query object, and then submits the Week View Query object to the data model, perhaps through some "Data Model API".
- The model checks to see if the Week View Query can be resolved locally, using just the data objects already cached in the client. Let's say that it can't, maybe because Sebastian only just launched Chandler, and almost nothing has been cached.
- The model forwards the query on to Sebastian's repository, via RAP.
- Sebastian's repository checks to see if it can assemble a result set on its own, using its own copies of the calendar events that were in the calendar in Tanya's repository. Let's say Sebastian's repository is able to assemble the result set, because the copies seem to be fresh. (A second scenario would be that the copies are stale, but Tanya's repository is off-line, so we use the stale copies anyway. Or, another scenario would be that the copies are stale, and Sebastian's client ends up polling Tanya's repository and updating Sebastian's repository with fresh info.)
- Sebastian's repository registers the query as an active query. The repository keeps a copy of the query, and makes a mental note to itself to send Sebastian notifications if the data store is changed in any way that affects the result set of the query.
- Sebastian's repository returns the initial result set to Sebastian's client.
- On Sebastian's client, the data model code caches the initial result set. Then the data model code hands the result set to the original Week View Query object.
- The Week View Query object notifies all of its observers, which in this case just means the Week View that originally created the query.
- Week View displays the result set.
- Sebastian brings up Day View
- Simplest option: In the simplest case, this Day View query is handled exactly the same way that the Week View query was, creating a round trip to Sebastian's repository.
- Smart-client option: Alternatively, if the client had more smarts, it could recognize that it didn't need the repository to resolve the query. In this case, the model code on the client would notice that it already had everything it needed in the local cache, and it would construct the result set for the query. However, I think it might be tricky to know when you can resolve a query locally and when you can't. For more on this, see the section below about Query Subsets.
- Smart-repository option: Here's another alternative. Let's say that the client isn't smart enough to avoid bothering the repository, so the client sends the query to the repository. The repository could simply look in the data store to resolve the query, just as it did for Sebastian's Week View query. Or, if the repository is smarter, it could first check to see if the query could be satisfied using only the information already in the repository cache. The code for this might end up looking just like the code that it would take to implement the "smart-client option" described above.
- Sebastian edits an event
- Sebastian, in Week View, drags the "Design meeting" event from the Wednesday column and drops it on the Thursday column.
- The Week View code immediately draws the event in the Thursday column, providing a responsive UI. The Week View code then tells the "Design meeting" event to change its "start time". The event creates a change object representing the change, and applies the change to itself.
- The "Design meeting" event tells the client data model that it has been changed, and passes it the change object.
- The data model code determines which queries are interested in the change. The data model notifies both the Day View query and the Week View query about the change.
- The Day View query looks at the change, and looks at the changed event, and determines that the event needs to be added to the query result set. The query adds the event to the result set, and notifies its observer, Day View.
- Day View gets the notification and displays the "Design meeting" event.
- The Week View query looks at the change, and looks at the changed event, and realizes that this is a change to one of the events in its result set. The event still belongs in the result set, but the event has changed, so the Week View query notifies its observer, Week View.
- Week View gets the notification and recognizes the change object as the same one that resulted from its own edit. Week View realizes that it doesn't need to do anything now, because it already redrew the event when the user dropped it on the Thursday column.
- Meanwhile, back in the client data model code... After handling all the local queries, the data model code now tells Sebastian's repository about the change, via RAP.
- The new change is added to a change queue in Sebastian's repository.
- Sebastian's repository applies the change to the appropriate event record in the data store -- the local copy of the original event from Tanya's repository. After applying the change locally, the change is marked as having been applied locally, but it is left in the queue.
- Sebastian's repository checks to see if it has a list of active queries. Sebastian's repository notifies each of the queries about the change it just applied. Each query thinks about whether it cares about the change.
- Sebastian has two registered queries, the Week View query and the Day View query. It's possible that neither of these queries cares about the change, because they know the change came from Sebastian in the first place, and they know that Sebastian's client has already applied the change and updated the UI. Alternatively, maybe these two queries should go through all the formalities, and report the change back to Sebastian's client.
- Meanwhile, back in the client data model code... Having told Sebastian's repository about the change, the data model code now needs to tell Tanya's repository. Sebastian's client now tells Tanya's repository about the change, via RAP, and then tells Sebastian's repository to delete the change from the queue of pending changes.
- Tanya's repository applies the change to the appropriate event record in the data store.
- Tanya's repository checks to see if it has a list of active queries. Tanya's repository notifies each of the queries about the change it just applied. Each query thinks about whether it cares about the change.
- Tanya also has two registered queries, the Week View query and the Day View query. Each of these queries thinks about the change, recognizes that this is a change it cares about, and tells Tanya's repository it cares.
- Tanya's repository sends a notification to Tanya's client. The notification may just include the change itself, or it might also include a list of the queries that say they care.
- Tanya's client receives the change notification from Tanya's repository.
- The data model code on Tanya's machine finds the "Design meeting" event in the local cache and applies the change to the event. This triggers a cascade of steps that almost exactly mirrors what happened on Sebastian's client above, starting from the point where the event passed its change object to the data model. The only exception is the final step: on Tanya's client, Week View did not make the change itself, so it just goes ahead and re-displays the event.
- Meanwhile, back on Tanya's repository... Having told Tanya's client about the change, the repository now needs to tell other interested parties. Tanya's repository checks for subscription queries, and finds that Sebastian subscribes to the calendar. Tanya's repository notifies the subscription query about the change it just applied. The calendar subscription query sees that this is a change to one of the events in the subscribed calendar, so at first glance the subscription query thinks that it cares about the change. However, on looking more closely, the subscription query sees that the change originated from Sebastian's client, and the subscription is registered with Sebastian's client, so there's no need to send an update notification to Sebastian's client.
- Sebastian closes Week View
- Week View sends a message to its Week View query object, unregistering itself as an observer.
- The query object notices that it no longer has any observers, and so it decides to retire. It sends a message to the data model code, asking to unregister itself as a query.
- The data model code sends a message to Sebastian's repository, via RAP, to unregister the query.
- Sebastian's repository finds its corresponding representation of the query, and it removes the query from the list of active queries, and deletes the query.
- Back on Sebastian's client, the data model looks at its cache, and checks to see if there are any cached objects that exist in the cache solely because they were in the result set of the now-retired query. If there are any such objects, the data model drops them from the cache uid look-up tables, thus allowing Python to garbage collect them.
- The data model on Sebastian's client then unregisters the query, allowing the query to be garbage collected.
- Sebastian closes Day View
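The change queue in Sebastian's repository might be sketched like this. The names, including `PeerUnavailable`, are hypothetical; the sketch only shows the ordering the scenario describes: apply the change locally, keep it queued, and remove it only after Tanya's repository has it. Who calls `flush` (Sebastian's client, as described above, or the repository itself) is left open.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict


@dataclass
class PendingChange:
    uid: str
    attribute: str
    new_value: Any
    applied_locally: bool = False


class PeerUnavailable(Exception):
    """Raised by the (hypothetical) transport when the owning peer is off-line."""


class LocalRepository:
    """Sebastian's repository: local copies of Tanya's events plus a change queue."""

    def __init__(self, copies: Dict[str, Dict[str, Any]]) -> None:
        self.copies = copies
        self.pending: Deque[PendingChange] = deque()

    def accept(self, change: PendingChange) -> None:
        self.pending.append(change)
        self.copies[change.uid][change.attribute] = change.new_value
        change.applied_locally = True         # applied, but it stays in the queue

    def flush(self, send_to_owner: Callable[[PendingChange], None]) -> int:
        """Forward queued changes to the owning repository, oldest first."""
        sent = 0
        while self.pending:
            try:
                send_to_owner(self.pending[0])
            except PeerUnavailable:
                break                         # owner off-line: keep the rest queued
            self.pending.popleft()            # delivered, so drop it from the queue
            sent += 1
        return sent


tanya_store = {"evt-1": {"title": "Design meeting", "day": "Wed"}}
tanya_online = False


def send_to_tanya(change: PendingChange) -> None:
    if not tanya_online:
        raise PeerUnavailable(change.uid)
    tanya_store[change.uid][change.attribute] = change.new_value


sebastian_repo = LocalRepository({"evt-1": dict(tanya_store["evt-1"])})
sebastian_repo.accept(PendingChange("evt-1", "day", "Thu"))
offline_sent = sebastian_repo.flush(send_to_tanya)
queued_while_offline = len(sebastian_repo.pending)
tanya_online = True
online_sent = sebastian_repo.flush(send_to_tanya)
print(offline_sent, queued_while_offline, online_sent)  # 0 1 1
```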
Box Diagrams
I'm not sure I have a good understanding of OSAF's peer-to-peer design.
Stand-alone app
In the stand-alone case, there's just one computer, with a single client accessing its own personal repository.
+--------+     +------------+
| client |-RAP-| repository |
+--------+     +------------+
There might also be a middleware layer. It's not clear yet where the middleware layer would end up in the architecture -- maybe on the client side of RAP, or maybe on the repository side.
+--------+------------+     +------------+
| client | middleware |-RAP-| repository |
+--------+------------+     +------------+
+--------+     +------------+------------+
| client |-RAP-| middleware | repository |
+--------+     +------------+------------+
Client-server #
In the client-server case, there are 2 or more different clients, on different machines, accessing a single server repository.
+----------+
| client 1 |+
+----------+ \      +------------+
              >-RAP-| repository |
+----------+ /      +------------+
| client 2 |/
+----------+
Again, the middleware layer might end up on either side of RAP.
+----------+------------+
| client 1 | middleware |+
+----------+------------+ \      +------------+
                           >-RAP-| repository |
+----------+------------+ /      +------------+
| client 2 | middleware |/
+----------+------------+
+----------+
| client 1 |+
+----------+ \      +------------+------------+
              >-RAP-| middleware | repository |
+----------+ /      +------------+------------+
| client 2 |/
+----------+
Peer-to-peer #
In the peer-to-peer case, it seems like there are a few different ways that things could be set up. I'm not sure what OSAF's current thinking is about how peer-to-peer will be done.
Here's one option. Each client has its own repository. When user 2 browses user 1's data, client 2 talks directly to repository 1.
+----------+         +--------------+
| client 1 |---+-RAP-| repository 1 |
+----------+  /      +--------------+
             /
+----------+/        +--------------+
| client 2 |-----RAP-| repository 2 |
+----------+         +--------------+
Here's a second option. In this case, each client only ever talks to its own repository. When user 2 browses user 1's data, that's accomplished by repository 2 making calls to repository 1.
+----------+                               +--------------+
| client 1 |-------------------------+-RAP-| repository 1 |
+----------+                        /      +--------------+
                                   /
+----------+     +--------------+ /
| client 2 |-RAP-| repository 2 |/
+----------+     +--------------+
Here's a third option. In this case, each client only ever talks to its own repository. When user 2 browses user 1's data, that's accomplished by client 2 talking directly to client 1.
+----------+ +--------------+
| client 1 |-RAP-| repository 1 |
+-----+----+ +--------------+
/
API?
/
+----------+/ +--------------+
| client 2 |-RAP-| repository 2 |
+----------+ +--------------+
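Here's a fourth option. In this case, when user 2 browses user 1's data, that's accomplished by repository 2 talking directly to client 1. It's hard for me to imagine this one being more attractive than the first three, but I thought I'd include it just to complete the logical quartet of options.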
+----------+ +--------------+
| client 1 |-RAP-| repository 1 |
+-----+----+ +--------------+
/
API?
/
+----------+ +----+---------+
| client 2 |-RAP-| repository 2 |
+----------+ +--------------+
Query Subsets #
Sometimes a view will make a new query, where that new query turns out to be a logical subset of some previous query. In theory, we could take advantage of this to reduce network traffic, although in practice it's probably not worth it.
Example
For example, let's say Olga switches from week view to day view. Day view will create a new query. The new query is asking for all the events that ((a: are in that calendar) AND (b: fall on that day) AND (c: are visible to Olga) AND (d: meet any other filtering constraints that Olga set up)). Because Olga was just looking at week view, there's already a week view query in effect. The new day view query has exactly the same query criteria as the week view query, except for part 'b:', where day view wants just the events for Thursday, rather than the events for the whole week.
At the time the query is made, we don't know yet exactly what the result set of the query will be, but we do know that the new result set will be a subset of the existing week view result set. Which means that in theory we could resolve the new query locally, without asking the repository for help. In practice it may not be worth it to try to resolve queries locally, or it may only be worth it in some special cases.
Perils
In the general case, every time a new query was created, the client model code would have to compare the new query to all the existing queries, checking to see if the new query was a subset of an existing query. That sounds like a lot of work, but the code for that might actually be pretty simple, and it might be easy to get it to run quite quickly.
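The subset check from the Olga example might look something like this, assuming a query is a conjunction of the four criteria (a) through (d) and that criterion (b) is a simple inclusive date range. `CalendarQuery`, `is_subset_of`, and `resolve_locally` are hypothetical names. The check is deliberately conservative: (a), (c), and (d) must match exactly, so a query from a different viewer is never treated as a subset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalendarQuery:
    """Hypothetical query: a conjunction of criteria (a) through (d)."""
    calendar: str                     # (a) which calendar
    start: date                       # (b) first day, inclusive
    end: date                         # (b) last day, inclusive
    viewer: str                       # (c) whose permissions apply
    filters: frozenset = frozenset()  # (d) any other constraints

    def is_subset_of(self, other):
        """True if every result of self must also be a result of other."""
        return (self.calendar == other.calendar
                and self.viewer == other.viewer
                and self.filters == other.filters
                and other.start <= self.start
                and self.end <= other.end)


def resolve_locally(new_query, active_queries, cached_results):
    """Answer new_query from a cached superset if possible, else None."""
    for existing in active_queries:
        if new_query.is_subset_of(existing):
            return [event for event in cached_results[existing]
                    if new_query.start <= event["day"] <= new_query.end]
    return None  # no superset found: ask the repository instead
```

Requiring (c) to match is what keeps this from answering one user's query with another user's results, which is the same permissions concern that comes up for server-side subsets.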
But there are other issues too. What happens if Olga closes week view before she closes day view? The week view query wants to unregister itself, but now the day view query is dependent on it. One option would be to (a) cut the dependency, and then (b) have the day view query register itself with the repository, and then (c) let the week view query unregister itself. Another option would be to just leave the week view query in place, and just have it serve as a feeder to the day view query. That's simple enough, but the downside there is that now the repository is sending update notifications about all the events for the whole week, when the client really only cares about a seventh of them.
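The second option, leaving the week view query in place as a feeder, can be sketched as one observable query observing another. The class names are hypothetical, and `retired` is a stand-in for unregistering with the repository. Because the day view query is itself an observer of the week view query, closing Week View first does not retire the week view query; it retires only when the day view query lets go of it.

```python
class ObservableQuery:
    """Hypothetical minimal observable query."""

    def __init__(self):
        self.observers = []
        self.retired = False

    def add_observer(self, observer):
        self.observers.append(observer)

    def remove_observer(self, observer):
        self.observers.remove(observer)
        if not self.observers:
            self.retired = True  # stand-in for unregistering with the repository

    def notify(self, change):
        for observer in list(self.observers):
            observer.update(self, change)


class SubsetQuery(ObservableQuery):
    """Fed by a parent query; passes along only the changes it cares about."""

    def __init__(self, parent, predicate):
        super().__init__()
        self.parent = parent
        self.predicate = predicate
        parent.add_observer(self)  # the parent now feeds this query

    def update(self, source, change):
        if self.predicate(change):
            self.notify(change)

    def remove_observer(self, observer):
        super().remove_observer(observer)
        if self.retired:
            self.parent.remove_observer(self)  # release the feeder
```

The cost described above still applies: while Day View is open, the week view query keeps receiving notifications for the whole week and discards most of them.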
Server-side
The discussion above talks about using query subsets on the client-side. You could also use query subsets on the server-side. When a server receives a new query, the server could check to see if the new query was a subset of some existing query that the server was still servicing. One problem, though, is that the different queries that a server gets are likely to come from different users, who may have different access permissions, which may make it impossible to consider one user's query to ever be a subset of a different user's query.
Persistent Queries #
If observable queries are used just on the client-side, then they don't need to be persistent. If observable queries are used for handling subscriptions between repositories, then the observable queries need to be persistent, to keep track of active queries.
Active queries vs. saved queries
There are two separate situations where Chandler might want to save a query persistently in a repository as some kind of Query Item. I want to distinguish between these two separate cases.
- saved queries:
- A user may save a query string to use again later.
- Example: A user might make some custom query like "select XX from Contacts where name is Smith". And then the user might want to bookmark this query -- saving the query to run again later. In this scenario the user is saving the query string, not saving the results of the query. That scenario is not a subject that this page is addressing.
- active queries:
- A user may subscribe to something that another user publishes.
- Example: Pablo launches Chandler and subscribes to a calendar of events that Rebekah publishes. Pablo's client creates a query object that represents the info that Pablo is subscribing to. The query object is saved in Pablo's repository. Pablo quits Chandler. When Pablo launches Chandler again next week, Chandler will still remember that Pablo is subscribed to Rebekah's calendar.
Active queries
Pablo's subscription query should automatically be saved as soon as Pablo subscribes. Depending on how things are set up, the subscription query may need to be saved in both Pablo's repository and Rebekah's repository. There should only be a single query, with a single uid, and the canonical instance of the query should live in Pablo's repository, but there may be a copy of it in Rebekah's repository.
When Pablo re-launches Chandler after a long weekend, Pablo's copy of Chandler looks at the list of the active queries, finds the subscription to Rebekah's calendar, and tries to establish a new connection to Rebekah's repository. Likewise, if Rebekah re-launches her copy of Chandler after a long weekend, her copy of Chandler will see that Pablo has a subscription, and Rebekah's copy of Chandler will try to establish a new connection to Pablo's repository.
Persistent information
Here's some of the information that might be kept in Pablo's repository and Rebekah's repository:
- Pablo's repository:
- the original query -- a description of the query criteria
- the result set -- pointers to the items in the result set
- status
- has an initial result set been returned?
- has only a partial result set been returned?
- when was the most recent refresh of the result set?
- should we poll Rebekah's repository, or wait to be notified of updates?
- Rebekah's repository:
- a copy of the original query, including a pointer to the original in Pablo's repository
- the result set -- pointers to the items in the result set
- status
- has an initial result set been returned?
- has only a partial result set been returned?
- when was the most recent refresh of the result set?
- should we send notifications to Pablo's repository, or wait to be polled?
- queued notifications
- new result set changes that are queued to be sent to Pablo's repository as soon as there's a connection between the repositories
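One way to hold the persistent information listed above is a single record type used on both sides, serialized to JSON for storage. This is a sketch under assumptions: `ActiveQueryRecord`, its field names, and the JSON storage format are hypothetical, and `send` stands in for whatever transport actually connects the two repositories.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ActiveQueryRecord:
    """Hypothetical persistent record of an active (subscription) query."""
    query_uid: str                  # one uid, shared by both copies
    criteria: str                   # description of the query criteria
    result_uids: List[str] = field(default_factory=list)
    initial_results_sent: bool = False
    partial_results_only: bool = False
    last_refresh: Optional[str] = None       # ISO 8601 timestamp
    use_push: bool = True           # push notifications rather than polling
    original_location: Optional[str] = None  # set only on Rebekah's copy
    queued_changes: List[dict] = field(default_factory=list)

    def queue_change(self, change):
        """Hold a result-set change until the repositories are connected."""
        self.queued_changes.append(change)

    def flush(self, send):
        """Send queued changes in order; stop at the first failed send."""
        while self.queued_changes:
            if not send(self.queued_changes[0]):
                break
            self.queued_changes.pop(0)

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))
```

Keeping the queued changes inside the same record means they survive a restart along with the rest of the subscription state.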
User Notifications #
For the most part, this whole idea about observable queries is all just an "under-the-covers" implementation mechanism. Normally the user wouldn't ever know about or care about the fact that there are observable queries kicking around in the code.
But even up at the user level there are queries. Some users will explicitly make queries, maybe even by typing in query strings in some query language. And a user will see views that show result sets, and the idea of a "result set" may be how users think about those views.
Philip's use case
Here's a relevant excerpt from a post by Philip Trauring to the design mailing list:
In Mail.app, like Eudora, it shows the mailboxes with unread e-mails in bold - but without a way to show me which mailboxes just received e-mail I don't know right away which mailboxes have new e-mail - mainly because in many of the mailboxes I never read all the e-mails. This forces me to mark all the e-mail in my mailboxes as read if I want to know if a new e-mail is received. This is not ideal because some times I want to go back and read through older messages and I'd like to know if I've read them already or not.
It would be nice to indicate when receiving mail which mailboxes have received new e-mails. What would also be nice is if there was a way to indicate if I've read the last message in the mailbox. It could be as simple as changing the color of the mailbox name.
And
Ducky responds:
We're pretty sure that we won't have folders in the classical sense, so I'm not sure exactly how we'd do this. We will probably have things that look like folders, but will really be stored queries. For example, the thing that looks like a folder named "Family" will really be a stored query for:
- all received messages with CATEGORY="Family"
I'm scratching my head a little bit as to how I'd implement a passive notification of "new message returned by this query since last time this stored query was run", and think it might need to be an agent's responsibility. I'll keep thinking about it.
Using observable queries for Philip's use case
For the use case described in those posts, observable queries could be one way to solve the problem. Philip could set up his folders (aka mailboxes) however he wanted, using queries to sift his mail into different folders. (The queries themselves might be something he's quite conscious of, or the queries might get defined down in the works somewhere, so perhaps Philip only has to work with some "filtering agent", and the agent creates the queries for each folder.) In any case, a query gets created for each folder.
Philip wants to know two things about each folder: whether it has unread messages in it, and whether it has new messages in it. With observable queries, the query automatically gets notified whenever its result set changes, so the query itself knows about new messages. Philip could use that as a notification trigger, and ask Chandler to send him a notification whenever there's new mail in the result set. But that might lead to a lot of notifications.
Alternatively, the UI of the folder list could be set up to display some glyph (or highlighting, or whatever) whenever the folder has new contents. Or, more specifically, the folder displays the glyph if-and-only-if (the time of the last update notification to the observable query) is more recent than (the time the user last "looked at" the folder). Maybe "looked at" just means that the user opened the folder so that the contents got displayed in some list view, or maybe "looked at" means that the user explicitly marked the folder somehow to note that he's looked at it.
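The glyph rule above reduces to a comparison of two timestamps. In this hypothetical sketch, `FolderQueryState` would be updated by the folder's observable query whenever its result set changes, and by the UI whenever the user looks at the folder; times are passed in explicitly so the rule is easy to test.

```python
from datetime import datetime


class FolderQueryState:
    """Hypothetical per-folder state kept alongside an observable query."""

    def __init__(self):
        self.last_update = None     # time of the most recent result-set change
        self.last_looked_at = None  # time the user last "looked at" the folder

    def on_result_set_changed(self, when: datetime):
        self.last_update = when

    def mark_looked_at(self, when: datetime):
        self.last_looked_at = when

    def show_new_glyph(self) -> bool:
        # Show the glyph iff the last update is newer than the last look.
        if self.last_update is None:
            return False
        if self.last_looked_at is None:
            return True
        return self.last_update > self.last_looked_at
```

Unlike an unread flag, this state is independent of which messages have been read, which is the distinction Philip was asking for.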
Other use cases
The use case above, about Philip's e-mail, is just one scenario. You can imagine other folders where the user would want to be notified about any change to the result set of the observable query for the folder. Here are a few examples:
- Karen wants to be notified whenever new events are scheduled in her calendar.
- Jerry wants to be notified whenever new events are scheduled in Karen's calendar, or whenever those events are deleted or edited. Really, Jerry wants to be notified whenever there's any change to the result set of the query for this calendar.
- Michelle opens a find panel, and does a search for the terms "NASA" and "O-ring". First she searches her own e-mail archives, but that doesn't turn up anything. So then she searches her entire repository, but that still doesn't turn up much. Then she searches all of the published repositories of all of the people in her department. She saves the search as a new folder, and Chandler diligently keeps searching, day and night. Whenever any new result gets added to the folder, Michelle wants to be notified immediately.
Motivations #
Here's a quick outline of my original motivations for suggesting that Chandler might want to have observable queries:
Network Efficiency
Strategies for improving network efficiency:
- minimize client polling
- Make sure that the repository knows enough about what the "client" machine wants, so that the repository can queue up notifications for the client, without the client needing to keep checking for changes.
- send changes
- Send changes, not data objects. The changes should normally be smaller than the data objects.
- don't re-send queries
- Avoid sending two or more identical queries. Make sure that the client always knows what it's already asked for, so that it doesn't inadvertently ask for it again. If two views display the same data, the second view shouldn't need to send a query to the repository.
- don't re-send result sets
- Avoid sending two or more identical result sets. Make sure that a repository always knows what it's already sent, so that it doesn't resend information in response to a new query.
- avoid some queries entirely
- Try to have clients recognize situations when they can resolve a new query from their local cache, without even having to ask the repository. For an example, see Query Subsets.
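Two of the strategies above, sending changes rather than data objects and not re-sending queries, can be sketched briefly. The function and class names are hypothetical, `send_to_repository` stands in for the actual request to the repository, and the criteria values are assumed to be hashable so the canonical key can be used in a dict.

```python
def result_set_delta(old_uids, new_uids):
    """Describe a result-set change as added and removed uids, rather than
    re-sending every item in the new result set."""
    old, new = set(old_uids), set(new_uids)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


class QueryRegistry:
    """Client-side record of queries already sent to the repository."""

    def __init__(self, send_to_repository):
        self._send = send_to_repository
        self._active = {}  # canonical criteria -> handle for the query

    def get(self, criteria):
        # Sort the criteria so {"a": 1, "b": 2} and {"b": 2, "a": 1}
        # produce the same key; values must be hashable.
        key = tuple(sorted(criteria.items()))
        if key not in self._active:
            self._active[key] = self._send(criteria)
        return self._active[key]
```

For example, `result_set_delta(["e1", "e2"], ["e2", "e3"])` describes the change as one added and one removed uid, and two views asking for the same calendar week through one `QueryRegistry` produce only a single request.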
User Experience
- consistent behavior
- Help to ensure that all parcels behave consistently with regard to queries and updates, by building the parcels on a common framework.
- good behavior
- Help to ensure that all parcels handle queries and updates well, by providing a framework that does a good job.
Bugs
- factor code
- Minimize the number of bugs, by keeping query and update bugs from appearing in parcel code. Move that functionality (and those bugs) down into framework code. Debug the query stuff once, in the framework, rather than once in each parcel.
Developer Workload
- factor code
- Minimize developer workload by coding the query stuff once, in the framework, rather than once in each parcel that's ever developed.
References #
Excerpt from DB Topics: Cache Coherency:
- Cache coherency is the problem of keeping one or more caches in sync with a remote store or database, when the remote content changes through the agency of other clients. Theoretically, every client of a server database has (at least) a local cache of content corresponding to recent queries that are still in memory, and this is important when more than one client can write changes. Cache coherency matters more when the cache content lives longer, such as when using an explicit object cache (OsafDbObjectCache). A client typically caches objects by an ID, and this ID is the same one used by the server. When a server changes any object, it can broadcast its ID to clients, so clients can discard local copies of that object in their caches when it has not yet been modified. Locally modified versions of objects should resolve conflicts with the new server version, either on the client or server side, when the object is next written. If every object has an integer version number corresponding to the number of times it has been written, and clients supply the old version number when writing an object, this would simplify merging of independent writes on the server side. A server might broadcast every time an object is changed, or it might batch together multiple changes in order to reduce the amount of network traffic. But batching has the effect of increasing the time between an object change on the server and a correct view of it in each client. However, the latency between server changes and accurate views in clients is something we accept anyway when we avoid locking (OsafDbLocking) in order to reserve objects for one client to change (locking is a nest of worms). When caches contain read-only objects only, latency in coherency only causes objects that are locally stale for some period of time, and one can choose an upper bound for how long staleness is tolerated.
When caches contain writable objects, latency in coherency makes it easier for multiple writers to be in conflict -- but as long as we already plan to resolve write conflicts (because we even plan to support synchronization generally: OsafDbSynchronization), we have a well-defined scheme for resolving the conflicts.
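The excerpt's scheme of broadcasting changed IDs and writing with version numbers can be sketched in a few classes. This is a hypothetical, single-process sketch: direct method calls stand in for network broadcasts, and conflict resolution is reduced to raising `VersionConflict`, leaving the actual merge to whatever synchronization scheme is chosen.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a client writes from a stale version number."""


@dataclass
class CacheEntry:
    version: int
    data: object
    modified: bool = False  # True once the client has edited its copy


class Server:
    def __init__(self):
        self.objects = {}  # id -> (version, data)
        self.clients = []

    def write(self, obj_id, expected_version, data):
        current_version, _ = self.objects.get(obj_id, (0, None))
        if expected_version != current_version:
            raise VersionConflict(obj_id)
        self.objects[obj_id] = (current_version + 1, data)
        for client in self.clients:
            client.invalidate(obj_id)  # broadcast only the ID
        return current_version + 1


class ClientCache:
    def __init__(self, server):
        self.server = server
        self.entries = {}  # id -> CacheEntry
        server.clients.append(self)

    def read(self, obj_id):
        if obj_id not in self.entries:
            version, data = self.server.objects[obj_id]
            self.entries[obj_id] = CacheEntry(version, data)
        return self.entries[obj_id].data

    def edit(self, obj_id, data):
        self.read(obj_id)
        entry = self.entries[obj_id]
        entry.data = data
        entry.modified = True

    def invalidate(self, obj_id):
        entry = self.entries.get(obj_id)
        if entry is not None and not entry.modified:
            del self.entries[obj_id]  # unmodified copy: just discard it

    def commit(self, obj_id):
        entry = self.entries[obj_id]
        new_version = self.server.write(obj_id, entry.version, entry.data)
        self.entries[obj_id] = CacheEntry(new_version, entry.data)
```

When Chris commits a new end time, Pat's unmodified copy is simply discarded and re-read on next access; if Pat had already edited the event, Pat's later commit carries a stale version number and is rejected rather than silently overwriting Chris's change.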
Excerpt from March 2003 DB Plan: Copying:
- Some Chandler features involved copying content from one repository to another, such as to implement replication and synchronization. While this kind of feature might be implemented inside the repository, it can also be implemented by a Chandler client. There are good reasons to do it this way. For example, it makes a repository simpler. ... Letting a Chandler client implement replication and synchronization might also work better in conjunction with security features hiding content a client is not allowed to see. Content should only be replicated and synchronized when permissions exist to make such copies.
Excerpt from March 2003 DB Plan: Summary:
- Replication and synchronization do not happen in the database per se. These features are application based, and are initiated by clients without central control. This means a database does not have semantics of distributed transparency, and does not make large guarantees about distributed consistency. Replication and synchronization are P2P algorithms.
Excerpt from March 2003 DB Plan: Network:
- The caching layer needs to keep track of outstanding IDs that have been requested from the server which have not yet arrived. That way multiple demands for the same object will not issue multiple redundant requests. Objects already requested in a batch load need not be requested again when an individual object is accessed.
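The outstanding-ID bookkeeping described in the excerpt might look like this. `RequestTracker` and `send_batch` are hypothetical names, and callbacks stand in for whatever mechanism actually delivers objects to the code that asked for them.

```python
class RequestTracker:
    """Tracks IDs requested from the server that have not yet arrived, so
    a second demand for the same object doesn't trigger a second request."""

    def __init__(self, send_batch):
        self._send_batch = send_batch  # sends a list of IDs to the server
        self.cache = {}                # id -> data that has arrived
        self.outstanding = set()       # ids requested but not yet arrived
        self._waiting = {}             # id -> callbacks awaiting that id

    def request(self, ids, callback):
        to_send = []
        for obj_id in ids:
            if obj_id in self.cache:
                callback(obj_id, self.cache[obj_id])
                continue
            self._waiting.setdefault(obj_id, []).append(callback)
            if obj_id not in self.outstanding:
                self.outstanding.add(obj_id)
                to_send.append(obj_id)
        if to_send:
            self._send_batch(to_send)  # one batch for all new IDs

    def on_arrival(self, obj_id, data):
        self.outstanding.discard(obj_id)
        self.cache[obj_id] = data
        for callback in self._waiting.pop(obj_id, []):
            callback(obj_id, data)
```

A batch load followed by an individual request for an object already in that batch sends nothing new; the individual request is answered when the batch arrives.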
Excerpt from DB Topics: Middle Tier:
- the Chandler repository design might split architecture into client and server parts for pure data storage features. A repository server aims to store data, and need not have any executable code like agents scripts. However, it makes sense to host agents scripts in a server context someplace, even if it is not the repository server. So we expect another kind of server in a middle tier architecture which runs code like scripts for agents, on behalf of clients while accessing content from the repository server.
Excerpt from RAP API: Questions:
- Question: Will RAP provide some kind of mechanism to let a RAP client "subscribe" to get notified about changes to an item? (BrianDouglasSkinner - 26 Feb 2003)
- Answer: Yes, definitely. A database trigger mechanism is being designed now, and RAP will reflect that functionality. RAP is going to use BEEP as its transport layer, which supports asynchronous messaging and so allows for notifications. There are some significant challenges, though:
- What happens to notifications when a client isn't connected and can't be reached?
- How do large servers deal with triggers? It may be unworkable to require a large server to support triggers. In this case what other mechanism can the client use? (LouMontulli - 27 Feb 2003)
- Question:
- For example, let's say a RAP client uses search or retrieve to get an event in a calendar, and then the event is displayed to some user (Pat) in the UI. If another user (Chris) on another machine changes the end time of the event, how does the UI code on Pat's client find out that the event needs to be redisplayed? Would it be good to have some kind of mechanism that allowed a RAP client to automatically subscribe for notifications about all of the items that the RAP server has returned to the client?
- Or, not even considering issues about multi-user concurrent access, let me just offer a single-user example. Say I'm using Chandler, and I have a couple views open: a calendar day view and a calendar week view. If I create a new event in the day view, how does the week view get notified that there now exists a new event it should be displaying?
- And, if we are talking about a general mechanism for managing notifications about changes to query results, is that related to the issue people are discussing in these posts on the design list: "Knowing when you've read the most recent e-mail in a mailbox" and "Recognizing a response, and group filters"? (BrianDouglasSkinner - 26 Feb 2003)
- Answer: This is a great example. There are multiple ways this could be implemented. I suspect that we will need to use a few of these methods in order to deal with firewall issues:
- If a RAP connection is active, database change notifications could be sent
- triggers could be used to send a notification via:
- RAP
- a jabber message
- email
- The client could periodically poll the server to look for changes. (LouMontulli - 27 Feb 2003)
Contributors
Discussion
David Jeske's comments on Wintermute data change notifications
Sept 2003 Dev list discussion
- Andrew Francis: original post
- BrianDouglasSkinner: reply
- Andrew Francis: follow-up post A
- Andrew Francis: follow-up post B
- BrianDouglasSkinner: reply A
- BrianDouglasSkinner: reply B
Tinderbox Agents
Your observable queries sound almost identical to the Agents in Tinderbox. Agents are a special kind of container that constantly looks through the document, finding notes that match criteria you specified. If it finds notes, it makes an alias to those notes inside itself. (Tinderbox, as an outliner, allows you to have aliases to other items in the outline as leaf items, so your outline isn't just a tree.)
There's a good walkthrough of setting up Agents to manage a todo list at http://radio.weblogs.com/0100524/stories/2002/06/08/TinderDo.html
For anyone wondering why it is so important to think about this kind of stuff now, Joel's article on Leaky Abstractions is worth reading:
http://www.joelonsoftware.com/articles/LeakyAbstractions.html
I've not had the time to read this page in the detail I'd like so apologies if these ideas are covered and I missed it.
In general, I think Brian's done an excellent job in providing some detailed scenarios.
The ObservableQueries issue has similarities to a problem I'm thinking about at present: graphing scientific data that may vanish but still be referred to by a configuration file.
My solution has some generalities that may be useful:
- if you can't supply some data, supply fake data that can be used in its stead, but try to have an architecture which can make it known to be fake (eg: display something with special icon and text style to indicate it is based on stale data)
- make it possible for the user to find out why something didn't happen the way they expected - if you display a marker to indicate data is stale, that could lead to an explanation of a server being offline
- compromise now but fix things later, if possible - rather than copying stale data and leaving it like that, a dynamic source like ObservableQueries can refresh the data when it becomes available later.
--
AndyDent - 09 Oct 2003
I've designed a solution for this before, based on WebDAV. The name was Active Search Folders (hey, everything was active in those days). The client machine would send a MKCOL request to the WebDAV server and create a new folder. The body of the MKCOL request was a SQL statement (or other search syntax) and a couple of command options. If the server allowed the request, the server would then keep the contents of the folder up to date so that PROPFIND queries to the folder would always see the current matches. Multiple users could browse the set of search folders and all use them.
In a peer-to-peer model, you can still have one of the peers (the one with the content) act as the content server and host the active search folder. Note also that the local user (the one sharing the content) could create the search folder through internal APIs, then send the URL for the search folder to the other user browsing the content.
It's a little more difficult for a remote viewer without write permission to create and use search folders (by "viewer" I mean the agent viewing the content, as opposed to the repository hosting the content). Possibly we could have search folders that were saved in the viewer's repository, but periodically polled the content-hosting repository for changes. Alternatively, we could allow liberal creation of search folders even when the requestor does not have write permission on anything else in the repository.
-
LisaDusseault - 26 Apr 2004