BIE
Persistence

MAIN GOALS

What are the goals of adding persistence to Galaxy?

APPROACH

The current plan is to build the functionality in an ad hoc manner into Galaxy, so that we can better understand the issues and difficulties. Perhaps then we can step back and consider how one would incorporate this functionality into a system in a more generic fashion.

SPECULATIVE CLI SESSION

What would this mean in terms of system interaction? Here are some *very* speculative CLI session extracts:

>set datatree = new DataTree(datastream, tessellation, distfactory)
>psave datatree
>quit

A few weeks later....

>prestore datatree
>set sim = new MultiLevelSimulation(datatree, ...)
> ....

Or, if we wanted to know how we created the data tree:

>pinfo datatree
Created: January 12th 1979
Created by: weinberg
Type: DataTree*
Galaxy Version: 0.9.45
UPO ID: 17938
Components used to build: datastream, tessellation, distfactory
Dependent on: tessellation
Objects dependent on datatree: simulation3, alistairssim...
>pinfo tessellation
....

Or, maybe we could export the object and send it to somebody else. This would have to include dependencies too, so that the person at the other end could successfully reconstruct the object.

>pexport datatree XMLFormat "datatree.xml"

There comes a time when certain objects are no longer required. However, we must make sure consistency is maintained:

>premove tessellation
This object cannot be deleted. The following objects depend on this object: datatree
>premove datatree
datatree was deleted.
>premove tessellation
tessellation was deleted.

USER PERSISTENT OBJECTS

The main new concept is the User Persistent Object, or UPO. I've found it hard to completely pin down a definition of a UPO, so bear with me in this description. UPOs are different from C++ objects, and from the traditional persistent objects of orthogonal persistence. UPOs are the abstract objects that a user would deal with when interacting with the system - in Galaxy these objects correspond almost exactly to the objects that can be created in the CLI. So UPOs are tessellations, data trees, simulations, streams (buffered), stream filters, and models. Some examples of objects in Galaxy that are not UPOs: Nodes, MethodTable, SymbolTable, clivector.

One of the important things to realize is that we are not keeping track of all C++ objects, and that only a few of the C++ classes would correspond to UPOs. The user is only interested in the high-level objects related to the computation or experiment being performed. Eliot used the term "objects with semantic integrity" when describing UPOs. I'm not comfortable with this term because I'm not completely sure of its meaning. Perhaps someone can help me here?

IMMUTABILITY

Maintaining dependencies is much more straightforward when UPOs are immutable. The main advantage is that an object doesn't become invalid because something it was built from has changed. Some objects can be modeled very naturally as immutable objects. In Galaxy, tessellations are a great example of this -- once built, their structure does not change. However, some objects are inherently mutable -- a good example would be an output stream, which changes every time a new record is appended. We can get round this problem of mutable objects if we shift our requirement of immutability from objects to object states.
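
As a rough illustration of moving immutability from objects to object states, here is a minimal sketch, written in Python for brevity rather than in Galaxy's C++. The names `StreamState` and `OutputStream` and their methods are hypothetical and are not part of Galaxy. The sketch assumes an append-only stream, records a state only when the user asks for a save, and stores each saved state as the records appended since the previous saved state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StreamState:
    """One immutable saved state of an append-only stream (hypothetical)."""
    parent: Optional["StreamState"]  # previous saved state, None for the first
    appended: Tuple[str, ...]        # records added since the parent state

    def records(self) -> Tuple[str, ...]:
        """Rebuild the full contents by replaying diffs from the first state."""
        diffs = []
        state: Optional[StreamState] = self
        while state is not None:
            diffs.append(state.appended)
            state = state.parent
        full: List[str] = []
        for diff in reversed(diffs):  # oldest diff first
            full.extend(diff)
        return tuple(full)


class OutputStream:
    """A mutable stream; only save() produces a persistent, immutable state."""

    def __init__(self) -> None:
        self._last_saved: Optional[StreamState] = None
        self._pending: List[str] = []

    def append(self, record: str) -> None:
        # Mutation happens freely between saves and is not recorded.
        self._pending.append(record)

    def save(self) -> StreamState:
        # Store only the difference from the previously saved state.
        state = StreamState(self._last_saved, tuple(self._pending))
        self._last_saved = state
        self._pending = []
        return state


stream = OutputStream()
stream.append("r1")
stream.append("r2")
first = stream.save()   # state holding r1, r2
stream.append("r3")
second = stream.save()  # state holding only the diff: r3
```

In this sketch an object that depends on `first` stays valid after later appends, because `first` itself never changes. The cost is the one noted in this section: reading `second` back means walking the chain of differences.
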
We can model mutable objects as a series of immutable states, and we can model truly immutable objects as an object with a single state. If we were to record the state every time a mutable object changed, then this approach would not be feasible. Fortunately, we are only interested in having a record at certain points in history - specifically at the point when the user requests that the state is saved.

We can reduce the amount of data used when storing mutable object states by storing them as differences from another state. It might take a little longer to reconstruct object states, but this will be essential for objects such as streams, where the mutation is an append operation.

PROPERTIES OF OBJECTS

UPOs will have certain generic properties that are common to all UPOs. This is probably an incomplete list:
They will also have properties specific to their type:

NAMING, SEARCHING, FINDING, AND STORING OBJECTS

There are many possible ways of archiving objects - we could use a database, a mixture of the file system and a database, or the file system alone. There are advantages and disadvantages to all of these approaches. Also, there are plenty of representations we can use to store object content - CDF(?), XML, CORBA....

EVOLUTION OF DATA STORAGE FORMAT

While not a problem that will be encountered during initial development, it is prudent to also consider how objects and their data storage format will evolve. It is hoped that the abstract nature of UPOs will mean that implementation changes will not always lead to a changed storage format. For example, while the way that tessellations are implemented might change, the way that a tessellation is described in a file need not.

Realistically, however, there will be points when the storage format will need to change. For minor changes it might be possible to provide converters to upgrade stored objects, but where major changes are required it may not even be possible to do this, because there is no obvious mapping from the old format to the new one.

GARBAGE COLLECTION

There will be issues of garbage collection to think about at some point. Obviously, we can't just keep on saving objects forever - this will eventually swamp the disks. Garbage collection will be user driven at the highest level - we can't just start deleting objects the user has specifically saved away. However, there will be occasions where UPOs contain references to other UPOs that were not explicitly created by the user (e.g. a Frontier UPO being saved when a Simulation state is saved). We will have to clear up these objects once they are orphaned.

There is also the issue of maintaining consistency in the UPO store. We must protect the user from deleting UPOs that are required by other UPOs.

Send suggestions, questions, and feedback to WEINBERG at ASTRO dot UMASS dot EDU. Documentation generated at Fri Mar 26 00:35:11 2010 by