Alzabo (version 0.64) - Alzabo::ObjectCache


NAME

Alzabo::ObjectCache - A simple in-memory cache for row objects.


SYNOPSIS

  use Alzabo::ObjectCache( store => 'Alzabo::ObjectCache::Store::Memory',
                           sync  => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
                           sync_dbm_file => 'somefile.db' );


DESCRIPTION

This class exists primarily to delegate necessary caching operations to other objects.

It always contains two objects. One, the storing object, is responsible for storing the objects to be cached, which it may do in any way it sees fit.

The other, the syncing object, is responsible for keeping cached objects consistent, both within a single process and across multiple processes. For example, if process 1 deletes an object and process 2 then attempts to retrieve the same object from the database, process 2 needs to be told (in this case via an exception) that the object is no longer available. Similarly, if process 1 updates the database while process 2 holds a cached copy of the same object, process 2 needs to know that it should fetch that object's data again.
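The two cases in the last paragraph can be modelled with a small single-file simulation. This is a sketch under stated assumptions, not Alzabo's implementation. Two objects in one Perl process play the roles of processes 1 and 2. A shared hash stands in for whatever medium a real syncing class uses, and a logical counter replaces real timestamps. The names `Sketch::Process`, `get`, `update_row` and `delete_row` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one process that caches rows.
package Sketch::Process {
    sub new {
        my ($class, $shared) = @_;
        # cache:  this process's copy of each row
        # loaded: logical time at which this process last fetched each row
        return bless { shared => $shared, cache => {}, loaded => {} }, $class;
    }

    sub get {
        my ($self, $id) = @_;
        my $sh = $self->{shared};
        die "row $id no longer exists\n" if $sh->{deleted}{$id};
        my $changed = $sh->{changed}{$id} // 0;
        # Refetch if we hold no copy, or another process changed the row since we loaded it.
        if (!exists $self->{cache}{$id} || $changed > $self->{loaded}{$id}) {
            $self->{cache}{$id}  = $sh->{db}{$id};
            $self->{loaded}{$id} = ++$sh->{clock};
        }
        return $self->{cache}{$id};
    }

    sub update_row {
        my ($self, $id, $value) = @_;
        my $sh = $self->{shared};
        $sh->{db}{$id}      = $value;
        $sh->{changed}{$id} = ++$sh->{clock};    # announce the change to other processes
        $self->{cache}{$id}  = $value;
        $self->{loaded}{$id} = $sh->{clock};
        return;
    }

    sub delete_row {
        my ($self, $id) = @_;
        my $sh = $self->{shared};
        delete $sh->{db}{$id};
        $sh->{deleted}{$id} = 1;                  # announce the deletion
        delete $self->{cache}{$id};
        return;
    }
}

my %shared = (clock => 0, db => { 1 => 'alpha' }, changed => {}, deleted => {});
my $p1 = Sketch::Process->new(\%shared);
my $p2 = Sketch::Process->new(\%shared);

$p1->get(1);
$p2->get(1);                     # both "processes" now cache row 1
$p1->update_row(1, 'beta');
print $p2->get(1), "\n";         # process 2 sees its copy is stale and refetches
$p1->delete_row(1);
my $result = eval { $p2->get(1) };
print defined $result ? "$result\n" : "error: $@";
```

Each process compares its own record of when it last loaded a row with a shared record of when the row last changed. Only the shared part needs to cross process boundaries.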


IMPORT

This module is configured entirely through the parameters passed when it is imported.

Parameters

All parameters given will also be passed through to the import methods of the storing and syncing classes being used.
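A minimal sketch of what passing parameters through could look like. The package name `Sketch::ObjectCache`, and the requirement that both `store` and `sync` be given, are assumptions for illustration. This is not Alzabo's actual import code.

```perl
use strict;
use warnings;

# Hypothetical import method that loads the storing and syncing classes
# and forwards the complete parameter list to each of them.
package Sketch::ObjectCache {
    sub import {
        my ($class, %p) = @_;
        return unless %p;    # a bare "use Sketch::ObjectCache;" configures nothing
        for my $role (qw(store sync)) {
            my $module = $p{$role} or die "The '$role' parameter is required\n";
            (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
            require $file;
            $module->import(%p);                      # every parameter is passed through
        }
        return;
    }
}
```

With this sketch, calling `Sketch::ObjectCache->import(store => 'My::Store', sync => 'My::Sync', lru_size => 100)` would load both hypothetical classes and hand each of them the full parameter list, `lru_size` included. A `use` line makes the same call at compile time.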


LRU STORAGE

Any storage module can be turned into an LRU cache by passing an lru_size parameter to this module when using it.

For example:

  use Alzabo::ObjectCache( store => 'Alzabo::ObjectCache::Store::Memory',
                           lru_size => 100,
                           sync  => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
                           sync_dbm_file => 'somefile.db' );
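One way to layer an LRU limit over an arbitrary store is a wrapper. The wrapper records when each object was last used and evicts the oldest once the limit is exceeded. This is a hypothetical sketch, not Alzabo's LRU code. The package names are invented, and cached objects are assumed to have an `id` method. Eviction scans all entries (O(n)), which keeps the example short but would be slow for large caches.

```perl
use strict;
use warnings;

# Stand-in for a cached row object; the id() method is an assumption.
package Sketch::Row {
    sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
    sub id  { return $_[0]{id} }
}

# Plain hash-backed store, here only so there is something to wrap.
package Sketch::Store::Hash {
    sub new               { my $class = shift; return bless { objs => {} }, $class }
    sub fetch_object      { my ($self, $id) = @_; return $self->{objs}{$id} }
    sub store_object      { my ($self, $obj) = @_; $self->{objs}{ $obj->id } //= $obj; return }
    sub delete_from_cache { my ($self, $obj) = @_; delete $self->{objs}{ $obj->id }; return }
    sub clear             { my $self = shift; %{ $self->{objs} } = (); return }
}

# Hypothetical LRU wrapper: once more than lru_size objects are held,
# the least recently used one is evicted from the wrapped store.
package Sketch::Store::LRU {
    sub new {
        my ($class, %p) = @_;
        return bless { store => $p{store}, lru_size => $p{lru_size}, used => {}, tick => 0 }, $class;
    }
    sub fetch_object {
        my ($self, $id) = @_;
        my $obj = $self->{store}->fetch_object($id);
        $self->{used}{$id} = ++$self->{tick} if defined $obj;   # mark as recently used
        return $obj;
    }
    sub store_object {
        my ($self, $obj) = @_;
        $self->{store}->store_object($obj);
        $self->{used}{ $obj->id } = ++$self->{tick};
        while (scalar(keys %{ $self->{used} }) > $self->{lru_size}) {
            my $used = $self->{used};
            # The id with the smallest tick is the least recently used.
            my ($oldest) = sort { $used->{$a} <=> $used->{$b} } keys %$used;
            my $victim = $self->{store}->fetch_object($oldest);
            $self->{store}->delete_from_cache($victim) if $victim;
            delete $used->{$oldest};
        }
        return;
    }
    sub delete_from_cache {
        my ($self, $obj) = @_;
        delete $self->{used}{ $obj->id };
        $self->{store}->delete_from_cache($obj);
        return;
    }
    sub clear { my $self = shift; %{ $self->{used} } = (); $self->{store}->clear; return }
}

my $lru = Sketch::Store::LRU->new(store => Sketch::Store::Hash->new, lru_size => 2);
$lru->store_object(Sketch::Row->new($_)) for 1 .. 3;
print defined $lru->fetch_object(1) ? "row 1 cached\n" : "row 1 evicted\n";
```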


CACHING SCENARIOS

The easiest way to understand how the Alzabo caching system works is to outline different scenarios and show the results based on different caching configurations.

Scenario 1 - Single process - delete followed by select/update

In a single process, the following sequence occurs:

- A row object is retrieved.

- The row object's delete method is called, removing the data it represents from the database.

- The program attempts to call the row object's select or update method.

Results

Scenario 2 - Multiple processes - delete followed by select

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- Process 2 retrieves a row object for the same database row.

- Process 1 calls that object's delete method.

- Process 2 calls that object's select method.

Results

Scenario 3 - Multiple processes - delete followed by update

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- Process 2 retrieves a row object for the same database row.

- Process 1 calls that object's delete method.

- Process 2 calls that object's update method.

Results

Scenario 4 - Multiple processes - update followed by update

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- Process 2 retrieves a row object for the same database row.

- Process 1 calls that object's update method.

- Process 2 calls that object's update method.

- Process 1 calls that object's select method.

Results

Scenario 5 - Multiple processes - delete followed by insert (same primary key)

Assume two processes, with ids 1 and 2.

- Process 1 retrieves a row object.

- The row is deleted. In this case, it does not matter whether this happens through Alzabo or not.

- Process 2 inserts a new row, with the same primary key.

- Process 1 or 2 calls that object's select method.

Results

This example may seem a bit far-fetched, but it is actually quite likely when using MySQL's auto_increment feature with older versions of MySQL, where numbers could be reused.

Summary

The most important thing to take from this is that you should never use the Alzabo::ObjectCache::Sync::Null class in a multi-process situation. It is really only safe if you are sure your code will only be running in a single process at a time.

In all other cases, either use no caching or use one of the other syncing classes to ensure that data really is synced across multiple processes.


RACE CONDITIONS

It is important to note that there are small race conditions in the syncing scheme. When data is requested from a row object, the row object first makes sure that it is up to date with the database. If it is not, it refreshes itself. Then it returns the requested data, whether or not it had to refresh. An update could occur between the moment the object checks whether it is expired and the moment it returns the data, and the row object would not see that update.

I don't consider this a bug, since it is impossible to work around and is unlikely to be a problem. In a single process, this is not an issue. In a multi-process application, this is the price paid for caching.

If this is a problem for your application then you should not use caching.
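The race can be reproduced deterministically with a simulated row object. In this hypothetical sketch, `Sketch::Row`, its `between` hook, and the scalar `$version` (standing in for a shared sync time) are all invented for illustration. The hook runs a concurrent update at exactly the point where a real second process could interfere.

```perl
use strict;
use warnings;

# Hypothetical row object whose select() checks for expiry, refreshes
# if needed, and then returns cached data.
package Sketch::Row {
    sub new { my ($class, %p) = @_; return bless { %p, data => {}, seen => 0 }, $class }
    sub select {
        my ($self, $col) = @_;
        if ($self->{seen} < ${ $self->{version} }) {    # 1. expired? then refresh
            $self->{data} = { %{ $self->{db} } };
            $self->{seen} = ${ $self->{version} };
        }
        $self->{between}->() if $self->{between};       # the race window
        return $self->{data}{$col};                     # 2. return cached data
    }
}

my %db      = (name => 'old');   # stands in for the database row
my $version = 1;                 # stands in for the shared sync time

my $row = Sketch::Row->new(
    db      => \%db,
    version => \$version,
    between => sub { $db{name} = 'new'; $version++ },   # a simulated concurrent update
);
my $got = $row->select('name');
print "select returned '$got'; database now holds '$db{name}'\n";
```

The first `select` returns `old` even though the database already holds `new`. The next call notices the newer version and refreshes.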


SYNCING MODULES

The following syncing modules are available with Alzabo:

Alzabo::ObjectCache::Sync::Null

This module simply emulates the syncing interface without doing any actual syncing, though it does track deleted objects. This module is useful if you want to cache objects in a single process but you don't need the overhead of real syncing.
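The behaviour described here amounts to very little code. The following is a hypothetical sketch: the package name is invented, and ids are passed instead of row objects to keep it short.

```perl
use strict;
use warnings;

# Hypothetical sketch in the spirit of Sync::Null: no other process is
# consulted, so nothing is reported as expired, but deletions made in
# this process are remembered.
package Sketch::Sync::Null {
    sub new              { my $class = shift; return bless { deleted => {} }, $class }
    sub is_expired       { return 0 }    # changes by other processes are invisible
    sub register_refresh { return }
    sub register_change  { return }
    sub register_delete  { my ($self, $id) = @_; $self->{deleted}{$id} = 1; return }
    sub is_deleted       { my ($self, $id) = @_; return exists $self->{deleted}{$id} }
}

my $sync = Sketch::Sync::Null->new;
$sync->register_delete('row:7');
print $sync->is_deleted('row:7') ? "deleted\n" : "present\n";
```

Because `is_expired` always returns false, changes made by another process are never noticed. This is why the summary of the caching scenarios warns against using this class in multi-process code.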

Alzabo::ObjectCache::Sync::BerkeleyDB

Alzabo::ObjectCache::Sync::SDBM_File

Alzabo::ObjectCache::Sync::DB_File

These three modules all use DBM files, via the relevant module, to do multi-process syncing. They are listed in order from fastest to slowest. Using DB_File is significantly slower than either BerkeleyDB or SDBM_File, which are both relatively fast.
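The core idea of DBM-based syncing is that several processes tie a hash to the same file and therefore see the same keys. The following minimal sketch uses the SDBM_File module that ships with Perl. The `row:42` key format and the helper `open_sync_db` are hypothetical. The two "processes" are simulated by opening the file twice in sequence, and the file locking a real implementation needs is left out.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);
use SDBM_File;

# Tie a hash to an SDBM file; every opener of the same path sees the same keys.
sub open_sync_db {
    my ($path) = @_;
    tie my %times, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $path: $!";
    return \%times;
}

my $file = tempdir(CLEANUP => 1) . '/sync';

{
    # "Process 1" records that row 42 changed at (hypothetical) time 1000.
    my $times = open_sync_db($file);
    $times->{'row:42'} = 1000;
}   # the hash is freed here, which unties it and closes the file

# "Process 2" opens the same file later and sees the recorded time.
my $times = open_sync_db($file);
print "row:42 last changed at $times->{'row:42'}\n";
```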

They all take the same parameters:

Alzabo::ObjectCache::Sync::RDBMS

This module uses an RDBMS to do syncing. This does not need to be the same database as your data is stored in, though it could be.

If the database it is told to use does not contain the table it needs, it will use the Alzabo::Create modules to create it. If you have warnings turned on, this will cause a warning telling you that these modules were loaded, as having them loaded in any sort of persistent process is probably a waste of memory.

The table it stores data in looks like this:

  AlzaboObjectCacheSync
  ----------------------
  object_id       varchar(22)   primary key
  sync_time       varchar(40)

This module takes the following parameters:

Alzabo::ObjectCache::Sync::IPC

This module is quite slow and is included mostly for historical reasons (it was one of the first syncing modules written). I recommend against using it, but if you must, it takes the following parameters:


STORAGE MODULES

All of the storage modules may be turned into LRU caches by simply passing the lru_size parameter.

The following storage modules are included with Alzabo:

Alzabo::ObjectCache::Store::Null

This module mimics the storage interface without actually storing anything. It is useful if you want to use syncing without any storage.

Alzabo::ObjectCache::Store::Memory

This module simply stores cached objects in memory.

Alzabo::ObjectCache::Store::BerkeleyDB

This module stores serialized cached objects in a DBM file using the BerkeleyDB module.

It takes these parameters:

Alzabo::ObjectCache::Store::RDBMS

This module uses an RDBMS to store cached objects. This does not need to be the same database as your data is stored in, though it could be.

For example, if you are using Oracle as your primary RDBMS, caching serialized objects in a MySQL database might be a performance boost.
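Store::BerkeleyDB and Store::RDBMS both keep serialized objects. In a serializing store, a fetch rebuilds a new copy rather than returning the object that was stored. The following hypothetical sketch uses the core Storable module. A plain hash stands in for the blob column (or DBM file), and cached objects are assumed to have an `id` method.

```perl
use strict;
use warnings;

# Stand-in for a cached row object; the id() method is an assumption.
package Sketch::Row {
    sub new { my ($class, %p) = @_; return bless {%p}, $class }
    sub id  { return $_[0]{id} }
}

# Hypothetical store that keeps objects as Storable-serialized strings.
package Sketch::Store::Serialized {
    use Storable qw(nfreeze thaw);
    sub new { my $class = shift; return bless { data => {} }, $class }
    sub store_object {
        my ($self, $obj) = @_;
        $self->{data}{ $obj->id } //= nfreeze($obj);   # do not overwrite an existing entry
        return;
    }
    sub fetch_object {
        my ($self, $id) = @_;
        my $frozen = $self->{data}{$id};
        return defined $frozen ? thaw($frozen) : undef;   # a fresh copy on every fetch
    }
}

my $store = Sketch::Store::Serialized->new;
my $row   = Sketch::Row->new(id => 7, name => 'original');
$store->store_object($row);
$row->{name} = 'changed in memory';
print $store->fetch_object(7)->{name}, "\n";   # the stored copy is unaffected
```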

If the database it is told to use does not contain the table it needs, it will use the Alzabo::Create modules to create it. If you have warnings turned on, this will cause a warning telling you that these modules were loaded, as having them loaded in any sort of persistent process is probably a waste of memory.

The table it stores data in looks like this:

  AlzaboObjectCacheStore
  ----------------------
  object_id       varchar(22)   primary key
  object_data     blob

The actual type of the object_data column will vary depending on what RDBMS you are using.

This module takes the following parameters:


Alzabo::ObjectCache METHODS

new

Returns

A new Alzabo::ObjectCache object.

fetch_object ($id)

Returns

The specified object if it is in the cache. Otherwise it returns undef.

store_object ($object)

Stores an object in the cache. This will not overwrite an existing object in the cache. To do that you must first call the delete_from_cache method.

is_expired ($object)

Returns

Whether or not the given object is expired.

is_deleted ($object)

Returns

A boolean value indicating whether or not the object has been registered as deleted via register_delete.

register_refresh ($object)

Tells the cache system that an object has refreshed its data from the database.

register_change ($object)

Tells the cache system that an object has updated its data in the database.

register_delete ($object)

This tells the cache that the object has been removed from its external data source. This causes the cache to remove the object internally. Future calls to is_deleted for this object will now return true.

delete_from_cache ($object)

This method allows you to remove an object from the cache. This does not register the object as deleted. It is provided solely so that you can call store_object after calling this method and have store_object actually store the new object.

clear

Call this method to completely clear the cache.
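Since the class mostly delegates, the methods above can be sketched as thin forwarders to a storing object and a syncing object. Everything here is hypothetical: the package names and the stand-in store and sync classes are invented. The choice to let register_delete both drop the object from storage and record the deletion follows the description of register_delete above but is not taken from Alzabo's source.

```perl
use strict;
use warnings;

# Hypothetical delegating cache: one storing object, one syncing object.
package Sketch::ObjectCache {
    sub new {
        my ($class, %p) = @_;
        return bless { store => $p{store}, sync => $p{sync} }, $class;
    }
    sub fetch_object      { my ($s, $id)  = @_; return $s->{store}->fetch_object($id) }
    sub store_object      { my ($s, $obj) = @_; return $s->{store}->store_object($obj) }
    sub delete_from_cache { my ($s, $obj) = @_; return $s->{store}->delete_from_cache($obj) }
    sub is_expired        { my ($s, $obj) = @_; return $s->{sync}->is_expired($obj) }
    sub is_deleted        { my ($s, $obj) = @_; return $s->{sync}->is_deleted($obj) }
    sub register_refresh  { my ($s, $obj) = @_; return $s->{sync}->register_refresh($obj) }
    sub register_change   { my ($s, $obj) = @_; return $s->{sync}->register_change($obj) }
    sub register_delete {
        my ($s, $obj) = @_;
        $s->{store}->delete_from_cache($obj);       # drop it from storage
        return $s->{sync}->register_delete($obj);   # and remember it as deleted
    }
    sub clear { my $s = shift; $s->{store}->clear; return $s->{sync}->clear }
}

# Minimal stand-ins so the sketch runs on its own.
package Sketch::Row {
    sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
    sub id  { return $_[0]{id} }
}
package Sketch::Store {
    sub new               { my $class = shift; return bless {}, $class }
    sub fetch_object      { return $_[0]{ $_[1] } }
    sub store_object      { $_[0]{ $_[1]->id } //= $_[1]; return }   # never overwrite
    sub delete_from_cache { delete $_[0]{ $_[1]->id }; return }
    sub clear             { %{ $_[0] } = (); return }
}
package Sketch::Sync {
    sub new              { my $class = shift; return bless { deleted => {} }, $class }
    sub is_expired       { return 0 }
    sub register_refresh { return }
    sub register_change  { return }
    sub register_delete  { my ($self, $obj) = @_; $self->{deleted}{ $obj->id } = 1; return }
    sub is_deleted       { my ($self, $obj) = @_; return exists $self->{deleted}{ $obj->id } }
    sub clear            { return }    # nothing process-local to clear in this stand-in
}

my $cache = Sketch::ObjectCache->new(store => Sketch::Store->new, sync => Sketch::Sync->new);
my $row   = Sketch::Row->new(5);
$cache->store_object($row);
$cache->store_object(Sketch::Row->new(5));     # ignored: store_object does not overwrite
print $cache->fetch_object(5) == $row ? "original kept\n" : "replaced\n";
$cache->register_delete($row);
print $cache->is_deleted($row) ? "deleted\n" : "present\n";
```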


MAKING YOUR OWN SUBCLASSES

It is relatively easy to create your own storage or syncing modules by following a fairly simple interface.

Storage Interface

The interface that any object storing module needs to implement is as follows:

new

Returns

A new object.

fetch_object ($id)

Returns

The specified object if it is in the cache. Otherwise it returns undef.

store_object ($object)

Stores an object in the cache but should not overwrite an existing object.

delete_from_cache ($object)

This method deletes an object from the cache.

clear

Completely clears the cache.
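When writing a storage class, it can help to check it against the rules above. The following hypothetical sketch provides a minimal in-memory implementation plus a `check_store` helper that dies on the first rule it finds broken. The package names, the helper, and the `id` method on cached objects are assumptions.

```perl
use strict;
use warnings;

# Stand-in row object; the id() method is an assumption for this sketch.
package Sketch::Row {
    sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
    sub id  { return $_[0]{id} }
}

# A minimal in-memory implementation of the storage interface.
package My::Store::Memory {
    sub new               { my $class = shift; return bless { objs => {} }, $class }
    sub fetch_object      { my ($self, $id) = @_; return $self->{objs}{$id} }
    sub store_object      { my ($self, $obj) = @_; $self->{objs}{ $obj->id } //= $obj; return }
    sub delete_from_cache { my ($self, $obj) = @_; delete $self->{objs}{ $obj->id }; return }
    sub clear             { my $self = shift; %{ $self->{objs} } = (); return }
}

# Hypothetical helper that exercises the interface rules against any
# storage class; it dies on the first violation and returns 1 otherwise.
sub check_store {
    my ($class) = @_;
    my $store = $class->new;
    my ($first, $second) = (Sketch::Row->new(1), Sketch::Row->new(1));
    die "fetch of unknown id should be undef\n" if defined $store->fetch_object(1);
    $store->store_object($first);
    $store->store_object($second);
    die "store_object must not overwrite\n" unless $store->fetch_object(1) == $first;
    $store->delete_from_cache($first);
    die "delete_from_cache did not remove\n" if defined $store->fetch_object(1);
    $store->store_object($second);
    $store->clear;
    die "clear did not empty the cache\n" if defined $store->fetch_object(1);
    return 1;
}

print check_store('My::Store::Memory') ? "My::Store::Memory conforms\n" : "";
```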

Syncing Interface

Any class that implements the syncing interface should inherit from Alzabo::ObjectCache::Sync. This class provides most of the functionality necessary to handle syncing operations.

The interface that any syncing module needs to implement is as follows:

_init

This method will be called when the object is first created.

clear

Clears the process-local sync times (not the times shared between processes).

sync_time ($id)

Returns

The time that the object matching the given id was last refreshed.

update ($id, $time, $overwrite)

This is called to update the state of the syncing object with regard to a particular object. The first parameter is the object's id. The second is the time that the object was last refreshed. The third parameter, $overwrite, tells the syncing object whether or not to replace an existing time for the object if it already has one.
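A skeletal syncing class covering these four methods might look like this. It is a hypothetical sketch. It does not inherit from Alzabo::ObjectCache::Sync, as a real subclass should, so that it runs on its own. Its `new` simply calls `_init`, and a package hash stands in for the medium shared between processes.

```perl
use strict;
use warnings;

# Hypothetical syncing class implementing _init, clear, sync_time and update.
package My::Sync::Hash {
    our %SHARED;    # id => last refresh time, visible to every instance ("process")

    sub new {
        my ($class, %p) = @_;
        my $self = bless {}, $class;
        $self->_init(%p);
        return $self;
    }

    sub _init {
        my ($self, %p) = @_;
        $self->{local} = {};    # times recorded by this process only
        return;
    }

    # Clears only the process-local times; %SHARED is left alone.
    sub clear { my $self = shift; %{ $self->{local} } = (); return }

    sub sync_time { my ($self, $id) = @_; return $SHARED{$id} }

    sub update {
        my ($self, $id, $time, $overwrite) = @_;
        $self->{local}{$id} = $time;
        # Keep an existing shared time unless the caller asked to overwrite it.
        $SHARED{$id} = $time if $overwrite || !exists $SHARED{$id};
        return;
    }
}

my $p1 = My::Sync::Hash->new;
my $p2 = My::Sync::Hash->new;
$p1->update('row:3', 100, 0);             # first time recorded: stored
$p2->update('row:3', 200, 0);             # existing time kept
print $p2->sync_time('row:3'), "\n";      # 100
$p2->update('row:3', 200, 1);             # overwrite requested
print $p2->sync_time('row:3'), "\n";      # 200
```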


AUTHOR

Dave Rolsky, <autarch@urth.org>