Zend Framework Performance Architecture Part 2: Multiple Row Caching
by Steven Brown on Jun.10, 2009, under Performance, PHP, Zend Framework
In the last article I covered how to precache individual rows from a database in order to improve the performance of your application. While this is certainly useful for instances where we need just a single row, a lot of our queries are to retrieve multiple rows from the database. One of the problems with caching results from these queries is that if a single row’s details change, we need to scrap the entire cache and start again, otherwise we might display outdated information. This applies not only to the fields we are using in the query, but also fields we display to the user. Let’s look at a basic example:
$productTable = new Product_Table(); $products = $productTable->fetchAll(array( 'price < ?' => 50.00, ), 'price ASC', 10); foreach ($products as $product) { echo $product->name . ': $' . sprintf('%0.2f', $product->price) . '<br />'; }
In this example we want only the cheapest ten products that cost less than $50. While we only use the price field in the query, we also use the name field when displaying the product.
The first thing we should do to make caching easier is to move loose queries like this into methods of the model’s table class, for example:
class Product_Table extends Yewchube_Db_Table_Abstract { protected $_name = 'product'; protected $_primary = 'id'; protected $_rowClass = 'Product_Row'; public function fetchTenLessThanPriceSortByPriceAsc($price) { return $this->fetchAll(array( 'price < ?' => $price, ), 'price ASC', 10); } }
Now we would select products using the following:
$productTable = new Product_Table(); $products = $productTable->fetchThenLessThanPriceSortByPriceAsc(50.00); foreach ($products as $product) { echo $product->name . ': $' . sprintf('%0.2f', $product->price) . '<br />'; }
By moving this code into the model class we can modify the caching in one place instead of in every place where we need this query. In fact I prefer to always use methods for queries in case I want to add caching in the future.
Now let’s add caching to this query:
public function fetchTenLessThanPriceSortByPriceAsc($price, $useCache = true) { $cache = Zend_Cache::factory('Core', 'File', array( 'lifetime' => 3600, // One hour 'automatic_serialization' => true, ), array( 'cache_dir' => '/my/cache/dir/', 'file_name_prefix' => $this->_name . '_' . __FUNCTION__, // The table and method name 'hashed_directory_level' => 2, )); $cacheName = sha1($price); if (!$useCache || false === ($results = $cache->load($cacheName))) { $results = $this->fetchAll(array( 'price < ?' => $price, ), 'price ASC', 10); $cache->save($result, $cacheName); } return $results; }
So what we do here is store the result in cache for a period of one hour, and we hope that the data doesn’t change within that hour. After that hour the data will be grabbed from the database again instead of from cache.
Once again we have provided the $useCache trapdoor to allow us to force the data to be loaded from the database and stored in cache.
Right now everything is fine, if our site has heavy traffic and this query is used often the load will be dramatically reduced by loading the result from a file instead of from the database. The problem is if the data changes it could take anywhere up to an hour for the new data to be visible to users. In this situation it could lead to incorrect product names and prices being displayed.
In order to keep our data current we would need to clear this cache any time a product row’s data changed:
class Product_Row extends Yewchube_Db_Table_Row_Abstract { private function _updateCache() { $this->_table->clearFetchTenLessThanPriceSortByPriceAscCache(); } protected function _postInsert() { $this->_updateCache(); parent::_postInsert(); } protected function _postUpdate() { $this->_updateCache(); parent::_postUpdate(); } protected function _postDelete() { $this->_updateCache(); parent::_postDelete(); } }
We have added methods to monitor when a row’s data has changed, and when it does we call the clearFetchAllLessThanPriceSortByPriceAscCache() method on the row’s table. Notice we also call the parent class methods to ensure any code there will be executed. Let’s add clearFetchAllLessThanPriceSortByPriceAscCache() to the table:
class Product_Table extends Yewchube_Db_Table_Abstract { protected $_name = 'product'; protected $_primary = 'id'; protected $_rowClass = 'Product_Row'; public function fetchTenLessThanPriceSortByPriceAsc($price, $useCache = true) { $cache = Zend_Cache::factory('Core', 'File', array( 'lifetime' => 3600, // One hour 'automatic_serialization' => true, ), array( 'cache_dir' => '/my/cache/dir/', 'file_name_prefix' => $this->_name . '_' . __FUNCTION__, // The table and method name 'hashed_directory_level' => 2, )); $cacheName = sha1($price); if (!$useCache || false === ($results = $cache->load($cacheName))) { $results = $this->fetchAll(array( 'price < ?' => $price, ), 'price ASC', 10); $cache->save($result, $cacheName); } return $results; } public function clearFetchTenLessThanPriceSortByPriceAsc() { $cache = Zend_Cache::factory('Core', 'File', array( 'lifetime' => 3600, // One hour 'automatic_serialization' => true, ), array( 'cache_dir' => '/my/cache/dir/', 'file_name_prefix' => $this->_name . '_' . __FUNCTION__, // The table and method name 'hashed_directory_level' => 2, )); $cache->clean(Zend_Cache::CLEANING_MODE_ALL); } }
The new method is very simple, it creates the cache object and then removes everything that has been cached with the given prefix.
Now any time we change a product row’s data the appropriate cache is cleared so no outdated information will be displayed. Unfortunately if even a single field changes the entire cache is cleared, even though the query only uses the price field to determine the results.
One trick I use here is to select only the IDs from the database and then use the fetchRowById() method to retrieve the precached row data. When we do this we will only need to clear the cache if the relevant fields change, in this case the price, since the other data is kept up to date with precaching (see previous article).
First we need to add some methods to our Yewchube_Db_Table_Abstract class:
abstract class Yewchube_Db_Table_Abstract extends Zend_Db_Table_Abstract { public function fetchRowById($id, $useCache = true) { $cache = Zend_Cache::factory('Core', 'File', array( 'lifetime' => 31536000, // One year 'automatic_serialization' => true, ), array( 'cache_dir' => '/my/cache/dir/', 'file_name_prefix' => $this->_name, // The table name 'hashed_directory_level' => 2, )); $cacheName = 'id_' . (int)$id; if (!$useCache || false === ($result = $cache->load($cacheName))) { $result = $this->fetchRow(array( 'id = ?' => (int)$id, )); $cache->save($result, $cacheName); } else { if (is_object($result)) { $result->setTable($this); } } return $result; } public function selectId() { $select = $this->select(); $select->from($this->_name, array('id')); return $select; } public function fetchIds($select) { return $this->getAdapter()->fetchCol($select); } public function convertIdsToObjects() { $results = array(); foreach ($ids as $id) { $results[] = $this->fetchRowById($id); } return $results; } }
The first method we add here, selectId(), returns a Zend_Db_Table_Select object that has been modified to return only the ID field from the current table. The next allows us to retrieve just the IDs from the result as an array. The last allows us to convert the array of IDs into an array of objects by using fetchRowById() to retrieve the data from cache.
Let’s see how to utilise these methods in our query:
public function fetchTenLessThanPriceSortByPriceAsc($price, $useCache = true) { $cache = Zend_Cache::factory('Core', 'File', array( 'lifetime' => 3600, // One hour 'automatic_serialization' => true, ), array( 'cache_dir' => '/my/cache/dir/', 'file_name_prefix' => $this->_name . '_' . __FUNCTION__, // The table and method name 'hashed_directory_level' => 2, )); $cacheName = sha1($price); if (!$useCache || false === ($ids = $cache->load($cacheName))) { $select = $this->selectId(); $select->where('price < ?', $price); $select->order('price ASC'); $ids = $this->fetchIds($select); $cache->save($ids, $cacheName); } return $this->convertIdsToObjects($ids); }
Notice how now we store the array of IDs in the cache instead of the object data. This means we can now adjust our cache clearing to be more selective:
private function _updateCache() { if (!isset($this->_cleanData['price']) || $this->_cleanData['price'] != $this->_data['price']) { $this->_table->clearFetchTenLessThanPriceSortByPriceAscCache(); } }
Now instead of clearing the cache whenever any data changes, we have made it clear only if the price changes. Note while I possibly could have checked $this->_modifiedFields to see if the price field had changed, I have found that if you set the price field to the same as the existing data it will still be added as a modified field, so I prefer to compare the dirty and clean data to prevent unnecessary cache cleaning.
Note that as with any performance recommendation, the results will vary depending on your system resources and structure. In this example we only want ten rows, which is a small number similar to the number of items you might display on a single page, however if you are selecting a large number of rows it may be extremely slow or load intensive, negating any advantage. Be sure to monitor your application’s performance to ensure you are getting a positive performance gain.
In the next article we will cover how to take advantage of similar code to reliably denormalise you data to improve query speed.