Steven Brown

Zend Framework Performance Architecture Part 3: Denormalising Data

by on Jun.12, 2009, under Performance, PHP, Zend Framework

If you’re not already indexing your databases, head on over to http://hackmysql.com/documents and read everything there.

If you’re really looking to squeeze every last drop of performance out of your database then you can denormalise your data where otherwise you would need a join. Let’s look at an example query:

$productTable = new Product_Table();
$select = $productTable->select();
$select->from('product');
$select->join('category', 'category.id = product.category_id', array());
$select->where('category.active = ?', 1);
$select->where('product.active = ?', 1);
$products = $productTable->fetchAll($select);

In this query we are joining the product table with the category table. We only want products that are active, and that exist within active categories. Note the results will be Product_Row objects because I do not return any data from the category table.

In order to avoid the join here we can instead have something like this:

$productTable = new Product_Table();
$select = $productTable->select();
$select->from('product');
$select->where('product.category_active = ?', 1);
$select->where('product.active = ?', 1);
$products = $productTable->fetchAll($select);

So now each product row has a field that contains the category’s active value. This makes indexing much simpler, but now any time we change a category’s active field we need to go through and update all of the products for that category. This could get a bit tricky – if we are changing the category in several places throughout our code and we forget to update it in one place it will mess up everything.

Thankfully we already have something that can track changes to category rows in one place. As in the previous articles we can use _postInsert(), _postUpdate() and _postDelete() to update the product rows. Our Category_Row might look something like this:

class Category_Row extends Yewchube_Db_Table_Row_Abstract
{
    private function _updateProducts()
    {
        if (!isset($this->_cleanData['active']) || $this->_cleanData['active'] != $this->_data['active']) {
            $productTable = new Product_Table();
            $productTable->update(array(
                'category_active' => $this->active,
            ), 'category_id = ' . $this->id);
        }
    }
 
    protected function _postInsert()
    {
        $this->_updateProducts();
        parent::_postInsert();
    }
 
    protected function _postUpdate()
    {
        $this->_updateProducts();
        parent::_postUpdate();
    }
 
    protected function _postDelete()
    {
        $this->_updateProducts();
        parent::_postDelete();
    }
}

Note how we check if the active field has changed, if so we update the category_active field for all rows in the product table that are in the current category. One thing you will note here is that we are not calling the save() method on each row, meaning our previous caching mechanisms won’t be used. There are a couple of options we can use now, depending on your situation. If you expect that the number of product rows for each category would be quite small, you might actually fetch the products and update them individually, like so:

    private function _updateProducts()
    {
        if (!isset($this->_cleanData['active']) || $this->_cleanData['active'] != $this->_data['active']) {
            $productTable = new Product_Table();
            $products = $productTable->fetchAllInCategory($this->id);
            foreach ($products as $product) {
                $product->category_active = $this->active;
                $product->save();
            }
        }
    }

This could be quite intensive though if you expect a large number of rows. Instead you may have to clear the product row caches so that they will be re-fetched the next time they are needed. There are two different approaches here also, the first is to tag product row caching with the category ID and then clear all caches with that tag, although I have found some performance issues since the tags are not centrally indexed. Since we know the IDs of the products that need to be cleared I would instead recommend the following:

    private function _updateProducts()
    {
        if (!isset($this->_cleanData['active']) || $this->_cleanData['active'] != $this->_data['active']) {
            $cache = Zend_Cache::factory('Core', 'File', array(
               'lifetime' => 31536000, // One year
               'automatic_serialization' => true,
            ), array(
                'cache_dir' => '/my/cache/dir/',
                'file_name_prefix' => 'product', // The table name
                'hashed_directory_level' => 2,
            ));
            $productTable = new Product_Table();
            $products = $productTable->fetchAllInCategory($this->id);
            foreach ($products as $product) {
                $cacheName = 'id_' . (int)$id;
                $cache->remove($cacheName);
            }
        }
    }

By individually removing each cache we avoid a possibly exhaustive metadata file search. The caches are now cleared and will be refilled the next time we need one of these product rows. You can now denormalise your data with ease and rest assured the data will be updated properly.

Although the code in this and the previous article is effective, it’s also pretty ugly and complicated. In the next article we’ll look at using an observer pattern to decouple the models and make things simpler to maintain.


Leave a Reply

Looking for something?

Use the form below to search the site:

Still not finding what you're looking for? Drop a comment on a post or contact us so we can take care of it!