Indexer optimization

Indexer Batching

Magento can increase the memory for processing a large amount of data by using memory engines instead of InnoDB. The algorithm increases the memory value for the max_heap_table_size and tmp_table_size MySQL parameters.

The interface BatchSizeManagementInterface provides the ability to set the MEMORY table size for indexer processes according to batch size and index row size.

namespace Magento\Framework\Indexer;

use Magento\Framework\DB\Adapter\AdapterInterface;

/**
 * Batch size manager can be used to ensure that MEMORY table has enough memory for data in batch.
 * @api
 */
interface BatchSizeManagementInterface
{
    /**
     * Ensure memory size for data in batch.
     *
     * @param AdapterInterface $adapter database adapter.
     * @param int $batchSize
     * @return void
     */
    public function ensureBatchSize(\Magento\Framework\DB\Adapter\AdapterInterface $adapter, $batchSize);
}

The interface IndexTableRowSizeEstimatorInterface calculates the memory size for all rows per entity in the index table. The entity can store several rows in an index table generated by different store dimensions, such as count of websites and customer groups.

namespace Magento\Framework\Indexer;

/**
 * Calculate memory size for entity according different dimensions.
 * @api
 */
interface IndexTableRowSizeEstimatorInterface
{
    /**
     * Calculate memory size for entity row.
     *
     * @return float
     */
    public function estimateRowSize();
}

Exceeding allocated memory size

When the allocated memory size for a temporary table will be greater than 20% of innodb_buffer_pool_size, the following message is written to the Magento log.

Memory size allocated for the temporary table is more than 20% of innodb_buffer_pool_size.

To prevent this error message, update innodb_buffer_pool_size or decrease the batch size value (which decreases memory usage for the temporary table).

Batching configuration

Batching is available for the following indexers:

Index name Configuration file Configured object Parameter name Default value
catalog_product_price (Product Price) Magento/Catalog/etc/di.xml Magento\Catalog\Model\ResourceModel\Product\Indexer\Price\BatchSizeCalculator batchRowsCount[‘default’] 5000
cataloginventory_stock (Stock) Magento/CatalogInventory/etc/di.xml Magento\CatalogInventory\Model\Indexer\Stock\Action\Full batchRowsCount[‘default’] 200
catalog_category_product (Category Products) Magento/Catalog/etc/di.xml Magento\Catalog\Model\Indexer\Category\Product\Action\Full batchRowsCount 100000
catalog_product_attribute (Product Attribute) Magento/Catalog/etc/di.xml Magento\Catalog\Model\ResourceModel\Product\Indexer\Eav\BatchSizeCalculator batchSizes[‘decimal’], batchSizes[‘source’] 1000, 1000

Changing the batch size can help you optimize indexer running time. For example, for a store with the following characteristics:

  • 10 websites
  • 10 store groups
  • 20 store views
  • 300 tier prices
  • About 40,000 products (of which 254 are configurable)

reducing the batch size for catalog_product_price indexer from 5000 to 1000 decreases the execution time from about 4 hours to less than 2 hours. You can experiment to determine the ideal batch size. In general, halving the batch size can decrease the indexer execution time.

The following examples illustrate how to define a custom batch size for configurable products. Add these samples to your {Your_Module_Name}/etc/di.xml.

....
<type name="Magento\Catalog\Model\ResourceModel\Product\Indexer\Price\BatchSizeCalculator">
    <arguments>
        <argument name="batchRowsCount" xsi:type="array">
            <item name="configurable" xsi:type="number">5000</item>
        </argument>
    </arguments>
</type>
...
<type name="Magento\CatalogInventory\Model\Indexer\Stock\Action\Full">
    <arguments>
        <argument name="batchRowsCount" xsi:type="array">
            <item name="configurable" xsi:type="number">200</item>
        </argument>
    </arguments>
</type>

Indexer Table Switching

Magento optimizes certain indexer processes to prevent deadlocks and wait locks caused by read/write collisions on the same table. In these cases, Magento uses separate tables for performing read operations and reindexing. As a result of this table switching process, customers are not impacted when you run a full reindex. For example, when catalog_product_price is reindexing, customers won’t be slowed down as they navigate on Categories pages, search products, or user layer navigation filters with price filters.

Magento uses the following tables to support table switching.

Indexer name Tables used
catalog_product_price catalog_product_index_price, catalog_product_index_price_replica
cataloginventory_stock cataloginventory_stock_status, cataloginventory_stock_status_replica
catalog_category_product catalog_category_product_index, catalog_category_product_index_replica
catalog_product_attribute (select, multiselect attributes) catalog_product_index_eav, catalog_product_index_eav_replica
catalog_product_attribute (decimal values) catalog_product_index_eav_decimal, catalog_product_index_eav_decimal_replica
catalogrule_rule catalogrule_product, catalogrule_product_replica, catalogrule_product_price, catalogrule_product_price_replica, catalogrule_group_website, catalogrule_group_website_replica

Make sure that these indexers are in “Update By Schedule” mode. If “Update On Save” mode is selected, some data can be lost if you make changes during full reindex.

The indexer table switching mechanism requires additional database storage.