Updating very large tables can be a time taking task and sometimes it might take hours to finish. In addition to this, it might also cause blocking issues.
Here are few tips to optimize the updates on large data volumes.
Removing index on the column to be updated.
Executing the update in smaller batches.
Disabling Delete triggers.
Replacing Update statement with a Bulk-Insert operation.
With that being said, let’s apply the above points to optimize an update query.
The code below creates a dummy table with 200,000 rows and required indexes.
01 |
CREATE TABLE tblverylargetable |
12 |
INSERT INTO tblverylargetable |
14 |
Replicate( 'Dummy' , 160), |
15 |
Replicate( 'Dummy' , 160)) |
20 |
ON tblverylargetable(col1) |
22 |
CREATE INDEX ix_col2_col3 |
23 |
ON tblverylargetable(col2) |
Consider the following update query which is to be optimized. It’s a very straight forward query to update a single column.
1 |
UPDATE tblverylargetable |
The query takes 2:19 minutes to execute.
Let’s look at the execution plan of the query shown below. In addition to the clustered index update, the index ix_col1 is also updated. The index update and Sort operation together take 64% of the execution cost.
1. Removing index on the column to be updated
The same query takes 14-18 seconds when there isn’t any index on col1. Thus, an update query runs faster if the column to be updated is not an index key column. The index can always be created once the update completes.
2. Executing the update in smaller batches
The query can be further optimized by executing it in smaller batches. This is generally faster. The code below updates the records in batches of 20000.
4 |
UPDATE TOP (20000) tblverylargetable |
The above query takes 6-8 seconds to execute. When updating in batches, even if the update fails or it needs to be stopped, only rows from the current batch are rolled back.
3. Disabling Delete triggers
Triggers with cursors can extremely slow down the performance of a delete query. Disabling After delete triggers will considerably increase the query performance.
4. Replacing Update statement with a Bulk-Insert operation
An update statement is a fully logged operation and thus it will certainly take considerable amount of time if millions of rows are to be updated.The fastest way to speed up the update query is to replace it with a bulk-insert operation. It is a minimally logged operation in simple and Bulk-logged recovery model. This can be done easily by doing a bulk-insert in a new table and then rename the table to original one. The required indexes and constraint can be created on a new table as required.
The code below shows how the update can be converted to a bulk-insert operation. It takes 4 seconds to execute.
8 |
INTO tblverylargetabletemp |
The bulk-insert can then be further optimized to get additional performance boost.
Reference: http://www.sqlservergeeks.com/blogs/AhmadOsama/personal/450/sql-server-optimizing-update-queries-for-large-data-volumes