When to use the NOLOCK hint in SQL Server

I frequently hear of, and see, developers and DBAs using the NOLOCK hint in SQL Server to bypass the locking mechanism and return their result sets as quickly as possible. There are times when this is fine, such as when you are running an ad hoc query and only need approximate results. It is rather less fine to bake this hint into application code and reports, unless you genuinely don't care whether the data returned is accurate.

The big problem with NOLOCK is that many of the coders using it do not fully understand its effects. The common perception is that you're simply reading uncommitted data, and that the odd rollback isn't too much to worry about. If that were the full extent of the problem, the developer would be broadly right: we tend not to roll back very often, so why worry? However, there are more insidious side effects which are not generally understood, caused by how the underlying database engine actually works.
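For context, the NOLOCK table hint is just a per-table way of requesting the READ UNCOMMITTED isolation level; the two forms below behave the same way (the table name here is illustrative, not from the scripts that follow):

```sql
-- Per-table hint: only dbo.Orders is read without shared locks
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);

-- Session-level equivalent: all subsequent reads in this session
-- are performed at READ UNCOMMITTED
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.Orders;
```

Everything said below about NOLOCK therefore applies equally to code that sets READ UNCOMMITTED at the session level.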

To demonstrate the true nature of issuing READ UNCOMMITTED selects via NOLOCK, I have created an example so you can see the issue at work.

Here are two scripts. SCRIPT 1 creates a table, puts some static data into it, then starts inserting lots more. Each row is padded, for realism, so that we only get a few rows per 8 KB page. The primary key is a UNIQUEIDENTIFIER, so we should expect the key values to be spread out, and subsequent inserts to land in the same pages as our initial rows. This should generate some page splits, something that happens a lot in SQL Server.
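If you want to verify that page splits really are happening while SCRIPT 1 runs, one rough way (a sketch; note that leaf_allocation_count includes ordinary page allocations as well as splits, so treat it as an indicator rather than an exact split count) is to poll the index's operational stats:

```sql
-- Leaf-level page allocations for the clustered index on dbo.test_table.
-- Run this repeatedly while SCRIPT 1's second insert loop is executing:
-- a rapidly climbing number indicates pages being added, including splits.
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       ios.index_id,
       ios.leaf_allocation_count
FROM   sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.test_table'), NULL, NULL) AS ios;
```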
SCRIPT1:

IF OBJECT_ID('dbo.test_table') IS NOT NULL
    DROP TABLE dbo.test_table;

-- create a table and pad it out so we only get a few rows per page
CREATE TABLE dbo.test_table
(
 pk         UNIQUEIDENTIFIER DEFAULT ( NEWID() ) NOT NULL
,search_col VARCHAR(10)
,count_col  INT
,padding    CHAR(100) DEFAULT ( 'pad' )
);
ALTER TABLE dbo.test_table ADD CONSTRAINT test_table_pk PRIMARY KEY CLUSTERED (pk);

-- seed 1,000 static rows: 100 iterations x 10 rows, each with count_col = 1
DECLARE @LOOP1 INT;
SET @LOOP1 = 0;
WHILE (@LOOP1 < 100)
BEGIN
 SET @LOOP1 = @LOOP1 + 1;
 INSERT INTO dbo.test_table ( search_col, count_col )
 VALUES ('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),
        ('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1);
END;

SELECT GETDATE(), SUM(count_col) FROM dbo.test_table WITH (NOLOCK) WHERE search_col = 'THIS_ONE';

SET NOCOUNT ON;
-- insert 100,000 rows, which should cause some lovely page splits, as the random
-- PK values will look to insert into the same pages as the data already in there.
-- We need to run the SELECT in another window at the same time.
DECLARE @LOOP INT;
SET @LOOP = 0;
WHILE (@LOOP < 100000)
BEGIN
 SET @LOOP = @LOOP + 1;
 -- cast via INT so the value fits search_col's VARCHAR(10) without truncation errors
 INSERT INTO dbo.test_table ( search_col, count_col )
 VALUES ( CAST( CAST( RAND() * 1000000 AS INT ) AS VARCHAR(10) ), 100000 );
END;

SELECT GETDATE(), SUM(count_col) FROM dbo.test_table WITH (NOLOCK) WHERE search_col = 'THIS_ONE';

Output from SCRIPT 1. Note that the two SELECTs, before and after the inserts, give the same, correct total.

----------------------- -----------
2014-10-12 23:51:34.210 1000
---------------------- -----------
2014-10-12 23:51:53.490 1000

Whilst SCRIPT 1 is running, run SCRIPT 2 in another window against the same database. It simply repeats the same SELECT with (NOLOCK) over and over again. The WHERE clause doesn't change, and the correct result should never change… but because of the page splits we see it change. A lot. When a page splits, its rows are copied to the newly allocated page before being cleaned up on the old page, so the READ UNCOMMITTED scan performed under NOLOCK can see that "data duplication" in the newly split page and count the same rows twice.
SCRIPT2:

SET NOCOUNT ON;
DECLARE @LOOP INT;
SET @LOOP = 0;
WHILE (@LOOP < 10000)
BEGIN
 SET @LOOP = @LOOP + 1;
 SELECT GETDATE(), SUM(count_col) FROM dbo.test_table WITH (NOLOCK) WHERE search_col = 'THIS_ONE';
END;

Output from SCRIPT 2 (trimmed):

2014-10-12 23:51:35.473 1000
.
2014-10-12 23:51:35.530 1000
2014-10-12 23:51:35.530 1000
2014-10-12 23:51:35.533 1005
2014-10-12 23:51:35.533 1000
2014-10-12 23:51:35.537 1000
2014-10-12 23:51:35.537 1000
2014-10-12 23:51:35.540 1003
2014-10-12 23:51:35.540 1000
2014-10-12 23:51:35.543 1001
2014-10-12 23:51:35.543 1000
2014-10-12 23:51:35.547 1000
2014-10-12 23:51:35.550 1000
2014-10-12 23:51:35.550 1000
2014-10-12 23:51:35.553 1000
2014-10-12 23:51:35.557 1006
2014-10-12 23:51:35.557 1003
2014-10-12 23:51:35.560 1000
2014-10-12 23:51:35.560 1000
.
2014-10-12 23:51:53.383 1000
2014-10-12 23:51:53.400 1000
2014-10-12 23:51:53.417 1004
2014-10-12 23:51:53.433 1001
2014-10-12 23:51:53.450 1000
2014-10-12 23:51:53.467 1002
2014-10-12 23:51:53.483 1000
2014-10-12 23:51:53.507 1000
Query was cancelled by user.

 
So, using the NOLOCK hint can return incorrect results, even when the data you are selecting is unchanged, unchanging, and NOT subject to rollback.
Locking is there for a reason. ACID transactions exist for a reason.
If you care about your data, you should access it correctly and treat it well; otherwise you have to ask whether the code you are writing really has value. If it doesn't, why are you storing the data in an expensive relational database when you could use a freeware database engine, or just pipe it straight to /dev/null? That's really quick.

One solution to this problem is to change SQL Server's locking behaviour and enable Read Committed Snapshot Isolation** (RCSI). This allows readers to access the data without blocking writers or being blocked by them. It works similarly to Oracle's Multi-Version Concurrency Control and (sweeping generalisation alert!) allows SQL Server to scale better.
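Enabling RCSI is a single database-level change, sketched below with a placeholder database name. Be aware that the switch needs exclusive access to the database, and that row versioning adds some tempdb overhead, so test before rolling it out.

```sql
-- Enable Read Committed Snapshot Isolation for the database.
-- WITH ROLLBACK IMMEDIATE forcibly terminates other sessions' transactions,
-- which the ALTER would otherwise wait on; use with care on a busy system.
ALTER DATABASE YourDatabase
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

-- confirm it took effect
SELECT name, is_read_committed_snapshot_on
FROM   sys.databases
WHERE  name = 'YourDatabase';
```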

**NOLOCK still "works" the same way under RCSI, so the hint needs to be removed from your code for your queries to benefit.
