By Rob Garrison Published: 6/14/2006 Reader Level: Intermediate Rated: 4.00 by 1 member(s). |
Introduction
The benefits and drawbacks of GUIDs as primary keys are generally known. Despite the drawbacks, some people use them to fulfill specific business requirements (the IDs are unique across multiple servers). For a concise explanation of the drawbacks, see the second paragraph of this blog entry by Marcus Mac Innes.
Thankfully, SQL Server 2005 introduces a new way to create GUIDs that generates them in sequence. This gives you the benefit of sequential IDs (like using an IDENTITY) with the uniqueness benefits of GUIDs.
There are some very important things to understand about this feature. From SQL Server 2005 Books Online:
"Creates a GUID that is greater than any GUID previously generated by this function on a specified computer."
"Important: If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID, and therefore access data associated with that GUID."
"The GUIDs generated by NEWSEQUENTIALID() are unique only within a particular computer if the computer does not have a network card."
"You can use NEWSEQUENTIALID() to generate GUIDs to reduce page contention at the leaf level of indexes."
Read that first sentence carefully, because some have interpreted it incorrectly. It means that if you don't have a network card, you could create GUIDs that match GUIDs created on another machine.
Let's see how this works by looking at GUIDs created one after the other on two different machines. This code will get us started:
CREATE TABLE t1 (EntryID uniqueidentifier DEFAULT NewID()) INSERT INTO t1 DEFAULT VALUES INSERT INTO t2 DEFAULT VALUES SELECT EntryID, CONVERT(binary(16), EntryID) AS ConvertBinary FROM t1 |
|
|
|
EntryID ConvertBinary |
|
|
|
EntryID ConvertBinary |
Based on the documentation, there would never be a collision between the GUIDs created on different machines.
As expected, multiple tables using NewSequentialID() will create sequential IDs. This is clear from a simple example.
CREATE TABLE dbo.t1 (col1 uniqueidentifier DEFAULT NewSequentialID()) -- Three inserts into t1 -- Alternating inserts into t2 and t3 SELECT CONVERT(binary(16), Col1) AS 't1Col1' FROM dbo.t1 |
t2Col1 t3Col1 |
I built a test database to compare NewID() GUIDs, NewSequentialID() GUIDs, and integer IDENTITY() as clustered primary (surrogate) keys. The basic setup is a credit card database. There are three tables: Customer, Card, and Purchase. Each table has an ID column, and each child table has a foreign key to the parent's primary key.
The script creates 100,000 customers, two cards per customer, and 100 purchases per card. (Note that the attached script has the 100,000 number commented out. Generating the full data set took almost 43 hours.)
|
Using a sequential GUID here saves significant space versus a standard NewID() GUID. The bigint is still much better than either of the GUID alternatives.
When inserting into an identity column, SCOPE_IDENTITY() will return the identity column value from the just-added record.
For SQL Server 2005, when inserting into a column with NewID() or NewSequentialID() as the primary (surrogate) key, you could retrieve the ID based on the natural key. But the exciting development in 2005 is the OUTPUT clause. Try this simple test:
CREATE TABLE t1 (EntryID uniqueidentifier DEFAULT NewID()) CREATE TABLE t2 (EntryID uniqueidentifier DEFAULT NewSequentialID()) INSERT INTO t1 OUTPUT INSERTED.* DEFAULT VALUES |
|
The OUTPUT clause gives you back the GUID that was just created. This is a clear advantage over doing a SELECT to look up the GUID based on data in the record.
I always want to understand how changes in design and implementation affect performance. (All tests were run on a single-processor desktop with a 2.66 GHz P4, 2 GB RAM, and a single-spindle 80 GB IDE drive running Windows 2003 Server and SQL Server 2005 SP1.)
In this simple test, the execution plan shows that the insert into Customer1 (NewSequentialID) takes 62% of the overall batch time (INSERT: 56%, SELECT: 6%), while the insert into Customer3 (bigint IDENTITY) takes only 38%. If you leave out the post-insert SELECT statements, the numbers are 60%/40%, so the OUTPUT statement is causing a hit by itself.
DECLARE @outputTblCustomer1 TABLE (CustomerID uniqueidentifier) -- Customer1 SELECT CustomerID FROM @outputTblCustomer1 -- Customer3 SELECT SCOPE_IDENTITY() AS CustomerID |
If you have avoided using GUIDs because of page splits and other issues, reconsider this option as you move to SQL Server 2005.
Because of the performance disadvantage of the ID retrieval, the larger amount of space required (16 bytes versus 8 bytes), and the inherent difficulty in using GUIDs, I believe bigint IDENTITY() still wins over NewSequentialID(). If you have some compelling reason to use GUIDs, then I suggest using NewSequentialID() rather than NewID() ... unless you cannot accept the predictability of NewSequentialID(), then use NewID().
It always comes down to making a decision based on your unique business requirements.