Time for the first post in the Inside the Storage Engine series. I'm going to focus on SQL Server 2005 in this series and I'll point out major differences between 2005 and previous versions. Please drop me a line if there's something you'd like to see explained and demo'd.
Before jumping into how things work, I'd like to go over two commands I'll be using a lot - DBCC PAGE and DBCC IND. These are both undocumented and unsupported commands, but are very safe to use as they're used extensively inside and outside Microsoft when troubleshooting. Nevertheless, use at your own risk. They're quite well known in the SQL community and I and others have publicized them before (I even demo'd them last year at ITForum in Spain).
To illustrate their use, I'm going to use a simple script I wrote to prove that page splits never roll back. I was having a discussion with someone a while ago about this question and the answer is always no. A page split occurs when an insert or update has to happen at a certain point in an index page, and there's no room on the page to accomomodate the new or updated record. Page splits are done internally as separate 'system' transactions. Once a system transaction commits, it cannot be rolled back - even if the user transaction it was part of rolls back.
So, let's run through the script. First thing to do is create a database containing a table with an index (as page splits only happen in indexes).
USE MASTER;
GO
IF DATABASEPROPERTY (N'pagesplittest', 'Version') > 0 DROP DATABASE pagesplittest;
GO
CREATE DATABASE pagesplittest;
GO
USE pagesplittest;
GO
CREATE TABLE t1 (c1 INT, c2 VARCHAR (1000));
CREATE CLUSTERED INDEX t1c1 ON t1 (c1);
GO
Now I'm going to fill up a page in the index, but leave a gap in the c1 values so that I can force a page split by inserting the missing key value.
INSERT INTO t1 VALUES (1, REPLICATE ('a', 900));
INSERT INTO t1 VALUES (2, REPLICATE ('b', 900));
INSERT INTO t1 VALUES (3, REPLICATE ('c', 900));
INSERT INTO t1 VALUES (4, REPLICATE ('d', 900));
-- leave a gap at 5
INSERT INTO t1 VALUES (6, REPLICATE ('f', 900));
INSERT INTO t1 VALUES (7, REPLICATE ('g', 900));
INSERT INTO t1 VALUES (8, REPLICATE ('h', 900));
INSERT INTO t1 VALUES (9, REPLICATE ('i', 900));
GO
I can find out what the first index page is using the DBCC IND command:
DBCC IND ('pagesplittest', 't1', 1);
GO
This command list all the pages that are allocated to an index. Here's the output in this case:
The columns mean:
So you can see we've got a single page clustered index with an IAM page. Note that the page IDs returned may differ on your server. Let's look at the data page:
DBCC TRACEON (3604);
GO
DBCC PAGE (pagesplittest, 1, 143, 3);
GO
The traceflag is to make the output of DBCC PAGE go to the console, rather than to the error log. The syntax for DBCC PAGE is:
dbcc page ( {'dbname' | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])
The filenum and pagenum parameters are taken from the page IDs that come from various system tables and appear in DBCC or other system error messages. A page ID of, say, (1:143) has filenum = 1 and pagenum = 143.
The printopt parameter has the following meanings:
The per-row interpretation works for all page types, including the allocation bitmaps. In our case, we asked for a detailed output. I'll explain the various parts of the output in a post about the anatomy of a page. Here's the output from DBCC PAGE, with a bunch of the repeated per-row info removed for brevity:
PAGE: (1:143)
BUFFER:
BUF @0x02C49720bpage = 0x05400000 bhash = 0x00000000 bpageno = (1:143)
bdbid = 8 breferences = 0 bUse1 = 22163
bstat = 0xc0010b blog = 0x32159bb bnext = 0x00000000PAGE HEADER:
Page @0x05400000m_pageId = (1:143) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x4 m_level = 0 m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 68 m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594042384384
Metadata: PartitionId = 72057594038386688 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421 m_prevPage = (0:0) m_nextPage = (0:0)
pminlen = 8 m_slotCnt = 8 m_freeCnt = 744
m_freeData = 7432 m_reservedCnt = 0 m_lsn = (18:113:2)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = 0Allocation Status
GAM (1:2) = ALLOCATED SGAM (1:3) = ALLOCATED
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGEDSlot 0 Offset 0x60 Length 917
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Memory Dump @0x5C24C060
00000000: 30000800 01000000 0300f802 00110095 †0...............
00000010: 03616161 61616161 61616161 61616161 †.aaaaaaaaaaaaaaa<snip> I've removed this section to save space
00000380: 61616161 61616161 61616161 61616161 †aaaaaaaaaaaaaaaa
00000390: 61616161 61††††††††††††††††††††††††††aaaaa
UNIQUIFIER = [NULL]Slot 0 Column 1 Offset 0x4 Length 4
c1 = 1
Slot 0 Column 2 Offset 0x11 Length 900
c2 = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aSlot 1 Offset 0x3f5 Length 917
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Memory Dump @0x5C24C3F5
<snip> And again...
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
As you can see from the output, each row is 917 bytes long and there's only 744 bytes free (look at the m_freecnt value in the PAGE HEADER section). This means that we can't insert another row on that page of the same length - there just isn't space - but that's what we're going to do! Remember that this page currently has nine rows on it. Let's force a page split:
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (5, REPLICATE ('a', 900));
GO
Now we know there wasn't enough room so the page must have split. Let's check DBCC IND again to see if another page was allocated to the index - here's the output:
Two pages have been added - an index page and another data page. Before we added the extra row and caused the page split, the index only needed one page. Now that there are two data pages, there needs to be an index page to allow searches through the index b-tree. Let's take a look at the two data pages to see which rows are stored on which page. Doing DBCC PAGE on them shows that page (1:143) has 5 rows, with c1 values 1 through 5, and page (1:154) has the 4 rows with c1 values 6 through 9 (I'm not going to post all the DBCC PAGE output - that would make the post way too long and it gives you an incentive to try the commands out). This is what we'd expect, as the page split occurs at the insertion point, and the row being inserted is put onto the page that split.
Now let's rollback the user transaction and see what happens:
ROLLBACK TRAN;
GO
Running the DBCC PAGE commands again shows that the index structure remains the same as after the split. Page (1:154) has the 4 rows on it from the split and page (1:143) has the other rows but not the one we inserted in the explicit transaction.
So, proof that a page split is never rolled back. I'll be making much more use of these two DBCC commands in future posts and I'll do the page anatomy one later this week. Let me know if there's anything in particular you'd like to see described in this series.