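I had a table to partition today, and while writing the code I decided to write up a summary.
I won't go into why to partition, when to partition, or the theory behind it; see MSDN for that. Straight to the code.
This is the code I used today to partition a table in a test database. The environment is SQL Server 2008 R2. Note that in 2008, table partitioning can also be done through a graphical interface.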
USE MASTER
GO
-- Split the 400,000 rows across 5 filegroups: PRIMARY plus the four below
-- Filegroup naming: FG_<database>_<table>_<column>_<sequence number>
ALTER DATABASE TEST ADD FILEGROUP FG_TEST_Product_ID_1;
ALTER DATABASE TEST ADD FILEGROUP FG_TEST_Product_ID_2;
ALTER DATABASE TEST ADD FILEGROUP FG_TEST_Product_ID_3;
ALTER DATABASE TEST ADD FILEGROUP FG_TEST_Product_ID_4;
GO
USE TEST
GO
-- Add one secondary data file to each filegroup
-- File naming: <filegroup name>_data_<sequence number>
ALTER DATABASE TEST ADD FILE
(
    NAME = N'FG_TEST_Product_ID_1_data_1',
    FILENAME = N'D:\Data\FG_TEST_Product_ID_1_data_1.ndf',
    SIZE = 50MB,
    FILEGROWTH = 10%
)
TO FILEGROUP FG_TEST_Product_ID_1;
ALTER DATABASE TEST ADD FILE
(
    NAME = N'FG_TEST_Product_ID_2_data_1',
    FILENAME = N'D:\Data\FG_TEST_Product_ID_2_data_1.ndf',
    SIZE = 50MB,
    FILEGROWTH = 10%
)
TO FILEGROUP FG_TEST_Product_ID_2;
ALTER DATABASE TEST ADD FILE
(
    NAME = N'FG_TEST_Product_ID_3_data_1',
    FILENAME = N'D:\Data\FG_TEST_Product_ID_3_data_1.ndf',
    SIZE = 50MB,
    FILEGROWTH = 10%
)
TO FILEGROUP FG_TEST_Product_ID_3;
ALTER DATABASE TEST ADD FILE
(
    NAME = N'FG_TEST_Product_ID_4_data_1',
    FILENAME = N'D:\Data\FG_TEST_Product_ID_4_data_1.ndf',
    SIZE = 50MB,
    FILEGROWTH = 10%
)
TO FILEGROUP FG_TEST_Product_ID_4;
GO
-- Create the partition function
-- Partition function naming: fn_Partition_<table>_<column>
-- RANGE RIGHT: each boundary value belongs to the partition on its right,
-- so four boundaries give five partitions.
CREATE PARTITION FUNCTION fn_Partition_Product_ID(INT)
AS RANGE RIGHT FOR VALUES(80000,160000,240000,320000);
GO
-- Create the partition scheme
-- Partition scheme naming: Sch_<table>_<column>
CREATE PARTITION SCHEME Sch_Product_ID
AS PARTITION fn_Partition_Product_ID
TO ([PRIMARY],[FG_TEST_Product_ID_1],[FG_TEST_Product_ID_2],[FG_TEST_Product_ID_3],[FG_TEST_Product_ID_4]);
GO
/** All the preparation is done. However, I am working on a copy of a production database,
    so besides the primary key index this table has several other indexes.
    The rest of the script is therefore about how to partition an existing table.
**/
USE [TEST]
GO
-- Roll back the whole transaction if any statement below fails
SET XACT_ABORT ON
BEGIN TRANSACTION
-- Drop the clustered primary key
ALTER TABLE [dbo].[Product] DROP CONSTRAINT [PK_Product]
-- Re-create the clustered primary key on the partition scheme
ALTER TABLE [dbo].[Product] ADD CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED
(
    [ID] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
      ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [Sch_Product_ID]([ID])
/** The other nonclustered indexes on this table are handled below: each one is rebuilt on the partition scheme.
    DROP_EXISTING = ON in the WITH clause drops the existing index of the same name and re-creates it
    (the index must already exist, otherwise the statement fails).
**/
/** The purpose is storage alignment: the different indexes distribute their data to the filegroups
    according to the same partition scheme.
    Personally, though, I think the benefit lies in aligning storage across multiple tables rather than within a single table.
    For details see: http://msdn.microsoft.com/zh-cn/library/ms345146(SQL.90).aspx
**/
CREATE NONCLUSTERED INDEX [IX_Product_CategoryID_ID] ON [dbo].[Product]
(
    [CategoryID] ASC
)
INCLUDE ( [ID])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
      DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [Sch_Product_ID]([ID])
CREATE NONCLUSTERED INDEX [IX_Product_ComID_IsAudit] ON [dbo].[Product]
(
    [ComID] ASC,
    [IsAudit] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
      DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [Sch_Product_ID]([ID])
CREATE NONCLUSTERED INDEX [IX_Product_ComID_IsShelf_ShelfDate] ON [dbo].[Product]
(
    [ComID] ASC,
    [IsShelf] ASC,
    [ShelfDate] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
      DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [Sch_Product_ID]([ID])
CREATE NONCLUSTERED INDEX [IX_Product_IA_IS_SD_ID_CID] ON [dbo].[Product]
(
    [IsAudit] ASC,
    [IsShelf] ASC,
    [ShelfDate] ASC
)
INCLUDE ( [ID], [CategoryID])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
      DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [Sch_Product_ID]([ID])
COMMIT TRANSACTION
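As a quick sanity check after the script above, something like the following sketch could be run. It is not part of the original script; it only assumes the object names used in this post (database TEST, table dbo.Product, function fn_Partition_Product_ID). The first query shows how RANGE RIGHT assigns a few sample IDs to partitions, and the second lists the data space each index on the table ended up on, so it is easy to see whether all of them sit on the same partition scheme.

```sql
USE TEST;
GO
-- Which partition does each sample ID fall into?
-- With RANGE RIGHT and boundaries 80000,160000,240000,320000:
--   79999 -> 1, 80000 -> 2, 159999 -> 2, 160000 -> 3, 320000 -> 5
SELECT v.ID,
       $PARTITION.fn_Partition_Product_ID(v.ID) AS partition_number
FROM (VALUES (1), (79999), (80000), (159999), (160000), (320000)) AS v(ID);

-- Where does each index on dbo.Product live?
-- data_space_type is PARTITION_SCHEME for a partitioned index
-- and ROWS_FILEGROUP for an index stored on a single filegroup.
SELECT i.name       AS index_name,
       i.type_desc  AS index_type,
       ds.name      AS data_space_name,
       ds.type_desc AS data_space_type
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.Product')
ORDER BY i.index_id;
```

If every row of the second result shows Sch_Product_ID as the data space, the indexes are aligned with the table.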
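That completes the partitioned table. The code above took only a small part of my time, but I also wanted to see how the data was distributed across the filegroups after partitioning,
and that turned out to take up most of the time.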
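-- This function takes a partition boundary value and returns the filegroup of the corresponding partition.
-- =============================================
-- Author: Joe.TJ
-- Create date: 20111130
-- Description: Get the filegroup for each partition of a partitioned table
-- =============================================
CREATE FUNCTION dbo.fn_GetFileForPartition
(
    @schemeName NVARCHAR(100), -- partition scheme name
    @rangeValue INT            -- partition boundary value
)
RETURNS TABLE
AS
RETURN
(
    SELECT DDS.data_space_id  AS [filegroup_id],
           DDS.destination_id AS [partition_number],
           DS.name            AS [filegroup_name]
    FROM sys.destination_data_spaces AS DDS
    JOIN sys.data_spaces AS DS ON DS.data_space_id = DDS.data_space_id
    WHERE DDS.partition_scheme_id = (SELECT data_space_id FROM sys.partition_schemes WHERE name = @schemeName)
      -- Note: the partition function name is hard-coded here, so the function only works
      -- for schemes built on fn_Partition_Product_ID.
      AND DDS.destination_id = $PARTITION.fn_Partition_Product_ID(@rangeValue - 1)
)
GO
-- This query returns the information I was after: derived table A holds the basic facts about each partition,
-- and the CROSS APPLY calls the function above to get the filegroup.
-- Caveat: the function evaluates @rangeValue - 1, so the filegroup is correct only when MaxValue - 1
-- still falls in the same partition (that is, the partition's largest ID is not its lower boundary value).
SELECT A.*, B.[filegroup_name]
FROM
(
    SELECT $PARTITION.fn_Partition_Product_ID(ID) AS Partition_Num,
           MIN(ID) AS MinValue, MAX(ID) AS MaxValue, COUNT(1) AS Row_Num
    FROM dbo.Product
    GROUP BY $PARTITION.fn_Partition_Product_ID(ID)
) AS A
CROSS APPLY dbo.fn_GetFileForPartition(N'Sch_Product_ID', A.MaxValue) AS B -- pass the partition's max value as the boundary value
-- Sort in the outer query; ORDER BY with TOP 100 PERCENT inside a derived table is not guaranteed to be honored
ORDER BY A.Partition_Num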
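For comparison, here is a rough sketch of how the same distribution might be read from the catalog views alone, without scanning dbo.Product or going through a helper function. This is an alternative I am suggesting, not the approach used above. It assumes the object names from this post and that index_id = 1 is the clustered primary key rebuilt on the partition scheme. The rows column of sys.partitions is documented as approximate, so treat the counts as estimates.

```sql
USE TEST;
GO
SELECT p.partition_number,
       fg.name  AS filegroup_name,
       p.rows   AS approx_row_count,
       lo.value AS lower_boundary_inclusive,  -- NULL for the first partition
       hi.value AS upper_boundary_exclusive   -- NULL for the last partition
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id  = p.index_id
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = ps.data_space_id
   AND dds.destination_id      = p.partition_number
JOIN sys.filegroups AS fg
    ON fg.data_space_id = dds.data_space_id
-- For a RANGE RIGHT function, boundary n is the inclusive lower bound of partition n + 1
LEFT JOIN sys.partition_range_values AS lo
    ON lo.function_id = ps.function_id
   AND lo.boundary_id = p.partition_number - 1
LEFT JOIN sys.partition_range_values AS hi
    ON hi.function_id = ps.function_id
   AND hi.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID(N'dbo.Product')
  AND i.index_id  = 1   -- clustered index only, so each partition appears once
ORDER BY p.partition_number;
```

Because this reads metadata only, it also lists partitions that currently hold no rows, which a GROUP BY over the table would not show.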