SQL Server Columnstore Index Performance Summary (6): Compression of Clustered and Nonclustered Columnstore Indexes

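Continuing from the previous article, SQL Server Columnstore Index Performance Summary (5): Columnstore Wait Information. The earlier articles concentrated on clustered columnstore indexes, so it is time to bring in nonclustered columnstore indexes.

   This article focuses on "compression" across the different kinds of columnstore index, again using the ContosoRetailDW database for the demo. It compares the compression efficiency of the two kinds of columnstore index. We probably already know the conclusion, but it is still worth walking through the process.
   Four tables with different data volumes and different columns are used for the comparison, to cover as many scenarios as possible.
   First, drop all indexes, primary keys and foreign keys: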

-- Drop the foreign key constraints
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimAccount]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimCurrency]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimDate]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimEntity]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimProductCategory]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimScenario]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimChannel]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimCurrency]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimDate]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimProduct]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimPromotion]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimStore]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimCurrency]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimDate]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimProduct]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimStore]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimChannel]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimCurrency]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimDate]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimProduct]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimScenario]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimStore]

-- Drop the primary key constraints

ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [PK_FactInventory_InventoryKey]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [PK_FactSalesQuota_SalesQuotaKey]

   Then create a clustered columnstore index (CCI below) on each of the four tables and check the space used:

Create Clustered Columnstore Index CCI on dbo.FactStrategyPlan WITH (DATA_COMPRESSION = COLUMNSTORE);

Create Clustered Columnstore Index CCI on dbo.FactSales WITH (DATA_COMPRESSION = COLUMNSTORE);

Create Clustered Columnstore Index CCI on dbo.FactInventory WITH (DATA_COMPRESSION = COLUMNSTORE);

Create Clustered Columnstore Index CCI on dbo.FactSalesQuota WITH (DATA_COMPRESSION = COLUMNSTORE);

exec sp_spaceused 'dbo.FactStrategyPlan', true;
exec sp_spaceused 'dbo.FactSales', true;
exec sp_spaceused 'dbo.FactInventory', true;
exec sp_spaceused 'dbo.FactSalesQuota', true;

   The space used by each table is:

  • FactStrategyPlan: 32256 KB
  • FactSales: 47168 KB
  • FactInventory: 87344 KB
  • FactSalesQuota: 127000 KB

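   Next, drop the CCIs, rebuild the resulting heaps to reduce fragmentation, and create nonclustered columnstore indexes (NCI below) on the heaps: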

-- Drop the CCIs:
Drop Index CCI on dbo.FactStrategyPlan;
Drop Index CCI on dbo.FactSales;
Drop Index CCI on dbo.FactInventory;
Drop Index CCI on dbo.FactSalesQuota;

-- Rebuild the tables to reduce fragmentation:
Alter table dbo.FactStrategyPlan Rebuild;
Alter table dbo.FactSales Rebuild;
Alter table dbo.FactInventory Rebuild;
Alter table dbo.FactSalesQuota Rebuild;


-- Create the nonclustered columnstore indexes:
Create NonClustered Columnstore Index NCI
	on dbo.FactStrategyPlan (StrategyPlanKey, Datekey, EntityKey, ScenarioKey, AccountKey, CurrencyKey, ProductCategoryKey, Amount, ETLLoadID, LoadDate, UpdateDate) 	WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create NonClustered Columnstore Index NCI
	on dbo.FactSales (SalesKey, DateKey, channelKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, UnitCost, UnitPrice, SalesQuantity, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, SalesAmount, ETLLoadID, LoadDate, UpdateDate) 	WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create NonClustered Columnstore Index NCI
	on dbo.FactInventory (InventoryKey, DateKey, StoreKey, ProductKey, CurrencyKey, OnHandQuantity, OnOrderQuantity, SafetyStockQuantity, UnitCost, DaysInStock, MinDayInStock, MaxDayInStock, Aging, ETLLoadID, LoadDate, UpdateDate)
	WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create NonClustered Columnstore Index NCI
	on dbo.FactSalesQuota (SalesQuotaKey, ChannelKey, StoreKey, ProductKey, DateKey, CurrencyKey, ScenarioKey, SalesQuantityQuota, SalesAmountQuota, GrossMarginQuota, ETLLoadID, LoadDate, UpdateDate)
	WITH ( DATA_COMPRESSION = COLUMNSTORE);

-- Check the space again:
exec sp_spaceused 'dbo.FactStrategyPlan', true;
exec sp_spaceused 'dbo.FactSales', true;
exec sp_spaceused 'dbo.FactInventory', true;
exec sp_spaceused 'dbo.FactSalesQuota', true;

   The space used by each table is:

  • FactStrategyPlan: 191368 KB
  • FactSales: 419248 KB
  • FactInventory: 720296 KB
  • FactSalesQuota: 635408 KB
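
To put the two sets of figures side by side, the size ratio can be worked out directly. The query below is only a sketch: it hard-codes the KB values quoted above in a VALUES list, and the column aliases (CciKB, NciOnHeapKB, SizeRatio) are names made up for this illustration rather than anything from the original test.

```sql
-- Sketch: compare the two sp_spaceused readings quoted in this article.
-- The numbers are copied from the text; the aliases are hypothetical.
SELECT t.TableName,
       t.CciKB,
       t.NciOnHeapKB,
       CAST(t.NciOnHeapKB * 1.0 / t.CciKB AS decimal(10, 2)) AS SizeRatio
FROM (VALUES
        ('FactStrategyPlan',  32256, 191368),
        ('FactSales',         47168, 419248),
        ('FactInventory',     87344, 720296),
        ('FactSalesQuota',   127000, 635408)
     ) AS t (TableName, CciKB, NciOnHeapKB)
ORDER BY SizeRatio DESC;
```

Multiplying by 1.0 forces decimal division, since dividing two integers in T-SQL would truncate the ratio.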

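   We can see that the tables with a CCI take far less space than the ones with an NCI. Keep in mind that sp_spaceused reports on the table as a whole, so the second set of figures presumably covers the heap's rowstore data plus the nonclustered columnstore index, whereas a CCI is the table's only copy of the data. I am using SQL Server 2019 on Linux; on 2014/2016 the compression may not be as effective.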
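
Because sp_spaceused returns one total per table, a per-index breakdown can help separate the heap from the columnstore index sitting on top of it. The following is a minimal sketch, assuming the four demo tables above; it uses sys.dm_db_partition_stats joined to sys.indexes, the aliases are illustrative, and its output was not part of the original tests.

```sql
-- Sketch: reserved space per index (or heap) for the four demo tables.
-- reserved_page_count is in 8 KB pages, hence the multiplication by 8.
SELECT OBJECT_NAME(ps.object_id)       AS TableName,
       i.name                          AS IndexName,   -- NULL for a heap
       i.type_desc                     AS IndexType,
       SUM(ps.reserved_page_count) * 8 AS ReservedKB
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id IN (OBJECT_ID('dbo.FactStrategyPlan'),
                       OBJECT_ID('dbo.FactSales'),
                       OBJECT_ID('dbo.FactInventory'),
                       OBJECT_ID('dbo.FactSalesQuota'))
GROUP BY ps.object_id, i.name, i.type_desc
ORDER BY TableName, IndexType;
```

With an NCI on a heap, each table should show two rows (HEAP and NONCLUSTERED COLUMNSTORE); with a CCI there is a single CLUSTERED COLUMNSTORE row.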

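A reminder: this article is mainly about space. A nonclustered index, whether columnstore or rowstore, is not only about space; its main purpose is query performance.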

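   Next comes another set of tests, which shows how the CCI and the NCI differ in their compression algorithms. A CCI also supports updates, a wider range of data types, and so on.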

-- Drop the nonclustered columnstore indexes:
Drop Index NCI on dbo.FactStrategyPlan;
Drop Index NCI on dbo.FactSales;
Drop Index NCI on dbo.FactInventory;
Drop Index NCI on dbo.FactSalesQuota;


-- Create "rowstore" clustered indexes (named UCI) with page compression
create clustered Index UCI on dbo.FactStrategyPlan (StrategyPlanKey ASC)WITH ( DATA_COMPRESSION = PAGE);

create clustered Index UCI on dbo.FactSales (SalesKey ASC)	WITH ( DATA_COMPRESSION = PAGE);

create clustered Index UCI on dbo.FactInventory (InventoryKey ASC) WITH ( DATA_COMPRESSION = PAGE);

create clustered Index UCI on dbo.FactSalesQuota (SalesQuotaKey ASC) WITH ( DATA_COMPRESSION = PAGE);


-- Create the nonclustered "columnstore" indexes
Create NonClustered Columnstore Index NCI
	on dbo.FactStrategyPlan (StrategyPlanKey, Datekey, EntityKey, ScenarioKey, AccountKey, CurrencyKey, ProductCategoryKey, Amount, ETLLoadID, LoadDate, UpdateDate) WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create NonClustered Columnstore Index NCI 
	on dbo.FactSales (SalesKey, DateKey, channelKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, UnitCost, UnitPrice, SalesQuantity, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, SalesAmount, ETLLoadID, LoadDate, UpdateDate)
	WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create NonClustered Columnstore Index NCI
	on dbo.FactInventory (InventoryKey, DateKey, StoreKey, ProductKey, CurrencyKey, OnHandQuantity, OnOrderQuantity, SafetyStockQuantity, UnitCost, DaysInStock, MinDayInStock, MaxDayInStock, Aging, ETLLoadID, LoadDate, UpdateDate)
	WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create NonClustered Columnstore Index NCI
	on dbo.FactSalesQuota (SalesQuotaKey, ChannelKey, StoreKey, ProductKey, DateKey, CurrencyKey, ScenarioKey, SalesQuantityQuota, SalesAmountQuota, GrossMarginQuota, ETLLoadID, LoadDate, UpdateDate)
	WITH ( DATA_COMPRESSION = COLUMNSTORE);


-- Check the size:
exec sp_spaceused 'dbo.FactStrategyPlan', true;
exec sp_spaceused 'dbo.FactSales', true;
exec sp_spaceused 'dbo.FactInventory', true;
exec sp_spaceused 'dbo.FactSalesQuota', true;


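-- Drop the nonclustered columnstore indexes (created above under the name NCI):
Drop Index NCI on dbo.FactStrategyPlan;
Drop Index NCI on dbo.FactSales;
Drop Index NCI on dbo.FactInventory;
Drop Index NCI on dbo.FactSalesQuota;
-- Assumption: the rowstore clustered index UCI is dropped too, since a table allows only one clustered index:
Drop Index UCI on dbo.FactStrategyPlan;
Drop Index UCI on dbo.FactSales;
Drop Index UCI on dbo.FactInventory;
Drop Index UCI on dbo.FactSalesQuota;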


-- Create the clustered "columnstore" indexes:
Create Clustered Columnstore Index CCI on dbo.FactStrategyPlan WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create Clustered Columnstore Index CCI on dbo.FactSales WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create Clustered Columnstore Index CCI on dbo.FactInventory WITH ( DATA_COMPRESSION = COLUMNSTORE);

Create Clustered Columnstore Index CCI on dbo.FactSalesQuota WITH ( DATA_COMPRESSION = COLUMNSTORE);


-- Check the space again:
exec sp_spaceused 'dbo.FactStrategyPlan', true;
exec sp_spaceused 'dbo.FactSales', true;
exec sp_spaceused 'dbo.FactInventory', true;
exec sp_spaceused 'dbo.FactSalesQuota', true;

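   The results are as follows:
[Images 4 and 5: sp_spaceused output for the two space checks above]
   This time the compression ratio is not as high. The traditional rowstore index has an effect here: both its compression algorithm and the underlying "base data" influence how well the columnstore compresses.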
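
To look at the columnstore data on its own, independent of any rowstore structure on the same table, the compressed row groups can be inspected. This is a sketch only, assuming SQL Server 2016 or later, where the sys.dm_db_column_store_row_group_physical_stats view is available; the aliases are illustrative and no output from it appears in the original article.

```sql
-- Sketch: compressed row groups per columnstore index on the demo tables.
SELECT OBJECT_NAME(rg.object_id)      AS TableName,
       i.name                         AS IndexName,
       i.type_desc                    AS IndexType,
       COUNT(*)                       AS CompressedRowGroups,
       SUM(rg.total_rows)             AS TotalRows,
       SUM(rg.size_in_bytes) / 1024   AS SizeKB
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
JOIN sys.indexes AS i
  ON i.object_id = rg.object_id
 AND i.index_id  = rg.index_id
WHERE rg.state_desc = 'COMPRESSED'
  AND rg.object_id IN (OBJECT_ID('dbo.FactStrategyPlan'),
                       OBJECT_ID('dbo.FactSales'),
                       OBJECT_ID('dbo.FactInventory'),
                       OBJECT_ID('dbo.FactSalesQuota'))
GROUP BY rg.object_id, i.name, i.type_desc
ORDER BY TableName;
```

Running it once while the NCI exists and once after the CCI is built gives the size of the columnstore structure alone in each case, which is a fairer basis for comparing the two than whole-table totals.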

Next: SQL Server Columnstore Index Performance Summary (7): Loading Data into the Columnstore Index Delta Store
