接上篇:
SQL Server性能调教系列(4)--Profiler(上)
3.分析跟踪记录
在跟踪了一段时间之后,在文件中就会保存有跟踪的数据(包括IO,Duration,CPU,Reads,Writes,RowCounts等计数器),接下来就是把跟踪的数据加载到表并分析这些数据。可以选择在Profile中打开并检查这些跟踪数据,会有些限制,如不能完成太多的操作,大量重复的SQL语句,没有汇总。
3.1 加载数据到表(使用函数fn_trace_gettable返回表格形式的数据,作为范例只选择分析T-SQL代码和Duration查询的运行时间)
select CAST(textdata as nvarchar(max)) as tsql_code,duration into Workload from sys.fn_trace_gettable('C:\test\performancetrace_20100802.trc',NULL) as TT
3.2 汇总相同的SQL项
select tsql_code,SUM(duration) as total_duration from workload group by tsql_code
(由于我是在Production上面做的trace,考虑到系统的安全性,在此不便透露分析的SQL代码,实在很抱歉,各位朋友如有兴趣可在自己的测试环境中测试,讨论测试的结果)
问题:分组聚合后会看到逻辑上相同(参数不同)的查询会被分到不同的组,因为在筛选器中使用了不同的值。因为这些相同逻辑的SQL会使用相同的执行计划,应该聚合在一起才能准备的分析总的查询运行的时间。
3.3 问题处理方案一(大致分段截取)
通常情况下SQL语句都是Select+栏位,左边有很大一部分是相同的,根据SQL字符的长度,截取前一段来聚合。如取前50,100,150. 方法简单,容易操作,会聚合一部分数据,但是长度不太好取值,只能调整前缀的长度去测试。
select left(tsql_code,50) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,50) --or select left(tsql_code,100) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,100) --or select left(tsql_code,150) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,150)
3.4 问题处理方案二(复杂,精确,逻辑上相同的SQL,参数用通配符替代),这个方法是T-SQL查询技术内幕中介绍的方法,如果需要更加详细的说明,请阅读这本书,你会得到更多的启发。
(1) 模式化查询,它对于相同模式的查询是一样的。
建立函数:
CREATE FUNCTION [dbo].[fn_SQLSigTSQL] (@p1 NTEXT, @parselength INT = 4000) RETURNS NVARCHAR(4000) -- This function will replace the parameters with '#' -- This function is provided "AS IS" with no warranties, -- and confers no rights. -- Use of included script samples are subject to the terms specified at -- http://www.microsoft.com/info/cpyright.htm -- -- Strips query strings AS BEGIN DECLARE @pos AS INT; DECLARE @mode AS CHAR(10); DECLARE @maxlength AS INT; DECLARE @p2 AS NCHAR(4000); DECLARE @currchar AS CHAR(1), @nextchar AS CHAR(1); DECLARE @p2len AS INT; SET @maxlength = LEN(RTRIM(SUBSTRING(@p1,1,4000))); SET @maxlength = CASE WHEN @maxlength > @parselength THEN @parselength ELSE @maxlength END; SET @pos = 1; SET @p2 = ''; SET @p2len = 0; SET @currchar = ''; set @nextchar = ''; SET @mode = 'command'; WHILE (@pos <= @maxlength) BEGIN SET @currchar = SUBSTRING(@p1,@pos,1); SET @nextchar = SUBSTRING(@p1,@pos+1,1); IF @mode = 'command' BEGIN SET @p2 = LEFT(@p2,@p2len) + @currchar; SET @p2len = @p2len + 1 ; IF @currchar IN (',','(',' ','=','<','>','!') AND @nextchar BETWEEN '0' AND '9' BEGIN SET @mode = 'number'; SET @p2 = LEFT(@p2,@p2len) + '#'; SET @p2len = @p2len + 1; END IF @currchar = '''' BEGIN SET @mode = 'literal'; SET @p2 = LEFT(@p2,@p2len) + '#'''; SET @p2len = @p2len + 2; END END ELSE IF @mode = 'number' AND @nextchar IN (',',')',' ','=','<','>','!') SET @mode= 'command'; ELSE IF @mode = 'literal' AND @currchar = '''' SET @mode= 'command'; SET @pos = @pos + 1; END RETURN @p2; END
该函数参数为一个查询字符串和要分析的代码的长度,但会输入查询的签名,并用井号(#)替换所有的参数。测试结果如下:
select dbo.fn_SQLSigTSQL('select * from Sales.SalesOrderHeader where SalesOrderID=''43659'' and Status=''5'' ',500)
CLR在处理迭代/过程逻辑和字符串处理时比T-SQL效率高,下面介绍用CLR实现模式化查询。
a. 建立C#版的Classs Libary,函数如下:
using System; using System.Collections.Generic; using System.Linq; using System.Text; using Microsoft.SqlServer.Server; using System.Data.SqlTypes; using System.Text.RegularExpressions;
public partial class SQLSignature { // fn_SQLSigCLR [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)] public static SqlString fn_SQLSigCLR(SqlString querystring) { return (SqlString)Regex.Replace( querystring.Value, @"([\s,(=<>!](?![^\]]+[\]]))(?:(?:(?:(?# expression coming )(?:([N])?(')(?:[^']|'')*('))(?# character )|(?:0x[\da-fA-F]*)(?# binary )|(?:[-+]?(?:(?:[\d]*\.[\d]*|[\d]+)(?# precise number )(?:[eE]?[\d]*)))(?# imprecise number )|(?:[~]?[-+]?(?:[\d]+))(?# integer ))(?:[\s]?[\+\-\*\/\%\&\|\^][\s]?)?)+(?# operators ))", @"$1$2$3#$4"); } // fn_RegexReplace - for generic use of RegEx-based replace [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)] public static SqlString fn_RegexReplace( SqlString input, SqlString pattern, SqlString replacement) { return (SqlString)Regex.Replace( input.Value, pattern.Value, replacement.Value); } }
b. 加载.dll中间语言代码到DB
USE master; CREATE ASSEMBLY SQLSignature FROM 'C:\SQLSignature\SQLSignature\bin\Debug\SQLSignature.dll';
c. 注册函数fn_SQLSigCLR和fn_RegexReplace
CREATE FUNCTION dbo.fn_SQLSigCLR(@querystring AS NVARCHAR(MAX)) RETURNS NVARCHAR(MAX) WITH RETURNS NULL ON NULL INPUT EXTERNAL NAME SQLSignature.SQLSignature.fn_SQLSigCLR; GO CREATE FUNCTION dbo.fn_RegexReplace( @input AS NVARCHAR(MAX), @pattern AS NVARCHAR(MAX), @replacement AS NVARCHAR(MAX)) RETURNS NVARCHAR(MAX) WITH RETURNS NULL ON NULL INPUT EXTERNAL NAME SQLSignature.SQLSignature.fn_RegexReplace; GO
d. 注册完成之后,用下面代码测试:
SELECT dbo.fn_SQLSigCLR(tsql_code) AS sig_sql, duration FROM dbo.Workload;
结果的SQL全被模式化,井号(#)替代所有的参数。
(2) 以用上面建立的函数,模式化追踪的T-SQL语句,并分类汇总。
a. 以用查询签名,为每个字符串生成整数的校验和(CheckSum),方便以后的汇总计算,提高效率:
ALTER TABLE dbo.Workload ADD cs INT NOT NULL DEFAULT (0); GO UPDATE dbo.Workload SET cs = CHECKSUM(dbo.fn_SQLSigCLR(tsql_code)); CREATE CLUSTERED INDEX idx_cl_cs ON dbo.Workload(cs);
b. 用每个签名的检验和计算运行时间填充临时表#AggQueries,包括运行时间的百分比,以及运行时间降序的行号。
IF OBJECT_ID('tempdb..#AggQueries') IS NOT NULL DROP TABLE #AggQueries; GO SELECT cs, SUM(duration) AS total_duration, 100. * SUM(duration) / SUM(SUM(duration)) OVER() AS pct, ROW_NUMBER() OVER(ORDER BY SUM(duration) DESC) AS rn INTO #AggQueries FROM dbo.Workload GROUP BY cs; CREATE CLUSTERED INDEX idx_cl_cs ON #AggQueries(cs);
查询聚合之后临时表的内容,数据量会大大的减少,包含签名,总的运行时间,运行时间占总运行时间的半分比,排序序号。
c.筛选并匹配,使用APPLY运算符得到查询模式和一个示例查询。
WITH RunningTotals AS ( SELECT AQ1.cs, CAST(AQ1.total_duration / 1000. AS DECIMAL(12, 2)) AS total_s, CAST(SUM(AQ2.total_duration) / 1000. AS DECIMAL(12, 2)) AS running_total_s, CAST(AQ1.pct AS DECIMAL(12, 2)) AS pct, CAST(SUM(AQ2.pct) AS DECIMAL(12, 2)) AS run_pct, AQ1.rn FROM #AggQueries AS AQ1 JOIN #AggQueries AS AQ2 ON AQ2.rn <= AQ1.rn GROUP BY AQ1.cs, AQ1.total_duration, AQ1.pct, AQ1.rn HAVING SUM(AQ2.pct) - AQ1.pct <= 90 -- percentage threshold ) SELECT RT.rn, RT.pct, S.sig, S.tsql_code AS sample_query FROM RunningTotals AS RT CROSS APPLY (SELECT TOP(1) tsql_code, dbo.fn_SQLSigCLR(tsql_code) AS sig FROM dbo.Workload AS W WHERE W.cs = RT.cs) AS S ORDER BY RT.rn;
4. 有了查询模式,示例查询,和占用时间的百分比例和排序。然后就可以着手优化。也可以通过类似的方式,找到造成大量结果集,大多数的I/O问题的查询模式。
四:总结
Perfiler是一个很好用的工具来追踪系统的性能和工作的负荷,从而准确的找到值得优化的SQL,提高效率,大大减少工作量。
附件下载:Server性能计数器.rar
参考:Microsoft SQL Server 2005技术内幕:T-SQL查询