Troubleshooting Database System Anomalies with DMVs (repost)

I. Diagnosing anomalies from database connections:

1. First, let's look at all current requests in the database system:

select s.session_id, s.status, db_name(r.database_id) as database_name, s.login_name, s.login_time,
       s.host_name, c.client_net_address, c.client_tcp_port, s.program_name, r.cpu_time, r.reads, r.writes,
       c.num_reads, c.num_writes, s.client_interface_name, s.last_request_start_time, s.last_request_end_time,
       c.connect_time, c.net_transport, c.net_packet_size, r.start_time, r.status, r.command,
       r.blocking_session_id, r.wait_type, r.wait_time, r.last_wait_type, r.wait_resource,
       r.open_transaction_count, r.percent_complete, r.granted_query_memory
from sys.dm_exec_requests r with(nolock)
right outer join sys.dm_exec_sessions s with(nolock) on r.session_id = s.session_id
right outer join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id
where s.session_id > 50
order by s.session_id


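This query lists every current request in the database. The more important columns include status, login_name, host_name, client_net_address, and program_name. There is too much detail here to spot an anomaly at a glance, but the number circled in red in Figure 1 (441) gives a first indication of whether the connection count exceeds the usual baseline. (System anomalies are often caused by too many connections, and the excess connections are in turn caused by something else.)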
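
The baseline check described above can also be done by counting directly rather than reading the number off the result grid. The following is a minimal sketch, not the original author's query: it keeps the article's `session_id > 50` convention for skipping system sessions and adds a breakdown by session status. Because it does not join sys.dm_exec_requests, the total may differ slightly from the row count of the full query.

```sql
-- Sketch: total user connections plus a breakdown by session status.
-- session_id > 50 follows the article's convention for excluding system sessions.
SELECT COUNT(*) AS total_connections,
       SUM(CASE WHEN s.status = 'running'  THEN 1 ELSE 0 END) AS running_sessions,
       SUM(CASE WHEN s.status = 'sleeping' THEN 1 ELSE 0 END) AS sleeping_sessions
FROM sys.dm_exec_connections AS c
INNER JOIN sys.dm_exec_sessions AS s
        ON s.session_id = c.session_id
WHERE s.session_id > 50;
```

Comparing total_connections with the figure seen on a normal day gives the same first impression as the circled number in Figure 1.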

2. Which login has the most connections:

select login_name, count(0) as user_count
from sys.dm_exec_requests r with(nolock)
right outer join sys.dm_exec_sessions s with(nolock) on r.session_id = s.session_id
right outer join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id
where s.session_id > 50
group by login_name
order by 2 desc
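The result makes the number of connections per login easy to read. If different features of the application use different database accounts, this gives a first indication of which feature may be misbehaving.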
3. Which machine opens the most connections:

select s.host_name, c.client_net_address, count(0) as host_count
from sys.dm_exec_requests r with(nolock)
right outer join sys.dm_exec_sessions s with(nolock) on r.session_id = s.session_id
right outer join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id
where s.session_id > 50
group by host_name, client_net_address
order by 3 desc

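This query quickly shows which machines are connecting to the database and whether their connection counts look abnormal. It is very useful when investigating certain kinds of problems. I once ran into the following case:

Users reported that every week or two the system would fail. During each incident the number of database connections was very high and a large share of requests were rejected by the database; after about half an hour the system recovered on its own. Inside the database, however, we found no abnormal processes and no error messages. The problem was hard to pin down, management was unhappy about the instability, and the DBA, under pressure, did not know where to start.

We then changed direction and asked why so many connections were created when the problem occurred: which machines they came from, whether those connections were normal, whether the database simply could not bear the business load, or whether business traffic spiked during certain periods. In the end we found that faulty code written by a developer caused a memory leak on one Web server, which kept a large number of connections from being released. During an incident that server opened 3 to 4 times as many database connections as usual, which eventually affected the database. The problem had nothing to do with the database itself. (It shows that the DBA really is cannon fodder: even when the problem is not theirs, they still have to track down the cause under pressure.) Had we been able to use this query during a similar incident to learn early that the problem lay with one Web machine, we would not have needed to spend so much effort investigating the database.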
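
Judging that a host is at "3 to 4 times" its usual connection count requires something to compare against. One possible approach, sketched here under assumptions, is to snapshot the per-host counts at intervals and compare the peak with the average. The table `#host_conn_history` and its columns are hypothetical names introduced for illustration; a real monitoring job would more likely write to a permanent table, since a temporary table disappears when the session ends.

```sql
-- Hypothetical history table; the name and columns are illustrative only.
IF OBJECT_ID('tempdb..#host_conn_history') IS NULL
    CREATE TABLE #host_conn_history (
        captured_at        datetime      NOT NULL,
        host_name          nvarchar(128) NULL,
        client_net_address varchar(48)   NULL,
        connection_count   int           NOT NULL
    );

-- Take one snapshot of the current connection count per client machine.
INSERT INTO #host_conn_history (captured_at, host_name, client_net_address, connection_count)
SELECT GETDATE(), s.host_name, c.client_net_address, COUNT(*)
FROM sys.dm_exec_connections AS c
INNER JOIN sys.dm_exec_sessions AS s
        ON s.session_id = c.session_id
WHERE s.session_id > 50
GROUP BY s.host_name, c.client_net_address;

-- After several snapshots, compare each host's peak with its average.
SELECT h.host_name, h.client_net_address,
       MAX(h.connection_count) AS peak_connections,
       AVG(h.connection_count) AS avg_connections
FROM #host_conn_history AS h
GROUP BY h.host_name, h.client_net_address
ORDER BY peak_connections DESC;
```

Running the INSERT on a schedule and the final SELECT during an incident would show whether one machine stands far above its own history.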

4. Which database these connections are accessing:

-- database_name is NULL for sessions that have no active request, because r is outer-joined
select db_name(r.database_id) as database_name, count(0) as connection_count
from sys.dm_exec_requests r with(nolock)
right outer join sys.dm_exec_sessions s with(nolock) on r.session_id = s.session_id
right outer join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id
where s.session_id > 50
group by r.database_id
order by 2 desc
5. Session status:

select s.status, count(0) as status_count
from sys.dm_exec_requests r with(nolock)
right outer join sys.dm_exec_sessions s with(nolock) on r.session_id = s.session_id
right outer join sys.dm_exec_connections c with(nolock) on s.session_id = c.session_id
where s.session_id > 50
group by s.status
order by 2 desc

Result (a large number of running sessions indicates that the database is under heavy load):
II. Diagnosing anomalies from blocking

1. Current blocking (waiters and their blockers):

select t1.resource_type as [lock type],
       db_name(resource_database_id) as [database],
       t1.resource_associated_entity_id as [blk object],
       t1.request_mode as [lock req],
       t1.request_session_id as [waiter sid],
       t2.wait_duration_ms as [wait time],
       (select text from sys.dm_exec_requests as r with(nolock)
        cross apply sys.dm_exec_sql_text(r.sql_handle)
        where r.session_id = t1.request_session_id) as waiter_batch,
       -- offsets are 0-based byte positions; SUBSTRING is 1-based, hence /2 + 1
       (select substring(qt.text, r.statement_start_offset/2 + 1,
               (case when r.statement_end_offset = -1
                     then len(convert(nvarchar(max), qt.text)) * 2
                     else r.statement_end_offset end - r.statement_start_offset)/2 + 1)
        from sys.dm_exec_requests as r with(nolock)
        cross apply sys.dm_exec_sql_text(r.sql_handle) as qt
        where r.session_id = t1.request_session_id) as waiter_stmt,
       t2.blocking_session_id as [blocker sid],
       (select text from sys.sysprocesses as p with(nolock)
        cross apply sys.dm_exec_sql_text(p.sql_handle)
        where p.spid = t2.blocking_session_id) as blocker_stmt,
       getdate() as [time]
from sys.dm_tran_locks as t1 with(nolock)
inner join sys.dm_os_waiting_tasks as t2 with(nolock)
        on t1.lock_owner_address = t2.resource_address

2. Sessions that are blocking other sessions (the source of the blocking):

select t2.blocking_session_id, count(0) as counts
from sys.dm_tran_locks as t1 with(nolock)
inner join sys.dm_os_waiting_tasks as t2 with(nolock)
        on t1.lock_owner_address = t2.resource_address
group by blocking_session_id
order by 2
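
When blocking forms a chain (A blocks B, B blocks C), the query above counts B as a blocker even though A is the real source. The following sketch, which is an addition and not part of the original article, narrows the list to head blockers: sessions that block others but are not themselves waiting on another session. Unlike the query above, it reads only sys.dm_os_waiting_tasks and therefore is not limited to lock waits.

```sql
-- Sketch: head blockers, i.e. blocking sessions that are not blocked by anyone else.
SELECT w.blocking_session_id AS head_blocker_sid,
       COUNT(*)              AS directly_blocked_tasks
FROM sys.dm_os_waiting_tasks AS w
WHERE w.blocking_session_id IS NOT NULL
  AND w.blocking_session_id <> w.session_id      -- ignore waits on the session's own parallel tasks
  AND NOT EXISTS (
        SELECT 1
        FROM sys.dm_os_waiting_tasks AS w2
        WHERE w2.session_id = w.blocking_session_id
          AND w2.blocking_session_id IS NOT NULL
          AND w2.blocking_session_id <> w2.session_id
      )
GROUP BY w.blocking_session_id
ORDER BY directly_blocked_tasks DESC;
```

The count covers only tasks blocked directly by the head blocker; sessions further down the chain are not included.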

3. Sessions that have been blocked the longest:

select top 10 t1.resource_type as [lock type],
       db_name(resource_database_id) as [database],
       t1.resource_associated_entity_id as [blk object],
       t1.request_mode as [lock req],
       t1.request_session_id as [waiter sid],
       t2.wait_duration_ms as [wait time],
       (select text from sys.dm_exec_requests as r with(nolock)
        cross apply sys.dm_exec_sql_text(r.sql_handle)
        where r.session_id = t1.request_session_id) as waiter_batch,
       -- offsets are 0-based byte positions; SUBSTRING is 1-based, hence /2 + 1
       (select substring(qt.text, r.statement_start_offset/2 + 1,
               (case when r.statement_end_offset = -1
                     then len(convert(nvarchar(max), qt.text)) * 2
                     else r.statement_end_offset end - r.statement_start_offset)/2 + 1)
        from sys.dm_exec_requests as r with(nolock)
        cross apply sys.dm_exec_sql_text(r.sql_handle) as qt
        where r.session_id = t1.request_session_id) as waiter_stmt,
       t2.blocking_session_id as [blocker sid],
       (select text from sys.sysprocesses as p with(nolock)
        cross apply sys.dm_exec_sql_text(p.sql_handle)
        where p.spid = t2.blocking_session_id) as blocker_stmt,
       getdate() as [time]
from sys.dm_tran_locks as t1 with(nolock)
inner join sys.dm_os_waiting_tasks as t2 with(nolock)
        on t1.lock_owner_address = t2.resource_address
order by t2.wait_duration_ms desc


 
