Common scenarios to avoid in OLTP

Database Design

 

Rule

Description

Value

Source

Problem Description

1

High Frequency queries having a high number of   table joins.

>4

Sys.dm_exec_sql_text

Sys.dm_exec_cached_plans

High frequency queries with lots of joins can be   too normalized for high OLTP scalability.

2

Frequently updated tables having # indexes.

>3

Sys.indexes

sys.dm_db_index_operational_stats

Excessive index maintenance for OLTP.

3

Big IOs

Table Scans

Range Scans

>1

Perfmon object

SQL Server Access Methods

Sys.dm_exec_query_stats

A missing index flushes the cache.

4

Unused Indexes.

Index not in Sys.dm_db_index_usage_stats.  If an index is NEVER used, it will not   appear in the DMV sys.dm_db_index_usage_stats

Avoid Index maintenance for unused indexes.

 

CPU

 

Rule

Description

Value

Source

Problem Description

1

Signal Waits

>25%

Sys.dm_os_wait_stats

Time in runnable queue is pure CPU wait.

2

Plan reuse

<90%

Perfmon object

SQL Server Statistics

OLTP identical transactions should ideally   have >95% plan reuse.

3

Parallelism: Cxpacket waits

>5%

Sys.dm_os_wait_stats

Parallelism reduces OLTP throughput. CXPACKET   indicates that multiple CPUs are working in parallel, dividing up the query   in smaller pieces. Ordinarily a well tuned OLTP application would not   parallelize unless an index is missing, there is an incomplete WHERE clause,   or the query is not a true OLTP   transaction.

 

Memory

 

Rule

Description

Value

Source

Problem Description

1

Page life expectancy

<300 sec

Perfmon object

SQL Server Buffer Manager

SQL Server Buffer Nodes

Page life expectancy is the average number of   seconds a data page stays in cache.    Low values could indicate a cache flush that is caused by a big read.  Pure OLTP workloads do NOT issue big reads,   thus possible missing index.

2

Page life expectancy

Drops by 50%

Perfmon object

SQL Server Buffer Manager

Page life expectancy is the average number of   seconds a data page stays in cache.    Low values could indicate a cache flush that is caused by a big read.  Pure OLTP workloads do NOT issue big reads,   thus possible missing index.

3

Memory Grants Pending

>1

Perfmon object

SQL Server Memory Manager

Current number of processes waiting for a   workspace memory grant.

4

SQL cache hit ratio

<90%

SQL cache hit ratio falls under 90% for   sustained periods of time greater than 60 sec.

It is likely that large scans have to be   performed, which in turn flushes out the buffer cache.

 

IO

 

Rule

Description

Value

Source

Problem Description

1

Average Disk sec/read

>20 ms

Perfmon object

Physical Disk

Reads should take 4-8 ms without any IO   pressure.

2

Average Disk sec/write

>20 ms

Perfmon object

Physical Disk

Writes (sequential) can be as fast as 1 ms for   transaction log.

3

Big IOs

Table Scans

Range Scans

>1

Perfmon object

SQL Server Access Methods

A missing index flushes the cache.

4

If Top 2 values for wait stats are any of the   following:

ASYNCH_IO_COMPLETION

IO_COMPLETION

LOGMGR

WRITELOG

PAGEIOLATCH_x

Top 2

Sys.dm_os_wait_stats

If top 2 wait_stats values include IO, there   is an IO bottleneck.

5

Low bytes per sec.

 

Perfmon object

Physical Disk

 

 

Blocking

 

Rule

Description

Value

Source

Problem Description

1

Block percentage

>2%

Sys.dm_db_index_operational_stats

Frequency of blocks.

2

Block process report

30 sec

Sp_configure profiler

Report of statements.

3

Average Row Lock Waits

>100ms

Sys.dm_db_index_operational_stats

Duration of blocks.

4

If Top 2 values for wait stats are any of the   following:

LCK_M_BU

LCK_M_IS

LCK_M_IU

LCK_M_IX

LCK_M_RIn_NL

LCK_M_RIn_S

LCK_M_RIn_U

LCK_M_RIn_X

LCK_M_RS_S

LCK_M_RS_U

LCK_M_RX_S

LCK_M_RX_U

LCK_M_RX_X

LCK_M_S

LCK_M_SCH_M

LCK_M_SCH_S

LCK_M_SIU

LCK_M_SIX

LCK_M_U

LCK_M_UIX

LCK_M_X

Top 2

Sys.dm_os_wait_stats

If top 2 wait_stats values include locking,   there is a blocking bottleneck.

5

High number of deadlocks

>5 per hour

Trace flag 1204 to display in the errorlog and   or the profiler deadlock graph.

If the deadlock occurs with the same   participant SQL commands or operations multiple times, it is likely that   there is a locking problem.

 

Network

 

Rule

Description

Value

Source

Problem Description

1

High network latency coupled with an   application that has many round trips to the database.

Output queue length >2

Perfmon object: Network Interface

Indicates that the latency between the   application server and the database is high.

Could be caused by significant network   infrastructure between the application and the instance of SQL Server.

2

Network bandwidth is used up.

Packets Outbound Discarded

Packets Outbound Errors

Packets Received Discarded

Packets Received Errors

Perfmon object: Network Interface

Dropped packets are detected.

 

In summary, given the high volume of identical small transactions that characterize OLTP, transactions per second and resource usage can be improved as follows: 

  1. Database designs usually keep the number of indexes to a functional minimum as every insert, update, and delete incurs index maintenance.
  2. CPU can be reduced with plan reuse and join reduction.
  3. IO performance can be reduced with good indexing, join reduction, and high page life expectancy.
  4. Memory is optimal when there are no sudden drops in Page Life Expectancy.
  5. Sorts can be limited with index usage. That is, a certain sort order is supported by an index that is sorted the same way, either ascending or descending.
  6. Blocking can be reduced with index design and short transactions.

你可能感兴趣的:(oltp)