http://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html
pt-query-digest - Analyze MySQL queries from logs, processlist, and tcpdump.
pt-query-digest [OPTIONS] [FILES] [DSN]
pt-query-digest analyzes MySQL queries from slow, general, and binary log files. It can also analyze queries from SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, queries are grouped by fingerprint and reported in descending order of query time (i.e. the slowest queries first). If no FILES are given, the tool reads STDIN. The optional DSN is used for certain options like --since and --until.
Usage examples:
Report the slowest queries from slow.log:
pt-query-digest slow.log
Report the slowest queries from the processlist on host1:
pt-query-digest --processlist h=host1
Capture MySQL protocol data with tcpdump, then report the slowest queries:
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt
Save query data from slow.log to host2 for later review and trend analysis:
pt-query-digest --review h=host2 --no-report slow.log
Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a risk to the system and the database server. Before using this tool, please:
- Read the tool's documentation
- Review the tool's known BUGS
- Test the tool on a non-production server
- Back up your production server and verify the backups
pt-query-digest is a sophisticated but easy to use tool for analyzing MySQL queries. It can analyze queries from MySQL slow, general, and binary logs, as well as SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, the tool reports which queries are the slowest, and therefore the most important to optimize. More complex and custom-tailored reports can be created by using options like --group-by, --filter, and --embedded-attributes.
Query analysis is a best practice that should be done frequently. To make this easier, pt-query-digest has two features: query review (--review) and query history (--history). When the --review option is used, all unique queries are saved to a database. When the tool is run again with --review, queries marked as reviewed in the database are not printed in the report. This highlights new queries that need to be reviewed. When the --history option is used, query metrics (query time, lock time, etc.) for each unique query are saved to a database. Each time the tool is run with --history, more historical data is saved, which can be used to trend and analyze query performance over time.
pt-query-digest works on events, which are a collection of key-value pairs called attributes. You’ll recognize most of the attributes right away: Query_time, Lock_time, and so on. You can just look at a slow log and see them. However, there are some that don’t exist in the slow log, and slow logs may actually include different kinds of attributes (for example, you may have a server with the Percona patches).
See “ATTRIBUTES REFERENCE” near the end of this documentation for a list of common and --type specific attributes. A familiarity with these attributes is necessary for working with --filter, --ignore-attributes, and other attribute-related options.
With creative use of --filter, you can create new attributes derived from existing attributes. For example, to create an attribute called Row_ratio for examining the ratio of Rows_sent to Rows_examined, specify a filter like:
--filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1'
The && 1 trick is needed to create a valid one-line syntax that is always true, even if the assignment happens to evaluate false. The new attribute will automatically appear in the output:
# Row ratio 1.00 0.00 1 0.50 1 0.71 0.50
Attributes created this way can be specified for --order-by or any option that requires an attribute.
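The per-event arithmetic that such a filter performs can be mirrored in Python; the event dicts below are made-up samples for illustration, not actual tool output:

```python
def add_row_ratio(event: dict) -> dict:
    """Mirror of the --filter above: derive Row_ratio from two
    existing attributes. Returns the event for chaining."""
    event["Row_ratio"] = event["Rows_sent"] / event["Rows_examined"]
    return event

events = [
    {"Rows_sent": 1, "Rows_examined": 2},   # made-up sample events
    {"Rows_sent": 1, "Rows_examined": 1},
]
ratios = [add_row_ratio(e)["Row_ratio"] for e in events]
# -> [0.5, 1.0]
```

A ratio well below 1.0 means the server examined many more rows than it sent, which usually points at a missing or unselective index.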
The default --output is a query analysis report. The --[no]report option controls whether or not this report is printed. Sometimes you may want to parse all the queries but suppress the report, for example when using --review or --history.
There is one paragraph for each class of query analyzed. A “class” of queries all have the same value for the --group-by attribute, which is fingerprint by default. (See “ATTRIBUTES”.) A fingerprint is an abstracted version of the query text with literals removed, whitespace collapsed, and so forth. The report is formatted so it’s easy to paste into emails without wrapping, and all non-query lines begin with a comment, so you can save it to a .sql file and open it in your favorite syntax-highlighting text editor. There is a response-time profile at the beginning.
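As an illustration, the fingerprint abstraction can be sketched in a few lines of Python. This is a simplification, not pt-query-digest's actual fingerprint routine, which handles many more cases (IN lists, multi-row VALUES, comments, and so on):

```python
import re

def fingerprint(query: str) -> str:
    """Rough sketch of query fingerprinting: lowercase the text,
    replace literals with ?, and collapse whitespace."""
    q = query.lower()
    q = re.sub(r"'[^']*'", "?", q)       # single-quoted strings -> ?
    q = re.sub(r'"[^"]*"', "?", q)       # double-quoted strings -> ?
    q = re.sub(r"\b\d+\b", "?", q)       # bare numbers -> ?
    return re.sub(r"\s+", " ", q).strip()

# Two literally different queries share one fingerprint, so they are
# aggregated into the same class in the report:
a = fingerprint("SELECT * FROM user WHERE id = 42")
b = fingerprint("SELECT *  FROM user WHERE id = 7")
# both -> "select * from user where id = ?"
```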
The output described here is controlled by --report-format. That option allows you to specify what to print and in what order. The default output in the default order is described here.
The report, by default, begins with a paragraph about the entire analysis run. The information is very similar to what you’ll see for each class of queries in the log, but it doesn’t have some information that would be too expensive to keep globally for the analysis. It also has some statistics about the code’s execution itself, such as the CPU and memory usage, the local date and time of the run, and a list of input files read/parsed.
Following this is the response-time profile over the events. This is a highly summarized view of the unique events in the detailed query report that follows. It contains the following columns:
Column Meaning
============ ==========================================================
Rank The query's rank within the entire set of queries analyzed
Query ID The query's fingerprint
Response time The total response time, and percentage of overall total
Calls The number of times this query was executed
R/Call The mean response time per execution
V/M The Variance-to-mean ratio of response time
Item The distilled query
An actual run produces a profile like the following:
# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ================== ====== ======== ===== =======
# 1 0x0650494FC4D5D592 8314061.5308 88.9% 72382 114.8637 60... SELECT login_history
# 2 0xC9C1ED538105140F 746232.6800 8.0% 435206 1.7147 0.76 INSERT login_history
# 3 0xAAF4B9434E84127F 168280.1698 1.8% 14429 11.6626 13... SELECT user
# 4 0x813031B8BBC3B329 30772.1643 0.3% 55183 0.5576 5.50 COMMIT
# 5 0x0D0114A0FADCCBAA 10387.1341 0.1% 537 19.3429 28.00 SELECT order_queue_btc
# 6 0x0008723E6D9F6BB6 10318.3693 0.1% 1372 7.5207 30... SELECT finance_usd_history
# 15 0xCA761D78824B9B76 2247.0839 0.0% 493 4.5580 24.35 UPDATE order_queue_ltc
# 17 0x5360E7442950F908 2108.8105 0.0% 1010 2.0879 28.58 UPDATE wallet_status
# 18 0xEE091D9E56428964 1996.5714 0.0% 482 4.1423 17.99 UPDATE order_queue_btc
# 19 0x793A2D5DFD9A6D57 1815.0540 0.0% 83 21.8681 64... SELECT login_history
# 20 0xAA0C4A3435739574 1401.8884 0.0% 12 116.8240 55... SELECT finance_usd_history
# 21 0xB6D145CBB1D81B18 1376.4660 0.0% 301 4.5730 17.05 SELECT match_result_ltc
# 22 0x2AEDBFB7369C6EA7 1314.3095 0.0% 115 11.4288 69.43 SELECT match_result_btc
# 23 0xEA82C2B72F28D09A 1313.2121 0.0% 705 1.8627 0.35 INSERT notice_read
# 24 0xBC40C84EA5C70C9A 1304.9270 0.0% 272 4.7975 11.89 SELECT match_result_btc
# 25 0xA065C4701E81B7DC 1278.7655 0.0% 143 8.9424 96.14 SELECT match_result_ltc
# 26 0x7AD86284356A5C0C 1251.6529 0.0% 28 44.7019 11... SELECT config
A final line whose rank is shown as MISC contains aggregate statistics on the queries that were not included in the report, due to options such as --limit and --outliers. For details on the variance-to-mean ratio, please see http://en.wikipedia.org/wiki/Index_of_dispersion.
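The V/M column is the index of dispersion computed over each class's per-execution response times: variance divided by mean, where values near zero mean consistent runtimes and large values mean erratic ones. A minimal sketch (the sample times are made up):

```python
from statistics import mean, pvariance

def variance_to_mean(times):
    """Index of dispersion: population variance of the response
    times divided by their mean."""
    return pvariance(times) / mean(times)

steady  = [1.0, 1.1, 0.9, 1.0]   # consistent query -> V/M near 0
erratic = [0.1, 0.1, 0.1, 9.7]   # occasional spike -> much larger V/M
```

A query with a high V/M is often a better optimization target than its mean alone suggests, because its worst executions are far slower than its typical ones.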
Next, the detailed query report is printed. Each query appears in a paragraph. Here is a sample, slightly reformatted so ‘perldoc’ will not wrap lines in a terminal. The following will all be one paragraph, but we’ll break it up for commentary.
# Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665
This line reads: query number 2 in the report, 0.01 queries per second, approximate concurrency 0.02x, query ID 0xFDEA8D2993C9CAF3, found at byte offset 160665 in the log. Examples from an actual run:
# Query 1: 0.10 QPS, 11.64x concurrency, ID 0x0650494FC4D5D592 at byte 28898746
# Query 2: 0.63 QPS, 1.08x concurrency, ID 0xC9C1ED538105140F at byte 28338359
# Query 3: 0.02 QPS, 0.25x concurrency, ID 0xAAF4B9434E84127F at byte 25010800
# Query 4: 0.07 QPS, 0.04x concurrency, ID 0x813031B8BBC3B329 at byte 25002907
# Query 5: 0.00 QPS, 0.01x concurrency, ID 0x0D0114A0FADCCBAA at byte 43266846
# Query 6: 0.00 QPS, 0.01x concurrency, ID 0x0008723E6D9F6BB6 at byte 29145571
This line identifies the sequential number of the query in the sort order specified by --order-by. Then there’s the queries per second, and the approximate concurrency for this query (calculated as a function of the timespan and total Query_time). Next there’s a query ID. This ID is a hex version of the query’s checksum in the database, if you’re using --review. You can select the reviewed query’s details from the database with a query like SELECT ... WHERE checksum=0xFDEA8D2993C9CAF3.
If you are investigating the report and want to print out every sample of a particular query, then the following --filter may be helpful:
pt-query-digest slow.log \
--no-report \
--output slowlog \
--filter '$event->{fingerprint} \
&& make_checksum($event->{fingerprint}) eq "FDEA8D2993C9CAF3"'
Notice that you must remove the 0x prefix from the checksum.
Finally, in case you want to find a sample of the query in the log file, there’s the byte offset where you can look. (This is not always accurate, due to some anomalies in the slow log format, but it’s usually right.) The position refers to the worst sample, which we’ll see more about below.
Next is the table of metrics about this class of queries.
# pct total min max avg 95% stddev median
# Count 0 2
# Exec time 13 1105s 552s 554s 553s 554s 2s 553s
# Lock time 0 216us 99us 117us 108us 117us 12us 108us
# Rows sent 20 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
# Rows exam 0 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
Each row of the metrics table covers one attribute (count, execution time, lock time, rows sent, rows examined, query size), showing its percentage of the analysis total, then the total, min, max, avg, 95th percentile, standard deviation, and median. Examples from an actual run:
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 9 72382
# Exec time 88 8314062s 50ms 1195s 115s 875s 265s 4s
# Lock time 2 1729s 39us 2s 24ms 91ms 58ms 332us
# Rows sent 0 63.69k 0 1 0.90 0.99 0.30 0.99
# Rows examine 11 583.28M 0 23.66k 8.25k 20.37k 5.96k 8.46k
# Query size 8 8.19M 117 119 118.60 118.34 1 118.34
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 57 435206
# Exec time 7 746233s 51ms 68s 2s 4s 1s 2s
# Lock time 0 162s 0 4s 372us 366us 13ms 84us
# Rows sent 0 0 0 0 0 0 0 0
# Rows examine 0 0 0 0 0 0 0 0
# Query size 74 74.02M 160 180 178.35 174.84 0.03 174.84
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 1 14429
# Exec time 1 168280s 100ms 377s 12s 47s 40s 219ms
# Lock time 82 62543s 0 297s 4s 19s 19s 68ms
# Rows sent 0 11.92k 0 1 0.85 0.99 0.36 0.99
# Rows examine 0 11.92k 0 1 0.85 0.99 0.36 0.99
# Query size 1 1.67M 106 122 121.22 118.34 0.15 118.34
The first line is column headers for the table. The percentage is the percent of the total for the whole analysis run, and the total is the actual value of the specified metric. For example, in this case we can see that the query executed 2 times, which is 13% of the total number of queries in the file. The min, max and avg columns are self-explanatory. The 95% column shows the 95th percentile; 95% of the values are less than or equal to this value. The standard deviation shows you how tightly grouped the values are. The standard deviation and median are both calculated from the 95th percentile, discarding the extremely large values.
The stddev, median and 95th percentile statistics are approximate. Exact statistics require keeping every value seen, sorting them, and doing some calculations on them. This uses a lot of memory. To avoid this, we keep 1000 buckets, each of them 5% bigger than the one before, ranging from .000001 up to a very big number. When we see a value, we increment the bucket into which it falls. Thus we have fixed memory per class of queries. The drawback is the imprecision, which typically falls in the 5 percent range.
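This bucketing scheme can be sketched using the geometry stated above (1000 buckets, each 5% bigger than the last, starting at .000001); the tool's actual internals may differ:

```python
import math

BASE = 0.000001   # smallest bucket boundary, in seconds
GROWTH = 1.05     # each bucket is 5% bigger than the previous one
NBUCKETS = 1000

def bucket_index(value: float) -> int:
    """Index of the bucket a value falls into; memory stays fixed
    per class of queries no matter how many events arrive."""
    if value <= BASE:
        return 0
    return min(NBUCKETS - 1, int(math.log(value / BASE, GROWTH)) + 1)

buckets = [0] * NBUCKETS
for t in (0.003, 0.0031, 4.2):   # made-up Query_time values, in seconds
    buckets[bucket_index(t)] += 1
# Nearby values land in the same bucket, so any statistic reconstructed
# from bucket boundaries is off by at most roughly 5%.
```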
Next we have statistics on the users, databases and time range for the query.
# Users 1 user1
# Databases 2 db1(1), db2(1)
# Time range 2008-11-26 04:55:18 to 2008-11-27 00:15:15
The users and databases are shown as a count of distinct values, followed by the values. If there’s only one, it’s shown alone; if there are many, we show each of the most frequent ones, followed by the number of times it appears.
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #####
# 100ms ####################
# 1s ##########
# 10s+
The query time distribution buckets execution times into ranges from 1us to 10s+, showing which ranges most executions of this statement fall into. Examples from an actual run:
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #
# 100ms ######################
# 1s ################################################################
# 10s+ ##############################
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #
# 100ms ##########
# 1s ################################################################
# 10s+ #
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms ################################################################
# 1s ########
# 10s+ ################
The execution times show a logarithmic chart of time clustering. Each query goes into one of the “buckets” and is counted up. The buckets are powers of ten. The first bucket is all values in the “single microsecond range” – that is, less than 10us. The second is “tens of microseconds,” which is from 10us up to (but not including) 100us; and so on. The charted attribute can be changed by specifying --report-histogram but is limited to time-based attributes.
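The powers-of-ten bucketing described above can be reproduced in a few lines of Python. The bar-scaling rule (normalize to the busiest bucket) is an assumption about how the chart is drawn, not taken from the tool:

```python
import math

LABELS = ["1us", "10us", "100us", "1ms", "10ms", "100ms", "1s", "10s+"]

def histogram(times_s, width=64):
    """Count Query_time values (in seconds) into powers-of-ten buckets
    from 1us to 10s+ and render '#' bars scaled to the busiest bucket."""
    counts = [0] * len(LABELS)
    for t in times_s:
        # bucket 0 is everything under 10us; bucket 1 is 10us..100us; etc.
        idx = int(math.floor(math.log10(max(t, 1e-6)))) + 6
        counts[min(max(idx, 0), len(LABELS) - 1)] += 1
    peak = max(counts) or 1
    return ["# %6s %s" % (lab, "#" * (width * c // peak))
            for lab, c in zip(LABELS, counts)]

# Three 50ms executions and one 2s execution (made-up values):
lines = histogram([0.05, 0.05, 0.05, 2.0])
```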
# Tables
# SHOW TABLE STATUS LIKE 'table1'\G
# SHOW CREATE TABLE `table1`\G
# EXPLAIN
SELECT * FROM table1\G
This section is a convenience: if you’re trying to optimize the queries you see in the slow log, you probably want to examine the table structure and size. These are copy-and-paste-ready commands to do that.
Finally, we see a sample of the queries in this class of query. This is not a random sample. It is the query that performed the worst, according to the sort order given by --order-by. You will normally see a commented # EXPLAIN line just before it, so you can copy-paste the query to examine its EXPLAIN plan. But for non-SELECT queries that isn’t possible to do, so the tool tries to transform the query into a roughly equivalent SELECT query, and adds that below.
If you want to find this sample event in the log, use the offset mentioned above, and something like the following:
tail -c +<offset> /path/to/file | head
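The same offset lookup can be done with a seek in Python. The demo file below is a throwaway stand-in for a real slow log (the event text is made up); a real run would use the log path and the `at byte` value printed in the query header:

```python
import os
import tempfile

def sample_at(path: str, offset: int, nbytes: int = 500) -> str:
    """Read a chunk starting at a byte offset from the report; the same
    idea as `tail -c +<offset+1> FILE | head` (tail counts from 1)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(nbytes).decode("utf-8", errors="replace")

# Throwaway two-event demo file standing in for a slow log:
fd, path = tempfile.mkstemp()
os.write(fd, b"# Time: ...\nSELECT 1;\n# Time: ...\nSELECT 2;\n")
os.close(fd)
chunk = sample_at(path, 22)   # 22 = byte offset of the second event here
os.unlink(path)
```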
See also --report-format.