http://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html
pt-query-digest - Analyze MySQL queries from logs, processlist, and tcpdump.
pt-query-digest [OPTIONS] [FILES] [DSN]
pt-query-digest analyzes MySQL queries from slow, general, and binary log files. It can also analyze queries from SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, queries are grouped by fingerprint and reported in descending order of query time (i.e. the slowest queries first). If no FILES are given, the tool reads STDIN. The optional DSN is used for certain options like --since and --until.
Usage examples:
Report the slowest queries from slow.log:
pt-query-digest slow.log
Report the slowest queries from the processlist on host1:
pt-query-digest --processlist h=host1
Capture MySQL protocol data with tcpdump, then report the slowest queries:
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt
Save query data from slow.log to host2 for later review and trend analysis:
pt-query-digest --review h=host2 --no-report slow.log
Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a risk to the system and the database server. Before using this tool, please:
- Read the tool's documentation
- Review the tool's known BUGS
- Test the tool on a non-production server
- Back up your production server and verify the backups
pt-query-digest is a sophisticated but easy to use tool for analyzing MySQL queries. It can analyze queries from MySQL slow, general, and binary logs, as well as SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, the tool reports which queries are the slowest, and therefore the most important to optimize. More complex and custom-tailored reports can be created by using options like --group-by, --filter, and --embedded-attributes.
Query analysis is a best practice that should be done frequently. To make this easier, pt-query-digest has two features: query review (--review) and query history (--history). When the --review option is used, all unique queries are saved to a database. When the tool is run again with --review, queries marked as reviewed in the database are not printed in the report. This highlights new queries that need to be reviewed. When the --history option is used, query metrics (query time, lock time, etc.) for each unique query are saved to a database. Each time the tool is run with --history, more historical data is saved, which can be used to trend and analyze query performance over time.
pt-query-digest works on events, which are a collection of key-value pairs called attributes. You’ll recognize most of the attributes right away: Query_time, Lock_time, and so on. You can just look at a slow log and see them. However, there are some that don’t exist in the slow log, and slow logs may actually include different kinds of attributes (for example, you may have a server with the Percona patches).
See “ATTRIBUTES REFERENCE” near the end of this documentation for a list of common and --type specific attributes. A familiarity with these attributes is necessary for working with --filter, --ignore-attributes, and other attribute-related options.
With creative use of --filter, you can create new attributes derived from existing attributes. For example, to create an attribute called Row_ratio for examining the ratio of Rows_sent to Rows_examined, specify a filter like:
--filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1'
The && 1 trick is needed to create a valid one-line syntax that is always true, even if the assignment happens to evaluate false. The new attribute will automatically appear in the output:
# Row ratio 1.00 0.00 1 0.50 1 0.71 0.50
Attributes created this way can be specified for --order-by or any option that requires an attribute.
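The per-event arithmetic that such a filter performs can be mirrored in Python; the event dicts below are made-up samples for illustration, not actual tool output:

```python
def add_row_ratio(event: dict) -> dict:
    """Mirror of the --filter above: derive Row_ratio from two
    existing attributes. Returns the event for chaining."""
    event["Row_ratio"] = event["Rows_sent"] / event["Rows_examined"]
    return event

events = [
    {"Rows_sent": 1, "Rows_examined": 2},   # made-up sample events
    {"Rows_sent": 1, "Rows_examined": 1},
]
ratios = [add_row_ratio(e)["Row_ratio"] for e in events]
# -> [0.5, 1.0]
```

A ratio well below 1.0 means the server examined many more rows than it sent, which usually points at a missing or unselective index.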
The default --output is a query analysis report. The --[no]report option controls whether or not this report is printed. Sometimes you may want to parse all the queries but suppress the report, for example when using --review or --history.
There is one paragraph for each class of query analyzed. A “class” of queries all have the same value for the --group-by attribute, which is fingerprint by default. (See “ATTRIBUTES”.) A fingerprint is an abstracted version of the query text with literals removed, whitespace collapsed, and so forth. The report is formatted so it’s easy to paste into emails without wrapping, and all non-query lines begin with a comment, so you can save it to a .sql file and open it in your favorite syntax-highlighting text editor. There is a response-time profile at the beginning.
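As an illustration, the fingerprint abstraction can be sketched in a few lines of Python. This is a simplification, not pt-query-digest's actual fingerprint routine, which handles many more cases (IN lists, multi-row VALUES, comments, and so on):

```python
import re

def fingerprint(query: str) -> str:
    """Rough sketch of query fingerprinting: lowercase the text,
    replace literals with ?, and collapse whitespace."""
    q = query.lower()
    q = re.sub(r"'[^']*'", "?", q)       # single-quoted strings -> ?
    q = re.sub(r'"[^"]*"', "?", q)       # double-quoted strings -> ?
    q = re.sub(r"\b\d+\b", "?", q)       # bare numbers -> ?
    return re.sub(r"\s+", " ", q).strip()

# Two literally different queries share one fingerprint, so they are
# aggregated into the same class in the report:
a = fingerprint("SELECT * FROM user WHERE id = 42")
b = fingerprint("SELECT *  FROM user WHERE id = 7")
# both -> "select * from user where id = ?"
```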
The output described here is controlled by --report-format. That option allows you to specify what to print and in what order. The default output in the default order is described here.
The report, by default, begins with a paragraph about the entire analysis run. The information is very similar to what you’ll see for each class of queries in the log, but it doesn’t have some information that would be too expensive to keep globally for the analysis. It also has some statistics about the code’s execution itself, such as the CPU and memory usage, the local date and time of the run, and a list of input files read/parsed.
Following this is the response-time profile over the events. This is a highly summarized view of the unique events in the detailed query report that follows. It contains the following columns:
Column Meaning
============ ==========================================================
Rank The query's rank within the entire set of queries analyzed
Query ID The query's fingerprint
Response time The total response time, and percentage of overall total
Calls The number of times this query was executed
R/Call The mean response time per execution
V/M The Variance-to-mean ratio of response time
Item The distilled query
An actual run produces a profile like the following:
# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# ==== ================== ================== ====== ======== ===== =======
# 1 0x0650494FC4D5D592 8314061.5308 88.9% 72382 114.8637 60... SELECT login_history
# 2 0xC9C1ED538105140F 746232.6800 8.0% 435206 1.7147 0.76 INSERT login_history
# 3 0xAAF4B9434E84127F 168280.1698 1.8% 14429 11.6626 13... SELECT user
# 4 0x813031B8BBC3B329 30772.1643 0.3% 55183 0.5576 5.50 COMMIT
# 5 0x0D0114A0FADCCBAA 10387.1341 0.1% 537 19.3429 28.00 SELECT order_queue_btc
# 6 0x0008723E6D9F6BB6 10318.3693 0.1% 1372 7.5207 30... SELECT finance_usd_history
# 15 0xCA761D78824B9B76 2247.0839 0.0% 493 4.5580 24.35 UPDATE order_queue_ltc
# 17 0x5360E7442950F908 2108.8105 0.0% 1010 2.0879 28.58 UPDATE wallet_status
# 18 0xEE091D9E56428964 1996.5714 0.0% 482 4.1423 17.99 UPDATE order_queue_btc
# 19 0x793A2D5DFD9A6D57 1815.0540 0.0% 83 21.8681 64... SELECT login_history
# 20 0xAA0C4A3435739574 1401.8884 0.0% 12 116.8240 55... SELECT finance_usd_history
# 21 0xB6D145CBB1D81B18 1376.4660 0.0% 301 4.5730 17.05 SELECT match_result_ltc
# 22 0x2AEDBFB7369C6EA7 1314.3095 0.0% 115 11.4288 69.43 SELECT match_result_btc
# 23 0xEA82C2B72F28D09A 1313.2121 0.0% 705 1.8627 0.35 INSERT notice_read
# 24 0xBC40C84EA5C70C9A 1304.9270 0.0% 272 4.7975 11.89 SELECT match_result_btc
# 25 0xA065C4701E81B7DC 1278.7655 0.0% 143 8.9424 96.14 SELECT match_result_ltc
# 26 0x7AD86284356A5C0C 1251.6529 0.0% 28 44.7019 11... SELECT config
A final line whose rank is shown as MISC contains aggregate statistics on the queries that were not included in the report, due to options such as --limit and --outliers. For details on the variance-to-mean ratio, please see http://en.wikipedia.org/wiki/Index_of_dispersion.
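The V/M column is the index of dispersion computed over each class's per-execution response times: variance divided by mean, where values near zero mean consistent runtimes and large values mean erratic ones. A minimal sketch (the sample times are made up):

```python
from statistics import mean, pvariance

def variance_to_mean(times):
    """Index of dispersion: population variance of the response
    times divided by their mean."""
    return pvariance(times) / mean(times)

steady  = [1.0, 1.1, 0.9, 1.0]   # consistent query -> V/M near 0
erratic = [0.1, 0.1, 0.1, 9.7]   # occasional spike -> much larger V/M
```

A query with a high V/M is often a better optimization target than its mean alone suggests, because its worst executions are far slower than its typical ones.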
Next, the detailed query report is printed. Each query appears in a paragraph. Here is a sample, slightly reformatted so ‘perldoc’ will not wrap lines in a terminal. The following will all be one paragraph, but we’ll break it up for commentary.
# Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665
This line reads: query number 2 in the report, 0.01 queries per second, approximate concurrency 0.02x, query ID 0xFDEA8D2993C9CAF3, found at byte offset 160665 in the log. Examples from an actual run:
# Query 1: 0.10 QPS, 11.64x concurrency, ID 0x0650494FC4D5D592 at byte 28898746
# Query 2: 0.63 QPS, 1.08x concurrency, ID 0xC9C1ED538105140F at byte 28338359
# Query 3: 0.02 QPS, 0.25x concurrency, ID 0xAAF4B9434E84127F at byte 25010800
# Query 4: 0.07 QPS, 0.04x concurrency, ID 0x813031B8BBC3B329 at byte 25002907
# Query 5: 0.00 QPS, 0.01x concurrency, ID 0x0D0114A0FADCCBAA at byte 43266846
# Query 6: 0.00 QPS, 0.01x concurrency, ID 0x0008723E6D9F6BB6 at byte 29145571
This line identifies the sequential number of the query in the sort order specified by --order-by. Then there’s the queries per second, and the approximate concurrency for this query (calculated as a function of the timespan and total Query_time). Next there’s a query ID. This ID is a hex version of the query’s checksum in the database, if you’re using --review. You can select the reviewed query’s details from the database with a query like SELECT ... WHERE checksum=0xFDEA8D2993C9CAF3.
If you are investigating the report and want to print out every sample of a particular query, then the following --filter may be helpful:
pt-query-digest slow.log \
--no-report \
--output slowlog \
--filter '$event->{fingerprint} \
&& make_checksum($event->{fingerprint}) eq "FDEA8D2993C9CAF3"'
Notice that you must remove the 0x prefix from the checksum.
Finally, in case you want to find a sample of the query in the log file, there’s the byte offset where you can look. (This is not always accurate, due to some anomalies in the slow log format, but it’s usually right.) The position refers to the worst sample, which we’ll see more about below.
Next is the table of metrics about this class of queries.
# pct total min max avg 95% stddev median
# Count 0 2
# Exec time 13 1105s 552s 554s 553s 554s 2s 553s
# Lock time 0 216us 99us 117us 108us 117us 12us 108us
# Rows sent 20 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
# Rows exam 0 6.26M 3.13M 3.13M 3.13M 3.13M 12.73 3.13M
Each row of the metrics table covers one attribute (count, execution time, lock time, rows sent, rows examined, query size), showing its percentage of the analysis total, then the total, min, max, avg, 95th percentile, standard deviation, and median. Examples from an actual run:
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 9 72382
# Exec time 88 8314062s 50ms 1195s 115s 875s 265s 4s
# Lock time 2 1729s 39us 2s 24ms 91ms 58ms 332us
# Rows sent 0 63.69k 0 1 0.90 0.99 0.30 0.99
# Rows examine 11 583.28M 0 23.66k 8.25k 20.37k 5.96k 8.46k
# Query size 8 8.19M 117 119 118.60 118.34 1 118.34
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 57 435206
# Exec time 7 746233s 51ms 68s 2s 4s 1s 2s
# Lock time 0 162s 0 4s 372us 366us 13ms 84us
# Rows sent 0 0 0 0 0 0 0 0
# Rows examine 0 0 0 0 0 0 0 0
# Query size 74 74.02M 160 180 178.35 174.84 0.03 174.84
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 1 14429
# Exec time 1 168280s 100ms 377s 12s 47s 40s 219ms
# Lock time 82 62543s 0 297s 4s 19s 19s 68ms
# Rows sent 0 11.92k 0 1 0.85 0.99 0.36 0.99
# Rows examine 0 11.92k 0 1 0.85 0.99 0.36 0.99
# Query size 1 1.67M 106 122 121.22 118.34 0.15 118.34
The first line is column headers for the table. The percentage is the percent of the total for the whole analysis run, and the total is the actual value of the specified metric. For example, in this case we can see that the query executed 2 times, which is 13% of the total number of queries in the file. The min, max and avg columns are self-explanatory. The 95% column shows the 95th percentile; 95% of the values are less than or equal to this value. The standard deviation shows you how tightly grouped the values are. The standard deviation and median are both calculated from the 95th percentile, discarding the extremely large values.
The stddev, median and 95th percentile statistics are approximate. Exact statistics require keeping every value seen, sorting them, and doing some calculations on them. This uses a lot of memory. To avoid this, we keep 1000 buckets, each of them 5% bigger than the one before, ranging from .000001 up to a very big number. When we see a value, we increment the bucket into which it falls. Thus we have fixed memory per class of queries. The drawback is the imprecision, which typically falls in the 5 percent range.
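This bucketing scheme can be sketched using the geometry stated above (1000 buckets, each 5% bigger than the last, starting at .000001); the tool's actual internals may differ:

```python
import math

BASE = 0.000001   # smallest bucket boundary, in seconds
GROWTH = 1.05     # each bucket is 5% bigger than the previous one
NBUCKETS = 1000

def bucket_index(value: float) -> int:
    """Index of the bucket a value falls into; memory stays fixed
    per class of queries no matter how many events arrive."""
    if value <= BASE:
        return 0
    return min(NBUCKETS - 1, int(math.log(value / BASE, GROWTH)) + 1)

buckets = [0] * NBUCKETS
for t in (0.003, 0.0031, 4.2):   # made-up Query_time values, in seconds
    buckets[bucket_index(t)] += 1
# Nearby values land in the same bucket, so any statistic reconstructed
# from bucket boundaries is off by at most roughly 5%.
```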
Next we have statistics on the users, databases and time range for the query.
# Users 1 user1
# Databases 2 db1(1), db2(1)
# Time range 2008-11-26 04:55:18 to 2008-11-27 00:15:15
The users and databases are shown as a count of distinct values, followed by the values. If there’s only one, it’s shown alone; if there are many, we show each of the most frequent ones, followed by the number of times it appears.
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #####
# 100ms ####################
# 1s ##########
# 10s+
The query time distribution buckets execution times into ranges from 1us to 10s+, showing which ranges most executions of this statement fall into. Examples from an actual run:
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #
# 100ms ######################
# 1s ################################################################
# 10s+ ##############################
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #
# 100ms ##########
# 1s ################################################################
# 10s+ #
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms ################################################################
# 1s ########
# 10s+ ################
The execution times show a logarithmic chart of time clustering. Each query goes into one of the “buckets” and is counted up. The buckets are powers of ten. The first bucket is all values in the “single microsecond range” – that is, less than 10us. The second is “tens of microseconds,” which is from 10us up to (but not including) 100us; and so on. The charted attribute can be changed by specifying --report-histogram but is limited to time-based attributes.
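The powers-of-ten bucketing described above can be reproduced in a few lines of Python. The bar-scaling rule (normalize to the busiest bucket) is an assumption about how the chart is drawn, not taken from the tool:

```python
import math

LABELS = ["1us", "10us", "100us", "1ms", "10ms", "100ms", "1s", "10s+"]

def histogram(times_s, width=64):
    """Count Query_time values (in seconds) into powers-of-ten buckets
    from 1us to 10s+ and render '#' bars scaled to the busiest bucket."""
    counts = [0] * len(LABELS)
    for t in times_s:
        # bucket 0 is everything under 10us; bucket 1 is 10us..100us; etc.
        idx = int(math.floor(math.log10(max(t, 1e-6)))) + 6
        counts[min(max(idx, 0), len(LABELS) - 1)] += 1
    peak = max(counts) or 1
    return ["# %6s %s" % (lab, "#" * (width * c // peak))
            for lab, c in zip(LABELS, counts)]

# Three 50ms executions and one 2s execution (made-up values):
lines = histogram([0.05, 0.05, 0.05, 2.0])
```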
# Tables
# SHOW TABLE STATUS LIKE 'table1'\G
# SHOW CREATE TABLE `table1`\G
# EXPLAIN
SELECT * FROM table1\G
This section is a convenience: if you’re trying to optimize the queries you see in the slow log, you probably want to examine the table structure and size. These are copy-and-paste-ready commands to do that.
Finally, we see a sample of the queries in this class of query. This is not a random sample. It is the query that performed the worst, according to the sort order given by --order-by. You will normally see a commented # EXPLAIN line just before it, so you can copy-paste the query to examine its EXPLAIN plan. But for non-SELECT queries that isn’t possible to do, so the tool tries to transform the query into a roughly equivalent SELECT query, and adds that below.
If you want to find this sample event in the log, use the offset mentioned above, and something like the following:
tail -c +<offset> /path/to/file | head
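The same offset lookup can be done with a seek in Python. The demo file below is a throwaway stand-in for a real slow log (the event text is made up); a real run would use the log path and the `at byte` value printed in the query header:

```python
import os
import tempfile

def sample_at(path: str, offset: int, nbytes: int = 500) -> str:
    """Read a chunk starting at a byte offset from the report; the same
    idea as `tail -c +<offset+1> FILE | head` (tail counts from 1)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(nbytes).decode("utf-8", errors="replace")

# Throwaway two-event demo file standing in for a slow log:
fd, path = tempfile.mkstemp()
os.write(fd, b"# Time: ...\nSELECT 1;\n# Time: ...\nSELECT 2;\n")
os.close(fd)
chunk = sample_at(path, 22)   # 22 = byte offset of the second event here
os.unlink(path)
```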
See also --report-format.