目录结构
22.1 performance Schema 快速入门
-
22.2 Performance Schema 配置
- 22.2.1 mysql编译的时候 修改Performance Schema配置
- 22.2.2 mysql启动的时候 修改Performance Schema配置
- 22.2.3 mysql运行过程中 修改Performance Schema配置
22.3 Performance Schema 查询
22.4 Performance Schema Instrument Naming Conventions
22.5 Performance Schema Status 监控
22.6 Performance Schema 原子性,分子性 事件
22.7 Performance Schema Statement 诊断
22.8 Performance Schema 基本表特征
-
22.9 Performance Schema 表的描述
- 22.9.1 Performance Schema 表的索引
- 22.9.2 Performance Schema Setup 类型的表
- 22.9.3 Performance Schema Instance 类型的表
- 22.9.4 Performance Schema Wait Event 类型的表
- 22.9.5 Performance Schema Stage Event 类型的表
- 22.9.6 Performance Schema Statement Event 类型的表
- 22.9.7 Performance Schema Connection 类型的表
- 22.9.8 Performance Schema Connection Attribute 类型的表
- 22.9.9 Performance Schema Summary 类型的表
- 22.9.10 Performance Schema 其他类型的表
22.10 Performance Schema 变量与选项
22.11 Performance Schema 命令选项
22.12 Performance Schema 系统变量
22.13 Performance Schema status变量
22.14 Performance Schema 与插件
-
22.15 使用Performance Schema 来诊断问题
- 22.15.1 使用Performance Schema来替代profiling
心得: performance_schema的使用
架构图
Mysql 5.5 Performance schema
[图片上传失败...(image-4340e6-1526973085181)]
Mysql 5.6 Performance schema
[图片上传失败...(image-242fa8-1526973085181)]
Mysql 5.7 Performance schema
[图片上传失败...(image-39b53c-1526973085181)]
consumer 层次图
[图片上传失败...(image-5c72de-1526973085181)]
Statement 诊断
[图片上传失败...(image-589359-1526973085181)]
22.1 performance Schema 快速入门
这一节简单的讲解如何使用performance schema,并附上例子。如: Section 22.15, “Using the Performance Schema to Diagnose Problems”
如果要让mysql可用,必须在mysql编译的时候built进来。通过检查服务器的帮助信息,你也可以确认一下perf 是否可用。如果可用,会有一些提示如:
shell> mysqld --verbose --help
...
--performance_schema
Enable the performance schema.
--performance_schema_events_waits_history_long_size=#
Number of rows in events_waits_history_long.
...
如果以上变量没有出现在你的output上,那么说明你的mysql不支持performance schema。
请关注 Section 22.2, “Performance Schema Configuration”.
如果你performance schema被支持,那么在mysql5.6.6开始就已经默认打开的。
如果要显示打开关闭PS,那么就在mysql启动的时候加上performance_schema变量,
并且给予适当的值。比如:在你的my.cnf文件中
[mysqld]
performance_schema=ON
一旦这样配置后,当mysql启动时,就会自动初始化performance schema。为了验证初始化是否成功,
可以使用这样的语句:
mysql> SHOW VARIABLES LIKE 'performance_schema';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| performance_schema | ON |
+--------------------+-------+
ON表示成功,OFF表示有错误,请查看相关error log 定位错误。
performance schema 是以存储引擎的方式实现的,你可以通过INFORMATION_SCHEMA.ENGINES 或者
show engines来确认:
mysql> SELECT * FROM INFORMATION_SCHEMA.ENGINES
-> WHERE ENGINE='PERFORMANCE_SCHEMA'\G
*************************** 1. row ***************************
ENGINE: PERFORMANCE_SCHEMA
SUPPORT: YES
COMMENT: Performance Schema
TRANSACTIONS: NO
XA: NO
SAVEPOINTS: NO
mysql> SHOW ENGINES\G
...
Engine: PERFORMANCE_SCHEMA
Support: YES
Comment: Performance Schema
Transactions: NO
XA: NO
Savepoints: NO
...
你可以像使用正常database一样使用performance schema。
比如:use performance_schema, 以及show语法。
performance_schema数据库名必须是小写。可以使用show create table 查看表:
mysql> SHOW CREATE TABLE setup_timers\G
*************************** 1. row ***************************
Table: setup_timers
Create Table: CREATE TABLE `setup_timers` (
`NAME` varchar(64) NOT NULL,
`TIMER_NAME` enum('CYCLE','NANOSECOND','MICROSECOND','MILLISECOND','TICK')
NOT NULL
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8
也可以查询 INFORMATION_SCHEMA.COLUMNS来查看列。
performance_schema里面的表可以根据名字分类为:
- Current events
- event histories
- summaries
- object instances
- setup (configuration)
下面有些例子来简单的使用这些表,详细的使用请看:Section 22.9, “Performance Schema Table Descriptions”.
一开始,不是所有的instrument 和 consumer 都会被enable , 所以一开始他们不会收集所有的事件。
为了让他们都enable 或者 enable event timing。 执行以下两条语句
mysql> UPDATE setup_instruments SET ENABLED = 'YES', TIMED = 'YES';
Query OK, 338 rows affected (0.12 sec)
mysql> UPDATE setup_consumers SET ENABLED = 'YES';
Query OK, 8 rows affected (0.00 sec)
如果想查看某个时刻的等待事件,可以查询 events_waits_current表。它记录了每个thread最近的监控信息。
mysql> SELECT * FROM events_waits_current\G
*************************** 1. row ***************************
THREAD_ID: 0
EVENT_ID: 5523
EVENT_NAME: wait/synch/mutex/mysys/THR_LOCK::mutex
SOURCE: thr_lock.c:525
TIMER_START: 201660494489586
TIMER_END: 201660494576112
TIMER_WAIT: 86526
SPINS: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_TYPE: NULL
OBJECT_INSTANCE_BEGIN: 142270668
NESTING_EVENT_ID: NULL
OPERATION: lock
NUMBER_OF_BYTES: NULL
FLAGS: 0
...
这个事件说明 thread 0 在等待86526皮秒 来获得一个锁THR_LOCK::mutex,而这个锁是mysys子系统中。
以下是列的一些描述
- id: thread id
- event name:别监控的instrument名字
- timer 类型的列: 时间,以皮秒为基准单位
history 表包含了一些列相同事件的历史记录,就如同current-events 一样,不同的是,
有更多的记录来说明服务器最近做了什么,而不是当前做了什么。events_waits_history & events_waits_history_long 记录了每个thread最近10条和10000条event。
mysql> SELECT EVENT_ID, EVENT_NAME, TIMER_WAIT
-> FROM events_waits_history WHERE THREAD_ID = 13
-> ORDER BY EVENT_ID;
+----------+-----------------------------------------+------------+
| EVENT_ID | EVENT_NAME | TIMER_WAIT |
+----------+-----------------------------------------+------------+
| 86 | wait/synch/mutex/mysys/THR_LOCK::mutex | 686322 |
| 87 | wait/synch/mutex/mysys/THR_LOCK_malloc | 320535 |
| 88 | wait/synch/mutex/mysys/THR_LOCK_malloc | 339390 |
| 89 | wait/synch/mutex/mysys/THR_LOCK_malloc | 377100 |
| 90 | wait/synch/mutex/sql/LOCK_plugin | 614673 |
| 91 | wait/synch/mutex/sql/LOCK_open | 659925 |
| 92 | wait/synch/mutex/sql/THD::LOCK_thd_data | 494001 |
| 93 | wait/synch/mutex/mysys/THR_LOCK_malloc | 222489 |
| 94 | wait/synch/mutex/mysys/THR_LOCK_malloc | 214947 |
| 95 | wait/synch/mutex/mysys/LOCK_alarm | 312993 |
+----------+-----------------------------------------+------------+
如果这个表满了,那么新的event会被加进来,踢掉最老的那个。
summary 表提供了整个时间段的一些统计信息。他们统计事件的处理方式和之前都不一样。
如果想知道某个instrument 被执行的最频繁,或者发生的频率非常高,可以通过排序 events_waits_summary_global_by_event_name表,根据 COUNT_STAR 或者 SUM_TIMER_WAIT列。
mysql> SELECT EVENT_NAME, COUNT_STAR
-> FROM events_waits_summary_global_by_event_name
-> ORDER BY COUNT_STAR DESC LIMIT 10;
+---------------------------------------------------+------------+
| EVENT_NAME | COUNT_STAR |
+---------------------------------------------------+------------+
| wait/synch/mutex/mysys/THR_LOCK_malloc | 6419 |
| wait/io/file/sql/FRM | 452 |
| wait/synch/mutex/sql/LOCK_plugin | 337 |
| wait/synch/mutex/mysys/THR_LOCK_open | 187 |
| wait/synch/mutex/mysys/LOCK_alarm | 147 |
| wait/synch/mutex/sql/THD::LOCK_thd_data | 115 |
| wait/io/file/myisam/kfile | 102 |
| wait/synch/mutex/sql/LOCK_global_system_variables | 89 |
| wait/synch/mutex/mysys/THR_LOCK::mutex | 89 |
| wait/synch/mutex/sql/LOCK_open | 88 |
+---------------------------------------------------+------------+
mysql> SELECT EVENT_NAME, SUM_TIMER_WAIT
-> FROM events_waits_summary_global_by_event_name
-> ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
+----------------------------------------+----------------+
| EVENT_NAME | SUM_TIMER_WAIT |
+----------------------------------------+----------------+
| wait/io/file/sql/MYSQL_LOG | 1599816582 |
| wait/synch/mutex/mysys/THR_LOCK_malloc | 1530083250 |
| wait/io/file/sql/binlog_index | 1385291934 |
| wait/io/file/sql/FRM | 1292823243 |
| wait/io/file/myisam/kfile | 411193611 |
| wait/io/file/myisam/dfile | 322401645 |
| wait/synch/mutex/mysys/LOCK_alarm | 145126935 |
| wait/io/file/sql/casetest | 104324715 |
| wait/synch/mutex/sql/LOCK_plugin | 86027823 |
| wait/io/file/sql/pid | 72591750 |
+----------------------------------------+----------------+
以上说明THR_LOCK_malloc 非常hot。
instance 表记录了什么类型的object被instrumented。一个被instrumented的object,当被server使用时就会产生一个event。这些表提供了event名以及简单的阐述。
比如:file_instances 就列出了file IO相关的instance of instrument。
xxx:performance_schema> show tables like '%_instances'
-> ;
+--------------------------------------------+
| Tables_in_performance_schema (%_instances) |
+--------------------------------------------+
| cond_instances |
| file_instances |
| mutex_instances |
| rwlock_instances |
| socket_instances |
+--------------------------------------------+
5 rows in set (0.00 sec)
mysql> SELECT * FROM file_instances\G
*************************** 1. row ***************************
FILE_NAME: /opt/mysql-log/60500/binlog.000007
EVENT_NAME: wait/io/file/sql/binlog
OPEN_COUNT: 0
*************************** 2. row ***************************
FILE_NAME: /opt/mysql/60500/data/mysql/tables_priv.MYI
EVENT_NAME: wait/io/file/myisam/kfile
OPEN_COUNT: 1
*************************** 3. row ***************************
FILE_NAME: /opt/mysql/60500/data/mysql/columns_priv.MYI
EVENT_NAME: wait/io/file/myisam/kfile
OPEN_COUNT: 1
...
setup表用来配置和显示监控信息的。 例如:什么样的timer 被使用,
请查询setup_timers
mysql> SELECT * FROM setup_timers;
+-----------+-------------+
| NAME | TIMER_NAME |
+-----------+-------------+
| idle | MICROSECOND |
| wait | CYCLE |
| stage | NANOSECOND |
| statement | NANOSECOND |
+-----------+-------------+
setup_instruments 列出了哪些event会被收集与监控:
mysql> SELECT * FROM setup_instruments;
+------------------------------------------------------------+---------+-------+
| NAME | ENABLED | TIMED |
+------------------------------------------------------------+---------+-------+
...
| wait/synch/mutex/sql/LOCK_global_read_lock | YES | YES |
| wait/synch/mutex/sql/LOCK_global_system_variables | YES | YES |
| wait/synch/mutex/sql/LOCK_lock_db | YES | YES |
| wait/synch/mutex/sql/LOCK_manager | YES | YES |
...
| wait/synch/rwlock/sql/LOCK_grant | YES | YES |
| wait/synch/rwlock/sql/LOGGER::LOCK_logger | YES | YES |
| wait/synch/rwlock/sql/LOCK_sys_init_connect | YES | YES |
| wait/synch/rwlock/sql/LOCK_sys_init_slave | YES | YES |
...
| wait/io/file/sql/binlog | YES | YES |
| wait/io/file/sql/binlog_index | YES | YES |
| wait/io/file/sql/casetest | YES | YES |
| wait/io/file/sql/dbopt | YES | YES |
...
相应了解如何翻译这些instrument 的名字,请看:Section 22.4, “Performance Schema Instrument Naming Conventions”.
为了控制哪些event是不是instrument,可以给enabled设置yes or no。
mysql> UPDATE setup_instruments SET ENABLED = 'NO'
-> WHERE NAME = 'wait/synch/mutex/sql/LOCK_mysql_create_db';
performance schema 使用收集的events 来更新performance_schema 数据库的那些表,这些表扮演着事件信息消费者的角色。setup_consumers 列出了可用的消费者 以及哪些是enabled。
mysql> SELECT * FROM setup_consumers;
+--------------------------------+---------+
| NAME | ENABLED |
+--------------------------------+---------+
| events_stages_current | NO |
| events_stages_history | NO |
| events_stages_history_long | NO |
| events_statements_current | YES |
| events_statements_history | NO |
| events_statements_history_long | NO |
| events_waits_current | NO |
| events_waits_history | NO |
| events_waits_history_long | NO |
| global_instrumentation | YES |
| thread_instrumentation | YES |
| statements_digest | YES |
+--------------------------------+---------+
22.2 Performance Schema 配置
22.2.1 mysql编译的时候 修改Performance Schema配置
目前,一般的binary版本都会默认支持PS,不过最后还是想官方的provider 确认一下。
如果使用的是源码包,用cmake WITH_PERFSCHEMA_STORAGE_ENGINE 使其ok
shell> cmake . -DWITH_PERFSCHEMA_STORAGE_ENGINE=1
如何check是否支持performance schema?
第一:
shell> mysqld --verbose --help
...
--performance_schema
Enable the performance schema.
--performance_schema_events_waits_history_long_size=#
Number of rows in events_waits_history_long.
...
也可以
mysql> SHOW ENGINES\G
...
Engine: PERFORMANCE_SCHEMA
Support: YES
Comment: Performance Schema
Transactions: NO
XA: NO
Savepoints: NO
...
注意:show engines 中即便包含了PERFORMANCE_SCHEMA,只能说明已经支持,不代表已经开启。
如果要开启,必须在start up阶段设置,请看下一章节。
22.2.2 mysql启动的时候 修改Performance Schema配置
假设编译的时候已经支持ps,那么mysql5.6.6 版本以及以上版本都会默认enable 。
开启,只需要在my.cnf中
[mysqld]
performance_schema=ON
如果server没有足够多的内存用于初始化performance schema,那么它会自己disable 掉,然后没有instrument相关信息。
mysql5.6.4版本开始,允许instrument 和 consumer在server startup阶段配置。之前的版本都
只能在runtime阶段用update配置。
在startup阶段控制instrument,可以用这种形式:
--performance-schema-instrument='instrument_name=value'
这里面instrument_name 指的是类似wait/synch/mutex/sql/LOCK_open这种instrument,
value 就是下面其中一种:
- OFF,False, or 0 关闭这个instrument
- ON,TRUE,or 1 开启这个instrument
- counted:开启并且统计这个instrument
--performance-schema-instrument 只能指定一个instrument名字。多个instances可以在多instrument中配置。另外:模式也是被允许的。如:
--performance-schema-instrument='wait/synch/cond/%=COUNTED'
关闭所有instrument,使用:
--performance-schema-instrument='%=OFF'
在startup阶段控制consumer
--performance-schema-consumer-$consumer_name=$value
这里面$consumer_name 指的是consumer 如:events_waits_history,
$value 为:
- OFF,FALSE,or 0: 不收集这个consumer相关的事件
- ON,TRUE,or 1 : 收集这个consumer相关的事件
例如:开启events_waits_history consumer,
--performance-schema-consumer-events-waits-history=ON
合法的consumer的名字可以在setup_consumers表中找到。模式匹配是不允许的。
consumer的名字在setup_consumers中以下划线表示,但是在starup 阶段的my.cnf配置文件中,
下划线和横线都是被允许的。
performance schema的默认系统变量:
mysql> SHOW VARIABLES LIKE 'perf%';
+--------------------------------------------------------+---------+
| Variable_name | Value |
+--------------------------------------------------------+---------+
| performance_schema | ON |
| performance_schema_accounts_size | 100 |
| performance_schema_digests_size | 200 |
| performance_schema_events_stages_history_long_size | 10000 |
| performance_schema_events_stages_history_size | 10 |
| performance_schema_events_statements_history_long_size | 10000 |
| performance_schema_events_statements_history_size | 10 |
| performance_schema_events_waits_history_long_size | 10000 |
| performance_schema_events_waits_history_size | 10 |
| performance_schema_hosts_size | 100 |
| performance_schema_max_cond_classes | 80 |
| performance_schema_max_cond_instances | 1000 |
...
performance_schema ON 和 OFF 表示是否开启和关闭。其他变量表示:表sizes(number of rows)以及内存分配
一旦开启了performance schema,mysql就会分配内存,这个内存表示最小值,可能会更大,所以
需要根据自己的情况设置一个合理的值。
如果要改变系统变量的值,目前只能在startup阶段修改。如:在my.cnf中:
[mysqld]
performance_schema
performance_schema_events_waits_history_size=20
performance_schema_events_waits_history_long_size=15000
mysql5.6.6 以及以上版本,如果你不显示的设置系统变量,那么mysql自己会设置默认的值。
22.2.3 mysql运行过程中 修改Performance Schema配置
setup表:如果你有update权限,可以直接更改监控行为。
mysql> SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
-> WHERE TABLE_SCHEMA = 'performance_schema'
-> AND TABLE_NAME LIKE 'setup%';
+-------------------+
| TABLE_NAME |
+-------------------+
| setup_actors |
| setup_consumers |
| setup_instruments |
| setup_objects |
| setup_timers |
+-------------------+
如果要查看哪些事件选择了哪些时间,可以查看setup_timers
mysql> SELECT * FROM setup_timers;
+-----------+-------------+
| NAME | TIMER_NAME |
+-----------+-------------+
| idle | MICROSECOND |
| wait | CYCLE |
| stage | NANOSECOND |
| statement | NANOSECOND |
+-----------+-------------+
setup_instruments 列出了哪些instrument事件被收集
setup_consumers 列出了某一类consumer的instrument事件实际被收集
这是一个典型的生产者-消费者模式,即便你设置了setup_instruments=xx,
但是如果setup_consumers没有设置对应的内容(即没有消费者),那么实际的instrument
也不会收集数据到performance schema的表中。
22.2.3.1 performance_schema事件计时
计时事件,主要用于提供事件到底话费了多少时间。当然,你也可以配置不开启计时功能。
以下两个表主要提供timer信息:
- performance_timers: 列出了可用的timers和他们的特点
- setup_timers : 描述了哪些instrument对应哪些timers
setup_timer是里面的每一条记录都必须是performance_timers里面定义好的内容。
mysql> SELECT * FROM performance_timers;
+-------------+-----------------+------------------+----------------+
| TIMER_NAME | TIMER_FREQUENCY | TIMER_RESOLUTION | TIMER_OVERHEAD |
+-------------+-----------------+------------------+----------------+
| CYCLE | 2389029850 | 1 | 72 |
| NANOSECOND | NULL | NULL | NULL |
| MICROSECOND | 1000000 | 1 | 585 |
| MILLISECOND | 1035 | 1 | 738 |
| TICK | 101 | 1 | 630 |
+-------------+-----------------+------------------+----------------+
一般都采用picsecond皮秒 作为基准单位。
22.2.3.2 performance_schema 过滤事件
事件处理是一种典型的生产者-消费者模式
- Instrument 生产事件,然后被收集。setup_instruments表列出来哪些instrument 可以被收集。
mysql> SELECT * FROM setup_instruments;
+------------------------------------------------------------+---------+-------+
| NAME | ENABLED | TIMED |
+------------------------------------------------------------+---------+-------+
...
| wait/synch/mutex/sql/LOCK_global_read_lock | YES | YES |
| wait/synch/mutex/sql/LOCK_global_system_variables | YES | YES |
| wait/synch/mutex/sql/LOCK_lock_db | YES | YES |
| wait/synch/mutex/sql/LOCK_manager | YES | YES |
...
setup_instruments 提供了最基本的形式来控制事件的产生。其他的事件产生形式(如:基于某一个object或者thread)可以参考:Section 22.2.3.3, “Event Pre-Filtering”
- Performan schema里面的那些表,就是用来事件的目的地和消费事件用的。setup_consumers表列出了很多类型的consumer
mysql> SELECT * FROM setup_consumers;
+--------------------------------+---------+
| NAME | ENABLED |
+--------------------------------+---------+
| events_stages_current | NO |
| events_stages_history | NO |
| events_stages_history_long | NO |
| events_statements_current | YES |
| events_statements_history | NO |
| events_statements_history_long | NO |
| events_waits_current | NO |
| events_waits_history | NO |
| events_waits_history_long | NO |
| global_instrumentation | YES |
| thread_instrumentation | YES |
| statements_digest | YES |
+--------------------------------+---------+
对于监控事件的过滤分为不同的阶段:
- Pre-filtering: 修改Performance schema的配置(开启或者关闭相应的 instrument 或者 consumer),可以让一部分事件从生产者中收集然后只影响部分consumer. pre-filtering 完成后
会所有用户产生全局影响。
为什么要使用pre-filtering
* 减少负载。 虽然开启所有的事件监控也只会有少许压力负载。 但是你还想竟可能的降低负载,
比如:你可以关闭一些不用的功能,如计时。
* 避免填充你不关心的current-events or history表。比如:你只开启了文件相关的instrument,那么那些非文件的记录就不会填充在current-events or history表中。那么在post-filtering 中,也不会有相关记录。
* 避免维护一些你不关心的事件表。如果你关闭了某些consumer,服务器就不会话费事件来维护consumer对于的destination表。举例:如果你不关心历史记录事件,你就可以关闭 history table consumer 来提升性能。
- Post-filtering: 需要用到一些查询语句(where子句)来过滤。Post-filtering是基于用户的不同用户不一样。
22.2.3.3 Pre-Filtering 事件
Pre-Filtering 既可以用在 producer , 也可以用在 consumer 中
-
配置pre-filtering用在 producer上,主要有这些表
- setup_instruments : 如果对应的instrument关闭了,那么就不会收集对应event的信息,即便是production-related setup 表。
- setup_objects : 控制了是否监控特定对象
- threads : 是否每一个thread都会被监控
- setup_actors: 决定了哪些host,user,role的foreground thread被监控。
配置pre-filtering用在 consumer上,主要是setup_consumers 这张表。这决定了被收集的event会被发送的目的地。setup_consumers 也会潜在影响production。如果给定的事件没有被配置发送到任何地方(没有consumer),那么performance schema 也不会produce 它。
修改以上表,基本上都会立刻生效。但是也有一些例外:
- 某些setup_instruments 只在server startup阶段生效,即便你在running 阶段设置也是没有用的。
他们就是: mutexes, conditions, and rwlocks - setup_actors 只影响foreground thread,并且只会对新进来的thread有影响。已经存在的thread不受影响。
即便你修改了以上配置,performance schema不会flush 信息到history表。原来的记录还是存在current-events and history tables ,直到被新的event替换。关闭instrument,也是一样。
当然,你可以truncate table 来情况history历史表。
你也许也想使用truncate 来清空summary类型的表。遗憾的是truncate summary类型的表只会将summary列设置为0或者null,不会删除数据。 events_statements_summary_by_digest例外。
22.2.3.3.1 Pre-Filtering by Instrument
更改立刻生效,部分必须在startup阶段(mutexes, conditions, and rwlocks)
例子:
* Disable all instruments:
mysql> UPDATE setup_instruments SET ENABLED = 'NO';
Now no events will be collected.
* Disable all file instruments, adding them to the current set of disabled instruments:
mysql> UPDATE setup_instruments SET ENABLED = 'NO'
-> WHERE NAME LIKE 'wait/io/file/%';
* Disable only file instruments, enable all other instruments:
mysql> UPDATE setup_instruments
-> SET ENABLED = IF(NAME LIKE 'wait/io/file/%', 'NO', 'YES');
* Enable all but those instruments in the mysys library:
mysql> UPDATE setup_instruments
-> SET ENABLED = CASE WHEN NAME LIKE '%/mysys/%' THEN 'YES' ELSE 'NO' END;
* Disable a specific instrument:
mysql> UPDATE setup_instruments SET ENABLED = 'NO'
-> WHERE NAME = 'wait/synch/mutex/mysys/TMPDIR_mutex';
* To toggle the state of an instrument, “flip” its ENABLED value:
mysql> UPDATE setup_instruments
-> SET ENABLED = IF(ENABLED = 'YES', 'NO', 'YES')
-> WHERE NAME = 'wait/synch/mutex/mysys/TMPDIR_mutex';
* Disable timing for all events:
mysql> UPDATE setup_instruments SET TIMED = 'NO';
22.2.3.3.2 Pre-Filtering by Object
更改后,立刻生效。
例子1:
mysql> SELECT * FROM setup_objects;
+-------------+--------------------+-------------+---------+-------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | ENABLED | TIMED |
+-------------+--------------------+-------------+---------+-------+
| TABLE | mysql | % | NO | NO |
| TABLE | performance_schema | % | NO | NO |
| TABLE | information_schema | % | NO | NO |
| TABLE | % | % | YES | YES |
+-------------+--------------------+-------------+---------+-------+
例子2:
+-------------+---------------+-------------+---------+-------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | ENABLED | TIMED |
+-------------+---------------+-------------+---------+-------+
| TABLE | db1 | t1 | YES | YES |
| TABLE | db1 | t2 | NO | NO |
| TABLE | db2 | % | YES | YES |
| TABLE | db3 | % | NO | NO |
| TABLE | % | % | YES | YES |
+-------------+---------------+-------------+---------+-------+
22.2.3.3.3 Pre-Filtering by Thread
略。。。
22.2.3.3.4 Pre-Filtering by Consumer
配置修改,立刻生效
关于setup_consumer 的几个基本原则:
- 和consumer相关的destination 不会接收任何event,除非对于的consumer都开启了
- consumer只有等待他所依赖的consumer都enable,才会被check。
- 如果一个consumer没有被checked通过,或者check了,但是被disable了,那么依赖它的那些consumer不会被check.
- 如果一个event没有对应的destination(没有对应的consumer),那么会被自动disable.
以下列出consumer的等级关系图:
global_instrumentation --1 level
thread_instrumentation --2 level
events_waits_current --3 level
events_waits_history --4 level
events_waits_history_long --4 level
events_stages_current --3 level
events_stages_history --4 level
events_stages_history_long --4 level
events_statements_current --3 level
events_statements_history --4 level
events_statements_history_long --4 level
statements_digest --2 level
22.2.3.3.5 Example Consumer Configurations
略。。。
22.3 Performance Schema Queries
也就是post-filtering,只不过多了where 从句
22.4 Performance Schema Instrument命名规则
名字基本都由‘/’分割,从左到右基本都是从普通粒度到精细粒度。
-
最上层的instrument 组件
- idle
- stage
- statement
- wait
idle instrument 组件
idle event描述来自:socket_instances.STATE 列: Section 22.9.3.5, “The socket_instances Table”.
- stage instrument 组件
组成形式: stage/code_area/stage_name , code_area 一般是sql or myisam。
stage name 一般来自: SHOW PROCESSLIST,如:Sorting result ,Sending data
-
Statement instrument 组件
- statement/abstract/* : 一般都是早期的stage,在抽象sql都还没来得及解析的时候。
- statement/com: SQL 命令操作 如:statement/com/Connect
- statement/sql: SQL语句操作 如:statement/sql/create_db
-
Wait Instrument组件
-
wait/io : IO 等待事件
- wait/io/file : 文件IO等待事件。等待文件操作完成的时间如:fwrite()。
但是物理IO有可能因为缓存的原因调用fwrite时不会写磁盘。 - wait/io/socket: socket相关的IO等待
- wait/io/table : 表相关的IO等待。一般对于记录rows来说有fetch,insert,update,delete四种操作。
不像其他等待事件,table I/O 还包含了其他的等待事件。比如:table io可能包含了文件IO和内存IO。因为读取table rows的时候,有可能会去从文件读取数据。
- wait/io/file : 文件IO等待事件。等待文件操作完成的时间如:fwrite()。
-
* wait/lock
* wait/lock/table : 表操作的锁等待事件
* wait/synch
* wait/synch/cond :condition就是线程与线程之间的信号。
* wait/synch/mutex : mutex主要用来锁住一块共享资源。
* wait/synch/rwlock : 读写锁
22.5 performance schema 状态监控
- SHOW STATUS LIKE 'perf%'
mysql> SHOW STATUS LIKE 'perf%';
+-----------------------------------------------+-------+
| Variable_name | Value |
+-----------------------------------------------+-------+
| Performance_schema_accounts_lost | 0 |
| Performance_schema_cond_classes_lost | 0 |
| Performance_schema_cond_instances_lost | 0 |
| Performance_schema_digest_lost | 0 |
| Performance_schema_file_classes_lost | 0 |
| Performance_schema_file_handles_lost | 0 |
| Performance_schema_file_instances_lost | 0 |
| Performance_schema_hosts_lost | 0 |
| Performance_schema_locker_lost | 0 |
| Performance_schema_mutex_classes_lost | 0 |
| Performance_schema_mutex_instances_lost | 0 |
| Performance_schema_rwlock_classes_lost | 0 |
| Performance_schema_rwlock_instances_lost | 0 |
| Performance_schema_session_connect_attrs_lost | 0 |
| Performance_schema_socket_classes_lost | 0 |
| Performance_schema_socket_instances_lost | 0 |
| Performance_schema_stage_classes_lost | 0 |
| Performance_schema_statement_classes_lost | 0 |
| Performance_schema_table_handles_lost | 0 |
| Performance_schema_table_instances_lost | 0 |
| Performance_schema_thread_classes_lost | 0 |
| Performance_schema_thread_instances_lost | 0 |
| Performance_schema_users_lost | 0 |
+-----------------------------------------------+-------+
这些状态表明了由于内存限制,有哪些instrument不能被load或者create。
- SHOW ENGINE PERFORMANCE_SCHEMA STATUS
mysql> SHOW ENGINE PERFORMANCE_SCHEMA STATUS\G
...
*************************** 3\. row ***************************
Type: performance_schema
Name: events_waits_history.row_size
Status: 76
*************************** 4\. row ***************************
Type: performance_schema
Name: events_waits_history.row_count
Status: 10000
*************************** 5\. row ***************************
Type: performance_schema
Name: events_waits_history.memory
Status: 760000
...
*************************** 57\. row ***************************
Type: performance_schema
Name: performance_schema.memory
Status: 26459600
...
这个可以查看status的最大值(.size),以及当前以及达到多少量了(.count)
如:
| performance_schema | (table_share_hash).count | 12391 |
| performance_schema | (table_share_hash).size | 12500 |
使用心得: 当(table_share_hash).count=(table_share_hash).size,那么就会增加Performance_schema_table_instances_lost的值。这样新的表监控就不会被collect。这时候
可以drop 掉一些不用的表,那么(table_share_hash).count 就会下降。
22.8 Performance Schema General Table Characteristics
基本表只允许truncate,select,其他操作都不允许。
最后注意事项
- variables目前只能重启才能重新配置
- 重启mysql后,所有状态都会重置
实战
1) 源码:https://github.com/MarkLeith/mysql-sys/
2)这里面的代码参考Mark leith,有部分做过修改,也有部分对线上影响非常大。
3)对所有view都做过详细解释与实战测试,有部分坑里面说明
/*
* 所涉及的performance_schema的所有列表
* 1)file_summary_by_instance
* 2) file_summary_by_event_name
* 3) table_io_waits_summary_by_index_usage
* 4) events_statements_summary_by_digest
* 5) events_statements_summary_by_host_by_event_name
* 6) events_statements_summary_global_by_event_name
* 7) events_waits_summary_global_by_event_name
* 8) events_waits_summary_by_host_by_event_name
* 9) objects_summary_global_by_type
* 10)
*/
/*
* Create DATABASE v_monitor
*/
CREATE DATABASE IF NOT EXISTS v_monitor DEFAULT CHARACTER SET utf8;
USE v_monitor;
/*
* Create function format_time
*/
DROP FUNCTION IF EXISTS format_time;
DELIMITER
$$
CREATE DEFINER='xxx'@'%' FUNCTION format_time (
picoseconds BIGINT UNSIGNED
)
RETURNS VARCHAR(16) CHARSET UTF8
SQL SECURITY INVOKER
DETERMINISTIC
NO SQL
BEGIN
IF picoseconds IS NULL THEN RETURN NULL;
ELSEIF picoseconds >= 3600000000000000 THEN RETURN CONCAT(ROUND(picoseconds / 3600000000000000, 2), 'h');
ELSEIF picoseconds >= 60000000000000 THEN RETURN SEC_TO_TIME(ROUND(picoseconds / 1000000000000, 2));
ELSEIF picoseconds >= 1000000000000 THEN RETURN CONCAT(ROUND(picoseconds / 1000000000000, 2), ' s');
ELSEIF picoseconds >= 1000000000 THEN RETURN CONCAT(ROUND(picoseconds / 1000000000, 2), ' ms');
ELSEIF picoseconds >= 1000000 THEN RETURN CONCAT(ROUND(picoseconds / 1000000, 2), ' us');
ELSEIF picoseconds >= 1000 THEN RETURN CONCAT(ROUND(picoseconds / 1000, 2), ' ns');
ELSE RETURN CONCAT(picoseconds, ' ps');
END IF;
END
$$
DELIMITER ;
/*
* Create function format_statement
*/
DROP FUNCTION IF EXISTS format_statement;
DELIMITER
$$
CREATE DEFINER='xxx'@'%' FUNCTION format_statement (
statement LONGTEXT
)
RETURNS VARCHAR(65)
SQL SECURITY INVOKER
DETERMINISTIC
NO SQL
BEGIN
IF LENGTH(statement) > 64 THEN
RETURN REPLACE(CONCAT(LEFT(statement, 30), ' ... ', RIGHT(statement, 30)), '\n', ' ');
ELSE
RETURN REPLACE(statement, '\n', ' ');
END IF;
END
$$
DELIMITER ;
/*
* Create function format_bytes
*/
DROP FUNCTION IF EXISTS format_bytes;
DELIMITER
$$
CREATE DEFINER='xxx'@'%' FUNCTION format_bytes (
bytes BIGINT
)
RETURNS VARCHAR(16)
SQL SECURITY INVOKER
DETERMINISTIC
NO SQL
BEGIN
IF bytes IS NULL THEN RETURN NULL;
ELSEIF bytes >= 1125899906842624 THEN RETURN CONCAT(ROUND(bytes / 1125899906842624, 2), ' PiB');
ELSEIF bytes >= 1099511627776 THEN RETURN CONCAT(ROUND(bytes / 1099511627776, 2), ' TiB');
ELSEIF bytes >= 1073741824 THEN RETURN CONCAT(ROUND(bytes / 1073741824, 2), ' GiB');
ELSEIF bytes >= 1048576 THEN RETURN CONCAT(ROUND(bytes / 1048576, 2), ' MiB');
ELSEIF bytes >= 1024 THEN RETURN CONCAT(ROUND(bytes / 1024, 2), ' KiB');
ELSE RETURN CONCAT(bytes, ' bytes');
END IF;
END
$$
DELIMITER ;
/*
* Create function format_path
*/
DROP FUNCTION IF EXISTS format_path;
DELIMITER
$$
CREATE DEFINER='xxx'@'%' FUNCTION format_path (
path VARCHAR(260)
)
RETURNS VARCHAR(260) CHARSET UTF8
SQL SECURITY INVOKER
DETERMINISTIC
NO SQL
BEGIN
DECLARE v_path VARCHAR(260);
IF path LIKE '/private/%'
THEN SET v_path = REPLACE(path, '/private', '');
ELSE SET v_path = path;
END IF;
IF v_path IS NULL THEN RETURN NULL;
ELSEIF v_path LIKE CONCAT(@@global.datadir, '%') ESCAPE '|' THEN
RETURN REPLACE(REPLACE(REPLACE(v_path, @@global.datadir, '@@datadir/'), '\\\\', ''), '\\', '/');
ELSEIF v_path LIKE CONCAT(@@global.tmpdir, '%') ESCAPE '|' THEN
RETURN REPLACE(REPLACE(REPLACE(v_path, @@global.tmpdir, '@@tmpdir/'), '\\\\', ''), '\\', '/');
ELSE RETURN v_path;
END IF;
END
$$
DELIMITER ;
/*
* IO 相关,文件相关的延迟 , 事件相关的延迟, FILE IO
* View: io_global_by_file_by_bytes
* 使用说明: 统计每个文件的IO使用率, 比如读写次数,读写bytes
* 能解决什么问题?
* 1)可以清楚的知道那个文件是被访问的最频繁,压力最大,延迟最多的文件。
* 2)可以实时监控redo log,undo,datafile的变化,从而做针对性的优化和调整。当然,通过这个,也能知道瓶颈。
* 3)可以清楚的知道某个库级别,表级别的压力情况,这个比之前count sql数量,观察data size要靠谱,真实的多。
* 4)可以做前瞻性的规划,比如:拆库拆表,可以指导如何按照压力负载均衡的做到合理来优化拆分
* 5)通过表级别的count监控,还可以用做前端缓存的利用率监控。
* 6)可以知道哪些db.table是以只读为主,哪些db.test是以只写为主。从而又进一步做缓存和业务监控。
*xxx:v_monitor> select * from io_global_by_file_by_bytes limit 10;
*+--------------------------------------------------+------------+------------+-------------+---------------+------------+
*| file | count_read | total_read | count_write | total_written | total |
*+--------------------------------------------------+------------+------------+-------------+---------------+------------+
*| /data/mysql/var/ibdata1 | 0 | 0 bytes | 16952506 | 880.15 GiB | 880.15 GiB |
*| /data/mysql/var/ark_db/hp_pro_stats_hour_11.ibd | 0 | 0 bytes | 8782380 | 140.21 GiB | 140.21 GiB |
*| /data/mysql/var/ark_db/hp_pro_stats_hour_oth.ibd | 0 | 0 bytes | 5631256 | 89.36 GiB | 89.36 GiB |
*| /data/mysql/var/ark_db/hp_pro_stats_day_11.ibd | 0 | 0 bytes | 5017652 | 80.69 GiB | 80.69 GiB |
*| /data/mysql/var/ark_db/hp_pro_stats_11.ibd | 0 | 0 bytes | 3692307 | 60.88 GiB | 60.88 GiB |
*| /data/mysql/var/ark_db/hp_pro_stats_day_oth.ibd | 0 | 0 bytes | 2999151 | 49.81 GiB | 49.81 GiB |
*| /data/mysql/var/ark_db/hp_pro_stats_oth.ibd | 0 | 0 bytes | 2103639 | 36.41 GiB | 36.41 GiB |
*| /data/mysql/var/ib_logfile0 | 0 | 0 bytes | 29454850 | 21.94 GiB | 21.94 GiB |
*| /data/mysql/var/ib_logfile1 | 0 | 0 bytes | 29102130 | 21.60 GiB | 21.60 GiB |
*| /data/mysql/var/ark_db/hp_pro_click_oth.ibd | 0 | 0 bytes | 898190 | 14.53 GiB | 14.53 GiB |
*+--------------------------------------------------+------------+------------+-------------+---------------+------------+
*10 rows in set (0.00 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW io_global_by_file_by_bytes (
file,
count_read,
total_read,
count_write,
total_written,
total
) AS
SELECT v_monitor.format_path(file_name) AS file,
count_read,
v_monitor.format_bytes(sum_number_of_bytes_read) AS total_read,
count_write,
v_monitor.format_bytes(sum_number_of_bytes_write) AS total_written,
v_monitor.format_bytes(sum_number_of_bytes_read + sum_number_of_bytes_write) AS total
FROM performance_schema.file_summary_by_instance
ORDER BY sum_number_of_bytes_read + sum_number_of_bytes_write DESC;
/*
* 未经过转换的原始信息
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW origin_io_global_by_file_by_bytes (
file,
count_read,
total_read,
count_write,
total_written,
total
) AS
SELECT v_monitor.format_path(file_name) AS file,
count_read,
sum_number_of_bytes_read AS total_read,
count_write,
sum_number_of_bytes_write AS total_written,
sum_number_of_bytes_read + sum_number_of_bytes_write AS total
FROM performance_schema.file_summary_by_instance
ORDER BY sum_number_of_bytes_read + sum_number_of_bytes_write DESC;
/*
* IO 相关,文件相关的延迟 , 事件相关的延迟 FILE IO
* View: io_global_by_file_by_latency
* 使用说明: 统计每个文件的latency
* 能解决什么问题?
* 1)能解决的问题和io_global_by_file_by_bytes一样,只不过角度不一样。这里是以延迟来衡量
*xxx:v_monitor> select * from io_global_by_file_by_latency limit 10;
*+--------------------------------------------------+----------+---------------+------------+--------------+-------------+---------------+
*| file | total | total_latency | count_read | read_latency | count_write | write_latency |
*+--------------------------------------------------+----------+---------------+------------+--------------+-------------+---------------+
*| /data/mysql/var/ib_logfile0 | 59328704 | 1.40h | 0 | 0 ps | 29670415 | 00:03:05.51 |
*| /data/mysql/var/ib_logfile1 | 58204245 | 1.38h | 0 | 0 ps | 29102130 | 00:03:01.40 |
*| /data/mysql/var/ibdata1 | 18492358 | 00:54:07.96 | 0 | 0 ps | 17039981 | 00:06:49.20 |
*| /data/mysql/var/ark_db/hp_pro_stats_hour_11.ibd | 9369453 | 00:08:23.82 | 0 | 0 ps | 8832634 | 00:01:19.78 |
*| /data/mysql/var/ark_db/hp_pro_stats_day_11.ibd | 5548183 | 00:05:52.43 | 0 | 0 ps | 5052084 | 45.78 s |
*| /data/mysql/var/ark_db/hp_pro_stats_hour_oth.ibd | 6172184 | 00:05:46.43 | 0 | 0 ps | 5667656 | 54.00 s |
*| /data/mysql/var/ark_db/hp_pro_stats_11.ibd | 4126313 | 00:04:59.54 | 0 | 0 ps | 3717949 | 32.73 s |
*| /data/mysql/var/ark_db/hp_pro_stats_oth.ibd | 2376313 | 00:04:14.98 | 0 | 0 ps | 2116758 | 17.49 s |
*| /data/mysql/var/ark_db/hp_pro_stats_day_oth.ibd | 3412908 | 00:03:58.03 | 0 | 0 ps | 3018950 | 26.98 s |
*| /data/mysql/var/ark_db/hp_pro_click_oth.ibd | 1101398 | 00:01:12.65 | 0 | 0 ps | 902427 | 9.63 s |
*+--------------------------------------------------+----------+---------------+------------+--------------+-------------+---------------+
*10 rows in set (0.01 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW io_global_by_file_by_latency (
file,
total,
total_latency,
count_read,
read_latency,
count_write,
write_latency
) AS
SELECT v_monitor.format_path(file_name) AS file,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency,
count_read,
v_monitor.format_time(sum_timer_read) AS read_latency,
count_write,
v_monitor.format_time(sum_timer_write) AS write_latency
FROM performance_schema.file_summary_by_instance
ORDER BY sum_timer_wait DESC;
/*
* IO 相关,classes相关的延迟 , 事件相关的延迟 Event IO
* View: io_global_by_wait_by_bytes
* 使用说明: 统计每个事件的IO使用率,单位bytes,counts。 Top Waits By bytes
* 能解决什么问题?
* 1)专注的不再只是文件一个点,包括服务器层和引擎层。
* 2)如果relaylog非常大,说明同步有问题。
* 3)如果binlog比较大,说明binlog有问题。
* 4)如果sql/FRM 比较大,Tune table_open_cache / table_definition_cache
* 5)如果sql/file_parse比较大,如果在5.5比较高,那就升级mysql到5.6
* 6)如果query_log比较大,那就disable genery log
* 7)如果slow log比较大,那就调整slow阈值。
* 8)还有很多值得挖掘~~
*xxx:v_monitor> select * from io_global_by_wait_by_bytes limit 10;
*+-------------------------+------------+---------------+------------+------------+-------------+---------------+-----------------+
*| event_name | total | total_latency | count_read | total_read | count_write | total_written | total_requested |
*+-------------------------+------------+---------------+------------+------------+-------------+---------------+-----------------+
*| innodb/innodb_data_file | 116249025 | 3.14h | 1179948 | 25.17 GiB | 100838192 | 2.35 TiB | 2.37 TiB |
*| sql/relaylog | 4401630943 | 4.67h | 640398925 | 403.44 GiB | 3120571330 | 403.44 GiB | 806.88 GiB |
*| innodb/innodb_log_file | 117730283 | 2.78h | 0 | 0 bytes | 58871213 | 43.77 GiB | 43.77 GiB |
*| myisam/dfile | 5077011 | 5.72 s | 3376135 | 841.29 MiB | 13950 | 1.61 GiB | 2.43 GiB |
*| sql/FRM | 565451 | 14.43 s | 231981 | 44.23 MiB | 111230 | 11.97 MiB | 56.21 MiB |
*| sql/binlog_index | 38898 | 1.89 s | 5978 | 203.68 KiB | 0 | 0 bytes | 203.68 KiB |
*| sql/file_parser | 1273 | 26.81 ms | 50 | 83.65 KiB | 33 | 105.21 KiB | 188.86 KiB |
*| myisam/kfile | 4910 | 65.62 ms | 342 | 70.79 KiB | 3421 | 70.17 KiB | 140.96 KiB |
*| sql/binlog | 78 | 111.03 ms | 7 | 24.23 KiB | 56 | 34.59 KiB | 58.82 KiB |
*| sql/slow_log | 2 | 10.13 ms | 0 | 0 bytes | 2 | 476 bytes | 476 bytes |
*+-------------------------+------------+---------------+------------+------------+-------------+---------------+-----------------+
*10 rows in set (0.06 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW io_global_by_wait_by_bytes (
event_name,
total,
total_latency,
count_read,
total_read,
count_write,
total_written,
total_requested
) AS
SELECT SUBSTRING_INDEX(event_name, '/', -2) event_name,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency,
count_read,
v_monitor.format_bytes(sum_number_of_bytes_read) AS total_read,
count_write,
v_monitor.format_bytes(sum_number_of_bytes_write) AS total_written,
v_monitor.format_bytes(sum_number_of_bytes_write + sum_number_of_bytes_read) AS total_requested
FROM performance_schema.file_summary_by_event_name
WHERE event_name LIKE 'wait/io/file/%'
AND count_star > 0
ORDER BY sum_number_of_bytes_write + sum_number_of_bytes_read DESC;
/*
* IO 相关,classes相关的延迟 , 事件相关的延迟 Event IO
* View: io_global_by_wait_by_latency
* 使用说明: 统计每个事件的IO latency
* 能解决什么问题?
* 1)和io_global_by_wait_by_bytes一样,只是维度不同。 -- 延迟的维度
*xxx:v_monitor> select * from io_global_by_wait_by_latency limit 10;
*+-------------------------+------------+---------------+--------------+---------------+------------+------------+-------------+---------------+
*| event_name | total | total_latency | read_latency | write_latency | count_read | total_read | count_write | total_written |
*+-------------------------+------------+---------------+--------------+---------------+------------+------------+-------------+---------------+
*| sql/relaylog | 4402876075 | 4.67h | 00:34:17.72 | 3.56h | 640580183 | 403.56 GiB | 3121453858 | 403.55 GiB |
*| innodb/innodb_data_file | 116267316 | 3.14h | 00:18:01.37 | 00:22:40.06 | 1179948 | 25.17 GiB | 100854878 | 2.35 TiB |
*| innodb/innodb_log_file | 118276625 | 2.79h | 0 ps | 00:06:09.20 | 0 | 0 bytes | 59144386 | 43.97 GiB |
*| sql/FRM | 565457 | 14.43 s | 11.19 s | 186.53 ms | 231983 | 44.23 MiB | 111230 | 11.97 MiB |
*| myisam/dfile | 5077011 | 5.72 s | 2.47 s | 2.32 s | 3376135 | 841.29 MiB | 13950 | 1.61 GiB |
*| sql/binlog_index | 38898 | 1.89 s | 19.19 ms | 0 ps | 5978 | 203.68 KiB | 0 | 0 bytes |
*| sql/binlog | 78 | 111.03 ms | 20.85 us | 671.02 us | 7 | 24.23 KiB | 56 | 34.59 KiB |
*| myisam/kfile | 4910 | 65.62 ms | 572.33 us | 39.10 ms | 342 | 70.79 KiB | 3421 | 70.17 KiB |
*| archive/data | 6931 | 30.00 ms | 0 ps | 0 ps | 0 | 0 bytes | 0 | 0 bytes |
*| sql/file_parser | 1274 | 26.81 ms | 189.70 us | 597.35 us | 50 | 83.65 KiB | 33 | 105.21 KiB |
*+-------------------------+------------+---------------+--------------+---------------+------------+------------+-------------+---------------+
*10 rows in set (0.06 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW io_global_by_wait_by_latency (
event_name,
total,
total_latency,
read_latency,
write_latency,
count_read,
total_read,
count_write,
total_written
) AS
SELECT SUBSTRING_INDEX(event_name, '/', -2) AS event_name,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency,
v_monitor.format_time(sum_timer_read) AS read_latency,
v_monitor.format_time(sum_timer_write) AS write_latency,
count_read,
v_monitor.format_bytes(sum_number_of_bytes_read) AS total_read,
count_write,
v_monitor.format_bytes(sum_number_of_bytes_write) AS total_written
FROM performance_schema.file_summary_by_event_name
WHERE event_name LIKE 'wait/io/file/%'
AND count_star > 0
ORDER BY sum_timer_wait DESC;
/*
* Table IO 相关
* View:io_global_by_table_by_latency
* 使用说明:Top Tables By Latency
* 能解决什么问题:
* 1) 从table的角度来衡量DB的压力。
* 2) 和io_global_by_file_by_latency类似,io_global_by_file_by_latency 考虑的是文件,是物理IO。而io_global_by_table_by_latency更多的是上层的压力分布。
* 3) 除了能解决io_global_by_file_by_latency的问题外,还可以发现并发的问题。
* 比如:如果这里table的total很低,但是total_latency 很高,这就能很好的说明,80%是由于这个file的并发访问造成的high latency
* 4) 它还能做一件非常niubility的事情,那就是查看哪些表已经不被使用,已经下线了。其实很多开发也会问哪些表不被使用了,这下就可以派上用场了。好处:a)优化业务 b)减少磁盘空间。 c)减少备份的压力。
*xxx:v_monitor> select * from io_global_by_table_by_latency limit 10;
*+-----------------+--------------------------------+----------+---------------+-------------+-------------+
*| table_schema | table_name | total | total_latency | avg_latency | max_latency |
*+-----------------+--------------------------------+----------+---------------+-------------+-------------+
*| anjuke_db | ajk_propertysale | 17053410 | 1.11h | 233.37 us | 1.40 s |
*| anjuke_db | ajk_members | 65246324 | 00:33:58.47 | 31.24 us | 323.66 ms |
*| anjuke_db | ajk_brokerextend | 51277324 | 00:23:31.12 | 27.52 us | 423.48 ms |
*| stats_db | list_acenter_consume_c | 73892219 | 00:13:44.54 | 11.16 us | 414.73 ms |
*| anjuke_db | log_broker_login_201406 | 25700871 | 00:13:00.67 | 30.38 us | 514.11 ms |
*| propertys_sh_db | ajk_propertys | 14348854 | 00:12:51.18 | 53.75 us | 758.49 ms |
*| stats_db | list_acenter_charge_c | 33333810 | 00:11:59.95 | 21.60 us | 321.49 ms |
*| anjuke_db | ajk_private_tag | 9723376 | 00:10:56.38 | 67.51 us | 428.58 ms |
*| anjuke_db | account_balance_log_sublist_06 | 24463818 | 00:10:38.03 | 26.08 us | 427.08 ms |
*| anjuke_db | ajk_property_data | 23114971 | 00:09:31.39 | 24.72 us | 305.56 ms |
*+-----------------+--------------------------------+----------+---------------+-------------+-------------+
*10 rows in set (0.01 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW io_global_by_table_by_latency (
table_schema,
table_name,
total,
total_latency,
avg_latency,
max_latency
) AS
SELECT object_schema AS table_schema,
object_name AS table_name,
count_star AS total,
v_monitor.format_time(sum_timer_wait) as total_latency,
v_monitor.format_time(sum_timer_wait / count_star) as avg_latency,
v_monitor.format_time(max_timer_wait) as max_latency
FROM performance_schema.objects_summary_global_by_type
ORDER BY sum_timer_wait DESC;
/*
* Table IO 相关
* View:io_global_by_table_detail_breakdown
* 使用说明: Table Usage Detailed Breakdown
* 可以解决的问题:
* 1) 可以精确到表级别的IOPS,TPS。为诊断问题性能问题提供可靠的粒度。
* 2) 可以指导通过数据来了解业务并且指导业务开发,为什么IUD很高,为什么S很高。
* 3) 有些表如果只有IDU,或者甚至只有I,那么这些表根本就不适合放入在线DB,so,可以提供在线DB的架构优化。
*xxx:v_monitor> select * from io_global_by_table_detail_breakdown limit 10;
*+-----------------+--------------------------------+----------+----------------+---------+----------------+----------+----------------+---------+----------------+
*| table_schema | table_name | selects | select_latency | inserts | insert_latency | updates | update_latency | deletes | delete_latency |
*+-----------------+--------------------------------+----------+----------------+---------+----------------+----------+----------------+---------+----------------+
*| anjuke_db | ajk_propertysale | 6532225 | 00:56:34.78 | 1105708 | 00:03:03.19 | 1944954 | 00:06:20.39 | 1322582 | 17.20 s |
*| anjuke_db | ajk_members | 16270425 | 00:11:49.35 | 60983 | 00:01:36.68 | 16267120 | 00:20:05.95 | 59 | 17.76 ms |
*| anjuke_db | ajk_brokerextend | 15377867 | 00:11:36.55 | 4083 | 3.56 s | 5142498 | 00:11:26.88 | 0 | 0 ps |
*| stats_db | list_acenter_consume_c | 16288296 | 00:03:28.84 | 8260912 | 00:06:12.97 | 8260912 | 00:02:30.94 | 7997084 | 00:01:08.67 |
*| anjuke_db | log_broker_login_201406 | 0 | 0 ps | 8568979 | 00:12:47.28 | 0 | 0 ps | 0 | 0 ps |
*| propertys_sh_db | ajk_propertys | 3963117 | 00:04:44.38 | 366787 | 00:01:46.98 | 2521943 | 00:05:33.12 | 397469 | 41.13 s |
*| stats_db | list_acenter_charge_c | 8292546 | 00:03:34.09 | 56871 | 7.38 s | 8292546 | 00:08:06.29 | 0 | 0 ps |
*| anjuke_db | ajk_private_tag | 3811793 | 00:09:27.39 | 158063 | 43.71 s | 3811793 | 44.06 s | 0 | 0 ps |
*| anjuke_db | account_balance_log_sublist_06 | 0 | 0 ps | 8156298 | 00:10:25.48 | 0 | 0 ps | 0 | 0 ps |
*| anjuke_db | ajk_property_data | 5598966 | 00:04:29.53 | 1104535 | 00:01:31.36 | 4277478 | 00:02:50.94 | 1321179 | 31.39 s |
*+-----------------+--------------------------------+----------+----------------+---------+----------------+----------+----------------+---------+----------------+
*10 rows in set (0.01 sec)
* 重点说明:以performance_schema.table_io_waits_summary_by_table举例
* 1) 几个常用的命令:
* show variables like 'perf%';
* show status like 'perf%';
* show engine performance_schema status;
* (table_share_hash).count --> performance_schema 监控 当前表的数量
* (table_share_hash).size --> performance_schema 监控 表打开的最大的数量
*
* 2)如果(table_share_hash).size 达到最大限制了会有什么影响?如何解决?
* a)如果达到最大限制,即(table_share_hash).count = (table_share_hash).size , 之后新的表就不能被统计进来。
* b)原来已经打开的表,任然受监控,不影响。
*
* 如何解决呢?
* a)如果可以停mysql的话,可以调大performance_schema_max_table_instances 值。
* b)如果不能停mysql,可以适当的删除一些表如: drop table xx; 这样做可以减少(table_share_hash).count的数量。
* c)注意: truncate table performance_schema.table_io_waits_summary_by_table 只能reset status = 0 ,不会删除里面的记录,所以这个行不通。
*
* 3) performance_schema.table_io_waits_summary_by_table 里面的count_fetch 需要注意:
* a)count_fetch, 指的是数据库底层com_fetch的量,千万别理解成select。
* b)哪些操作会导致count_fetch 增长呢?
* 1)insert 操作不会造成 count_fetch ++ , 但是update 会造成 count_fetch ++。
* 2)其实,这也很好理解,因为update语句,会从数据库中fetch数据到server层以及内存,然后在修改。
* 3)select 操作会造成 count_fetch 增长。
* c)一次select,count_fetch 增长 1 么?
* 1)显然不是,count_fetch 指的是: 一次底层fetch调用,增长一次。
* 2)比如:有一张表100条记录。
* SQL语句为:select * from table;
* rows_affected: 100rows
* 那么count_fetch 至少是100,而不是1.
*
* 为什么说至少是100呢?
* a)有可能会从二级索引中fetch 数据。
* b)然后再从主键索引中fetch 数据。
* c)以上两点都是我的推断。。--待验证,解决。
*
* d)特别注意: 有的时候count_update 不增长,也没有select 操作,为啥count_fetch 会增长呢?
* 1)如果理解了count_fetch,你就不难懂了。
* 2)fetch 指的是从数据库文件fetch的fetch 调用。举例子: table_A
*
* id name
* 1 'lan'
* 2 'chun'
* 一条SQL语句如: update table_A set name = 'test' where id = 3; 你猜,会怎样?
* 结果是:
* count_fetch 增加,count_update 没有增加。因为:根本就没有id=3的记录与之匹配,也就是说udpate根本就没有更改任何记录,
* 所以count_update 不会增加,但是有fetch操作,所以count_fetch 由此增长。
*
*
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW io_global_by_table_detail_breakdown (
table_schema,
table_name,
rows_io_total,
rows_reads,
rows_writes,
rows_fetchs,
rows_inserts,
rows_updates,
rows_deletes,
select_latency,
insert_latency,
update_latency,
delete_latency
) AS
SELECT object_schema AS table_schema,
object_name AS table_name,
count_star AS rows_io_total,
count_read AS rows_read,
count_write AS rows_write,
count_fetch AS rows_fetchs,
count_insert AS rows_inserts,
count_update AS rows_updates,
count_delete AS rows_deletes,
v_monitor.format_time(sum_timer_fetch) AS fetch_latency,
v_monitor.format_time(sum_timer_insert) AS insert_latency,
v_monitor.format_time(sum_timer_update) AS update_latency,
v_monitor.format_time(sum_timer_delete) AS delete_latency
FROM performance_schema.table_io_waits_summary_by_table
ORDER BY sum_timer_wait DESC ;
/*
* Table 索引IO 相关
* View:schema_index_statistics
* 用于检测索引的使用情况
* 1)哪些索引没有利用? 2)哪些索引使用率高
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW schema_index_statistics (
table_schema,
table_name,
index_name,
rows_fetched,
select_latency,
rows_inserted,
insert_latency,
rows_updated,
update_latency,
rows_deleted,
delete_latency
) AS
SELECT OBJECT_SCHEMA AS table_schema,
OBJECT_NAME AS table_name,
INDEX_NAME as index_name,
COUNT_FETCH AS rows_fetched,
v_monitor.format_time(SUM_TIMER_FETCH) AS select_latency,
COUNT_INSERT AS rows_inserted,
v_monitor.format_time(SUM_TIMER_INSERT) AS insert_latency,
COUNT_UPDATE AS rows_updated,
v_monitor.format_time(SUM_TIMER_UPDATE) AS update_latency,
COUNT_DELETE AS rows_deleted,
v_monitor.format_time(SUM_TIMER_INSERT) AS delete_latency
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
ORDER BY sum_timer_wait DESC;
/*
* 性能调优相关: 全表扫描的schema
* View: schema_tables_with_full_table_scans
* 使用说明: 找到全表扫描的表,以扫描的行数降序排列
* 能解决什么问题:
* 1)能找到哪些表被全表扫描的多,从而针对性的对这张表优化。
* 2)优化内存使用率。将full_scann减少,更加能够提高内存利用率。能够让更多合理的数据进入内存,从而进一步的减少了slow
*xxx:v_monitor> select * from schema_tables_with_full_table_scans limit 10;
*+---------------+-----------------------------+-------------------+
*| object_schema | object_name | rows_full_scanned |
*+---------------+-----------------------------+-------------------+
*| ark_db | hp_pro_stats_hour_11 | 6961159 |
*| ark_db | hp_pro_stats_hour_oth | 5957428 |
*| ark_db | log_rankprop_update_auction | 3047492 |
*| ark_db | hp_broker_stats_hour_oth | 979462 |
*| ark_db | ajk_propspread | 935123 |
*| ark_db | hp_broker_stats_hour_11 | 815424 |
*| ark_db | hp_comm_stats_hour_oth | 744698 |
*| ark_db | hp_comm_stats_hour_11 | 574816 |
*| ark_db | hp_pro_stats_day_11 | 483393 |
*| ark_db | hp_pro_stats_day_oth | 404916 |
*+---------------+-----------------------------+-------------------+
*10 rows in set (0.04 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW schema_tables_with_full_table_scans (
object_schema,
object_name,
rows_full_scanned
) AS
SELECT object_schema,
object_name,
count_read AS rows_full_scanned
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NULL
AND count_read > 0
ORDER BY count_read DESC;
/*
* 性能调优相关: 找出未使用过的index和schema
* View: schema_unused_indexes
* 能解决什么问题:
* 1)找出哪些表的使用有问题。
* 2)找出哪些索引是从来没有被用过,进而指导DBA&&开发优化没有使用的索引,可以提高写的性能。
*xxx:v_monitor> select * from schema_unused_indexes limit 10;
*+---------------+----------------------+--------------------+
*| object_schema | object_name | index_name |
*+---------------+----------------------+--------------------+
*| ark_db | ajk_propspread | idx_areacode_price |
*| ark_db | ajk_propspread | serialnumber |
*| ark_db | ajk_propspread | city_status_type |
*| ark_db | ajk_propspread | updated_datetime |
*| ark_db | ajk_propspread | idx_commId_price |
*| ark_db | ajk_propspread | broker_id |
*| ark_db | ajk_propspread | createtime |
*| ark_db | ajk_propspread_queue | stoptime |
*| ark_db | ajk_propspread_redo | plan_id |
*| ark_db | ajk_propspread_redo | createtime |
*+---------------+----------------------+--------------------+
*10 rows in set (0.05 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW schema_unused_indexes (
object_schema,
object_name,
index_name
) AS
SELECT object_schema,
object_name,
index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
AND count_star = 0
AND object_schema not in ('mysql','v_monitor')
AND index_name <> 'PRIMARY'
ORDER BY object_schema, object_name;
/*
* 语句statment相关
* View: statement_analysis
* 列出top N 个SQL的详细情况,以latency降序排列 。 兼容Mysql Enterprise Monitor
* 能解决什么问题:
* 1)这个能解决的问题就太多了,为什么? 因为这个就是Mysql Enterprise Monitor中的一个重要功能。
* 2)exec
* 3)err && warnings
* 4)latency
* 5)lock latency
* 6)rows sent && rows examed
* 7)tmp table ** tmp disk table
* 8)rows sort
* 9)sort merge
*xxx:v_monitor> select * from statement_analysis \G
*************************** 1\. row ***************************
* query: INSERT INTO `hp_pro_stats_hour ... PDATE `disnum` = `disnum` + ?
* db: ark_db
* full_scan:
* exec_count: 97704732
* err_count: 0
* warn_count: 0
* total_latency: 3.34h
* max_latency: 2.90 s
* avg_latency: 123.00 us
* lock_latency: 1.57h
* rows_sent: 0
* rows_sent_avg: 0
* rows_examined: 0
*rows_examined_avg: 0
* tmp_tables: 0
* tmp_disk_tables: 0
* rows_sorted: 0
*sort_merge_passes: 0
* digest: 4878125158ccfb0239731d889bce8221
* first_seen: 2014-05-20 10:27:36
* last_seen: 2014-06-18 16:31:01
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW statement_analysis (
query,
db,
full_scan,
exec_count,
err_count,
warn_count,
total_latency,
max_latency,
avg_latency,
lock_latency,
rows_sent,
rows_sent_avg,
rows_examined,
rows_examined_avg,
tmp_tables,
tmp_disk_tables,
rows_sorted,
sort_merge_passes,
digest,
first_seen,
last_seen
) AS
SELECT v_monitor.format_statement(DIGEST_TEXT) AS query,
SCHEMA_NAME AS db,
IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
COUNT_STAR AS exec_count,
SUM_ERRORS AS err_count,
SUM_WARNINGS AS warn_count,
v_monitor.format_time(SUM_TIMER_WAIT) AS total_latency,
v_monitor.format_time(MAX_TIMER_WAIT) AS max_latency,
v_monitor.format_time(AVG_TIMER_WAIT) AS avg_latency,
v_monitor.format_time(SUM_LOCK_TIME) AS lock_latency,
SUM_ROWS_SENT AS rows_sent,
ROUND(IFNULL(SUM_ROWS_SENT / NULLIF(COUNT_STAR, 0), 0)) AS rows_sent_avg,
SUM_ROWS_EXAMINED AS rows_examined,
ROUND(IFNULL(SUM_ROWS_EXAMINED / NULLIF(COUNT_STAR, 0), 0)) AS rows_examined_avg,
SUM_CREATED_TMP_TABLES AS tmp_tables,
SUM_CREATED_TMP_DISK_TABLES AS tmp_disk_tables,
SUM_SORT_ROWS AS rows_sorted,
SUM_SORT_MERGE_PASSES AS sort_merge_passes,
DIGEST AS digest,
FIRST_SEEN AS first_seen,
LAST_SEEN as last_seen
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC;
/*
* 语句statment相关
* View: statements_with_errors_or_warnings
* 使用说明: 统计有error或者warnings的top N SQL
* 能解决的问题:
* 1) 找出有error 或者 warning 的 SQL
* 2)有warnin的语句,迟早都会变成error,而且还是隐形的error。-- tom kyte oracle 资深DBA
* 3)找出频繁warning或者error的SQL,有助于预防SQL注入或者更高级别的SQL攻击。
*************************** 5\. row ***************************
* query: SELECT * FROM `events_waits_su ... `sum_timer_wait` DESC LIMIT ?
* db: sys
*exec_count: 3
* errors: 3
* warnings: 0
*first_seen: 2014-06-16 16:41:02
* last_seen: 2014-06-16 17:07:39
* digest: 3e9e2ddc267ff4ef679a5a49855176ea
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW statements_with_errors_or_warnings (
query,
db,
exec_count,
errors,
warnings,
first_seen,
last_seen,
digest
) AS
SELECT v_monitor.format_statement(DIGEST_TEXT) AS query,
SCHEMA_NAME as db,
COUNT_STAR AS exec_count,
SUM_ERRORS AS errors,
SUM_WARNINGS AS warnings,
FIRST_SEEN as first_seen,
LAST_SEEN as last_seen,
DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME not in ('mysql','v_monitor') and (SUM_ERRORS > 0 OR SUM_WARNINGS > 0)
ORDER BY SUM_ERRORS DESC, SUM_WARNINGS DESC;
/*
* 语句statment相关
* View: statements_with_full_table_scans
* 统计哪些SQL是全表扫描,按照扫描latency排序
* 能解决什么问题:
* 1)前面有个view可以找出被全表扫描的表的行数,这个可以找出具体的SQL。
* 2)同样,这种SQL会浪费内存,IO,cpu,应该立刻,马上制止 (Three star system 理论)
*************************** 2\. row ***************************
* query: SELECT SQL_NO_CACHE * FROM `ark_db` . `hp_pro_click_fees_oth`
* db: NULL
* exec_count: 5
* total_latency: 00:06:08.67
* no_index_used_count: 5
*no_good_index_used_count: 0
* no_index_used_pct: 100
* rows_sent: 36203370
* rows_examined: 36203370
* rows_sent_avg: 7240674
* rows_examined_avg: 7240674
* first_seen: 2014-05-23 13:02:58
* last_seen: 2014-05-23 14:35:18
* digest: 8042a32250334bff7b5f17eff13ac205
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW statements_with_full_table_scans (
query,
db,
exec_count,
total_latency,
no_index_used_count,
no_good_index_used_count,
no_index_used_pct,
rows_sent,
rows_examined,
rows_sent_avg,
rows_examined_avg,
first_seen,
last_seen,
digest
) AS
SELECT v_monitor.format_statement(DIGEST_TEXT) AS query,
SCHEMA_NAME as db,
COUNT_STAR AS exec_count,
v_monitor.format_time(SUM_TIMER_WAIT) AS total_latency,
SUM_NO_INDEX_USED AS no_index_used_count,
SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
ROUND(IFNULL(SUM_NO_INDEX_USED / NULLIF(COUNT_STAR, 0), 0) * 100) AS no_index_used_pct,
SUM_ROWS_SENT AS rows_sent,
SUM_ROWS_EXAMINED AS rows_examined,
ROUND(SUM_ROWS_SENT/COUNT_STAR) AS rows_sent_avg,
ROUND(SUM_ROWS_EXAMINED/COUNT_STAR) AS rows_examined_avg,
FIRST_SEEN as first_seen,
LAST_SEEN as last_seen,
DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME not in ('mysql','v_monitor') and (SUM_NO_INDEX_USED > 0 OR SUM_NO_GOOD_INDEX_USED > 0)
ORDER BY no_index_used_pct DESC, SUM_TIMER_WAIT DESC;
/*
* 语句statment相关
* View: statements_with_sorting
* 统计需要排序的top N SQL
* 能解决什么问题:
* 1) 找出排序延迟对多的SQL
* 2) 这种SQL,会非常好内存和cpu ,应该避免。 (Three star system 理论)
*************************** 1\. row ***************************
* query: SELECT * FROM `innodb_buffer_s ... _SIZE` = ? ) , ? , `ibp` . ...
* db: ark_db
* exec_count: 1
* total_latency: 00:04:58.40
*sort_merge_passes: 0
*sorts_using_scans: 2
* sort_using_range: 0
* rows_sorted: 1685768
* first_seen: 2014-06-03 15:18:20
* last_seen: 2014-06-03 15:18:20
* digest: 8819ae6337417a82fa5ada32dd5b8de2
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW statements_with_sorting (
query,
db,
exec_count,
total_latency,
sort_merge_passes,
sorts_using_scans,
sort_using_range,
rows_sorted,
first_seen,
last_seen,
digest
) AS
SELECT v_monitor.format_statement(DIGEST_TEXT) AS query,
SCHEMA_NAME db,
COUNT_STAR AS exec_count,
v_monitor.format_time(SUM_TIMER_WAIT) AS total_latency,
SUM_SORT_MERGE_PASSES AS sort_merge_passes,
SUM_SORT_SCAN AS sorts_using_scans,
SUM_SORT_RANGE AS sort_using_range,
SUM_SORT_ROWS AS rows_sorted,
FIRST_SEEN as first_seen,
LAST_SEEN as last_seen,
DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_SORT_ROWS > 0 and SCHEMA_NAME not in ('mysql','v_monitor')
ORDER BY SUM_TIMER_WAIT DESC;
/*
* 语句statment相关
* View: statements_with_temp_tables
* 统计需要临时表(磁盘,内存)TOP N SQL
* 能解决什么问题:
* 1) 找出使用临时表最多的SQL
* 2) 临时表,我想不用多少,可以让性能跌倒谷底(特别是disk 临时表) ,应该避免 (Three star system 理论)
*************************** 1\. row ***************************
* query: SELECT * FROM `schema_object_o ... MA` , `information_schema` ...
* db: sys
* exec_count: 3
* total_latency: 8.91 s
* memory_tmp_tables: 567
* disk_tmp_tables: 99
*avg_tmp_tables_per_query: 189
* tmp_tables_to_disk_pct: 17
* first_seen: 2014-06-03 15:40:44
* last_seen: 2014-06-04 15:42:36
* digest: ccc857fb69a9f151a1b8cb8687697b1a
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW statements_with_temp_tables (
query,
db,
exec_count,
total_latency,
memory_tmp_tables,
disk_tmp_tables,
avg_tmp_tables_per_query,
tmp_tables_to_disk_pct,
first_seen,
last_seen,
digest
) AS
SELECT v_monitor.format_statement(DIGEST_TEXT) AS query,
SCHEMA_NAME as db,
COUNT_STAR AS exec_count,
v_monitor.format_time(SUM_TIMER_WAIT) as total_latency,
SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
ROUND(IFNULL(SUM_CREATED_TMP_TABLES / NULLIF(COUNT_STAR, 0), 0)) AS avg_tmp_tables_per_query,
ROUND(IFNULL(SUM_CREATED_TMP_DISK_TABLES / NULLIF(SUM_CREATED_TMP_TABLES, 0), 0) * 100) AS tmp_tables_to_disk_pct,
FIRST_SEEN as first_seen,
LAST_SEEN as last_seen,
DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_CREATED_TMP_TABLES > 0 and SCHEMA_NAME not in ('mysql','v_monitor')
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC;
/*
* host相关的统计
* View: host_summary_by_statement_type
* 统计每个statment'type 的latency,count,by each host
* 能解决什么问题:
* 1)首先,这个是按照host来分类的。我们的需求,很多都是针对业务IP来的,比如:二手房,API,DFS,JOB。
* 2)可以清楚的知道来自那个IP,哪类业务的statment的情况,从而可以很轻松的从业务角度来衡量使用好坏。我想,这应该是代价最低,最精确的做法。
*xxx:v_monitor> select * from host_summary_by_statement_type limit 10;
*+------------+-------------------+-------+---------------+-------------+--------------+------------+---------------+---------------+------------+
*| host | statement | total | total_latency | max_latency | lock_latency | rows_sent | rows_examined | rows_affected | full_scans |
*+------------+-------------------+-------+---------------+-------------+--------------+------------+---------------+---------------+------------+
*| localhost | select | 7890 | 00:59:57.61 | 00:02:41.89 | 4.97 s | 1215285385 | 1215285385 | 0 | 7820 |
*| localhost | show_table_status | 34 | 7.33 s | 3.46 s | 4.18 ms | 14191 | 14191 | 0 | 34 |
*| localhost | show_create_table | 7890 | 3.98 s | 14.00 ms | 0 ps | 0 | 0 | 0 | 0 |
*| localhost | flush | 13 | 103.95 ms | 17.02 ms | 0 ps | 0 | 0 | 0 | 0 |
*| localhost | Binlog Dump | 9 | 70.03 ms | 25.89 ms | 0 ps | 0 | 0 | 0 | 0 |
*| localhost | set_option | 351 | 16.92 ms | 267.39 us | 0 ps | 0 | 0 | 0 | 0 |
*| localhost | show_binlogs | 1 | 15.62 ms | 15.62 ms | 0 ps | 0 | 0 | 0 | 0 |
*| localhost | show_databases | 7 | 8.25 ms | 3.95 ms | 788.00 us | 42 | 42 | 0 | 7 |
*| localhost | begin | 91 | 7.25 ms | 682.23 us | 0 ps | 0 | 0 | 0 | 0 |
*| localhost | Quit | 92 | 2.46 ms | 113.67 us | 0 ps | 0 | 0 | 0 | 0 |
*+------------+-------------------+-------+---------------+-------------+--------------+------------+---------------+---------------+------------+
*10 rows in set (0.01 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW host_summary_by_statement_type (
host,
statement,
total,
total_latency,
max_latency,
lock_latency,
rows_sent,
rows_examined,
rows_affected,
full_scans
) AS
SELECT host,
SUBSTRING_INDEX(event_name, '/', -1) AS statement,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency,
v_monitor.format_time(max_timer_wait) AS max_latency,
v_monitor.format_time(sum_lock_time) AS lock_latency,
sum_rows_sent AS rows_sent,
sum_rows_examined AS rows_examined,
sum_rows_affected AS rows_affected,
sum_no_index_used + sum_no_good_index_used AS full_scans
FROM performance_schema.events_statements_summary_by_host_by_event_name
WHERE host IS NOT NULL
AND sum_timer_wait != 0
ORDER BY host, sum_timer_wait DESC;
/*
* global 相关的statment级别的统计,包括延迟,exec count,rows scann
* View: global_summary_by_statement_type
* 统计每个statment'type 的latency,count,by total
* 可以解决什么问题:
* 1) 上面说了按照业务分,那么很明显这个就是总体的一个情况。
* 2) 可以分清楚的知道这台机器IOPS,TPS,甚至可以知道每年,每月这台DB的DDL变更的数量。
* 3) 最主要的还是延迟,如果show_status , show variables 的延迟很大,那么就要想办法优化。 为什么show status那么多,延迟那么高?
*xxx:v_monitor> select * from global_summary_by_statement_type limit 10;
*+----------------+-----------+---------------+-------------+--------------+------------+---------------+---------------+------------+
*| statement | total | total_latency | max_latency | lock_latency | rows_sent | rows_examined | rows_affected | full_scans |
*+----------------+-----------+---------------+-------------+--------------+------------+---------------+---------------+------------+
*| insert | 891988701 | 36.21h | 6.30 s | 15.59h | 0 | 0 | 2008411261 | 0 |
*| update | 79182572 | 3.35h | 2.62 s | 1.25h | 0 | 80241508 | 79084166 | 0 |
*| begin | 971583219 | 2.50h | 37.90 ms | 0 ps | 0 | 0 | 0 | 0 |
*| select | 177367 | 1.40h | 00:05:28.72 | 22.07 s | 1215746513 | 1262891559 | 0 | 50531 |
*| delete | 320751 | 00:09:52.35 | 442.94 ms | 28.86 s | 0 | 18367238 | 18365896 | 0 |
*| show_status | 463635 | 00:09:29.23 | 36.06 ms | 48.18 s | 158098899 | 158098899 | 0 | 463635 |
*| truncate | 82 | 00:01:24.88 | 15.45 s | 4.17 s | 0 | 0 | 0 | 0 |
*| drop_db | 2 | 00:01:12.38 | 45.16 s | 00:01:12.36 | 0 | 0 | 3100 | 0 |
*| show_variables | 84340 | 56.17 s | 43.63 ms | 7.35 s | 94226 | 94226 | 0 | 84340 |
*| create_table | 3787 | 26.76 s | 108.38 ms | 0 ps | 0 | 0 | 0 | 0 |
*+----------------+-----------+---------------+-------------+--------------+------------+---------------+---------------+------------+
*10 rows in set (0.00 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW global_summary_by_statement_type (
statement,
total,
total_latency,
max_latency,
lock_latency,
rows_sent,
rows_examined,
rows_affected,
full_scans
) AS
SELECT
SUBSTRING_INDEX(event_name, '/', -1) AS statement,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency,
v_monitor.format_time(max_timer_wait) AS max_latency,
v_monitor.format_time(sum_lock_time) AS lock_latency,
sum_rows_sent AS rows_sent,
sum_rows_examined AS rows_examined,
sum_rows_affected AS rows_affected,
sum_no_index_used + sum_no_good_index_used AS full_scans
FROM performance_schema.events_statements_summary_global_by_event_name
WHERE
sum_timer_wait != 0
ORDER BY sum_timer_wait DESC;
/*
* 等待相关的监控 classes
* View: wait_classes_global_by_latency
* 列出Top N 个 class等待event by latency
* 能解决什么问题:
* 1) 站在更高的角度来衡量,主要是table的io,文件的io,表的lock的wait事件延迟。
*xxx:v_monitor> select * from wait_classes_global_by_latency limit 10;
*+-----------------+------------+---------------+-------------+-------------+-------------+
*| event_class | total | total_latency | min_latency | avg_latency | max_latency |
*+-----------------+------------+---------------+-------------+-------------+-------------+
*| wait/io/table | 2217307512 | 11.45h | 72.29 ns | 18.59 us | 2.91 s |
*| wait/io/file | 4670586905 | 10.94h | 0 ps | 8.43 us | 746.08 ms |
*| wait/lock/table | 1402423408 | 00:14:28.26 | 117.97 ns | 619.11 ns | 12.23 ms |
*+-----------------+------------+---------------+-------------+-------------+-------------+
*3 rows in set (0.07 sec)
*/
CREATE OR REPLACE
ALGORITHM = TEMPTABLE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW wait_classes_global_by_latency (
event_class,
total,
total_latency,
min_latency,
avg_latency,
max_latency
) AS
SELECT SUBSTRING_INDEX(event_name,'/', 3) AS event_class,
SUM(COUNT_STAR) AS total,
v_monitor.format_time(SUM(sum_timer_wait)) AS total_latency,
v_monitor.format_time(MIN(min_timer_wait)) min_latency,
v_monitor.format_time(IFNULL(SUM(sum_timer_wait) / NULLIF(SUM(COUNT_STAR), 0), 0)) AS avg_latency,
v_monitor.format_time(MAX(max_timer_wait)) AS max_latency
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE sum_timer_wait > 0
AND event_name != 'idle'
GROUP BY SUBSTRING_INDEX(event_name,'/', 3)
ORDER BY SUM(sum_timer_wait) DESC;
/*
* 等待相关的监控 each host
* View: waits_by_host_by_latency
* 列出Top N 个 class等待event by latency by each host
* 能解决什么问题:
* 1) 按照业务ip进行统计,统计每个ip对应的event使用情况和压力情况
* 2) 如果wait/synch/mutex/innodb/buf_pool_mutex 比较高,那么需要增加innodb_buffer_pool_instances
* 3) 如果wait/synch/mutex/sql/Query_cache::structure_guard_mutex 比较高,那么需要disable query cache
* 4) 如果wait/synch/mutex/myisam/MYISAM_SHARE::intern_lock 比较高,那么请使用Myisam
* 5) 如果wait/io/file/sql/FRM 比较高,调整table_open_cache / table_definition_cache
* 6) 如果wait/io/file/sql/query_log and wait/io/file/sql/slow_log ,调整相应的general log和slow log
* 7) 如果 xxx , 那就 YYY , 等等。 对DB底层的知识越了解,那么可以监控的点就更多,这里应有竟有。
* xxx:v_monitor> select * from waits_by_host_by_latency limit 10;
* +------------+--------------------------------------+------------+---------------+
* | host | event | total | total_latency |
* +------------+--------------------------------------+------------+---------------+
* | localhost | wait/io/table/sql/handler | 1215188725 | 00:30:23.81 |
* | localhost | wait/io/file/innodb/innodb_data_file | 761405 | 00:11:22.06 |
* | localhost | wait/io/file/sql/FRM | 385645 | 2.42 s |
* | localhost | wait/io/file/csv/metadata | 132 | 633.20 ms |
* | localhost | wait/io/file/sql/query_log | 16567 | 199.15 ms |
* | localhost | wait/io/file/myisam/kfile | 906 | 67.62 ms |
* | localhost | wait/io/file/sql/slow_log | 215 | 62.09 ms |
* | localhost | wait/io/file/myisam/dfile | 725 | 41.31 ms |
* | localhost | wait/io/file/sql/binlog | 57 | 36.96 ms |
* | localhost | wait/lock/table/sql/handler | 15520 | 6.93 ms |
* +------------+--------------------------------------+------------+---------------+
* 10 rows in set (0.01 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW waits_by_host_by_latency (
host,
event,
total,
total_latency
) AS
SELECT host,
event_name AS event,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency
FROM performance_schema.events_waits_summary_by_host_by_event_name
WHERE event_name != 'idle'
AND host IS NOT NULL
AND sum_timer_wait > 0
ORDER BY host, sum_timer_wait DESC;
/*
* 等待相关的监控 global
* View: waits_global_by_latency
* 列出top n IO相关等待event by latency
* 能解决什么问题: 请参考waits_by_host_by_latency
*xxx:v_monitor> select * from waits_global_by_latency limit 10;
*+--------------------------------------+------------+---------------+-------------+-------------+
*| events | total | total_latency | avg_latency | max_latency |
*+--------------------------------------+------------+---------------+-------------+-------------+
*| wait/io/table/sql/handler | 2217608663 | 11.45h | 18.59 us | 2.91 s |
*| wait/io/file/sql/relaylog | 4419075002 | 4.69h | 3.82 us | 554.50 ms |
*| wait/io/file/innodb/innodb_data_file | 121477770 | 3.30h | 97.70 us | 746.08 ms |
*| wait/io/file/innodb/innodb_log_file | 125421910 | 2.96h | 84.99 us | 427.05 ms |
*| wait/lock/table/sql/handler | 1402672902 | 00:14:28.41 | 618.97 ns | 12.23 ms |
*| wait/io/file/sql/FRM | 565536 | 14.44 s | 25.54 us | 427.89 ms |
*| wait/io/file/myisam/dfile | 5077014 | 5.72 s | 1.13 us | 195.42 ms |
*| wait/io/file/sql/binlog_index | 38950 | 1.89 s | 48.49 us | 288.94 ms |
*| wait/io/file/sql/binlog | 78 | 111.03 ms | 1.42 ms | 110.20 ms |
*| wait/io/file/myisam/kfile | 4910 | 65.62 ms | 13.36 us | 11.39 ms |
*+--------------------------------------+------------+---------------+-------------+-------------+
*10 rows in set (0.09 sec)
*/
CREATE OR REPLACE
ALGORITHM = MERGE
DEFINER = 'xxx'@'%'
SQL SECURITY INVOKER
VIEW waits_global_by_latency (
events,
total,
total_latency,
avg_latency,
max_latency
) AS
SELECT event_name AS event,
count_star AS total,
v_monitor.format_time(sum_timer_wait) AS total_latency,
v_monitor.format_time(avg_timer_wait) AS avg_latency,
v_monitor.format_time(max_timer_wait) AS max_latency
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE event_name != 'idle'
AND sum_timer_wait > 0
ORDER BY sum_timer_wait DESC;
参考资料
- https://dev.mysql.com/doc/refman/5.6/en/performance-schema.html
- http://www.markleith.co.uk/2011/04/18/monitoring-table-and-index-io-with-performance_schema/
- 转自: https://yq.aliyun.com/articles/59262