Notes on handling a MySQL outage

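Because the database was not designed carefully at the start of development, many tables had no indexes on the columns they were queried by. After the system had been running for a few years, these tables had grown to tens of millions of rows, and the performance problems became obvious.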
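As a rough illustration of the missing-index problem, the sketch below inspects a query plan and then adds an index. The table name tokens is borrowed from later in this post; the column user_id and the index name idx_tokens_user_id are hypothetical and chosen only for illustration.

```sql
-- Hypothetical column and index names; only the table name comes from this post.
-- With no usable index, EXPLAIN typically reports type = ALL (a full table scan).
EXPLAIN SELECT * FROM tokens WHERE user_id = 12345;

-- Add an index on the column used in the WHERE clause.
CREATE INDEX idx_tokens_user_id ON tokens (user_id);

-- Re-check the plan; the key column should now name the new index.
EXPLAIN SELECT * FROM tokens WHERE user_id = 12345;
```

On a table with tens of millions of rows, building the index can itself take a long time and, depending on the MySQL version and storage engine, may block writes, so it is probably best done outside busy hours.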

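The trigger came during a busy period: a sudden burst of unindexed queries arrived and left a key table locked. At that point the system could no longer serve requests.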

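Measures taken:
Kill all the slow queries in mysql. New queries kept arriving, however, so I stopped the application, blocked the database port with iptables, and then killed the queries again.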
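A minimal sketch of how the slow queries could be located before killing them, assuming a MySQL version that provides information_schema.PROCESSLIST (5.1 or later). The 60-second threshold is an arbitrary assumption, not a value from this incident.

```sql
-- List long-running queries and generate a KILL statement for each one.
-- The 60-second threshold is an assumed value; adjust it to the workload.
SELECT id, user, host, db, time, state, info,
       CONCAT('KILL ', id, ';') AS kill_statement
FROM information_schema.PROCESSLIST
WHERE command = 'Query'
  AND time > 60
  AND id <> CONNECTION_ID()   -- do not kill the session running this query
ORDER BY time DESC;
```

The generated kill_statement values can be reviewed and then executed one by one. As the incident shows, this only helps once new queries have stopped arriving.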

Then something strange happened: the database restarted on its own. The database error log showed:
Quote

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crash
Number of processes running now: 0
100510 15:58:41  mysqld restarted
100510 15:58:41 [Warning] Changed limits: max_open_files: 65535  max_connections: 16384  table_cache: 24570
100510 15:58:41 [ERROR] Can't start server : Bind on unix socket: No space left on device
100510 15:58:41 [ERROR] Do you already have another mysqld server running on socket: /var/lib/mysql/mysql.sock ?
100510 15:58:41 [ERROR] Aborting

100510 15:58:41 [Note] /usr/sbin/mysqld: Shutdown complete

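The automatic restart failed because the disk was out of space. I deleted redundant data and then restarted mysql successfully. I assumed the database was healthy at this point, but ran into another problem: in the output of show processlist, many queries were in the statistics state.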
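One way to see how many threads are stuck in a given state, again assuming information_schema.PROCESSLIST is available, is to group the process list by state. This is only a sketch for getting an overview, not the diagnostic that was used at the time.

```sql
-- Count threads per state; a large count for 'statistics' matches the symptom above.
SELECT state, COUNT(*) AS threads
FROM information_schema.PROCESSLIST
GROUP BY state
ORDER BY threads DESC;
```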

After checking the documentation, the fix was:
Quote

CHECK TABLE tokens;
REPAIR TABLE tokens; -- if the CHECK was not 'ok'
ANALYZE TABLE tokens; -- maybe its stats need refreshing


The database then worked normally, but all of its slaves had stopped working. The next post will describe how the slaves were fixed.
